Crawl Settings & Configuration

12 min read
Updated Jan 25, 2026
Version 1.0+
Intermediate
Quick Answer

Configure crawl depth, speed limits, user agents, and exclusions to customize how NitroShock audits your website.

Running a site audit in NitroShock involves more than pressing a "start crawl" button. Configuring crawl depth, speed limits, user agents, and exclusions gives you accurate data while respecting your server's resources and avoiding unnecessary credit usage on irrelevant pages.

Proper crawl configuration prevents common issues like overwhelming your server, auditing duplicate content, or wasting credits on admin pages and PDFs. Whether you're auditing a small business site with 50 pages or an enterprise e-commerce platform with thousands of products, these settings give you precise control over what gets crawled and how.

This guide covers the essential crawl settings available in NitroShock's Site Audit feature and how to configure them for different scenarios.

Crawl Depth

Crawl depth determines how far NitroShock will follow links from your starting URL. Think of it as the number of "clicks away" from your entry point the crawler will go.

Understanding Depth Levels

A depth of 0 means only the starting URL gets audited—no links are followed. This is useful when you want to audit a single specific page without burning credits on the entire site.

A depth of 1 audits your starting page plus all pages directly linked from it. For example, if you start at your homepage, depth 1 would include your homepage and all pages in your main navigation.

A depth of 2 goes one level deeper—auditing your starting page, all directly linked pages, and all pages linked from those pages. This typically covers most small to medium websites completely.

Higher depths (3-5) are necessary for larger sites with deep hierarchies, but each additional level can multiply the number of pages crawled. A site averaging 50 links per page could surface roughly 50 pages at depth 1 but around 2,500 (50 × 50) at depth 2.
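To see why depth escalates so quickly, here is a quick back-of-the-envelope calculation. The Python snippet below is purely illustrative; the 50-links-per-page figure is an assumption, not a NitroShock default.

links_per_page = 50          # assumed average internal links per page
pages_at_depth = {0: 1}      # depth 0 is just the starting URL

# Worst case: every link on every page leads somewhere new.
for depth in range(1, 4):
    pages_at_depth[depth] = pages_at_depth[depth - 1] * links_per_page

for depth, new_pages in pages_at_depth.items():
    print(f"depth {depth}: up to {new_pages:,} new pages")
# depth 1: up to 50, depth 2: up to 2,500, depth 3: up to 125,000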

Choosing the Right Depth

For most projects, start with depth 2 as your baseline. This captures the majority of important pages on standard websites without crawling your entire site map every time.

Use depth 0 or 1 when:

  • You're testing specific pages after making changes
  • You want to audit only your main navigation pages
  • You're monitoring a landing page or campaign page
  • You need quick results without using many credits

Use depth 3 or higher when:

  • Auditing large e-commerce sites with category hierarchies
  • Working with news or blog sites with extensive archives
  • You need comprehensive coverage of enterprise websites
  • Running quarterly deep-dive audits (rather than regular monitoring)

 

Setting Crawl Depth

To configure crawl depth:

  1. Navigate to your project → Site Audit tab
  2. Click Configure Crawl before starting a new audit
  3. Locate the Crawl Depth setting
  4. Enter your desired depth (0-5)
  5. Review the estimated page count based on your selection
  6. Confirm the credit cost before proceeding

Remember that each page audited uses credits. Starting with a conservative depth helps you avoid unexpected credit usage on your first crawl.

Speed Limits

Speed limits control how aggressively NitroShock crawls your website. While faster crawls complete quickly, they can strain your server resources and trigger security protections.

Requests Per Second

The requests per second setting determines how many pages NitroShock attempts to fetch each second. The available range typically runs from 1 request per second (very slow and polite) to 10+ requests per second (fast but potentially aggressive).

At 1-2 requests per second, the crawler acts extremely conservatively. This is appropriate for:

  • Shared hosting environments with limited resources
  • Sites with aggressive DDoS protection
  • Development or staging servers
  • Sites that have previously blocked crawlers

At 3-5 requests per second, you achieve moderate speed without overwhelming most servers. This works well for:

  • Standard business websites on quality hosting
  • WordPress sites with caching enabled
  • Sites on VPS or dedicated servers
  • Regular monitoring audits

At 6-10 requests per second, crawls complete quickly but demand significant server resources. Consider this for:

  • Large sites where crawl time matters
  • Enterprise infrastructure built to handle traffic spikes
  • Internal testing environments
  • One-time comprehensive audits where speed is priority
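
Conceptually, a requests-per-second limit just spaces fetches out over time. The Python sketch below shows the general idea of polite crawling with the requests library; it is an illustration, not NitroShock's internal crawler, and the URLs are placeholders.

import time
import requests

REQUESTS_PER_SECOND = 3                  # moderate speed
DELAY = 1.0 / REQUESTS_PER_SECOND

urls = ["https://example.com/", "https://example.com/about", "https://example.com/contact"]

for url in urls:
    started = time.monotonic()
    resp = requests.get(url, headers={"User-Agent": "NitroShock-Bot/1.0"}, timeout=30)
    print(resp.status_code, url)
    # Sleep off whatever remains of this request's time slot.
    elapsed = time.monotonic() - started
    time.sleep(max(0.0, DELAY - elapsed))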

 

Timeout Settings

Request timeout determines how long NitroShock waits for a page to respond before giving up. The standard timeout is 30 seconds, but you can adjust this based on your site's performance.

Increase timeout (45-60 seconds) for:

  • Sites with slow server response times
  • Pages with heavy JavaScript that takes time to render
  • International sites where network latency is a factor
  • Database-heavy pages that generate content dynamically

Decrease timeout (15-20 seconds) for:

  • Fast, well-optimized sites
  • Static sites or sites with aggressive caching
  • When you want to identify slow-loading pages as issues
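
In HTTP terms, the timeout is simply how long a fetch is allowed to take before it is abandoned and recorded as an error. A minimal Python illustration (not NitroShock code; the URL is a placeholder):

import requests

try:
    # Wait at most 30 seconds for the server to respond before giving up.
    resp = requests.get("https://example.com/slow-page", timeout=30)
    print(resp.status_code)
except requests.Timeout:
    # Pages that exceed the timeout surface as errors in the audit results.
    print("Request timed out")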

 

Rate Limiting Best Practices

Start conservatively with your first audit of any site. Use 2-3 requests per second and monitor how your server handles the load. Check your server logs or monitoring tools during the crawl.

If your server handles the initial crawl easily (CPU and memory stay in normal ranges), you can increase speed for subsequent audits. If you notice performance degradation or receive alerts, reduce the crawl speed.

For WordPress sites with caching plugins like WP Rocket or W3 Total Cache, you can typically use moderate to fast speeds (4-6 requests per second) since cached pages serve quickly without database queries.

If NitroShock encounters multiple timeouts or errors during a crawl, it automatically reduces speed to prevent overwhelming your server.
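
That automatic slow-down follows a common crawler pattern: back off when errors or timeouts start piling up. The sketch below shows the general idea in simplified form; it is not NitroShock's actual logic, and the URLs and thresholds are illustrative.

import time
import requests

urls_to_crawl = ["https://example.com/", "https://example.com/about", "https://example.com/contact"]
delay = 0.25                 # start at roughly 4 requests per second
consecutive_errors = 0

for url in urls_to_crawl:
    try:
        requests.get(url, headers={"User-Agent": "NitroShock-Bot/1.0"}, timeout=30)
        consecutive_errors = 0
    except (requests.Timeout, requests.ConnectionError):
        consecutive_errors += 1
        if consecutive_errors >= 3:
            delay = min(delay * 2, 2.0)   # double the delay, capped at one request every 2 seconds
            consecutive_errors = 0
    time.sleep(delay)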

User Agents

The user agent string identifies NitroShock's crawler to your web server. Proper user agent configuration ensures you get accurate audit results that reflect real user experiences.

Default User Agent

NitroShock uses a clearly identified crawler user agent by default: NitroShock-Bot/1.0. This identifies the crawler in your server logs and allows you to:

  • Monitor crawl activity in analytics
  • Create specific server rules for the crawler
  • Differentiate audit traffic from real visitors
  • Comply with web standards for crawler identification

 

Desktop vs. Mobile User Agents

You can configure NitroShock to crawl using either desktop or mobile user agents, which is crucial since many sites serve different content or styling based on device type.

Desktop user agent crawling:

  • Simulates a standard desktop browser (Chrome, Firefox, or Safari)
  • Appropriate for desktop-focused business sites
  • Shows desktop-specific layout and SEO elements
  • Matches desktop search engine crawlers

Mobile user agent crawling:

  • Simulates mobile browsers (typically mobile Chrome or Safari)
  • Essential for mobile-first or responsive sites
  • Reflects Google's mobile-first indexing approach
  • Reveals mobile-specific issues hidden on desktop
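
A quick way to confirm that your site really varies by device is to fetch the same page with a desktop and a mobile user agent and compare the responses. The Python sketch below is illustrative; the user agent strings are typical Chrome examples, not the exact strings NitroShock sends.

import requests

DESKTOP_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
MOBILE_UA = "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36"

for label, ua in [("desktop", DESKTOP_UA), ("mobile", MOBILE_UA)]:
    resp = requests.get("https://example.com/", headers={"User-Agent": ua}, timeout=30)
    # Large differences in response size often mean device-specific content.
    print(label, resp.status_code, len(resp.text), "bytes of HTML")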

 

Choosing the Right User Agent

For most modern websites, run mobile user agent audits as your primary monitoring tool. Google predominantly uses mobile crawlers for indexing, so your mobile experience matters most for SEO.

Run periodic desktop audits to ensure desktop users still get a quality experience, especially if your analytics show significant desktop traffic.

For specialized scenarios:

  • E-commerce sites: Crawl both mobile and desktop, as shopping behavior differs significantly
  • B2B websites: May prioritize desktop if analytics show business users on desktop
  • Local service sites: Focus on mobile, as most local searches happen on phones
  • News or blog sites: Mobile-first, reflecting how most readers consume content

 

Custom User Agents

Advanced users can set custom user agent strings to:

  • Test how their site responds to specific browsers
  • Simulate particular devices for testing
  • Match their CDN or caching rules for specific user agents
  • Debug user agent-specific issues

To set a custom user agent:

  1. Navigate to your project → Site Audit tab
  2. Click Configure Crawl
  3. Select Custom User Agent from the dropdown
  4. Enter your custom user agent string
  5. Save and run your audit

Exclusions

Exclusion rules prevent NitroShock from crawling and auditing pages that don't need analysis, saving you credits and focusing results on pages that matter.

URL Pattern Exclusions

URL pattern exclusions use pattern matching to skip entire categories of pages. Common patterns to exclude include:

Admin and system pages:

/wp-admin/*
/wp-login.php
/admin/*
/dashboard/*

User account pages:

/account/*
/login/*
/register/*
/checkout/*

Non-HTML resources:

*.pdf
*.jpg
*.png
*.zip
*.doc

Pagination and filters:

?page=
?sort=
/page/[2-9]/
/page/[0-9][0-9]/
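
As a rough mental model of how wildcard exclusions behave, the Python sketch below matches URL paths against patterns like the ones above using the standard fnmatch module. It is an illustration only; NitroShock applies its own matching internally, and the pattern list and function name here are made up for the example. Query-string patterns are covered in the next section.

import fnmatch
from urllib.parse import urlparse

EXCLUDE_PATTERNS = ["/wp-admin/*", "/account/*", "*.pdf", "*.jpg"]

def is_excluded(url: str) -> bool:
    path = urlparse(url).path
    return any(fnmatch.fnmatch(path, pattern) for pattern in EXCLUDE_PATTERNS)

print(is_excluded("https://example.com/wp-admin/options.php"))   # True
print(is_excluded("https://example.com/files/report.pdf"))       # True
print(is_excluded("https://example.com/blog/latest-post"))       # False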

Query Parameter Exclusions

Query parameters often create duplicate content variations that waste crawl budget. Common parameters to exclude:

  • utm_source, utm_medium, utm_campaign - Tracking parameters
  • fbclid, gclid - Platform click IDs
  • sessionid, PHPSESSID - Session identifiers
  • sort, filter, color, size - E-commerce filters
  • page, offset, limit - Pagination parameters

Configure parameter exclusions to treat URLs as identical regardless of these parameters. For example, excluding utm_source means:

  • example.com/product and
  • example.com/product?utm_source=facebook

are treated as the same page, and only one gets crawled.
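
Under the hood, this kind of deduplication amounts to normalizing URLs by dropping the ignored parameters before comparing them. A minimal Python sketch of the idea (the parameter list and function name are illustrative, not NitroShock's implementation):

from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}

def normalize(url: str) -> str:
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

print(normalize("https://example.com/product?utm_source=facebook"))
# https://example.com/product
print(normalize("https://example.com/product?color=red&utm_medium=email"))
# https://example.com/product?color=red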

Robots.txt Respect

NitroShock can either respect or ignore your robots.txt file during crawls.

Respecting robots.txt (default):

  • Follows your site's crawler directives
  • Skips pages marked as disallowed
  • Matches how search engines crawl your site
  • Appropriate for production sites

Ignoring robots.txt:

  • Crawls everything regardless of robots.txt rules
  • Useful for staging sites with overly restrictive robots.txt
  • Helps audit pages accidentally blocked from search engines
  • Identifies discrepancies between your intentions and robots.txt rules

 

If you're unsure why certain pages aren't being crawled, try running one audit with robots.txt ignored to see if it's blocking the crawler.
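
You can also check locally which URLs your robots.txt blocks for the NitroShock user agent using Python's standard urllib.robotparser. This is an independent sanity check, not part of NitroShock, and the URLs are placeholders.

from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()

for url in ["https://example.com/", "https://example.com/wp-admin/options.php"]:
    # can_fetch() reports whether the rules allow this user agent to crawl the URL.
    print(url, rp.can_fetch("NitroShock-Bot", url))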

 

Creating Exclusion Rules

To configure exclusions:

  1. Navigate to your project → Site Audit tab
  2. Click Configure Crawl
  3. Scroll to the Exclusions section
  4. Add URL patterns in the Exclude URLs matching field
  5. Add query parameters in the Ignore parameters field
  6. Toggle Respect robots.txt as needed
  7. Save your configuration

Exclusion rules persist across audits, so you only need to configure them once per project unless your site structure changes.

Exclusion Strategy by Site Type

Small business sites (10-100 pages):

  • Minimal exclusions needed
  • Exclude only admin pages and media files
  • Let everything else crawl for comprehensive coverage

E-commerce sites:

  • Exclude filter combinations and sort variations
  • Keep category and product pages
  • Exclude user account and checkout flows
  • Consider excluding old/discontinued product pages

News or blog sites:

  • Exclude deep pagination (beyond page 3-5)
  • Keep recent articles and important archive pages
  • Exclude author archives if they duplicate content
  • Exclude tag pages with minimal content

Enterprise or large sites:

  • Develop comprehensive exclusion rules to focus on priority pages
  • Exclude test environments and staging areas
  • Exclude duplicate content variations
  • Consider separate audits for different site sections

 

Advanced Settings

Beyond basic crawl configuration, NitroShock offers advanced settings for specialized audit scenarios and precise control over crawler behavior.

JavaScript Rendering

Modern websites often rely heavily on JavaScript to generate content. The JavaScript rendering setting determines whether NitroShock executes JavaScript before analyzing pages.

JavaScript rendering disabled (faster, uses fewer credits):

  • Analyzes raw HTML only
  • Appropriate for server-rendered sites
  • Faster crawl completion
  • May miss content loaded via JavaScript

JavaScript rendering enabled (slower, uses more credits):

  • Executes JavaScript like a real browser
  • Captures dynamically loaded content
  • Reflects actual user and search engine experience
  • Necessary for React, Vue, Angular, or other JavaScript frameworks

Enable JavaScript rendering if:

  • Your site uses client-side rendering frameworks
  • Important content loads via AJAX or fetch requests
  • You need to audit single-page applications
  • Previous audits showed missing content
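
A rough way to check whether a page depends on client-side rendering is to fetch the raw HTML (no JavaScript executed) and look for content you know should appear. A minimal sketch; the URL and marker text are placeholders for your own page and content.

import requests

resp = requests.get("https://example.com/products", timeout=30)
marker = "Add to cart"    # placeholder: text you expect on the fully rendered page

if marker in resp.text:
    print("Content is present in the raw HTML; rendering may not be required.")
else:
    print("Content missing from the raw HTML; enable JavaScript rendering for accurate audits.")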

 

Cookie Handling

Some sites display different content based on cookie consent or user preferences. Configure how NitroShock handles cookies:

Accept all cookies:

  • Simulates a user who accepts all cookie consents
  • Shows complete site functionality
  • Appropriate when cookie banners hide content

Reject all cookies:

  • Simulates strict privacy settings
  • Shows minimum functionality experience
  • Tests if essential content appears without consent

Custom cookie values:

  • Set specific cookie values for testing
  • Useful for bypassing age gates or region checks
  • Can simulate logged-in states for permitted testing
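
Outside of NitroShock, you can preview what a cookie-dependent page returns by sending a cookie yourself. The cookie name and value below are hypothetical; substitute whatever your consent or region logic actually sets.

import requests

cookies = {"cookie_consent": "accepted"}    # hypothetical consent cookie
with_cookie = requests.get("https://example.com/", cookies=cookies, timeout=30)
without_cookie = requests.get("https://example.com/", timeout=30)

# A large size difference suggests the page changes based on consent state.
print(len(with_cookie.text), "bytes with the consent cookie")
print(len(without_cookie.text), "bytes without it")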

 

Custom Headers

Add custom HTTP headers to crawl requests for specialized scenarios:

  • Authorization headers - Audit password-protected staging sites
  • API keys - Include authentication for headless CMS systems
  • Custom cache headers - Test CDN behavior with specific cache rules
  • Accept-Language headers - Audit specific language versions

To add custom headers:

  1. Navigate to Site Audit → Configure Crawl
  2. Expand Advanced Settings
  3. Click Add Custom Header
  4. Enter header name and value
  5. Add multiple headers as needed

Follow External Links

By default, NitroShock only crawls internal links within your domain. The follow external links setting changes this behavior:

Disabled (default):

  • Crawls only your domain
  • Keeps audits focused and efficient
  • Prevents credit usage on external sites

Enabled:

  • Follows and checks external links
  • Validates outbound links aren't broken
  • Useful for affiliate sites or resource directories
  • Significantly increases pages crawled

 

Sitemap Integration

Rather than discovering pages through link crawling, you can direct NitroShock to use your XML sitemap:

Sitemap-based crawling:

  • Crawls pages listed in your sitemap
  • Faster than discovery-based crawling
  • Ensures all important pages get audited
  • Matches search engine priority signals

Combined approach:

  • Uses sitemap as starting point
  • Also follows links to discover unlisted pages
  • Identifies pages missing from your sitemap
  • Most comprehensive audit method

Configure sitemap settings:

  1. Navigate to Site Audit → Configure Crawl
  2. Enable Use sitemap
  3. Enter your sitemap URL (usually /sitemap.xml)
  4. Choose whether to also follow links from sitemap pages
  5. Run your audit
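
To sanity-check what a sitemap-based crawl would cover, you can list the URLs in your sitemap yourself. The Python sketch below uses the standard library XML parser and assumes a single sitemap file rather than a sitemap index; the URL is a placeholder.

import xml.etree.ElementTree as ET
import requests

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

resp = requests.get("https://example.com/sitemap.xml", timeout=30)
root = ET.fromstring(resp.content)
urls = [loc.text for loc in root.findall(".//sm:loc", SITEMAP_NS)]

print(f"{len(urls)} URLs listed in the sitemap")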

Schedule and Automation

Set up automated crawls to monitor your site continuously without manual intervention:

One-time crawl:

  • Runs immediately when triggered
  • Uses credits once
  • Appropriate for ad-hoc audits

Scheduled crawls:

  • Daily, weekly, or monthly schedules
  • Automatically uses credits per schedule
  • Monitors site health over time
  • Generates trend data for improvement tracking

Configure scheduled audits:

  1. Navigate to your project → Site Audit tab
  2. Click Schedule Crawl
  3. Select frequency (daily, weekly, monthly)
  4. Choose time of day (typically off-peak hours)
  5. Configure notification preferences
  6. Confirm credit authorization for recurring use

Scheduled crawls reuse the same configuration each time they run; any changes you make to your crawl configuration apply to future scheduled crawls.

Common Questions

How many pages will my crawl audit?

The exact number depends on your crawl depth, exclusion rules, and site structure. Before confirming any audit, NitroShock estimates the page count based on your settings and shows the expected credit cost. Start with conservative settings (depth 2, moderate exclusions) for your first crawl, then adjust based on results.

Why isn't NitroShock crawling some of my pages?

Check these common causes: your robots.txt file may be blocking the crawler, your exclusion rules might be too broad, pages may not be linked from other pages within your crawl depth, or your server might be returning errors or timeouts for those pages. Run an audit with robots.txt ignored and minimal exclusions to diagnose the issue.

Can I crawl a staging or password-protected site?

Yes, use custom headers to include authentication credentials. Navigate to Site Audit → Configure Crawl → Advanced Settings and add an Authorization header with your credentials. For HTTP basic authentication, use the format Basic [base64-encoded-credentials]. Alternatively, if your staging site uses IP whitelisting, contact NitroShock support to whitelist the crawler IP addresses.
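
For reference, the value of a basic authentication header is just the base64 encoding of username:password. A short Python sketch (the credentials are placeholders):

import base64

username, password = "staging-user", "staging-pass"    # placeholders
token = base64.b64encode(f"{username}:{password}".encode()).decode()

# Paste this as the Authorization header value in the crawl configuration.
print(f"Basic {token}")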

Should I run mobile or desktop audits?

Run mobile audits as your primary monitoring tool, since Google predominantly uses mobile-first indexing. Supplement them with periodic desktop audits if your analytics show significant desktop traffic or your site serves noticeably different desktop content.
