Crawl Settings & Configuration

12 min read
Updated Jan 25, 2026
Version 1.0+
Intermediate
Quick Answer

Configure crawl depth, speed limits, user agents, and exclusions to customize how NitroShock audits your website.

Running a site audit in NitroShock involves more than pressing a "start crawl" button. Configuring crawl depth, speed limits, user agents, and exclusions gives you accurate data while respecting your server's resources and avoiding unnecessary credit usage on irrelevant pages.

Proper crawl configuration prevents common issues like overwhelming your server, auditing duplicate content, or wasting credits on admin pages and PDFs. Whether you're auditing a small business site with 50 pages or an enterprise e-commerce platform with thousands of products, these settings give you precise control over what gets crawled and how.

This guide covers the essential crawl settings available in NitroShock's Site Audit feature and how to configure them for different scenarios.

Crawl Depth

Crawl depth determines how far NitroShock will follow links from your starting URL. Think of it as the number of "clicks away" from your entry point the crawler will go.

Understanding Depth Levels

A depth of 0 means only the starting URL gets audited—no links are followed. This is useful when you want to audit a single specific page without burning credits on the entire site.

A depth of 1 audits your starting page plus all pages directly linked from it. For example, if you start at your homepage, depth 1 would include your homepage and all pages in your main navigation.

A depth of 2 goes one level deeper—auditing your starting page, all directly linked pages, and all pages linked from those pages. This typically covers most small to medium websites completely.

Higher depths (3-5) are necessary for larger sites with deep hierarchies, but each additional level can multiply the number of pages crawled. A site averaging 50 links per page could surface roughly 50 pages at depth 1 but around 2,500 (50 × 50) at depth 2.
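To see why depth escalates so quickly, here is a quick back-of-the-envelope calculation. The Python snippet below is purely illustrative; the 50-links-per-page figure is an assumption, not a NitroShock default.

links_per_page = 50          # assumed average internal links per page
pages_at_depth = {0: 1}      # depth 0 is just the starting URL

# Worst case: every link on every page leads somewhere new.
for depth in range(1, 4):
    pages_at_depth[depth] = pages_at_depth[depth - 1] * links_per_page

for depth, new_pages in pages_at_depth.items():
    print(f"depth {depth}: up to {new_pages:,} new pages")
# depth 1: up to 50, depth 2: up to 2,500, depth 3: up to 125,000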

Choosing the Right Depth

For most projects, start with depth 2 as your baseline. This captures the majority of important pages on standard websites without crawling your entire site map every time.

Use depth 0 or 1 when:

  • You're testing specific pages after making changes
  • You want to audit only your main navigation pages
  • You're monitoring a landing page or campaign page
  • You need quick results without using many credits

Use depth 3 or higher when:

  • Auditing large e-commerce sites with category hierarchies
  • Working with news or blog sites with extensive archives
  • You need comprehensive coverage of enterprise websites
  • Running quarterly deep-dive audits (rather than regular monitoring)

 

Setting Crawl Depth

To configure crawl depth:

  1. Navigate to your project → Site Audit tab
  2. Click Configure Crawl before starting a new audit
  3. Locate the Crawl Depth setting
  4. Enter your desired depth (0-5)
  5. Review the estimated page count based on your selection
  6. Confirm the credit cost before proceeding

Remember that each page audited uses credits. Starting with a conservative depth helps you avoid unexpected credit usage on your first crawl.

Speed Limits

Speed limits control how aggressively NitroShock crawls your website. While faster crawls complete quickly, they can strain your server resources and trigger security protections.

Requests Per Second

The requests per second setting determines how many pages NitroShock attempts to fetch each second. The available range typically runs from 1 request per second (very slow and polite) to 10+ requests per second (fast but potentially aggressive).

At 1-2 requests per second, the crawler acts extremely conservatively. This is appropriate for:

  • Shared hosting environments with limited resources
  • Sites with aggressive DDoS protection
  • Development or staging servers
  • Sites that have previously blocked crawlers

At 3-5 requests per second, you achieve moderate speed without overwhelming most servers. This works well for:

  • Standard business websites on quality hosting
  • WordPress sites with caching enabled
  • Sites on VPS or dedicated servers
  • Regular monitoring audits

At 6-10 requests per second, crawls complete quickly but demand significant server resources. Consider this for:

  • Large sites where crawl time matters
  • Enterprise infrastructure built to handle traffic spikes
  • Internal testing environments
  • One-time comprehensive audits where speed is priority
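
Conceptually, a requests-per-second limit just spaces fetches out over time. The Python sketch below shows the general idea of polite crawling with the requests library; it is an illustration, not NitroShock's internal crawler, and the URLs are placeholders.

import time
import requests

REQUESTS_PER_SECOND = 3                  # moderate speed
DELAY = 1.0 / REQUESTS_PER_SECOND

urls = ["https://example.com/", "https://example.com/about", "https://example.com/contact"]

for url in urls:
    started = time.monotonic()
    resp = requests.get(url, headers={"User-Agent": "NitroShock-Bot/1.0"}, timeout=30)
    print(resp.status_code, url)
    # Sleep off whatever remains of this request's time slot.
    elapsed = time.monotonic() - started
    time.sleep(max(0.0, DELAY - elapsed))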

 

Timeout Settings

Request timeout determines how long NitroShock waits for a page to respond before giving up. The standard timeout is 30 seconds, but you can adjust this based on your site's performance.

Increase timeout (45-60 seconds) for:

  • Sites with slow server response times
  • Pages with heavy JavaScript that takes time to render
  • International sites where network latency is a factor
  • Database-heavy pages that generate content dynamically

Decrease timeout (15-20 seconds) for:

  • Fast, well-optimized sites
  • Static sites or sites with aggressive caching
  • When you want to identify slow-loading pages as issues
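
In HTTP terms, the timeout is simply how long a fetch is allowed to take before it is abandoned and recorded as an error. A minimal Python illustration (not NitroShock code; the URL is a placeholder):

import requests

try:
    # Wait at most 30 seconds for the server to respond before giving up.
    resp = requests.get("https://example.com/slow-page", timeout=30)
    print(resp.status_code)
except requests.Timeout:
    # Pages that exceed the timeout surface as errors in the audit results.
    print("Request timed out")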

 

Rate Limiting Best Practices

Start conservatively with your first audit of any site. Use 2-3 requests per second and monitor how your server handles the load. Check your server logs or monitoring tools during the crawl.

If your server handles the initial crawl easily (CPU and memory stay in normal ranges), you can increase speed for subsequent audits. If you notice performance degradation or receive alerts, reduce the crawl speed.

For WordPress sites with caching plugins like WP Rocket or W3 Total Cache, you can typically use moderate to fast speeds (4-6 requests per second) since cached pages serve quickly without database queries.

If NitroShock encounters multiple timeouts or errors during a crawl, it automatically reduces speed to prevent overwhelming your server.
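
That automatic slow-down follows a common crawler pattern: back off when errors or timeouts start piling up. The sketch below shows the general idea in simplified form; it is not NitroShock's actual logic, and the URLs and thresholds are illustrative.

import time
import requests

urls_to_crawl = ["https://example.com/", "https://example.com/about", "https://example.com/contact"]
delay = 0.25                 # start at roughly 4 requests per second
consecutive_errors = 0

for url in urls_to_crawl:
    try:
        requests.get(url, headers={"User-Agent": "NitroShock-Bot/1.0"}, timeout=30)
        consecutive_errors = 0
    except (requests.Timeout, requests.ConnectionError):
        consecutive_errors += 1
        if consecutive_errors >= 3:
            delay = min(delay * 2, 2.0)   # double the delay, capped at one request every 2 seconds
            consecutive_errors = 0
    time.sleep(delay)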

User Agents

The user agent string identifies NitroShock's crawler to your web server. Proper user agent configuration ensures you get accurate audit results that reflect real user experiences.

Default User Agent

NitroShock uses a clearly identified crawler user agent by default: NitroShock-Bot/1.0. This identifies the crawler in your server logs and allows you to:

  • Monitor crawl activity in analytics
  • Create specific server rules for the crawler
  • Differentiate audit traffic from real visitors
  • Comply with web standards for crawler identification

 

Desktop vs. Mobile User Agents

You can configure NitroShock to crawl using either desktop or mobile user agents, which is crucial since many sites serve different content or styling based on device type.

Desktop user agent crawling:

  • Simulates a standard desktop browser (Chrome, Firefox, or Safari)
  • Appropriate for desktop-focused business sites
  • Shows desktop-specific layout and SEO elements
  • Matches desktop search engine crawlers

Mobile user agent crawling:

  • Simulates mobile browsers (typically mobile Chrome or Safari)
  • Essential for mobile-first or responsive sites
  • Reflects Google's mobile-first indexing approach
  • Reveals mobile-specific issues hidden on desktop
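
A quick way to confirm that your site really varies by device is to fetch the same page with a desktop and a mobile user agent and compare the responses. The Python sketch below is illustrative; the user agent strings are typical Chrome examples, not the exact strings NitroShock sends.

import requests

DESKTOP_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
MOBILE_UA = "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36"

for label, ua in [("desktop", DESKTOP_UA), ("mobile", MOBILE_UA)]:
    resp = requests.get("https://example.com/", headers={"User-Agent": ua}, timeout=30)
    # Large differences in response size often mean device-specific content.
    print(label, resp.status_code, len(resp.text), "bytes of HTML")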

 

Choosing the Right User Agent

For most modern websites, run mobile user agent audits as your primary monitoring tool. Google predominantly uses mobile crawlers for indexing, so your mobile experience matters most for SEO.

Run periodic desktop audits to ensure desktop users still get a quality experience, especially if your analytics show significant desktop traffic.

For specialized scenarios:

  • E-commerce sites: Crawl both mobile and desktop, as shopping behavior differs significantly
  • B2B websites: May prioritize desktop if analytics show business users on desktop
  • Local service sites: Focus on mobile, as most local searches happen on phones
  • News or blog sites: Mobile-first, reflecting how most readers consume content

 

Custom User Agents

Advanced users can set custom user agent strings to:

  • Test how their site responds to specific browsers
  • Simulate particular devices for testing
  • Match their CDN or caching rules for specific user agents
  • Debug user agent-specific issues

To set a custom user agent:

  1. Navigate to your project → Site Audit tab
  2. Click Configure Crawl
  3. Select Custom User Agent from the dropdown
  4. Enter your custom user agent string
  5. Save and run your audit

Exclusions

Exclusion rules prevent NitroShock from crawling and auditing pages that don't need analysis, saving you credits and focusing results on pages that matter.

URL Pattern Exclusions

URL pattern exclusions use pattern matching to skip entire categories of pages. Common patterns to exclude include:

Admin and system pages:

/wp-admin/*
/wp-login.php
/admin/*
/dashboard/*

User account pages:

/account/*
/login/*
/register/*
/checkout/*

Non-HTML resources:

*.pdf
*.jpg
*.png
*.zip
*.doc

Pagination and filters:

?page=
?sort=
/page/[2-9]/
/page/[0-9][0-9]/
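
As a rough mental model of how wildcard exclusions behave, the Python sketch below matches URL paths against patterns like the ones above using the standard fnmatch module. It is an illustration only; NitroShock applies its own matching internally, and the pattern list and function name here are made up for the example. Query-string patterns are covered in the next section.

import fnmatch
from urllib.parse import urlparse

EXCLUDE_PATTERNS = ["/wp-admin/*", "/account/*", "*.pdf", "*.jpg"]

def is_excluded(url: str) -> bool:
    path = urlparse(url).path
    return any(fnmatch.fnmatch(path, pattern) for pattern in EXCLUDE_PATTERNS)

print(is_excluded("https://example.com/wp-admin/options.php"))   # True
print(is_excluded("https://example.com/files/report.pdf"))       # True
print(is_excluded("https://example.com/blog/latest-post"))       # False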

Query Parameter Exclusions

Query parameters often create duplicate content variations that waste crawl budget. Common parameters to exclude:

  • utm_source, utm_medium, utm_campaign - Tracking parameters
  • fbclid, gclid - Platform click IDs
  • sessionid, PHPSESSID - Session identifiers
  • sort, filter, color, size - E-commerce filters
  • page, offset, limit - Pagination parameters

Configure parameter exclusions to treat URLs as identical regardless of these parameters. For example, excluding utm_source means:

  • example.com/product and
  • example.com/product?utm_source=facebook

are treated as the same page, and only one gets crawled.
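
Under the hood, this kind of deduplication amounts to normalizing URLs by dropping the ignored parameters before comparing them. A minimal Python sketch of the idea (the parameter list and function name are illustrative, not NitroShock's implementation):

from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}

def normalize(url: str) -> str:
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

print(normalize("https://example.com/product?utm_source=facebook"))
# https://example.com/product
print(normalize("https://example.com/product?color=red&utm_medium=email"))
# https://example.com/product?color=red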

Robots.txt Respect

NitroShock can either respect or ignore your robots.txt file during crawls.

Respecting robots.txt (default):

  • Follows your site's crawler directives
  • Skips pages marked as disallowed
  • Matches how search engines crawl your site
  • Appropriate for production sites

Ignoring robots.txt:

  • Crawls everything regardless of robots.txt rules
  • Useful for staging sites with overly restrictive robots.txt
  • Helps audit pages accidentally blocked from search engines
  • Identifies discrepancies between your intentions and robots.txt rules

 

If you're unsure why certain pages aren't being crawled, try running one audit with robots.txt ignored to see if it's blocking the crawler.
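
You can also check locally which URLs your robots.txt blocks for the NitroShock user agent using Python's standard urllib.robotparser. This is an independent sanity check, not part of NitroShock, and the URLs are placeholders.

from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()

for url in ["https://example.com/", "https://example.com/wp-admin/options.php"]:
    # can_fetch() reports whether the rules allow this user agent to crawl the URL.
    print(url, rp.can_fetch("NitroShock-Bot", url))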

 

Creating Exclusion Rules

To configure exclusions:

  1. Navigate to your project → Site Audit tab
  2. Click Configure Crawl
  3. Scroll to the Exclusions section
  4. Add URL patterns in the Exclude URLs matching field
  5. Add query parameters in the Ignore parameters field
  6. Toggle Respect robots.txt as needed
  7. Save your configuration

Exclusion rules persist across audits, so you only need to configure them once per project unless your site structure changes.

Exclusion Strategy by Site Type

Small business sites (10-100 pages):

  • Minimal exclusions needed
  • Exclude only admin pages and media files
  • Let everything else crawl for comprehensive coverage

E-commerce sites:

  • Exclude filter combinations and sort variations
  • Keep category and product pages
  • Exclude user account and checkout flows
  • Consider excluding old/discontinued product pages

News or blog sites:

  • Exclude deep pagination (beyond page 3-5)
  • Keep recent articles and important archive pages
  • Exclude author archives if they duplicate content
  • Exclude tag pages with minimal content

Enterprise or large sites:

  • Develop comprehensive exclusion rules to focus on priority pages
  • Exclude test environments and staging areas
  • Exclude duplicate content variations
  • Consider separate audits for different site sections

 

Advanced Settings

Beyond basic crawl configuration, NitroShock offers advanced settings for specialized audit scenarios and precise control over crawler behavior.

JavaScript Rendering

Modern websites often rely heavily on JavaScript to generate content. The JavaScript rendering setting determines whether NitroShock executes JavaScript before analyzing pages.

JavaScript rendering disabled (faster, uses fewer credits):

  • Analyzes raw HTML only
  • Appropriate for server-rendered sites
  • Faster crawl completion
  • May miss content loaded via JavaScript

JavaScript rendering enabled (slower, uses more credits):

  • Executes JavaScript like a real browser
  • Captures dynamically loaded content
  • Reflects actual user and search engine experience
  • Necessary for React, Vue, Angular, or other JavaScript frameworks

Enable JavaScript rendering if:

  • Your site uses client-side rendering frameworks
  • Important content loads via AJAX or fetch requests
  • You need to audit single-page applications
  • Previous audits showed missing content
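
A rough way to check whether a page depends on client-side rendering is to fetch the raw HTML (no JavaScript executed) and look for content you know should appear. A minimal sketch; the URL and marker text are placeholders for your own page and content.

import requests

resp = requests.get("https://example.com/products", timeout=30)
marker = "Add to cart"    # placeholder: text you expect on the fully rendered page

if marker in resp.text:
    print("Content is present in the raw HTML; rendering may not be required.")
else:
    print("Content missing from the raw HTML; enable JavaScript rendering for accurate audits.")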

 

Cookie Handling

Some sites display different content based on cookie consent or user preferences. Configure how NitroShock handles cookies:

Accept all cookies:

  • Simulates a user who accepts all cookie consents
  • Shows complete site functionality
  • Appropriate when cookie banners hide content

Reject all cookies:

  • Simulates strict privacy settings
  • Shows minimum functionality experience
  • Tests if essential content appears without consent

Custom cookie values:

  • Set specific cookie values for testing
  • Useful for bypassing age gates or region checks
  • Can simulate logged-in states for permitted testing
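
Outside of NitroShock, you can preview what a cookie-dependent page returns by sending a cookie yourself. The cookie name and value below are hypothetical; substitute whatever your consent or region logic actually sets.

import requests

cookies = {"cookie_consent": "accepted"}    # hypothetical consent cookie
with_cookie = requests.get("https://example.com/", cookies=cookies, timeout=30)
without_cookie = requests.get("https://example.com/", timeout=30)

# A large size difference suggests the page changes based on consent state.
print(len(with_cookie.text), "bytes with the consent cookie")
print(len(without_cookie.text), "bytes without it")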

 

Custom Headers

Add custom HTTP headers to crawl requests for specialized scenarios:

  • Authorization headers - Audit password-protected staging sites
  • API keys - Include authentication for headless CMS systems
  • Custom cache headers - Test CDN behavior with specific cache rules
  • Accept-Language headers - Audit specific language versions

To add custom headers:

  1. Navigate to Site Audit → Configure Crawl
  2. Expand Advanced Settings
  3. Click Add Custom Header
  4. Enter header name and value
  5. Add multiple headers as needed

Follow External Links

By default, NitroShock only crawls internal links within your domain. The follow external links setting changes this behavior:

Disabled (default):

  • Crawls only your domain
  • Keeps audits focused and efficient
  • Prevents credit usage on external sites

Enabled:

  • Follows and checks external links
  • Validates outbound links aren't broken
  • Useful for affiliate sites or resource directories
  • Significantly increases pages crawled

 

Sitemap Integration

Rather than discovering pages through link crawling, you can direct NitroShock to use your XML sitemap:

Sitemap-based crawling:

  • Crawls pages listed in your sitemap
  • Faster than discovery-based crawling
  • Ensures all important pages get audited
  • Matches search engine priority signals

Combined approach:

  • Uses sitemap as starting point
  • Also follows links to discover unlisted pages
  • Identifies pages missing from your sitemap
  • Most comprehensive audit method

Configure sitemap settings:

  1. Navigate to Site Audit → Configure Crawl
  2. Enable Use sitemap
  3. Enter your sitemap URL (usually /sitemap.xml)
  4. Choose whether to also follow links from sitemap pages
  5. Run your audit
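
To sanity-check what a sitemap-based crawl would cover, you can list the URLs in your sitemap yourself. The Python sketch below uses the standard library XML parser and assumes a single sitemap file rather than a sitemap index; the URL is a placeholder.

import xml.etree.ElementTree as ET
import requests

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

resp = requests.get("https://example.com/sitemap.xml", timeout=30)
root = ET.fromstring(resp.content)
urls = [loc.text for loc in root.findall(".//sm:loc", SITEMAP_NS)]

print(f"{len(urls)} URLs listed in the sitemap")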

Schedule and Automation

Set up automated crawls to monitor your site continuously without manual intervention:

One-time crawl:

  • Runs immediately when triggered
  • Uses credits once
  • Appropriate for ad-hoc audits

Scheduled crawls:

  • Daily, weekly, or monthly schedules
  • Automatically uses credits per schedule
  • Monitors site health over time
  • Generates trend data for improvement tracking

Configure scheduled audits:

  1. Navigate to your project → Site Audit tab
  2. Click Schedule Crawl
  3. Select frequency (daily, weekly, monthly)
  4. Choose time of day (typically off-peak hours)
  5. Configure notification preferences
  6. Confirm credit authorization for recurring use

Scheduled crawls reuse the same configuration each time they run; any changes you make to your crawl configuration apply to future scheduled crawls.

Common Questions

How many pages will my crawl audit?

The exact number depends on your crawl depth, exclusion rules, and site structure. Before confirming any audit, NitroShock estimates the page count based on your settings and shows the expected credit cost. Start with conservative settings (depth 2, moderate exclusions) for your first crawl, then adjust based on results.

Why isn't NitroShock crawling some of my pages?

Check these common causes: your robots.txt file may be blocking the crawler, your exclusion rules might be too broad, pages may not be linked from other pages within your crawl depth, or your server might be returning errors or timeouts for those pages. Run an audit with robots.txt ignored and minimal exclusions to diagnose the issue.

Can I crawl a staging or password-protected site?

Yes, use custom headers to include authentication credentials. Navigate to Site Audit → Configure Crawl → Advanced Settings and add an Authorization header with your credentials. For HTTP basic authentication, use the format Basic [base64-encoded-credentials]. Alternatively, if your staging site uses IP whitelisting, contact NitroShock support to whitelist the crawler IP addresses.
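
For reference, the value of a basic authentication header is just the base64 encoding of username:password. A short Python sketch (the credentials are placeholders):

import base64

username, password = "staging-user", "staging-pass"    # placeholders
token = base64.b64encode(f"{username}:{password}".encode()).decode()

# Paste this as the Authorization header value in the crawl configuration.
print(f"Basic {token}")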

Should I run mobile or desktop audits?

Run mobile audits as your primary monitoring tool, since Google predominantly uses mobile-first indexing. Supplement them with periodic desktop audits if your analytics show significant desktop traffic or your site serves noticeably different desktop content.
