Google Crawl Errors: Diagnosis & Fix Guide

On this page

Why crawl errors are the silent indexation killer
Crawl error types, root causes, and first-action fix
Pre-fix sanity checks (skip these at your own risk)
Step-by-step resolution: 404 and soft 404 errors
Error triage workflow
Worked example: cleaning 1,200 soft 404s from an ecommerce site
DNS and server errors: the infrastructure traps
FAQ

Why crawl errors are the silent indexation killer

Googlebot is polite but not patient. When it hits a 404, a soft 404, a DNS timeout, or a 5xx server error on a URL it expects to find, it doesn't just move on. It flags that URL as problematic. Enough flags, and the entire section or domain loses crawl budget and trust. The core bottleneck is not error volume. It is the pattern behind the errors. A single misconfigured wildcard rule in your robots.txt can block 50,000 product pages (robots.txt supports the `*` and `$` wildcards, not full regular expressions). A cheap DNS provider that times out at peak hours can kill 600 URLs in one crawl cycle. We treat each error type as a distinct failure mode, and we fix them differently.

In practice, when you open Google Search Console's 'Pages' report, the raw numbers mislead. You might see 1,200 'Not found (404)' entries, but only 80 of those are actual broken internal links. The rest are often crawled URL parameters, pagination copies, or old AMP variants. A common situation we see: a site mails a newsletter with a tracking parameter appended to a URL that was already 301-redirected. GSC logs that as a 'crawl error' on the final URL. The real fix is not a redirect change; it is removing the bad newsletter URLs from the sitemap and neutralizing the parameter with a canonical tag or a robots.txt rule (the URL Parameters tool in GSC was retired in 2022). That is the kind of nuance this guide exists for.

Crawl error types, root causes, and first-action fix

Error Type	Likely Root Cause	First Action (within 1 hour)	Hidden Failure Mode
404 (Not Found): internal link points to a deleted page	Content removed without redirect; old sitemap entries; broken navigation links	Run a Screaming Frog crawl on the 404 list. For high-value URLs: place a 301 redirect to the closest topical equivalent. For low-value URLs: let them 404 but remove them from the sitemap.	Redirecting all 404s to the homepage causes soft-404 signals. Google sees a mismatch between the requested content and the landing page. Only redirect when the replacement page covers the same topic.
Soft 404: page returns 200 but content is empty or thin	Search results with no results; category pages with zero products; paginated pages with only one item; login-walled pages returning 200 to Googlebot	Check the actual HTTP response header for the URL. If content is truly empty, change the response to 404 or 410. If it is a search page, add noindex and remove it from the sitemap.	Ecommerce sites often have 50,000+ soft 404s from 'no results' search pages. Fixing the template to return 404 instead of 200 can clear roughly 90% of these errors once the URLs are recrawled. But watch out: if Google expects those URLs from a sitemap, the 404 will create new errors. Remove them from the sitemap first.
DNS error: Googlebot could not resolve the hostname	DNS provider outage; misconfigured A/AAAA records; TTL too high (caching stale records); CDN origin IP changed	Check DNS propagation with a tool like DNSChecker. Verify your nameserver is responding: `dig example.com NS`. If using a CDN, check that the origin server IP is correct in your CDN dashboard.	A single DNS failure during a Googlebot crawl cycle can cause up to a 15% crawl drop for the next 24 hours, even after the DNS is fixed. Reason: Google caches DNS failures for a few hours. To accelerate recovery, resubmit the sitemap via GSC after the fix.
Server error (5xx): Googlebot received 500, 502, or 503	Web server overload; PHP worker pool exhaustion; database connection pool starvation; WAF blocking Googlebot IP range	Check server logs for the exact error code at the time of the crawl. If 503: likely a traffic spike or rate limiting. Add Googlebot IP ranges to a whitelist in your WAF. If 502: the backend service (e.g., PHP-FPM, Node, Gunicorn) crashed or returned an invalid response. Restart the service.	The most dangerous server error pattern is intermittent 503s. They do not show up as a massive spike in GSC but quietly reduce crawl frequency over weeks. Set up a cron job to hit your own URLs every 5 minutes and log the HTTP status. If you see 503s at a 2% rate or higher, you have a problem.
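The intermittent-503 check in the last row can be sketched as a small summary over the statuses that such a cron job logs. This is a minimal illustration, not a monitoring product: the function names and the sample probe log are hypothetical, and the 2% threshold is simply the figure from the table.

```python
def server_error_rate(statuses):
    """Return the share of logged probe results that were 5xx responses."""
    if not statuses:
        return 0.0
    errors = sum(1 for code in statuses if 500 <= code <= 599)
    return errors / len(statuses)


def needs_attention(statuses, threshold=0.02):
    """Flag a probe log whose 5xx rate meets or exceeds the threshold."""
    return server_error_rate(statuses) >= threshold


# Hypothetical day of probes at 5-minute intervals: 288 checks, 6 of them 503.
probe_log = [200] * 282 + [503] * 6
print(f"5xx rate: {server_error_rate(probe_log):.1%}")
print("investigate" if needs_attention(probe_log) else "ok")
```

Running the summary per hour instead of per day would make it easier to see whether the errors cluster around peak traffic.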

Pre-fix sanity checks (skip these at your own risk)

1. Verify the error URL is actually requested by Googlebot. Use the URL Inspection tool in GSC. If the error is from a redirected URL, the real problem is upstream.

2. Check whether the URL has been canonicalized elsewhere. A soft 404 often hides behind a canonical tag pointing to a different URL.

3. Look at the referring page. Is the broken link in a footer, a blogroll, or a dynamically generated breadcrumb? Fix the source, not just the destination.

4. For DNS and server errors: are they global or isolated to Googlebot's crawl? Check your CDN logs for the specific user-agent. If normal traffic works but Googlebot gets errors, your WAF or rate-limiter is blocking it.
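The last check, comparing Googlebot's error rate with everyone else's, can be sketched as follows. It assumes the log lines have already been parsed into (user_agent, status) pairs; that record shape and the function name are hypothetical. Matching on the user-agent string is only a first pass, because the string can be spoofed.

```python
def error_rate_by_agent(records):
    """records: iterable of (user_agent, status). Returns {group: error_rate}."""
    totals = {"googlebot": [0, 0], "other": [0, 0]}  # [errors, requests]
    for user_agent, status in records:
        group = "googlebot" if "googlebot" in user_agent.lower() else "other"
        totals[group][1] += 1
        # Count blocks (403) and server errors (5xx) as failures.
        if status == 403 or status >= 500:
            totals[group][0] += 1
    return {g: (e / n if n else 0.0) for g, (e, n) in totals.items()}


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = (
    [(GOOGLEBOT_UA, 503)] * 3
    + [(GOOGLEBOT_UA, 200)] * 7
    + [("Mozilla/5.0 Chrome", 200)] * 90
)
print(error_rate_by_agent(sample))
```

A large gap between the two groups, as in this made-up sample, points at a WAF or rate-limiter rule rather than a general outage.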

Step-by-step resolution: 404 and soft 404 errors

1. Export the list of URLs with 404 errors from GSC. Use the 'Pages' report, filter by 'Not found (404)', and download the CSV. Do not use the 'Crawl errors' legacy report; it is deprecated and less accurate.
2. Run the CSV through a bulk checker to verify the current status. Many URLs listed as 404 will now return 200. Those are false positives. Remove them from your list. For bulk verification of 100,000+ URLs without hitting GSC rate limits, consider a tool like the one described in this bulk Google index checker workflow (https://medium.com/@alexa.sam2026/mass-verification-without-gsc-how-a-bulk-google-index-checker-handles-100-000-urls-9ca89519c1d3).
3. For the remaining true 404s, categorize by page type: product, blog, category, etc. For each category, decide whether to redirect (high-value, topical match exists), restore (content was accidentally deleted), or let it 404 (low-value, no topical match).
4. For soft 404s, check the HTML body of the page. If it contains 'No results found' or similar, change the HTTP response to 404 or 410. If it contains a thin but real page (e.g., a category with one product), either add more content or `noindex` it.
5. After fixing, request recrawling of the affected URLs with the GSC URL Inspection tool, or resubmit the sitemap for larger batches. The Indexing API also accepts bulk submissions, but Google documents it only for job posting and livestream pages, so treat it as a narrow option. For agencies handling massive sites, a pragmatic approach is described in this pragmatic index checker tool overview (https://medium.com/@alexa.sam2026/the-pragmatic-index-checker-tool-for-seo-agencies-4a92f9722c5d).
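The decision logic in the steps above can be written down as a small function so that every URL on the list gets the same treatment. This is a sketch under assumptions: the function name and its inputs are hypothetical, and the source does not say what to do with a high-value URL that has no topical match, so this version lets it 404.

```python
def triage(current_status, is_high_value, has_topical_match, deleted_by_mistake=False):
    """Return the action for one URL from the exported 404 list."""
    if current_status == 200:
        return "drop_false_positive"  # the URL works again; remove it from the list
    if deleted_by_mistake:
        return "restore"
    if is_high_value and has_topical_match:
        return "redirect_301"
    return "leave_404"  # assumption: anything else stays a 404 and leaves the sitemap


# Hypothetical rows: (url, current status, high value?, topical match?, deleted by mistake?)
rows = [
    ("/old-iphone-case", 404, True, True, False),
    ("/tag/misc", 404, False, False, False),
    ("/pricing", 200, True, True, False),
    ("/guide/crawl-budget", 404, True, False, True),
]
for url, status, high_value, match, mistake in rows:
    print(url, "->", triage(status, high_value, match, mistake))
```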

Error triage workflow

1. Export from GSC

Download the full error list from the Pages report. Filter by error type. Do not use the legacy Crawl Errors report.

2. Verify current status

Run a bulk HTTP status check on the list. Remove false positives (URLs that now return 200).

3. Categorize pattern

Group by error type and page template. Look for systemic patterns: all errors are from one sitemap, one template, or one parameter.

4. Apply root fix

For 404s: redirect or restore. For soft 404s: change status code or add noindex. For DNS/5xx: fix infrastructure or whitelist Googlebot.

5. Monitor and resubmit

Resubmit fixed URLs via GSC or API. Check the report again after 7 days to confirm the error count dropped.
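Step 3 is where most of the leverage is, and it is easy to approximate in code. The sketch below groups URLs by their first path segment and their set of query parameter names, which is one rough stand-in for "template" and "parameter"; the grouping key and function names are assumptions, not a standard.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def url_pattern(url):
    """Reduce a URL to (first path segment, sorted query parameter names)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    section = "/" + segments[0] if segments else "/"
    params = sorted({key for key, _ in parse_qsl(parts.query, keep_blank_values=True)})
    return section, tuple(params)


def top_patterns(urls, n=3):
    """Return the n most common patterns with their counts."""
    return Counter(url_pattern(u) for u in urls).most_common(n)


urls = [
    "https://example.com/search?q=red+shoes",
    "https://example.com/search?q=",
    "https://example.com/products/old-case",
    "https://example.com/search?q=x&page=2",
]
for pattern, count in top_patterns(urls):
    print(count, pattern)
```

If one pattern accounts for most of the list, the fix belongs in that template or parameter handler rather than in per-URL redirects.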

Worked example: cleaning 1,200 soft 404s from an ecommerce site

We inherited a site with 1,230 'soft 404' errors in GSC. The initial reaction was to 301-redirect everything to the homepage. That would have been a disaster.

Step 1: We exported the list and ran a custom script that checked each URL's HTTP status and HTML body. Result: 610 URLs were actually returning 200 but with zero products (empty result pages). The remaining 620 were returning 200 with thin content (one product on a category page).

Step 2: We modified the ecommerce platform template. For empty result pages, we changed the HTTP status code to 410 (Gone). For thin category pages, we added a <meta name='robots' content='noindex, follow'> tag and removed them from the XML sitemap.

Step 3: We resubmitted the 610 fixed URLs through the Indexing API. 14 days later, the soft 404 count dropped to 45. Those remaining were from a third-party review page that we had to fix manually.

The key insight: 50% of the errors were from a single template. Fixing the template fixed 610 errors in one deployment.
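The script from Step 1 is not reproduced here, but the classification it performed can be sketched roughly like this. The marker phrases, the function name, and the idea that a product count is available per page are all assumptions for illustration; a real platform would expose these differently.

```python
import re

# Hypothetical phrases that the site's empty-result template prints.
EMPTY_MARKERS = ("no results found", "no products found")


def classify_page(status, html, product_count):
    """Map one crawled page to the fix applied in the worked example."""
    if status != 200:
        return "not_soft_404"
    text = re.sub(r"<[^>]+>", " ", html).lower()  # crude tag stripping
    if product_count == 0 or any(marker in text for marker in EMPTY_MARKERS):
        return "return_410"
    if product_count == 1:
        return "noindex_and_drop_from_sitemap"
    return "keep"


print(classify_page(200, "<h1>Sale</h1><p>No results found</p>", 0))
print(classify_page(200, "<h1>Belts</h1><div class='item'>Belt A</div>", 1))
```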

DNS and server errors: the infrastructure traps

DNS errors are the most dangerous because they affect entire domains, not individual URLs. A 30-minute DNS outage can cause Google to deprioritize crawling your site for days. The fix is rarely simple: check your DNS provider's SLA, configure secondary nameservers, and set TTL to 300 seconds or lower for critical records. For server errors, the most common mistake is assuming a 503 means 'server too busy'. Often, it is a WAF rule that blocks Googlebot's IP range. Check your firewall logs for the 'Googlebot' user-agent. If you see 403 or 503 for that user-agent, whitelist the entire Googlebot IP range (published by Google).
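Before whitelisting, it helps to confirm that a blocked request really came from a published Googlebot range, since the user-agent string alone can be faked. The sketch below uses Python's standard `ipaddress` module. It assumes the published list has been downloaded and parsed into a dict with a `prefixes` list of `ipv4Prefix` / `ipv6Prefix` entries; verify that shape against the current file, and note that the sample ranges here are illustrative, not a complete list.

```python
import ipaddress


def load_networks(ranges_doc):
    """Turn the parsed range document into a list of network objects."""
    networks = []
    for entry in ranges_doc.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def is_in_ranges(ip, networks):
    """True if the address falls inside any of the given networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


# Illustrative sample only; load the real list from Google's published file.
sample_doc = {
    "prefixes": [
        {"ipv4Prefix": "66.249.64.0/27"},
        {"ipv6Prefix": "2001:4860:4801:10::/64"},
    ]
}
networks = load_networks(sample_doc)
for ip in ("66.249.64.5", "203.0.113.9"):
    print(ip, is_in_ranges(ip, networks))
```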

For a deeper understanding of how search engines interpret these signals, the Moz SEO learning center offers a solid foundation on crawl budget and server response codes. It is worth revisiting even for experienced practitioners.

FAQ

What is the fastest way to fix 404 crawl errors for an entire site?

Export the 404 list from GSC, run it through a bulk checker to confirm the current status, then use a regex or URL pattern to write server-level redirect rules. If the error pattern is consistent (e.g., all URLs contain '/old-blog/'), a single .htaccess or nginx rule can fix hundreds of URLs in seconds. Do not redirect all 404s to the homepage.
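Before committing a pattern to .htaccess or nginx, it can be tested against the exported list in a few lines of Python. This is a sketch: the '/old-blog/' pattern comes from the example above, while the '/blog/' target and the function name are hypothetical. URLs that match no rule return None, so nothing falls through to the homepage.

```python
import re

# (pattern, replacement) pairs; the /blog/ target is a made-up destination.
RULES = [
    (re.compile(r"^/old-blog/(?P<slug>[^/]+)/?$"), r"/blog/\g<slug>"),
]


def redirect_target(path, rules=RULES):
    """Return the redirect destination for a path, or None to leave it a 404."""
    for pattern, replacement in rules:
        if pattern.match(path):
            return pattern.sub(replacement, path)
    return None


for path in ("/old-blog/crawl-budget/", "/old-blog/a/b", "/obsolete-product"):
    print(path, "->", redirect_target(path))
```

Once the mapping looks right for the whole list, the same pattern can be translated into a single server-level rule.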

How do I identify soft 404 errors from search result pages in GSC?

GSC labels pages as soft 404 when they return 200 but have little or no content. The most common source is internal search result pages with no results. To find them, export the soft 404 CSV, then grep for 'search', 'query', 'q=', or 's=' in the URL. Check the HTML body for the phrase 'no results' or similar. Then fix the template to return 404 for empty results.
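A plain grep for 's=' also matches parameters such as 'colors=', so parsing the URL gives a cleaner list. The sketch below is one way to do that; the parameter names mirror the ones mentioned above, and the function name is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

SEARCH_PARAMS = {"q", "s", "query", "search"}


def looks_like_search_url(url):
    """True if the path has a search segment or the query has a search parameter."""
    parts = urlsplit(url)
    path_hit = any(seg in ("search", "query") for seg in parts.path.lower().split("/"))
    param_names = set(parse_qs(parts.query, keep_blank_values=True))
    return path_hit or bool(SEARCH_PARAMS & param_names)


candidates = [
    "https://example.com/?s=blue+widget",
    "https://example.com/search/red",
    "https://example.com/shoes?colors=red",
]
for url in candidates:
    print(looks_like_search_url(url), url)
```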

Can I use the Indexing API to fix crawl errors for 100,000 URLs?

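Only within tight limits. By default the Indexing API allows 200 publish requests per day per project, and a single batch HTTP request can combine up to 100 calls; higher quotas have to be requested from Google. Google also documents the API only for job posting and livestream pages, so it is not a general-purpose recrawl channel. If you do use it, batch your URLs and retry with exponential backoff. Do not use it for low-value pages; it is best for high-priority content. A practical alternative is to use a bulk index checker tool that verifies status without consuming your GSC quota, as described in related resources.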
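Batching with exponential backoff can be sketched independently of any particular client library. In the example below, `send` is a hypothetical callable that submits one batch and raises `RuntimeError` on a quota or transient failure; the batch size and delay values are assumptions to be checked against the current API documentation.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, attempts=5):
    """Yield one delay per attempt: base, base*factor, ... capped at max_delay."""
    delay = base
    for _ in range(attempts):
        yield min(delay, max_delay)
        delay *= factor


def call_with_backoff(send, batch, sleep=time.sleep):
    """Call send(batch), sleeping with growing delays after each failure."""
    last_error = None
    for delay in backoff_delays():
        try:
            return send(batch)
        except RuntimeError as error:
            last_error = error
            sleep(delay)
    raise last_error


# Usage with a fake sender that fails twice before succeeding.
calls = {"count": 0}


def fake_send(batch):
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RuntimeError("quota exceeded")
    return len(batch)


waits = []
urls = [f"https://example.com/p/{i}" for i in range(250)]
first_batch = next(chunked(urls, 100))
print(call_with_backoff(fake_send, first_batch, sleep=waits.append), waits)
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.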

What is the difference between a 404 and a soft 404 for SEO?

A 404 tells Google the page does not exist. Google removes it from the index after a few crawls. A soft 404 is a page that returns 200 but has no useful content. Google sees a mismatch and may keep the URL in a 'Crawled - currently not indexed' state indefinitely. Soft 404s are more dangerous because they waste crawl budget and dilute index quality. Fix them by returning 404 or improving the content.

How do I fix DNS errors that only happen during Googlebot crawls?

That is almost always a WAF or rate-limiting issue. Find Googlebot's IP ranges (published by Google) and add them to a whitelist. Also check your CDN's Web Application Firewall logs. If you use Cloudflare, ensure that 'I'm Under Attack' mode is off. For persistent DNS issues, switch to a premium DNS provider with a 100% uptime SLA and low TTL (60-300 seconds).

What is the most common mistake when fixing server errors in GSC?

Assuming the error is on your server when it is actually on your CDN or reverse proxy. A 502 or 504 is generated by the proxy layer, not by your application: the CDN node either received an invalid response from the origin (502) or timed out waiting for it (504). The origin may be healthy for direct traffic while the proxy path fails. Check your CDN logs for the specific request that failed. Also, many server errors are intermittent and only happen during peak traffic. Set up a monitoring tool that checks your URLs every minute from multiple locations.

Should I redirect all 404 errors to the homepage?

No. That is the fastest way to generate soft 404 errors. Google sees a redirect from '/obsolete-product' to the homepage and notices the content does not match. It treats the redirected URL as a soft 404. Only redirect when the target page is topically equivalent. For example, redirect '/old-iphone-case' to '/new-iphone-case', not to the homepage.

How often does Google update the crawl errors report in Search Console?

GSC refreshes the Pages report every 24-48 hours for most sites. However, the data is sampled and may lag by up to 3 days for large sites. Do not make decisions based on a single day of data. Look at the 7-day or 28-day trend. If you fix an error today, wait at least 7 days to confirm the fix in GSC.

What tools do SEO agencies use to bulk-check crawl errors without GSC?

Agencies often use custom scripts (Python + requests library) or SaaS tools that check HTTP status codes in parallel. For a workflow that handles 100,000 URLs, see the bulk Google index checker approach linked in this article. The key is to respect rate limits (10-20 requests per second per IP) and handle redirects properly. Do not follow redirects blindly; record the final status and the redirect chain length.
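Recording the chain instead of following redirects blindly can be sketched with an injectable fetch function. Here `fetch(url)` is a hypothetical callable returning `(status, location)`; with the requests library it could wrap `requests.get(url, allow_redirects=False)` and read `response.status_code` and `response.headers.get("Location")`. Relative Location values would need resolving with `urllib.parse.urljoin` first, which this sketch leaves out.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def trace_redirects(url, fetch, max_hops=10):
    """Follow redirects one hop at a time. Returns (final_status, chain)."""
    chain = [url]
    status, location = fetch(url)
    while status in REDIRECT_CODES and location:
        if location in chain or len(chain) > max_hops:
            return status, chain  # redirect loop or chain too long; stop here
        chain.append(location)
        status, location = fetch(location)
    return status, chain


# Fake responses standing in for real HTTP calls.
responses = {
    "/a": (301, "/b"),
    "/b": (302, "/c"),
    "/c": (200, None),
    "/x": (301, "/y"),
    "/y": (301, "/x"),
}
status, chain = trace_redirects("/a", responses.get)
print(status, len(chain) - 1, chain)
print(trace_redirects("/x", responses.get))
```

The chain length (hops) is `len(chain) - 1`; a final status that is still a redirect code signals a loop or an over-long chain.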
