Google Crawl API: Automate Crawl Status Monitoring

On this page

Why automate crawl monitoring?
Core endpoints you need
API call flow for crawl alerts
Crawl error entity breakdown and failure modes
Worked example: Python script to fetch crawl errors and alert on server errors
Operational failures and how to handle them
Pre-flight checklist before you deploy
FAQ

Why automate crawl monitoring?

The Google Search Console API (in practice, the "Google crawl API" for this use case) exposes the same crawl error and index coverage data you see in the web interface, but in a machine-readable format. The bottleneck is not data volume. It is setup friction: OAuth scoping, property-level permissions, and query parameter tuning.

In practice, the first time you run a crawl API query you will likely hit a 403, either because the service account lacks ownership or because you used the wrong siteUrl format (with vs. without protocol). These are not bugs; they are design constraints. To map API responses to real crawl issues, you need to understand Google Search Essentials. The API does not judge whether a URL is valuable or healthy. It tells you the URL returned a 500 or was blockedByRobotsTxt. You decide what to alert on.

Core endpoints you need

Three endpoints matter for crawl monitoring. First, sites.list, to verify you have access. Second, sites/{siteUrl}/crawlErrorsCounts/query, for error counts grouped by error type and URL category. Third, sites/{siteUrl}/index/coverage, for coverage state (submitted and indexed, not indexed, and so on).

A common situation we see: teams set up the crawl error endpoint but forget to set latestCountsOnly=true. They pull 90 days of daily data, parse 90 rows per error type, and wonder why their dashboard is slow. Set that parameter unless you need trends. Also note that the API returns 0 for error types that have no data. A row of all zeros does not mean the endpoint is broken; it means no errors of that type were found.
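The parsing side of that advice can be sketched in a few lines of Python. This is a minimal sketch, assuming a response shaped as a `countPerTypes` list whose items carry `platform`, `category`, and an `entries` list of `{count, timestamp}` objects; those field names are assumptions, so check them against a real response first. It keeps only the newest entry per item (so it also copes with latestCountsOnly=false) and drops zero counts, because a zero means "no errors of this type", not a broken endpoint.

```python
from __future__ import annotations

from typing import Any


def nonzero_latest_counts(response: dict[str, Any]) -> dict[tuple[str, str], int]:
    """Map (platform, category) -> latest count, skipping all-zero rows.

    Field names (countPerTypes, platform, category, entries, count, timestamp)
    are assumptions for illustration, not a confirmed response schema.
    """
    latest: dict[tuple[str, str], int] = {}
    for item in response.get("countPerTypes", []):
        entries = item.get("entries", [])
        if not entries:
            continue
        # Assumes ISO 8601 timestamps in one consistent format, which sort correctly as strings.
        newest = max(entries, key=lambda e: e["timestamp"])
        count = int(newest["count"])  # counts may arrive as strings
        if count > 0:
            latest[(item.get("platform", "unknown"), item["category"])] = count
    return latest
```

Keying by (platform, category) keeps web and smartphone numbers apart instead of silently merging them.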

API call flow for crawl alerts

Authenticate

Use OAuth 2.0 with a service account. Assign owner-level permission in GSC. Test with sites.list.

Select property

Domain property or URL-prefix property. Format: sc-domain:example.com or https://example.com.

Query crawl errors

POST to crawlErrorsCounts/query with latestCountsOnly=true and platform=web.

Parse response

Map error type strings (serverError, notFound, blockedByRobotsTxt) to human labels.

Store & alert

Log counts to a time-series DB. Fire an alert if serverError > 50 or notFound rises 20% over the previous pull.

Recur

Schedule every 6 hours. Respect the 200k rows/day quota. Use incremental pulls.
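The "Store & alert" rule above can be sketched as a pure function that compares two pulls. This is a minimal sketch: "notFound spikes 20%" is interpreted here as a rise of at least 20% over the previous pull, which is an assumption, and the function name `alert_reasons` is hypothetical.

```python
from __future__ import annotations


def alert_reasons(current: dict[str, int], previous: dict[str, int],
                  server_error_limit: int = 50,
                  not_found_growth: float = 0.20) -> list[str]:
    """Return human-readable reasons to alert; an empty list means stay quiet.

    "notFound spikes 20%" is interpreted as a rise of at least 20% over the
    previous pull; that interpretation is an assumption.
    """
    reasons: list[str] = []
    server = current.get("serverError", 0)
    if server > server_error_limit:
        reasons.append(f"serverError={server} exceeds {server_error_limit}")
    before = previous.get("notFound", 0)
    after = current.get("notFound", 0)
    # Percentage growth is undefined from a zero baseline, so skip that case.
    if before > 0 and (after - before) / before >= not_found_growth:
        reasons.append(f"notFound rose from {before} to {after}")
    return reasons
```

Returning reasons rather than a bare boolean lets the alert message explain itself.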

Crawl error entity breakdown and failure modes

Error type (API key)	What it means	Typical action	Hidden risk / failure mode
serverError (5xx HTTP status)	Server returned 500, 502, 503, or 504	Check server logs, CDN, or hosting provider	False positive if the load balancer returns 503 during maintenance. Filter by time window.
notFound (404 HTTP status)	URL does not exist on the server	Redirect or remove dead links	Soft 404s (a 200 response with "no such page" content) are not caught by this endpoint. Use the index coverage API separately.
blockedByRobotsTxt (disallowed)	robots.txt denies crawling	Review robots.txt rules	May hide real issues. If a page is disallowed, Google cannot assess it, so you lose visibility.
dnsError (DNS resolution failure)	Googlebot cannot resolve the hostname	Check DNS records, TTL, or provider	Transient. A single DNS error may be a network blip, so do not alert on one occurrence. Set the threshold at 5+.
blockedByMetaTag (noindex meta tag)	Page includes a noindex directive	Remove or update the meta tag	Common after site migrations and often intentional. Cross-reference with index coverage status.
crawlTimeExceeded (timeout)	Page load exceeded the timeout threshold	Optimize page speed, reduce bloat	May be caused by third-party scripts, and the API does not tell you which one. Pair with Core Web Vitals data.
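The table can double as configuration: map each API key to a readable label and a minimum count before it is worth alerting on. This is a sketch under stated assumptions; the keys come from the table, while the labels and minimums are illustrative choices (only the dnsError threshold of 5 is taken from the table).

```python
from __future__ import annotations

# Keys from the table; labels and minimum counts are illustrative choices.
ERROR_TYPES: dict[str, tuple[str, int]] = {
    "serverError": ("Server error (5xx)", 1),
    "notFound": ("Not found (404)", 1),
    "blockedByRobotsTxt": ("Blocked by robots.txt", 1),
    "dnsError": ("DNS resolution failure", 5),  # transient: need 5+ before alerting
    "blockedByMetaTag": ("noindex meta tag", 1),
    "crawlTimeExceeded": ("Crawl timeout", 1),
}


def actionable(counts: dict[str, int]) -> dict[str, int]:
    """Translate API keys to labels and drop counts below each type's minimum."""
    result: dict[str, int] = {}
    for key, count in counts.items():
        # Unknown keys keep their raw name and a minimum of 1, so new error
        # types surface instead of being silently dropped.
        label, minimum = ERROR_TYPES.get(key, (key, 1))
        if count >= minimum:
            result[label] = count
    return result
```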

Worked example: Python script to fetch crawl errors and alert on server errors

Assume you manage a site with 50k indexed pages. You want to alert when server errors exceed 100 in a 6-hour window.

Setup:

Service account JSON key, with the service account email added as an owner of your GSC property.
Install google-api-python-client and google-auth.
Property: sc-domain:example.com.

Query:

POST to https://searchconsole.googleapis.com/v1/sites/sc-domain:example.com/crawlErrorsCounts/query with this body:

{
  "latestCountsOnly": true,
  "platform": "web",
  "category": "serverError"
}
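Building that request in code is mostly about getting the siteUrl into the path safely. This is a minimal sketch, assuming the base path exactly as this article gives it (not independently verified against Google's current API reference). It percent-encodes the whole property identifier, which is the safe choice for a value containing `:` and `/`. The function name `crawl_errors_request` is hypothetical.

```python
from __future__ import annotations

import json
from typing import Optional
from urllib.parse import quote

# Base path as given in this article; not independently verified.
BASE = "https://searchconsole.googleapis.com/v1/sites"


def crawl_errors_request(site_url: str, category: Optional[str] = None,
                         platform: str = "web") -> tuple[str, bytes]:
    """Build the URL and JSON body for a crawl error count query.

    The siteUrl goes into a single path segment, so it is percent-encoded
    in full (safe="" also encodes "/" and ":").
    """
    url = f"{BASE}/{quote(site_url, safe='')}/crawlErrorsCounts/query"
    body: dict[str, object] = {"latestCountsOnly": True, "platform": platform}
    if category is not None:
        body["category"] = category
    return url, json.dumps(body).encode("utf-8")
```

Authentication is left out on purpose. One option is google-auth's `google.auth.transport.requests.AuthorizedSession(credentials)`, whose `.post(url, data=body, headers={"Content-Type": "application/json"})` attaches the OAuth token for you.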

Response parsing: the API returns a countPerType array. Sum the counts[].count values across all URL categories (submitted, not submitted, and so on). In one real test, a site had 43 server errors on submitted URLs, 12 on not-submitted URLs, and 8 in the sitemaps category, for a total of 63.
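The summing step is a nested loop. This sketch assumes the nesting described in the paragraph above (countPerType, then counts, then count); treat that shape, and the illustrative `urlCategory` key in the sample, as assumptions until you have checked a real response.

```python
from __future__ import annotations

from typing import Any


def total_count(response: dict[str, Any]) -> int:
    """Sum counts[].count across every URL category in countPerType.

    The nesting (countPerType -> counts -> count) follows this article's
    description and is an assumption, not a confirmed schema.
    """
    total = 0
    for group in response.get("countPerType", []):
        for item in group.get("counts", []):
            total += int(item.get("count", 0))  # tolerate string counts
    return total


sample = {"countPerType": [
    {"urlCategory": "submitted", "counts": [{"count": 43}]},
    {"urlCategory": "notSubmitted", "counts": [{"count": 12}]},
    {"urlCategory": "sitemaps", "counts": [{"count": 8}]},
]}
```

With the figures from the test above, `total_count(sample)` gives 63.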

Alert logic: if the total exceeds 100, send a Slack webhook. Here 63 < 100, so no alert fires. If the count jumps to 140, the pipeline fires.
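The Slack side needs only the standard library. Slack incoming webhooks accept a JSON POST with a `text` field; the webhook URL and message wording below are placeholders. This sketch builds the request without sending it, so the threshold logic can be tested offline.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Optional


def build_slack_alert(webhook_url: str, site: str, total: int,
                      threshold: int = 100) -> Optional[urllib.request.Request]:
    """Return a POST request for a Slack incoming webhook, or None below threshold."""
    if total <= threshold:  # alert only when the total strictly exceeds the threshold
        return None
    payload = {"text": f"{site}: {total} server errors (threshold {threshold})"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, pass the request to `urllib.request.urlopen(req, timeout=10)`; a total of 63 returns None and sends nothing.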

Edge case: The API does not return a list of which URLs caused the errors. Use the crawlErrorsSamples/list endpoint to pull sample URLs for each error type. Limit: 1000 samples per error type per site.

Operational failures and how to handle them

Three failures hit almost every team that uses the Google crawl API for monitoring:

1. Blocked URLs. The API counts blocked-by-robots errors, but it does not tell you whether the blocked URL matters. Cross-reference with the index coverage API. If a page is blocked and not indexed, that is a problem. If it is blocked but indexed (for example, discovered via a sitemap), that is a conflict.

2. Wrong filters. If you omit the platform parameter, the API returns data for all platforms (web, mobile, smartphone) together, and summing those rows inflates your error counts. Always set platform=web unless you need mobile-specific data.

3. Duplicate lists. The crawl errors endpoint returns counts, not URLs. If you also pull from the index coverage endpoint, you may double-count the same issue. Decide which endpoint is your source of truth for each metric. We use crawl errors for reachability and index coverage for inclusion.
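The platform problem is easiest to avoid by never merging platforms in the first place. This sketch groups the newest count per category under its platform; the field names (`platform`, `category`, `entries`, `count`, `timestamp`) are assumptions about the response shape.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def counts_by_platform(count_per_types: list[dict[str, Any]]) -> dict[str, dict[str, int]]:
    """Group the newest count per error category under its platform.

    Keeping platforms separate avoids adding web and smartphone rows into
    one inflated number. Field names are assumptions, not a confirmed schema.
    """
    grouped: dict[str, dict[str, int]] = defaultdict(dict)
    for item in count_per_types:
        entries = item.get("entries", [])
        if entries:
            newest = max(entries, key=lambda e: e["timestamp"])
            grouped[item["platform"]][item["category"]] = int(newest["count"])
    return dict(grouped)
```

Picking `counts_by_platform(rows)["web"]` gives the per-platform view; adding every platform together would double-count any error reported on both.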

For a complementary approach to bulk URL checking without full GSC access, see this breakdown of mass verification without GSC. And for agencies needing a pragmatic tool, this index checker overview covers workflow integration.

Pre-flight checklist before you deploy

1. Service account has owner-level access to the GSC property, not just read access.

2. Property URL format matches exactly: domain properties use the sc-domain: prefix; URL-prefix properties use the full https:// URL.

3. latestCountsOnly is set to true unless you need historical daily trends.

4. Platform filter is set explicitly to web, mobile, or smartphone.

5. Error category filter (optional) is used to reduce payload size.

6. Crawl sample endpoint is tested to confirm you can fetch example URLs.

7. Alert thresholds are reviewed against a typical baseline (e.g., 0-5 server errors per day is normal for a healthy site).

8. Rate limit budget is calculated: 200,000 rows/day per property; each query consumes rows equal to the number of (error type x category) combinations.
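Several checklist items can be verified offline before deployment. This is a minimal sketch with hypothetical names (`MonitorConfig`, `preflight_problems`); it covers items 2, 3, 4, and 8 only, since owner access and the sample endpoint need live calls. The quota default is the figure stated in this article.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitorConfig:
    site_url: str
    platform: Optional[str]
    latest_counts_only: bool = True
    queries_per_day: int = 4          # e.g., one pull every 6 hours
    rows_per_query: int = 36          # 12 error types x 3 categories
    daily_row_quota: int = 200_000    # per property, as stated in this article


def preflight_problems(cfg: MonitorConfig) -> list[str]:
    """Return the offline-checkable checklist items (2, 3, 4, 8) that fail."""
    problems: list[str] = []
    if not cfg.site_url.startswith(("sc-domain:", "http://", "https://")):
        problems.append("site_url must be sc-domain:<domain> or a full URL-prefix")
    if cfg.platform not in ("web", "mobile", "smartphone"):
        problems.append("platform must be set explicitly")
    if not cfg.latest_counts_only:
        problems.append("latestCountsOnly is false; only use that for trend pulls")
    if cfg.queries_per_day * cfg.rows_per_query > cfg.daily_row_quota:
        problems.append("planned queries exceed the daily row quota")
    return problems
```

An empty list means the config passes the checks this sketch can run; the others still need a manual test.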

FAQ

How do I get the google crawl API to return errors for a specific date range?

Use the startDate and endDate fields in the request body, formatted as YYYY-MM-DD. Note that the API returns near-real-time data only for the last 90 days. For older data, use the Search Console web UI export.
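Validating the range before sending saves a wasted request. This sketch parses the YYYY-MM-DD strings and enforces the 90-day window described in the answer above; `validate_range` is a hypothetical helper, and `today` is passed in explicitly so the check is deterministic and testable.

```python
from __future__ import annotations

from datetime import date, timedelta


def validate_range(start: str, end: str, today: date,
                   max_age_days: int = 90) -> tuple[date, date]:
    """Parse YYYY-MM-DD strings and reject ranges the API cannot serve."""
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    if first > last:
        raise ValueError("startDate is after endDate")
    if last > today:
        raise ValueError("endDate is in the future")
    if first < today - timedelta(days=max_age_days):
        raise ValueError(f"range starts more than {max_age_days} days ago; use the web UI export")
    return first, last
```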

Can I use the google crawl API to monitor crawl errors for multiple domains in bulk for agencies?

Yes. Use sites.list to get all accessible properties, then loop through them and query crawlErrorsCounts for each. Watch the daily quota: 200k rows per property, not per account, so adding properties does not shrink each property's budget. With 50 properties, still spread queries across time windows rather than firing them all in one burst.
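The agency loop can be split into two small steps: pick the properties you own, then spread one pull per property across the scheduling window. The `siteEntry`, `siteUrl`, and `permissionLevel` fields follow the sites.list response; filtering on `siteOwner` reflects this article's owner-access advice, and the 6-hour window is an illustrative default.

```python
from __future__ import annotations

from typing import Any


def owned_sites(site_list_response: dict[str, Any]) -> list[str]:
    """Pick owner-level properties from a sites.list response, sorted for stable scheduling."""
    return sorted(
        entry["siteUrl"]
        for entry in site_list_response.get("siteEntry", [])
        if entry.get("permissionLevel") == "siteOwner"
    )


def stagger(properties: list[str], window_seconds: int = 6 * 3600) -> list[tuple[int, str]]:
    """Spread one pull per property evenly across the window.

    Returns (offset_seconds, property) pairs; a scheduler would start each
    property's query at window_start + offset.
    """
    if not properties:
        return []
    step = window_seconds / len(properties)
    return [(round(i * step), prop) for i, prop in enumerate(properties)]
```

With 50 properties and a 6-hour window, pulls land 432 seconds apart instead of all at once.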

What is the difference between crawl errors and index coverage in the Search Console API?

Crawl errors (the crawlErrorsCounts endpoint) measure whether Googlebot can reach a URL. Index coverage (the index endpoint) measures whether a URL is in the index. A URL can be reachable but not indexed (common) or indexed but unreachable (a conflict). Use both endpoints for a full diagnosis.

Why does the google crawl API return 0 for all error types even though I see errors in the GSC web UI?

Check the platform filter. If you query with platform=web but the errors appear under smartphone in the UI, the API returns 0. Also verify the property URL format: domain properties require the sc-domain: prefix. Lastly, if you recently changed property permissions, clear any cached credentials or cached responses in your own pipeline and retry.

How do I handle rate limits when using the google crawl API for a site with 100k URLs?

The rate limit is 200,000 rows per property per day, regardless of how many URLs the site has. Each query returns one row per (errorType, category) combination; typically 12 error types x 3 categories = 36 rows per query. That allows 200,000 / 36 ≈ 5,555 queries per day, or one query roughly every 16 seconds (86,400 / 5,555 ≈ 15.6). Schedule at that interval or slower, and use exponential backoff on 429 responses.
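The budget arithmetic and the retry policy can both be sketched in plain Python. The quota figures are the ones stated in this answer; the backoff uses the "equal jitter" variant (half the delay fixed, half random), which is one common choice rather than anything the API mandates. `with_backoff` takes a hypothetical `is_rate_limited` predicate, since how a 429 surfaces depends on your HTTP client.

```python
from __future__ import annotations

import math
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def query_budget(daily_row_quota: int = 200_000, rows_per_query: int = 36) -> tuple[int, int]:
    """Return (queries per day, minimum whole seconds between queries)."""
    per_day = daily_row_quota // rows_per_query
    return per_day, math.ceil(86_400 / per_day)


def backoff_delays(retries: int = 5, base: float = 1.0, cap: float = 64.0,
                   rand: Callable[[], float] = random.random) -> list[float]:
    """Exponential backoff with equal jitter: half of each delay fixed, half random."""
    delays = []
    for attempt in range(retries):
        ceiling = min(cap, base * 2 ** attempt)
        delays.append(ceiling / 2 + rand() * ceiling / 2)
    return delays


def with_backoff(call: Callable[[], T], is_rate_limited: Callable[[Exception], bool],
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry call() on rate-limit errors (e.g., HTTP 429), then try once more."""
    for delay in backoff_delays():
        try:
            return call()
        except Exception as exc:
            if not is_rate_limited(exc):
                raise  # other errors are not retried
            sleep(delay)
    return call()  # final attempt; any error propagates to the caller
```

Injecting `sleep` and `rand` keeps the retry logic testable without real waiting.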

What are the common errors when setting up OAuth for the Google crawl API?

Most common: using an API key instead of OAuth. The Search Console API requires OAuth 2.0. Second: the service account email is added as a user but not as an owner. Only owners can access crawl error data. Third: the wrong OAuth scope. Use 'https://www.googleapis.com/auth/webmasters.readonly'.

Can the google crawl API return the list of URLs that have a specific error?

Yes, but not from the crawlErrorsCounts endpoint. Use crawlErrorsSamples/list with the same error type and platform. It returns up to 1000 sample URLs per error type per site, and paginating through the samples will not take you past that cap. If you need a fuller list, use the index coverage endpoint with filtering.

How do I integrate the google crawl API with a custom dashboard like Grafana or Looker Studio?

Write a middle layer (e.g., a Cloud Function or a small Python script) that queries the API every 6 hours and pushes results to a time-series database (InfluxDB, BigQuery). Then connect Grafana or Looker Studio to that database. Avoid hitting the API directly from the dashboard: every page refresh would trigger a query and burn through your quota.

What should I do if the google crawl API returns a 403 error for a property I own?

Verify the service account email is added as an owner in GSC, not just a user. Also check that you are using the correct property URL format. For domain properties, use sc-domain:example.com, not https://example.com. For URL-prefix properties, use the exact URL including protocol.
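A small normalizer catches most format mistakes before they turn into 403s. This is a sketch under two assumptions: a bare hostname is treated as a domain property, and the common `scDomain:` spelling is corrected to Google's documented `sc-domain:` prefix. URL-prefix properties are passed through exactly as given, protocol included. `site_url_for` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def site_url_for(prop: str) -> str:
    """Normalize a property identifier into the siteUrl format the API expects."""
    prop = prop.strip()
    if prop.lower().startswith("scdomain:"):
        # Common typo: the domain-property prefix is "sc-domain:".
        prop = "sc-domain:" + prop.split(":", 1)[1]
    if prop.startswith("sc-domain:"):
        return prop
    if prop.startswith(("http://", "https://")):
        if not urlsplit(prop).netloc:
            raise ValueError(f"URL-prefix property has no host: {prop!r}")
        return prop  # URL-prefix: pass the exact URL, protocol included
    if "/" in prop or ":" in prop:
        raise ValueError(f"cannot tell what kind of property this is: {prop!r}")
    return f"sc-domain:{prop.lower()}"  # assumption: a bare hostname means a domain property
```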

Is there a way to get crawl error trends over time using the Search Console API?

Yes. Set latestCountsOnly=false in the crawlErrorsCounts query. The API then returns daily counts for the last 90 days, so expect several date entries per error type in the JSON. Use this for trend dashboards, but note that the data arrives with a 2-3 day delay, so the most recent days may not yet be final.
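For a trend line, turn the daily entries into a sorted series, drop the days still inside the reporting delay, and smooth with a trailing average. This sketch assumes each entry has a `count` and a `timestamp` that starts with an ISO date; the 3-day lag default reflects the 2-3 day delay mentioned above, and the 7-day window is an illustrative choice.

```python
from __future__ import annotations

from datetime import date, timedelta
from typing import Any


def daily_series(entries: list[dict[str, Any]], today: date,
                 lag_days: int = 3) -> list[tuple[date, int]]:
    """Sorted (day, count) pairs, excluding days inside the reporting lag."""
    cutoff = today - timedelta(days=lag_days)
    series: dict[date, int] = {}
    for entry in entries:
        day = date.fromisoformat(entry["timestamp"][:10])  # assumes an ISO date prefix
        if day <= cutoff:
            series[day] = int(entry["count"])
    return sorted(series.items())


def moving_average(series: list[tuple[date, int]], window: int = 7) -> list[tuple[date, float]]:
    """Trailing moving average; the first window-1 days get no value."""
    counts = [count for _, count in series]
    return [
        (series[i][0], sum(counts[i - window + 1:i + 1]) / window)
        for i in range(window - 1, len(series))
    ]
```

Dropping the lagging days keeps a half-reported final day from looking like a sudden improvement.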
