Stop logging into Search Console every morning. Fetch crawl stats, error counts, and sitemap status programmatically. This is the integration playbook for the Search Console API — with real filters, rate limits, and failure modes.
The Google Search Console API exposes the same crawl error and index coverage data you see in the web interface, but in a machine-readable format. The bottleneck is not the data volume. It is the setup friction: OAuth scoping, property-level permissions, and query parameter tuning.
In practice, when you run a crawl API query for the first time, you will likely hit a 403 because the service account lacks ownership. Or you query the wrong siteUrl format (with vs. without protocol). These are not bugs. They are design constraints. You need to understand the Google Search Essentials to map API responses to real crawl issues. The API does not tell you a URL is weak. It tells you it returned 500 or blockedByRobotsTxt. You decide what to alert on.
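The siteUrl mismatch can be caught before any request leaves your machine. The following is a minimal sketch, assuming the property identifier ends up in the URL path and therefore has to be percent-encoded. The helper name `site_url_path_segment` is hypothetical, and the `scDomain:` prefix follows the notation used in this article.

```python
from urllib.parse import quote


def site_url_path_segment(site_url: str) -> str:
    """Validate a property identifier and percent-encode it for a URL path.

    Assumption: domain properties start with "scDomain:" (this article's
    notation) and URL-prefix properties start with an explicit protocol.
    """
    is_domain = site_url.startswith("scDomain:")
    is_prefix = site_url.startswith(("http://", "https://"))
    if not (is_domain or is_prefix):
        raise ValueError(f"unexpected property format: {site_url!r}")
    # safe="" makes quote() encode ':' and '/' as well
    return quote(site_url, safe="")
```

Rejecting a bare `example.com` at this point gives a clearer error than a 403 or an empty response later in the pipeline.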
Three endpoints matter for crawl monitoring. First: sites.list to verify you have access. Second: sites/{siteUrl}/crawlErrorsCounts/query for error counts grouped by error type and URL category. Third: sites/{siteUrl}/index/coverage for coverage state (submitted and indexed, not indexed, etc).
A common situation we see: teams set up the crawl error endpoint but forget to filter by latestCountsOnly=true. They pull 90 days of daily data, parse 90 rows per error type, and wonder why their dashboard is slow. Use that parameter unless you need trends. Also: the API returns 0 for error types that have no data. A row with all zeros does not mean the endpoint is broken. It means no errors of that type were found.
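To keep "zero errors" separate from "no data came back", it helps to check the parsed rows against the error types you expect. This is a rough sketch, assuming each row has already been flattened into a dict with `errorType`, `category`, and `count` keys. That row shape and the function name are illustrative assumptions, not the documented response format.

```python
def summarize_counts(rows, expected_types):
    """Total counts per error type and separate zero rows from missing ones.

    Assumed row shape (hypothetical): {"errorType": str, "category": str, "count": int}
    """
    totals = {}
    for row in rows:
        totals[row["errorType"]] = totals.get(row["errorType"], 0) + row["count"]
    # Present with a total of 0: the endpoint answered, no errors were found.
    zero = sorted(t for t in expected_types if totals.get(t) == 0)
    # Absent entirely: worth a look at filters or permissions.
    missing = sorted(t for t in expected_types if t not in totals)
    return totals, zero, missing
```

A dashboard can then render the `zero` types as healthy and flag only the `missing` ones for investigation.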
1. Use OAuth 2.0 with a service account. Assign owner-level permission in GSC. Test with sites.list.
2. Pick the property type: domain property (scDomain) or URL-prefix. Format: scDomain:example.com or https://example.com.
3. POST to crawlErrorsCounts/query with latestCountsOnly=true and platform=web.
4. Map error type strings (serverError, notFound, blockedByRobotsTxt) to human labels.
5. Log counts to a time-series DB. Fire an alert if serverError > 50 or notFound spikes by 20%.
6. Schedule every 6 hours. Respect the 200k rows/day quota. Use incremental pulls.
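The label mapping and alert rules from the steps above can be sketched as a pure function that compares the current pull with the previous one. The thresholds come from the text. The function name, the message wording, and the choice to treat a rise of exactly 20% as a spike are assumptions for illustration.

```python
LABELS = {
    "serverError": "Server error (5xx)",
    "notFound": "Not found (404)",
    "blockedByRobotsTxt": "Blocked by robots.txt",
}


def evaluate_alerts(current, previous, server_error_limit=50, spike_percent=20):
    """Return alert messages for the current counts versus the previous pull."""
    alerts = []
    server_errors = current.get("serverError", 0)
    if server_errors > server_error_limit:
        alerts.append(f"{LABELS['serverError']}: {server_errors} exceeds {server_error_limit}")
    prev_nf = previous.get("notFound", 0)
    cur_nf = current.get("notFound", 0)
    # Integer arithmetic avoids float rounding at the exact 20% boundary.
    if prev_nf > 0 and (cur_nf - prev_nf) * 100 >= prev_nf * spike_percent:
        alerts.append(f"{LABELS['notFound']}: rose from {prev_nf} to {cur_nf}")
    return alerts
```

Keeping this logic free of network calls makes it easy to unit test and to re-run against stored history when you tune thresholds.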
| Error type (API key) | Label | What it means | Typical action | Hidden risk / failure mode |
|---|---|---|---|---|
| serverError | 5xx HTTP status | Server returned 500, 502, 503, 504 | Check server logs, CDN, or hosting provider | False positive if load balancer returns 503 during maintenance. Filter by time window. |
| notFound | 404 HTTP status | URL does not exist on server | Redirect or remove dead links | Soft 404s (200 with 'no such page') are not caught by this endpoint. Use index coverage API separately. |
| blockedByRobotsTxt | Disallowed | robots.txt denies crawling | Review robots.txt rules | May hide real issues. If a page is disallowed, Google cannot assess it. You lose visibility. |
| dnsError | DNS resolution failure | Googlebot cannot resolve hostname | Check DNS records, TTL, or provider | Transient. A single DNS error may be a network blip. Do not alert on 1 occurrence. Threshold at 5+. |
| blockedByMetaTag | noindex meta tag | Page includes noindex directive | Remove or update meta tag | Common after site migrations. Often intentional. Cross-reference with index coverage status. |
| crawlTimeExceeded | Timeout | Page load > timeout threshold | Optimize page speed, reduce bloat | May be caused by third-party scripts. API does not tell you which script. Pair with Core Web Vitals data. |
Assume you manage a site with 50k indexed pages. You want to alert when server errors exceed 100 in a 6-hour window.
Setup: the property is scDomain:example.com.
Query: POST to https://searchconsole.googleapis.com/v1/sites/scDomain:example.com/crawlErrorsCounts/query with body:
{
  "latestCountsOnly": true,
  "platform": "web",
  "category": "serverError"
}
Response parsing: The API returns a countPerType array. Sum the counts[].count values across all URL categories (submitted, not submitted, etc). In one real test, a site had: submitted URLs = 43 server errors, not submitted = 12, sitemaps = 8. Total = 63.
Alert logic: If total > 100, send Slack webhook. In this case, 63 < 100 so no alert. But if the count jumps to 140, the pipeline fires.
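The parsing and alert step for this example can be sketched in a few lines. The nesting below (`countPerType` containing `counts` entries with a `count` field) follows the description above. The extra keys in the sample payload, such as `urlCategory`, are placeholders added for readability and are assumptions.

```python
def total_server_errors(response):
    """Sum counts[].count across every entry in the countPerType array."""
    return sum(
        entry["count"]
        for group in response.get("countPerType", [])
        for entry in group.get("counts", [])
    )


def should_alert(total, threshold=100):
    """Fire only when the total strictly exceeds the threshold."""
    return total > threshold


# Sample payload mirroring the numbers in the text (field names partly assumed).
sample = {
    "countPerType": [
        {
            "category": "serverError",
            "platform": "web",
            "counts": [
                {"urlCategory": "submitted", "count": 43},
                {"urlCategory": "notSubmitted", "count": 12},
                {"urlCategory": "sitemaps", "count": 8},
            ],
        }
    ]
}

total = total_server_errors(sample)  # 43 + 12 + 8 = 63
print(total, should_alert(total))
```

With the sample payload the total is 63 and no alert fires. Passing 140 to `should_alert` returns `True`, which is the point where a Slack webhook call would go.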
Edge case: The API does not return a list of which URLs caused the errors. Use the crawlErrorsSamples/list endpoint to pull sample URLs for each error type. Limit: 1000 samples per error type per site.
Three failures hit every team using the google crawl api for monitoring:
1. Blocked URLs. The API counts blocked-by-robots errors. But it does not tell you the URL is weak. You need to cross-reference with index coverage API. If a page is blocked and not indexed, that is a problem. If it is blocked but indexed (via sitemap), that is a conflict.
2. Wrong filters. If you omit the platform parameter, the API returns data for all platforms (web, mobile, smartphone) concatenated. This will inflate your error counts. Always set platform=web unless you need mobile-specific data.
3. Duplicate lists. The crawl errors endpoint returns counts, not URLs. If you also pull from the index coverage endpoint, you may double-count the same issue. Decide which endpoint is your source of truth for each metric. We use crawl errors for reachability and index coverage for inclusion.
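The cross-reference in the first failure mode reduces to a small decision function once both signals are available per URL. This is only a sketch: the two booleans are assumed to come from your own join of crawl error samples and index coverage data, and the return labels are hypothetical.

```python
def classify_blocked_url(blocked_by_robots: bool, indexed: bool) -> str:
    """Combine the reachability signal with the inclusion signal for one URL."""
    if blocked_by_robots and indexed:
        # Blocked yet indexed (for example via a sitemap): conflicting signals.
        return "conflict"
    if blocked_by_robots:
        # Blocked and not indexed: the page is invisible to search.
        return "problem"
    return "ok"
```

Running this per URL keeps crawl errors as the source of truth for reachability and index coverage as the source of truth for inclusion, without double-counting either.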
For a complementary approach to bulk URL checking without full GSC access, see this breakdown of mass verification without GSC. And for agencies needing a pragmatic tool, this index checker overview covers workflow integration.
Service account has owner-level access to the GSC property, not just read.
Property URL format matches exactly: domain properties use scDomain: prefix, URL-prefix properties use full https:// URL.
latestCountsOnly set to true unless you need historical daily trends.
Platform filter set explicitly to web, mobile, or smartphone.
Error category filter (optional) used to reduce payload size.
Crawl sample endpoint tested to confirm you can fetch example URLs.
Alert thresholds reviewed against typical baseline (e.g., 0-5 server errors per day is normal for a healthy site).
Rate limit budget calculated: 200,000 rows/day per property; each query consumes rows equal to number of (error type x category) combinations.
Use the startDate and endDate fields in the request body. Format both as YYYY-MM-DD. Note that the API only returns data for the last 90 days. For older data, use the Search Console web UI export.
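A small helper can build the two date fields and clamp the range to the 90-day window, so an over-long request does not silently come back empty. This is a sketch under the assumption that `startDate` and `endDate` are plain strings in the request body. The helper name is hypothetical.

```python
from datetime import date, timedelta

MAX_LOOKBACK_DAYS = 90  # window stated in the text


def date_window(today: date, days: int) -> dict:
    """Build startDate/endDate in YYYY-MM-DD form, clamped to the lookback window."""
    days = max(1, min(days, MAX_LOOKBACK_DAYS))
    start = today - timedelta(days=days)
    # date.isoformat() already produces YYYY-MM-DD
    return {"startDate": start.isoformat(), "endDate": today.isoformat()}
```

For example, `date_window(date(2024, 3, 31), 30)` yields a start date of 2024-03-01 and an end date of 2024-03-31.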
Yes. Loop through each property in your GSC account. Use sites.list to get all accessible properties, then query crawlErrorsCounts for each. Be careful with daily quota: 200k rows per property, not per account. For 50 properties, you need to spread queries across time windows.
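One way to spread the per-property queries across a time window is to give each property a fixed offset inside the polling interval. A minimal sketch, assuming a 6-hour window and a list of site URLs already fetched via sites.list. The function name is hypothetical.

```python
def stagger_offsets(site_urls, window_seconds=6 * 3600):
    """Assign each property an evenly spaced start offset (in seconds) in the window."""
    if not site_urls:
        return {}
    step = window_seconds / len(site_urls)
    return {site: round(i * step) for i, site in enumerate(site_urls)}
```

With 50 properties and a 6-hour window this puts one property every 432 seconds, which keeps bursts small and makes a quota error easy to attribute to a single property.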
Crawl errors (crawlErrorsCounts endpoint) measure whether Googlebot can reach a URL. Index coverage (index endpoint) measures whether a URL is in the index. A URL can be reachable but not indexed (common, for example with duplicate or low-value pages) or indexed but unreachable (a conflict). Use both endpoints for a full diagnostic.
Check the platform filter. If you query with platform=web but the errors appear under smartphone in the UI, the API returns 0. Also verify the property URL format. Domain properties require the scDomain: prefix. Lastly, clear your API cache if you recently changed property permissions.
The rate limit is 200,000 rows per property per day. Each query returns one row per (errorType, category) combination. Typically 12 error types x 3 categories = 36 rows per query. That gives you a budget of 5,555 queries per day (200,000 / 36, rounded down), or at most one query roughly every 16 seconds (86,400 / 5,555 ≈ 15.6 s). Polling every 15 seconds would exceed the quota. Use exponential backoff on 429 responses.
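The budget arithmetic and a backoff schedule can be written out as below. The quota and the row counts are the figures from the text. The base delay, the cap, and both function names are assumptions chosen for illustration.

```python
SECONDS_PER_DAY = 86_400


def query_budget(quota_rows=200_000, error_types=12, categories=3):
    """Return (rows per query, max queries per day, minimum seconds between queries)."""
    rows_per_query = error_types * categories
    max_queries = quota_rows // rows_per_query
    min_interval = SECONDS_PER_DAY / max_queries
    return rows_per_query, max_queries, min_interval


def backoff_delays(retries, base=1.0, cap=64.0):
    """Exponential backoff delays in seconds: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


rows, max_queries, interval = query_budget()
print(rows, max_queries, round(interval, 2))
```

In production you would usually add random jitter to each delay so that several workers retrying after a 429 do not hit the API at the same instant.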
Most common: using an API key instead of OAuth. The Search Console API requires OAuth 2.0. Second: the service account email is added as a user but not as an owner. Only owners can access crawl error data. Third: the wrong OAuth scope. Use 'https://www.googleapis.com/auth/webmasters.readonly'.
Yes, but not from the crawlErrorsCounts endpoint. Use crawlErrorsSamples/list with the same error type and platform. It returns up to 1000 sample URLs per error type per site. If you need the full list, you have to paginate through all samples or use the index coverage endpoint with filtering.
Write a middle layer (e.g., Cloud Function or a small Python script) that queries the API every 6 hours and pushes results to a time-series database (InfluxDB, BigQuery). Then connect Grafana or Looker Studio to that database. Avoid hitting the API directly from the dashboard — it will blow your quota on page refreshes.
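The middle layer mostly reshapes API rows into timestamped records that a time-series store can ingest. A minimal sketch, assuming rows have already been flattened to dicts with `errorType`, `category`, and `count`. That shape, the output field names, and the function name are hypothetical, and the write to InfluxDB or BigQuery is deliberately left out.

```python
from datetime import datetime, timezone


def to_points(site_url, rows, fetched_at):
    """Convert flattened count rows into timestamped records for a time-series DB."""
    # Normalize to UTC so records from different schedulers line up.
    ts = fetched_at.astimezone(timezone.utc).isoformat()
    return [
        {
            "time": ts,
            "site": site_url,
            "error_type": row["errorType"],
            "category": row["category"],
            "count": int(row["count"]),
        }
        for row in rows
    ]
```

Because the dashboard reads these stored records, a page refresh in Grafana or Looker Studio costs a database query instead of API quota.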
Verify the service account email is added as an owner in GSC, not just a user. Also check that you are using the correct property URL format. For domain properties, use scDomain:example.com not https://example.com. For URL-prefix properties, use the exact URL including protocol.
Yes. Set latestCountsOnly=false in the crawlErrorsCounts query. The API then returns daily counts for the last 90 days. Be prepared to handle JSON with multiple date entries. Use this for trend dashboards, but note that the data is updated with a 2-3 day delay for historical accuracy.