Automate Index Coverage Monitoring with the URL Inspection API (Python + n8n)
Detecting traffic loss after it happens is the easy part. By the time Google Search Console shows a drop, the page has already been quietly dropped from the index — often weeks earlier. The expensive failure mode in modern SEO isn’t a ranking slide; it’s pages that get crawled but never indexed, or that silently fall out of coverage and sit there bleeding potential traffic while your rank-tracker keeps reporting “not in top 100” with no explanation.
The fix is to stop treating index coverage as something you check by hand in the GSC Pages report. The URL Inspection API gives you the same per-URL verdict the GSC UI shows — coverageState, indexingState, last crawl time, canonical, robots verdict — programmatically, at up to 2,000 queries per property per day. This post walks through a Python + n8n pipeline that inspects every URL in your sitemap on a schedule, flags the ones drifting out of the index, and pings you on Slack before the traffic damage shows up in your reports.
Why the URL Inspection API beats manual coverage checks
The GSC Pages report aggregates. It tells you 412 URLs are “Crawled – currently not indexed” but makes you click into each reason bucket and export to find which URLs, and even then the export is sampled and lagged by days. For a site with thousands of URLs, that’s not a monitoring system — it’s an autopsy.
The URL Inspection API returns a structured verdict for a single URL on demand. The field that matters most is indexStatusResult.coverageState, a human-readable string such as:
- "Submitted and indexed": the healthy state.
- "Crawled - currently not indexed": Google fetched it and decided it wasn’t worth indexing. Usually a quality or duplication signal.
- "Discovered - currently not indexed": Google knows the URL exists but hasn’t crawled it. Often a crawl-budget or internal-linking problem.
- "Duplicate, Google chose different canonical than user": your canonical is being overridden.
- "URL is unknown to Google": never discovered at all.
Two of these — crawled-not-indexed and discovered-not-indexed — are the silent killers. A page can sit in either state for months. If you catch the shift the week it happens, you can act (improve the content, fix internal links, consolidate duplicates) while the page still has a chance. That early-warning loop is exactly the gap this pipeline fills, and it pairs naturally with a content-decay detector that flags traffic loss from the GSC performance API — coverage monitoring catches the pages that never got the chance to lose traffic in the first place.
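To make the states above easier to act on, here is a minimal sketch that maps each coverageState string to the likely cause described in the list. The names LIKELY_CAUSE, HEALTHY and triage are hypothetical helpers for illustration, not part of the API, and the cause labels are only paraphrases of the descriptions above.

```python
# Hypothetical triage helper: labels paraphrase the state descriptions above.
HEALTHY = "Submitted and indexed"

LIKELY_CAUSE = {
    "Crawled - currently not indexed": "quality or duplication signal",
    "Discovered - currently not indexed": "crawl budget or internal linking",
    "Duplicate, Google chose different canonical than user": "canonical overridden",
    "URL is unknown to Google": "never discovered",
}

def triage(coverage_state):
    """Return 'ok' for the healthy state, a likely cause for known problem
    states, and a manual-review label for anything not listed here."""
    if coverage_state == HEALTHY:
        return "ok"
    return LIKELY_CAUSE.get(coverage_state, "unclassified: review manually")
```

Any state the map does not list falls through to manual review instead of being silently treated as healthy.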
Prerequisites and quota math
You need a Google Cloud project with the Search Console API enabled, a service account, and that service account’s email added as a full or restricted user on the GSC property (Settings → Users and permissions). The URL Inspection API authenticates with the service account; the property must already be verified in Search Console.
Mind the limits before you architect anything:
- 2,000 queries per property per day. If your sitemap has 12,000 URLs, you can’t inspect them all daily — you rotate. Inspect the 2,000 highest-priority or oldest-checked URLs each run.
- 600 queries per minute per property. Throttle to stay under it, or you’ll get 429 responses.
The practical pattern: store a last_inspected timestamp per URL, sort ascending, and process the oldest 2,000 each day. Over a week you cover 14,000 URLs with a rolling window — plenty for most sites, and the highest-traffic pages can be tagged for daily inspection regardless of rotation.
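The rotation described above can be sketched as a small selection function. This is an illustration under assumptions: select_batch, the last_inspected dict (URL to timezone-aware datetime) and the priority set are hypothetical names, and where you persist the timestamps (a Sheet, a database, a JSON file) is up to you.

```python
from datetime import datetime, timezone

DAILY_QUOTA = 2000  # per-property daily limit quoted above

def select_batch(urls, last_inspected, priority=frozenset(), quota=DAILY_QUOTA):
    """Pick today's URLs: priority pages first, then least recently inspected.

    last_inspected maps url -> timezone-aware datetime. URLs that were never
    inspected are absent from the map and therefore sort first.
    """
    never = datetime.min.replace(tzinfo=timezone.utc)
    pinned = [u for u in urls if u in priority]
    rest = sorted(
        (u for u in urls if u not in priority),
        key=lambda u: last_inspected.get(u, never),  # oldest check first
    )
    return (pinned + rest)[:quota]
```

With no priority pages, a 12,000-URL sitemap cycles completely every 6 days at 2,000 inspections per day. Every pinned URL takes one slot out of the rotating pool, so keep the priority set small.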
Step 1 — Pull the URL list from your sitemap
Start with the canonical source of truth for “pages that should be indexed”: your XML sitemap(s). Parsing it keeps your monitored set in sync with what you’re actually telling Google to index.
import requests
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemap protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def urls_from_sitemap(sitemap_url):
    """Yield every page URL in a sitemap, following sitemap index files."""
    r = requests.get(sitemap_url, timeout=30)
    r.raise_for_status()  # stop on a 4xx/5xx instead of parsing an error page
    root = ET.fromstring(r.content)
    # Handle sitemap index files (nested sitemaps)
    if root.tag.endswith("sitemapindex"):
        for sm in root.findall("sm:sitemap/sm:loc", NS):
            yield from urls_from_sitemap(sm.text)
    else:
        for loc in root.findall("sm:url/sm:loc", NS):
            yield loc.text

all_urls = list(urls_from_sitemap("https://example.com/sitemap.xml"))
print(f"{len(all_urls)} URLs to monitor")
Step 2 — Inspect URLs with throttling and backoff
Authenticate the service account, then call urlInspection.index.inspect for each URL. The two things that will bite you in production are the per-minute rate limit and transient 503s, so build throttling and exponential backoff in from the start.
import time, random
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
SITE = "https://example.com/"  # the property as it is defined in Search Console

creds = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES)
svc = build("searchconsole", "v1", credentials=creds)

def inspect(url, attempt=1):
    body = {"inspectionUrl": url, "siteUrl": SITE, "languageCode": "en-US"}
    try:
        resp = svc.urlInspection().index().inspect(body=body).execute()
        idx = resp["inspectionResult"]["indexStatusResult"]
        return {
            "url": url,
            "coverage": idx.get("coverageState"),
            "verdict": idx.get("verdict"),
            "robots": idx.get("robotsTxtState"),
            "indexing": idx.get("indexingState"),
            "last_crawl": idx.get("lastCrawlTime"),
            "canonical_google": idx.get("googleCanonical"),
            "canonical_user": idx.get("userCanonical"),
        }
    except HttpError as e:
        # Retry rate limits and transient server errors, at most 4 times
        if e.resp.status in (429, 503) and attempt <= 4:
            wait = (2 ** attempt) + random.random()  # 2s, 4s, 8s, 16s plus jitter
            time.sleep(wait)
            return inspect(url, attempt + 1)
        raise  # any other error, or retries exhausted, goes to the caller

results = []
for u in all_urls[:2000]:  # respect the 2,000/day quota
    results.append(inspect(u))
    time.sleep(0.12)  # ~500/min, safely under 600/min
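The fixed sleep in the loop is added on top of each request's own latency, so real throughput lands below 500 per minute. If you want to hold a steady rate, one option is a pacer that sleeps only for whatever remains of the interval. This is a sketch, not part of the Google client library: Pacer and its parameters are hypothetical names.

```python
import time

class Pacer:
    """Enforce a minimum interval between calls using a monotonic clock."""

    def __init__(self, per_minute=500, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock      # injectable so the pacer can be tested
        self.sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self.sleep(self._next_allowed - now)
            now = self.clock()  # re-read after sleeping
        self._next_allowed = now + self.interval
```

In the loop you would create pacer = Pacer(per_minute=500) once and call pacer.wait() before each inspect(u) in place of time.sleep(0.12). A slow request then costs no extra delay, while a fast one still waits out the interval.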
Step 3 — Flag the problem states
Now reduce the raw verdicts to an actionable alert set. Anything that isn't cleanly indexed, or where Google overrode your canonical, gets surfaced.
PROBLEM_STATES = {
    "Crawled - currently not indexed",
    "Discovered - currently not indexed",
    "Duplicate, Google chose different canonical than user",
    "URL is unknown to Google",
    "Excluded by 'noindex' tag",
}

flagged = [
    r for r in results
    if r["coverage"] in PROBLEM_STATES
    or (r["canonical_google"] and r["canonical_user"]
        and r["canonical_google"] != r["canonical_user"])
]
print(f"{len(flagged)} of {len(results)} inspected URLs need attention")
Store the full results set with a timestamp so you can diff against the previous run. A URL that flips from "Submitted and indexed" to "Crawled - currently not indexed" between runs is your highest-signal alert — that's a fresh regression, not a long-standing known issue. Logging the history to BigQuery or Google Sheets also lets you build the kind of coverage trend dashboard covered in the BigQuery + Looker Studio SEO dashboard teardown.
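A minimal sketch of that snapshot-and-diff step, assuming each row is a dict shaped like the output of inspect() (at least "url" and "coverage"). The function names and the local JSON file are illustrative assumptions; in production you might swap the file for BigQuery or a Sheet.

```python
import json
from datetime import datetime, timezone

HEALTHY = "Submitted and indexed"

def save_snapshot(results, path):
    """Write the full result set with a UTC run timestamp."""
    payload = {"run_at": datetime.now(timezone.utc).isoformat(), "results": results}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)

def load_snapshot(path):
    """Read back the rows saved by save_snapshot()."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["results"]

def find_regressions(previous, current):
    """Return URLs whose coverage flipped from healthy to anything else."""
    before = {r["url"]: r["coverage"] for r in previous}
    return [
        {"url": r["url"], "was": before[r["url"]], "now": r["coverage"]}
        for r in current
        if before.get(r["url"]) == HEALTHY and r["coverage"] != HEALTHY
    ]
```

URLs that were already in a problem state last run, or that are new to this run, are left out on purpose. They still show up in the flagged list, but they are not fresh regressions.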
Step 4 — Orchestrate and alert with n8n
Wrapping this in n8n turns a script you have to remember to run into a monitoring system that runs itself. The minimal workflow:
- Schedule Trigger: daily at, say, 06:00, before you start work.
- Execute Command (or a Code node): run the inspection script; emit the flagged list as JSON.
- IF node: only continue when flagged.length > 0.
- Set / Function node: format a compact digest with a count by coverage state, plus the URLs that newly regressed since the last run.
- Slack node: post to your SEO channel with the digest and a link to the full Sheet/BigQuery row set.
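If you would rather build the digest on the Python side and let n8n simply forward it, a sketch like the following works. It assumes the script prints one JSON document to stdout and that the Execute Command node passes that output to the next node. build_digest and digest_json are hypothetical names, and the row shape follows the dicts returned by inspect().

```python
import json
from collections import Counter

def build_digest(flagged, regressed_urls=()):
    """Summarise flagged rows: total, counts per coverage state, new regressions."""
    counts = Counter(r["coverage"] or "unknown" for r in flagged)
    return {
        "total_flagged": len(flagged),
        "by_state": dict(counts.most_common()),  # most frequent state first
        "new_regressions": sorted(regressed_urls),
    }

def digest_json(flagged, regressed_urls=()):
    """Serialise the digest so the script can print it for n8n to pick up."""
    return json.dumps(build_digest(flagged, regressed_urls))
```

Ending the script with print(digest_json(flagged)) keeps the n8n side down to a JSON parse, the IF check on total_flagged, and the Slack post.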
Keep the heavy lifting in Python and let n8n handle scheduling, branching, retries, and notification fan-out. This is the same division of labour used across SEO Automation Club (SAC) pipelines, such as the n8n + GSC keyword-research workflow, and it keeps your automations debuggable: when something breaks you know whether it's the data layer (Python) or the orchestration layer (n8n).
Results: what this catches that dashboards miss
Running this on a mid-sized content site, the pattern that consistently surfaces is a cluster of recently published or recently updated pages stuck in "Discovered - currently not indexed". The aggregate GSC report buries them inside a months-old backlog of the same status; the diff-based alert isolates the new entrants — usually a sign that a new section is under-linked internally and Googlebot hasn't prioritized crawling it. Fixing internal links (or submitting an updated sitemap) typically clears them within a crawl cycle or two.
The second recurring catch is canonical overrides on near-duplicate templated pages — exactly the failure mode that plagues programmatic SEO at scale. If you run a pSEO build, pairing this monitor with a pre-publish quality gate in your programmatic SEO pipeline closes the loop: the gate stops thin pages going live, and the monitor catches the ones that slip through and get demoted post-publish.
The takeaway: coverage monitoring is the cheapest high-leverage automation in the SEO stack. It's read-only, it runs on free Google quota, and it converts a reactive "why did traffic drop" investigation into a proactive "this page is drifting, fix it now" alert. Build it once and it pays rent every morning.
If this kind of working-code playbook is useful, bookmark SEO Automation Club — we publish a new automation teardown most mornings, each one built around real pipelines you can deploy rather than generic advice. For the detection side of the same coin, read the companion piece on building a custom rank tracker on the GSC API.
Frequently asked questions
Is the URL Inspection API free to use?
Yes. It's part of the Search Console API and has no monetary cost. The only constraints are the quotas: 2,000 queries per property per day and 600 queries per minute per property. For sites larger than ~2,000 URLs you rotate inspection across days rather than checking everything daily.
How is this different from the Indexing API?
The Indexing API requests crawling/indexing and is officially supported only for pages with JobPosting or BroadcastEvent structured data. The URL Inspection API is read-only — it reports the current index status of any URL on a verified property. This pipeline uses the URL Inspection API for monitoring; it does not try to force-index general content pages, which Google does not support.
Can a service account access the URL Inspection API?
Yes, provided the service account's email is added as a user on the verified Search Console property and the project has the Search Console API enabled. The required OAuth scope is webmasters.readonly.
What's the single most actionable field in the response?
indexStatusResult.coverageState. It mirrors the verdict string shown in the GSC UI and tells you exactly why a URL is or isn't indexed. Diffing it against the previous run isolates fresh regressions, which are the alerts worth acting on immediately.
