
Agentic SEO with Claude Code: Build an Autonomous Technical Auditor That Files Its Own GitHub Issues

Most “AI-powered” SEO audits today are barely automation. They run a crawler, hand a CSV to a chatbot, and produce a glossy PDF that someone on the marketing team is supposed to email to engineering — where it dies in a Slack thread. The audit happens, but nothing ships.

Over the last quarter I have been running an experiment that closes that loop. Instead of producing a report, the auditor produces a queue of tracked engineering tickets, each one scoped, prioritized, and grouped by repository. The agent is a thin orchestration layer around Claude Code (driven through the Claude Agent SDK), a Bright Data-powered crawler, and the GitHub REST API. It runs twice a week on a schedule, diffs against the previous run, and only files issues for things that genuinely changed. After eight weeks, the agent has filed 73 issues across two production sites. Forty-one have been closed. Median time-to-resolution dropped from “never” to 6 days.

This post is the teardown: the architecture, the tradeoffs, and the actual prompt-and-tool pattern that lets an agent take SEO findings and turn them into pull request material instead of slide decks.

The problem with traditional SEO audit pipelines

The conventional audit stack — Screaming Frog → spreadsheet → human triage → Jira — has three failure modes that automation has historically made worse, not better.

It produces too much output. A full crawl of a 50k-URL site can surface 8,000 issues. Most are noise: a duplicate title tag on a paginated archive, an h1 missing from a print-only template, redirect chains inside an old marketing redirect file no one reads. Dumping all of that into a ticketing system buries the items that matter.

It loses context between runs. A traditional audit cannot tell you whether title tags getting shorter on category pages is a recent regression or a year-old known issue. Without a run-over-run diff, every audit looks like the first one.

It hands off to humans who don’t have repo context. SEO findings are written in the language of HTTP status codes and meta tags. Engineers want to know which file in which repo changed and what the regression test should look like. That translation step is where most audits stall.

The agentic approach fixes all three by inverting the workflow: instead of generating a report and asking a human to file tickets, the agent files tickets directly and lets a human triage the queue.

Architecture: four moving parts

The auditor has four components and one piece of state. It is small on purpose — every additional dependency is a place the schedule can break overnight without anyone noticing.

1. The crawler

I’m using a Bright Data Web Unlocker zone behind a simple Python harness. The harness reads a config file listing the seed URLs per site, the depth, and a list of URL patterns to exclude (utility pages, paginated archives past page 3, faceted-search URL noise). For each crawled URL it stores the final status code, the canonical, the title, the meta description, the h1, the count of internal vs external links, and the rendered word count.
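A minimal sketch of what one row of that per-URL output could look like, plus the exclusion check. The `CrawlRecord` class, its field names, and the two regex patterns are illustrative assumptions, not the harness's actual schema or config.

```python
import json
import re
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical exclusion patterns: paginated archives past page 3, sort facets.
EXCLUDE_PATTERNS = [r"/page/([4-9]|\d{2,})/?$", r"[?&]sort="]


@dataclass
class CrawlRecord:
    """One crawled URL. Field names are assumptions for illustration."""
    url: str
    status: int                      # final status code after redirects
    canonical: Optional[str]
    title: Optional[str]
    meta_description: Optional[str]
    h1: Optional[str]
    internal_links: int
    external_links: int
    word_count: int                  # rendered word count


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """True when the URL matches any configured exclusion pattern."""
    return any(re.search(p, url) for p in patterns)


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def from_jsonl(text):
    """Parse JSONL text back into CrawlRecord objects, skipping blank lines."""
    return [CrawlRecord(**json.loads(line))
            for line in text.splitlines() if line.strip()]
```

Keeping the record flat means every later stage can read the file line by line without knowing anything about the crawler.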

I deliberately do not use a heavy crawler like Screaming Frog or Sitebulb at this stage. Those tools are excellent for human-driven exploration, but for an unattended twice-weekly run I want a 200-line Python script I can read in one sitting and a known-good JSONL output format the rest of the pipeline can depend on. If you want the full comparison, the n8n vs Make vs Zapier teardown covers the same “favor simplicity for unattended jobs” principle.

2. The rule engine

Before any LLM gets involved, a deterministic rule pass runs against the crawl JSONL. This catches the unambiguous stuff: a missing canonical, an internal link that returns a 404, a title over 60 characters, the same title on more than one indexable URL, a 3xx chain longer than two hops, and a canonical that does not match the page’s own URL where a self-referencing canonical is expected. Each rule emits a row with {rule_id, severity, urls_affected, evidence}.
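As a rough sketch, three of those rules could be expressed like this. The severity values, the rule_id strings, and the shortcut of treating any 200 response as indexable are my assumptions for the example; only the 60-character threshold and the output row shape come from the description above.

```python
from collections import defaultdict

MAX_TITLE_CHARS = 60  # threshold named in the text


def finding(rule_id, severity, urls, evidence):
    """Build one output row in the {rule_id, severity, urls_affected, evidence} shape."""
    return {"rule_id": rule_id, "severity": severity,
            "urls_affected": urls, "evidence": evidence}


def run_rules(records):
    """records: list of dicts with 'url', 'status', 'canonical', 'title' keys."""
    findings = []
    # Simplification: a 200 response stands in for "indexable".
    indexable = [r for r in records if r["status"] == 200]

    missing = [r["url"] for r in indexable if not r.get("canonical")]
    if missing:
        findings.append(finding("missing_canonical", "high", missing,
                                "no canonical tag found"))

    by_title = defaultdict(list)
    for r in indexable:
        title = r.get("title") or ""
        if len(title) > MAX_TITLE_CHARS:
            findings.append(finding("title_too_long", "low", [r["url"]],
                                    f"title is {len(title)} chars"))
        if title:
            by_title[title].append(r["url"])

    # A title shared by more than one indexable URL is a duplicate.
    for title, urls in by_title.items():
        if len(urls) > 1:
            findings.append(finding("duplicate_title", "medium", urls,
                                    f"shared title: {title!r}"))
    return findings
```

Every rule here is a pure function of the crawl rows, which is what makes the pass cheap to run and easy to test.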

This part is boring on purpose. The agent is expensive. The rule pass is free. Anything you can decide without an LLM, decide without an LLM.

3. The diff store

This is the single piece of state. Every run, the auditor writes the rule pass output to a SQLite database keyed by (site_id, run_date, rule_id, url_hash). Before producing any issues, the agent compares the latest run to the previous run and computes three sets per rule: new, resolved, and persisting.
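A minimal sketch of that store and the three-way comparison, assuming a single `findings` table whose primary key is the tuple above. The table name, column types, and helper names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS findings (
    site_id  TEXT NOT NULL,
    run_date TEXT NOT NULL,
    rule_id  TEXT NOT NULL,
    url_hash TEXT NOT NULL,
    PRIMARY KEY (site_id, run_date, rule_id, url_hash)
)
"""


def load_run(conn, site_id, run_date):
    """Return the set of (rule_id, url_hash) pairs recorded for one run."""
    rows = conn.execute(
        "SELECT rule_id, url_hash FROM findings WHERE site_id = ? AND run_date = ?",
        (site_id, run_date),
    )
    return set(rows)


def diff_runs(previous, latest):
    """Split findings per rule into new, resolved, and persisting URL hashes."""
    result = {}
    for rule_id in {rule for rule, _ in previous | latest}:
        prev = {h for rule, h in previous if rule == rule_id}
        curr = {h for rule, h in latest if rule == rule_id}
        result[rule_id] = {
            "new": curr - prev,          # present now, absent before
            "resolved": prev - curr,     # present before, absent now
            "persisting": curr & prev,   # present in both runs
        }
    return result
```

Because the comparison is plain set arithmetic, the diff stays deterministic and the agent never has to reason about history itself.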

Only new findings become candidates for issue creation. Resolved findings auto-close any open GitHub issues that reference the same fingerprint. Persisting findings are ignored unless they cross a configurable age threshold, at which point the agent escalates them as a single rollup issue rather than filing 400 individual tickets.
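Those three routing rules can be sketched as one planning function. I am assuming here that a fingerprint is an opaque string identifying a rule and URL pair, that open issues are tracked in a fingerprint-to-issue-number mapping, and that the age threshold defaults to 30 days; none of those details are specified above.

```python
from datetime import date


def plan_actions(new, resolved, persisting_first_seen, open_issues,
                 today, max_age_days=30):
    """
    new, resolved: sets of finding fingerprints (hypothetical opaque strings).
    persisting_first_seen: {fingerprint: date the finding first appeared}.
    open_issues: {fingerprint: GitHub issue number} for issues still open.
    Returns what to file, which issue numbers to close, and what to roll up.
    """
    # Every new finding is a candidate for its own issue.
    to_file = sorted(new)

    # Resolved findings close only the issues that actually reference them.
    to_close = sorted(open_issues[fp] for fp in resolved if fp in open_issues)

    # Persisting findings are ignored until they are old enough, then
    # grouped into one rollup instead of one ticket each.
    rollup = sorted(
        fp for fp, first_seen in persisting_first_seen.items()
        if (today - first_seen).days >= max_age_days
    )
    return {"file": to_file, "close": to_close, "rollup": rollup}
```

Separating the plan from the GitHub calls makes it possible to dry-run a week of findings and inspect what would be filed before anything is written.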

This diff-first approach is what makes the system tolerable to live with. The first run is loud. The second run is much quieter. By the fifth run, the agent files three or four issues a week, and every one of them is something genuinely new.

4. The Claude Code agent

For each new finding (or batch of related findings), the agent receives a single prompt with the rule output, the evidence rows, and access to two tools: a read-only repository search tool and a write tool for creating GitHub issues via the REST API.

The agent’s job is narrow: take the finding, locate the most likely responsible template or file in the repository, and write a GitHub issue that an engineer can act on without needing to know what a canonical tag is. The system prompt is explicit that the agent must not propose a code fix in the issue body — only describe the symptom, the expected behavior, and the file location. The fix is deliberately left to the engineer or to a follow-up code-writing agent.
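For reference, the write tool can be a thin wrapper over GitHub's "create an issue" endpoint (POST /repos/{owner}/{repo}/issues). The sketch below uses only the standard library; the function names are mine, and the owner, repo, and token values in any call are placeholders.

```python
import json
import urllib.request


def build_issue_request(owner, repo, token, title, body, labels):
    """Build (but do not send) the POST request that creates a GitHub issue."""
    payload = json.dumps({"title": title, "body": body, "labels": labels})
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/issues",
        data=payload.encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


def create_issue(request):
    """Send the request and return the number of the newly created issue."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["number"]
```

Splitting request construction from sending keeps the tool easy to test offline and gives a natural place to log exactly what the agent tried to file.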

The agent prompt that actually worked

I went through about a dozen versions of the system prompt. The version that finally produced consistently file-able issues had four properties.

First, it included a worked example. Not a generic “write a clear issue” instruction, but a real before/after: here is a finding row, here is the issue we would file from it, here is what the issue must not contain. Few-shot examples beat abstract instructions every time in my testing.

Second, it constrained the structure. The issue must have a one-line title, a “What we observed” paragraph that quotes the evidence, a “Where this likely lives” section that names a file or template, and a “Suggested check” section that names a regression test. No introductions, no recap of why SEO matters, no closing pleasantries. Agents over-explain by default; structural constraints fix this.

Third, it required a confidence label. The agent must classify each finding as high, medium, or low confidence and apply a matching GitHub label. Issues labeled low are auto-routed to a triage column rather than the team’s main backlog. This is the single change that made engineering leadership stop complaining about ticket volume.

Fourth, it forbade speculation. If the agent cannot locate a plausible file in the repo with the search tool, it must say so explicitly in the issue body and skip the “Where this likely lives” section. Hallucinated file paths were the most common failure mode in early runs.
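One way to enforce the structural, confidence, and no-speculation constraints is to have the agent return structured fields and render the issue in code. This is a sketch under that assumption; the `IssueDraft` class and the label names `confidence:*` and `triage` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_LEVELS = ("high", "medium", "low")


@dataclass
class IssueDraft:
    title: str
    observed: str                       # paragraph quoting the evidence
    suggested_check: str                # names a regression test
    confidence: str                     # one of CONFIDENCE_LEVELS
    likely_file: Optional[str] = None   # None when repo search found nothing


def render_issue(draft):
    """Validate a draft and render it into title, body, and labels."""
    title = draft.title.strip()
    if not title or "\n" in title:
        raise ValueError("title must be a single non-empty line")
    if draft.confidence not in CONFIDENCE_LEVELS:
        raise ValueError(f"unknown confidence: {draft.confidence!r}")

    sections = [f"What we observed\n{draft.observed}"]
    if draft.likely_file:
        sections.append(f"Where this likely lives\n{draft.likely_file}")
    else:
        # No speculation: say so explicitly and omit the section.
        sections.append("No plausible file was located with the repository search tool.")
    sections.append(f"Suggested check\n{draft.suggested_check}")

    labels = [f"confidence:{draft.confidence}"]
    if draft.confidence == "low":
        labels.append("triage")  # hypothetical label used to route away from the backlog
    return {"title": title, "body": "\n\n".join(sections), "labels": labels}
```

Rendering in code means a draft that breaks the format fails loudly before it reaches GitHub, rather than relying on the prompt alone.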

Hooking it to a schedule

The whole thing runs on a Tuesday/Friday cron. The orchestration is a single shell script: crawl, run rules, diff, invoke the agent, write a digest to a Markdown file in the audit repo. The digest is the only human-facing artifact and it exists mostly so that on Monday morning I can scan a five-line summary and see how the week went.
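The digest step is the easiest part to sketch. Assuming the earlier stages hand over simple counts, a five-line Markdown summary could be rendered like this; the function name and the choice of which counts to show are illustrative.

```python
def render_digest(site_id, run_date, filed, closed, rollups, low_confidence):
    """Render a five-line Markdown digest for one site and one run."""
    lines = [
        f"# Audit digest: {site_id} ({run_date})",
        f"- Issues filed: {filed}",
        f"- Issues auto-closed: {closed}",
        f"- Rollup issues updated: {rollups}",
        f"- Low-confidence issues routed to triage: {low_confidence}",
    ]
    return "\n".join(lines) + "\n"
```

Appending one such block per run to a file in the audit repo also gives a cheap history of how noisy each run was.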

If you already have an n8n instance for SEO work, the same flow maps onto n8n nodes nearly one-for-one. The crawl becomes an Execute Command node, the rule engine becomes a Code node, the diff store moves from SQLite to a Postgres node, and the agent invocation is an HTTP Request node calling a small service that wraps the Claude Agent SDK. The pattern is identical; only the runtime changes. The Python and Slack alerts monitoring workflow is a good companion if you want the same “diff-first” approach for non-SEO signals like uptime and Core Web Vitals.

What the numbers actually looked like

After eight weeks on two production sites (one ecommerce, one content), the agent had created 73 issues. Twelve were high confidence and high impact: things like a misconfigured canonical that started pointing all paginated category pages back to page 1 after a template refactor. Every one was closed within a sprint. Thirty-eight were medium — primarily title tag drift on auto-generated pages and a slow accumulation of 301 chains. About half were closed; the rest were correctly downgraded to “won’t fix” with engineering rationale. Twenty-three were low confidence and never reached the main backlog, which is exactly what should happen.

The single largest behavior change was not the number of issues filed; it was that engineering started commenting on them. An SEO finding written in repo language is a finding a developer can argue with. Half of the conversations on agent-filed issues are engineers pushing back, which is a healthier dynamic than silent ignoring.

Three things I would change if I started over

Version the rule definitions and require any rule change to bump a version number. The differ currently treats a rule update as a new rule, producing a one-time wave of false positives. Versioning lets the differ skip rules whose definitions just changed.
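A sketch of how a version-aware differ might behave, assuming each run records a `{rule_id: version}` mapping alongside its findings. The function names and the mapping are hypothetical; this is the change I would make, not what currently runs.

```python
def stable_rules(previous_versions, current_versions):
    """Rule ids whose version is identical in the previous and current run."""
    return {
        rule_id
        for rule_id, version in current_versions.items()
        if previous_versions.get(rule_id) == version
    }


def new_findings(previous, latest, previous_versions, current_versions):
    """
    previous, latest: sets of (rule_id, url_hash) pairs.
    Only rules with an unchanged version can produce "new" findings; a rule
    whose definition changed gets one run to establish a fresh baseline.
    """
    stable = stable_rules(previous_versions, current_versions)
    return {(rule, h) for rule, h in latest - previous if rule in stable}
```

Note that under this sketch a brand-new rule is also skipped on its first run, since it has no previous version to match, which gives it a quiet baseline run as well.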

Feed the previous week’s closed issues back into the agent as negative examples. Embedding closed issues and running a similarity check before filing would cut duplicate ticket noise further.
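To show the shape of that pre-filing check without committing to an embedding provider, the sketch below swaps in the standard library's `difflib.SequenceMatcher` string similarity as a stand-in for embedding similarity. The 0.85 threshold and the function names are assumptions; a real version would compare embedding vectors instead.

```python
from difflib import SequenceMatcher


def most_similar(candidate_title, closed_titles):
    """Return (score, title) for the closed issue most similar to the candidate."""
    best_score, best_title = 0.0, None
    for title in closed_titles:
        # Character-level similarity in [0, 1]; a stand-in for cosine
        # similarity between embeddings.
        score = SequenceMatcher(None, candidate_title.lower(), title.lower()).ratio()
        if score > best_score:
            best_score, best_title = score, title
    return best_score, best_title


def is_probable_duplicate(candidate_title, closed_titles, threshold=0.85):
    """True when some closed issue is at least `threshold` similar."""
    score, _ = most_similar(candidate_title, closed_titles)
    return score >= threshold
```

Whatever similarity measure is used, the gate belongs before the issue-creation call so a probable duplicate can be linked to the closed issue rather than filed again.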

Give the agent read access to git log around the last deploy. Most “new” SEO regressions correlate with a specific deploy, and surfacing the commit range in the issue would save the engineer fifteen minutes of bisecting.
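A minimal sketch of that read-only lookup, assuming the last deploy is identifiable by a git ref such as a tag. The function names are hypothetical; the `git log` options used (`--oneline`, `--no-merges`, a `A..B` revision range, and `--` before paths) are standard.

```python
import subprocess


def git_log_command(since_ref, until_ref="HEAD", paths=()):
    """Build a read-only git log command for the commits after since_ref."""
    cmd = ["git", "log", "--oneline", "--no-merges", f"{since_ref}..{until_ref}"]
    if paths:
        # Restrict the log to the files the finding most likely involves.
        cmd += ["--", *paths]
    return cmd


def commits_between(repo_dir, since_ref, until_ref="HEAD", paths=()):
    """Run git log in repo_dir and return one 'hash subject' string per commit."""
    result = subprocess.run(
        git_log_command(since_ref, until_ref, paths),
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

Passing the suspected template path narrows the range to commits that touched it, which is usually the list an engineer wants to see first.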

Where this pattern fits

Agentic SEO is not about replacing the SEO practitioner. It is about removing the manual hand-off work between “we found a problem” and “engineering knows about a problem in their language.” If your team already has a healthy auditing rhythm and the bottleneck is execution, this pattern earns its keep within a month. If you don’t have an audit rhythm at all, start with the rules engine and the diff store; the agent layer is the last piece, not the first.

For a deeper look at the SEO automation stack this fits into, the 2026 practitioner stack covers the rest of the surface area.

If you found this useful, bookmark SEO Automation Club — we publish working playbooks like this twice a day and a longer Friday case study every week. Subscribe to the newsletter at the top of the page to get the next one in your inbox.

FAQs

Do I need Claude Code specifically, or will a different agent framework work?

The pattern is framework-agnostic. Any agent runtime that can call tools (read the repo, write a GitHub issue) and that respects a structured output format will work. Claude Code is convenient because the tool-calling surface is easy to wire up to a local repo, but I have prototyped the same flow with the OpenAI Assistants API and a self-hosted CrewAI setup. The tradeoffs are in tooling, not in the architecture.

How much does a run actually cost?

On the two sites I run, the crawl is the most expensive line item (a few dollars per run via Bright Data on the larger ecommerce site). The agent invocations cost cents per finding because the rule pass already does the heavy lifting and the agent only sees small evidence rows. Total weekly cost across both sites is under fifteen dollars at current pricing.

Won’t engineers complain about an AI filing tickets?

In my experience they complain for the first two weeks and then ask you to expand the scope. The trick is to label every agent-filed issue clearly (I prefix titles with [SEO-Auditor]) and to route low-confidence issues away from the main backlog. Once engineering trusts that the queue is filtered, the conversation shifts from “stop filing these” to “can you also catch X?”

Does this replace tools like Screaming Frog or Sitebulb?

No. The agentic auditor is for known, deterministic regressions on production sites between deploys. Screaming Frog and Sitebulb are still the right tools for an open-ended exploratory crawl, a migration audit, or any situation where you don’t yet know what you’re looking for. The two workflows complement each other rather than overlap.
