AI crawler and agent access
Know exactly which AI agents can access your domain — and optimize for them.
Agent Readiness audits robots.txt rules for all 14 tracked AI crawlers, along with llms.txt, ai.txt, sitemap, structured data, protocol support, and agentic-commerce readiness.
Check a domain
What we probe
Every agent readiness signal, explained.
robots.txt deep analysis
Checks every AI crawler by name (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot, +9 more), extracts allow/disallow directives, crawl-delay, and flags disallow-all wildcard policies.
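As a rough illustration of this kind of analysis, the Python sketch below groups robots.txt rules by user agent, records crawl-delay, and flags a disallow-all wildcard policy. The function names `parse_robots` and `has_wildcard_disallow_all` are hypothetical, and the parser is deliberately simplified (no path wildcards, no longest-match resolution), so treat it as a sketch rather than the audit's actual logic.

```python
from collections import defaultdict


def parse_robots(text):
    """Group Allow/Disallow/Crawl-delay rules by lowercased user agent."""
    groups = defaultdict(lambda: {"allow": [], "disallow": [], "crawl_delay": None})
    current_agents = []     # agents that the rules being read apply to
    reading_agents = False  # True while consecutive User-agent lines accumulate
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current_agents = []  # a new group starts here
                reading_agents = True
            current_agents.append(value.lower())
            groups[value.lower()]    # register the agent even if it has no rules
        elif field in ("allow", "disallow", "crawl-delay"):
            reading_agents = False
            for agent in current_agents:
                if field == "crawl-delay":
                    try:
                        groups[agent]["crawl_delay"] = float(value)
                    except ValueError:
                        pass         # ignore malformed delays
                elif value:          # an empty Disallow restricts nothing
                    groups[agent][field].append(value)
    return dict(groups)


def has_wildcard_disallow_all(groups):
    """True when the * group disallows / and grants no Allow exceptions."""
    star = groups.get("*")
    return bool(star) and "/" in star["disallow"] and not star["allow"]
```

Looking up `parse_robots(text).get("gptbot")` then shows whether a crawler has its own group at all, which is what separates an explicit rule from a wildcard fallback.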
llms.txt — the AI-readable web spec
Checks whether the domain serves llms.txt, measures content length, and scores how well it provides brand facts, priority pages, product categories, API docs, and citation guidance for AI systems.
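A minimal sketch of a structural check, assuming the file follows the Markdown layout described in the llms.txt proposal (an H1 title, a blockquote summary, and H2 sections containing links). The function name `inspect_llms_txt` and the returned fields are hypothetical; scoring brand facts or citation guidance would need content-level rules that are not shown here.

```python
import re

# Matches Markdown links of the form [label](http(s)://url)
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def inspect_llms_txt(text):
    """Collect basic structural signals from the body of an llms.txt file."""
    lines = [line.rstrip() for line in text.splitlines()]
    return {
        "length": len(text),
        "has_title": any(line.startswith("# ") for line in lines),
        "has_summary": any(line.startswith("> ") for line in lines),
        "sections": [line[3:].strip() for line in lines if line.startswith("## ")],
        "link_count": len(LINK_PATTERN.findall(text)),
    }
```

A file with a title, a summary, and a few well-labeled link sections gives a retrieval system an obvious place to start, which is the property this kind of check tries to capture.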
ai.txt policy file
Fetches the ai.txt policy file, inventories its declarations, and scores it against emerging conventions for AI-first publishing.
sitemap.xml health
Checks sitemap presence, parses URL count, validates XML, and uses sitemap URLs to prioritize the deep crawl, the same discovery path many AI retrieval systems follow.
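The sitemap checks can be sketched with the Python standard library. This assumes the sitemap uses the standard sitemaps.org namespace; `summarize_sitemap` is a hypothetical helper name, and fetching the file over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def summarize_sitemap(xml_text):
    """Validate sitemap XML and list the URLs it declares."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return {"valid": False, "kind": None, "urls": []}
    kind = root.tag.replace(SITEMAP_NS, "")  # "urlset" or "sitemapindex"
    urls = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    return {"valid": kind in ("urlset", "sitemapindex"), "kind": kind, "urls": urls}
```

The resulting `urls` list is what a crawler would use to decide which pages to visit first; a `sitemapindex` result means each listed URL is itself a sitemap to fetch.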
Structured data for agent parsing
Validates all JSON-LD blocks, extracts schema.org types, and checks for Organization, WebSite, Product, FAQPage, Article, and BreadcrumbList, the schema.org types most relevant to AI knowledge-graph indexing.
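One way to approximate this check is to pull every `application/ld+json` script out of the HTML, parse it, and collect the `@type` values. The sketch below uses only the standard library; `JsonLdCollector`, `schema_types`, and `missing_types` are hypothetical names, and "validation" here means only that the block parses as JSON.

```python
import json
from html.parser import HTMLParser

EXPECTED_TYPES = {"Organization", "WebSite", "Product", "FAQPage", "Article", "BreadcrumbList"}


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def schema_types(html):
    """Return (set of @type values, count of blocks that failed to parse)."""
    collector = JsonLdCollector()
    collector.feed(html)
    types, invalid = set(), 0
    for raw in collector.blocks:
        try:
            document = json.loads(raw)
        except json.JSONDecodeError:
            invalid += 1
            continue
        stack = [document]  # walk nested objects, including @graph arrays
        while stack:
            node = stack.pop()
            if isinstance(node, list):
                stack.extend(node)
            elif isinstance(node, dict):
                declared = node.get("@type")
                if isinstance(declared, str):
                    types.add(declared)
                elif isinstance(declared, list):
                    types.update(t for t in declared if isinstance(t, str))
                stack.extend(node.values())
    return types, invalid


def missing_types(types):
    """Expected schema.org types that the page does not declare."""
    return EXPECTED_TYPES - types
```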
API and product readiness signals
Detects developer docs, pricing pages, and API reference signals. Their absence is a key friction point for AI agents doing autonomous product research or commerce.
HTTP/2 and HTTP/3 protocol support
Detects HTTP/2 negotiated via ALPN and HTTP/3 advertised in the Alt-Svc response header. Newer crawlers and AI agents can use these protocols for faster context retrieval.
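Both signals can be probed with the Python standard library. In this sketch, `negotiated_alpn` opens a TLS connection and reports the protocol the server selected, while `advertises_h3` inspects an Alt-Svc header value that has already been fetched. Both function names are hypothetical, and the Alt-Svc parsing is simplified (it splits on commas and does not handle commas inside quoted values).

```python
import socket
import ssl


def negotiated_alpn(host, port=443, timeout=5.0):
    """Return the ALPN protocol the server selects: "h2", "http/1.1", or None."""
    context = ssl.create_default_context()
    context.set_alpn_protocols(["h2", "http/1.1"])  # offer HTTP/2 first
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()


def advertises_h3(alt_svc_header):
    """True if an Alt-Svc header value lists an HTTP/3 alternative."""
    if not alt_svc_header or alt_svc_header.strip().lower() == "clear":
        return False
    for entry in alt_svc_header.split(","):
        protocol = entry.split("=", 1)[0].strip().lower()
        if protocol == "h3" or protocol.startswith("h3-"):  # h3-29 and other drafts
            return True
    return False
```

HTTP/3 runs over QUIC, so it cannot be confirmed from a TCP handshake; the Alt-Svc advertisement on an HTTP/1.1 or HTTP/2 response is the usual way a client learns that it is available.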
Tracked crawlers
14 AI crawlers checked individually in every robots.txt audit.
We check each crawler by name, not just User-agent: *, and report allow, block, and no-directive as separate outcomes.
| Crawler | Owner | Primary purpose |
|---|---|---|
| GPTBot | OpenAI | Training |
| ChatGPT-User | OpenAI | Real-time browsing |
| OAI-SearchBot | OpenAI | Search index |
| ClaudeBot | Anthropic | Training |
| Claude-SearchBot | Anthropic | Search index |
| PerplexityBot | Perplexity | Answer retrieval |
| Google-Extended | Google | Gemini training |
| GoogleOther | Google | AI products |
| CCBot | Common Crawl | Open dataset |
| Amazonbot | Amazon | Alexa + AI |
| Applebot-Extended | Apple | Apple Intelligence |
| Diffbot | Diffbot | Enterprise AI data |
| Bytespider | ByteDance | Training |
| YouBot | You.com | Answer retrieval |
Why it matters
Agent readiness is the new SEO.
AI search visibility starts here
ChatGPT, Claude, Perplexity, and Google AI Overviews all rely on crawlers to build knowledge about your domain. Blocking those crawlers, or ignoring them, can remove you from AI-generated answers.
Agentic commerce requires machine-readable structure
As AI agents make autonomous purchase, research, and booking decisions, domains without llms.txt, clean schema, and clear pricing risk being skipped in favor of machine-readable competitors.
Fine-grained control over training vs. retrieval
You can allow retrieval bots (ChatGPT-User, Claude-SearchBot) while blocking training crawlers (GPTBot, ClaudeBot), but only if your robots.txt names them explicitly. A wildcard Disallow: / applies to every crawler that lacks its own User-agent group, so on its own it blocks both.
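To make the difference concrete, the sketch below compares an explicit policy with a wildcard-only one using Python's built-in `urllib.robotparser`. The two robots.txt bodies, the `access_map` helper, and the example.com URL are illustrative assumptions, not a recommended policy for any particular site.

```python
from urllib.robotparser import RobotFileParser

# Explicit policy: training crawlers blocked, retrieval bots allowed by name.
EXPLICIT_POLICY = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: ChatGPT-User
User-agent: Claude-SearchBot
Allow: /

User-agent: *
Disallow: /admin/
"""

# Wildcard-only policy: no crawler has its own group.
WILDCARD_ONLY = """\
User-agent: *
Disallow: /
"""


def access_map(robots_text, agents, url="https://example.com/pricing"):
    """Report, per user agent, whether the policy permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "ClaudeBot", "ChatGPT-User", "Claude-SearchBot"]
    print(access_map(EXPLICIT_POLICY, agents))
    print(access_map(WILDCARD_ONLY, agents))
```

Under the explicit policy the two retrieval bots are permitted and the two training crawlers are not; under the wildcard-only policy all four are refused, because none of them has a group of its own to override the `*` rule.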