AI crawler and agent access

Know exactly which AI agents can access your domain, and optimize for them.

Agent Readiness audits robots.txt, llms.txt, sitemap, structured data, and protocol support, checks robots.txt rules for each of the 14 tracked AI crawlers, and assesses agentic-commerce readiness.

What we probe

Every agent readiness signal, explained.

agent

robots.txt deep analysis

Checks every AI crawler by name (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot, +9 more), extracts Allow/Disallow directives and Crawl-delay values, and flags disallow-all wildcard policies.
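A per-crawler check of this kind can be sketched with Python's standard urllib.robotparser module. This is only an illustration, not the audit's actual implementation: the audit_robots function, the shortened AI_CRAWLERS list, and the SAMPLE file are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Subset of the crawler names listed on this page (illustrative only).
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot"]

def audit_robots(robots_txt: str, path: str = "/") -> dict:
    """Return {crawler: {"allowed": bool, "crawl_delay": int or None}}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    report = {}
    for agent in AI_CRAWLERS:
        report[agent] = {
            "allowed": parser.can_fetch(agent, path),
            "crawl_delay": parser.crawl_delay(agent),
        }
    return report

# Hypothetical robots.txt used only to exercise the function.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Crawl-delay: 5
Allow: /

User-agent: *
Disallow: /private/
"""

report = audit_robots(SAMPLE)
for agent, result in report.items():
    print(agent, result)
```

Note that urllib.robotparser applies the first matching rule in a group, which can differ from the longest-match behavior some crawlers implement, so treat the result as an approximation.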

agent

llms.txt: the AI-readable web spec

Checks whether the domain serves llms.txt, measures content length, and scores how well it provides brand facts, priority pages, product categories, API docs, and citation guidance for AI systems.
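As a rough sketch of what such a check might look at, the snippet below inspects an llms.txt body for the elements the llms.txt proposal describes: an H1 title, a blockquote summary, H2 sections, and Markdown links. The inspect_llms_txt function and the sample file are hypothetical, and the real scoring rubric is not shown here.

```python
import re

def inspect_llms_txt(text: str) -> dict:
    """Collect basic structural facts about an llms.txt body."""
    lines = text.splitlines()
    # First "# " line is treated as the site or brand title.
    title = next((l[2:].strip() for l in lines if l.startswith("# ")), None)
    # First blockquote line is treated as the short summary.
    summary = next((l[1:].strip() for l in lines if l.startswith(">")), None)
    sections = [l[3:].strip() for l in lines if l.startswith("## ")]
    links = re.findall(r"\[([^\]]+)\]\(([^)\s]+)\)", text)
    return {
        "length": len(text),
        "title": title,
        "summary": summary,
        "sections": sections,
        "link_count": len(links),
    }

# Hypothetical llms.txt content for illustration.
SAMPLE = """\
# Example Store

> Example Store sells refurbished laptops.

## Docs

- [API reference](https://example.com/docs/api): REST endpoints
- [Pricing](https://example.com/pricing)

## Optional

- [Blog](https://example.com/blog)
"""

info = inspect_llms_txt(SAMPLE)
print(info)
```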

agent

ai.txt policy file

Fetches and inventories the ai.txt policy declaration, then scores it against emerging best practices for AI-first publishing.

agent

sitemap.xml health

Checks sitemap presence, parses URL count, validates XML, and uses sitemap URLs to prioritize the deep crawl, the same discovery path many AI retrieval systems follow.
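The validation and URL-count steps could look roughly like the following, using only the standard library. The inspect_sitemap name and the sample document are assumptions for the example; fetching the file and following sitemap index entries are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def inspect_sitemap(xml_text: str) -> dict:
    """Validate sitemap XML and list the <loc> URLs it contains."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return {"valid": False, "error": str(exc), "kind": None, "urls": []}
    # Root is "urlset" for a plain sitemap, "sitemapindex" for an index file.
    kind = root.tag.removeprefix(SITEMAP_NS)
    urls = [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
    return {"valid": True, "error": None, "kind": kind, "urls": urls}

# Hypothetical two-URL sitemap.
SAMPLE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/pricing</loc></url>"
    "</urlset>"
)

result = inspect_sitemap(SAMPLE)
print(result["kind"], len(result["urls"]))
```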

visibility

Structured data for agent parsing

Validates all JSON-LD blocks, extracts schema.org types, and checks for Organization, WebSite, Product, FAQPage, Article, and BreadcrumbList, the types most relevant to AI knowledge-graph indexing.
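One way to approximate this check is to pull every application/ld+json script block out of the HTML, parse it as JSON, and collect the @type values. The sketch below assumes this approach; the class and function names and the sample page are hypothetical, and it tests only JSON validity, not full schema.org conformance.

```python
import json
from html.parser import HTMLParser

EXPECTED_TYPES = {"Organization", "WebSite", "Product", "FAQPage", "Article", "BreadcrumbList"}

class JsonLdCollector(HTMLParser):
    """Collects the raw text of each <script type="application/ld+json"> block."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False

def collect_types(node, found):
    """Walk nested JSON-LD (including @graph) and record every @type string."""
    if isinstance(node, dict):
        value = node.get("@type")
        if isinstance(value, str):
            found.add(value)
        elif isinstance(value, list):
            found.update(v for v in value if isinstance(v, str))
        for child in node.values():
            collect_types(child, found)
    elif isinstance(node, list):
        for item in node:
            collect_types(item, found)

def audit_jsonld(html: str) -> dict:
    collector = JsonLdCollector()
    collector.feed(html)
    found, invalid = set(), 0
    for raw in collector.blocks:
        try:
            collect_types(json.loads(raw), found)
        except json.JSONDecodeError:
            invalid += 1
    return {
        "blocks": len(collector.blocks),
        "invalid": invalid,
        "found": sorted(found),
        "missing": sorted(EXPECTED_TYPES - found),
    }

# Hypothetical page with two valid blocks and one broken block.
SAMPLE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Example"}
</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@graph": [{"@type": "WebSite"}, {"@type": ["Product", "Thing"]}]}
</script>
<script type="application/ld+json">{not valid json}</script>
</head></html>"""

audit = audit_jsonld(SAMPLE)
print(audit)
```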

agent

API and product readiness signals

Detects developer docs, pricing pages, and API reference signals. When these are missing, they become key friction points for AI agents doing autonomous product research or commerce.

performance

HTTP/2 and HTTP/3 protocol support

Detects ALPN-negotiated HTTP/2 and HTTP/3 Alt-Svc advertisements. Crawlers and AI agents that support these protocols can use them for faster context retrieval.
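The two probes differ because HTTP/2 is negotiated during the TLS handshake, while HTTP/3 runs over QUIC and is usually discovered through the Alt-Svc response header. A minimal sketch of both, assuming hypothetical helper names negotiated_alpn and advertises_http3, might be:

```python
import socket
import ssl

def negotiated_alpn(host: str, port: int = 443, timeout: float = 5.0):
    """Open a TLS connection and return the ALPN protocol the server selects.

    Returns "h2" when HTTP/2 is negotiated, "http/1.1" otherwise, or None
    if the server does not use ALPN. Requires network access.
    """
    context = ssl.create_default_context()
    context.set_alpn_protocols(["h2", "http/1.1"])
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()

def advertises_http3(alt_svc) -> bool:
    """True if an Alt-Svc header value lists h3 or a draft such as h3-29."""
    if not alt_svc or alt_svc.strip().lower() == "clear":
        return False
    for entry in alt_svc.split(","):
        # Each entry looks like: h3=":443"; ma=86400
        protocol_id = entry.split("=", 1)[0].strip().lower()
        if protocol_id == "h3" or protocol_id.startswith("h3-"):
            return True
    return False

print(advertises_http3('h3=":443"; ma=86400, h2=":443"; ma=86400'))
```

The header parser splits on commas and does not handle commas inside quoted values, which is enough for typical Alt-Svc headers but not a full parser.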

Tracked crawlers

14 AI crawlers checked individually in every robots.txt audit.

We check each by name, not just User-agent: *, and report allow, block, and no-directive as separate outcomes.

Crawler	Owner	Primary purpose
GPTBot	OpenAI	Training
ChatGPT-User	OpenAI	Real-time browsing
OAI-SearchBot	OpenAI	Search index
ClaudeBot	Anthropic	Training
Claude-SearchBot	Anthropic	Search index
PerplexityBot	Perplexity	Answer retrieval
Google-Extended	Google	Gemini training
GoogleOther	Google	AI products
CCBot	Common Crawl	Open dataset
Amazonbot	Amazon	Alexa + AI
Applebot-Extended	Apple	Apple Intelligence
Diffbot	Diffbot	Enterprise AI data
Bytespider	ByteDance	Training
YouBot	You.com	Answer retrieval
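To make the allow / block / no-directive distinction concrete, here is a simplified sketch of a three-state classifier. It is an assumption-laden illustration: the classify function is hypothetical, it matches crawler names exactly, and it treats "block" as a site-wide Disallow: / with no Allow: / in the same named group.

```python
def classify(robots_txt: str, crawler: str) -> str:
    """Return "block", "allow", or "no-directive" for one named crawler."""
    groups = []                  # list of (agent names, rules) pairs
    agents, rules = [], []
    in_rules = False             # True once a group has seen a non-agent line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:         # a new User-agent line starts a new group
                groups.append((agents, rules))
                agents, rules, in_rules = [], [], False
            agents.append(value.lower())
        elif agents:
            in_rules = True
            if field in ("allow", "disallow"):
                rules.append((field, value))
    if agents:
        groups.append((agents, rules))

    # Only groups that name the crawler count; "*" is deliberately ignored.
    named = [group_rules for names, group_rules in groups if crawler.lower() in names]
    if not named:
        return "no-directive"
    merged = [rule for group_rules in named for rule in group_rules]
    blocks_root = ("disallow", "/") in merged
    allows_root = ("allow", "/") in merged
    return "block" if blocks_root and not allows_root else "allow"

# Hypothetical robots.txt for illustration.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Disallow: /
"""

for name in ("GPTBot", "ChatGPT-User", "ClaudeBot"):
    print(name, classify(SAMPLE, name))
```

In this sample ClaudeBot is reported as no-directive even though the wildcard group would still block it, which is exactly why the two cases are worth reporting separately.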

Why it matters

Agent readiness is the new SEO.

AI search visibility starts here

ChatGPT, Claude, Perplexity, and Google AI Overviews all rely on crawlers to build knowledge about your domain. Blocking or ignoring those crawlers can remove you from AI-generated answers.

Agentic commerce requires machine-readable structure

As AI agents make autonomous purchase, research, and booking decisions, domains without llms.txt, clean schema, and clear pricing risk being skipped in favor of machine-readable competitors.

Fine-grained control over training vs. retrieval

You can allow retrieval bots (ChatGPT-User, Claude-SearchBot) while blocking training crawlers (GPTBot, ClaudeBot), but only if your robots.txt names them explicitly. A wildcard Disallow: / blocks both unless a named group overrides it.
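As an illustration of an explicit policy, the sketch below generates a robots.txt with one named group per crawler and then verifies the result with Python's standard urllib.robotparser. The build_robots_txt helper and the two bot lists are assumptions for the example; adapt the lists to your own policy.

```python
from urllib.robotparser import RobotFileParser

# Example split taken from the paragraph above.
RETRIEVAL_BOTS = ["ChatGPT-User", "Claude-SearchBot"]
TRAINING_BOTS = ["GPTBot", "ClaudeBot"]

def build_robots_txt(allow, block) -> str:
    """Emit one explicit group per crawler: Allow: / or Disallow: /."""
    groups = []
    for agent in allow:
        groups.append(f"User-agent: {agent}\nAllow: /\n")
    for agent in block:
        groups.append(f"User-agent: {agent}\nDisallow: /\n")
    return "\n".join(groups)

robots_txt = build_robots_txt(RETRIEVAL_BOTS, TRAINING_BOTS)
print(robots_txt)

# Verify the generated file behaves as intended.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
decisions = {
    agent: parser.can_fetch(agent, "/")
    for agent in RETRIEVAL_BOTS + TRAINING_BOTS
}
print(decisions)
```

robots.txt is advisory, so this controls only crawlers that choose to honor it.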
