AI Clicks · Scan · Detect · Secure

AI crawler and agent access

Know exactly which AI agents can access your domain — and optimize for them.

Agent Readiness audits robots.txt rules for each of the 14 tracked AI crawlers, along with llms.txt, sitemap, structured data, protocol support, and agentic-commerce readiness.


What we probe

Every agent readiness signal, explained.

agent

robots.txt deep analysis

Checks each tracked AI crawler by name (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot, and 9 more), extracts its allow/disallow directives and crawl-delay, and flags wildcard disallow-all policies.
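The per-crawler check can be sketched roughly as below. This is a minimal illustration, not the product's implementation: the function names (`parse_groups`, `classify`, `audit`) are hypothetical, crawler names are matched exactly and case-insensitively, and the path-wildcard and longest-match precedence rules of the Robots Exclusion Protocol are deliberately ignored. "block" here is a simplifying assumption meaning a site-wide `Disallow: /` with no `Allow` override.

```python
from typing import Dict, Iterable, List, Optional

# Illustrative subset of the tracked crawler tokens.
CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot")


def parse_groups(robots_txt: str) -> List[dict]:
    """Split robots.txt into groups of user-agents that share one rule set."""
    groups: List[dict] = []
    current: Optional[dict] = None
    last_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # Consecutive User-agent lines share the rules that follow them.
            if current is None or not last_was_agent:
                current = {"agents": [], "allow": [], "disallow": [], "crawl_delay": None}
                groups.append(current)
            current["agents"].append(value.lower())
            last_was_agent = True
            continue
        last_was_agent = False
        if current is None:
            continue  # e.g. a Sitemap line before any group
        if key in ("allow", "disallow"):
            current[key].append(value)
        elif key == "crawl-delay":
            try:
                current["crawl_delay"] = float(value)
            except ValueError:
                pass  # ignore malformed delays
    return groups


def classify(groups: List[dict], crawler: str) -> dict:
    """Return allow / block / no-directive for one crawler, by name only."""
    named = [g for g in groups if crawler.lower() in g["agents"]]
    if not named:
        return {"status": "no-directive", "crawl_delay": None}
    allow = [p for g in named for p in g["allow"] if p]
    disallow = [p for g in named for p in g["disallow"] if p]  # empty Disallow = allow all
    delays = [g["crawl_delay"] for g in named if g["crawl_delay"] is not None]
    # Simplification: "block" means Disallow: / with no Allow override.
    status = "block" if "/" in disallow and not allow else "allow"
    return {"status": status, "crawl_delay": delays[0] if delays else None}


def wildcard_disallows_all(groups: List[dict]) -> bool:
    """Flag a User-agent: * group that disallows the whole site."""
    return any("*" in g["agents"] and "/" in g["disallow"] for g in groups)


def audit(robots_txt: str, crawlers: Iterable[str] = CRAWLERS) -> Dict[str, object]:
    groups = parse_groups(robots_txt)
    return {
        "crawlers": {name: classify(groups, name) for name in crawlers},
        "wildcard_disallow_all": wildcard_disallows_all(groups),
    }
```

A crawler with no named group is reported as "no-directive" rather than inheriting the wildcard result, which keeps the three outcomes distinct; the wildcard policy is flagged separately.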

agent

llms.txt — the AI-readable web spec

Checks whether the domain serves llms.txt, measures content length, and scores how well it provides brand facts, priority pages, product categories, API docs, and citation guidance for AI systems.

agent

ai.txt policy file

Fetches and inventories the ai.txt policy declaration, then scores its presence against emerging best practices for AI-first publishing.

agent

sitemap.xml health

Checks sitemap presence, parses the URL count, validates the XML, and uses sitemap URLs to prioritize the deep crawl, the same discovery path AI retrieval systems use.
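A minimal sketch of the parse-and-count step, assuming the standard sitemaps.org namespace. "Valid" here only means well-formed XML; schema validation is not attempted. The function name `parse_sitemap` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text: str) -> Dict[str, object]:
    """Check well-formedness, report the sitemap kind, and count its <loc> entries."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return {"valid": False, "error": str(exc)}
    # "urlset" lists pages; "sitemapindex" lists child sitemaps to fetch next.
    kind = root.tag[len(NS):] if root.tag.startswith(NS) else root.tag
    locs = [el.text.strip() for el in root.iter(NS + "loc") if el.text]
    return {"valid": True, "kind": kind, "url_count": len(locs), "urls": locs}
```

A crawler could then seed its queue from `urls`, following `sitemapindex` entries recursively before crawling pages.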

visibility

Structured data for agent parsing

Validates all JSON-LD blocks, extracts schema.org types, and checks for Organization, WebSite, Product, FAQPage, Article, BreadcrumbList — the top types for AI knowledge-graph indexing.
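One way to sketch this check with only the standard library: pull out every `<script type="application/ld+json">` block, parse it as JSON, and walk the result (including `@graph` and nested objects) for `@type` values. This is an illustrative assumption about the approach; "invalid" here means JSON syntax errors only, not schema.org conformance, and the names `JsonLdExtractor` and `audit_jsonld` are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import Dict, List, Set

KEY_TYPES = {"Organization", "WebSite", "Product", "FAQPage", "Article", "BreadcrumbList"}


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._buffer: List[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.blocks.append("".join(self._buffer))
            self._inside = False


def collect_types(node, found: Set[str]) -> None:
    """Recursively gather @type values, covering @graph arrays and nested objects."""
    if isinstance(node, list):
        for item in node:
            collect_types(item, found)
    elif isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, str):
            found.add(declared)
        elif isinstance(declared, list):
            found.update(t for t in declared if isinstance(t, str))
        for value in node.values():
            collect_types(value, found)


def audit_jsonld(html: str) -> Dict[str, object]:
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    found: Set[str] = set()
    invalid = 0
    for block in extractor.blocks:
        try:
            collect_types(json.loads(block), found)
        except json.JSONDecodeError:
            invalid += 1  # block is not valid JSON
    return {
        "blocks": len(extractor.blocks),
        "invalid_blocks": invalid,
        "types": sorted(found),
        "missing_key_types": sorted(KEY_TYPES - found),
    }
```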

agent

API and product readiness signals

Detects developer docs, pricing pages, and API reference signals — key friction points for AI agents doing autonomous product research or commerce.
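The detection rules are not specified here, so the sketch below assumes the simplest plausible heuristic: matching URL path segments and the first hostname label (for example `docs.example.com`) against keyword lists. The keyword lists and the `detect_signals` name are hypothetical.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlparse

# Hypothetical keyword lists; a real detector would likely also inspect page content.
SIGNAL_KEYWORDS = {
    "developer_docs": ("docs", "developers", "developer", "documentation"),
    "api_reference": ("api", "reference", "openapi", "swagger"),
    "pricing": ("pricing", "plans", "price"),
}


def detect_signals(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Group URLs by which readiness signal their host label or path suggests."""
    found: Dict[str, List[str]] = {name: [] for name in SIGNAL_KEYWORDS}
    for url in urls:
        parsed = urlparse(url)
        host_label = (parsed.hostname or "").split(".")[0]  # e.g. "docs" in docs.example.com
        tokens = [host_label] + [s.lower() for s in parsed.path.split("/") if s]
        for name, keywords in SIGNAL_KEYWORDS.items():
            if any(token in keywords for token in tokens):
                found[name].append(url)
    return found
```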

performance

HTTP/2 and HTTP/3 protocol support

Detects HTTP/2 support through ALPN negotiation and HTTP/3 support through Alt-Svc advertisements. Next-gen crawlers and AI agents can use these protocols for faster context retrieval.
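The two probes can be sketched with the Python standard library: offer `h2` during the TLS handshake and see what the server selects, then look for an `h3` token in the `Alt-Svc` response header. The function names are hypothetical, and fetching the `Alt-Svc` header itself (any HTTP client) is left out.

```python
import socket
import ssl
from typing import Optional


def negotiated_alpn(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Return the ALPN protocol the server selects ("h2" means HTTP/2), or None."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()


def advertises_http3(alt_svc: Optional[str]) -> bool:
    """True if an Alt-Svc header value lists h3 (or a draft token such as h3-29)."""
    if not alt_svc or alt_svc.strip().lower() == "clear":
        return False
    for entry in alt_svc.split(","):
        protocol_id = entry.split("=", 1)[0].strip().lower()
        if protocol_id == "h3" or protocol_id.startswith("h3-"):
            return True
    return False
```

Note that Alt-Svc only advertises HTTP/3; confirming it would require an actual QUIC connection, which the standard library does not provide.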

Tracked crawlers

14 AI crawlers checked individually in every robots.txt audit.

We check each crawler by name, not just the User-agent: * group, and report allow, block, and no directive as distinct outcomes.

Crawler | Owner | Primary purpose
--- | --- | ---
GPTBot | OpenAI | Training
ChatGPT-User | OpenAI | Real-time browsing
OAI-SearchBot | OpenAI | Search index
ClaudeBot | Anthropic | Training
Claude-SearchBot | Anthropic | Search index
PerplexityBot | Perplexity | Answer retrieval
Google-Extended | Google | Gemini training
GoogleOther | Google | AI products
CCBot | Common Crawl | Open dataset
Amazonbot | Amazon | Alexa + AI
Applebot-Extended | Apple | Apple Intelligence
Diffbot | Diffbot | Enterprise AI data
Bytespider | ByteDance | Training
YouBot | You.com | Answer retrieval

Why it matters

Agent readiness is the new SEO.

AI search visibility starts here

ChatGPT, Claude, Perplexity, and Google AI Overviews all rely on crawlers to build knowledge about your domain. Blocking or ignoring those crawlers can remove you from AI-generated answers.

Agentic commerce requires machine-readable structure

As AI agents make autonomous purchase, research, and booking decisions, domains without llms.txt, clean schema, and clear pricing risk being skipped in favor of more machine-readable competitors.

Fine-grained control over training vs. retrieval

You can allow retrieval bots (ChatGPT-User, Claude-SearchBot) while blocking training crawlers (GPTBot, ClaudeBot), but only if your robots.txt names each of them explicitly. A wildcard disallow (User-agent: * with Disallow: /) blocks both.
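To illustrate the difference, the sketch below feeds two example robots.txt policies to Python's standard `urllib.robotparser`: one with explicit per-crawler groups, and one with only a wildcard disallow. The policy text, the `build_parser` helper, and the example.com URL are illustrative assumptions, not a recommended configuration.

```python
from urllib.robotparser import RobotFileParser

# Explicit policy: block training crawlers, allow retrieval bots.
EXPLICIT_POLICY = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-SearchBot
Allow: /
"""

# Wildcard-only policy: every crawler falls through to the * group.
WILDCARD_POLICY = "User-agent: *\nDisallow: /\n"


def build_parser(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


explicit = build_parser(EXPLICIT_POLICY)
wildcard = build_parser(WILDCARD_POLICY)
for agent in ("GPTBot", "ClaudeBot", "ChatGPT-User", "Claude-SearchBot"):
    print(
        agent,
        "explicit:", explicit.can_fetch(agent, "https://example.com/products"),
        "wildcard:", wildcard.can_fetch(agent, "https://example.com/products"),
    )
```

With the explicit policy, GPTBot and ClaudeBot are refused while ChatGPT-User and Claude-SearchBot are allowed; with the wildcard-only policy, all four are refused. Python's parser matches user-agent names by substring, so its results can differ slightly from a given crawler's own parser.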