AI Search Readiness Audit: The 14 Signals That Decide If ChatGPT, Claude, Perplexity, and Google AI Cite You
AI search engines don't rank your site; they decide whether to cite it inside a synthesized answer. Fourteen specific signals separate cited sites from invisible ones, and our free 25-second audit checks every one.
AI search readiness is measured by 14 signals in five groups: 3 for crawler access (robots.txt rules for GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, and CCBot), 1 for llms.txt presence and quality, 4 for structured data (Organization, Person, FAQPage, SpeakableSpecification), 3 for content structure (visible founder bio, FAQ-style Q&A blocks, comparison content), and 3 for clarity, freshness, and authority (heading hierarchy, visible content dates, authoritative outbound links). Our free audit at zorvalabs.com/ai-visibility checks all 14 in about 25 seconds.
AI search isn't ranking — it's citation
The mental model that ruled SEO for 20 years was: rank #1 on Google, win the click. The mental model for 2026 is different. When someone asks ChatGPT "best web designer in Nashville" or asks Perplexity "what's the difference between Wix and Squarespace," they get a synthesized answer that cites 3-7 sources. Being one of those sources is the new top of the funnel.
The mistake most SEO advice still makes is treating AI search like SEO with extra steps. It isn't. The signals that get you cited in a generative answer are different from the signals that get you ranked in blue links. Our AI Search Readiness Audit tests for 14 specific ones.
The 14 signals, grouped by what they prove
Each signal answers a different question an AI engine asks itself before citing you. The 14 fall into five groups.
Group 1 — Crawler access (3 signals)
If AI engines can't fetch your pages, nothing else matters. Three signals check whether they're allowed to.
1. GPTBot allowed in robots.txt
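OpenAI's training crawler. It gathers content used to train OpenAI's models, so blocking it can leave ChatGPT knowing less about your business when it answers without searching; ChatGPT search itself relies on a separate crawler, OAI-SearchBot (signal 3). GPTBot is often blocked on Cloudflare-hosted sites, where Cloudflare's AI crawler controls can inject a block into robots.txt.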
2. ClaudeBot + anthropic-ai allowed
Anthropic's crawlers, which gather the content Claude learns from. Same Cloudflare auto-block problem.
3. PerplexityBot, Google-Extended, Applebot-Extended, CCBot, OAI-SearchBot allowed
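Perplexity's crawler; Google-Extended, a robots.txt token that controls whether Google may use your content for its Gemini models (it is separate from Googlebot and doesn't crawl on its own); Applebot-Extended, Apple's equivalent control for AI training; CCBot, Common Crawl's crawler, whose archive feeds many open-weights LLMs; and OAI-SearchBot, OpenAI's search-specific crawler. Each is its own block decision: passing one doesn't pass the others.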
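If you want to spot-check signals 1-3 by hand, Python's standard-library `urllib.robotparser` can answer "may this token fetch this path?" for each crawler. This is a minimal sketch, not the audit's own code: it assumes you have already downloaded your robots.txt as text, the token list is our reading of the crawlers named above, and `RobotFileParser` applies the first matching group in file order, which may differ in edge cases from how each crawler resolves conflicting rules.

```python
import urllib.robotparser

# User-agent tokens for the crawlers named in signals 1-3 (our reading of the list above).
AI_CRAWLER_TOKENS = [
    "GPTBot",
    "OAI-SearchBot",
    "ClaudeBot",
    "anthropic-ai",
    "PerplexityBot",
    "Google-Extended",
    "Applebot-Extended",
    "CCBot",
]


def crawler_access(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Return {token: allowed?} for each AI crawler token, given robots.txt text."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, path) for token in AI_CRAWLER_TOKENS}


# Example: a robots.txt that blocks only GPTBot and allows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    for token, allowed in crawler_access(SAMPLE_ROBOTS_TXT).items():
        print(f"{token:18} {'allowed' if allowed else 'BLOCKED'}")
```

For the sample file, GPTBot comes back blocked and every other token allowed, which is exactly the point above: each crawler is its own decision. To check a live site, pass in the text of https://yourdomain.com/robots.txt instead of the sample.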
Group 2 — llms.txt (1 signal)
4. llms.txt at site root, with structured business summary
An emerging convention: a Markdown file at yourdomain.com/llms.txt that summarizes your business in AI-readable format. Quick facts, FAQs, services, pricing, links. We've documented the full setup pattern. The audit checks both that the file exists and that it has enough content to be useful (200+ bytes — empty or stub files fail).
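The exists-and-has-content check described above fits in a few lines. A sketch under stated assumptions: the file is served over HTTPS at the site root, `fetch_llms_txt` and `llms_txt_passes` are hypothetical helper names, and the 200-byte threshold is the one quoted in this section.

```python
import urllib.request
from typing import Optional

MIN_LLMS_TXT_BYTES = 200  # below this, treat the file as an empty stub


def fetch_llms_txt(domain: str, timeout: float = 10.0) -> Optional[bytes]:
    """Download https://<domain>/llms.txt; return None on any HTTP or network error."""
    url = f"https://{domain}/llms.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError:
        # URLError, HTTPError (e.g. a 404 for a missing file) and socket timeouts
        # are all OSError subclasses, so every failure mode lands here.
        return None


def llms_txt_passes(body: Optional[bytes]) -> bool:
    """Pass only if the file exists and carries at least MIN_LLMS_TXT_BYTES of content."""
    return body is not None and len(body) >= MIN_LLMS_TXT_BYTES
```

Usage: `llms_txt_passes(fetch_llms_txt("yourdomain.com"))`. A missing file and a one-line stub both fail, matching the audit behavior described above.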
Group 3 — Structured data (4 signals)
JSON-LD schema is how you tell AI engines what your business is, who runs it, and what questions you answer. Four schemas matter most for AI citation.
5. Organization (or LocalBusiness)
Establishes you as a real business entity. Includes NAP (name, address, phone), service area, hours, and links to social profiles.
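For reference, a minimal LocalBusiness node covering NAP, service area, hours, and social profiles might look like the following, generated here with Python's `json` module. Every business detail is a made-up placeholder and the `#organization` `@id` convention is our assumption; the property names (`address`, `telephone`, `areaServed`, `openingHours`, `sameAs`) are standard schema.org vocabulary.

```python
import json

# Placeholder business details; replace every value with your own.
business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "@id": "https://www.example.com/#organization",
    "name": "Example Plumbing Co.",
    "url": "https://www.example.com/",
    "telephone": "+1-615-555-0100",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Main St",
        "addressLocality": "Nashville",
        "addressRegion": "TN",
        "postalCode": "37201",
        "addressCountry": "US",
    },
    "areaServed": "Nashville, TN",
    "openingHours": "Mo-Fr 08:00-17:00",
    "sameAs": [
        "https://www.facebook.com/exampleplumbing",
        "https://www.linkedin.com/company/exampleplumbing",
    ],
}

# The printed JSON goes in the page inside <script type="application/ld+json">.
print(json.dumps(business, indent=2))
```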
6. Person — named human author/founder
AI engines tend to weight content from a named human author well above anonymous content. Tie the Person node to your Organization, either through the Person's worksFor property or the Organization's founder property.
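A sketch of that linkage as a JSON-LD `@graph`, built with Python's `json` module. The names and the `#organization` / `#founder` `@id` fragments are hypothetical placeholders; `worksFor` and `founder` are the schema.org properties named above, and `jobTitle` is a standard Person property added for illustration.

```python
import json

ORG_ID = "https://www.example.com/#organization"  # hypothetical @id for the business node
PERSON_ID = "https://www.example.com/#founder"    # hypothetical @id for the founder node

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": ORG_ID,
            "name": "Example Plumbing Co.",
            "founder": {"@id": PERSON_ID},  # Organization points to Person
        },
        {
            "@type": "Person",
            "@id": PERSON_ID,
            "name": "Jane Doe",
            "jobTitle": "Founder",
            "worksFor": {"@id": ORG_ID},  # Person points back to Organization
        },
    ],
}

print(json.dumps(graph, indent=2))
```

Referencing nodes by `@id` keeps each entity defined once, so other nodes on the page can point at the same founder without duplicating her details.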
7. FAQPage
The single biggest schema type for AI citation. FAQs are pre-formatted answers — exactly what generative engines want to quote. Pages with FAQPage schema get cited disproportionately often in AI Overview boxes and Perplexity answers.
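Signal 7 in practice: a small helper that turns question-and-answer pairs into a FAQPage node. `FAQPage`, `Question`, `Answer`, `mainEntity`, `acceptedAnswer`, and `text` are standard schema.org vocabulary; the helper name is hypothetical, and the sample answers are condensed from this article's own FAQ.

```python
import json


def faq_page_jsonld(qa_pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage node from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


FAQS = [
    ("Will I actually see citations in ChatGPT after fixing this?",
     "Citation lift typically shows up in 2-6 weeks as AI engines re-crawl."),
    ("Should I block AI bots to protect my content?",
     "Mostly no. Small businesses gain more from being cited than they lose from being trained on."),
]

print(json.dumps(faq_page_jsonld(FAQS), indent=2))
```

Generating the JSON-LD and the visible Q&A HTML from the same list is one way to keep the schema and the rendered page from drifting apart.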
8. SpeakableSpecification
Marks paragraphs as voice-friendly. Voice assistants can use these hooks to pick read-aloud answers; Google documents speakable markup, still a beta feature, for Google Assistant.
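SpeakableSpecification is attached to a WebPage (or Article) through its `speakable` property rather than standing on its own. A sketch, assuming your read-aloud paragraphs carry the hypothetical classes `summary` and `faq-answer`; per Google's speakable documentation, an `xpath` list can be used in place of `cssSelector`.

```python
import json

# Hypothetical CSS selectors for the paragraphs you want read aloud.
SPEAKABLE_SELECTORS = [".summary", ".faq-answer"]

webpage = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "name": "Example Plumbing Co. | Nashville plumbers",
    "url": "https://www.example.com/",
    # speakable is a property of the WebPage, not a standalone node.
    "speakable": {
        "@type": "SpeakableSpecification",
        "cssSelector": SPEAKABLE_SELECTORS,
    },
}

print(json.dumps(webpage, indent=2))
```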
Group 4 — Content structure (3 signals)
How the human-readable content is shaped on the page. AI engines parse structure to decide which paragraphs answer which questions.
9. Visible founder/team bio
Beyond the Person schema, there has to be a visible bio on the page — a name, a photo, a sentence about who they are. AI engines cross-reference schema against rendered content; mismatches downgrade citation likelihood.
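The cross-reference can be approximated with the standard library. This sketch is our own heuristic, not a description of how any AI engine works: it pulls Person names out of the page's JSON-LD and flags any that never appear in the visible text. Plain substring matching is a simplifying assumption; a real check would also normalize whitespace and case.

```python
import json
from html.parser import HTMLParser
from typing import List, Optional


class PageScanner(HTMLParser):
    """Separates JSON-LD script bodies from the text a visitor actually sees."""

    def __init__(self) -> None:
        super().__init__()
        self.jsonld_blocks: List[str] = []
        self.visible_text: List[str] = []
        self._in_script_or_style = False
        self._jsonld_buffer: Optional[List[str]] = None  # set while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._in_script_or_style = True
            if tag == "script" and dict(attrs).get("type") == "application/ld+json":
                self._jsonld_buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._jsonld_buffer is not None:
                self.jsonld_blocks.append("".join(self._jsonld_buffer))
            self._in_script_or_style = False
            self._jsonld_buffer = None

    def handle_data(self, data):
        if self._jsonld_buffer is not None:
            self._jsonld_buffer.append(data)
        elif not self._in_script_or_style:
            self.visible_text.append(data)


def schema_person_names(jsonld_blocks: List[str]) -> List[str]:
    """Collect the name of every Person node, whether top-level or inside @graph."""
    names = []
    for block in jsonld_blocks:
        data = json.loads(block)
        nodes = data if isinstance(data, list) else data.get("@graph", [data])
        for node in nodes:
            if isinstance(node, dict) and node.get("@type") == "Person" and "name" in node:
                names.append(node["name"])
    return names


def unbacked_person_names(html: str) -> List[str]:
    """Person names declared in schema that never appear in the visible page text."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    visible = " ".join(scanner.visible_text)
    return [name for name in schema_person_names(scanner.jsonld_blocks) if name not in visible]
```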
10. FAQ-style Q&A blocks in the visible HTML
Even with FAQPage schema, you need the Q&A pattern in the rendered HTML. The structure (heading-then-paragraph alternation, or summary-then-detail) is what AI engines use to extract direct quotes.
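Both patterns can be generated from the same question-and-answer list that feeds your FAQPage schema. A sketch: the function name is hypothetical, reading "summary-then-detail" as HTML `<details>`/`<summary>` disclosure blocks is our interpretation, and the `<h3>` question level is an assumption, so adjust it to sit under your page's H2s.

```python
from html import escape


def faq_html(qa_pairs: list[tuple[str, str]], style: str = "details") -> str:
    """Render (question, answer) pairs as visible HTML.

    style="details"  -> <details><summary>question</summary><p>answer</p></details>
    style="headings" -> <h3>question</h3><p>answer</p>
    """
    blocks = []
    for question, answer in qa_pairs:
        q, a = escape(question), escape(answer)  # escape &, <, > and quotes
        if style == "details":
            blocks.append(f"<details>\n  <summary>{q}</summary>\n  <p>{a}</p>\n</details>")
        elif style == "headings":
            blocks.append(f"<h3>{q}</h3>\n<p>{a}</p>")
        else:
            raise ValueError(f"unknown style: {style!r}")
    return "\n".join(blocks)
```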
11. Comparison content
"Wix is $72/mo with apps; Zorva is $57/mo all-in" is the kind of structured data AI loves to repeat. Pages with explicit "X vs Y" tables or paragraphs get cited disproportionately in buyer-intent queries.
Group 5 — Clarity + freshness (3 signals)
12. Heading hierarchy (H2/H3)
One H1, multiple H2s, H3s nested under H2s. Skipping levels (H1 → H4) or using only one heading level confuses AI engines about which paragraphs answer which questions.
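A page's heading outline can be checked against these rules with Python's `html.parser`. The sketch encodes the rules above as we read them (exactly one H1, at least one H2, no jump of more than one level going deeper); it looks only at tags, not CSS, so a `<div>` styled to look like a heading won't count. The function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records the level (1-6) of every <h1>..<h6> tag in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[int] = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html: str) -> list[str]:
    """Return human-readable problems with the page's heading outline (empty list = pass)."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels
    problems = []
    if levels.count(1) != 1:
        problems.append(f"expected exactly one H1, found {levels.count(1)}")
    if 2 not in levels:
        problems.append("no H2 sections")
    # Going deeper by more than one level (e.g. H1 then H4) counts as a skip.
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            problems.append(f"skipped level: H{prev} -> H{cur}")
    return problems
```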
13. Visible content dates
Pages with explicit "Updated October 2026" or datePublished + dateModified in schema get cited more than undated pages. AI engines weight recency heavily.
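One way to keep the visible date and the schema dates from disagreeing is to derive both from the same value. A sketch, assuming the page is a blog post marked up as schema.org `BlogPosting` (an `Article` subtype); the helper names are hypothetical, and `%B` assumes the default English (C) locale for month names.

```python
import json
from datetime import date


def dated_article_jsonld(headline: str, published: date, modified: date) -> dict:
    """BlogPosting node with ISO 8601 datePublished and dateModified."""
    if modified < published:
        raise ValueError("dateModified cannot precede datePublished")
    return {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }


def visible_updated_label(modified: date) -> str:
    """Human-readable label for the page body, built from the same date."""
    return f"Updated {modified.strftime('%B %Y')}"


post = dated_article_jsonld("AI Search Readiness Audit", date(2026, 1, 10), date(2026, 3, 5))
print(json.dumps(post, indent=2))
print(visible_updated_label(date(2026, 3, 5)))
```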
14. Authoritative outbound links
Pages that cite Wikipedia, government sources, peer-reviewed studies, and recognized industry publications signal that the author has done research. AI engines reward this behavior with their own citations.
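A rough way to audit your own pages for this signal is to list outbound links whose hosts look authoritative. What counts as authoritative is a judgment call: the .gov/.edu/Wikipedia heuristic below is purely our assumption for illustration (peer-reviewed journals and industry publications would need to be added by domain), and `own_host` is whatever domain you want treated as internal.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Illustrative heuristic only: extend with journals and industry publications you trust.
AUTHORITATIVE_TLDS = (".gov", ".edu")
AUTHORITATIVE_DOMAINS = ("wikipedia.org",)


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def is_authoritative(host: str) -> bool:
    if host.endswith(AUTHORITATIVE_TLDS):
        return True
    # Match the domain itself or a subdomain, but not lookalikes such as notwikipedia.org.
    return any(host == d or host.endswith("." + d) for d in AUTHORITATIVE_DOMAINS)


def authoritative_outbound_links(html: str, own_host: str) -> list[str]:
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    links = []
    for href in collector.hrefs:
        host = (urlparse(href).hostname or "").lower()
        if not host or host == own_host or host.endswith("." + own_host):
            continue  # relative link or a link back to your own site
        if is_authoritative(host):
            links.append(href)
    return links
```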
What 100/100 vs 50/100 looks like
A site scoring 100 typically has: every AI crawler listed above allowed in robots.txt, an llms.txt with ~150 lines of structured business info, a JSON-LD @graph with Organization + Person + WebPage + Speakable + FAQPage, a visible founder bio with photo, 5-10 Q&A blocks throughout the homepage, and dates on every published post.
A site scoring 50 typically has HTTPS, a viewport meta tag, maybe a generic Organization schema, and none of the rest. The site renders fine for humans but gives AI engines almost nothing to work with.
The most common failure modes
- Cloudflare auto-blocked AI bots. You'd never notice unless you read robots.txt. Cloudflare's "AI Crawler Control" feature was on by default for ~12 months in 2024-2025, and a lot of sites still have it enabled. Toggle: Cloudflare dashboard → Security → Settings → AI Audit / Crawler Control → off.
- No llms.txt at all. More than 90% of small business sites don't have one. Adding one is a free, roughly 30-minute fix that typically produces a measurable citation lift within 2-3 weeks.
- Schema present but no visible bio. Schema declares a founder; the page has no photo, no name, no bio. AI engines downgrade the mismatch.
- Single H1 page with no H2s. The whole page is one big chunk of paragraphs. AI engines can't extract section-level answers.
FAQ
How is this different from the 38-check SEO scanner?
The main scanner covers AI search in 4 of its 38 checks. This dedicated AI Search Readiness Audit runs 14 checks on the AI-citation angle alone. Run the scanner first for the broad picture; run this one if AI citation is your specific priority.
Will I actually see citations in ChatGPT after fixing this?
Citation lift typically shows up in 2-6 weeks as AI engines re-crawl. The fastest wins are llms.txt + unblocking the bots in robots.txt — those affect what the AI sees the very next time it crawls.
Does AI search send real traffic?
Per our client data through Q1 2026: AI search referrals are 10-15% of organic traffic on optimized sites and climbing 2-3x per quarter. Perplexity in particular sends high-intent referrals.
Should I block AI bots to "protect my content"?
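Mostly no, with one nuance: the publishers who block AI bots are usually large news organizations with paid licensing deals to protect. For a small business, blocking mostly means going missing from AI answers. Blocking OAI-SearchBot keeps you out of ChatGPT search results, and blocking GPTBot keeps your content out of what future OpenAI models learn. Small businesses gain more from being cited than they lose from being trained on.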
Run yours — no signup, no email gate
Every tool at zorvalabs.com/tools is free, instant, and locks nothing behind an email form. You'll see the same numbers we see when we audit a paying client's site — same checks, same thresholds, same fix recommendations. If you want us to actually ship the fixes, plans start at $57/month, all-in. If you just want the report and a checklist, take it and run.
Curious how your site scores?
Run the free 38-check SEO / AEO / AIO / GEO scan — 25 seconds, instant on-screen results, no email required.