§ Dispatch № 152

What Googlebot, GPTBot, and ClaudeBot Actually See on Your Site in 2026

Three crawlers, three very different views of the same page. We logged a month of requests across small sites and mapped exactly what each one renders, ignores, and cites.

Filed: April 26, 2026 · Read: 9 min · Author: SEOABLE

You can rank on Google and still be invisible to ChatGPT. You can be cited by Claude every week and have Google ignore you for a year. We see it constantly in the audits we run, and the reason is almost always the same: founders ship one HTML payload and assume every crawler reads it the same way. They do not.

We pulled a month of access logs across 40 small sites — typical bootstrapped SaaS, ecommerce, and content domains — and mapped exactly what each major crawler fetched, executed, and came back for. Here is what shows up on your site, what it actually sees, and what to do about the gap.

The three crawlers that matter in 2026

Forget the dozens of bots in your logs. For organic visibility, three families do most of the work:

| Crawler | Owner | Purpose | Renders JS | Honors robots.txt |
|---|---|---|---|---|
| Googlebot | Google | Search index + AI Overviews | Yes (deferred, Chromium) | Yes |
| GPTBot / OAI-SearchBot / ChatGPT-User | OpenAI | Training + SearchGPT + on-demand fetches | No | Yes (mostly) |
| ClaudeBot / Claude-User | Anthropic | Training + Claude on-demand fetches | No | Yes |

Perplexity, Gemini, and Grok matter too, but their behavior tracks closely with the patterns below — Perplexity fetches like ChatGPT-User, Gemini reuses Googlebot's render pipeline, and Grok does a no-render, text-only crawl. If you handle the three above, the rest mostly fall in line.

Googlebot: the only one that actually renders your JavaScript

Googlebot is still the most patient crawler on the web. It comes in two waves. The first wave fetches your raw HTML and parses it. The second wave hands the page to a headless Chromium instance, executes JavaScript, and re-indexes whatever the rendered DOM produces. The gap between the two waves can be hours, days, or — for low-authority sites — weeks.

What this means in practice:

  • Server-rendered content is indexed in hours. Static HTML, SSR, and properly hydrated SSG routes show up in Search Console within a day for healthy sites.
  • Client-rendered content is indexed eventually, if at all. A React or Vue SPA that fetches its content after mount will get crawled, queued, rendered, and indexed — but the queue is shared, prioritized by site authority, and visibly slower for new domains.
  • Render budget is real. Googlebot will give up if your page takes too long to become interactive, fails on a third-party script, or throws unhandled errors during hydration. We see this constantly on MVPs generated with Lovable, V0, and Bolt that ship without an SSR layer.

The fix is not "render everything server-side." The fix is: anything you want indexed reliably must be in the initial HTML payload. Lazy-loaded copy, tabs that fetch on click, modals that fetch on open — Google may eventually see them, but you are gambling. AI crawlers will never see them.
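
If your stack is Next.js-style React, the difference looks roughly like this. A minimal sketch, assuming the App Router; getPost, bodyHtml, and the route path are hypothetical stand-ins for your own data layer:

```tsx
// Hypothetical Next.js App Router route: app/blog/[slug]/page.tsx
// A Server Component: the post body is fetched during the server render,
// so it is already present in the raw HTML that Googlebot (wave 1), GPTBot,
// and ClaudeBot download.
import { getPost } from "@/lib/posts"; // hypothetical data helper

export default async function PostPage({ params }: { params: { slug: string } }) {
  const post = await getPost(params.slug);
  return (
    <article>
      <h1>{post.title}</h1>
      {/* In the initial payload: every crawler sees this */}
      <div dangerouslySetInnerHTML={{ __html: post.bodyHtml }} />
    </article>
  );
}

// The anti-pattern, for contrast: a client component that fetches after mount.
// The body exists only after JavaScript runs, so the AI crawlers never see it,
// and Googlebot sees it only after the deferred render wave.
//
//   "use client";
//   useEffect(() => {
//     fetch(`/api/posts/${slug}`).then((r) => r.json()).then(setPost);
//   }, [slug]);
```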

GPTBot, OAI-SearchBot, ChatGPT-User: three bots, one company, very different jobs

OpenAI runs three crawlers with overlapping names that confuse most operators. The distinction matters because they behave differently and you can block them independently.

  • GPTBot — Bulk training crawler. It walks your site, pulls raw HTML, and ignores JavaScript completely. If your robots.txt sets Disallow: / under User-agent: GPTBot, your content does not enter the training set (a sample robots.txt follows this list).
  • OAI-SearchBot — The SearchGPT index crawler. This is the one that builds the live search index ChatGPT uses for "search the web" queries. Also no JS execution.
  • ChatGPT-User — Real-time fetches when a ChatGPT user includes a URL or asks a question that triggers a live retrieval. Also no JS, but operates with a much shorter timeout (typically under 5 seconds) and aggressive content extraction.
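
Because the three bots can be addressed independently, a robots.txt sketch that keeps content out of the bulk training crawl while staying visible to SearchGPT and live fetches might look like the following. Whether that particular trade-off is right for you is a separate call:

```
# Sketch only; adjust to your own policy.
# Keep content out of OpenAI's bulk training crawl:
User-agent: GPTBot
Disallow: /

# But stay visible in the SearchGPT index and live ChatGPT fetches:
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
```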

All three see only your raw HTML. None execute JavaScript. None wait for hydration. This is the central architectural fact: it is not buried in anyone's docs, yet founders consistently miss it.

What this means for your site:

  • If your blog post body is rendered by a client-side framework after page load, no OpenAI bot will see it. Your post effectively does not exist for ChatGPT.
  • If your title tag, H1, and first paragraph are server-rendered but the body is hydrated, you will get the link cited but the citation will be shallow — surface-level summary, no depth, no verbatim quotes.
  • If you serve different content to bots vs. humans (cloaking), OAI-SearchBot will detect it and likely deprioritize you. This is not theoretical; we have watched two client sites recover citation volume after removing a bot-detection layer that was returning a stripped HTML version.

ClaudeBot and Claude-User: similar to OpenAI, with one quirk

Anthropic's crawlers are simpler. ClaudeBot handles bulk crawling for training. Claude-User handles on-demand fetches when a Claude conversation triggers a retrieval (the equivalent of ChatGPT-User). Both fetch raw HTML only.

The quirk worth noting: in our logs, ClaudeBot retries failed fetches more aggressively than GPTBot. A 500 error or timeout will get retried within hours; a 429 rate limit gets a slower backoff but still retries. If you are returning errors to Anthropic crawlers because of an over-aggressive WAF rule (Cloudflare's bot fight mode catches both regularly), you will see the impact in Claude citation volume within a week.

We have also observed Claude-User showing the strongest preference for clean semantic HTML of any AI crawler. Pages with a proper <article>, <section>, an <h1> to <h3> heading hierarchy, and <time> elements get cited more often than visually identical pages built from <div> soup. Whether this is causal or merely correlated with overall content quality, we cannot prove — but the pattern is consistent enough across sites that we now flag it in audits.
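
For illustration, here is roughly the structure we mean, written as a React/TSX component (the names are hypothetical, and it only matters if your framework renders it on the server):

```tsx
import type { ReactNode } from "react";

// Illustrative sketch only: the semantic shell described above, written as a
// React/TSX component. Client-only rendering keeps it invisible to AI crawlers
// no matter how clean the markup is.
export function PostShell(props: { title: string; publishedIso: string; children: ReactNode }) {
  return (
    <article>
      <h1>{props.title}</h1>
      <time dateTime={props.publishedIso}>{props.publishedIso}</time>
      {/* h2/h3 subheadings and body copy live inside semantic sections */}
      <section>{props.children}</section>
    </article>
  );
}

// The div-soup equivalent we keep finding in audits, for contrast:
// <div class="post"><div class="title">…</div><div class="date">…</div><div class="body">…</div></div>
```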

The render-vs-citation gap

Here is the table that matters. For a typical client-side rendered SPA (no SSR, content fetched after mount), this is what each crawler sees:

| Element | Googlebot (wave 1) | Googlebot (wave 2) | GPTBot | ClaudeBot |
|---|---|---|---|---|
| <title> | Yes | Yes | Yes | Yes |
| Meta description | Yes | Yes | Yes | Yes |
| Body copy (rendered after mount) | No | Eventually | Never | Never |
| Schema.org JSON-LD (in initial HTML) | Yes | Yes | Yes | Yes |
| Schema.org JSON-LD (injected by JS) | No | Eventually | Never | Never |
| Internal links (rendered after mount) | No | Eventually | Never | Never |
| Images with alt text | Depends | Yes | Yes, if in initial HTML | Yes, if in initial HTML |

Read that table again. If your site is a Vite/React SPA without SSR, you are publishing a title and a meta description for the AI engines and nothing else. We have audited bootstrapped sites with 80+ blog posts that were essentially blank to ChatGPT and Claude.

What to actually do this week

Five concrete moves, in order of impact:

  1. Curl your own site as each bot and look at the raw HTML. Run curl -A "GPTBot" https://yoursite.com/blog/your-best-post and read what comes back. If your article body is missing, that is what ChatGPT sees too. Do this once per template (homepage, blog post, pricing page, product page). It takes ten minutes.

  2. Move all critical content into the initial HTML payload. For Next.js, that means defaulting to Server Components and static generation (generateStaticParams for dynamic routes). For Astro, you are already there. For Vite/React without SSR, this is a bigger lift — but at minimum, prerender your top 20 pages with something like vite-plugin-prerender so the HTML you ship to bots actually contains the content.

  3. Audit your robots.txt for accidental AI blocks. Several site templates from 2024 shipped with Disallow: / rules for GPTBot and ClaudeBot baked in — copied from a Hacker News thread when "block the AI scrapers" was the consensus take. That consensus has flipped. If you want AI citations, you need to let these crawlers in.

  4. Check your WAF rules. Cloudflare's bot fight mode, Vercel's bot protection, and most managed WAFs default to challenging or blocking AI crawlers. Look at your last 30 days of logs filtered to User-Agent containing GPTBot, ClaudeBot, PerplexityBot. If you see 429s, 403s, or 503s, fix that before you do anything else — you are paying for content nobody can read.

  5. Add JSON-LD Article schema to your blog template, in the initial HTML. Not injected by JS. Not loaded in a footer script. In the <head> of the server-rendered response. AI crawlers parse JSON-LD eagerly and use it for citation context. This is the single highest-leverage change for AI visibility we measure across audits.
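
To make item 5 concrete, here is a minimal sketch of what "in the initial HTML" looks like in a server-rendered React/Next.js blog template; the component and the Post fields are hypothetical:

```tsx
// Hypothetical server-rendered blog template. Because this component renders
// on the server, the JSON-LD is serialized into the HTML response itself
// rather than injected by client-side JavaScript.
type Post = {
  title: string;
  description: string;
  publishedIso: string;
  url: string;
  author: string;
};

export function ArticleJsonLd({ post }: { post: Post }) {
  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: post.title,
    description: post.description,
    datePublished: post.publishedIso,
    author: { "@type": "Organization", name: post.author },
    mainEntityOfPage: post.url,
  };
  return (
    // Place this in the <head> of the server-rendered document
    <script
      type="application/ld+json"
      dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
    />
  );
}
```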

How to verify each fix

Each move above has a one-shot verification step. Run them after you ship:

  • HTML payload check: curl -A "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)" yoursite.com/some-post | grep -c "your unique phrase" — if the count is 0, the bot does not see that phrase. You can also script this check across all the bot user-agents at once (see the sketch after this list).
  • Robots.txt check: Visit yoursite.com/robots.txt directly. Confirm there is no Disallow: / for GPTBot, ClaudeBot, OAI-SearchBot, PerplexityBot, or CCBot.
  • WAF check: Search your access logs for status:429 OR status:403 filtered by AI bot user-agents over the last 7 days. Should be near zero.
  • Schema check: Use Google's Rich Results Test on your blog post URL. It runs from a real Googlebot context and shows you the parsed JSON-LD as Google sees it.
  • Citation check: Ask Claude or ChatGPT directly: "What does [yourdomain.com] say about [your topic]?" If the answer is generic or wrong, your content is not in their index. Wait two weeks after fixing the above, then check again.
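
If you would rather script the HTML payload check than run curl by hand, a minimal sketch along these lines works on Node 18+ (the filename, the tsx runner, and the token-style user agents are assumptions; adapt them to your setup):

```ts
// check-visibility.ts
// Usage: npx tsx check-visibility.ts https://yoursite.com/blog/post "a unique phrase from the body"
// Fetches the raw HTML the way a no-JS crawler would and reports whether the
// phrase exists before any JavaScript runs. Requires Node 18+ (global fetch).
// Note: the Googlebot row reflects wave 1 only; wave 2 rendering is separate.
const [url, phrase] = process.argv.slice(2);

if (!url || !phrase) {
  console.error("usage: check-visibility.ts <url> <unique phrase>");
  process.exit(1);
}

// Token-style user agents, same approach as the curl example in step 1.
const userAgents = ["Googlebot", "GPTBot", "OAI-SearchBot", "ClaudeBot"];

async function main() {
  for (const ua of userAgents) {
    const res = await fetch(url, { headers: { "User-Agent": ua } });
    const html = await res.text();
    console.log(`${ua.padEnd(14)} status=${res.status} phrase=${html.includes(phrase) ? "FOUND" : "MISSING"}`);
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```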

The pattern we keep seeing

The founders winning AI citations in 2026 are not the ones writing the most content. They are the ones whose HTML payload is the most legible. A 40-post blog on a properly server-rendered site outperforms a 400-post blog on a client-rendered SPA. Every audit we run confirms this.

The good news: the fix is mechanical. It is not a strategy problem. It is a build configuration problem. And once you fix it, every existing post you have already written becomes visible to AI engines retroactively, the next time they crawl.

If you want a 60-second read on what your site currently looks like to each of these crawlers — including a render-vs-raw diff and a robots.txt audit — run a Seoable audit. It is the same check we run manually for clients, automated.

Ship clean HTML. The bots will read it.
