{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "I Built a 60,000-Page AI Website for $10: GPTBot Crawled It 30,000+ Times in 12 Hours",
  "description": "Let me be clear upfront: this website is designed purely as an experiment. I wanted to observe what real traffic looks like on a large-scale programmatic SEO site and more importantly, how AI crawlers behave in the wild.",
  "datePublished": "2026-03-04T00:00:00.000Z",
  "dateModified": "2026-03-04T00:00:00.000Z",
  "url": "https://metehan.ai/blog/i-built-a-60-000-page-ai-website-for-10-gptbot-crawled-it-30-000-times-in-12-hours/",
  "category": "experiment",
  "tags": [],
  "image": "/wp-content/uploads/39k-requests.png",
  "wordCount": 1654,
  "readTime": "8 min",
  "articleBody": "A wild experiment.\n\n## Why I Built This\n\nLet me be clear upfront: **this website is designed purely as an experiment.** I wanted to observe what real traffic looks like on a large-scale programmatic SEO site and more importantly, how AI crawlers behave in the wild.\n\nThis is not a guide on how to build a sustainable business with AI-generated content. If you create programmatic SEO pages only for traffic, one of two things will happen:\n\n1. **Your traffic will tank within weeks** due to deindexing, Google's systems are increasingly good at detecting thin, templated content at scale\n2. **You'll see initial traction, then a slow bleed over a few months** as manual reviews or algorithm updates catch up\n\nI've seen this pattern play out repeatedly across the industry. The economics of generating pages are now so cheap that the barrier is essentially zero which is exactly why Google has gotten aggressive about it.\n\nSo why build it? Because **the interesting part isn't the SEO. It's the bots.**\n\nI did not expect GPTBot to crawl a brand-new, zero-backlink domain at the scale it did. That was the real discovery.\n\n## The Experiment\n\nI built [StateGlobe.com,](https://stateglobe.com) a Statista-style statistics website covering digital marketing, SEO, content marketing, and web technology across 200 countries. Every single page was generated by AI.\n\n* **60,000 pages** generated in under 30 minutes\n* **Total API cost: less than $10**\n* **Model used: `gpt-4.1-nano`** via OpenAI's Chat Completions API\n* **Hosted on Cloudflare Workers + D1** (serverless, edge-rendered)\n\nThe entire project is open source.\n\n## The Tech Stack\n\n### Content Generation Pipeline\n\nThe pipeline is straightforward:\n\n1. **Taxonomy**: 300 statistical topics × 200 countries = 60,000 unique combinations\n2. 
**Generation**: A Node.js script fires real-time API calls to `gpt-4.1-nano` with controlled concurrency (50 parallel requests) and a token bucket rate limiter\n3. **Output**: Each page gets a title, meta description, 5 key statistics, 3 analysis paragraphs, and 2 FAQ items, all as structured JSON\n4. **Import**: Results are bulk-imported into Cloudflare D1 (serverless SQLite)\n\nThe prompt asks the model to produce 2026 projections based on industry trends. Each response costs fractions of a cent: `gpt-4.1-nano` with `max_tokens: 700` and `response_format: json_object`.\n\n### Cloudflare Architecture\n\n* **Cloudflare Workers**: Edge-rendered HTML, no build step, no static files. Every page is assembled on-demand from D1 data\n* **Cloudflare D1**: Serverless SQLite storing all 60,000 pages and visit analytics\n* **Dynamic OG Images**: Generated on-the-fly as PNGs using `@resvg/resvg-wasm` with the Inter font loaded from a CDN. No pre-generated images, no storage costs\n* **Clean URLs**: `/brazil/seo-budget-allocation-statistics`. No `.html` extensions, proper 404 headers\n* **SEO**: Structured data (Article, FAQPage, BreadcrumbList), XML sitemaps (paginated), canonical URLs, internal linking (same topic across countries, same country across topics)\n\nTotal hosting cost: effectively free on Cloudflare's free tier.\n\n## What Happened Next: The Bots Arrived\n\nWithin minutes of deploying, **GPTBot** started crawling. 
Hard.\n\n### First 12 Hours\n\n* **29,000+ requests from GPTBot alone**\n* GPTBot was hitting the site at roughly **1 request per second**, systematically crawling through pages\n* OAI-SearchBot and ChatGPT-User also showed up\n* GoogleOther appeared with 60+ requests\n* Googlebot, AhrefsBot, and PerplexityBot followed\n\n<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/-wBdQhq-DmI?si=9xYNM45kM0p8949p\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen></iframe>\n\n### 3-Hour Snapshot After Server-Side Tracking Was Enabled\n\n| Bot           | Requests |\n| ------------- | -------- |\n| GPTBot        | 5,200+   |\n| GoogleOther   | 140+     |\n| OAI-SearchBot | 94       |\n| Googlebot     | 11       |\n| AhrefsBot     | 7        |\n| PerplexityBot | 2        |\n| ChatGPT-User  | 1        |\n\nBy the time server-side tracking was fully operational, Cloudflare's own analytics showed **37,000+ total requests** to the worker.\n\n![39k requests, almost all from GPTBot](/wp-content/uploads/39k-requests.png)\n\nGPTBot was by far the most aggressive crawler, more active than Googlebot by orders of magnitude.\n\n### The Part I Didn't Expect\n\nI've launched plenty of sites before. I expected Googlebot to show up, maybe some SEO tool crawlers. That's normal.\n\nWhat I did **not** expect was OpenAI's GPTBot hitting a brand-new domain with zero backlinks, zero social shares, and no Search Console submission at **1 request per second** within minutes of deployment. It found the site through the XML sitemap and just started consuming everything.\n\nThis raises serious questions. If GPTBot is this aggressive on fresh domains, how much of the open web is it processing daily? 
And what does it mean for site owners who haven't explicitly blocked it in `robots.txt`?\n\nFor context: Googlebot made 11 requests in the same period that GPTBot made 5,200+. That's a **470x difference** in crawl intensity.\n\n## Building Public Analytics\n\nI wanted anyone to see what was happening in real time, so I built a **public analytics dashboard** at [stateglobe.com/analytics](https://stateglobe.com/analytics).\n\n### Server-Side Tracking\n\nInitially, I used a client-side beacon (`navigator.sendBeacon`). But bots don't execute JavaScript, so I was missing all bot traffic. The fix was server-side tracking:\n\n* Every request to the Worker records `page_slug`, `user_agent`, `country` (from `cf-ipcountry`), and `is_bot` directly into D1\n* Bot detection runs against a pattern list (GPTBot, Googlebot, AhrefsBot, etc.)\n* `ctx.waitUntil()` ensures the D1 write completes without blocking the response\n* The client-side beacon was removed entirely — one clean tracking path\n\n### The Dashboard Shows\n\n* **Human vs. Bot traffic** with separate summary cards (today, this week, all time)\n* **Daily traffic chart** (inline SVG line chart, last 30 days, human vs. bot)\n* **Top pages**, **Top bots**, **Top human user agents**\n* **Recent visits** with bot badges, paginated\n* Pre-tracking estimates included in totals with clear notes\n\n## IP Verification: Catching Spoofed Bots\n\nUser agent strings can be spoofed by anyone. A `curl` request with `-A \"GPTBot\"` would be counted as a real bot visit.\n\nSo I implemented **IP verification for OpenAI bots**:\n\n1. Downloaded the official IP ranges from [openai.com/gptbot.json](https://openai.com/gptbot.json), [searchbot.json](https://openai.com/searchbot.json), and [chatgpt-user.json](https://openai.com/chatgpt-user.json)\n2. Built a CIDR matching engine directly in the Worker that parses IP ranges into bitmasks for efficient lookup\n3. 
When a request claims to be GPTBot, OAI-SearchBot, or ChatGPT-User, the source IP (from Cloudflare's `cf-connecting-ip` header) is checked against the official ranges\n4. **If the IP doesn't match, the request is classified as human** (potential spoofer)\n\nThis means the bot counts on the analytics page are verified — only requests from OpenAI's actual infrastructure count as OpenAI bot traffic.\n\nI verified this works: a `curl` request from Turkey with the GPTBot user agent correctly shows up as a human visit, not a bot.\n\n## Hiding Content from AI Crawlers\n\nHere's an interesting twist. I added an \"About This Experiment\" box on the homepage explaining that the site is AI-generated. But I didn't want GPTBot to read it and potentially use it to discount the content.\n\nThe solution: **render it client-side only**.\n\nThe HTML source contains an empty `<div>` and a `<script>` tag with the content Base64-encoded. Browsers decode and display it normally. Bots that don't execute JavaScript see nothing — and even bots that parse JS source code can't read Base64 without decoding it.\n\nVerified with `curl` as GPTBot: zero matches for \"experiment\", \"AI Slop\", or \"Metehan\" in the response.\n\n## Cost Breakdown\n\n| Item                                 | Cost                       |\n| ------------------------------------ | -------------------------- |\n| OpenAI API (gpt-4.1-nano, 60k pages) | ~$10                       |\n| Cloudflare Workers                   | $0 (free tier)             |\n| Cloudflare D1                        | $0 (free tier)             |\n| Domain (stateglobe.com)              | ~$10/year                  |\n| OG image generation                  | $0 (on-demand, no storage) |\n| **Total**                            | **~$20**                   |\n\nThe entire 60,000-page website costs about $20 to create and run.\n\n## A Warning About Programmatic SEO\n\nI want to be honest about this: **don't do what I did and expect to rank.**\n\nThe economics are 
seductive. 60,000 pages for $10, hosted for free. But Google has seen this playbook a thousand times. Here's what typically happens:\n\n1. **Week 1-2**: Pages get indexed, you see impressions in Search Console\n2. **Week 3-4**: Traffic trickles in, you feel validated\n3. **Month 2-3**: Google's quality systems catch up. Pages start getting deindexed. Traffic drops off a cliff\n4. **Month 4+**: Most pages are removed from the index entirely\n\nI've watched this cycle happen to dozens of programmatic SEO projects. The content quality bar keeps rising, and mass-generated pages, even well-structured ones with proper schema, sitemaps, and internal linking, simply don't pass muster anymore.\n\n**This site will almost certainly get deindexed.** That's not a failure. It's the expected outcome. The experiment was never about ranking. It was about observing the ecosystem.\n\n![](/wp-content/uploads/screenshot-at-mar-04-12-44-07.png)\n\n## Key Takeaways\n\n1. **GPTBot is more aggressive than Googlebot.** On a fresh domain with zero authority, GPTBot made 470x more requests than Googlebot over the same tracking window. It found and crawled the site almost immediately with no prompting.\n2. **Bot traffic dwarfs human traffic.** Bots made up ~98% of all requests. If you're not tracking server-side with user agent analysis, you're blind to what's actually happening on your site.\n3. **User agent verification matters.** Without IP verification, anyone can spoof bot traffic. OpenAI publishes its IP ranges; use them.\n4. **The barrier to creating massive websites is zero.** 60,000 pages, $10, 30 minutes. This is simultaneously impressive and terrifying for the health of the open web.\n5. **Programmatic SEO for traffic alone is a dead end.** Google will catch it. Build real value or don't bother.\n6. **Cloudflare Workers + D1 is a solid stack** for this kind of project. Edge rendering, serverless SQLite, free tier that handles thousands of requests. 
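\n\nAs one concrete illustration of that stack, here is a minimal, hypothetical sketch of the render-and-track pattern described above. The `pages`/`visits` schema and the `DB` binding name are assumptions for illustration, not StateGlobe's actual code:\n\n```js
// Minimal sketch: render a page from D1 and log the visit at the edge.
// Table and column names are illustrative, not StateGlobe's real schema.
const worker = {
  async fetch(request, env, ctx) {
    const slug = new URL(request.url).pathname.slice(1);
    // Look up the page row in D1 (serverless SQLite)
    const row = await env.DB.prepare('SELECT title, body FROM pages WHERE slug = ?')
      .bind(slug)
      .first();
    if (!row) return new Response('Not found', { status: 404 });
    // Fire-and-forget analytics write; does not delay the response
    ctx.waitUntil(
      env.DB.prepare('INSERT INTO visits (slug, user_agent) VALUES (?, ?)')
        .bind(slug, request.headers.get('user-agent') ?? '')
        .run()
    );
    return new Response(`<h1>${row.title}</h1>${row.body}`, {
      headers: { 'content-type': 'text/html; charset=utf-8' },
    });
  },
};

export default worker;
```\n\n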
No servers to manage.\n\n## Follow the Experiment\n\n* **Live site**: [stateglobe.com](https://stateglobe.com)\n* **Public analytics** (real-time bot & human traffic): [stateglobe.com/analytics](https://stateglobe.com/analytics)\n* **LinkedIn**: [linkedin.com/in/metehanyesilyurt](https://www.linkedin.com/in/metehanyesilyurt)\n* **X (Twitter)**: [x.com/metehan777](https://x.com/metehan777)\n\nThe site is still live and GPTBot is still crawling. Check the analytics page to see the latest numbers.\n\n*This website is intentionally another piece of AI Slop. Created as an experiment to observe how AI crawlers interact with programmatically generated content at scale. If you're thinking about doing this for real SEO: don't. Build something genuinely useful instead.*",
  "author": {
    "@type": "Person",
    "name": "Metehan Yesilyurt",
    "url": "https://metehan.ai",
    "sameAs": [
      "https://x.com/metehan777",
      "https://www.linkedin.com/in/metehanyesilyurt",
      "https://github.com/metehan777"
    ]
  },
  "publisher": {
    "@type": "Person",
    "name": "Metehan Yesilyurt",
    "url": "https://metehan.ai"
  },
  "alternateFormat": {
    "html": "https://metehan.ai/blog/i-built-a-60-000-page-ai-website-for-10-gptbot-crawled-it-30-000-times-in-12-hours/",
    "json": "https://metehan.ai/api/post/i-built-a-60-000-page-ai-website-for-10-gptbot-crawled-it-30-000-times-in-12-hours.json",
    "rss": "https://metehan.ai/rss.xml"
  }
}