May 11, 2026 · Crawlantix

How to Control Which AI Bots Can Crawl Your WordPress Site

Your content, your rules. Learn five practical methods to decide which AI crawlers get access to your WordPress site, from robots.txt to honeypot detection.

AI companies are sending crawlers to your site right now. GPTBot, ClaudeBot, Bytespider, CCBot — there are over 60 distinct AI bot user agents active in 2026, and most WordPress site owners have no idea which ones are visiting or how often.

The good news: you can control this. The question is which method actually works — and how to decide which bots to allow and which to block.

Not All AI Bots Are Equal

Before choosing a blocking method, decide which crawlers you actually want to stop. Not every AI bot deserves the same treatment.

Bots you might want to allow:

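PerplexityBot: if your content appears in Perplexity’s AI search results, that’s free referral traffic with citations
Applebot-Extended: a robots.txt control token rather than a separate crawler; it governs whether content fetched by Applebot can be used for Apple’s AI models, including Apple Intelligence features on iPhone and Mac
Google-Extended: likewise a robots.txt control token, not a user-agent string you will see in requests; it governs use of your content for Gemini, and blocking it might affect future Google AI integrations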

Bots you probably want to block:

Bytespider — ByteDance’s aggressive crawler, known for high-volume requests and spotty robots.txt compliance
CCBot — feeds the Common Crawl corpus that dozens of AI companies use for training
Meta-ExternalAgent — collects training data for Meta’s Llama models

Bots you definitely want to catch:

Disguised crawlers — bots using standard Chrome or Firefox user-agent strings to evade detection. These can only be caught through behavioral methods like honeypot traps.

The right approach isn’t “block everything” or “allow everything” — it’s making per-bot decisions based on your site’s goals and the value exchange each crawler offers.
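One way to keep these per-bot decisions explicit is to record them in a small policy table and generate the robots.txt groups from it. The sketch below is illustrative only: the POLICY mapping and the build_robots_txt helper are hypothetical names, and the allow/block choices simply mirror the examples above rather than a recommendation for your site.

```python
# Hypothetical policy table: True = allow, False = block.
POLICY = {
    "PerplexityBot": True,
    "Bytespider": False,
    "CCBot": False,
    "Meta-ExternalAgent": False,
}


def build_robots_txt(policy):
    """Render one robots.txt group per bot from an allow/block mapping."""
    groups = []
    for agent, allowed in policy.items():
        # An empty Disallow value permits everything; "/" blocks the whole site.
        rule = "Disallow:" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(POLICY)
print(robots_txt)
```

Disguised crawlers cannot be listed this way, since they do not announce a distinctive user agent.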

Method 1: robots.txt

The first thing most site owners try is adding directives to their robots.txt file:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /

This is a reasonable starting point, but it has a fundamental limitation: robots.txt is voluntary. It’s a suggestion, not an instruction. Crawlers that respect it will comply. Crawlers that don’t will ignore it completely and crawl your content anyway.

Based on detection data from AI Bot Tracker installations, roughly 13% of known AI bot user agents have been observed disregarding robots.txt directives. For the bots that do respect it, you’re relying on each AI company’s goodwill — and their interpretation of your rules.

robots.txt also requires you to know the user-agent string of every bot you want to block. New AI crawlers appear regularly, and your blocklist is always playing catch-up.
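Before relying on a robots.txt file, it can help to confirm that the rules say what you think they say. The sketch below uses Python’s standard urllib.robotparser to evaluate the directives shown above; the example.com URL and the check_agents helper are placeholders. It only shows what a compliant crawler would conclude, not what a non-compliant one will actually do.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /
"""


def check_agents(robots_txt, agents, url="https://example.com/some-post/"):
    """Return {agent: may_fetch} as a rule-following crawler would read the file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = check_agents(
    ROBOTS_TXT, ["GPTBot", "ClaudeBot", "Bytespider", "PerplexityBot"]
)
for agent, may_fetch in result.items():
    print(f"{agent}: {'allowed' if may_fetch else 'disallowed'}")
```

PerplexityBot comes back as allowed because no group names it and there is no wildcard group, which illustrates the catch-up problem: any bot you have not listed is permitted by default.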

Method 2: .htaccess Rules

On Apache servers, you can block bots at the server level using .htaccess:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|Bytespider) [NC]
RewriteRule .* - [F,L]

This is more enforceable than robots.txt because it doesn’t rely on the bot’s cooperation — the server rejects the request before WordPress even loads. But it has its own downsides:

It only works on Apache and .htaccess-compatible servers such as LiteSpeed; Nginx ignores .htaccess and needs equivalent rules in its own server config
You still need to maintain a list of user-agent strings manually
Sophisticated bots can rotate or spoof their user-agent
You get no analytics; blocked requests receive a 403 Forbidden response and show up only in the raw server access log
Mistakes in .htaccess syntax can break your entire site
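To see why user-agent matching is easy to evade, here is a rough Python model of the same pattern. It is a sketch of the matching logic only, not of Apache itself: would_be_blocked is a hypothetical helper, and both sample user-agent strings are illustrative.

```python
import re

# Same alternation as the RewriteCond pattern; re.IGNORECASE plays the role of [NC].
BLOCKED_AGENTS = re.compile(r"(GPTBot|ClaudeBot|Bytespider)", re.IGNORECASE)


def would_be_blocked(user_agent):
    """Return True if the user agent matches the blocklist pattern anywhere."""
    return BLOCKED_AGENTS.search(user_agent) is not None


# Illustrative strings: one bot that identifies itself, one posing as Chrome.
honest = "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"
spoofed = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

print(would_be_blocked(honest))
print(would_be_blocked(spoofed))
```

The self-identifying request matches and the Chrome-style one does not, so a crawler that changes its user agent passes straight through this kind of rule.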

Method 3: Cloudflare or CDN Rules

If your site sits behind Cloudflare, you can create custom WAF rules (formerly called firewall rules) that block traffic based on user-agent, ASN (network), or other request properties. This is effective and handles traffic before it even reaches your server.

The downside is complexity. Cloudflare rules require careful configuration, and overly aggressive rules can accidentally block legitimate traffic. You also need a Cloudflare account and DNS configuration, which may be more infrastructure than a typical WordPress site owner wants to manage.
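Part of that complexity is writing the rule expression by hand. As a rough sketch, the helper below builds a user-agent expression you could paste into a Cloudflare rule. It assumes the http.user_agent field and the contains operator from Cloudflare’s Rules language; build_cloudflare_expression is a hypothetical name, and you should check the expression against Cloudflare’s current documentation before deploying it.

```python
def build_cloudflare_expression(agents):
    """Join one user-agent check per bot with 'or'."""
    # Assumption: `contains` is a case-sensitive substring match, so the
    # names must be spelled the way each bot actually sends them.
    clauses = [f'(http.user_agent contains "{agent}")' for agent in agents]
    return " or ".join(clauses)


expression = build_cloudflare_expression(["GPTBot", "ClaudeBot", "Bytespider"])
print(expression)
```

Generating the expression from a list keeps the rule in sync with whatever blocklist you maintain elsewhere, but it inherits the same weakness as any user-agent match.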

Method 4: WordPress Plugin Detection

A WordPress-native approach gives you detection, analytics, and response options without leaving your admin dashboard.

AI Bot Tracker takes this approach. It installs as a standard WordPress plugin and starts detecting AI crawlers immediately. Instead of maintaining blocklists manually, it identifies over 60 known AI bot user agents automatically and logs every visit with the bot identity, timestamp, and page requested.

What makes it different from the methods above is the honeypot detection layer. AI Bot Tracker injects a hidden link into your pages — invisible to human visitors but visible to bots that parse raw HTML. When a bot follows this link, it proves it’s crawling beyond what robots.txt allows, and you can respond accordingly.
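The general honeypot idea can be sketched in a few lines. This is a simplified illustration of the technique, not AI Bot Tracker’s actual implementation: HONEYPOT_PATH, HoneypotTracker, and the sample IP addresses are all hypothetical.

```python
from dataclasses import dataclass, field

HONEYPOT_PATH = "/wp-trap-do-not-follow/"  # hypothetical hidden-link target


@dataclass
class HoneypotTracker:
    """Remember which clients have requested the trap URL."""

    flagged: set = field(default_factory=set)

    def record(self, client_ip, path):
        # Humans never see the hidden link, so any hit on it is suspicious,
        # whatever user agent the client claims to be.
        if path.rstrip("/") == HONEYPOT_PATH.rstrip("/"):
            self.flagged.add(client_ip)

    def is_flagged(self, client_ip):
        return client_ip in self.flagged


tracker = HoneypotTracker()
tracker.record("203.0.113.7", "/wp-trap-do-not-follow/")  # followed the trap
tracker.record("198.51.100.2", "/blog/hello-world/")      # ordinary page view

print(tracker.is_flagged("203.0.113.7"))
print(tracker.is_flagged("198.51.100.2"))
```

Because the check keys on behavior rather than on the user-agent string, it also catches crawlers that present themselves as an ordinary browser.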

Response options on paid tiers include blocking, tarpitting, rate limiting, serving decoy content, or shadowbanning — each serving a different strategic purpose depending on the bot’s behavior and your goals.

Method 5: Policy Standards (ai.txt and llms.txt)

Two emerging proposals (neither is a formal web standard yet) let you declare AI-specific policies beyond what robots.txt can express:

ai.txt declares per-bot permissions for training, summarization, and attribution
llms.txt provides a curated content guide so AI systems know which pages matter most

These are complementary to blocking methods — they communicate your preferences to AI systems that check for them. Adoption is growing but not universal, so treat them as a policy layer on top of enforcement, not a replacement for it.
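For a sense of what an llms.txt file contains, the sketch below generates one following the layout of the public llms.txt proposal as I understand it: a title heading, a one-line summary in a blockquote, then sections of annotated links. The build_llms_txt helper, site name, and URLs are placeholders, and the format may change as the proposal evolves.

```python
def build_llms_txt(site_name, summary, sections):
    """Render an llms.txt document from {section heading: [(title, url, note)]}."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines)


llms_txt = build_llms_txt(
    "Example Site",
    "Guides for running a WordPress site.",
    {
        "Guides": [
            ("Setup guide", "https://example.com/setup/", "How to install"),
            ("Bot policy", "https://example.com/bots/", "Which crawlers we allow"),
        ]
    },
)
print(llms_txt)
```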

Which Method Should You Use?

In practice, most sites benefit from layering these approaches:

robots.txt as a baseline — it handles the well-behaved bots
ai.txt and llms.txt as policy declarations — for bots that check these emerging standards
A detection plugin for visibility — you can’t control what you can’t see
Active response for bots that ignore robots.txt — honeypot detection catches them, response strategies deal with them

The order in which you act matters too. Start by understanding what’s actually hitting your site before deciding how to respond. Many site owners are surprised to discover how many AI bots are already crawling their content, and which ones are ignoring their robots.txt entirely.

Monitoring After Setup

Setting up bot controls isn’t a one-time task. The AI crawler landscape changes constantly — new bots appear, existing ones change behavior, and some companies launch secondary crawlers under different user-agent strings.

After configuring your initial response strategies:

Check your dashboard weekly for new bots or changes in crawl patterns
Watch honeypot hits — a spike usually means a new disguised crawler has appeared
Review your allowed list — bots you chose to allow might change their crawl behavior or terms of service
Monitor bandwidth impact — crawl volumes fluctuate, especially around major AI model training cycles
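If you also have access to raw server logs, a quick tally of AI bot hits makes a useful cross-check on what your dashboard shows. The sketch below assumes the common "combined" access-log format, where the user agent is the last quoted field; the KNOWN_AI_BOTS list and the sample lines are illustrative, not a complete registry.

```python
import re
from collections import Counter

KNOWN_AI_BOTS = ["GPTBot", "ClaudeBot", "Bytespider", "CCBot", "PerplexityBot"]

# In the combined log format the user agent is the last double-quoted field.
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_ai_bot_hits(log_lines):
    """Count requests per known AI bot, matching names case-insensitively."""
    hits = Counter()
    for line in log_lines:
        match = USER_AGENT_FIELD.search(line)
        if not match:
            continue
        user_agent = match.group(1).lower()
        for bot in KNOWN_AI_BOTS:
            if bot.lower() in user_agent:
                hits[bot] += 1
                break
    return hits


sample_log = [
    '203.0.113.7 - - [11/May/2026:10:00:00 +0000] "GET /post-a/ HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '203.0.113.7 - - [11/May/2026:10:00:05 +0000] "GET /post-b/ HTTP/1.1" 200 4810 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '198.51.100.2 - - [11/May/2026:10:01:00 +0000] "GET / HTTP/1.1" 200 9001 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '192.0.2.44 - - [11/May/2026:10:02:00 +0000] "GET / HTTP/1.1" 200 9001 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/124.0.0.0"',
]

hits = count_ai_bot_hits(sample_log)
for bot, count in hits.most_common():
    print(f"{bot}: {count}")
```

Like any user-agent match, this only counts bots that identify themselves; disguised crawlers still need behavioral detection.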

AI Bot Tracker’s dashboard gives you this ongoing visibility. The free Monitor tier is enough to start — upgrade to Protect when you’re ready to take action against the bots you’ve identified. See the complete setup guide for detailed configuration instructions.
