May 9, 2026 · 6 min read · Crawlantix

WordPress Honeypot Detection: How to Catch AI Crawlers That Ignore robots.txt

Honeypot traps plant invisible links that only crawlers will follow, which lets you catch bots that bypass robots.txt. Here's how the technique works and how to set it up on WordPress.

Not all AI bots respect robots.txt. Of the 60+ AI crawlers active in 2026, roughly 13% have been observed ignoring robots.txt directives — and an unknown number of disguised crawlers never check it at all. When a crawler ignores your disallow rules, you need a way to detect it — and that’s exactly what honeypot traps are designed to do.

Honeypot detection is one of the oldest techniques in web security, adapted here for the specific problem of AI crawler identification. The principle is simple, the implementation is lightweight, and the false positive rate is very low when the trap is set up carefully.

How Honeypot Detection Works

A honeypot trap works in three steps:

Step 1: Plant the bait. A hidden link is injected into your site’s HTML. This link points to a path that doesn’t appear in your navigation, sitemap, or any visible content. It’s rendered invisible to human visitors using CSS (e.g., display: none, off-screen positioning, or zero-opacity styling). Off-screen and zero-opacity links remain reachable by keyboard and screen readers, so they should also carry aria-hidden="true" and tabindex="-1".

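Step 2: Wait for bots. Human visitors never see the link, so they never click it. But bots that parse raw HTML will find it and follow it, because most crawlers read the markup without applying CSS. To a crawler reading your page’s source code, the hidden link looks like any other link on the page. Crawlers that drive a full headless browser can honor the styling and may skip the link, so treat the honeypot as one strong signal rather than a complete detector.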

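Step 3: Catch the crawler. When a request arrives at the honeypot path, you can be nearly certain it came from an automated client rather than a person, because no ordinary visitor would reach a path that is invisible on the page. If the path is also listed under a Disallow rule in robots.txt, any crawler that requests it has additionally shown that it ignores your rules. The bot has revealed itself.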
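
The three steps above can be sketched in a few lines. This is a minimal, language-agnostic illustration in Python, not WordPress code: the function names and the /_ai-honeypot/ prefix are assumptions chosen for the example, and the request path is assumed to arrive without a query string.

```python
import secrets

HONEYPOT_PREFIX = "/_ai-honeypot/"  # hypothetical prefix, chosen for illustration


def make_honeypot_path() -> str:
    """Return a random path that appears nowhere else on the site."""
    return f"{HONEYPOT_PREFIX}{secrets.token_hex(3)}/"


def inject_hidden_link(html: str, path: str) -> str:
    """Step 1: insert an invisible link just before the first </body>."""
    bait = (
        f'<a href="{path}" style="display:none" '
        'aria-hidden="true" tabindex="-1">archive</a>'
    )
    if "</body>" in html:
        return html.replace("</body>", bait + "</body>", 1)
    return html + bait


def is_honeypot_hit(request_path: str, path: str) -> bool:
    """Step 3: a request for the bait path marks the client as a crawler."""
    return request_path.rstrip("/") == path.rstrip("/")
```

Step 2 needs no code: you simply wait until something requests the path.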

Why It Works So Well

The strength of honeypot detection is its very low false positive rate. Unlike user-agent matching (which relies on bots honestly identifying themselves) or rate limiting (which can accidentally catch fast-scrolling humans), honeypot traps only trigger when something follows an invisible link. A person browsing normally never sees that link, so a hit almost always means automation.

This makes it particularly effective against AI bots that:

Spoof their user-agent — some crawlers disguise themselves as regular browsers
Ignore robots.txt — bots that don’t check your disallow rules will happily follow hidden links
Use residential proxies — IP-based blocking is ineffective against bots that rotate through residential IP ranges

The honeypot doesn’t care what the bot calls itself or where it connects from. It catches behavior, not identity.

Setting Up Honeypot Detection on WordPress

You can implement a basic honeypot manually by creating a hidden page and logging requests to it. But maintaining this yourself means building the link injection, request logging, bot identification, and response logic from scratch.
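
To make the manual approach concrete, here is a rough sketch of the logging and response pieces as a Python WSGI middleware. It is not how any WordPress plugin is implemented. The names honeypot_middleware, site, and hits are hypothetical, the path is only an example value, and the in-memory list stands in for real storage.

```python
from datetime import datetime, timezone

HONEYPOT_PATH = "/_ai-honeypot/f8a3b1/"  # example path; use your own random value
hits = []  # in-memory log for the sketch; a real site would persist this


def honeypot_middleware(app):
    """Wrap a WSGI app so requests to HONEYPOT_PATH are logged and denied."""
    def wrapped(environ, start_response):
        if environ.get("PATH_INFO", "") == HONEYPOT_PATH:
            hits.append({
                "time": datetime.now(timezone.utc).isoformat(),
                "user_agent": environ.get("HTTP_USER_AGENT", ""),
                "ip": environ.get("REMOTE_ADDR", ""),
            })
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        # Every other request passes through to the real site untouched.
        return app(environ, start_response)
    return wrapped


def site(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


application = honeypot_middleware(site)
```

Even this small version leaves out link injection, bot identification, and persistent storage, which is where most of the maintenance effort goes.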

AI Bot Tracker handles all of this automatically:

Install and activate the plugin from the WordPress plugin directory
The honeypot deploys immediately — a hidden link is injected into your pages pointing to a randomly generated path (e.g., /_ai-honeypot/f8a3b1/)
Every bot that follows the link is logged with its user-agent, IP hash, timestamp, and the URL it was crawling when it found the honeypot

No configuration is needed. The honeypot is active the moment you activate the plugin.
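
For a sense of what such a log entry might contain, here is a hedged Python sketch. The plugin's actual storage format and hashing scheme are not described here, so the salted SHA-256 hash, the HoneypotHit fields, and the use of the Referer header for the source URL are all assumptions.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

SITE_SALT = b"replace-with-a-secret-per-site-value"  # hypothetical secret


@dataclass
class HoneypotHit:
    user_agent: str
    ip_hash: str
    timestamp: str
    source_url: str  # page the bot was on, taken from the Referer header if sent


def hash_ip(ip: str, salt: bytes = SITE_SALT) -> str:
    """Salted SHA-256 so the raw address is never stored."""
    return hashlib.sha256(salt + ip.encode("utf-8")).hexdigest()


def record_hit(user_agent: str, ip: str, referer: str = "") -> HoneypotHit:
    """Build one log entry for a request that reached the honeypot path."""
    return HoneypotHit(
        user_agent=user_agent,
        ip_hash=hash_ip(ip),
        timestamp=datetime.now(timezone.utc).isoformat(),
        source_url=referer,
    )
```

Hashing the address lets you count repeat visits from the same client without keeping the IP itself.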

What Happens After a Bot Is Caught

On the free Monitor tier, honeypot hits are logged and displayed in your dashboard. You can see which bots are following hidden links, how often, and which pages they were crawling.

On the Protect tier and above, you can configure what happens when a bot hits the honeypot. Each of these response strategies serves a different purpose:

Block 403 — immediately deny the request
Tarpit — send data at an extremely slow drip rate, tying up the crawler’s connection and wasting its resources
Rate Limit 429 — tell the bot to slow down with a Retry-After header
Decoy Content — serve fake content so the crawler thinks it got real data
Shadowban — return a 200 OK with empty or minimal content so the bot doesn’t know it’s been detected
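
In HTTP terms, the strategies above differ mainly in status code, headers, and how the body is delivered. The Python sketch below is an illustration only: the strategy keys, the one-hour Retry-After value, the placeholder decoy text, and the five-second tarpit delay are assumptions, not the plugin's settings.

```python
import time
from typing import Dict, Iterator, Tuple


def build_response(strategy: str) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a caught bot."""
    if strategy == "block":
        return 403, {}, b"Forbidden"
    if strategy == "rate_limit":
        # Retry-After is given in seconds; one hour is an arbitrary example.
        return 429, {"Retry-After": "3600"}, b"Too Many Requests"
    if strategy == "decoy":
        return 200, {"Content-Type": "text/html"}, b"<p>Placeholder text.</p>"
    if strategy == "shadowban":
        return 200, {"Content-Type": "text/html"}, b""
    raise ValueError(f"unknown strategy: {strategy}")


def tarpit_chunks(body: bytes, delay_seconds: float = 5.0) -> Iterator[bytes]:
    """Tarpit: yield one byte at a time with a pause before each byte."""
    for i in range(len(body)):
        time.sleep(delay_seconds)
        yield body[i:i + 1]
```

A tarpit holds a server connection open for as long as the bot stays connected, so it costs you resources too. Cap the number of concurrent tarpitted connections if you build one yourself.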

You can also enable auto-blocking, which automatically applies your chosen response strategy to any bot that hits the honeypot. Unknown bots are blocked on the first hit. Known bots get a configurable threshold (default 3 hits) before being blocked — just in case a legitimate crawler accidentally follows the link once.
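
The threshold rule can be expressed as a small counter. This Python sketch assumes a known bot is blocked on its third hit. The text does not say whether the block lands on the third hit or the one after, so treat that boundary as an assumption. The AutoBlocker class and bot identifiers are hypothetical.

```python
from collections import Counter

KNOWN_BOT_THRESHOLD = 3  # default from the text; configurable


class AutoBlocker:
    def __init__(self, known_bots, threshold=KNOWN_BOT_THRESHOLD):
        self.known_bots = set(known_bots)
        self.threshold = threshold
        self.hit_counts = Counter()

    def register_hit(self, bot_id: str) -> bool:
        """Record a honeypot hit; return True if the bot should now be blocked."""
        self.hit_counts[bot_id] += 1
        if bot_id not in self.known_bots:
            return True  # unknown bots are blocked on the first hit
        return self.hit_counts[bot_id] >= self.threshold


blocker = AutoBlocker(known_bots={"Bytespider"})
blocker.register_hit("mystery-scraper")  # True: unknown, blocked at once
blocker.register_hit("Bytespider")       # False: first of three allowed hits
```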

Advanced: Custom Honeypot Paths

The default auto-generated honeypot path works well for most sites. But on the Protect tier and above, you can configure up to 5 custom honeypot paths. This is useful for:

Testing detection coverage — place honeypots in different sections of your site to see where bots are most active
Targeting specific behavior — create paths that mimic common content patterns (e.g., /wp-content/premium-article/) to attract crawlers looking for high-value content
Multiple detection layers — more honeypot paths means more opportunities to catch sneaky crawlers
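
If you track several honeypot paths yourself, counting hits per path shows where bots are most active. The sketch below is a hypothetical Python helper: /blog/archive-2019/ is a made-up path, and the five-path cap simply mirrors the limit mentioned above.

```python
from collections import Counter

MAX_CUSTOM_PATHS = 5  # limit stated in the text


def validate_paths(paths):
    """Normalize trailing slashes, drop duplicates, and enforce the cap."""
    unique = list(dict.fromkeys(p if p.endswith("/") else p + "/" for p in paths))
    if len(unique) > MAX_CUSTOM_PATHS:
        raise ValueError(f"at most {MAX_CUSTOM_PATHS} custom paths are allowed")
    return unique


def hits_per_path(request_paths, honeypot_paths):
    """Count requests that landed on each honeypot path."""
    traps = set(honeypot_paths)
    return Counter(p for p in request_paths if p in traps)


paths = validate_paths(["/wp-content/premium-article/", "/blog/archive-2019"])
log = ["/blog/archive-2019/", "/", "/blog/archive-2019/"]
counts = hits_per_path(log, paths)  # only honeypot paths are counted
```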

Which Bots Get Caught?

Based on aggregated honeypot data from AI Bot Tracker installations, the most common bots caught by honeypot traps fall into three categories:

Disguised crawlers: bots using standard browser user-agent strings (Chrome, Firefox) that would be invisible to user-agent detection alone. These are the highest-value catches because user-agent matching cannot identify them at all.
Non-compliant known bots: crawlers like Bytespider that sometimes ignore robots.txt Disallow rules. When the honeypot path is disallowed in robots.txt, a hit provides concrete evidence of non-compliance.
Unknown scrapers: new bots that haven’t been added to any detection database yet. Honeypots catch them by behavior, regardless of identity.
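
As a rough illustration, the three categories can be told apart from the user-agent string of a request that has already hit the honeypot. In this Python sketch the token lists are illustrative assumptions, not a real detection database, and known bots are checked first because their user-agent strings often contain browser tokens too.

```python
KNOWN_BOT_TOKENS = ("Bytespider",)  # illustrative; extend with your own list
BROWSER_TOKENS = ("Chrome/", "Firefox/", "Safari/")


def classify_hit(user_agent: str) -> str:
    """Sort a honeypot hit into one of the three categories by user-agent."""
    ua = user_agent or ""
    if any(token in ua for token in KNOWN_BOT_TOKENS):
        return "non-compliant known bot"
    if any(token in ua for token in BROWSER_TOKENS):
        # Claims to be a browser, yet followed a link no person can see.
        return "disguised crawler"
    return "unknown scraper"
```

The classification only makes sense after the honeypot hit. A browser user-agent on its own says nothing, and it is the combination with a hidden-link request that marks the client as disguised.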

The honeypot is particularly valuable for catching the first category. As more AI companies train models using data collected through disguised crawlers and residential proxy networks, behavioral detection becomes one of the few reliable defenses against these hidden crawlers.

Honeypot + robots.txt: Complementary Tools

Honeypot detection doesn’t replace robots.txt; it complements it. Think of it as a detection layer within a broader bot management strategy:

robots.txt tells bots what they shouldn’t crawl (honor system)
ai.txt declares AI usage preferences, and llms.txt points language models to curated content (both are emerging proposals, not ratified standards)
Honeypot traps catch bots that crawl anyway (detection)
Response strategies deal with caught bots (enforcement)

Together, these layers give you policy declaration, violation detection, and enforcement. Use robots.txt to set your boundaries. Use honeypot detection to find out who’s crossing them. And use response strategies to decide what happens to violators — from logging to tarpitting to shadowbanning.
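
One way to see how the two layers fit together is to disallow the honeypot prefix in robots.txt and check what a compliant crawler would do. The Python sketch below uses the standard library's robots.txt parser. The /_ai-honeypot/ prefix and the ExampleBot name are example values, and whether any given plugin writes this rule for you is not something this sketch assumes.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content that fences off the honeypot prefix.
ROBOTS_TXT = """\
User-agent: *
Disallow: /_ai-honeypot/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def compliant_bot_may_fetch(user_agent: str, path: str) -> bool:
    """What a crawler that honors robots.txt would conclude for this path."""
    return parser.can_fetch(user_agent, path)


# A compliant crawler skips the honeypot but may read normal pages.
compliant_bot_may_fetch("ExampleBot", "/_ai-honeypot/f8a3b1/")
compliant_bot_may_fetch("ExampleBot", "/blog/hello-world/")
```

With the rule in place, a compliant crawler never requests the honeypot, so any hit comes from a client that either ignored robots.txt or never read it.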

For the full details on configuring honeypot detection, see the documentation.

Try AI Bot Tracker — Free on WordPress.org

Detect, monitor, and respond to AI crawlers on your WordPress site. Full bot detection is free forever.
