May 6, 2026 · 10 min read · Crawlantix

Tarpit, Block, or Shadowban: 6 Ways to Respond to AI Crawlers

Not every AI bot deserves the same response. Here's a strategy guide to the 6 response types available for dealing with AI web crawlers on WordPress.

Once you know which AI bots are crawling your WordPress site, the next question is: what do you do about them?

The AI web crawler landscape has changed dramatically. In 2024, most sites dealt with a handful of identifiable bots like GPTBot and ClaudeBot. By 2026, there are over 60 distinct AI bot user agents actively crawling the web — plus an unknown number of disguised crawlers using standard browser user-agent strings.

Blocking every bot with a 403 is the obvious answer, but it’s not always the best one. Different crawlers have different intents, and the optimal response depends on what you’re trying to achieve. Do you want to deny access entirely? Waste the crawler’s resources? Poison its training data? Or just slow it down?

AI Bot Tracker offers six distinct response strategies, each serving a different purpose. Here’s when and why to use each one.

Start With robots.txt

Before choosing a response strategy, make sure your robots.txt file is configured correctly. robots.txt is the baseline — it tells well-behaved web crawlers which parts of your site they’re allowed to access.

For compliant crawlers like GPTBot and ClaudeBot, and for control tokens like Google-Extended (a robots.txt token rather than a separate crawler), a robots.txt Disallow rule is often sufficient. The problem is that robots.txt doesn’t stop all AI bots. Some crawlers ignore it, and disguised bots don’t even check it. That’s where active response strategies come in: they handle the bots that robots.txt can’t reach.

Think of it as layered defense: robots.txt handles the compliant crawlers, and response strategies handle everything else.
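
As a minimal sketch of that baseline, the rules below disallow the three tokens mentioned in this section and check them with Python's standard urllib.robotparser. The example.com URL and the SomeOtherBot name are placeholders, and real crawlers do their own parsing, so treat this as an illustration of the rule format rather than a guarantee of how any bot behaves.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt body: one group per AI token, each disallowing the whole site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(agent_token: str, url: str = "https://example.com/blog/post") -> bool:
    """Return True if a compliant crawler using this token may fetch the URL."""
    return parser.can_fetch(agent_token, url)


for token in ("GPTBot", "ClaudeBot", "Google-Extended", "SomeOtherBot"):
    print(token, "allowed" if is_allowed(token) else "disallowed")
```

Any token without a matching group, and with no wildcard group present, is allowed by default. That is why the list of disallowed tokens needs to be kept current.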

The 6 Response Strategies

1. Log Only (Free — Monitor Tier)

What it does: Records the bot visit silently. The web crawler receives your normal page content.

When to use it: When you want visibility without interference. Log Only is the right choice when you’re still learning about your bot traffic and don’t want to make blocking decisions yet.

It’s also the right strategy for bots you deliberately want to allow. If you want your content to appear in AI search results from Perplexity or to be cited in Claude’s responses, letting those bots crawl normally — while logging the visits — keeps you informed without disrupting the relationship.

Downside: The bot gets your content. If your goal is to prevent AI training data collection, Log Only doesn’t accomplish that.

Server impact: None. Your site responds normally.

2. Block 403 (Protect Tier)

What it does: Returns an HTTP 403 Forbidden response. The web crawler receives no content.

When to use it: When you want a clear, unambiguous denial. A 403 tells the bot “you’re not welcome here.” Well-behaved crawlers will typically stop retrying after receiving repeated 403 responses.

This is the most straightforward response strategy and the right default for bots you want to keep out — especially aggressive crawlers like Bytespider that may consume significant bandwidth if left unchecked.

Downside: The bot knows it’s been blocked. Sophisticated crawlers may switch to a different user-agent or IP address and try again. The 403 response is also fast — the bot can quickly check hundreds of URLs and move on, which doesn’t cost it much.

Server impact: Minimal. A 403 response is a few hundred bytes — far less than serving a full page.

3. Tarpit (Protect Tier)

What it does: Accepts the connection but sends data back at an extremely slow drip rate. The response takes minutes or hours to complete, keeping the bot’s connection tied up.

When to use it: When you want to waste the web crawler’s resources. Tarpitting is the digital equivalent of putting a telemarketer on hold. The bot is stuck waiting for a response that never meaningfully arrives, which ties up its connections and slows its overall crawling speed.

This is particularly effective against aggressive crawlers that make hundreds or thousands of requests per day. Each tarpitted request costs the bot time and connection resources. If a bot is making 500 requests/day and each one gets tarpitted for 10 minutes, that’s about 83 hours of connection time tied up per day, summed across the crawler’s parallel connections. That is a meaningful operational cost for the crawler operator.
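
The arithmetic behind that estimate is short enough to check directly. The function name is only illustrative. Note that the result is total connection time across all of the crawler's parallel connections, which is how it can exceed 24 hours in a single day.

```python
def wasted_connection_hours(requests_per_day: int, tarpit_minutes: float) -> float:
    """Total connection time per day, in hours, that a crawler spends waiting."""
    return requests_per_day * tarpit_minutes / 60


hours = wasted_connection_hours(500, 10)
print(f"{hours:.1f} connection-hours per day")  # 500 * 10 / 60 = 83.3
```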

Downside: Tarpitting also uses a connection on your server. On shared hosting with limited PHP workers, tarpitting many requests simultaneously could affect your site’s performance for real visitors. On hosting plans with adequate resources, a small number of concurrent tarpits is rarely a practical concern.

Server impact: Moderate. Each tarpitted connection holds a PHP worker. Monitor your concurrent tarpits if you’re on a resource-constrained hosting plan.
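
To make the drip-feed idea concrete, here is a rough sketch of a streaming response body that releases one byte at a time. It is not how AI Bot Tracker implements tarpitting. The function name, the payload, and the default timings are assumptions chosen so that the defaults add up to the 10-minute example above (60 chunks at 10 seconds each). The sleep function is injectable so the demonstration finishes instantly.

```python
import time
from typing import Callable, Iterator


def tarpit_body(payload: bytes = b"<!doctype html>",
                delay_seconds: float = 10.0,
                max_chunks: int = 60,
                sleep: Callable[[float], None] = time.sleep) -> Iterator[bytes]:
    """Yield the payload one byte at a time, pausing before each byte.

    max_chunks caps how long one connection is held, so a tarpitted
    request cannot occupy a worker indefinitely.
    """
    for i in range(max_chunks):
        sleep(delay_seconds)
        pos = i % len(payload)  # cycle through the payload if it is shorter than max_chunks
        yield payload[pos:pos + 1]


# Demonstration with a fake sleep that records the delays instead of waiting.
waited = []
chunks = list(tarpit_body(max_chunks=5, sleep=waited.append))
print(chunks, sum(waited))
```

The cap matters for the reason given under Downside: every chunk that is still pending keeps one of your own workers busy as well.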

4. Rate Limit 429 (Protect Tier)

What it does: Returns an HTTP 429 Too Many Requests response with a Retry-After header suggesting when the bot should come back.

When to use it: When you don’t want to block the bot entirely but need to control how fast it crawls. Rate limiting is a diplomatic response — it says “slow down” rather than “go away.”

This works well for bots operated by companies you have a relationship with or bots that provide some value to your site. You’re setting boundaries without burning bridges — reducing the bandwidth impact while maintaining access.

Downside: The bot might not respect the Retry-After header. The 429 status code is a standard HTTP signal, but compliance is voluntary — just like robots.txt.

Server impact: Minimal. Similar to a 403 — a small response with no page content.
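
One way to decide when to send a 429 is a sliding-window counter per bot. The sketch below is an assumption about how such a limiter could look, not the plugin's actual logic. The class name and the limits are hypothetical. Retry-After is sent as a whole number of seconds, which is one of the two forms HTTP allows for that header.

```python
import math
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class CrawlRateLimiter:
    """Sliding-window limiter keyed by bot name (limits are illustrative)."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, bot: str, now: Optional[float] = None) -> Tuple[int, Dict[str, str]]:
        """Return (status, headers): 200 if under the limit, else 429 with Retry-After."""
        now = time.monotonic() if now is None else now
        hits = self._hits[bot]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) < self.max_requests:
            hits.append(now)
            return 200, {}
        # Seconds until the oldest counted request leaves the window, rounded up.
        retry_after = max(1, math.ceil(self.window_seconds - (now - hits[0])))
        return 429, {"Retry-After": str(retry_after)}


limiter = CrawlRateLimiter(max_requests=2, window_seconds=60)
print(limiter.check("ExampleBot", now=0.0))
print(limiter.check("ExampleBot", now=1.0))
print(limiter.check("ExampleBot", now=2.0))
```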

5. Decoy Content (Protect Tier)

What it does: Returns an HTTP 200 with fake, AI-generated placeholder content instead of your real page. The bot thinks it received legitimate content, but everything it collected is fabricated.

When to use it: When your primary concern is protecting the value of your content from being used in AI training datasets. Decoy content poisons the well — the web crawler collects data, but that data is worthless or actively misleading.

This is the most creative response option and is particularly appealing to publishers, journalists, and content creators who are concerned about their original work being used to train models without permission or compensation.

Downside: You’re serving content (even if it’s fake), which uses bandwidth and server resources. The decoy content needs to be plausible enough that the bot doesn’t immediately discard it, which means generating it has a computational cost.

Server impact: Similar to serving a normal page. More resource-intensive than a 403 or 429.
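
The sketch below shows only the shape of a decoy response: a 200 status with a fabricated body that stays the same for a given URL. It deliberately swaps the AI-generated text described above for a much simpler technique, seeded random filler words, so it would not pass as plausible content. The word list and function name are hypothetical.

```python
import random
from typing import Tuple

# Hypothetical filler vocabulary; a real decoy would need far more convincing text.
FILLER_WORDS = ("market", "update", "review", "season", "local", "report",
                "guide", "results", "planning", "community", "annual", "overview")


def decoy_page(url_path: str, word_count: int = 40) -> Tuple[int, str]:
    """Return (status, html) with fabricated filler text that is stable per URL."""
    rng = random.Random(url_path)  # string seed: the same path always yields the same text
    words = [rng.choice(FILLER_WORDS) for _ in range(word_count)]
    body = " ".join(words).capitalize() + "."
    return 200, f"<html><body><p>{body}</p></body></html>"


status, html = decoy_page("/blog/my-post")
print(status, len(html))
```

Seeding by URL means a bot that re-crawls a page sees identical text each time, which avoids the obvious tell of content that changes on every request.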

6. Shadowban (Protect Tier)

What it does: Returns an HTTP 200 OK response with an empty or minimal body. The bot’s request appears to succeed, but it gets nothing useful.

When to use it: When you want stealth. The bot doesn’t know it’s been detected. From its perspective, the request succeeded — it just happened to return very little content. This is useful against crawlers that adapt their behavior when they detect they’re being blocked.

Shadowbanning is also the lowest-conflict option. There’s no 403 to trigger retry logic, no tarpit consuming your server resources, and no decoy content to generate. It’s quiet, efficient, and hard for the bot to detect.

Downside: Some bots may flag pages with minimal content and re-crawl them later, hoping for a different result. If the crawler is persistent, you may want to upgrade to a more aggressive strategy.

Server impact: Minimal. An empty 200 response is lightweight.

Comparing Server Impact

Each strategy has different implications for your server resources. Here’s how they compare:

Strategy	Response Size	Connection Duration	CPU Usage	Best For
Log Only	Full page	Normal	Normal	Monitoring, allowed bots
Block 403	~200 bytes	Instant	Minimal	Clear denial
Tarpit	Drip feed	Minutes to hours	Low CPU, holds connection	Aggressive crawlers
Rate Limit 429	~200 bytes	Instant	Minimal	Diplomatic throttling
Decoy Content	Full page (fake)	Normal	Moderate	Content protection
Shadowban	~0 bytes	Instant	Minimal	Stealth denial

For most WordPress sites, the server impact of any strategy is negligible compared to normal traffic. The exception is tarpitting — if you’re tarpitting dozens of concurrent connections on shared hosting, monitor your PHP worker availability.
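
For the four strategies that answer immediately, the difference comes down to the status code and body that go back on the wire. The following is a sketch under assumptions: the strategy identifiers, the body text, and the one-hour Retry-After value are made up for illustration. Tarpit and Decoy Content are left out because they need streaming and content generation respectively.

```python
from typing import Dict, NamedTuple


class Response(NamedTuple):
    status: int
    headers: Dict[str, str]
    body: bytes


def build_response(strategy: str, real_page: bytes) -> Response:
    """Map an immediate-response strategy to a status code, headers, and body."""
    if strategy == "log_only":
        return Response(200, {}, real_page)            # bot receives the real content
    if strategy == "block_403":
        return Response(403, {}, b"Forbidden")         # explicit denial
    if strategy == "rate_limit_429":
        return Response(429, {"Retry-After": "3600"}, b"Too Many Requests")
    if strategy == "shadowban":
        return Response(200, {}, b"")                  # looks successful, carries nothing
    raise ValueError(f"unsupported strategy: {strategy}")


page = b"<html>" + b"x" * 50_000 + b"</html>"
for name in ("log_only", "block_403", "rate_limit_429", "shadowban"):
    resp = build_response(name, page)
    print(f"{name}: status={resp.status}, body={len(resp.body)} bytes")
```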

Decision Framework

Here’s a practical framework for choosing a response strategy based on your situation and goals:

Situation	Recommended Strategy
Still learning about your bot traffic	Log Only
Want to allow a specific bot (Perplexity, Applebot)	Log Only
Want to block a known bot definitively	Block 403
Dealing with an aggressive, high-volume crawler	Tarpit
Want to slow a bot down without blocking it	Rate Limit 429
Concerned about content being used for AI training	Decoy Content
Want to deny access without the bot knowing	Shadowban
Bot caught by honeypot trap	Tarpit or Block 403

In practice, two or three strategies cover most sites: Log Only for bots you want to allow, Block 403 or Shadowban as the default for bots you want to deny, and Tarpit for the especially aggressive ones.

Combining Strategies for Layered Defense

The most effective approach uses multiple strategies together:

robots.txt as the first layer: asks compliant AI crawlers like GPTBot and ClaudeBot to stay out.
Per-bot response rules as the second layer: assign specific strategies (block, tarpit, shadowban) to known AI bots based on their behavior.
Honeypot detection as the third layer: catches disguised crawlers that evade user-agent detection. When a bot trips the honeypot, it can be automatically blocked or tarpitted.
ai.txt and llms.txt as a policy declaration: state your preferences for AI systems that check these emerging conventions.

This layered approach handles both the bots you can identify and the ones trying to hide.
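
The honeypot layer can be pictured as a trap URL that no human visitor would request. Any client that fetches it is flagged, whatever user-agent it claims. The sketch below assumes that design. The article does not describe how AI Bot Tracker's honeypot works, and the trap path, class name, and strategy labels are all hypothetical.

```python
from typing import Set


class HoneypotDetector:
    """Flags any client IP that requests a hidden trap path (a sketch, not the plugin's logic)."""

    def __init__(self, trap_path: str = "/trap-do-not-follow/") -> None:
        self.trap_path = trap_path
        self.flagged_ips: Set[str] = set()

    def observe(self, ip: str, path: str) -> str:
        """Record the request and return the strategy to apply to it."""
        if path.startswith(self.trap_path):
            self.flagged_ips.add(ip)
        # Once flagged, every later request from that IP is denied as well.
        return "block_403" if ip in self.flagged_ips else "pass"


detector = HoneypotDetector()
print(detector.observe("203.0.113.7", "/blog/post/"))
print(detector.observe("203.0.113.7", "/trap-do-not-follow/archive"))
print(detector.observe("203.0.113.7", "/blog/post/"))
```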

Setting Up Response Strategies in WordPress

In AI Bot Tracker, you set a default response strategy that applies to all detected AI bots, then override it for specific bots or IP addresses using per-bot rules. This gives you the flexibility to treat each web crawler differently based on your assessment of its intent and value.

For example, you might configure:

Default: Shadowban (quiet denial for any AI bot)
PerplexityBot: Log Only (you want Perplexity citations)
Bytespider: Tarpit (penalize aggressive crawling)
Honeypot-triggered bots: Block 403 (caught disguised crawlers)
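
The example configuration above amounts to a default plus overrides. Here is a small sketch of how that lookup could resolve, with the strategy identifiers and function name invented for illustration. Giving the honeypot rule precedence over per-bot rules is an assumption, not documented plugin behavior.

```python
from typing import Optional

DEFAULT_STRATEGY = "shadowban"
PER_BOT_RULES = {
    "perplexitybot": "log_only",  # allowed: you want Perplexity citations
    "bytespider": "tarpit",       # penalize aggressive crawling
}
HONEYPOT_STRATEGY = "block_403"


def resolve_strategy(detected_bot: Optional[str], honeypot_triggered: bool = False) -> str:
    """Pick the response strategy for one request.

    detected_bot is the matched AI bot name, or None for ordinary traffic.
    """
    if honeypot_triggered:
        return HONEYPOT_STRATEGY          # assumed to win over any per-bot rule
    if detected_bot is None:
        return "serve_normally"           # not identified as an AI bot
    return PER_BOT_RULES.get(detected_bot.lower(), DEFAULT_STRATEGY)


for bot in ("PerplexityBot", "Bytespider", "GPTBot", None):
    print(bot, "->", resolve_strategy(bot))
print("disguised crawler ->", resolve_strategy(None, honeypot_triggered=True))
```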

The free Monitor tier includes Log Only. All six strategies are available on the Protect tier ($69/year) and above. For complete configuration details, see the response strategies documentation.

How to Tell If Your Strategy Is Working

After setting up response strategies, monitor your AI Bot Tracker dashboard for changes:

Blocked bots should show declining visit counts over time as they stop retrying.
Tarpitted bots may maintain visit counts but show reduced crawl volume per session.
Shadowbanned bots may continue visiting but collect no useful content.
New unknown bots appearing in your logs may be previously blocked bots trying a different user-agent — this is where honeypot detection catches them.
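
If you export visit logs, a week-over-week comparison is a simple way to check the first of these signals. The sketch assumes a hypothetical log format of (date, bot name) pairs. It does not reflect the plugin's dashboard or export format.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def weekly_change(visits: Iterable[Tuple[date, str]], bot: str, today: date) -> Tuple[int, int]:
    """Return (previous_week_count, last_week_count) for one bot.

    "Last week" is the 7 days before today; "previous week" is the 7 days before that.
    """
    last_start = today - timedelta(days=7)
    prev_start = today - timedelta(days=14)
    prev_count = last_count = 0
    for day, name in visits:
        if name != bot:
            continue
        if last_start <= day < today:
            last_count += 1
        elif prev_start <= day < last_start:
            prev_count += 1
    return prev_count, last_count


sample = [
    (date(2026, 5, 2), "Bytespider"),
    (date(2026, 5, 3), "Bytespider"),
    (date(2026, 5, 4), "Bytespider"),
    (date(2026, 5, 10), "Bytespider"),
    (date(2026, 5, 11), "GPTBot"),
]
prev, last = weekly_change(sample, "Bytespider", today=date(2026, 5, 15))
print(f"Bytespider: {prev} visits the week before, {last} last week")
```

A falling count for a blocked bot suggests it has stopped retrying. A flat count alongside new unknown user-agents is worth a closer look.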

If you’re on the Optimize tier, Crawl Analytics shows exactly which pages each bot is requesting and how often, giving you detailed data to refine your strategy. You can identify patterns like bots re-crawling the same pages daily or consuming disproportionate bandwidth.

The goal isn’t to win a war against every bot on the internet. It’s to make deliberate choices about which AI systems access your content, enforce those choices effectively, and spend as little of your own server resources doing it as possible.
