robots.txt was designed in 1994 for search engine crawlers. It works reasonably well for that purpose, but it was never built to handle the nuances of AI interaction — training data collection, content summarization, attribution requirements, or model-specific permissions.
Two new standards are emerging to fill that gap: ai.txt and llms.txt. Both aim to give website owners more granular control over how AI systems interact with their content. They’re complementary to robots.txt, not replacements for it.
llms.txt: A Machine-Readable Content Guide
llms.txt is the more mature of the two standards. Proposed in late 2024 and gaining adoption through 2025–2026, it’s designed to help large language models understand and interact with your site’s content more effectively.
What It Does
An llms.txt file sits at the root of your domain (e.g., https://example.com/llms.txt) and provides structured information about your site that LLMs can use to:
- Understand what your site is about
- Know which content is most important or authoritative
- Find documentation, FAQs, and key pages
- Respect your preferences for content usage
Think of llms.txt as a curated table of contents for AI systems. Instead of letting a web crawler decide which pages matter by crawling your entire site, you tell it directly.
Format
llms.txt uses a simple Markdown format:
# Site Name
> Brief description of what this site is about.
## Docs
- [Getting Started](/docs/getting-started): How to install and configure
- [API Reference](/docs/api): REST API documentation
## Blog
- [Latest Posts](/blog): Technical articles and updates
## Optional
- [Terms of Service](/terms)
- [Privacy Policy](/privacy)
The file is human-readable and machine-parseable. It serves as a curated guide to your content — telling AI systems where to look and what matters.
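To show why the format is easy for machines to use, here is a minimal Python sketch of how a consumer might parse the example above into a title, a summary, and sections of links. It assumes only the conventions shown in that example: one # title, > summary lines, ## section headings, and list items of the form "- [text](url): note". The names parse_llms_txt, core_links, and LlmsTxt are hypothetical. Real files can contain extra prose, which this sketch ignores. The sketch also reflects one detail of the llms.txt proposal: a section titled Optional holds links that a consumer may skip when it needs a shorter context.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Link text](url)" with an optional ": note" after it.
LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?$")


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # Section name -> list of (link text, url, note) tuples, in file order.
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Hypothetical parser covering only the structure shown in the example."""
    doc = LlmsTxt()
    current = None  # name of the "##" section we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith(">") and current is None:
            # Join multi-line blockquotes into a single summary string.
            part = line.lstrip(">").strip()
            doc.summary = f"{doc.summary} {part}".strip()
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                label, url, note = match.groups()
                doc.sections[current].append((label, url, note or ""))
    return doc


def core_links(doc: LlmsTxt) -> list:
    """All links except those under "Optional", which may be skipped for brevity."""
    return [link for name, links in doc.sections.items()
            if name != "Optional" for link in links]


SAMPLE = """\
# Site Name
> Brief description of what this site is about.
## Docs
- [Getting Started](/docs/getting-started): How to install and configure
- [API Reference](/docs/api): REST API documentation
## Blog
- [Latest Posts](/blog): Technical articles and updates
## Optional
- [Terms of Service](/terms)
- [Privacy Policy](/privacy)
"""

if __name__ == "__main__":
    doc = parse_llms_txt(SAMPLE)
    print(doc.title, "|", doc.summary)
    for label, url, _ in core_links(doc):
        print(label, "->", url)
```

A consumer that is short on context could use core_links and ignore the Terms of Service and Privacy Policy entries. That is why the example places those pages under Optional.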
Current Adoption
llms.txt has seen meaningful adoption among developer-focused sites, documentation platforms, and SaaS products. Major AI companies including Anthropic and Perplexity have stated they read llms.txt when available.
The standard is still evolving. There’s no RFC or formal specification yet, but the community convention is well-established and the format is stable enough for production use.
ai.txt: Permission and Policy Declaration
ai.txt takes a different approach. While llms.txt is about guiding AI systems to your content, ai.txt is about declaring policies — what AI systems are and aren’t allowed to do with your content.
What It Does
An ai.txt file declares your site’s policy on AI usage:
- Whether AI systems may use your content for training
- Whether AI-generated summaries or excerpts are permitted
- Attribution requirements
- Which specific AI systems are allowed or denied
- Contact information for licensing inquiries
Format
ai.txt uses a structured format inspired by robots.txt but with AI-specific directives:
# ai.txt — AI Usage Policy for example.com
User-Agent: *
Allow-Training: No
Allow-Summarization: Yes
Attribution-Required: Yes
Contact: ai-licensing@example.com
User-Agent: GPTBot
Allow-Training: No
User-Agent: PerplexityBot
Allow-Summarization: Yes
Attribution-Required: Yes
This lets you set per-bot policies beyond simple allow/deny. You might allow Perplexity to summarize your content (with attribution) while denying OpenAI permission to use it for training — a nuance that robots.txt simply can’t express.
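ai.txt has no settled specification yet, so it is still an open question how a crawler should combine the User-Agent: * group with a bot-specific group. The Python sketch below assumes one reasonable reading. Directives in a bot's own group override the wildcard defaults one directive at a time, and any directive that is never declared counts as not granted. The function names and both merge rules are assumptions for illustration, not part of any published standard.

```python
def parse_ai_txt(text: str) -> dict:
    """Group directives by lower-cased User-Agent value.

    Assumes robots.txt-like syntax: '#' starts a comment, and each
    'User-Agent:' line opens a new group of 'Field: value' directives.
    """
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue  # blank lines, comments, or malformed lines
        key, value = (part.strip() for part in line.split(":", 1))
        if key.lower() == "user-agent":
            current = groups.setdefault(value.lower(), {})
        elif current is not None:
            current[key.lower()] = value
    return groups


def policy_for(groups: dict, bot: str) -> dict:
    """Assumed merge rule: the bot's own group overrides '*' per directive."""
    merged = dict(groups.get("*", {}))
    merged.update(groups.get(bot.lower(), {}))
    return merged


def is_permitted(groups: dict, bot: str, directive: str) -> bool:
    # Assumption: a directive that is never declared is treated as "not granted".
    return policy_for(groups, bot).get(directive.lower(), "no").lower() == "yes"


SAMPLE_AI_TXT = """\
# AI usage policy for example.com
User-Agent: *
Allow-Training: No
Allow-Summarization: Yes
Attribution-Required: Yes
Contact: ai-licensing@example.com

User-Agent: GPTBot
Allow-Training: No

User-Agent: PerplexityBot
Allow-Summarization: Yes
Attribution-Required: Yes
"""

if __name__ == "__main__":
    groups = parse_ai_txt(SAMPLE_AI_TXT)
    for bot in ("GPTBot", "PerplexityBot", "SomeOtherBot"):
        print(bot,
              "training:", is_permitted(groups, bot, "Allow-Training"),
              "summarization:", is_permitted(groups, bot, "Allow-Summarization"))
```

The merge rule matters. robots.txt works differently: a crawler obeys only the group that matches its own token and ignores the * group. Under that stricter reading, GPTBot in this example would inherit no summarization permission at all. Until a spec settles the question, it is safest to repeat any directive you care about in each bot-specific group.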
Current Adoption
ai.txt is newer and has less adoption than llms.txt. It’s being discussed in web standards communities and some AI companies have signaled interest in reading it, but it’s not yet widely supported by major AI crawlers. Think of it as an emerging standard worth implementing now so you’re ready when adoption grows.
How They Differ From robots.txt
| Feature | robots.txt | llms.txt | ai.txt |
|---|---|---|---|
| Purpose | Crawl access control | Content guide for LLMs | AI usage policy |
| Scope | Crawling only | Content understanding | Training, summarization, attribution |
| Format | Line-based field: value directives | Markdown sections | Structured directives |
| Enforcement | Voluntary (honor system) | Informational | Voluntary (honor system) |
| Granularity | Allow/deny per path | Curated content links | Per-bot, per-action policies |
| Age | 1994 | 2024 | 2025 |
The key difference is intent. robots.txt says “don’t crawl here.” llms.txt says “here’s what’s important.” ai.txt says “here’s what you’re allowed to do with my content.”
Together, the three files give AI systems a complete picture: what they can access, what content matters most, and what they’re permitted to do with it.
Should You Implement Them?
llms.txt: Yes. It’s low effort (a single text file), has meaningful adoption, and helps AI systems interact with your content more intelligently. If you want your documentation or key pages to be well-represented in AI responses, llms.txt guides the AI to the right content rather than leaving it to crawl randomly.
ai.txt: Worth doing. Even though adoption is still growing, declaring your AI usage policy is a proactive move. It takes minutes to create and establishes your position on AI content usage. As more AI companies begin respecting ai.txt, you’ll already be covered.
robots.txt: Keep it. These new standards are additive, not replacements. Your robots.txt Disallow rules still handle the basics of crawl access control. Use all three together for layered coverage.
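As a reminder of what robots.txt already handles, the sketch below uses Python's standard-library urllib.robotparser to evaluate a hypothetical robots.txt. These rules block GPTBot everywhere and keep every crawler out of /wp-admin/. This is the check a compliant crawler runs before it fetches a page. It controls access only and says nothing about training or summarization, and that is the gap ai.txt is meant to fill.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot everywhere, everyone else from /wp-admin/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    for agent, url in [
        ("GPTBot", "https://example.com/blog/"),
        ("Googlebot", "https://example.com/blog/"),
        ("Googlebot", "https://example.com/wp-admin/"),
    ]:
        print(f"{agent:<10} {url:<32} allowed={parser.can_fetch(agent, url)}")
```

GPTBot is refused on every path. Googlebot can read /blog/ but not /wp-admin/. The answer is the same whether or not GPTBot's operator would have used the content for training, and that distinction is something only ai.txt can express.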
Implementation on WordPress
For WordPress sites, you can create both files as static files in your site’s root directory. Some plugins (including AI Bot Tracker) are adding support for generating and managing these files from the WordPress admin.
Manual Setup
The simplest approach is to create the files manually:
- Create llms.txt in your site’s root with a curated guide to your most important content: homepage, key service pages, documentation, and recent blog posts.
- Create ai.txt with your AI usage policy. Decide whether you allow training, summarization, or both, and whether you require attribution.
- Keep your existing robots.txt rules in place for crawl access control.
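If you would rather generate the two files than edit them by hand, a small script can render them from a curated list. This Python sketch follows the llms.txt layout and ai.txt directives shown earlier in this post. The function names, the example links, and the DOCROOT path are hypothetical. Point DOCROOT at whichever directory actually serves https://yoursite.com/robots.txt on your host.

```python
from pathlib import Path

# Hypothetical: the directory that serves your domain root (where robots.txt lives).
DOCROOT = Path("/var/www/html")


def render_llms_txt(site_name: str, description: str, sections: dict) -> str:
    """sections maps a heading to a list of (link text, url, note) tuples."""
    lines = [f"# {site_name}", f"> {description}"]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        for label, url, note in links:
            lines.append(f"- [{label}]({url})" + (f": {note}" if note else ""))
    return "\n".join(lines) + "\n"


def render_ai_txt(groups: dict) -> str:
    """groups maps a User-Agent value to a dict of directives (insertion order kept)."""
    out = []
    for agent, directives in groups.items():
        out.append(f"User-Agent: {agent}")
        out.extend(f"{key}: {value}" for key, value in directives.items())
        out.append("")  # blank line between groups; also ends the file with a newline
    return "\n".join(out)


def write_policy_files(docroot: Path, llms_txt: str, ai_txt: str) -> None:
    # Write to the site root, not wp-content/, so the files resolve at /llms.txt and /ai.txt.
    (docroot / "llms.txt").write_text(llms_txt, encoding="utf-8")
    (docroot / "ai.txt").write_text(ai_txt, encoding="utf-8")


if __name__ == "__main__":
    llms = render_llms_txt(
        "Example Co",
        "Consulting services and technical documentation.",
        {
            "Docs": [("Getting Started", "/docs/getting-started", "How to install and configure")],
            "Optional": [("Privacy Policy", "/privacy", "")],
        },
    )
    ai = render_ai_txt({"*": {"Allow-Training": "No",
                              "Allow-Summarization": "Yes",
                              "Attribution-Required": "Yes"}})
    print(llms)
    print(ai)
    # write_policy_files(DOCROOT, llms, ai)  # enable once DOCROOT is confirmed
```

Keeping the curated list in one place makes it easier to keep llms.txt short, which matters because its value comes from curation rather than coverage.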
Verifying Your Files
After creating the files, verify they’re accessible:
- Visit https://yoursite.com/llms.txt and https://yoursite.com/ai.txt directly in your browser
- Make sure your hosting or caching layer isn’t blocking .txt files in the root directory
- Check that the files aren’t being rewritten by your WordPress installation (some setups route all requests through index.php)
If your WordPress permalink settings or server configuration are intercepting these files, you may need to add a rewrite rule to serve them directly.
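You can script the browser check. The Python sketch below uses only the standard library to fetch a URL and flag the problems covered in this section and in the common mistakes that follow. It reports a non-200 status, a Content-Type other than text/plain, or a body that looks like an HTML page, which usually means WordPress or a theme handled the request instead of the static file. The function names and the checker's User-Agent string are hypothetical, and the HTML test is only a heuristic.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def find_problems(status: int, content_type: str, body: str) -> list:
    """Return human-readable problems with a fetched llms.txt or ai.txt."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    if not content_type.lower().startswith("text/plain"):
        problems.append(f"expected text/plain, got {content_type or 'no Content-Type'}")
    start = body.lstrip().lower()
    if start.startswith(("<!doctype", "<html")):
        # Heuristic: an HTML document here usually means a rewrite rule answered.
        problems.append("body looks like an HTML page, not a plain-text policy file")
    return problems


def check_url(url: str, timeout: float = 10.0) -> list:
    request = Request(url, headers={"User-Agent": "policy-file-check/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            body = response.read(4096).decode("utf-8", errors="replace")
            return find_problems(response.status,
                                 response.headers.get("Content-Type", ""), body)
    except HTTPError as err:  # 4xx/5xx responses raise instead of returning
        return [f"expected HTTP 200, got {err.code}"]
    except URLError as err:  # DNS failures, refused connections, TLS errors
        return [f"request failed: {err.reason}"]
    except TimeoutError:  # a slow response body after the connection succeeded
        return ["request timed out"]
```

Call check_url("https://yoursite.com/llms.txt") and check_url("https://yoursite.com/ai.txt"). An empty list means none of the checks fired.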
Common Mistakes
- Putting the files in /wp-content/ instead of the site root. AI systems look for llms.txt and ai.txt at the domain root, just like robots.txt.
- Using HTML formatting instead of plain text. Both files should be served as text/plain, not text/html.
- Listing every page on your site in llms.txt. The point is curation: link to your most important and authoritative content, not everything.
- Setting ai.txt policies but not enforcing them. ai.txt is a declaration, not enforcement. Pair it with actual blocking tools for bots that violate your policy.
How These Standards Fit Into Your Bot Management Strategy
llms.txt and ai.txt are policy layers — they declare your preferences. But declarations alone don’t stop non-compliant crawlers, just as robots.txt alone doesn’t stop all AI bots.
A complete AI bot management strategy combines:
- Policy declaration — robots.txt, ai.txt, and llms.txt tell compliant bots what’s allowed
- Detection — tracking which bots actually visit your site, whether they comply with your policies
- Enforcement — blocking, tarpitting, or shadowbanning bots that don’t respect your declared policies
- Behavioral detection — honeypot traps that catch disguised crawlers no policy file can reach
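The detection step can start with the access logs you already have. This Python sketch reads lines in the common "combined" log format. It counts requests from a short, non-exhaustive list of self-identified AI crawler tokens and flags requests that your robots.txt disallows for that bot. The token list, the regular expression, and the audit_log name are assumptions for illustration. The sketch only sees bots that announce themselves, which is why the behavioral layer is still needed.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# Example watch list of self-identifying AI crawler tokens (not exhaustive).
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot")

# Combined log format: the request line is the first quoted field, the user agent the last.
LOG_RE = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*".*"(?P<ua>[^"]*)"\s*$')

# Bots are expected to read these, even when the rest of the site is disallowed.
POLICY_FILES = {"/robots.txt", "/llms.txt", "/ai.txt"}


def audit_log(lines, robots: RobotFileParser, tokens=AI_BOT_TOKENS):
    """Return (hits per bot, robots.txt-disallowed requests per bot)."""
    hits, violations = Counter(), Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if not match:
            continue
        agent = match.group("ua").lower()
        bot = next((t for t in tokens if t.lower() in agent), None)
        if bot is None:
            continue  # not on the watch list (or hiding its identity)
        hits[bot] += 1
        path = match.group("path")
        if path not in POLICY_FILES and not robots.can_fetch(bot, path):
            violations[bot] += 1
    return hits, violations
```

To run it against your live rules, call robots.set_url("https://yoursite.com/robots.txt") and then robots.read() before passing robots in. Then feed audit_log the lines of your server's access log. A non-zero violation count is a candidate for the enforcement layer, not proof of bad intent, because a crawler may have cached an older robots.txt.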
Looking Ahead
The web is still figuring out how to handle AI. robots.txt worked for search engines because there was a clear mutual benefit — crawlers needed content, site owners wanted search traffic. The AI ecosystem is more complicated because the value exchange isn’t always clear.
llms.txt and ai.txt are early steps toward a more structured conversation between site owners and AI systems. They’re not perfect, and enforcement remains voluntary, but they move the conversation forward from “block or allow” to a more nuanced set of permissions and preferences.
As the number of AI bots continues to grow and the bandwidth impact becomes harder to ignore, having clear policies in place — declared through these standards and enforced through tools like AI Bot Tracker — puts you in the strongest position possible.