What Are ai.txt and llms.txt? New Standards for Controlling AI Access to Your Site

Two new web standards — ai.txt and llms.txt — are emerging to give site owners more control over how AI systems interact with their content. Here's how they work and how to implement them.

robots.txt was designed in 1994 for search engine crawlers. It works reasonably well for that purpose, but it was never built to handle the nuances of AI interaction — training data collection, content summarization, attribution requirements, or model-specific permissions.

Two new standards are emerging to fill that gap: ai.txt and llms.txt. Both aim to give website owners more granular control over how AI systems interact with their content. They’re complementary to robots.txt, not replacements for it.

llms.txt: A Machine-Readable Content Guide

llms.txt is the more mature of the two standards. Proposed in late 2024 and gaining adoption through 2025–2026, it’s designed to help large language models understand and interact with your site’s content more effectively.

What It Does

An llms.txt file sits at the root of your domain (e.g., https://example.com/llms.txt) and provides structured information about your site that LLMs can use to find your most important pages and understand what each one covers.

Think of llms.txt as a curated table of contents for AI systems. Instead of letting a web crawler decide which pages matter by crawling your entire site, you tell it directly.

Format

llms.txt uses a simple Markdown-like format:

# Site Name

> Brief description of what this site is about.

## Docs
- [Getting Started](/docs/getting-started): How to install and configure
- [API Reference](/docs/api): REST API documentation

## Blog
- [Latest Posts](/blog): Technical articles and updates

## Optional
- [Terms of Service](/terms)
- [Privacy Policy](/privacy)

The file is human-readable and machine-parseable. It serves as a curated guide to your content — telling AI systems where to look and what matters.
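To make "machine-parseable" concrete, here is a minimal sketch of how a consumer might read the format shown above. It assumes only the conventions visible in the sample (one `#` title, a `>` summary line, `##` sections, and `- [Title](url): note` links). The names `LlmsTxt` and `parse_llms_txt` are hypothetical and not part of any official tooling.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](url)" with an optional ": note" suffix.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<note>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # section name -> list of (title, url, note) tuples
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the simple Markdown-like layout into a small structure."""
    doc = LlmsTxt()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith("# "):
            doc.title = line[2:].strip()
        elif line.startswith(">"):
            doc.summary = line[1:].strip()
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    (match["title"], match["url"], match["note"] or "")
                )
    return doc


SAMPLE = """\
# Site Name

> Brief description of what this site is about.

## Docs
- [Getting Started](/docs/getting-started): How to install and configure
- [API Reference](/docs/api): REST API documentation

## Optional
- [Terms of Service](/terms)
"""

if __name__ == "__main__":
    parsed = parse_llms_txt(SAMPLE)
    for section, links in parsed.sections.items():
        print(section, [url for _, url, _ in links])
```

Lines that do not match one of these shapes are ignored, which keeps the sketch tolerant of free-form prose between sections.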

Current Adoption

llms.txt has seen meaningful adoption among developer-focused sites, documentation platforms, and SaaS products. Major AI companies including Anthropic and Perplexity have stated they read llms.txt when available.

The standard is still evolving. There’s no RFC or formal specification yet, but the community convention is well-established and the format is stable enough for production use.

ai.txt: Permission and Policy Declaration

ai.txt takes a different approach. While llms.txt is about guiding AI systems to your content, ai.txt is about declaring policies — what AI systems are and aren’t allowed to do with your content.

What It Does

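An ai.txt file declares your site’s policy on AI usage: whether AI systems may train on your content, whether they may summarize it, whether attribution is required, and whom to contact about licensing.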

Format

ai.txt uses a structured format inspired by robots.txt but with AI-specific directives:

# ai.txt — AI Usage Policy for example.com

User-Agent: *
Allow-Training: No
Allow-Summarization: Yes
Attribution-Required: Yes
Contact: ai-licensing@example.com

User-Agent: GPTBot
Allow-Training: No

User-Agent: PerplexityBot
Allow-Summarization: Yes
Attribution-Required: Yes

This lets you set per-bot policies beyond simple allow/deny. You might allow Perplexity to summarize your content (with attribution) while denying OpenAI permission to use it for training — a nuance that robots.txt simply can’t express.
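The per-bot policy lookup can be sketched in a few lines. Because ai.txt has no formal specification, the rules below are assumptions made for illustration: directive names are case-insensitive, a bot-specific group overrides the `*` group directive by directive, and a missing `Allow-*` directive is treated as "not allowed". The function names are hypothetical.

```python
def parse_ai_txt(text: str) -> dict[str, dict[str, str]]:
    """Group 'Key: Value' directives under the preceding User-Agent line."""
    policies: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not a directive.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key.lower() == "user-agent":
            current = policies.setdefault(value.lower(), {})
        elif current is not None:
            current[key.lower()] = value
    return policies


def policy_for(policies: dict, user_agent: str) -> dict[str, str]:
    """Assumed merge rule: start from '*' and let the named bot override."""
    merged = dict(policies.get("*", {}))
    merged.update(policies.get(user_agent.lower(), {}))
    return merged


def is_allowed(policies: dict, user_agent: str, action: str) -> bool:
    """True only when the merged policy says 'Yes' for Allow-<action>."""
    value = policy_for(policies, user_agent).get(f"allow-{action}", "")
    return value.lower() == "yes"


SAMPLE = """\
# AI usage policy for example.com
User-Agent: *
Allow-Training: No
Allow-Summarization: Yes
Attribution-Required: Yes
Contact: ai-licensing@example.com

User-Agent: GPTBot
Allow-Training: No

User-Agent: PerplexityBot
Allow-Summarization: Yes
Attribution-Required: Yes
"""

if __name__ == "__main__":
    rules = parse_ai_txt(SAMPLE)
    print(is_allowed(rules, "GPTBot", "training"))
    print(is_allowed(rules, "PerplexityBot", "summarization"))
```

Whether a real crawler merges groups this way or treats a bot-specific group as a complete replacement is not settled, so the safest ai.txt files repeat every directive in every group.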

Current Adoption

ai.txt is newer and has less adoption than llms.txt. It’s being discussed in web standards communities and some AI companies have signaled interest in reading it, but it’s not yet widely supported by major AI crawlers. Think of it as an emerging standard worth implementing now so you’re ready when adoption grows.

How They Differ From robots.txt

| Feature | robots.txt | llms.txt | ai.txt |
| --- | --- | --- | --- |
| Purpose | Crawl access control | Content guide for LLMs | AI usage policy |
| Scope | Crawling only | Content understanding | Training, summarization, attribution |
| Format | INI-style directives | Markdown sections | Structured directives |
| Enforcement | Voluntary (honor system) | Informational | Voluntary (honor system) |
| Granularity | Allow/deny per path | Curated content links | Per-bot, per-action policies |
| Introduced | 1994 | 2024 | 2025 |

The key difference is intent. robots.txt says “don’t crawl here.” llms.txt says “here’s what’s important.” ai.txt says “here’s what you’re allowed to do with my content.”

Together, the three files give AI systems a complete picture: what they can access, what content matters most, and what they’re permitted to do with it.

Should You Implement Them?

llms.txt: Yes. It’s low effort (a single text file), has meaningful adoption, and helps AI systems interact with your content more intelligently. If you want your documentation or key pages to be well-represented in AI responses, llms.txt guides the AI to the right content rather than leaving it to crawl randomly.

ai.txt: Worth doing. Even though adoption is still growing, declaring your AI usage policy is a proactive move. It takes minutes to create and establishes your position on AI content usage. As more AI companies begin respecting ai.txt, you’ll already be covered.

robots.txt: Keep it. These new standards are additive, not replacements. Your robots.txt Disallow rules still handle the basics of crawl access control. Use all three together for layered coverage.

Implementation on WordPress

For WordPress sites, you can create both files as static files in your site’s root directory. Some plugins (including AI Bot Tracker) are adding support for generating and managing these files from the WordPress admin.

Manual Setup

The simplest approach is to create the files manually:

  1. Create llms.txt in your site’s root with a curated guide to your most important content — homepage, key service pages, documentation, and recent blog posts.
  2. Create ai.txt with your AI usage policy — decide whether you allow training, summarization, or both, and whether you require attribution.
  3. Keep your existing robots.txt rules in place for crawl access control.
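If you would rather generate llms.txt from a list of pages than write it by hand, step 1 can be sketched as follows. This is a minimal illustration that emits the same layout as the format example earlier in the article. The function name `build_llms_txt` and the page data are hypothetical.

```python
def build_llms_txt(site_name: str, description: str, sections: dict) -> str:
    """Render a title, a summary line, and one '##' block per section.

    `sections` maps a heading to a list of (title, url, note) tuples;
    an empty note produces a bare link with no trailing description.
    """
    lines = [f"# {site_name}", "", f"> {description}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        for title, url, note in links:
            entry = f"- [{title}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")  # blank line between sections
    # Exactly one trailing newline, regardless of the last section.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    content = build_llms_txt(
        "Example",
        "Demo site.",
        {
            "Docs": [
                ("Getting Started", "/docs/getting-started",
                 "How to install and configure"),
            ],
            "Optional": [("Privacy Policy", "/privacy", "")],
        },
    )
    # Writes into the current directory; copy the file to your site root.
    with open("llms.txt", "w", encoding="utf-8") as handle:
        handle.write(content)
```

Keeping the page list in code or a small data file makes it easier to regenerate llms.txt whenever key pages are added or renamed.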

Verifying Your Files

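After creating the files, verify they’re accessible by requesting each one at your site root (for example, https://example.com/llms.txt and https://example.com/ai.txt) and confirming that the server returns the file’s contents rather than a 404 or an HTML page.

If your WordPress permalink settings or server configuration intercept requests for these files, you may need to add a rewrite rule to serve them directly.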
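One way to script the accessibility check is sketched below, using only the Python standard library. It requests each file from the site root and flags the failure modes described above: a non-200 status, an HTML page served in place of the text file (a common sign that the CMS intercepted the request), or an empty body. The function names and the `policy-file-check/0.1` user agent are made up for this example, and network errors such as DNS failures are deliberately left unhandled.

```python
import urllib.error
import urllib.request


def http_fetch(url: str, timeout: float = 10.0):
    """Return (status_code, content_type, body) for a GET request."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "policy-file-check/0.1"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", "replace")
            return response.status, response.headers.get("Content-Type", ""), body
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; report the status code.
        return err.code, (err.headers or {}).get("Content-Type", ""), ""


def check_policy_files(base_url, fetch=http_fetch,
                       names=("robots.txt", "llms.txt", "ai.txt")):
    """Map each file name to a list of problems (empty list means OK)."""
    results = {}
    for name in names:
        url = f"{base_url.rstrip('/')}/{name}"
        status, content_type, body = fetch(url)
        problems = []
        if status != 200:
            problems.append(f"HTTP {status}")
        elif ("html" in content_type.lower()
              or body.lstrip().lower().startswith(("<!doctype", "<html"))):
            problems.append("served as HTML (possibly a CMS fallback page)")
        elif not body.strip():
            problems.append("file is empty")
        results[name] = problems
    return results


if __name__ == "__main__":
    # Replace example.com with your own domain before running.
    for name, problems in check_policy_files("https://example.com").items():
        print(name, "OK" if not problems else "; ".join(problems))
```

The `fetch` parameter is injectable so the check can be exercised against canned responses without touching the network.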

Common Mistakes

How These Standards Fit Into Your Bot Management Strategy

llms.txt and ai.txt are policy layers — they declare your preferences. But declarations alone don’t stop non-compliant crawlers, just as robots.txt alone doesn’t stop all AI bots.

A complete AI bot management strategy combines:

  1. Policy declaration: robots.txt, ai.txt, and llms.txt tell compliant bots what’s allowed
  2. Detection: tracking which bots actually visit your site and whether they comply with your policies
  3. Enforcement: blocking, tarpitting, or shadowbanning bots that don’t respect your declared policies
  4. Behavioral detection: honeypot traps that catch disguised crawlers no policy file can reach
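The detection step can be illustrated with Python's standard `urllib.robotparser`: given your robots.txt rules and a list of observed requests, flag self-identified AI bots that fetched paths they were told to avoid. This is a sketch under assumptions: the `(user_agent, path)` tuples stand in for whatever your access log provides, `KNOWN_AI_BOTS` is a short illustrative list rather than a complete one, and the user-agent strings are shortened examples, not exact copies of what real crawlers send.

```python
from urllib.robotparser import RobotFileParser

# Illustrative list only; real deployments track many more tokens.
KNOWN_AI_BOTS = ("GPTBot", "PerplexityBot")


def find_violations(robots_txt: str, requests):
    """Return (bot, path) pairs where a known bot fetched a disallowed path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = []
    for user_agent, path in requests:
        # Identify the bot by a token embedded in its user-agent string.
        bot = next(
            (name for name in KNOWN_AI_BOTS if name.lower() in user_agent.lower()),
            None,
        )
        if bot and not parser.can_fetch(bot, path):
            violations.append((bot, path))
    return violations


ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

OBSERVED = [
    ("Mozilla/5.0 (compatible; GPTBot/1.2)", "/blog/post"),
    ("Mozilla/5.0 (compatible; PerplexityBot/1.0)", "/blog/post"),
    ("PerplexityBot/1.0", "/private/report"),
    ("Mozilla/5.0 (Windows NT 10.0) Chrome/120", "/private/report"),
]

if __name__ == "__main__":
    for bot, path in find_violations(ROBOTS, OBSERVED):
        print(f"{bot} requested disallowed path {path}")
```

A check like this only catches bots that announce themselves honestly. Crawlers that spoof a browser user agent pass straight through it, which is why the behavioral detection in step 4 is listed separately.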

Looking Ahead

The web is still figuring out how to handle AI. robots.txt worked for search engines because there was a clear mutual benefit — crawlers needed content, site owners wanted search traffic. The AI ecosystem is more complicated because the value exchange isn’t always clear.

llms.txt and ai.txt are early steps toward a more structured conversation between site owners and AI systems. They’re not perfect, and enforcement remains voluntary, but they move the conversation forward from “block or allow” to a more nuanced set of permissions and preferences.

As the number of AI bots continues to grow and the bandwidth impact becomes harder to ignore, having clear policies in place — declared through these standards and enforced through tools like AI Bot Tracker — puts you in the strongest position possible.
