# llms.txt vs robots.txt
If you've only just started making your site AI-ready, you've
probably noticed something confusing: there's a file called llms.txt
that goes in the same place as robots.txt and looks vaguely similar,
yet every guide insists you need both. This short post explains why.
## The one-line summary
| | robots.txt | llms.txt |
|---|---|---|
| Role | Fence | Tour guide |
| Question it answers | "Where are you allowed to look?" | "What is worth reading, in what order?" |
| Audience | Crawlers (Googlebot, Bingbot, agent crawlers) | The models themselves |
| Format | Directives (User-agent:, Disallow:, Allow:, Sitemap:) | Markdown-flavored plain text |
| Spec age | ~1994 | ~2024 |
Both live at the root of your domain and share the .txt extension.
That's where the similarity ends.
## What robots.txt does
robots.txt is the original web crawler standard: the Robots Exclusion
Protocol, proposed in 1994 and standardized as RFC 9309 in 2022. It
tells any automated client which paths on your site it may or may not
fetch:
```
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /tmp/

User-agent: GPTBot
Disallow: /paid/

Sitemap: https://yourdomain.com/sitemap.xml
```
This is gatekeeping. It doesn't tell a crawler what's interesting. It tells the crawler what's off-limits. The crawler still has to discover what to read on its own — usually via the sitemap, internal navigation, or its memory of past crawls.
robots.txt is now also where the major AI crawlers expect to be
addressed by name: GPTBot, Google-Extended, ClaudeBot, CCBot,
PerplexityBot. You can block them, allow them, or get specific about
which paths each one can read.
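If you want to verify what a given bot may fetch under your rules,
Python's standard library ships a robots.txt parser. A minimal
sketch, assuming the placeholder domain from the example above:

```python
# Check which paths named crawlers may fetch under robots.txt.
# Standard library only; domain, bots, and paths are placeholders.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://yourdomain.com/robots.txt")
rp.read()  # fetch and parse the live file

for agent in ("GPTBot", "ClaudeBot", "Googlebot"):
    for path in ("/", "/admin/", "/paid/"):
        allowed = rp.can_fetch(agent, f"https://yourdomain.com{path}")
        print(f"{agent:10} {path:8} allowed={allowed}")
```

Running this against your live file catches a typo in a Disallow rule
before a crawler does.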
## What llms.txt does
llms.txt is a much newer convention, popularized through 2024 and
2025 and now codified at llmstxt.org. It doesn't do gatekeeping; it
does curation.
```
# Acme Bicycles

> Premium city bikes shipped from Stockholm. We build to order with a
> 4-week lead time and ship across the EU.

## Start here

- [How to pick the right frame size](https://acmebikes.com/sizing):
  Our fitting guide, with measurement instructions.
- [Models in stock](https://acmebikes.com/models): The five frames we
  currently offer.
- [Shipping & lead times](https://acmebikes.com/shipping): Country-by-country
  delivery, customs, returns.

## Reference

- [Care guide](https://acmebikes.com/care): How to maintain a city
  bike — quarterly checks.
- [FAQ](https://acmebikes.com/faq): The top 20 questions we get from
  first-time buyers.

## Optional

- [About us](https://acmebikes.com/about)
- [Contact](https://acmebikes.com/contact)
```
This is guidance. It tells a model: "you have a limited reading budget. Here are the URLs that actually answer most of the questions you'll get about us. Read them in this order; here's why each one matters."
A modern LLM with web access — ChatGPT, Claude, Perplexity, Gemini —
fetches a handful of URLs per session. llms.txt is how you control
which handful.
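Because the format is plain Markdown, tooling can consume it as
easily as a model can. A rough sketch of parsing an llms.txt into
sections of (title, url, note) entries; the regex assumes the
conventional "## Section" and "- [title](url): note" layout shown
above rather than a formal grammar:

```python
# Parse an llms.txt into {section: [(title, url, note), ...]}.
# Sketch only: assumes the layout shown above, not a formal grammar.
import re
from urllib.request import urlopen

text = urlopen("https://acmebikes.com/llms.txt").read().decode("utf-8")

LINK = re.compile(r"-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<note>.*))?")

sections: dict[str, list[tuple[str, str, str]]] = {}
current = "(preamble)"

for line in text.splitlines():
    if line.startswith("## "):             # a new curation section
        current = line[3:].strip()
    elif (m := LINK.match(line.strip())):  # a curated link entry
        sections.setdefault(current, []).append(
            (m["title"], m["url"], (m["note"] or "").strip())
        )

for name, entries in sections.items():
    print(name, "->", [title for title, _, _ in entries])
```

A consumer that respects the section order gets exactly the priority
signal the file is meant to carry.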
## Why you need both
Neither a block-only nor a curation-only setup works on its own.
robots.txt without llms.txt is the 2024 baseline. An AI crawler
can read whatever isn't blocked, but it has no priority signal.
It guesses. Sometimes it guesses well; usually it doesn't, and the
model summarizing your site for a user picks the wrong canonical
pages.
llms.txt without robots.txt is a curation surface with no
gates. Anything the curated list doesn't mention is still
fetchable. That's fine for marketing sites (everything's
public) but a footgun for sites with staging environments, members-only
content, or large auto-generated catalog pages you'd rather agents
ignore.
Both is the 2026 baseline. robots.txt controls access.
llms.txt curates priority. They work together, exactly like a fence
plus a tour guide.
## A working example: both files
robots.txt:
```
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /staging/
Disallow: /internal/

User-agent: ClaudeBot
Disallow: /staging/
Disallow: /internal/

Sitemap: https://yourdomain.com/sitemap.xml
```
llms.txt:
```
# Your Brand

> One-paragraph description of what you do and who you serve.

## Start here

- [Product or service overview](https://yourdomain.com/products): What
  you actually sell.
- [Pricing](https://yourdomain.com/pricing): Plans, tiers, terms.

## Reference

- [Documentation](https://yourdomain.com/docs): Technical reference for
  developers.
- [FAQ](https://yourdomain.com/faq): Top questions and answers.

## Optional

- [About](https://yourdomain.com/about)
- [Contact](https://yourdomain.com/contact)
```
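One check worth automating once both files exist: every URL you
curate in llms.txt should actually be fetchable under your robots.txt
rules for the bots you care about. A minimal sketch reusing the
standard-library parser from earlier (the domain and bot names are
placeholders):

```python
# Flag llms.txt entries that robots.txt blocks for a given AI crawler.
# Illustrative: placeholder domain and bot names, stdlib only.
import re
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

BASE = "https://yourdomain.com"

rp = RobotFileParser(f"{BASE}/robots.txt")
rp.read()

text = urlopen(f"{BASE}/llms.txt").read().decode("utf-8")
urls = re.findall(r"\]\((https?://[^)]+)\)", text)  # every curated link

for bot in ("GPTBot", "ClaudeBot"):
    blocked = [u for u in urls if not rp.can_fetch(bot, u)]
    if blocked:
        print(f"{bot} is blocked from: {blocked}")
    else:
        print(f"{bot}: all {len(urls)} curated URLs are fetchable")
```

If this prints a blocked list, either open the path in robots.txt or
drop the entry from llms.txt; a curated URL a bot can't read is worse
than no entry at all.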
Five minutes of work per file for a small site. For larger sites
(catalog, blog, documentation tree), llms.txt is better generated
from the real DOM, which is the job the BridgeToAgent kit does in
under two minutes for $49.