
llms.txt vs robots.txt

If you've only just started looking at making your site AI-ready, you've probably noticed something confusing: there's a file called llms.txt that goes in the same place as robots.txt and looks vaguely similar — but every guide insists you need both.

This short post explains why.


The one-line summary

|  | robots.txt | llms.txt |
|---|---|---|
| Role | Fence | Tour guide |
| Question it answers | "Where are you allowed to look?" | "What is worth reading, in what order?" |
| Audience | Crawlers (Googlebot, Bingbot, agent crawlers) | The models themselves |
| Format | Directives (User-agent:, Disallow:, Allow:, Sitemap:) | Markdown-flavored plain text |
| Spec age | ~1994 | ~2024 |

They live at the same path layer (root of your domain) and share the .txt extension. That's where the similarity ends.


What robots.txt does

robots.txt is the original web crawler standard. It tells any automated client which paths on your site it's allowed or forbidden to fetch:

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /tmp/

User-agent: GPTBot
Disallow: /paid/

Sitemap: https://yourdomain.com/sitemap.xml

This is gatekeeping. It doesn't tell a crawler what's interesting. It tells the crawler what's off-limits. The crawler still has to discover what to read on its own — usually via the sitemap, internal navigation, or its memory of past crawls.

robots.txt is now also where the major AI crawlers expect to be addressed by name: GPTBot, Google-Extended, ClaudeBot, CCBot, PerplexityBot. You can block them, allow them, or get specific about which paths each one can read.


What llms.txt does

llms.txt is a much newer convention — popularized through 2024 and 2025 and now codified at llmstxt.org. It doesn't do gatekeeping. It does curation.

# Acme Bicycles

> Premium city bikes shipped from Stockholm. We build to order with a
> 4-week lead time and ship across the EU.

## Start here

- [How to pick the right frame size](https://acmebikes.com/sizing):
  Our fitting guide, with measurement instructions.
- [Models in stock](https://acmebikes.com/models): The five frames we
  currently offer.
- [Shipping & lead times](https://acmebikes.com/shipping): Country-by-country
  delivery, customs, returns.

## Reference

- [Care guide](https://acmebikes.com/care): How to maintain a city
  bike — quarterly checks.
- [FAQ](https://acmebikes.com/faq): The top 20 questions we get from
  first-time buyers.

## Optional

- [About us](https://acmebikes.com/about)
- [Contact](https://acmebikes.com/contact)

This is guidance. It tells a model: "you have limited reading budget. Here are the URLs that actually answer most of the questions you'll get about us. Read them in this order, here's why each one matters."

A modern LLM with web access — ChatGPT, Claude, Perplexity, Gemini — fetches a handful of URLs per session. llms.txt is how you control which handful.
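Because the format is plain Markdown, an agent-side consumer needs very little machinery to turn it into a prioritized reading list. A hypothetical parser sketch (the file contents are an abbreviated copy of the Acme example above; there is no standard library for llms.txt, so the regex and structure here are assumptions based on the llmstxt.org layout):

```python
import re

LLMS_TXT = """\
# Acme Bicycles

> Premium city bikes shipped from Stockholm.

## Start here

- [How to pick the right frame size](https://acmebikes.com/sizing): Our fitting guide.
- [Models in stock](https://acmebikes.com/models): The five frames we offer.

## Optional

- [About us](https://acmebikes.com/about)
"""

def parse_llms_txt(text):
    """Return {section name: [(link title, url), ...]} in document order."""
    sections, current = {}, None
    link = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)")
    for line in text.splitlines():
        if line.startswith("## "):          # a new H2 opens a section
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            m = link.match(line.strip())    # collect bullet links under it
            if m:
                sections[current].append((m["title"], m["url"]))
    return sections

for section, links in parse_llms_txt(LLMS_TXT).items():
    print(section, "->", [url for _, url in links])
```

The section names double as the priority signal: an agent can read "Start here" exhaustively and treat "Optional" as skippable when its fetch budget runs out.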


Why you need both

Neither file works alone: a block-only setup and a curation-only setup each leave a gap.

robots.txt without llms.txt is the 2024 baseline. An AI crawler can read whatever isn't blocked, but it has no priority signal. It guesses. Sometimes it guesses well — usually it doesn't — and the model summarizing your site to a user picks the wrong canonical pages.

llms.txt without robots.txt is a curation surface with no gates. Anything the curated list doesn't mention is still fetchable. That's fine for marketing sites (everything's public) but a footgun for sites with staging environments, members-only content, or large auto-generated catalog pages you'd rather agents ignore.

Both is the 2026 baseline. robots.txt controls access. llms.txt curates priority. They work together, exactly like a fence plus a tour guide.


A working example, both files

robots.txt:

User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /staging/
Disallow: /internal/

User-agent: ClaudeBot
Disallow: /staging/
Disallow: /internal/

Sitemap: https://yourdomain.com/sitemap.xml

llms.txt:

# Your Brand

> One-paragraph description of what you do and who you serve.

## Start here

- [Product or service overview](https://yourdomain.com/products): What
  you actually sell.
- [Pricing](https://yourdomain.com/pricing): Plans, tiers, terms.

## Reference

- [Documentation](https://yourdomain.com/docs): Technical reference for
  developers.
- [FAQ](https://yourdomain.com/faq): Top questions and answers.

## Optional

- [About](https://yourdomain.com/about)
- [Contact](https://yourdomain.com/contact)

Five minutes of work per file for a small site. For larger sites (catalog, blog, documentation tree), llms.txt is better generated from the real DOM — which is the job the BridgeToAgent kit does in under two minutes for $49.
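Because the two files are maintained separately, they can drift: llms.txt may end up recommending a URL that robots.txt blocks for the very crawler you're curating for. A sketch of a consistency check, with both example files inlined for illustration (the /internal/ link is a deliberately planted mistake):

```python
import re
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: GPTBot
Disallow: /staging/
Disallow: /internal/
"""

LLMS_TXT = """\
## Start here

- [Pricing](https://yourdomain.com/pricing): Plans, tiers, terms.
- [Internal notes](https://yourdomain.com/internal/notes): Should not be here.
"""

rp = RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Every absolute URL that llms.txt recommends...
urls = re.findall(r"\]\((https?://[^)]+)\)", LLMS_TXT)

# ...should be fetchable by the bot the curation is aimed at.
for url in urls:
    if not rp.can_fetch("GPTBot", url):
        print(f"WARNING: llms.txt links {url}, but robots.txt blocks it for GPTBot")
```

Running a check like this in CI keeps the fence and the tour guide telling the same story.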
