What is llms.txt?
llms.txt is a plain-text file you place at the root of your domain — at
https://yourdomain.com/llms.txt — that gives large language models
(ChatGPT, Claude, Perplexity, Gemini and the rest) a curated index of the
content on your site that's worth reading.
If you've seen robots.txt before, the file format will feel
familiar. The intent is different:
| File | Audience | Purpose |
|---|---|---|
| robots.txt | Search-engine crawlers | Where you're allowed to look. Pure gatekeeping. |
| llms.txt | Large language models | What is worth reading, and in what order. A reading list. |
robots.txt is a fence. llms.txt is a tour guide.
Why it exists
A modern LLM, when asked a question that requires fresh information,
fetches a handful of URLs and reads them. It cannot, in practice, read
your entire site. It picks 3–20 URLs based on signals it can see: page
titles, internal anchors, sitemap entries, and increasingly the contents
of llms.txt.
If you don't have an llms.txt, the model picks whatever its
upstream crawler decided was important — often your homepage, often
nothing else. With llms.txt you tell it: "start here, then here, then
here. The first three are the ones that actually answer the question
you're asking."
For commercial sites this is the difference between being cited as the source of an answer and being summarized away.
The format
The format is intentionally tiny. There is no schema you have to learn. The community spec lives at llmstxt.org. The canonical structure looks like this:
```
# Brand or site name
> One-paragraph description of what this site is and who it's for.

## Section heading
- [Page title](https://yourdomain.com/path): Short summary of why a model should read this page.
- [Another page title](https://yourdomain.com/other): Short summary.

## Another section
- ...

## Optional
- ...
```
Three rules worth remembering:
- Plain text, Markdown-flavored. No YAML, no JSON, no XML.
- Headings and bullets are semantic. Models read the section names to decide which list to pull from.
- An "Optional" section at the end is a convention — list links you want included for completeness but that aren't high-priority.
What BridgeToAgent puts in your llms.txt
We generate llms.txt directly from the real DOM of your site, not from
an LLM guess. The structure we produce:
- A title + tagline scraped from the homepage `<title>` and primary `<meta name="description">`.
- A one-paragraph site description built from the homepage hero, with a fallback to the most-linked above-the-fold text block.
- A primary section of the most-traversed internal links from your sitemap.xml or, when no sitemap exists, your homepage navigation.
- A schema section listing the high-value Schema.org-typed pages agents can use directly (Product, Article, FAQPage, Organization, etc.).
- An Optional section with terms, privacy, contact, and other legal surfaces.
Every entry includes a short, model-readable summary so an LLM can decide which link in the list to fetch when it has budget for only one more fetch.
Generated kits are footer-tagged with a one-line attribution back to
bridgetoagent.com and a version field so
downstream consumers can detect which generator produced the file.
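For illustration, here is a sketch of the shape of a generated file. The brand, URLs, summaries, section names, and the exact footer format below are hypothetical placeholders, not literal generator output:

```
# Acme Outdoor Gear
> Direct-to-consumer hiking and camping equipment, with sizing guides and trail-tested reviews.

## Main
- [Tent buying guide](https://acme.example/guides/tents): Compares every tent in the catalog by season rating and weight.
- [Backpack fit guide](https://acme.example/guides/packs): How to measure torso length and pick a pack size.

## Structured data
- [Product catalog](https://acme.example/products): Product-typed pages with live pricing and availability.
- [FAQ](https://acme.example/faq): FAQPage-typed answers to shipping and returns questions.

## Optional
- [Terms of service](https://acme.example/terms)
- [Privacy policy](https://acme.example/privacy)
- [Contact](https://acme.example/contact)

<!-- generated by bridgetoagent.com | version: 0.0.0 (illustrative) -->
```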
Where to put it
The file must be served from the exact path /llms.txt at the root
of your domain. Not /static/llms.txt, not /files/llms.txt. Models
look at the root path only.
Three things must be true when an agent fetches it:
- The HTTP status is `200 OK`.
- The `Content-Type` is `text/plain` (or close: `text/markdown` and `text/plain; charset=utf-8` are both fine).
- The body is the file content. Not an HTML wrapper around it. This is the most common install mistake on platforms like Shopify or Wix: uploading to a "Pages" surface that renders the file as HTML.
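You can verify all three conditions before an agent ever does. A minimal sketch of such a pre-flight check, assuming Python 3 with only the standard library (the URL is a placeholder for your own domain):

```python
import urllib.request

URL = "https://yourdomain.com/llms.txt"  # placeholder: substitute your domain

def check_llms_txt(url: str) -> None:
    req = urllib.request.Request(url, headers={"User-Agent": "llms-txt-check/0.1"})
    # Note: urlopen raises HTTPError on 4xx/5xx, so reaching the asserts
    # already implies the fetch itself succeeded.
    with urllib.request.urlopen(req) as resp:
        # 1. The HTTP status must be 200 OK.
        assert resp.status == 200, f"expected 200, got {resp.status}"
        # 2. The Content-Type should be a plain-text flavor, not text/html.
        ctype = resp.headers.get("Content-Type", "")
        assert ctype.split(";")[0].strip() in ("text/plain", "text/markdown"), \
            f"unexpected Content-Type: {ctype}"
        # 3. The body must be the raw file, not an HTML wrapper around it.
        body = resp.read().decode("utf-8", errors="replace")
        assert not body.lstrip().lower().startswith(("<!doctype", "<html")), \
            "body looks like an HTML page, not a plain-text file"
    print("llms.txt looks correctly served")

if __name__ == "__main__":
    check_llms_txt(URL)
```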
For platform-specific install steps see the install guides.
Common questions
Can't I just rely on robots.txt and a sitemap?
No. robots.txt tells crawlers what is allowed. sitemap.xml tells
them what exists. llms.txt tells a model what is worth reading and
in what order — a curated list, not an inventory.
Will Google use this file for normal search? Not directly. It's consumed by AI assistants and AI-driven search products (Perplexity, ChatGPT search, Claude with web access, Gemini in AI Mode). Traditional Google indexing still relies on the sitemap.
Does the file need to be regenerated when my site changes? Yes, ideally on a cadence that matches your content velocity. The BridgeToAgent kit ships with a one-line cron-or-CI command you can wire in. Most marketing sites regenerate monthly; product catalogs weekly.
Is there a size limit? No formal limit, but most consuming models give the file roughly 50 kB of attention before truncating. The kit we generate stays well under that on every site we've audited.
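If you maintain the file by hand, a quick local check against that informal budget might look like the following sketch; the file path and the exact threshold are assumptions, since the ~50 kB figure is an observed budget, not a spec limit:

```python
from pathlib import Path

LIMIT = 50_000  # ~50 kB: an informal attention budget, not a spec limit

size = Path("llms.txt").stat().st_size  # assumes the file sits in the working directory
print(f"llms.txt is {size:,} bytes ({size / LIMIT:.0%} of the ~50 kB budget)")
if size > LIMIT:
    print("warning: consider trimming low-priority entries into the Optional section")
```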