llms.txt: a guide to AI search optimization | OptimyCloud

llms.txt: how to make your site visible to ChatGPT, Perplexity and AI engines

The robots.txt of artificial intelligence already exists. Most sites still ignore it.

March 15, 2026 | 12 min read | Alexandre Gillon

When you search for something on Google in 2026, there's a good chance an AI-generated answer appears before the first links. These AI Overviews already cover 16% of Google searches. Add ChatGPT Search, Perplexity and Gemini, and a growing share of search activity now plays out in AI-generated answers rather than in lists of blue links.

The problem: your site might be invisible to these AI engines. Not because your content is bad, but because you don't speak their language. That's where llms.txt comes in: a simple file that could change the game. It's the approach we took on optimycloud.com ourselves: our llms.txt file has been live since January 2026, and we help our clients set up theirs.

In short

The llms.txt file is a Markdown file placed at the root of your site that guides AI engines toward your strategic content. Combined with GEO (Generative Engine Optimization), it lets you get cited in answers from ChatGPT, Perplexity and Google AI Overviews. Fewer than 1,000 sites worldwide have deployed it. Now is the time to be one of them.

The problem: AI engines don't read your site the way Google does

Google indexes your pages one by one. It follows links, reads the HTML, understands the structure. LLMs work differently. Their context window is limited. An entire site with its navigation, scripts and CSS is too much noise for too little signal.

The result: when ChatGPT or Perplexity looks for information in your field of expertise, it lands on your homepage packed with visual components and misses your high-value content, buried three clicks deeper.

What an LLM sees

  • Complex HTML with navigation, scripts, CSS
  • No priority hierarchy between pages
  • Content drowned in technical markup

What llms.txt provides

  • Clean Markdown, readable by AI engines
  • Curated and prioritized content
  • Direct links to strategic pages

llms.txt: the robots.txt of artificial intelligence

The llms.txt file was proposed in September 2024 by Jeremy Howard, co-founder of fast.ai and a major figure in deep learning. The idea is simple: just as robots.txt tells search engines what they can crawl, llms.txt tells AI engines where to find the content that matters.

It's a Markdown file placed at the root of the site (yoursite.com/llms.txt) with a machine-parsable structure:

```markdown
# My Company

> Short description of the company and its services.

Additional information about the business, positioning, and target customers.

## Main documentation

- [Services guide](https://mysite.com/services): Full description of our offerings
- [Case studies](https://mysite.com/case-studies): Concrete results with our clients
- [Technical blog](https://mysite.com/blog): In-depth articles on our expertise

## Optional

- [Legal notice](https://mysite.com/legal): Legal information
- [FAQ](https://mysite.com/faq): Frequently asked questions
```

The rules of the specification

  • A single, mandatory H1: the name of your site or company
  • Blockquote: a short summary of the site (optional but recommended)
  • Free-form Markdown paragraphs or lists, without headings: additional context (optional)
  • H2 sections: content categories with lists of links in the format [title](url): description, where the description is optional
  • "Optional" section: secondary resources that AI engines can skip if context is limited

llms.txt vs llms-full.txt: what's the difference?

In practice, sites often publish two complementary files. Think of the first as a table of contents and the second as the complete book.

| Aspect | llms.txt | llms-full.txt |
|---|---|---|
| Content | Index with annotated links | Full documentation embedded |
| Typical size | 5,000-8,000 words | 35,000+ words |
| Use | Discovery and quick navigation | Exhaustive context with no navigation |
| Analogy | Annotated table of contents | The entire book |

Projects and companies such as Next.js, Stripe and Vercel already offer both files. Next.js goes even further with per-release versions (/docs/14/llms.txt, /docs/15/llms.txt).
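
One simple way to produce an llms-full.txt is to concatenate the Markdown of the pages your llms.txt already lists. The sketch below assumes you already have that Markdown in memory as (title, url, markdown) tuples; the function names and the output layout (one H2 per page, followed by its source URL) are illustrative assumptions, not a layout the specification mandates.

```python
def build_llms_full(site_name: str, summary: str, pages: list[tuple[str, str, str]]) -> str:
    """Concatenate the Markdown of several pages into one llms-full.txt string.

    `pages` holds (title, url, markdown) tuples. The layout (one H2 per page,
    followed by its source URL) is a hypothetical convention, not a standard.
    """
    parts = [f"# {site_name}", "", f"> {summary}", ""]
    for title, url, markdown in pages:
        parts += [f"## {title}", "", f"Source: {url}", ""]
        for line in markdown.strip().splitlines():
            # Demote page headings by two levels so they nest under the page's H2.
            # Naive: this also touches '#' lines inside code blocks.
            parts.append("##" + line if line.startswith("#") else line)
        parts.append("")
    return "\n".join(parts)


def word_count(text: str) -> int:
    """Rough whitespace-based word count, to compare llms.txt and llms-full.txt sizes."""
    return len(text.split())
```

Running `word_count` on both generated files gives a quick idea of where yours sit relative to the typical sizes in the table.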

AI crawlers: who visits your site and why

Before talking about optimization, you need to understand who these bots are. Where Google Search relies essentially on a single crawler, Googlebot, AI companies run several distinct bots with different roles.

| Bot | Operator | Role |
|---|---|---|
| GPTBot | OpenAI | Collects data for model training |
| ChatGPT-User | OpenAI | Real-time retrieval for answers |
| OAI-SearchBot | OpenAI | Indexing for ChatGPT Search |
| ClaudeBot | Anthropic | Collects data for model training |
| PerplexityBot | Perplexity | Indexing for the Perplexity engine |
| Google-Extended | Google | Controls use of content for Gemini training (robots.txt token, not a separate crawler) |

Important point

OpenAI alone runs 4 different bots: GPTBot (training), ChatGPT-User (real-time answers), OAI-SearchBot (indexing) and ChatGPT Agent (autonomous browsing). Blocking GPTBot in your robots.txt doesn't necessarily block the others.
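
You can see this independence with Python's standard `urllib.robotparser` module. In the hypothetical robots.txt below, only GPTBot is shut out; OpenAI's other user agents fall through to the catch-all group and can still fetch public pages. The module is a simplified parser, so treat its answers as an approximation of how each real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot is blocked everywhere, everyone else only from /admin/
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def allowed_bots(robots_txt: str, url: str, bots: list[str]) -> dict[str, bool]:
    """Return, for each user-agent token, whether robots.txt lets it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


if __name__ == "__main__":
    openai_bots = ["GPTBot", "ChatGPT-User", "OAI-SearchBot"]
    print(allowed_bots(ROBOTS_TXT, "https://example.com/blog/llms-txt", openai_bots))
    # {'GPTBot': False, 'ChatGPT-User': True, 'OAI-SearchBot': True}
```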

Adoption in 2026: where do we stand?

Let's be transparent: llms.txt is still in its early days. The numbers speak for themselves.

  • ~950 domains with an llms.txt worldwide
  • 30,000+ installs of the WordPress llms.txt plugin
  • 0 AI systems that officially read it

Yes, you read that right: no major AI system officially reads llms.txt to date. Google's John Mueller confirmed it. Tests run by Semrush over 6 months detected no visits from GPTBot, ClaudeBot or PerplexityBot to the file.

So why bother? Because adoption by sites usually precedes adoption by engines. It was the same for robots.txt in 1994, for Schema.org markup in 2011, for HTTPS in 2014. The companies that position themselves now will have an edge when AI engines start to use this file.

Who has already deployed it?

Stripe, Next.js, Vercel, NVIDIA, Postman, MariaDB, Cal.com, Nuxt and Retool.

GEO: the real revolution behind llms.txt

The llms.txt file is only one piece. The overall strategy is called GEO (Generative Engine Optimization): optimizing your content to be cited in AI answers. It's the SEO of 2026. For SMEs looking to make the most of AI more broadly, we have published a practical guide to integrating generative AI in business.

Researchers from Princeton and Georgia Tech published the foundational GEO study, testing 9 optimization strategies across 10,000 queries. The results are clear: three techniques stand out sharply.

1. Cite reliable sources (+30 to 40% visibility)

Instead of writing "companies are increasingly using AI," write "according to McKinsey (2024), 72% of companies have adopted AI in at least one function." AI engines love verifiable sources.

2. Add precise statistics (+30 to 40% visibility)

Replace "many" with numbers. "The conversion rate rose by 23% in 3 months" is far more citable than "results progressed significantly."

3. Include expert quotes (+30 to 40% visibility)

AI engines favor content with authoritative voices. A direct quote from an expert in your field adds weight to your content in generated answers.
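
The first two techniques lend themselves to a quick self-audit of your existing pages. The sketch below is a crude heuristic of our own, not a method from the GEO study: it flags sentences that use vague wording (the word list is an arbitrary assumption) but contain no figure at all, which is where a sourced statistic would most likely help.

```python
import re

# Hypothetical list of vague wording to replace with figures or sourced facts
VAGUE_TERMS = ("many", "increasingly", "significantly", "a lot of", "numerous", "several")


def flag_vague_sentences(text: str) -> list[str]:
    """Return sentences that contain a vague term but no digit."""
    # Naive sentence split on ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        lowered = sentence.lower()
        has_vague = any(re.search(rf"\b{re.escape(term)}\b", lowered) for term in VAGUE_TERMS)
        has_number = re.search(r"\d", sentence) is not None
        if has_vague and not has_number:
            flagged.append(sentence)
    return flagged
```

It will miss vague claims phrased differently and can flag harmless sentences, so use it to prioritize a manual review, not to replace one.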

What no longer works

Keyword stuffing, a pillar of 2010s SEO, is almost useless on generative engines. LLMs understand meaning, not repetition. Natural content rich in data beats over-optimized content.

SEO vs GEO: two different games

| Criterion | Classic SEO | GEO |
|---|---|---|
| Goal | Rank in a list of links | Get cited in an AI answer |
| Main lever | Keywords, backlinks, structure | Clarity, data, citations, accuracy |
| Visible result | Position in the SERP | Mention in the generated answer |
| Metrics | Position, CTR, impressions | Mentions, citations, sentiment |
| Conversion | Standard rate | 4.4x higher than organic traffic |

The conversion point is especially striking: visitors who arrive via AI search convert 4.4 times better than classic organic traffic. It makes sense: when ChatGPT recommends your service, the user arrives with a far higher level of trust than someone clicking a Google link. Pair this with a channel like WhatsApp to automate your customer relationship with AI, and the impact on your acquisition becomes significant.

Practical guide: setting up llms.txt and a GEO strategy

1. Check your robots.txt

First step: don't block AI crawlers. Make sure your robots.txt doesn't disallow GPTBot, OAI-SearchBot, ClaudeBot or PerplexityBot.

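```text
# robots.txt - allow AI crawlers everywhere except /admin/
# Disallow comes first so first-match parsers agree with RFC 9309 longest match
User-agent: GPTBot
Disallow: /admin/
Allow: /

User-agent: OAI-SearchBot
Disallow: /admin/
Allow: /

User-agent: ClaudeBot
Disallow: /admin/
Allow: /

User-agent: PerplexityBot
Disallow: /admin/
Allow: /
```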
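
After editing the file, you can check it from a script. The sketch below uses Python's standard `urllib.robotparser`; the bot list mirrors the crawlers named in this article, and the helper names `load_robots` and `blocked_paths` are hypothetical. `RobotFileParser` is a simplified parser, so treat its verdict as a sanity check rather than a guarantee of how each crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# AI crawlers discussed in this article
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"]


def load_robots(site: str) -> RobotFileParser:
    """Download and parse <site>/robots.txt (requires network access)."""
    parser = RobotFileParser(site.rstrip("/") + "/robots.txt")
    parser.read()
    return parser


def blocked_paths(parser: RobotFileParser, site: str, paths: list[str]) -> dict[str, list[str]]:
    """Map each AI bot to the paths it is NOT allowed to fetch."""
    report = {}
    for bot in AI_BOTS:
        report[bot] = [p for p in paths if not parser.can_fetch(bot, site.rstrip("/") + p)]
    return report
```

For your own domain, something like `blocked_paths(load_robots("https://yoursite.com"), "https://yoursite.com", ["/", "/blog/", "/llms.txt", "/admin/"])` should list only `/admin/` for each bot if the file above is in place.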

2. Create your llms.txt file

Place it at the root: yoursite.com/llms.txt. Select your 10 to 20 most strategic pages. No need to list everything: the goal is to guide, not to be exhaustive.
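
If you keep your list of strategic pages in code or in a CMS export, generating the file takes only a few lines. This is a minimal sketch: the `Page` record, its fields and the default section name are hypothetical, and you still write the summary and link descriptions yourself.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    description: str = ""
    section: str = "Main documentation"  # use "Optional" for pages AI engines may skip


def render_llms_txt(name: str, summary: str, pages: list[Page]) -> str:
    """Render an llms.txt with an H1, a blockquote summary and H2 link sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    # Keep the first-seen order of sections, but always put "Optional" last
    sections = list(dict.fromkeys(p.section for p in pages if p.section != "Optional"))
    if any(p.section == "Optional" for p in pages):
        sections.append("Optional")
    for section in sections:
        lines += [f"## {section}", ""]
        for page in pages:
            if page.section == section:
                entry = f"- [{page.title}]({page.url})"
                lines.append(f"{entry}: {page.description}" if page.description else entry)
        lines.append("")
    return "\n".join(lines)
```

Keeping the page list as data makes it easy to regenerate the file whenever a strategic page changes.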

3. Enrich your content for GEO

On your strategic pages, add sourced statistics, expert quotes, and structure your content as question/answer. Google's AI Overviews love paragraphs that directly answer a question.

4. Create Markdown versions of your key pages

The specification recommends serving a clean Markdown version of each useful HTML page at the same URL with .md appended. For example, yoursite.com/services.html.md would be the cleaned-up Markdown version of yoursite.com/services.html.
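
Here is a rough sketch of both parts, assuming you generate these files yourself. `markdown_url` applies the naming rule as we read the proposal (append `.md`, or `index.html.md` when the URL ends with a slash). `html_to_markdown` is a deliberately minimal converter built on Python's standard `html.parser`: it keeps headings, paragraphs, list items and links, and drops scripts, styles, navigation, headers and footers. Both names are our own, and a dedicated HTML-to-Markdown library will handle far more cases.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


def markdown_url(url: str) -> str:
    """Append .md to the path, or index.html.md when the path ends with '/'."""
    parts = urlsplit(url)
    path = parts.path or "/"
    path = path + "index.html.md" if path.endswith("/") else path + ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


class _MarkdownExtractor(HTMLParser):
    SKIP = {"script", "style", "nav", "header", "footer"}  # page chrome to drop
    PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- ", "p": ""}

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished Markdown blocks
        self.current = None   # text pieces of the block being built
        self.skip_depth = 0   # > 0 while inside a SKIP element
        self.href = None      # href of the link being built

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in self.PREFIX:
            self.current = [self.PREFIX[tag]]
        elif tag == "a" and self.current is not None:
            self.href = dict(attrs).get("href") or ""
            self.current.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag == "a" and self.href is not None and self.current is not None:
            self.current.append(f"]({self.href})")
            self.href = None
        elif tag in self.PREFIX and self.current is not None:
            self.blocks.append("".join(self.current).strip())
            self.current = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.current is not None:
            self.current.append(re.sub(r"\s+", " ", data))  # collapse whitespace


def html_to_markdown(html: str) -> str:
    parser = _MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(block for block in parser.blocks if block)
```

Nested blocks (a paragraph inside a list item, tables, images) are not handled; the point is only to show how little markup an AI engine actually needs.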

5. Test your AI visibility

Ask ChatGPT, Perplexity and Gemini questions about your field of expertise. Are you cited? Are your competitors? It's the most direct way to measure the impact of your GEO efforts.
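
To make this check repeatable, you can paste the answers you collect into a small script and track how often each brand is mentioned from one month to the next. This is a deliberately simple sketch: the answers are text you gather manually (it calls no AI API), matching is a case-insensitive substring test, and the function name `mention_rate` and the brand names are placeholders.

```python
def mention_rate(answers: list[str], brands: list[str]) -> dict[str, float]:
    """Share of collected AI answers that mention each brand (case-insensitive substring match)."""
    if not answers:
        return {brand: 0.0 for brand in brands}
    lowered = [answer.lower() for answer in answers]
    return {
        brand: sum(brand.lower() in answer for answer in lowered) / len(answers)
        for brand in brands
    }
```

Substring matching misses paraphrases and can over-count short brand names, so read the answers as well; the rate is mainly useful for spotting trends over time.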

WordPress: implementation in 2 minutes

If your site runs on WordPress, the "Website LLMs.txt" plugin (30,000+ installs) automatically generates the file from your existing content. It integrates with Yoast, Rank Math and SEOPress.

  • One-click installation from the WordPress directory
  • Automatic generation based on your pages and posts
  • Compatible with the main SEO plugins
  • Manual customization of the file content

robots.txt, sitemap.xml, llms.txt: who does what

| File | Role | Audience | Status |
|---|---|---|---|
| robots.txt | Crawl permission / disallow | All crawlers | Standard |
| sitemap.xml | Exhaustive inventory of pages | Search engines | Standard |
| llms.txt | Curated guide to key content | LLMs and AI agents | Emerging |

Frequently asked questions

What is the llms.txt file?

A Markdown file placed at the root of your website that gives AI engines a structured summary of your content. Proposed by Jeremy Howard (fast.ai) in September 2024, it plays for LLMs the role that robots.txt plays for search engines.

Do AI engines really read llms.txt?

Not officially yet as of March 2026. But adoption is accelerating (around 950 domains, 30,000+ WordPress installs) and major tech companies are positioning themselves. Preparing now means getting ahead before it becomes a standard.

What is the difference between llms.txt and llms-full.txt?

llms.txt is a compact index (an annotated table of contents); llms-full.txt contains the full documentation (the entire book). The first runs 5,000-8,000 words, the second 35,000+.

What is GEO?

Generative Engine Optimization is the practice of optimizing content to be cited by AI engines. Unlike SEO (ranking in a list), GEO aims to be the source mentioned in an answer generated by ChatGPT, Perplexity or Google's AI Overviews.

How do I know if AI engines are citing my site?

Ask ChatGPT, Perplexity and Gemini questions related to your field. Watch whether your brand, your articles or your data are cited. Tools like Semrush are starting to offer AI visibility metrics.

Conclusion: should you get started now?

llms.txt isn't a standard yet. No major AI system officially reads it. But that's exactly what people said about Schema.org in 2012, HTTPS in 2015, and voice search in 2018. The sites that positioned themselves early on those standards gained a lead of several months over their competitors.

The setup cost is negligible: a Markdown file at the root of your site, a few tweaks to your robots.txt, and some groundwork on the quality of your content. GEO goes further and requires rethinking how you write: less empty marketing, more verifiable data and sourced citations.

AI Overviews cover 16% of Google searches. Visitors from AI search convert 4.4 times better. The train is leaving the station. The question isn't whether AI engines will use llms.txt, but when.

Make your site visible to AI engines

AI visibility audit, llms.txt implementation, a complete GEO strategy or technical SEO optimization: I help you make your site cited, not just indexed.

Let's talk about your AI visibility