97% of llms.txt Files Never Get Read

Everyone has an opinion on llms.txt, but when it comes to actual evidence, we have only single-site logs or the odd small-scale experiment.

Using Ahrefs Web Analytics and Bot Analytics, we analyzed the server logs and live traffic of 137K domains, plus the user agents hitting all of them.

Here’s what we found.

Top findings

28% of the 137K domains using Ahrefs Web Analytics publish an llms.txt file.
97% of those files received zero traffic in May 2026. Nothing fetched them at all.
96% of the requests that did reach llms.txt files came from bots.
19.5% of fetches came from named AI tools (among the 3% of files that weren’t ignored). GPTBot is top and Claude-Code is second, ahead of every AI search and assistant bot.
12% of fetches come from the industry studying itself: GEO/AEO tools, llms.txt checker tools, and researchers.
Zero requests came from AI bots for llms.txt files that don’t exist. They never go looking.
The Chrome Lighthouse llms.txt audit produced roughly 1 in 1,000 fetches.

In late May 2026, Google took both sides of the llms.txt argument in under a week.

Its new guide on optimizing for generative AI features told site owners, in a section literally titled “mythbusting”, that machine-readable files like llms.txt aren’t needed to appear in generative AI search.

Days later, the Chrome team shipped an llms.txt check inside Lighthouse’s experimental Agentic Browsing audits. Its documentation explains that without the file, agents may spend more time crawling a site to understand its structure.

A webpage titled "llms.txt" on Chrome for Developers, in the Lighthouse documentation.

When Lily Ray pressed Google’s John Mueller on the contradiction, he explained that llms.txt is “not done for search.” In his framing, it’s a “temporary crutch, perhaps to save tokens” for AI coding tools parsing developer documentation. It’s not something non-developer sites need to worry about.

He also acknowledged that site owners who check their logs will find very little AI agent traffic.

A screenshot of a Twitter thread from John Mueller. The highlighted text says, "even with more agentic traffic in the future (and if you check your logs, you’re not getting a lot of that at the moment)."

That’s something we decided to test.

What llms.txt is (and what it isn’t)

Before we go any further, let’s clear up what llms.txt actually is. Llms.txt is a single index file, written in Markdown, placed at a site’s root. Jeremy Howard, co-founder of Answer.AI and fast.ai, proposed it in 2024. The file summarizes what a site is and links to its most important content. The idea is that LLMs and agents can use this information to orient themselves without crawling everything. The “AI visibility” framing came later. The SEO industry attached it as adoption spread, on the speculation that AI platforms would reward the file. There are two things it’s often confused with but isn’t:

It isn’t the practice of publishing Markdown copies of your web pages, a separate tactic with its own problems.
And despite the filename, it’s not a robots.txt-style directive: it controls nothing and blocks nothing.

This study measures the index file, and only the index file.
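For illustration, here is a minimal Python sketch that assembles an index file in the layout the llms.txt proposal describes:

- an H1 with the site name,
- a blockquote summary,
- then H2 sections of Markdown links with optional notes.

The site name, the URLs, and the `Link` and `build_llms_txt` helpers are hypothetical. The sketch is not a validator for the full specification.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One bullet in an llms.txt section: a titled URL with an optional note."""
    title: str
    url: str
    note: str = ""


def build_llms_txt(name: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Assemble an llms.txt index: H1 title, blockquote summary, then H2 link lists."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for link in links:
            entry = f"- [{link.title}]({link.url})"
            if link.note:
                entry += f": {link.note}"
            lines.append(entry)
        lines.append("")  # blank line between sections
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical site and URLs, for illustration only.
    print(build_llms_txt(
        "Example Docs",
        "Developer documentation for a hypothetical Example API.",
        {
            "Docs": [Link("Quickstart", "https://example.com/docs/quickstart.md", "Install and first request")],
            "Optional": [Link("Changelog", "https://example.com/changelog.md")],
        },
    ))
```

The proposal treats a section named “Optional” as links an agent can skip when it needs a shorter context. The sketch simply emits it like any other section.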

Our research focuses on all 137,210 domains in Ahrefs Web Analytics that received traffic in May 2026.

We checked each domain root for an llms.txt returning HTTP 200, then used Ahrefs Bot Analytics to examine every request to /llms.txt paths across the population, split by HTTP response (200 vs 404) and classified by channel and individual user agent.

To rule out soft 404s and phantom files, we also confirmed each file was actual Markdown rather than HTML, and screened titles and content for error signals like “404” or “Page not found”.
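A screen like the one described above can be approximated with a few checks. This is a minimal sketch under our own assumptions, not Ahrefs’ actual pipeline. The function name, the 200-character sniff window, and the error-signal list are all hypothetical choices.

```python
# Hypothetical error phrases; a real screen would likely use a longer list.
ERROR_SIGNALS = ("404", "not found")


def looks_like_real_llms_txt(status: int, content_type: str, body: str) -> bool:
    """Heuristic screen for soft 404s and phantom llms.txt files."""
    if status != 200:
        return False
    text = body.lstrip()
    if not text:
        return False  # an empty 200 response is not a usable file
    head = text[:200].lower()
    # Many CMSs answer unknown paths with an HTML page and a 200 status.
    if "text/html" in content_type.lower() or head.startswith(("<!doctype html", "<html")):
        return False
    # Error pages usually announce themselves in the title (first line).
    title = text.splitlines()[0].lower()
    return not any(signal in title for signal in ERROR_SIGNALS)
```

Checking only the title line for error phrases avoids false positives on legitimate files that happen to document HTTP 404 handling further down.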

It’s important to note:

Ahrefs Web Analytics customers skew more technical and SEO-aware than the web at large, so treat the 28% adoption figure as an upper bound.
We did not explicitly study whether a file was well-formed against the llms.txt specification.

28% of domains publish llms.txt

Google Search’s guidance says you can skip it, the Chrome team audits for it, and Mueller calls it a stopgap for coding tools.

So amid all the mixed messages, how widespread is llms.txt actually? Among the 137K domains in our study, 28% publish these files.

More than one in four domains (38,000) in our population have adopted llms.txt, even though no major AI platform has ever committed to reading it.

Adoption has been driven by speculation that AI platforms may start consuming the file, rather than by any confirmation that they do.

Pie chart: 28% of sites publish valid llms.txt (38,360 domains), 72% do not (98,640 domains).

97% of llms.txt files receive zero requests

Almost every llms.txt file in our study goes unread.

Of the ~38,000 domains with a valid file, 97% saw no requests for it at all in May.

No bots. No humans. Nothing.

Bar chart shows Ahrefs’ study of 137K domains. 97% of llms.txt files are never requested.

The remaining 3% (1.1K domains) received all of the llms.txt traffic we measured.

Our data suggests John Mueller is right. Not only will you find very little AI traffic because of this file; you’ll find very little traffic, period.

If you publish an llms.txt file today, by far the most likely outcome is that nothing ever fetches it.

The 3% of files that do get read, though, get read by interesting visitors.

We’ll focus on them for the rest of the study.

96% of requests to llms.txt files come from bots

Llms.txt files are written for machines, and machines are almost the only things reading them.

Across the files that received traffic, 96% of requests came from bots.

Humans accounted for 4%. A chunk of those appear to be SEOs sharing llms.txt links in chat apps, where unfurl bots dutifully fetch them.

Slackbot alone fetched llms.txt files more often than PerplexityBot did.

Perplexity is one of the AI search engines llms.txt was presumably designed to help. So finding that a chat app’s link-preview bot out-fetched it speaks volumes about how much real AI search interest these files are actually generating.

77% of the bots reading llms.txt aren’t from AI tools

Many sites publish llms.txt precisely because they assume it will improve their chances of appearing in ChatGPT answers, landing Perplexity citations, or winning an AI Overview.

But our data tells a different story: 77% of the bots fetching llms.txt aren’t AI tools at all.

To understand which bots were requesting llms.txt, we classified every user agent into twelve categories.

CATEGORY	DESCRIPTION	EXAMPLES	TYPE	REQUESTS	% OF TOTAL
SEO audit tools	Crawl sites for traditional SEO health checks, with no specific interest in llms.txt	SiteAuditBot, WebPageTest	Auditing	4,776	21.7%
Other and unidentified	Anonymous SDK defaults and bots whose purpose or operator we couldn’t determine	node, satoric-indexer	Unknown	3,278	14.9%
General web crawlers	Index the web for search and product discovery, with no stated AI-agent use case	Googlebot, Amazonbot	Crawling	2,871	13.1%
Tech profiling tools	Crawl sites to identify technology stacks and business intelligence data	BuiltWith, Dataprovider	Profiling	2,546	11.6%
AI agents & agentic infrastructure	AI agents acting on a user’s behalf, plus the crawlers and tooling built to serve them	Claude-Code, IbouBot	AI	2,302	10.5%
GEO/AEO tools	Scan websites and score their readiness for AI search and agent discovery	CairrotReadinessBot, AuditMetricBot	Studying llms.txt	1,278	5.8%
AI training crawlers	Collect data for model building	GPTBot, ClaudeBot	AI	1,179	5.3%
llms.txt discoverability bots	Specifically scan, validate, or catalog llms.txt files	LLMS-Txt-Scanner, txtfeed-bot	Studying llms.txt	793	3.6%
Service and social bots	Fetch URLs to generate link previews in messaging apps and social platforms	Slackbot, Skype URI Preview	Social	645	2.9%
Research bots	Crawl for academic or investigative purposes, including security research	prompt-injection-survey, ResearchProject	Studying llms.txt	585	2.7%
AI assistants	Browse the web on behalf of a user in response to a single question	ChatGPT-User, Claude-User	AI	559	2.5%
AI retrieval bots	Fetch pages to answer live user queries in AI search products	OAI-SearchBot, PerplexityBot	AI	233	1.1%
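To make the bucketing concrete, here is a hypothetical Python sketch. It assigns user agents to the categories above by case-insensitive substring match, then sums the four AI categories. The rule list uses only the example bots from the table. It illustrates the idea rather than reproducing our methodology. A production classifier would need far more rules and, ideally, IP verification.

```python
from collections import Counter

# Hypothetical substring rules, drawn only from the example user agents in the table.
RULES = [
    ("SiteAuditBot", "SEO audit tools"),
    ("WebPageTest", "SEO audit tools"),
    ("Googlebot", "General web crawlers"),
    ("Amazonbot", "General web crawlers"),
    ("BuiltWith", "Tech profiling tools"),
    ("Claude-Code", "AI agents & agentic infrastructure"),
    ("CairrotReadinessBot", "GEO/AEO tools"),
    ("GPTBot", "AI training crawlers"),
    ("ClaudeBot", "AI training crawlers"),
    ("LLMS-Txt-Scanner", "llms.txt discoverability bots"),
    ("Slackbot", "Service and social bots"),
    ("prompt-injection-survey", "Research bots"),
    ("ChatGPT-User", "AI assistants"),
    ("Claude-User", "AI assistants"),
    ("OAI-SearchBot", "AI retrieval bots"),
    ("PerplexityBot", "AI retrieval bots"),
]

AI_CATEGORIES = {
    "AI agents & agentic infrastructure",
    "AI training crawlers",
    "AI assistants",
    "AI retrieval bots",
}


def classify(user_agent: str) -> str:
    """Return the first category whose token appears in the user agent string."""
    lowered = user_agent.lower()
    for token, category in RULES:
        if token.lower() in lowered:
            return category
    return "Other and unidentified"


def category_shares(user_agents: list[str]) -> dict[str, float]:
    """Percentage of requests per category."""
    if not user_agents:
        return {}
    counts = Counter(classify(ua) for ua in user_agents)
    return {category: 100 * n / len(user_agents) for category, n in counts.items()}


def ai_share(user_agents: list[str]) -> float:
    """Combined share of the four AI categories."""
    shares = category_shares(user_agents)
    return sum(share for category, share in shares.items() if category in AI_CATEGORIES)
```

Rule order matters when one token could appear inside another user agent, so more specific tokens should come first. Anything unmatched falls into “Other and unidentified”, mirroring the table’s catch-all bucket.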

Individually, no AI bot category makes the top four.

SEO audit tools (21.7%), Other and unidentified (14.9%), General web crawlers (13.1%), and Tech profiling tools (11.6%) all send more requests than any single AI bot category.

Sidenote.

That top category also contains Chrome’s Lighthouse audit, the check that reignited the llms.txt debate. It made just 22 requests, roughly 1 in 1,000.

The biggest standalone AI category, AI agents, sits in fifth place at 10.5%.

But when you combine the four AI categories (training crawlers, retrieval bots, assistants, and agents), AI bots become the largest single bucket at 19.5%.

The bot traffic splits into three stories:

AI bots consuming the file (19.5%)
A long tail of anonymous scrapers (14.9%)
An industry auditing it (12.1%)

We’ll dig into a couple of these below.

19.5% of requests come from AI bots

Of the requests that do reach llms.txt files, named AI bots account for 19.5%.

AI bots are the largest identifiable readership of llms.txt. Even so, the breakdown by AI bot type shows the file isn’t serving the AI tools most people have in mind.

We group them four ways:

AI agents & agentic infrastructure that act on a user’s behalf, or crawl to serve the agents that do
AI training crawlers that collect data for model building
AI assistants that browse the web on behalf of a user in real time
AI retrieval bots that fetch pages to answer live user queries in AI platforms

Here’s how they size up…

Bar chart showing AI bot requests. Agents (blue) 10.5%, Training crawlers (orange) 5.3%, Assistants (red) 2.5%, Retrieval bots (green) 1.1%. GPTBot is highest at 4.51%.

*statespace-indexer: operator identified as Statespace (agentic infrastructure), IP ranges unconfirmed.

Sidenote.

Quick reminder: this analysis covers only the 3% of files that received any requests at all, not the full 137K domains. That equates to roughly 1.1K domains and 22K requests in total, so we’re still studying a tiny pool.

Also, “fetched” doesn’t mean “read”. Many bots may have fetched the llms.txt file without ever acting on what’s inside. Every figure in this study is therefore a ceiling on actual llms.txt consumption. For instance, 19.5% of requests from AI is the most generous possible reading. Actual AI consumption is somewhere at or below this.

The agentic web is the real consumer, sending 10.5% of requests

AI agents, and the infrastructure built to serve them, drive 10.5% of llms.txt requests. That’s more than any other type of AI bot.

This finding lines up with a hunch that many in the industry already had.

We heard earlier from John Mueller that llms.txt works best as reference material for AI coding agents.

Chris Long, Founder of Nectiv, has also said that llms.txt has utility if your customers “are using Claude Code to provide recommendations”, even if it doesn’t help you in Google search.

LinkedIn post by Chris Long about LLMs.txt and its relevance to SEO beyond Google Search, with highlighted text.

Our Bot Analytics data supports both ideas.

We see llms.txt files being fetched far less by the search and AI bots that are likely responsible for visibility. They are fetched far more by the agentic tools that seek out structured information and/or act on a user’s behalf.

Bar chart showing the share of verified AI bot requests from various agents, totaling 10.5%. "statespace-indexer" leads with 3.52%.

*statespace-indexer: operator identified as Statespace (agentic infrastructure), IP ranges unconfirmed.

Aside from statespace-indexer and GPTBot, Claude-Code (Anthropic’s coding agent) out-fetched every AI retrieval bot, every AI assistant, and every AI training crawler.

Training crawlers are the second-largest AI category at 5.3%

Llms.txt files feed training corpora more than they feed AI search retrieval.

In fact, AI training crawlers fetch llms.txt about 5x as often as AI retrieval bots (1,179 requests vs. 233).

So if llms.txt were to influence your brand’s AI visibility in any way, it would likely be upstream, not at the point of retrieval.

Of all training crawlers, GPTBot is by far the biggest fetcher of llms.txt.

You won’t find a Gemini crawler on this list, because one doesn’t exist.

Google trains and grounds Gemini on content fetched by regular Googlebot. Google-Extended, the opt-out that publishers use, is a robots.txt token rather than a crawler with its own user agent.

Googlebot did fetch llms.txt files ~900 times in May. But Googlebot routinely fetches any URL it discovers on a site as part of normal search indexing, so these fetches don’t indicate specific interest in llms.txt. It’s crawling the file the same way it crawls a sitemap or any other page.

Whether any of that content then feeds Gemini is invisible to us.

AI retrieval bots barely register, with 1.1% of total requests

According to our data, AI retrieval bots account for just 1.1% of all llms.txt requests.

Even taken together with AI assistants and AI training crawlers, these bots account for only 8.9% of requests. That’s 1.6 percentage points less than AI agents alone.

OAI-SearchBot, PerplexityBot, and Claude’s search crawler combined made only a few hundred fetches across hundreds of sites.

If you’re planning on generating an llms.txt in hopes of boosting your AI citations, you may want to think again.

12% of requests come from tools studying llms.txt, not consuming it

A whole ecosystem has formed around auditing, scoring, validating, and studying the llms.txt standard. This has happened before we’ve even established whether any major AI platform actually reads it.

Three categories account for 12% of all requests combined.

Pie chart showing 12% of requests study the llms.txt standard. Research bots: 2.7%, llms.txt discoverability: 3.6%, GEO/AEO tools: 5.8%.

GEO/AEO tools send 5.8% of requests

Commercial tools scan websites and score their readiness for AI search and agent discovery, with llms.txt presence as one of many signals.

The most active, CairrotReadinessBot, belongs to Cairrot, a WordPress-focused AEO platform launched in late 2025.

Then you have mainstream website builders like Framer, Lovable, and Wix, all baking AI-readiness checks into their products.

Llms.txt adoption has become a platform default before it has even become a webmaster decision.

llms.txt discoverability bots send 3.6% of requests

There’s an ecosystem of tools that catalog the llms.txt files that almost nobody else reads.

Dedicated scanners, validators, and directories built solely for llms.txt files send more requests than AI retrieval bots and AI assistants.

Research bots send 2.7% of requests

The largest single research crawler in the dataset identifies itself as prompt-injection-survey/1.0.

Someone is systematically studying llms.txt as a prompt injection opportunity: a file that AI agents are designed to ingest and trust.

The security implications of agents trusting llms.txt files at scale have barely been discussed, and yet potential bad actors are already on the case.

Zero AI bots “go looking” for llms.txt files that don’t exist

AI tools never go looking for llms.txt files that aren’t there, so publishing one doesn’t put you on any AI radar.

We analyzed every request to /llms.txt paths that returned a 404. The result was the cleanest split we’ve seen in bot data. Valid files drew 96% bot traffic, missing files drew 98% human traffic, and the AI bot share of those 404s was zero.

The people probing for absent llms.txt files are humans typing the URL into a browser, presumably SEOs checking on competitors.

This kills the assumption that AI systems actively hunt for llms.txt files, and that a site without one is missing a knock on the door.

AI tools fetch llms.txt when a link, an index, or a user instruction tells them it exists.

How to check your own llms.txt bot traffic

If you want to see which bots are actually hitting your llms.txt file, head to Ahrefs Bot Analytics and add a filter for Page URL → Contains → llms.txt, then hit Apply.

Studying llms.txt fetches in Ahrefs Bot Analytics

This narrows everything down to requests hitting your llms.txt file (or any pages with “llms.txt” in the URL, like blog posts about it).

We don’t have an llms.txt file on the Ahrefs site, but some bots are still requesting that URL, as the 404 status shows.

From there, you can check:

Visits over time. Toggle between By bot and By category to see whether traffic is climbing, flat, or spiking.
The Bots table. See exactly which bots are fetching the file.
Last status in Crawled pages. Check the status code. A 404 on /llms.txt means bots are asking for a file that isn’t there.

That last point is the useful gut-check. Plenty of sites get bot requests for an llms.txt they never published. The traffic is real; the file isn’t.

You can also use the AI bots filter at the top of the page to strip out other crawlers and see only the LLM-related ones.

And remember, a bot requesting your llms.txt isn’t proof that anything read or acted on it. It only tells you the file was fetched.
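If you don’t use Ahrefs Bot Analytics, you can pull a rough equivalent from raw server logs. The sketch below assumes your server writes the common Apache/Nginx “combined” log format. The regex, the `llms_txt_hits` name, and the choice to match only paths ending in /llms.txt are assumptions you may need to adapt to your own setup.

```python
import re
from collections import Counter
from typing import Iterable

# Apache/Nginx "combined" format:
# ip ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def llms_txt_hits(lines: Iterable[str]) -> Counter:
    """Count requests for /llms.txt, keyed by (status code, user agent)."""
    hits: Counter = Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        if match is None:
            continue  # skip lines in other formats
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if path.endswith("/llms.txt"):
            hits[(match.group("status"), match.group("agent"))] += 1
    return hits
```

For example, passing an open log file (your own path) to `llms_txt_hits` returns counts keyed by status code and user agent. Rows with a 404 status show requests for a file you never published. As with the dashboard, a count here is a fetch, not proof of use.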

So, should you create an llms.txt file?

If your goal is showing up in ChatGPT, Perplexity, or AI Overviews, an llms.txt file is basically decoration.

AI search bots barely fetch these files, no AI system goes looking for them, and 97% of existing files attract no readers of any kind.

And remember that requests are the generous measure. Whether bots act on what they fetch is another question.

Here are the pros and cons, side by side.

PROS	CONS
Publishing llms.txt is cheap, and platforms like Wix will increasingly do it for you.	The base rate is brutal: 97% of existing llms.txt files attract no readers of any kind.
The closest thing to an intended audience in our data is coding agents. If your customers use coding agents, or if agents act on your site, the file stands a real chance of being read.	It won’t help your AI search visibility today. AI retrieval bots barely fetch these files, and no AI system goes looking for one you haven’t published.
It may future-proof your strategy. Google has made it clear that the future of search is agentic. If agents end up mediating AI search, rather than retrieval bots fetching pages directly, llms.txt could start influencing AI visibility through the agent layer.	Publishing is only half the job. Agents fetch llms.txt when directed, not speculatively, so an unlinked file is unlikely to get picked up.
	It’s a security risk. Agents are built to trust this file, and potential bad actors are already probing llms.txt for prompt injection. A stale or compromised file misleads every agent that reads it.

My verdict: the cons outweigh the pros right now. If you want to show up in AI search, there are more reliable ways to improve your visibility than this file.

But if you’re still toying with the idea of generating llms.txt, here are the steps you should take:

Check your own logs before investing further. A 97% chance of zero readership is the base rate.
Get a website-building platform to do it for you. Wix already generates these files, and Framer and Lovable are scanning for them. Within a year, having an llms.txt may be as much a CMS default as having a sitemap. If the payoff is uncertain, it makes sense to keep the effort minimal.
Route agents to it. Link the file from your HTML, reference it in your docs, or mention it anywhere agents receive instructions about your site. Agents fetch llms.txt when directed, not speculatively.
Offset the prompt injection risk by treating llms.txt like code. Version-control it, restrict who can edit it, and set an alert for unauthorized changes. Keep the content to plain links and descriptions (nothing instruction-shaped), and only link to resources you control. Review anything a platform auto-generates on your behalf.
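Parts of that last step can be automated. Here is a minimal sketch, assuming you keep a known-good SHA-256 hash of the file and a list of hosts you control. It does two things:

- it fingerprints the file for change alerts;
- it flags links to other hosts, plus a few instruction-shaped phrases.

The phrase list is a hypothetical, deliberately small example and will not catch every injection attempt.

```python
import hashlib
import re
from urllib.parse import urlparse

# Markdown links: [title](url)
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")

# Hypothetical, deliberately small list of instruction-shaped phrases.
SUSPICIOUS_PHRASES = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"\bignore (all |any )?(previous|prior|above)\b",
        r"\bsystem prompt\b",
        r"\byou (must|should) now\b",
    )
]


def fingerprint(text: str) -> str:
    """SHA-256 of the file; alert when it differs from your stored known-good value."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def audit_llms_txt(text: str, allowed_hosts: set[str]) -> list[str]:
    """Flag links to hosts you don't control and lines that read like instructions."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for url in MARKDOWN_LINK.findall(line):
            host = urlparse(url).hostname  # None for relative links like /docs.md
            if host is not None and host not in allowed_hosts:
                findings.append(f"line {number}: link to {host}")
        if any(p.search(line) for p in SUSPICIOUS_PHRASES):
            findings.append(f"line {number}: instruction-shaped text")
    return findings
```

In practice, you would store the `fingerprint()` of the reviewed file and compare it on a schedule, for instance in a cron job or CI check. Treat any mismatch, or a non-empty `audit_llms_txt()` result, as a reason to review the change.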

This study answers how many sites publish llms.txt, and who reads it.

But a couple of other questions deserve further research and were beyond the scope of this study:

Do agents fetch developer docs more often? Is Claude-Code’s llms.txt interest concentrated on documentation paths like /docs/ and /api/, as Mueller’s framing predicts?
Do bots actually act on what they read? When an AI agent fetches llms.txt, does it then fetch the resources the file links to? SEO consultant David McSweeney, Founder of Queryburst, is already running an experiment along these lines. He’s serving AI user agents a compressed, agent-friendly summary of his test sites, complete with instructions for requesting deeper content. He then tracks whether any agent actually follows through. His results are worth following.

Mueller called llms.txt a temporary crutch.

But that crutch already seems to have its own supply chain: platforms generating llms.txt files, an industry auditing them, and security researchers studying them. All of it arrived before the “readers” actually showed up.

Either we’re watching the early scaffolding of a real standard, or we’re watching the SEO industry prove it can productize anything. Our money is on a bit of both.
