Everyone has an opinion on llms.txt, but when it comes to actual evidence we have only single-site logs or the odd small-scale experiment.
Using Ahrefs Web Analytics and Bot Analytics, we analyzed the server logs and live traffic of 137K domains, plus the user agents hitting all of them.
Here’s what we found.
Top findings
- 28% of the 137K domains using Ahrefs Web Analytics publish an llms.txt file.
- 97% of those files received zero traffic in May 2026. Nothing fetched them at all.
- 96% of the requests that did reach llms.txt files came from bots.
- 19.5% of fetches came from named AI tools (of the 3% of files that weren’t ignored). GPTBot is top and Claude-Code is second, ahead of every AI search and assistant bot.
- 12% of fetches come from the industry studying itself: GEO/AEO tools, llms.txt checker tools, and researchers.
- Zero requests came from AI bots for llms.txt files that don’t exist. They never go looking.
- The Chrome Lighthouse llms.txt audit produced roughly 1 in 1,000 fetches.
In late May 2026, Google took both sides of the llms.txt argument in under a week.
Its new guide on optimizing for generative AI features told site owners, in a section literally titled “mythbusting”, that machine-readable files like llms.txt aren’t needed to appear in generative AI search.
Days later, the Chrome team shipped an llms.txt check inside Lighthouse’s experimental Agentic Browsing audits, with documentation explaining that without the file, agents may spend more time crawling a site to understand its structure.

When Lily Ray pressed Google’s John Mueller on the contradiction, he explained that llms.txt is “not done for search.” It’s a “temporary crutch, perhaps to save tokens” for AI coding tools parsing developer documentation, not something non-developer sites need to worry about.
He also stated that site owners who check their logs will find very little AI agent traffic.


That is something we decided to test.
What llms.txt is (and what it isn’t)
- It isn’t the practice of publishing markdown copies of your web pages, a separate tactic with its own problems.
- And despite the filename, it’s not a robots.txt-style directive: it controls nothing and blocks nothing.
This study measures the index file, and only the index file.
Our research focuses on all 137,210 domains in Ahrefs Web Analytics that received traffic in May 2026.
We checked each domain root for an llms.txt returning HTTP 200, then used Ahrefs Bot Analytics to examine every request to /llms.txt paths across the population, split by HTTP response (200 vs 404) and classified by channel and individual user agent.
To rule out soft 404s and phantom files, we also confirmed each file was actual Markdown rather than HTML, and screened titles and content for error signals like “404” or “Page not found”.
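For readers who want to run a similar screen on their own sites, here is a minimal sketch of that kind of check. It is not the pipeline used in this study: the function name, the HTML markers, and the 500-character window for error phrases are all assumptions made for illustration. Only the two error phrases come from the methodology above.

```python
import re

# The two error phrases named in the methodology; a real screen may use more.
ERROR_SIGNALS = ("404", "page not found")

# Assumed markers of an HTML document served where Markdown was expected.
HTML_MARKERS = re.compile(
    r"<!doctype html|<html[\s>]|<head[\s>]|<body[\s>]", re.IGNORECASE
)


def looks_like_real_llms_txt(status: int, body: str) -> bool:
    """Return True if a response plausibly carries a genuine llms.txt file."""
    if status != 200:
        return False
    text = body.strip()
    if not text:
        return False
    # An HTML page returned with HTTP 200 at /llms.txt is treated as a soft 404.
    if HTML_MARKERS.search(text[:2000]):
        return False
    # Screen the title and opening content for error signals.
    head = text[:500].lower()
    if any(signal in head for signal in ERROR_SIGNALS):
        return False
    return True
```

The error-phrase screen is deliberately blunt. A legitimate file whose opening lines mention “404” would be rejected, so anything flagged this way deserves a manual look.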
It’s important to note:
- Ahrefs Web Analytics customers skew more technical and SEO-aware than the web at large, so treat the 28% adoption figure as an upper bound.
- We did not explicitly study whether a file was well-formed against the llms.txt specification.
Google Search’s guidance says you can skip it, the Chrome team audits for it, and Mueller calls it a stopgap for coding tools.
So amid all the mixed messages, how common is llms.txt really? Among the 137K domains in our study, 28% publish these files.
More than one in four domains (38,000) in our population have adopted llms.txt, even though no major AI platform has ever committed to reading it.
Adoption has been driven by speculation that AI platforms may start consuming the file, rather than by any confirmation that they do.


Almost every llms.txt file in our study is unread.
Of the ~38,000 domains with a valid file, 97% saw no requests for it at all in May.
No bots. No humans. Nothing.


The remaining 3% (1.1K domains) received all of the llms.txt traffic we measured.
Our data suggests John Mueller is right. Not only will you find very little AI traffic as a result of this file; you will find very little traffic, period.
If you publish an llms.txt file today, the most likely outcome by far is that nothing ever fetches it.
The 3% of files that do get read, though, get read by interesting visitors.
We’ll focus on them for the rest of the study.
Llms.txt files are written for machines, and machines are nearly the only things reading them.
Across the files that received traffic, 96% of requests came from bots.
Humans accounted for 4%, and a chunk of those appear to be SEOs sharing llms.txt links in chat apps, where unfurl bots dutifully fetch them.
Slackbot alone fetched llms.txt files more often than PerplexityBot did.
Perplexity is one of the AI search engines llms.txt was seemingly designed to help, so finding that a chat app’s link-preview bot outfetched it speaks volumes about how much real AI search interest these files are actually generating.
Many sites publish llms.txt precisely because they think it will improve their chances of appearing in ChatGPT answers, landing Perplexity citations, or winning an AI Overview.
But our data tells a different story: 77% of the bots fetching llms.txt aren’t AI tools at all.
To understand which bots were requesting llms.txt, we classified every user agent into twelve categories.
| CATEGORY | TYPE | REQUESTS | % OF TOTAL |
|---|---|---|---|
| SEO audit tools: Crawl sites for traditional SEO health checks, with no specific interest in llms.txt (e.g. SiteAuditBot, WebPageTest) | Auditing | 4,776 | 21.7% |
| Other and unidentified: Anonymous SDK defaults and bots whose purpose or operator we couldn’t determine (e.g. node, satoric-indexer) | Unknown | 3,278 | 14.9% |
| General web crawlers: Index the web for search and product discovery, with no stated AI-agent use case (e.g. Googlebot, Amazonbot) | Crawling | 2,871 | 13.1% |
| Tech profiling tools: Crawl sites to identify technology stacks and business intelligence data (e.g. BuiltWith, Dataprovider) | Profiling | 2,546 | 11.6% |
| AI agents & agentic infrastructure: AI agents acting on a user’s behalf, plus the crawlers and tooling built to serve them (e.g. Claude-Code, IbouBot) | AI | 2,302 | 10.5% |
| GEO/AEO tools: Scan websites and score their readiness for AI search and agent discovery (e.g. CairrotReadinessBot, AuditMetricBot) | | 1,278 | 5.8% |
| AI training crawlers: Collect data for model building (e.g. GPTBot, ClaudeBot) | AI | 1,179 | 5.3% |
| llms.txt discoverability bots: Specifically scan, validate, or catalogue llms.txt files (e.g. LLMS-Txt-Scanner, txtfeed-bot) | | 793 | 3.6% |
| Service and social bots: Fetch URLs to generate link previews in messaging apps and social platforms (e.g. Slackbot, Skype URI Preview) | Social | 645 | 2.9% |
| Research bots: Crawl for academic or investigative purposes, including security research (e.g. prompt-injection-survey, ResearchProject) | | 585 | 2.7% |
| AI assistants: Browse the web on behalf of a user in response to a single question (e.g. ChatGPT-User, Claude-User) | AI | 559 | 2.5% |
| AI retrieval bots: Fetch pages to answer live user queries in AI search products (e.g. OAI-SearchBot, PerplexityBot) | AI | 233 | 1.1% |
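To make the classification step concrete, here is a rough sketch of a substring-based classifier. It is an illustration only, not the classifier behind this study: the rule list contains nothing but the example bots named in the table, and the `classify` and `is_ai_bot` helpers are hypothetical names. A production version would need a far longer, maintained list plus IP verification.

```python
# Substring rules built only from the example bots named in the table.
# Matching is case-insensitive; the first rule that matches wins.
RULES = [
    ("AI agents & agentic infrastructure", ("claude-code", "iboubot")),
    ("AI training crawlers", ("gptbot", "claudebot")),
    ("AI assistants", ("chatgpt-user", "claude-user")),
    ("AI retrieval bots", ("oai-searchbot", "perplexitybot")),
    ("SEO audit tools", ("siteauditbot", "webpagetest")),
    ("General web crawlers", ("googlebot", "amazonbot")),
    ("Tech profiling tools", ("builtwith", "dataprovider")),
    ("GEO/AEO tools", ("cairrotreadinessbot", "auditmetricbot")),
    ("llms.txt discoverability bots", ("llms-txt-scanner", "txtfeed-bot")),
    ("Service and social bots", ("slackbot", "skype")),
    ("Research bots", ("prompt-injection-survey", "researchproject")),
]

# The four categories this study counts as AI bots.
AI_CATEGORIES = {
    "AI agents & agentic infrastructure",
    "AI training crawlers",
    "AI assistants",
    "AI retrieval bots",
}


def classify(user_agent: str) -> str:
    """Map a raw User-Agent string to one of the study's categories."""
    ua = user_agent.lower()
    for category, needles in RULES:
        if any(needle in ua for needle in needles):
            return category
    # Anything unmatched falls into the catch-all bucket.
    return "Other and unidentified"


def is_ai_bot(user_agent: str) -> bool:
    """True if the user agent falls into one of the four AI categories."""
    return classify(user_agent) in AI_CATEGORIES
```

User-Agent strings are trivially spoofed, so a match here says what a client claims to be, not what it is.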
Individually, no AI bot category makes the top four.
SEO audit tools (21.7%), Other and unidentified (14.9%), General web crawlers (13.1%), and Tech profiling tools (11.6%) all send more requests than any one AI bot category.
Sidenote.
That top category also contains Chrome’s Lighthouse audit, the check that reignited the llms.txt debate. It made just 22 requests, roughly 1 in 1,000.
The biggest standalone AI category, AI agents, sits in fifth place at 10.5%.
But when you combine the four AI categories (training crawlers, retrieval bots, assistants, and agents), AI bots become the largest single bucket at 19.5%.
The bot traffic splits into three stories:
- AI bots consuming the file (19.5%)
- A long tail of anonymous scrapers (14.9%)
- An industry auditing it (12.1%)
We’ll dig into a couple of these below.
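The three story totals are just sums over rows of the category table. As a quick worked check, the sketch below adds up the rounded shares; the `STORIES` grouping and the `story_share` helper are names invented for this example. Summing rounded shares gives 19.4% for AI bots, within rounding of the 19.5% reported above, which was presumably computed from unrounded figures.

```python
# Rounded "% of total" shares copied from the category table.
SHARES = {
    "AI agents & agentic infrastructure": 10.5,
    "AI training crawlers": 5.3,
    "AI assistants": 2.5,
    "AI retrieval bots": 1.1,
    "Other and unidentified": 14.9,
    "GEO/AEO tools": 5.8,
    "llms.txt discoverability bots": 3.6,
    "Research bots": 2.7,
}

# Which table rows roll up into each of the three stories.
STORIES = {
    "AI bots consuming the file": [
        "AI agents & agentic infrastructure",
        "AI training crawlers",
        "AI assistants",
        "AI retrieval bots",
    ],
    "Anonymous scrapers": ["Other and unidentified"],
    "An industry auditing it": [
        "GEO/AEO tools",
        "llms.txt discoverability bots",
        "Research bots",
    ],
}


def story_share(story: str) -> float:
    """Sum the rounded category shares that make up one story."""
    return round(sum(SHARES[name] for name in STORIES[story]), 1)


for story in STORIES:
    print(f"{story}: {story_share(story)}%")
```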
Of the requests that do reach llms.txt files, named AI bots account for 19.5%.
While AI bots are the largest identifiable readership of llms.txt, the breakdown by AI bot type shows the file isn’t serving the AI tools most people have in mind.
We group them four ways:
- AI agents & agentic infrastructure that act on a user’s behalf, or crawl to serve the agents that do
- AI training crawlers that collect data for model building
- AI assistants that browse the web on behalf of a user in real time
- AI retrieval bots that fetch pages to answer live user queries in AI platforms
Here’s how they size up…

*statespace-indexer: operator identified as Statespace (agentic infrastructure), IP ranges unconfirmed.
Sidenote.
Quick reminder: This analysis covers the 3% of files that received any requests at all, not the full 137K domains. That equates to roughly 1.1K domains and 22K requests in total, so we’re still only studying a tiny pool. Also, “fetched” doesn’t mean “read”. Many bots may have fetched the llms.txt file without ever acting on what’s inside. Every figure in this study is therefore a ceiling on actual llms.txt consumption. For instance, 19.5% of requests from AI is the most generous possible reading. Actual AI consumption is somewhere at or below this.
The agentic web is the real consumer, sending 10.5% of requests
AI agents, and the infrastructure built to serve them, drive 10.5% of llms.txt requests, more than any other type of AI bot.
This finding lines up with a hunch that many in the industry already had.
We heard earlier from John Mueller that llms.txt works best as reference material for AI coding agents.
Chris Long, Founder of Nectiv, has also stated that, even if llms.txt doesn’t help you in Google search, the file has utility if your customers “are using Claude Code to source recommendations”.


Our Bot Analytics data supports both ideas.
We see llms.txt files being fetched far less by the search and AI bots that are seemingly responsible for visibility, and far more by the agentic tools that seek out structured information and/or act on a user’s behalf.

*statespace-indexer: operator identified as Statespace (agentic infrastructure), IP ranges unconfirmed.
Apart from statespace-indexer and GPTBot, Claude-Code (Anthropic’s coding agent) out-fetched every AI retrieval bot, every AI assistant, and every AI training crawler.
Training crawlers are the second-largest AI category at 5.3%
Llms.txt files feed training corpora more than they feed AI search retrieval.
In fact, AI training crawlers fetch llms.txt nearly 5X more than AI retrieval bots.


So if llms.txt were to influence your brand’s AI visibility in any way, it would likely be upstream, not at the point of retrieval.
Of all training crawlers, GPTBot is far and away the biggest fetcher of llms.txt.
You won’t find a Gemini crawler in this list, because it doesn’t exist.
Google trains and grounds Gemini on content fetched by regular Googlebot, and Google-Extended, the opt-out publishers use, is a robots.txt token rather than a crawler with its own user agent.
Googlebot did fetch llms.txt files ~900 times in May, but Googlebot routinely fetches any URL it discovers on a site as part of normal search indexing, so these fetches don’t indicate specific interest in llms.txt. It’s crawling the file the same way it crawls a sitemap or any other page.
Whether any of that content then feeds Gemini is invisible to us.
AI retrieval bots barely register, with 1.1% of total requests
According to our data, AI retrieval bots account for just 1.1% of total requests.
Even when taken together with AI assistants and AI training crawlers, these bots still account for only 8.9% of requests (1.6 percentage points less than AI agents).
OAI-SearchBot, PerplexityBot, and Claude’s search crawler combined made only a few hundred fetches across hundreds of sites.


If you are planning on producing an llms.txt in hopes of boosting your AI citations, you may want to think again.
A whole ecosystem has formed around auditing, scoring, validating, and studying the llms.txt standard, before we’ve even established whether any major AI platform actually reads it.
Three categories account for 12% of all requests combined.


GEO/AEO tools send 5.8% of requests
Commercial tools scan websites and score their readiness for AI search and agent discovery, with llms.txt presence as one of many signals.
The most active, CairrotReadinessBot, belongs to Cairrot, a WordPress-focused AEO platform launched in late 2025.
Then you have the mainstream site builders like Framer, Lovable, and Wix all baking AI-readiness checks into their products.
Llms.txt adoption has become a platform default before it’s even become a webmaster decision.
llms.txt discoverability bots cover 3.6% of requests
There’s an ecosystem of tools that catalog the llms.txt files that almost nobody else reads.
Dedicated scanners, validators, and directories built solely for llms.txt files send more requests than AI retrieval bots and AI assistants.
Research bots send 2.7% of requests
The largest single research crawler in the dataset identifies itself as prompt-injection-survey/1.0.
Someone is systematically studying llms.txt as a prompt injection opportunity, since it is a file that AI agents are designed to ingest and trust.
The security implications of agents trusting llms.txt files at scale have barely been discussed, and yet potential bad actors are already on the case.
AI tools never go looking for llms.txt files that aren’t there, so publishing one doesn’t put you on any AI radar.
We analyzed every request to /llms.txt paths that returned a 404 and found the cleanest split we’ve seen in bot data: valid files drew 96% bot traffic, missing files drew 98% human traffic, and the AI bot share of those 404s was zero.
The people probing for absent llms.txt files are humans typing the URL into a browser, presumably SEOs checking on competitors.
This kills the assumption that AI systems actively hunt for llms.txt files, and that a site without one is missing a knock on the door.
AI tools fetch llms.txt when a link, an index, or a user instruction tells them it exists.
How to check your own llms.txt bot traffic
If you want to see which bots are actually hitting your llms.txt file, head to Ahrefs Bot Analytics and add a filter for Page URL → Contains → llms.txt, then hit Apply.


This narrows everything down to requests hitting your llms.txt file (or any pages with “llms.txt” in the URL, like blog posts about it).
We don’t have an llms.txt file on the Ahrefs website, but we’re getting some bots hitting that page, as indicated by the 404 status.
From there, you can check:
- Visits over time. Toggle between By bot and By category to see whether traffic is climbing, flat, or spiking.
- The Bots table. See which exact bots are fetching the file.
- Last status in Crawled pages. Check the status code. A 404 on /llms.txt means bots are asking for a file that isn’t there.
That last point is the useful gut-check. Plenty of sites get bot requests for an llms.txt they never published. The traffic is real; the file isn’t.
You can also use the AI bots filter at the top of the page to strip out other crawlers and see only the LLM-related ones.
And, remember, a bot requesting your llms.txt isn’t proof anything read or acted on it. It only tells you the file was fetched.
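If you only have raw server logs to hand, a similar tally can be sketched in a few lines. The example below assumes your server writes the common “combined” access log format used by Apache and nginx; the regular expression, the `llms_txt_hits` helper, and the sample lines are all illustrative assumptions rather than anything taken from Bot Analytics.

```python
import re
from collections import Counter

# Matches the common "combined" access log format used by Apache and nginx.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Made-up sample lines; the IPs and user agents are placeholders.
SAMPLE_LINES = [
    '203.0.113.5 - - [12/May/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "GPTBot/1.2"',
    '203.0.113.5 - - [13/May/2026:10:00:00 +0000] "GET /llms.txt?x=1 HTTP/1.1" 200 512 "-" "GPTBot/1.2"',
    '198.51.100.7 - - [13/May/2026:11:30:00 +0000] "GET /llms.txt HTTP/1.1" 404 153 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [13/May/2026:11:31:00 +0000] "GET /blog/llms.txt-guide HTTP/1.1" 200 9000 "-" "Mozilla/5.0"',
    "not a log line",
]


def llms_txt_hits(lines):
    """Count requests to the root /llms.txt, keyed by (status, user agent)."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        path = match.group("path").split("?", 1)[0]  # drop any query string
        if path != "/llms.txt":
            continue  # ignore other URLs that merely contain "llms.txt"
        hits[(int(match.group("status")), match.group("ua"))] += 1
    return hits
```

Calling `llms_txt_hits(open("access.log"))` on a real log (the filename is a placeholder) returns a counter you can sort with `.most_common()`. Unlike the Contains filter described above, this sketch matches the root path exactly, so blog posts about llms.txt are excluded.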
If your goal is showing up in ChatGPT, Perplexity, or AI Overviews, an llms.txt file is largely decoration.
AI search bots barely fetch them, no AI system goes looking for them, and 97% of existing files attract no readers of any kind.
And remember that requests are the generous measure. Whether bots act on what they fetch is another question.
Here are the pros and cons, side-by-side.
| PROS | CONS |
|---|---|
| Publishing llms.txt is cheap, and platforms like Wix will increasingly do it for you. | The base rate is brutal: 97% of existing llms.txt files attract no readers of any kind. |
| The closest thing to an intended audience in our data is coding agents. If your customers use coding agents, or if agents act on your site, the file stands a real chance of being read. | It won’t help your AI search visibility today. AI retrieval bots barely fetch these files, and no AI system goes looking for one you haven’t published. |
| It may futureproof your strategy. Google has made it clear that the future of search is agentic. If agents end up mediating AI search, rather than retrieval bots fetching pages directly, llms.txt could start influencing AI visibility through the agent layer. | Publishing is only half the job. Agents fetch llms.txt when directed, not speculatively, so an unlinked file is unlikely to get picked up. |
| | It’s a security risk. Agents are built to trust this file, and potential bad actors are already probing llms.txt for prompt injection. A stale or compromised file misleads every agent that reads it. |
My verdict: the cons outweigh the pros right now. If you want to show up in AI search, there are more reliable ways to improve your visibility than this file.
But if you’re still toying with the idea of producing llms.txt, here are the steps you should take:
- Check your own logs before investing further. A 97% chance of zero readership is the base rate.
- Get a website-building platform to do it for you. Wix already generates these files, and Framer and Lovable are scanning for them. Within a year, having an llms.txt may be as much a CMS default as having a sitemap. If the payoff is uncertain, it makes sense to keep the effort minimal.
- Direct agents to it. Link the file from your HTML, reference it in your docs, or mention it anywhere agents receive instructions about your site. Agents fetch llms.txt when directed, not speculatively.
- Offset the prompt injection risk by treating llms.txt like code. Version-control it, restrict who can edit it, set an alert for unauthorized changes, keep the content to plain links and descriptions (nothing instruction-shaped), only link to resources you control, and review anything a platform auto-generates on your behalf.
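Part of that last step can be automated as a pre-publish check. The sketch below is one possible approach and nothing more: the `audit_llms_txt` helper is hypothetical, and the “instruction-shaped” patterns are a crude, assumed word list that will miss most real attacks. Treat it as a tripwire alongside human review, not a defense.

```python
import re
from urllib.parse import urlparse

# Markdown inline links: [label](url)
LINK = re.compile(r"\[[^\]]*\]\((?P<url>[^)\s]+)\)")

# Assumed, deliberately crude patterns for instruction-shaped text.
INSTRUCTION_PATTERNS = re.compile(
    r"\b(ignore (all|any|previous)|you must|system prompt|do not tell|disregard)\b",
    re.IGNORECASE,
)


def audit_llms_txt(text, allowed_hosts):
    """Return warnings for off-domain links and instruction-shaped lines."""
    warnings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in LINK.finditer(line):
            host = urlparse(match.group("url")).hostname
            # Relative links have no hostname and stay on your own site.
            if host is not None and host not in allowed_hosts:
                warnings.append(f"line {number}: link to uncontrolled host {host}")
        if INSTRUCTION_PATTERNS.search(line):
            warnings.append(f"line {number}: instruction-shaped text")
    return warnings
```

Run it in CI or a pre-commit hook with the set of hostnames you control, for example `audit_llms_txt(text, {"example.com"})`, and fail the build if the returned list is non-empty.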
This study answers how many sites publish llms.txt, and who reads it.
But there are a couple of other questions worthy of further research that were beyond the scope of this study:
- Do agents fetch developer docs more often? Is Claude-Code’s llms.txt interest concentrated on documentation paths like /docs/ and /api/, as Mueller’s framing predicts?
- Do bots actually act on what they read? When an AI agent fetches llms.txt, does it then fetch the resources the file links to? SEO consultant David McSweeney, Founder of Queryburst, is already running an experiment along these lines: he’s serving AI user agents a compressed, agent-friendly summary of his test sites, complete with instructions for requesting deeper content, and tracking whether any agent actually follows through. His results are worth following.
Mueller called llms.txt a temporary crutch.
But that crutch seems to already have its own supply chain: platforms generating llms.txt files, an industry auditing them, and security researchers studying them, all before the “readers” actually showed up.
Either we’re watching the early scaffolding of a real standard, or we’re watching the SEO industry prove it can productize anything. Our money is on a bit of both.

