Just adding a bunch of new/renamed AI scraper bots to the ol' blocklist
robots.txt:
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: semantic-visions.com
Disallow: /
October 28, 2025 at 1:41 PM
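A quick way to sanity-check a blocklist like the one above is Python's standard-library `urllib.robotparser`. This is a minimal sketch, assuming the four groups are separated by blank lines; the URL `https://example.com/any/page` and the helper name `blocked_agents` are placeholders, not part of the original post. It only shows what a cooperative crawler would conclude, since robots.txt is advisory.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the post; blank lines between groups are an assumption.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: semantic-visions.com
Disallow: /
"""


def blocked_agents(robots_txt, agents, url="https://example.com/any/page"):
    """Map each agent name to True if the rules forbid it from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


# Googlebot is not named in the file, so it comes back as not blocked.
result = blocked_agents(ROBOTS_TXT, ["GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot"])
print(result)
```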
Their study found that most major AI crawlers fetch JavaScript files (roughly 10% to 25% of their requests) but do not execute them. GPTBot, ClaudeBot, PerplexityBot, and others do not currently fully render JavaScript content.
This means that if you're using JS to load content, many AI crawlers will miss it.
January 7, 2025 at 1:43 PM
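To see roughly what a non-rendering crawler gets from a page, one option is to extract the text from the raw HTML without running any scripts. The sketch below uses Python's standard `html.parser`; the `PAGE` markup, the `VisibleText` class, and `text_without_js` are invented for illustration, and real crawlers may extract text differently.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text outside <script> and <style>, as a non-rendering fetch would see it."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a script or style body.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_without_js(html):
    """Return the text present in the markup itself; scripts are never executed."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the plan text only exists after the script runs in a browser.
PAGE = """<html><body>
<h1>Pricing</h1>
<div id="plans"></div>
<script>document.getElementById("plans").textContent = "Pro plan: $20";</script>
</body></html>"""

seen = text_without_js(PAGE)
print(seen)
```

Anything that matters for AI crawlers would need to appear in `seen`, which in practice means server-side rendering or static HTML for that content.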
Pretty extensive robots.txt so far
`DisallowAITraining` is also a newer option, again only for cooperative robots
www.ietf.org/archive/id/d...
June 11, 2025 at 11:28 AM
Just updated robots.txt.
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
LLMs:
May 1, 2025 at 11:41 AM
‘I went into Perplexity and asked "What's on this page rknight․me/PerplexityBot?". Immediately I could see the log and just like Lewis, the user agent didn't include their custom user agent’
rknight.me/blog/perplex...
Perplexity AI Is Lying about Their User Agent
Perplexity AI claims it sends a user agent and respects robots.txt but it absolutely does not
rknight.me
June 15, 2024 at 10:51 PM
just checked and the ones I'm blocking right now are:
AISearchBot anthropic-ai Applebot Bytespider ChatGPT-User Claude-Web cohere-ai Diffbot FacebookBot Google-Extended GPTBot omgili PerplexityBot YouBot
July 21, 2025 at 8:37 AM
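A list like this can be turned into a robots.txt group mechanically. The sketch below is one way to do it in Python; `build_robots_txt` is a hypothetical helper, and it assumes a single group with several `User-agent` lines sharing one `Disallow: /`, which the Robots Exclusion Protocol (RFC 9309) permits.

```python
# Agent names copied from the post above.
BLOCKED_AGENTS = (
    "AISearchBot anthropic-ai Applebot Bytespider ChatGPT-User Claude-Web "
    "cohere-ai Diffbot FacebookBot Google-Extended GPTBot omgili "
    "PerplexityBot YouBot"
).split()


def build_robots_txt(agents):
    """Return robots.txt text with one 'Disallow: /' group covering every agent."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(BLOCKED_AGENTS)
print(robots_txt)
```

Keeping the names in one list makes it easier to add renamed or new crawlers later without hand-editing the file.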
User-agent: GoogleOther-Image
User-agent: GoogleOther-Video
User-agent: ICC-Crawler
User-agent: ImagesiftBot
User-agent: Meta-ExternalAgent
User-agent: Meta-ExternalFetcher
User-agent: OAI-SearchBot
User-agent: PerplexityBot
User-agent: PetalBot
User-agent: Scrapy
User-agent: Timpibot
User-agent […]
Original post on mastodon.scot
mastodon.scot
June 1, 2025 at 8:06 PM
Ok actually, because robots.txt is optional and ignorable, here is our current, full AI bot blocking solution using the non-optional .htaccess mod_rewrite (see alt text)
October 28, 2025 at 5:09 PM
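The alt text with the actual rules is not reproduced in this feed, so the following is only an illustration of the general approach, not the poster's configuration. It is a Python sketch that generates Apache mod_rewrite directives returning 403 Forbidden to matching user agents; `build_htaccess` is a hypothetical helper, and the agent names are taken from the robots.txt posts in this feed.

```python
import re

# Only allow characters that need no regex escaping beyond the dot.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def build_htaccess(agents):
    """Return mod_rewrite directives that answer 403 to matching user agents."""
    for agent in agents:
        if not SAFE_NAME.fullmatch(agent):
            raise ValueError(f"unexpected characters in agent name: {agent!r}")
    # Escape dots so a name like "semantic-visions.com" matches literally.
    pattern = "|".join(agent.replace(".", r"\.") for agent in agents)
    return "\n".join([
        "RewriteEngine On",
        # [NC] makes the user-agent match case-insensitive.
        f"RewriteCond %{{HTTP_USER_AGENT}} ({pattern}) [NC]",
        # "-" means no substitution; [F] returns 403 Forbidden.
        "RewriteRule ^ - [F]",
    ]) + "\n"


rules = build_htaccess(["GPTBot", "ClaudeBot", "PerplexityBot", "semantic-visions.com"])
print(rules)
```

Unlike robots.txt, this is enforced by the server, but it still depends on the crawler sending an honest user-agent string.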
"Google's search engine is still all-powerful, but for the first time in more than a decade it has fallen below a 90% global share. Especially on desktop, more and more users are choosing to put their questions to ChatGPT, Claude, PerplexityBot or another artificial intelligence system"
A two-footed tackle into the search engine war
expresso.pt
August 18, 2025 at 4:49 PM
And (details in alt-text again)
December 19, 2024 at 6:45 PM
it is frankly amazing how much internet traffic is just literal garbage
March 9, 2025 at 10:50 PM
Search engine crawlers still outpace AI crawlers, but AI crawler traffic is growing significantly.
Last month GPTBot, Claude, AppleBot, and PerplexityBot combined accounted for nearly 1.3 billion fetches, a little over 28% of Googlebot's volume.
December 17, 2024 at 6:16 PM
Wake up babes, new version of Simple NoAI WordPress plugin just dropped. Adds more crawlers to deny including:
AppleBot-Extended, Bytespider, Cohere-ai, Diffbot, ImagesiftBot, PerplexityBot, FacebookBot, Omgili
Get 1.6.3 here: wordpress.org/plugins/simp...
Simple NoAI and NoImageAI
This plugin very simply adds a line of code to your header that tells AIs not to use anything on your website for indexing.
wordpress.org
June 11, 2024 at 3:13 PM
Reminder: robots.txt can switch reading of your content by #chatbots on or off. Minimal example: User-agent: GPTBot → Allow: / (to block, use Disallow: / instead.) Repeat the same rule for PerplexityBot, Claude, Google-Extended, and so on.
August 20, 2025 at 9:30 AM
.... ByteDance (Bytespider), Perplexity (PerplexityBot).
* Google's Gemini leverages Googlebot's infrastructure, enabling full JavaScript rendering.
* AppleBot renders JavaScript through a browser-based crawler, like Googlebot, processing JS, CSS, Ajax requests, and other resources for full-page rendering...
December 18, 2024 at 3:38 PM
⚠️Perplexity is repeatedly modifying their user agent and changing IPs and ASNs to hide their crawling activity, in direct conflict with explicit no-crawl preferences expressed by websites. 🤨
blog.cloudflare.com/perplexity-i... #Perplexity #AICrawler #PerplexityBot
Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives
Perplexity is repeatedly modifying their user agent and changing IPs and ASNs to hide their crawling activity, in direct conflict with explicit no-crawl preferences expressed by websites.
blog.cloudflare.com
August 5, 2025 at 4:41 PM
I’ve read that ChatGPT's search results often match Bing’s, which makes sense given the underlying integration. Gemini taps into Google Search, while Perplexity uses its own crawler, PerplexityBot. Since Perplexity essentially built its own search engine, its results feel a bit more limited.
April 2, 2025 at 9:01 PM
robots.txt
User-agent: ChatGPT-User
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: Perplexity-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
July 21, 2025 at 3:49 AM
You can keep up to date on which AI search companies haven’t bent the knee to Amazon pretty easily via their robots.txt:
November 22, 2024 at 8:34 PM
An interesting read. I've been really happy with Perplexity, but I was just able to verify that it pulled content from a site that explicitly excluded PerplexityBot via robots.txt. That doesn't seem cool.
Perplexity Is a Bullshit Machine www.wired.com/story/...
#ai #manners
Perplexity Is a Bullshit Machine
A WIRED investigation shows that the AI-powered search startup Forbes has accused of stealing its content is surreptitiously scraping—and making things up out of thin air.
www.wired.com
June 20, 2024 at 7:36 PM
this is kind of weird but i guess it's interesting? one thing i still don't get after reading the post — do any of these integrations actually work?
like, i see in my logs: Meta-ExternalAgent, ChatGPT-User, PerplexityBot, PerplexityBot-user. would these actually pay? and what price should i set?
If AI crawlers want access to the juicy training data on my site they’re going to have to pay up!
Introducing Pay per crawl- enabling content owners to charge AI crawlers for access
Pay per crawl is a new feature to allow content creators to charge AI crawlers for access to their content.
blog.cloudflare.com
July 1, 2025 at 11:44 PM
In this example, I used WP Engine to look at the Go Fish Digital site and search for crawl instances from PerplexityBot (Perplexity) and GPTBot (OpenAI).
Reviewing this data, I can confidently say that both are discovering the site.
January 9, 2025 at 1:34 PM
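The same kind of check can be run on raw access logs without a hosting dashboard. Below is a minimal Python sketch that counts hits per crawler by looking only at the user-agent field; it assumes a combined-style log where the user agent is the last quoted field, and the sample lines, IPs, and agent strings are invented for illustration.

```python
from collections import Counter

BOTS = ("PerplexityBot", "GPTBot")


def user_agent_of(line):
    """Return the last double-quoted field of a log line, or '' if there is none."""
    parts = line.rstrip().rsplit('"', 2)
    return parts[1] if len(parts) == 3 else ""


def count_bot_hits(log_lines, bots=BOTS):
    """Count lines whose user-agent field names each bot (case-insensitive)."""
    counts = Counter()
    for line in log_lines:
        agent = user_agent_of(line).lower()
        for bot in bots:
            if bot.lower() in agent:
                counts[bot] += 1
    return counts


# Made-up sample lines; real user-agent strings are longer.
SAMPLE_LOG = [
    '192.0.2.10 - - [09/Jan/2025:13:30:01 +0000] "GET / HTTP/1.1" 200 512 "-" "sample-agent GPTBot/1.0"',
    '192.0.2.11 - - [09/Jan/2025:13:30:05 +0000] "GET /blog HTTP/1.1" 200 900 "-" "sample-agent PerplexityBot/1.0"',
    '192.0.2.12 - - [09/Jan/2025:13:30:09 +0000] "GET /PerplexityBot HTTP/1.1" 404 0 "-" "sample browser"',
    '192.0.2.10 - - [09/Jan/2025:13:31:00 +0000] "GET /about HTTP/1.1" 200 700 "-" "sample-agent gptbot/1.0"',
]

hits = count_bot_hits(SAMPLE_LOG)
print(hits)
```

Matching on the user-agent field rather than the whole line avoids counting requests whose path merely contains a bot name, as in the third sample line.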
The Vercel article:
“The results consistently show that none of the major AI crawlers currently render JavaScript.
This includes: OpenAI (OAI-SearchBot, ChatGPT-User, GPTBot), Anthropic (ClaudeBot), Meta (Meta-ExternalAgent), ByteDance (Bytespider), Perplexity (PerplexityBot)”
7/10
The rise of the AI crawler - Vercel
New research reveals how ChatGPT, Claude, and other AI crawlers process web content, including JavaScript rendering, assets, and other behavior and patterns—with recommendations for site owners, devs,...
vercel.com
January 29, 2025 at 2:00 PM
Are there any plans to block AI crawler bots within Bluesky's robots.txt file?
November 15, 2024 at 6:03 PM
Are AI Bots Reading llms.txt Files? – Live Tracking Dashboard – Live experiment tracks whether major AI bots like GPTBot, ClaudeBot, and PerplexityBot are reading llms.txt files across the web. Each time an AI crawler accesses an llms.txt file, we log an anonymo... https://tinyurl.com/yutpad7u #AIBot
Are AI Bots Reading llms.txt Files? – Live Tracking Dashboard
This live experiment tracks whether AI bots like GPTBot, ClaudeBot, and PerplexityBot are reading llms.txt files across real websites. View crawler activity in real time.
llmstxt.ryanhoward.dev
July 5, 2025 at 6:39 PM