mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-11-10 00:12:09 +00:00

robots.txt: remove unused anthropic directives

official docs show the right opt-out signal
Seirdy 2024-06-01 05:35:15 -04:00
parent 03270da3c7
commit 4f28f001bf
No known key found for this signature in database
GPG key ID: 1E892DB2A5F84479

@@ -11,8 +11,7 @@ Disallow: /
 Allow: /ads.txt
 Allow: /app-ads.txt
-# Enabling our crawler to access your site offers several significant benefits
-# to you as a publisher. By allowing us access, you enable the maximum number
+# By allowing us access, you enable the maximum number
 # of advertisers to confidently purchase advertising space on your pages. Our
 # comprehensive data insights help advertisers understand the suitability and
 # context of your content, ensuring that their ads align with your audience's
@@ -100,13 +99,7 @@ Disallow: /
 User-agent: Google-Extended
 Disallow: /
-# There isn't any public documentation for this AFAICT.
-# Reuters thinks this works so I might as well give it a shot.
-User-agent: anthropic-ai
-User-agent: Claude-Web
-Disallow: /
-# Extremely aggressive crawling with no documentation. people had to email the
-# company about this for robots.txt guidance.
+# Anthropic-AI crawler posted guidance after a long period of crawling without opt-out documentation: <https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler>
 User-agent: ClaudeBot
 Disallow: /
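
One way to sanity-check the effect of this change is to feed the post-commit rules to a robots.txt parser and ask which user agents are blocked. The sketch below uses Python's standard `urllib.robotparser`. It is only an illustration: `ROBOTS_EXCERPT` contains just the two groups visible in the second hunk, not the whole file, and the `is_blocked` helper and the default URL are hypothetical names chosen for this example.

```python
from urllib.robotparser import RobotFileParser

# Only the groups visible in the second hunk after this commit.
# The real robots.txt contains many more groups, so results for
# other user agents here do not reflect the full file.
ROBOTS_EXCERPT = """\
User-agent: Google-Extended
Disallow: /
User-agent: ClaudeBot
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str,
               url: str = "https://seirdy.one/") -> bool:
    """Return True if robots_txt disallows user_agent from fetching url.

    Hypothetical helper for illustration; the default URL is an assumption.
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ClaudeBot", "Google-Extended", "anthropic-ai"):
        print(agent, "blocked:", is_blocked(ROBOTS_EXCERPT, agent))
```

Against this excerpt alone, `anthropic-ai` has no matching group, so the parser treats it as allowed. That matches what the commit does (it drops those directives) but says nothing about how other groups in the full file treat that agent.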