mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-11-23 21:02:09 +00:00

Slow down MJ12bot

Seirdy 2024-08-08 02:21:00 -04:00
parent b5a238b3e3
commit 1701c4b254
No known key found for this signature in database
GPG key ID: 1E892DB2A5F84479


@@ -87,6 +87,10 @@ Disallow: /
 User-agent: PiplBot
 Disallow: /
+# Well-known overly-aggressive bot that claims to respect robots.txt: http://mj12bot.com/
+User-agent: MJ12bot
+Crawl-Delay: 10
 ## Gen-AI data scrapers ##
 # Eat shit, OpenAI.
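Crawl-Delay is not part of the Robots Exclusion Protocol as standardized in RFC 9309, so whether and how it is honored is up to each crawler. A common reading is "wait at least this many seconds between requests." The sketch below shows how a well-behaved crawler *might* consume the rules from the hunk above, using Python's standard-library `urllib.robotparser`. It is an illustration under assumptions, not MJ12bot's actual implementation. The blank line between groups, the user-agent strings `MJ12bot/v1.4.8` and `ExampleBot`, and the helper names `load_rules` and `crawl_policy` are all hypothetical.

```python
import urllib.robotparser

# The PiplBot and MJ12bot groups from the hunk above. The blank line between
# the groups is an assumption: the flattened diff view dropped blank lines.
ROBOTS_TXT = """\
User-agent: PiplBot
Disallow: /

# Well-known overly-aggressive bot that claims to respect robots.txt: http://mj12bot.com/
User-agent: MJ12bot
Crawl-Delay: 10
"""


def load_rules(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Hypothetical helper: parse robots.txt text without any network fetch."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl_policy(robots_txt: str, user_agent: str, path: str = "/") -> tuple[bool, float]:
    """Hypothetical helper: return (may_fetch, seconds_between_requests).

    crawl_delay() returns None when no Crawl-delay line applies to the agent;
    that case is treated here as "no delay requested" (0 seconds).
    """
    parser = load_rules(robots_txt)
    delay = parser.crawl_delay(user_agent)
    return parser.can_fetch(user_agent, path), float(delay or 0)


if __name__ == "__main__":
    # Illustrative user-agent strings; real crawlers send longer headers.
    for agent in ("MJ12bot/v1.4.8", "PiplBot", "ExampleBot"):
        print(agent, crawl_policy(ROBOTS_TXT, agent))
```

CPython's parser keeps only the part of the user-agent before the first `/`, lowercases it, and checks whether each group's token appears in it as a substring, so a versioned string like `MJ12bot/v1.4.8` still matches the `MJ12bot` group and gets the 10-second delay. PiplBot is refused outright by `Disallow: /`, and an agent matching no group falls through to "allowed, no delay" because this excerpt has no `User-agent: *` group.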
@@ -117,6 +121,9 @@ User-Agent: FacebookBot
 User-Agent: meta-externalagent
 Disallow: /
+# This one doesn't support robots.txt: https://www.allenai.org/crawler
+# block it with your reverse-proxy or WAF or something.
 # I'm not blocking CCBot for now. It publishes a free index for anyone to use.
 # Google used this to train the initial version of Bard (now called Gemini).
 # I allow CCBot since its index is also used for upstart/hobbyist search engines