mirror of https://git.sr.ht/~seirdy/seirdy.one (synced 2024-12-24 01:42:10 +00:00)
Slow down MJ12bot
parent b5a238b3e3
commit 1701c4b254
1 changed file with 7 additions and 0 deletions
@@ -87,6 +87,10 @@ Disallow: /
User-agent: PiplBot
Disallow: /

# Well-known overly-aggressive bot that claims to respect robots.txt: http://mj12bot.com/
User-agent: MJ12bot
Crawl-Delay: 10

## Gen-AI data scrapers ##

# Eat shit, OpenAI.
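To sanity-check what the new MJ12bot group asks for, the rules can be fed to Python's standard `urllib.robotparser`. This is only a sketch: the `ROBOTS_TXT` string is a trimmed excerpt of the hunk above, and `load_rules` is a hypothetical helper name, not something from this repository. It shows how one well-behaved parser reads the rules; it says nothing about how MJ12bot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Trimmed excerpt of the rules touched by this commit (not the full file).
ROBOTS_TXT = """\
User-agent: PiplBot
Disallow: /

User-agent: MJ12bot
Crawl-Delay: 10
"""


def load_rules(text: str) -> RobotFileParser:
    """Hypothetical helper: parse robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# MJ12bot is still allowed to crawl, but is asked to wait between requests.
mj12_delay = rules.crawl_delay("MJ12bot")
mj12_allowed = rules.can_fetch("MJ12bot", "/")

# PiplBot is blocked outright and has no delay directive of its own.
pipl_allowed = rules.can_fetch("PiplBot", "/")
pipl_delay = rules.crawl_delay("PiplBot")
```

`Crawl-Delay` is a non-standard directive, so it only slows crawlers that choose to honor it.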
@@ -117,6 +121,9 @@ User-Agent: FacebookBot
User-Agent: meta-externalagent
Disallow: /

# This one doesn't support robots.txt: https://www.allenai.org/crawler
# block it with your reverse-proxy or WAF or something.

# I'm not blocking CCBot for now. It publishes a free index for anyone to use.
# Google used this to train the initial version of Bard (now called Gemini).
# I allow CCBot since its index is also used for upstart/hobbyist search engines
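For a crawler that ignores robots.txt, the blocking has to happen on the server side. The comment above suggests a reverse proxy or WAF; as a rough stand-in for the "or something" case, here is a minimal WSGI middleware sketch that rejects requests by User-Agent substring. The `ai2bot` token is an assumption about how that crawler identifies itself, so check the crawler's own page before relying on it. `block_user_agents`, `hello_app`, and `status_for` are hypothetical names used only for illustration.

```python
# Assumption: the crawler's User-Agent contains this token (case-insensitive).
BLOCKED_UA_SUBSTRINGS = ("ai2bot",)


def block_user_agents(app, blocked=BLOCKED_UA_SUBSTRINGS):
    """Wrap a WSGI app and answer 403 to any blocked User-Agent."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Placeholder application standing in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


def status_for(app, user_agent):
    """Call a WSGI app with a fake request and return its status line."""
    captured = []

    def start_response(status, headers, exc_info=None):
        captured.append(status)

    # Consume the response body so the app runs to completion.
    b"".join(app({"HTTP_USER_AGENT": user_agent}, start_response))
    return captured[0]


guarded = block_user_agents(hello_app)
blocked_status = status_for(
    guarded, "Mozilla/5.0 (compatible) AI2Bot (+https://www.allenai.org/crawler)"
)
normal_status = status_for(guarded, "Mozilla/5.0 Firefox/133.0")
```

User-Agent matching is easy to evade, so this only stops crawlers that identify themselves honestly; a reverse-proxy rule doing the same match would reject the request before it reaches the application.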