mirror of https://git.sr.ht/~seirdy/seirdy.one
synced 2024-11-23 12:52:10 +00:00
Slow down MJ12bot
parent b5a238b3e3
commit 1701c4b254
1 changed file with 7 additions and 0 deletions
@@ -87,6 +87,10 @@ Disallow: /
 User-agent: PiplBot
 Disallow: /

+# Well-known overly-aggressive bot that claims to respect robots.txt: http://mj12bot.com/
+User-agent: MJ12bot
+Crawl-Delay: 10
+
 ## Gen-AI data scrapers ##

 # Eat shit, OpenAI.
@@ -117,6 +121,9 @@ User-Agent: FacebookBot
 User-Agent: meta-externalagent
 Disallow: /

+# This one doesn't support robots.txt: https://www.allenai.org/crawler
+# block it with your reverse-proxy or WAF or something.
+
 # I'm not blocking CCBot for now. It publishes a free index for anyone to use.
 # Googe used this to train the initial version of Bard (now called Gemini).
 # I allow CCBot since its index is also used for upstart/hobbyist search engines
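
A quick way to sanity-check how a parser reads the rules touched by this commit is Python's standard-library `urllib.robotparser`. The sketch below is only an illustration: `ROBOTS_EXCERPT` and `load_rules` are hypothetical names, the excerpt contains just the groups visible in this diff rather than the full file, and `Crawl-Delay` is a non-standard directive, so whether MJ12bot actually honors it is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt: only the groups shown in this diff, not the full robots.txt.
ROBOTS_EXCERPT = """\
User-agent: PiplBot
Disallow: /

User-agent: MJ12bot
Crawl-Delay: 10

User-Agent: meta-externalagent
Disallow: /
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_EXCERPT)

# MJ12bot is slowed down, not blocked: it has a delay but no Disallow rule.
mj12_delay = rules.crawl_delay("MJ12bot")
mj12_allowed = rules.can_fetch("MJ12bot", "/")

# PiplBot and meta-externalagent are blocked outright.
pipl_allowed = rules.can_fetch("PiplBot", "/")
meta_allowed = rules.can_fetch("meta-externalagent", "/")

print(mj12_delay, mj12_allowed, pipl_allowed, meta_allowed)
```

This only checks what a compliant parser would conclude from the file. A crawler that ignores robots.txt, like the one named in the second hunk, is unaffected and has to be blocked at the reverse proxy or WAF, as the comment says.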