mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-10-22 16:52:10 +00:00

block another LLM scraper

This commit is contained in:
Seirdy 2024-09-26 10:47:07 -04:00
parent 185311289d
commit 100a6f3d11
No known key found for this signature in database
GPG key ID: 1E892DB2A5F84479


@ -124,6 +124,11 @@ Disallow: /
# This one doesn't support robots.txt: https://www.allenai.org/crawler
# block it with your reverse-proxy or WAF or something.
# See <https://ds.rois.ac.jp/center8/crawler/>
# Parent page says it builds LLMs in the infographic: <https://ds.rois.ac.jp/center8/>
User-agent: Cotoyogi
Disallow: /
# I'm not blocking CCBot for now. It publishes a free index for anyone to use.
# Google used this to train the initial version of Bard (now called Gemini).
# I allow CCBot since its index is also used for upstart/hobbyist search engines