mirror of https://git.sr.ht/~seirdy/seirdy.one
synced 2024-11-23 12:52:10 +00:00
block another LLM scraper
parent 185311289d
commit 100a6f3d11
1 changed file with 5 additions and 0 deletions
@@ -124,6 +124,11 @@ Disallow: /
 # This one doesn't support robots.txt: https://www.allenai.org/crawler
 # block it with your reverse-proxy or WAF or something.
 
+# See <https://ds.rois.ac.jp/center8/crawler/>
+# Parent page says it builds LLMs in the infographic: <https://ds.rois.ac.jp/center8/>
+User-agent: Cotoyogi
+Disallow: /
+
 # I'm not blocking CCBot for now. It publishes a free index for anyone to use.
 # Googe used this to train the initial version of Bard (now called Gemini).
 # I allow CCBot since its index is also used for upstart/hobbyist search engines
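
One way to sanity-check the rule added above is to parse it with Python's standard-library `urllib.robotparser`. This is only a minimal sketch: `ROBOTS_EXCERPT` contains just the two directive lines from this commit rather than the full file, `is_allowed` is a hypothetical helper name, and the URL `https://seirdy.one/` is assumed for illustration. A well-behaved crawler identifying as `Cotoyogi` should be refused, while `CCBot` is not matched by this excerpt and stays allowed.

```python
from urllib.robotparser import RobotFileParser

# Assumption: only the rule added in this commit, not the whole robots.txt.
ROBOTS_EXCERPT = """\
User-agent: Cotoyogi
Disallow: /
"""


def is_allowed(robots_text: str, user_agent: str, url: str) -> bool:
    """Return True if robots_text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Cotoyogi", "CCBot"):
        verdict = "allowed" if is_allowed(ROBOTS_EXCERPT, agent, "https://seirdy.one/") else "blocked"
        print(f"{agent}: {verdict}")
```

This only models crawlers that honor robots.txt. As the comment in the diff notes, a crawler that ignores the file has to be blocked at the reverse proxy or WAF instead.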