From 247ec11daea82f2fa166ba1952666a9e0f6cd4cd Mon Sep 17 00:00:00 2001
From: Rohan Kumar
Date: Wed, 20 Mar 2024 21:34:55 -0400
Subject: [PATCH] Add some more docs to robots.txt

---
 static/robots.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/static/robots.txt b/static/robots.txt
index 5f5fe3c..0100a69 100644
--- a/static/robots.txt
+++ b/static/robots.txt
@@ -88,4 +88,7 @@ Disallow: /
 # I'm not familiar enough with Omgili to make a call here.
 # In the long run, my embedded robots meta-tags and headers could cover gen-AI
+# I don't block cohere-ai or Perplexitybot: they don't appear to actually scrape data for LLM training purposes. The crawling powers search engines with integrated pre-trained LLMs.
+# TODO: investigate whether YouBot scrapes to train its own in-house LLM.
+
 
 Sitemap: https://seirdy.one/sitemap.xml