mirror of
https://git.sr.ht/~seirdy/seirdy.one
title | date | replyURI | replyTitle | replyType | replyAuthor | replyAuthorURI | syndicatedCopies
---|---|---|---|---|---|---|---
Opting out of LLM indexing | 2023-04-21T22:40:04-07:00 | https://chriscoyier.net/2023/04/21/the-secret-list-of-websites/ | “the secret list of websites” | BlogPosting | Chris Coyier | https://chriscoyier.net/ |

I added an entry to my robots.txt to block ChatGPT's crawler, but blocking crawling isn't the same as blocking indexing; it looks like Google chose to use the Common Crawl for this and sidestep the need to do crawling of its own. That's a strange decision; after all, Google has a much larger proprietary index at its disposal.
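
To make the crawling-versus-indexing distinction concrete, here is a minimal sketch of what such a robots.txt entry might look like and how a well-behaved crawler would interpret it. The user-agent tokens (`ChatGPT-User`, `CCBot`), the `example.com` URL, and the `is_allowed` helper are assumptions for illustration; the note does not say which exact entry was added. The check uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. The user-agent tokens are assumptions,
# not necessarily the entry described in the note.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder URL.
    for agent in ("ChatGPT-User", "CCBot", "Googlebot"):
        allowed = is_allowed(ROBOTS_TXT, agent, "https://example.com/notes/")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Rules like these only ask compliant crawlers not to fetch pages from now on. They do nothing about copies that already sit in a third-party dataset, which is why blocking one crawler is not the same as opting out of indexing.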
A "secret list of websites" was an ironic choice of words, given that this originates from the Common Crawl. It's sad to see Common Crawl (ab)used for this, but I suppose we should have seen it coming.
I know Google tells authors how to qualify for or opt out of rich results, but I don't see any docs for opting a site out of LLM/Bard training.