seirdy.one/content/notes/opting-out-of-llm-indexing.md at 7c777f60b7ed9e5ffc928d5f4289b8c287a3a726

mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-11-27 14:12:09 +00:00

Rohan Kumar e4592387a3

syndicate

2023-04-21 22:45:43 -07:00

1.1 KiB


---
title: "Opting out of LLM indexing"
date: 2023-04-21T22:40:04-07:00
replyURI: "https://chriscoyier.net/2023/04/21/the-secret-list-of-websites/"
replyTitle: "“the secret list of websites”"
replyType: "BlogPosting"
replyAuthor: "Chris Coyier"
replyAuthorURI: "https://chriscoyier.net/"
syndicatedCopies:
    - title: "The Fediverse"
      url: "https://pleroma.envs.net/notice/AUttq9kpOmeYZDHRTc"
---

I added an entry to my robots.txt to block ChatGPT's crawler, but blocking crawling isn't the same as blocking indexing. It looks like Google chose to use the Common Crawl for this, sidestepping the need to crawl on its own. That's a strange decision; after all, Google has a much larger proprietary index at its disposal.
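For illustration, here is a minimal sketch of what such an entry might look like and how to check it with Python's standard `urllib.robotparser`. The exact entry in my robots.txt isn't reproduced here: the `ChatGPT-User` and `CCBot` user-agent tokens and the `example.org` URL are assumptions chosen for the example, so check each crawler's own documentation for the token it actually honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the user-agent tokens below are assumptions
# for this sketch, not a copy of any real site's file.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/notes/"  # placeholder URL
    print(is_allowed(ROBOTS_TXT, "ChatGPT-User", page))  # False
    print(is_allowed(ROBOTS_TXT, "CCBot", page))         # False
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", page))  # True
```

A check like this only says what a well-behaved crawler would do with those rules. It does nothing about copies of a page that already sit in a third-party corpus, which is the gap between blocking crawling and blocking indexing.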

Calling it a "secret list of websites" was an ironic choice of words, given that the list originates from the publicly available Common Crawl. It's sad to see Common Crawl (ab)used for this, but I suppose we should have seen it coming.

I know Google tells authors how to qualify/disqualify from rich results, but I don't see any docs for opting a site out of LLM/Bard training.
