mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-12-25 02:02:11 +00:00
seirdy.one/content/notes/opting-out-of-llm-indexing.md
2023-04-21 22:45:43 -07:00

---
title: Opting out of LLM indexing
date: 2023-04-21T22:40:04-07:00
replyURI: https://chriscoyier.net/2023/04/21/the-secret-list-of-websites/
replyTitle: “the secret list of websites”
replyType: BlogPosting
replyAuthor: Chris Coyier
replyAuthorURI: https://chriscoyier.net/
syndicatedCopies:
  - title: The Fediverse
    url: https://pleroma.envs.net/notice/AUttq9kpOmeYZDHRTc
---

I added an entry to my robots.txt to block ChatGPT's crawler, but blocking crawling isn't the same as blocking indexing. It looks like Google chose to use the Common Crawl for its training data and sidestep the need to crawl on its own. That's a strange decision; after all, Google has a much larger proprietary index at its disposal.
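
To make that gap concrete, here's a minimal Python sketch that checks which crawlers a robots.txt entry actually turns away. The rules below are an illustration and not a copy of my real robots.txt; the user-agent tokens `ChatGPT-User` and `CCBot` (Common Crawl's crawler), the `example.com` URL, and the `is_allowed` helper are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only (assumed, not my actual robots.txt):
# block one AI-related crawler and leave everything else alone.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/notes/"  # hypothetical URL
    for agent in ("ChatGPT-User", "CCBot", "Googlebot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

Under these assumed rules only `ChatGPT-User` is refused. `CCBot` falls through to the wildcard group, so the page can still end up in the Common Crawl and in anything built from it. Adding a `CCBot` group would only affect future crawls, and only for crawlers that choose to honor robots.txt.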

Calling it a "secret list of websites" was an ironic choice of words, given that the list originates from the Common Crawl, which is a public dataset. It's sad to see Common Crawl (ab)used for this, but I suppose we should have seen it coming.

I know Google tells authors how to qualify/disqualify from rich results, but I don't see any docs for opting a site out of LLM/Bard training.