diff --git a/content/meta/scrapers-i-block.md b/content/meta/scrapers-i-block.md
index 7ab8ec5..917b3c4 100644
--- a/content/meta/scrapers-i-block.md
+++ b/content/meta/scrapers-i-block.md
@@ -7,9 +7,13 @@ tags:
 - AI
 outputs:
 - html
-#syndicatedCopies:
-#  - title: 'The Fediverse'
-#    url: ''
+syndicatedCopies:
+  - title: 'The Fediverse'
+    url: 'https://pleroma.envs.net/objects/c022c39f-5cab-49c8-bf84-7504326e87dc'
+  - title: 'BlueSky'
+    url: 'https://bsky.app/profile/seirdy.one/post/3lcwjwqkc5k2z'
+  - title: 'jstpst'
+    url: 'https://jstpst.net/f/articles/12091/scrapers-i-block-and-allow-with-explanations'
 ---
 
@@ -67,7 +71,7 @@ That said, many robots do not respect `X-Robots` tags, or an `X-Robots` tag does
 
 ### Comparing robots.txt, X-Robots, and the TDM Reservation Protocol
 
-Ideally, `X-Robots` would support per-engine rules in a standard format,such as robots.txt. Unfortunately, it doesn't. Unlike robots.txt, it has per-page rules that persist on archives of a given page, potentially saved in other data sets.
+Ideally, `X-Robots` would support per-engine rules in a standard format, such as robots.txt. Unfortunately, it doesn't. Unlike robots.txt, it has per-page rules that persist on archives of a given page, potentially saved in other data sets.
 
 A successor to `X-Robots` called the [Text and Data Mining (TDM) Reservation Protocol](https://w3c.github.io/tdm-reservation-protocol/) may close the gap between `X-Robots` and robots.txt. It allows setting policies on data-mining to forbid LLM training, among other things.
 
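
For context on the protocol referenced in the second hunk: the TDM Reservation Protocol draft defines a `tdm-reservation` signal (`1` reserves text-and-data-mining rights, `0` waives them) that can be delivered per-response as an HTTP header or site-wide via `/.well-known/tdmrep.json`. Below is a minimal sketch of how a compliant crawler might check it. The field names come from the W3C draft; the header-before-file lookup order and the simplified prefix matching for `location` patterns are assumptions of this sketch, not guarantees of the spec.

```python
"""Sketch of a TDM-Rep-aware check, after the W3C draft:
https://w3c.github.io/tdm-reservation-protocol/
Field names follow the draft; everything else is illustrative."""
import json
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def tdm_reserved(url: str) -> bool:
    """Return True if the origin reserves TDM rights for this URL."""
    # Per-response signal: a `tdm-reservation` header, where "1"
    # reserves text-and-data-mining rights and "0" waives them.
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        header = resp.headers.get("tdm-reservation")
    if header is not None:
        return header.strip() == "1"

    # Site-wide fallback: /.well-known/tdmrep.json, a list of
    # {"location": ..., "tdm-reservation": ...} rules.
    parts = urlsplit(url)
    well_known = f"{parts.scheme}://{parts.netloc}/.well-known/tdmrep.json"
    try:
        with urllib.request.urlopen(well_known) as resp:
            rules = json.load(resp)
    except (urllib.error.URLError, json.JSONDecodeError):
        return False  # no reservation found anywhere
    for rule in rules:
        # The draft's `location` values may end in a wildcard; this
        # sketch reduces them to a plain path-prefix match.
        prefix = str(rule.get("location", "")).rstrip("*")
        if parts.path.startswith(prefix):
            return rule.get("tdm-reservation") == 1
    return False
```

A scraper that honored this would skip training on any page for which `tdm_reserved()` returns `True`. Unlike a robots.txt rule, the per-response signal travels with the page itself, so it can survive into mirrors and archives, which is exactly the gap with `X-Robots` the post describes.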