mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-09-19 20:02:10 +00:00

Compare commits

11 commits

Author SHA1 Message Date
Seirdy
3d1ef48f22
Update for increased consensus requirement 2024-08-18 00:25:34 -04:00
Seirdy
969c5e0637
major typo 2024-08-14 16:49:55 -04:00
Seirdy
ddbbbd63c2
Typos 2024-08-09 23:54:40 -04:00
Seirdy
c6f5476e65
Add another upcoming engine 2024-08-09 23:06:34 -04:00
Seirdy
12f87d3ba0
Add future engines 2024-08-09 22:17:58 -04:00
Seirdy
e02dcff4ca
Add focus on self-hostability to Peekr 2024-08-09 22:00:33 -04:00
Seirdy
1f437024f6
Clairfy Peekr/Peekier 2024-08-09 21:55:07 -04:00
Seirdy
4b22946fc8
Replace SvMetaSearch with Peekr, add PriEco 2024-08-09 21:43:52 -04:00
Seirdy
1701c4b254
Slow down MJ12bot 2024-08-08 02:21:00 -04:00
Seirdy
b5a238b3e3
More receipts 2024-08-08 02:20:45 -04:00
Seirdy
d11f411a88
Typo in x-robots tag 2024-08-07 02:11:36 -04:00
6 changed files with 39 additions and 17 deletions

View file

@ -80,7 +80,7 @@ Searches performed on search boxes in the site footer and on the search page are
No other information is automatically shared with any third-parties, to my knowledge. No other information is automatically shared with any third-parties, to my knowledge.
I may share excerpts of of server logs with third parties if I am trying to resolve a technical issue. For example, I may submit an excerpt of an error log when filing a bug report. Any time I have to share such an excerpt, I remove or alter all identifying information. This includes, but is not limited to: IP addresses, timestamps, and any uniquely-identifying user-agent strings. I may share excerpts of server logs with third parties if I am trying to resolve a technical issue. For example, I may submit an excerpt of an error log when filing a bug report. Any time I have to share such an excerpt, I remove or alter all identifying information. This includes, but is not limited to: IP addresses, timestamps, and any uniquely-identifying user-agent strings.
I do not remove or alter identifying information when sharing excerpts of bot traffic. I do not remove or alter identifying information when sharing excerpts of bot traffic.
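A minimal sketch of the kind of redaction described in this hunk, assuming log lines carry IPv4 addresses and Common Log Format timestamps. The function name `redact` and the placeholder strings are hypothetical; the policy text does not say which tool or format is actually used, and uniquely-identifying user-agent strings would still need manual review.

```python
import re

# Assumption: IPv4 dotted quads and Common Log Format timestamps only.
# IPv6 addresses and other timestamp formats would need their own patterns.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
CLF_TIME = re.compile(r"\[\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\]")


def redact(line: str) -> str:
    """Replace IP addresses and timestamps in one log line with placeholders."""
    line = IPV4.sub("[ip]", line)
    return CLF_TIME.sub("[time]", line)
```

Version strings shaped like dotted quads (for example in a user-agent) are also replaced by this pattern, which errs on the side of over-redaction.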
@ -111,4 +111,4 @@ By default, Web browsers may share characteristics about the user's hardware, co
By default, many networks and Internet service providers often alter requests by redirecting them or injecting content. I have prevented this behavior by using a secure TLS cipher suite. By default, many networks and Internet service providers often alter requests by redirecting them or injecting content. I have prevented this behavior by using a secure TLS cipher suite.
By default, most web browsers connect to a website over insecure HTTP when users don't specify don't specify a URL scheme; this is frequently exploited by hostile networks to inject content or re-direct traffic. I mitigate this to the extent I can by using a `Strict-Transport-Security` header, participating in HSTS-Preload lists, and adding an HTTPS DNS record for HTTP/2 and HTTP/3 DNS-based ALPN. By default, most web browsers connect to a website over insecure HTTP when users don't specify a URL scheme; this is frequently exploited by hostile networks to inject content or re-direct traffic. I mitigate this to the extent I can by using a `Strict-Transport-Security` header, participating in HSTS-Preload lists, and adding an HTTPS DNS record for HTTP/2 and HTTP/3 DNS-based ALPN.
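As an illustration of the `Strict-Transport-Security` mitigation mentioned above, here is a small sketch that parses a header value and checks it against the commonly published preload-list requirements (a `max-age` of at least one year, `includeSubDomains`, and `preload`). The function names are hypothetical, and the header value this site actually sends is not shown in the diff.

```python
ONE_YEAR_SECONDS = 31536000  # minimum max-age commonly required for preloading


def parse_hsts(value: str) -> dict:
    """Split a Strict-Transport-Security value into lowercase directives."""
    directives = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def preload_ready(value: str) -> bool:
    """Return True if the header value meets the assumed preload requirements."""
    directives = parse_hsts(value)
    try:
        max_age = int(directives.get("max-age", ""))
    except ValueError:
        return False
    return (
        max_age >= ONE_YEAR_SECONDS
        and "includesubdomains" in directives
        and "preload" in directives
    )
```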

View file

@ -47,9 +47,9 @@ This post is an attempt to document how they are made, their differences, their
## How Tier-0 and FediNuke work ## How Tier-0 and FediNuke work
[My tier-0 list](https://seirdy.one/pb/tier0.csv) is a subset of the `pleroma.envs.net` blocklist. It contains entries that appeared on at least **14 out of 27** other hand-picked instance blocklists ("bias sources"), with exceptions detailed below. Not all Tier-0 entries have the same level of severity; a smaller list containing what I personally deem the "worse half" of Tier 0 is [FediNuke.txt](https://seirdy.one/pb/FediNuke.txt). **Consensus** builds Tier-0; **severity** builds FediNuke. [My tier-0 list](https://seirdy.one/pb/tier0.csv) is a subset of the `pleroma.envs.net` blocklist. It contains entries that appeared on at least **15 out of 27** other hand-picked instance blocklists ("bias sources"), with exceptions detailed below. Not all Tier-0 entries have the same level of severity; a smaller list containing what I personally deem the "worse half" of Tier 0 is [FediNuke.txt](https://seirdy.one/pb/FediNuke.txt). **Consensus** builds Tier-0; **severity** builds FediNuke.
When I add a bias source, I may also increase the minimum number of votes required if I find that its blocklist is too close to (or mainly just imports all of) tier-0 or the blocklist of a bias source's blocklist. That's the reason why the threshold is 14 instead of 13. When I add a bias source, I may also increase the minimum number of votes required if I find that its blocklist is too close to (or mainly just imports all of) tier-0 or the blocklist of a bias source's blocklist. That's the reason why the threshold is 15 instead of 13 or 14.
All entries use the root domains when applicable, or are as close to the root domain as possible without triggering false-positives. All entries use the root domains when applicable, or are as close to the root domain as possible without triggering false-positives.
@ -57,7 +57,7 @@ All entries use the root domains when applicable, or are as close to the root do
There were some block-overrides for instances with fewer than 15 votes. Here's how I went about overriding: There were some block-overrides for instances with fewer than 15 votes. Here's how I went about overriding:
- If an instance has **10 votes,** I may elect to add it after additional review instead of waiting for it to hit 14 votes. - If an instance has **10 votes,** I may elect to add it after additional review instead of waiting for it to hit 15 votes.
- If an instance is run by **the same staff as another Tier-0 instance** and has **at least 5 votes,** I may add it after asking other admins about it and getting multiple thumbs-up from admins who import tier-0. - If an instance is run by **the same staff as another Tier-0 instance** and has **at least 5 votes,** I may add it after asking other admins about it and getting multiple thumbs-up from admins who import tier-0.
- If an instance contains **blatant/unapologetic bigotry** (something really undeniable, like Nazi imagery or excessive use of slurs in violent/hateful/definitely-not-reclaimed contexts) with staff approval or involvement, I may add it to both tier-0 and `FediNuke.txt` after I get multiple thumbs-up. - If an instance contains **blatant/unapologetic bigotry** (something really undeniable, like Nazi imagery or excessive use of slurs in violent/hateful/definitely-not-reclaimed contexts) with staff approval or involvement, I may add it to both tier-0 and `FediNuke.txt` after I get multiple thumbs-up.
- If an instance becomes **risky even to many tier-0 instances** (untagged gore, dox attempts, significant cybersecurity risk, <abbr title="child sexual abuse material">CSAM</abbr>, etc. with staff approval or involvement): I may add it to both right away, skipping any process. This is rare. - If an instance becomes **risky even to many tier-0 instances** (untagged gore, dox attempts, significant cybersecurity risk, <abbr title="child sexual abuse material">CSAM</abbr>, etc. with staff approval or involvement): I may add it to both right away, skipping any process. This is rare.
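The consensus rule described in this file can be sketched roughly as follows, assuming each bias source is available as a set of blocked domains. The names `count_votes` and `classify` are hypothetical, and the sketch leaves out the `pleroma.envs.net` subset restriction and the manual overrides listed above, which depend on human review.

```python
from collections import Counter

CONSENSUS_THRESHOLD = 15  # votes required out of the 27 bias sources
REVIEW_THRESHOLD = 10     # votes at which an entry may be added after extra review


def count_votes(bias_sources: dict) -> Counter:
    """Count how many bias sources block each domain (one vote per source)."""
    votes = Counter()
    for blocked in bias_sources.values():
        votes.update(set(blocked))  # set() guards against duplicate entries
    return votes


def classify(votes: Counter) -> tuple:
    """Split domains into consensus entries and candidates for manual review."""
    tier0 = sorted(d for d, n in votes.items() if n >= CONSENSUS_THRESHOLD)
    review = sorted(
        d for d, n in votes.items() if REVIEW_THRESHOLD <= n < CONSENSUS_THRESHOLD
    )
    return tier0, review
```

Keeping the thresholds as named constants makes a change like the one in this commit (14 to 15) a one-line edit.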
@ -70,7 +70,7 @@ I also excluded Twitter mirrors such as BirdSiteLive and bird.makeup, and bridge
Criteria for a bias source: Criteria for a bias source:
1. Has a public blocklist I can easily download. 1. Has a blocklist I can easily download, possibly with an API key.
2. Practices timely and proactive moderation: doesn't just wait for another instance to start interacting and cause trouble, and updates more often than once a month. Evaluating this takes time. 2. Practices timely and proactive moderation: doesn't just wait for another instance to start interacting and cause trouble, and updates more often than once a month. Evaluating this takes time.
3. Blocks at least half of `FediNuke.txt`. 3. Blocks at least half of `FediNuke.txt`.
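The third criterion is easy to check mechanically. A rough sketch, assuming both lists are plain collections of domain names and that only exact matches count (whether a block on a parent domain should also count is not stated here); the function name is hypothetical.

```python
def blocks_half_of_fedinuke(candidate_blocklist, fedinuke) -> bool:
    """Return True if the candidate blocks at least half of the FediNuke entries."""
    fedinuke = {d.strip().lower() for d in fedinuke if d.strip()}
    candidate = {d.strip().lower() for d in candidate_blocklist if d.strip()}
    if not fedinuke:
        return False  # nothing to compare against
    overlap = len(fedinuke & candidate)
    return overlap * 2 >= len(fedinuke)
```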
@ -254,13 +254,13 @@ Criteria for a good FediNuke receipt is stricter than the aforementioned criteri
13bells.com {#13bells} 13bells.com {#13bells}
: Admin posts [Sandy Hook school shooting conspiracy theories](https://archive.ph/dT9Am), [transmisia](https://archive.ph/Km8Ju), and [queermisia](https://web.archive.org/web/20230810193444/https://13bells.com/@amiko/110810353137172742). : Admin posts [Sandy Hook school shooting conspiracy theories](https://archive.ph/dT9Am), [transmisia](https://archive.ph/Km8Ju), and [queermisia](https://web.archive.org/web/20230810193444/https://13bells.com/@amiko/110810353137172742). Admin [spreads vaccine misinformation](https://ghostarchive.org/archive/9Kvug).
1611.social {#1611} 1611.social {#1611}
: [antisemitism from admin](https://web.archive.org/web/20230628203218/https://1611.social/@tyler/posts/AX9r68rwjhEQzMKJbU), [anti-vax from admin](https://archive.li/qFhQQ) with other local members, [antisemitic caricature from admin](https://archive.ph/3wIRL). : [antisemitism from admin](https://web.archive.org/web/20230628203218/https://1611.social/@tyler/posts/AX9r68rwjhEQzMKJbU), [anti-vax from admin](https://archive.li/qFhQQ) with other local members, [antisemitic caricature from admin](https://archive.ph/3wIRL).
4aem.com {#4aem} 4aem.com {#4aem}
: Freeze peach PeerTube instance on the "tube" subdomain, hosting blatant antisemitic content. ["Clown World" dogwhistles](https://archive.ph/80Zwu), antisemitic ["Why Kanye West is right"](https://archive.ph/06UiV). : Freeze peach PeerTube instance on the "tube" subdomain, hosting blatant antisemitic content. ["Clown World" dogwhistles](https://archive.ph/80Zwu), antisemitic ["Why Kanye West is right"](https://archive.ph/06UiV), [more antisemitism](https://ghostarchive.org/archive/itBDB).
5dollah.click {#5dollah} 5dollah.click {#5dollah}
: [Blatantly racist use of slurs](https://web.archive.org/web/20230803174643/https://5dollah.click/notice/AYFv0JRxfVez3K6ftQ) from staff account, [more racism from same account](https://web.archive.org/web/20230803174620/https://5dollah.click/notice/AYGifHRMwMFURlzgye). : [Blatantly racist use of slurs](https://web.archive.org/web/20230803174643/https://5dollah.click/notice/AYFv0JRxfVez3K6ftQ) from staff account, [more racism from same account](https://web.archive.org/web/20230803174620/https://5dollah.click/notice/AYGifHRMwMFURlzgye).
@ -287,7 +287,6 @@ anon-kenkai.net {#anon-kenkai}
asbestos.cafe {#asbestos} asbestos.cafe {#asbestos}
: [Racism and ableism from admin](https://archive.ph/d7cfQ). : [Racism and ableism from admin](https://archive.ph/d7cfQ).
: [Admin sharing a dox attempt](https://archive.ph/LUA10) : [Admin sharing a dox attempt](https://archive.ph/LUA10)
: [Transmedical gatekeeping from a cis user](https://web.archive.org/web/20230731041522/https://the.asbestos.cafe/notice/AYCymnXAtifMLgNzJg).
: [Antisemitism from local user](https://web.archive.org/web/20230803032450/https://shrine.moe/notice/AYJfESHlqB0IvHPfqS) followed by ableism from admin. [Admin defends antisemitism](https://web.archive.org/web/20240110032818/https://the.asbestos.cafe/notice/AdgBSAo0q5L63JPXtY). : [Antisemitism from local user](https://web.archive.org/web/20230803032450/https://shrine.moe/notice/AYJfESHlqB0IvHPfqS) followed by ableism from admin. [Admin defends antisemitism](https://web.archive.org/web/20240110032818/https://the.asbestos.cafe/notice/AdgBSAo0q5L63JPXtY).
: [Queermisic user](https://web.archive.org/web/20230803032850/https://pl.starnix.network/notice/AY1JMsQpMH4NukiNE0). : [Queermisic user](https://web.archive.org/web/20230803032850/https://pl.starnix.network/notice/AY1JMsQpMH4NukiNE0).
@ -364,7 +363,7 @@ crucible.world {#crucible}
: [Even more transmisia](https://archive.ph/WVFrK). : [Even more transmisia](https://archive.ph/WVFrK).
cum.camp {#cumcamp} OR cum.salon cum.camp {#cumcamp} OR cum.salon
: Instance has MRF policies to reject deletes and run a blockbot [on cum.camp](https://web.archive.org/web/20230730232539/https://cum.camp/about) and [on cum.salon](https://web.archive.org/web/20221228172530/https://cum.salon/about). Staff members "pernia" and "nimt" are known for [overly-creepy posts related to sexual assault, esp. CSA](https://web.archive.org/web/20230730234254/https://boymoder.biz/notice/AXyuRlHglkmt1AHPn6), [another](https://web.archive.org/web/20230730233035/https://marsey.moe/@pernia@cum.salon/posts/AY8crsXbKZHmCIApgu). : Instance has MRF policies to reject deletes and run a blockbot [on cum.camp](https://web.archive.org/web/20230730232539/https://cum.camp/about) and [on cum.salon](https://web.archive.org/web/20221228172530/https://cum.salon/about). Staff members "pernia" and "nimt" are known for [overly-creepy posts related to sexual assault, esp. CSA](https://web.archive.org/web/20230730234254/https://boymoder.biz/notice/AXyuRlHglkmt1AHPn6), [another](https://ghostarchive.org/archive/gDzD1).
: The cum.salon domain name was recently terminated by PorkBun after several people reported it for publishing dox materials. Other instances have locally overridden their DNS to continue federating until its TLS certificate expires; however, [it came back after transferring to Epik](https://web.archive.org/web/20230819012541/https://shitposter.club/notice/AYpWAIw53KQXoohBbM). : The cum.salon domain name was recently terminated by PorkBun after several people reported it for publishing dox materials. Other instances have locally overridden their DNS to continue federating until its TLS certificate expires; however, [it came back after transferring to Epik](https://web.archive.org/web/20230819012541/https://shitposter.club/notice/AYpWAIw53KQXoohBbM).
cunnyborea.space {#cunnyborea} cunnyborea.space {#cunnyborea}

View file

@ -52,7 +52,7 @@ These are large engines that pass all my standard tests and more.
* PrivacyWall * PrivacyWall
* Lilo * Lilo
* SearchScene * SearchScene
* Peekier * Peekier (not to be confused with Peekr, a metasearch engine with its own index)
* Oscobo * Oscobo
* Million Short * Million Short
* Yippy search⁶ * Yippy search⁶
@ -107,11 +107,13 @@ Yep supports Open Graph and some JSON-LD at the moment. A look through the sourc
These engines fail badly at a few important tests. Otherwise, they seem to work well enough. These engines fail badly at a few important tests. Otherwise, they seem to work well enough.
* Peekr (formerly SvMetaSearch, not to be confused with Peekier): Originally a SearxNG metasearch engine that also included results from its own index, it's since diverged. It now appears to return all results from its own growing ElasticSearch index. Open source, with an emphasis on self-hostability.
* Infotiger: My favorite engine in this section. It offers advanced result filtering and sports a somewhat large index. It allows site submission for English and German pages. The fastest-improving engine in this section; I look forward to the day it "graduates" to the previous section. Infotiger also has a Tor hidden service. * Infotiger: My favorite engine in this section. It offers advanced result filtering and sports a somewhat large index. It allows site submission for English and German pages. The fastest-improving engine in this section; I look forward to the day it "graduates" to the previous section. Infotiger also has a Tor hidden service.
* seekport: The interface is in German but it supports searching in English just fine. The default language is selected by your locale. It's really good considering its small index; it hasn't heard of less common terms, but it's able to find relevant results in other tests. It's the second-fastest-improving engine in this section. * seekport: The interface is in German but it supports searching in English just fine. The default language is selected by your locale. It's really good considering its small index; it hasn't heard of less common terms, but it's able to find relevant results in other tests. It's the second-fastest-improving engine in this section.
* Exalead: slow, quality is hit-and-miss. Its indexer claims to crawl the DMOZ directory, which has since shut down and been replaced by the Curlie directory. No relevant results for “Oppenheimer” and some other history-related queries. Allows submitting individual URLs for indexing, but requires solving a Google reCAPTCHA and entering an email address. * Exalead: slow, quality is hit-and-miss. Its indexer claims to crawl the DMOZ directory, which has since shut down and been replaced by the Curlie directory. No relevant results for “Oppenheimer” and some other history-related queries. Allows submitting individual URLs for indexing, but requires solving a Google reCAPTCHA and entering an email address.
* ExactSeek: small index, disproportionately dominated by big sites. Failed multiple tests. Allows submitting individual URLs for crawling, but requires entering an email address and receiving a newsletter. Webmaster tools seem to heavily push for paid SEO options. It also powers SitesOnDisplay and Blog-search.com. * ExactSeek: small index, disproportionately dominated by big sites. Failed multiple tests. Allows submitting individual URLs for crawling, but requires entering an email address and receiving a newsletter. Webmaster tools seem to heavily push for paid SEO options. It also powers SitesOnDisplay and Blog-search.com.
=> https://peekr.org/
=> https://infotiger.com/ Infotiger => https://infotiger.com/ Infotiger
=> http://infotiger4xywbfq45mvd5drh43jpqeurakg2ya7gqwvjf2bbwnixzqd.onion/ Infotiger hidden service => http://infotiger4xywbfq45mvd5drh43jpqeurakg2ya7gqwvjf2bbwnixzqd.onion/ Infotiger hidden service
=> http://www.seekport.com/ seekport (HTTP only) => http://www.seekport.com/ seekport (HTTP only)
@ -200,8 +202,9 @@ Brave Search partially powers Kagi (documented in 2023, unclear after docs remov
=> https://kagifeedback.org/d/2808-reconsider-your-partnership-with-brave Kagi feedback ticket on partnership with Brave, allowing blatant homophobia in the discussion. => https://kagifeedback.org/d/2808-reconsider-your-partnership-with-brave Kagi feedback ticket on partnership with Brave, allowing blatant homophobia in the discussion.
=> https://kagifeedback.org/d/865-suicide-results-should-probably-have-a-dont-do-that-widget-like-google/50 Kagi feedback ticket on suicide results => https://kagifeedback.org/d/865-suicide-results-should-probably-have-a-dont-do-that-widget-like-google/50 Kagi feedback ticket on suicide results
* SVMetaSearch: A SearxNG metasearch engine that also includes results from its own index. All other sources can be turned off. Like most public Searx/SearxNG instances, reliability is very poor. * PriEco: A metasearch engine with one option for using its own index. Found in my access logs. All other sources can be turned off, allowing you to see its unique results. At the time of writing, its own index is unfortunately quite tiny.
=> https://svmetasearch.eu.org/ => https://prieco.net/
## Non-generalist search ## Non-generalist search
@ -392,6 +395,11 @@ These engines were originally included in the article, but have since been disco
=> https://www.parsijoo.ir/ Parsijoo => https://www.parsijoo.ir/ Parsijoo
=> https://www.moose.at Moose.at => https://www.moose.at Moose.at
## Upcoming engines
=> https://cyberfind.net/bot.html Cyberfind
=> https://fynd.bot/ fynd
## Exclusions ## Exclusions
Three engines were excluded from this list for having a far-right focus. Three engines were excluded from this list for having a far-right focus.

View file

@ -83,7 +83,7 @@ Bing
- PrivacyWall - PrivacyWall
- Lilo - Lilo
- Search&shy;Scene - Search&shy;Scene
- Peekier - Peekier (not to be confused with Peekr, a metasearch engine with its own index)
- Oscobo - Oscobo
- Million Short - Million Short
- Yippy search[^6] - Yippy search[^6]
@ -137,6 +137,9 @@ Yep supports Open Graph and some JSON-LD at the moment. A look through the sourc
These engines fail badly at a few important tests. Otherwise, they seem to work well enough for users who'd like some more serendipity in less-specific searches. These engines fail badly at a few important tests. Otherwise, they seem to work well enough for users who'd like some more serendipity in less-specific searches.
[Peekr (formerly SvMetaSearch, not to be confused with Peekier)](https://peekr.org/)
: Originally a SearxNG metasearch engine that also included results from its own index, it's since diverged. It now appears to return all results from its own growing ElasticSearch index. Open source, with an emphasis on self-hostability.
[Infotiger](https://alpha.infotiger.com/) [Infotiger](https://alpha.infotiger.com/)
: My favorite engine in this section. It offers advanced result filtering and sports a somewhat large index. It allows site submission for English and German pages. The fastest-improving engine in this section: I use it often to discover new sites, and look forward to the day it "graduates" to the previous section. [Infotiger also has a Tor hidden service](http://infotiger4xywbfq45mvd5drh43jpqeurakg2ya7gqwvjf2bbwnixzqd.onion/). : My favorite engine in this section. It offers advanced result filtering and sports a somewhat large index. It allows site submission for English and German pages. The fastest-improving engine in this section: I use it often to discover new sites, and look forward to the day it "graduates" to the previous section. [Infotiger also has a Tor hidden service](http://infotiger4xywbfq45mvd5drh43jpqeurakg2ya7gqwvjf2bbwnixzqd.onion/).
@ -227,8 +230,8 @@ Engines in this category fall back to GBY when their own indexes don't have enou
[Kagi Search](https://kagi.com/) [Kagi Search](https://kagi.com/)
: The most interesting entry in this category, IMO. Like Neeva, it requires an account and limits use without payment. It's powered by its own Teclis index (Teclis can be used independently; see the [non-commercial section](#small-or-non-commercial-web) below), and claims to also use results from Google and Bing. The result seems somewhat unique: I'm able to recognize some results from the Teclis index mixed in with the mainstream ones. In addition to Teclis, Kagi's other products include the [Kagi.ai](https://kagi.ai/) intelligent answer service and the [TinyGem](https://tinygem.org/) social bookmarking service, both of which play a role in Kagi.com in the present or future. Unrelatedly: I'm concerned about the company's biases, as it seems happy to [use Brave's commercial API](https://kagifeedback.org/d/2808-reconsider-your-partnership-with-brave) (allowing blatant homophobia in the comments) and [allow its results to recommend suicide methods without intervention](https://kagifeedback.org/d/865-suicide-results-should-probably-have-a-dont-do-that-widget-like-google/50). I reject the idea that avoiding an option that may seem politically biased is the same as being unbiased if such a decision has real political implications. : The most interesting entry in this category, IMO. Like Neeva, it requires an account and limits use without payment. It's powered by its own Teclis index (Teclis can be used independently; see the [non-commercial section](#small-or-non-commercial-web) below), and claims to also use results from Google and Bing. The result seems somewhat unique: I'm able to recognize some results from the Teclis index mixed in with the mainstream ones. In addition to Teclis, Kagi's other products include the [Kagi.ai](https://kagi.ai/) intelligent answer service and the [TinyGem](https://tinygem.org/) social bookmarking service, both of which play a role in Kagi.com in the present or future. 
Unrelatedly: I'm concerned about the company's biases, as it seems happy to [use Brave's commercial API](https://kagifeedback.org/d/2808-reconsider-your-partnership-with-brave) (allowing blatant homophobia in the comments) and [allow its results to recommend suicide methods without intervention](https://kagifeedback.org/d/865-suicide-results-should-probably-have-a-dont-do-that-widget-like-google/50). I reject the idea that avoiding an option that may seem politically biased is the same as being unbiased if such a decision has real political implications.
[SVMetaSearch](https://svmetasearch.eu.org/) [PriEco](https://prieco.net/)
: A SearxNG metasearch engine that also includes results from its own index. All other sources can be turned off. Like most public Searx/SearxNG instances, reliability is very poor. : A metasearch engine with one option for using its own index. Found in my access logs. All other sources can be turned off, allowing you to see its unique results. At the time of writing, its own index is unfortunately quite tiny.
## Non-generalist search ## Non-generalist search
@ -419,6 +422,11 @@ Dead engines I don't have an extended description for:
- [Moose.at](https://www.moose.at): German (Austria-based). The site is still up but redirects searches to Brave. - [Moose.at](https://www.moose.at): German (Austria-based). The site is still up but redirects searches to Brave.
## Upcoming engines
- [Cyberfind](https://cyberfind.net/bot.html)
- [fynd](https://fynd.bot/)
## Exclusions ## Exclusions
Three engines were excluded from this list for having a far-right focus. Three engines were excluded from this list for having a far-right focus.

View file

@ -16,7 +16,7 @@
<!-- Only index the canonical locations, not the envs.net mirror. --> <!-- Only index the canonical locations, not the envs.net mirror. -->
{{ if or (eq (trim site.BaseURL "/") site.Params.CanonicalBaseURL) (in site.BaseURL "wgq3bd2kqoybhstp77i3wrzbfnsyd27wt34psaja4grqiezqircorkyd.onion") -}} {{ if or (eq (trim site.BaseURL "/") site.Params.CanonicalBaseURL) (in site.BaseURL "wgq3bd2kqoybhstp77i3wrzbfnsyd27wt34psaja4grqiezqircorkyd.onion") -}}
<!-- See https://noml.info/, https://www.deviantart.com/team/journal/UPDATE-All-Deviations-Are-Opted-Out-of-AI-Datasets-934500371 --> <!-- See https://noml.info/, https://www.deviantart.com/team/journal/UPDATE-All-Deviations-Are-Opted-Out-of-AI-Datasets-934500371 -->
<meta name="robots" content="index,follow,max-image-preview:large,max-snippet=-1,noai,noimageai,noml" /> <meta name="robots" content="index,follow,max-image-preview:large,max-snippet:-1,noai,noimageai,noml" />
{{ else -}} {{ else -}}
<meta name="robots" content="noindex,nofollow,noimageindex,noai,noimageai" /> <meta name="robots" content="noindex,nofollow,noimageindex,noai,noimageai" />
{{ end -}} {{ end -}}
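The fix in this hunk swaps `max-snippet=-1` for `max-snippet:-1`, since valued robots directives use a colon between name and value. A small sketch of a parser that makes the difference visible; `parse_robots_directives` is a hypothetical helper, not something the site's build uses.

```python
def parse_robots_directives(content: str) -> dict:
    """Parse a robots meta `content` value into {directive: value or None}."""
    directives = {}
    for token in content.split(","):
        token = token.strip()
        if not token:
            continue
        # Valued directives are written as name:value; bare ones have no colon.
        name, sep, value = token.partition(":")
        directives[name.strip().lower()] = value.strip() if sep else None
    return directives
```

With the old value, `max-snippet=-1` is read as one unrecognized bare directive; with the corrected value, `max-snippet` is read with the value `-1`.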

View file

@ -87,6 +87,10 @@ Disallow: /
User-agent: PiplBot User-agent: PiplBot
Disallow: / Disallow: /
# Well-known overly-aggressive bot that claims to respect robots.txt: http://mj12bot.com/
User-agent: MJ12bot
Crawl-Delay: 10
## Gen-AI data scrapers ## ## Gen-AI data scrapers ##
# Eat shit, OpenAI. # Eat shit, OpenAI.
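To see how a standards-following parser reads the new MJ12bot rule, here is a sketch using Python's standard-library `urllib.robotparser`. The excerpt string is an abbreviated copy of the rules in this hunk, not the full file, and it only shows what a compliant crawler would conclude; whether MJ12bot actually honors the delay is its own claim.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated excerpt of the rules shown in the hunk above.
ROBOTS_EXCERPT = """\
User-agent: PiplBot
Disallow: /

# Well-known overly-aggressive bot that claims to respect robots.txt
User-agent: MJ12bot
Crawl-Delay: 10
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_EXCERPT)
print(rules.crawl_delay("MJ12bot"))     # delay in seconds requested of MJ12bot
print(rules.can_fetch("MJ12bot", "/"))  # slowed down, but not disallowed
print(rules.can_fetch("PiplBot", "/"))  # fully disallowed
```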
@ -117,6 +121,9 @@ User-Agent: FacebookBot
User-Agent: meta-externalagent User-Agent: meta-externalagent
Disallow: / Disallow: /
# This one doesn't support robots.txt: https://www.allenai.org/crawler
# block it with your reverse-proxy or WAF or something.
# I'm not blocking CCBot for now. It publishes a free index for anyone to use. # I'm not blocking CCBot for now. It publishes a free index for anyone to use.
# Google used this to train the initial version of Bard (now called Gemini). # Google used this to train the initial version of Bard (now called Gemini).
# I allow CCBot since its index is also used for upstart/hobbyist search engines # I allow CCBot since its index is also used for upstart/hobbyist search engines