mirror of
https://git.sr.ht/~seirdy/seirdy.one
synced 2024-11-09 16:02:10 +00:00
Compare commits
11 commits
e123d0d2e9
...
3d1ef48f22
Author | SHA1 | Date
---|---|---
| 3d1ef48f22 |
| 969c5e0637 |
| ddbbbd63c2 |
| c6f5476e65 |
| 12f87d3ba0 |
| e02dcff4ca |
| 1f437024f6 |
| 4b22946fc8 |
| 1701c4b254 |
| b5a238b3e3 |
| d11f411a88 |
6 changed files with 39 additions and 17 deletions
|
@ -80,7 +80,7 @@ Searches performed on search boxes in the site footer and on the search page are
|
|||
|
||||
No other information is automatically shared with any third parties, to my knowledge.
|
||||
|
||||
I may share excerpts of of server logs with third parties if I am trying to resolve a technical issue. For example, I may submit an excerpt of an error log when filing a bug report. Any time I have to share such an excerpt, I remove or alter all identifying information. This includes, but is not limited to: IP addresses, timestamps, and any uniquely-identifying user-agent strings.
|
||||
I may share excerpts of server logs with third parties if I am trying to resolve a technical issue. For example, I may submit an excerpt of an error log when filing a bug report. Any time I have to share such an excerpt, I remove or alter all identifying information. This includes, but is not limited to: IP addresses, timestamps, and any uniquely-identifying user-agent strings.
|
||||
|
||||
I do not remove or alter identifying information when sharing excerpts of bot traffic.
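The redaction step described above can be sketched roughly as follows. This is a minimal illustration, assuming logs in the common "combined" access-log layout; the `LOG_LINE` pattern, the `redact_line` helper, and the placeholder strings are all hypothetical and are not taken from the site's actual tooling. Redacting the referrer is an extra assumption on my part.

```python
import re

# Assumed "combined" access-log layout; real logs may differ.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def redact_line(line: str) -> str:
    """Replace the IP address, timestamp, referrer, and user-agent fields."""
    match = LOG_LINE.match(line)
    if match is None:
        # Unknown layout: withhold the whole line rather than risk a leak.
        return "[REDACTED LINE]"
    return (
        f'[REDACTED_IP] - - [REDACTED_TIME] "{match["request"]}" '
        f'{match["status"]} {match["size"]} "-" "[REDACTED_UA]"'
    )
```

Lines that do not match the assumed layout are withheld entirely, which errs on the side of sharing too little. Request paths can themselves carry identifying query strings, so a real excerpt would still need a manual read-through.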
|
||||
|
||||
|
@ -111,4 +111,4 @@ By default, Web browsers may share characteristics about the user's hardware, co
|
|||
|
||||
By default, many networks and Internet service providers often alter requests by redirecting them or injecting content. I have prevented this behavior by using a secure TLS cipher suite.
|
||||
|
||||
By default, most web browsers connect to a website over insecure HTTP when users don't specify don't specify a URL scheme; this is frequently exploited by hostile networks to inject content or re-direct traffic. I mitigate this to the extent I can by using a `Strict-Transport-Security` header, participating in HSTS-Preload lists, and adding an HTTPS DNS record for HTTP/2 and HTTP/3 DNS-based APLN.
|
||||
By default, most web browsers connect to a website over insecure HTTP when users don't specify a URL scheme; this is frequently exploited by hostile networks to inject content or re-direct traffic. I mitigate this to the extent I can by using a `Strict-Transport-Security` header, participating in HSTS-Preload lists, and adding an HTTPS DNS record for HTTP/2 and HTTP/3 DNS-based ALPN.
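As a rough illustration of what the `Strict-Transport-Security` header has to contain for preload-list participation, here is a small sketch that parses a header value. The helper names are hypothetical, and the thresholds (a `max-age` of at least one year plus the `includeSubDomains` and `preload` directives) reflect my understanding of the preload list's submission requirements rather than anything stated in this policy.

```python
ONE_YEAR = 31_536_000  # seconds; assumed minimum max-age for preload eligibility


def parse_hsts(header_value: str) -> dict:
    """Split a Strict-Transport-Security value into lowercase directives."""
    directives = {}
    for part in header_value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"')
    return directives


def meets_preload_requirements(header_value: str) -> bool:
    """Check the header against the assumed preload-list requirements."""
    directives = parse_hsts(header_value)
    try:
        max_age = int(directives.get("max-age", ""))
    except ValueError:
        return False  # missing or malformed max-age
    return (
        max_age >= ONE_YEAR
        and "includesubdomains" in directives
        and "preload" in directives
    )
```

For example, `meets_preload_requirements("max-age=63072000; includeSubDomains; preload")` passes, while a header with a short `max-age` or a missing directive does not.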
|
||||
|
|
|
@ -47,9 +47,9 @@ This post is an attempt to document how they are made, their differences, their
|
|||
|
||||
## How Tier-0 and FediNuke work
|
||||
|
||||
[My tier-0 list](https://seirdy.one/pb/tier0.csv) is a subset of the `pleroma.envs.net` blocklist. It contains entries that appeared on at least **14 out of 27** other hand-picked instance blocklists ("bias sources"), with exceptions detailed below. Not all Tier-0 entries have the same level of severity; a smaller list containing what I personally deem the "worse half" of Tier 0 is [FediNuke.txt](https://seirdy.one/pb/FediNuke.txt). **Consensus** builds Tier-0; **severity** builds FediNuke.
|
||||
[My tier-0 list](https://seirdy.one/pb/tier0.csv) is a subset of the `pleroma.envs.net` blocklist. It contains entries that appeared on at least **15 out of 27** other hand-picked instance blocklists ("bias sources"), with exceptions detailed below. Not all Tier-0 entries have the same level of severity; a smaller list containing what I personally deem the "worse half" of Tier 0 is [FediNuke.txt](https://seirdy.one/pb/FediNuke.txt). **Consensus** builds Tier-0; **severity** builds FediNuke.
|
||||
|
||||
When I add a bias source, I may also increase the minimum number of votes required if I find that its blocklist is too close to (or mainly just imports all of) tier-0 or the blocklist of a bias source's blocklist. That's the reason why the threshold is 14 instead of 13.
|
||||
When I add a bias source, I may also increase the minimum number of votes required if I find that its blocklist is too close to (or mainly just imports all of) tier-0 or another bias source's blocklist. That's the reason why the threshold is 15 instead of 13 or 14.
|
||||
|
||||
All entries use the root domains when applicable, or are as close to the root domain as possible without triggering false-positives.
|
||||
|
||||
|
@ -57,7 +57,7 @@ All entries use the root domains when applicable, or are as close to the root do
|
|||
|
||||
There were some block-overrides for instances with fewer than 15 votes. Here's how I went about overriding:
|
||||
|
||||
- If an instance has **10 votes,** I may elect to add it after additional review instead of waiting for it to hit 14 votes.
|
||||
- If an instance has **10 votes,** I may elect to add it after additional review instead of waiting for it to hit 15 votes.
|
||||
- If an instance is run by **the same staff as another Tier-0 instance** and has **at least 5 votes,** I may add it after asking other admins about it and getting multiple thumbs-up from admins who import tier-0.
|
||||
- If an instance contains **blatant/unapologetic bigotry** (something really undeniable, like Nazi imagery or excessive use of slurs in violent/hateful/definitely-not-reclaimed contexts) with staff approval or involvement, I may add it to both tier-0 and `FediNuke.txt` after I get multiple thumbs-up.
|
||||
- If an instance becomes **risky even to many tier-0 instances** (untagged gore, dox attempts, significant cybersecurity risk, <abbr title="child sexual abuse material">CSAM</abbr>, etc. with staff approval or involvement): I may add it to both right away, skipping any process. This is rare.
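The consensus step and the first override rule can be sketched like this. It is only an approximation under stated assumptions: each bias source is modelled as a set of blocked domains, "10 votes" is read as "at least 10", and the names `count_votes`, `classify`, `THRESHOLD`, and `REVIEW_FLOOR` are hypothetical. The remaining overrides depend on human review and are not modelled.

```python
from collections import Counter

THRESHOLD = 15     # votes needed for inclusion by consensus
REVIEW_FLOOR = 10  # assumed reading: at least 10 votes allows early manual review


def count_votes(bias_sources: dict[str, set[str]]) -> Counter:
    """Count how many bias sources block each domain."""
    votes = Counter()
    for blocked in bias_sources.values():
        votes.update(blocked)  # a set, so each source votes once per domain
    return votes


def classify(votes: Counter, base_blocklist: set[str]) -> tuple[set[str], set[str]]:
    """Return (consensus entries, candidates for additional review)."""
    # Tier-0 is described as a subset of a base blocklist, so filter on it.
    tier0 = {d for d in base_blocklist if votes[d] >= THRESHOLD}
    review = {d for d in base_blocklist if REVIEW_FLOOR <= votes[d] < THRESHOLD}
    return tier0, review
```

Anything in the second set would still need the manual review described in the list above before being added.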
|
||||
|
@ -70,7 +70,7 @@ I also excluded Twitter mirrors such as BirdSiteLive and bird.makeup, and bridge
|
|||
|
||||
Criteria for a bias source:
|
||||
|
||||
1. Has a public blocklist I can easily download.
|
||||
1. Has a blocklist I can easily download, possibly with an API key.
|
||||
2. Practices timely and proactive moderation: doesn't just wait for another instance to start interacting and causing trouble, and updates more often than once a month. Evaluating this takes time.
|
||||
3. Blocks at least half of `FediNuke.txt`.
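Only the third criterion lends itself to automation. A minimal sketch, assuming `FediNuke.txt` holds one domain per line and that exact domain matches are enough (both are assumptions; the helper names are hypothetical):

```python
def load_domains(text: str) -> set[str]:
    """Parse a one-domain-per-line list (assumed format), ignoring blank lines."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def blocks_half_of_fedinuke(candidate: set[str], fedinuke: set[str]) -> bool:
    """True if the candidate blocklist covers at least half of FediNuke."""
    if not fedinuke:
        return False
    overlap = len(candidate & fedinuke)
    return overlap * 2 >= len(fedinuke)
```

A candidate that blocks a parent domain of a FediNuke entry would be undercounted by this exact-match comparison, so a borderline result deserves a manual look.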
|
||||
|
||||
|
@ -254,13 +254,13 @@ Criteria for a good FediNuke receipt is stricter than the aforementioned criteri
|
|||
|
||||
|
||||
13bells.com {#13bells}
|
||||
: Admin posts [Sandy Hook school shooting conspiracy theories](https://archive.ph/dT9Am), [transmisia](https://archive.ph/Km8Ju), and [queermisia](https://web.archive.org/web/20230810193444/https://13bells.com/@amiko/110810353137172742).
|
||||
: Admin posts [Sandy Hook school shooting conspiracy theories](https://archive.ph/dT9Am), [transmisia](https://archive.ph/Km8Ju), and [queermisia](https://web.archive.org/web/20230810193444/https://13bells.com/@amiko/110810353137172742). Admin [spreads vaccine misinformation](https://ghostarchive.org/archive/9Kvug).
|
||||
|
||||
1611.social {#1611}
|
||||
: [antisemitism from admin](https://web.archive.org/web/20230628203218/https://1611.social/@tyler/posts/AX9r68rwjhEQzMKJbU), [anti-vax from admin](https://archive.li/qFhQQ) with other local members, [antisemitic caricature from admin](https://archive.ph/3wIRL).
|
||||
|
||||
4aem.com {#4aem}
|
||||
: Freeze peach PeerTube instance on the "tube" subdomain, hosting blatant antisemitic content. ["Clown World" dogwhistles](https://archive.ph/80Zwu), antisemitic ["Why Kanye West is right"](https://archive.ph/06UiV).
|
||||
: Freeze peach PeerTube instance on the "tube" subdomain, hosting blatant antisemitic content. ["Clown World" dogwhistles](https://archive.ph/80Zwu), antisemitic ["Why Kanye West is right"](https://archive.ph/06UiV), [more antisemitism](https://ghostarchive.org/archive/itBDB).
|
||||
|
||||
5dollah.click {#5dollah}
|
||||
: [Blatantly racist use of slurs](https://web.archive.org/web/20230803174643/https://5dollah.click/notice/AYFv0JRxfVez3K6ftQ) from staff account, [more racism from same account](https://web.archive.org/web/20230803174620/https://5dollah.click/notice/AYGifHRMwMFURlzgye).
|
||||
|
@ -287,7 +287,6 @@ anon-kenkai.net {#anon-kenkai}
|
|||
asbestos.cafe {#asbestos}
|
||||
: [Racism and ableism from admin](https://archive.ph/d7cfQ).
|
||||
: [Admin sharing a dox attempt](https://archive.ph/LUA10)
|
||||
: [Transmedical gatekeeping from a cis user](https://web.archive.org/web/20230731041522/https://the.asbestos.cafe/notice/AYCymnXAtifMLgNzJg).
|
||||
: [Antisemitism from local user](https://web.archive.org/web/20230803032450/https://shrine.moe/notice/AYJfESHlqB0IvHPfqS) followed by ableism from admin. [Admin defends antisemitism](https://web.archive.org/web/20240110032818/https://the.asbestos.cafe/notice/AdgBSAo0q5L63JPXtY).
|
||||
: [Queermisic user](https://web.archive.org/web/20230803032850/https://pl.starnix.network/notice/AY1JMsQpMH4NukiNE0).
|
||||
|
||||
|
@ -364,7 +363,7 @@ crucible.world {#crucible}
|
|||
: [Even more transmisia](https://archive.ph/WVFrK).
|
||||
|
||||
cum.camp {#cumcamp} OR cum.salon
|
||||
: Instance has MRF policies to reject deletes and run a blockbot [on cum.camp](https://web.archive.org/web/20230730232539/https://cum.camp/about) and [on cum.salon](https://web.archive.org/web/20221228172530/https://cum.salon/about). Staff members "pernia" and "nimt" are known for [overly-creepy posts related to sexual assault, esp. CSA](https://web.archive.org/web/20230730234254/https://boymoder.biz/notice/AXyuRlHglkmt1AHPn6), [another](https://web.archive.org/web/20230730233035/https://marsey.moe/@pernia@cum.salon/posts/AY8crsXbKZHmCIApgu).
|
||||
: Instance has MRF policies to reject deletes and run a blockbot [on cum.camp](https://web.archive.org/web/20230730232539/https://cum.camp/about) and [on cum.salon](https://web.archive.org/web/20221228172530/https://cum.salon/about). Staff members "pernia" and "nimt" are known for [overly-creepy posts related to sexual assault, esp. CSA](https://web.archive.org/web/20230730234254/https://boymoder.biz/notice/AXyuRlHglkmt1AHPn6), [another](https://ghostarchive.org/archive/gDzD1).
|
||||
: The cum.salon domain name was recently terminated by PorkBun after several people reported it for publishing dox materials. Other instances have locally overridden their DNS to continue federating until its TLS certificate expires; however, [it came back after transferring to Epik](https://web.archive.org/web/20230819012541/https://shitposter.club/notice/AYpWAIw53KQXoohBbM).
|
||||
|
||||
cunnyborea.space {#cunnyborea}
|
||||
|
|
|
@ -52,7 +52,7 @@ These are large engines that pass all my standard tests and more.
|
|||
* PrivacyWall
|
||||
* Lilo
|
||||
* SearchScene
|
||||
* Peekier
|
||||
* Peekier (not to be confused with Peekr, a metasearch engine with its own index)
|
||||
* Oscobo
|
||||
* Million Short
|
||||
* Yippy search⁶
|
||||
|
@ -107,11 +107,13 @@ Yep supports Open Graph and some JSON-LD at the moment. A look through the sourc
|
|||
|
||||
These engines fail badly at a few important tests. Otherwise, they seem to work well enough.
|
||||
|
||||
* Peekr (formerly SvMetaSearch, not to be confused with Peekier): Originally a SearxNG metasearch engine that also included results from its own index, it's since diverged. It now appears to return all results from its own growing ElasticSearch index. Open source, with an emphasis on self-hostability.
|
||||
* Infotiger: My favorite engine in this section. It offers advanced result filtering and sports a somewhat large index. It allows site submission for English and German pages. The fastest-improving engine in this section; I look forward to the day it "graduates" to the previous section. Infotiger also has a Tor hidden service.
|
||||
* seekport: The interface is in German but it supports searching in English just fine. The default language is selected by your locale. It’s really good considering its small index; it hasn’t heard of less common terms, but it’s able to find relevant results in other tests. It's the second-fastest-improving engine in this section.
|
||||
* Exalead: slow, quality is hit-and-miss. Its indexer claims to crawl the DMOZ directory, which has since shut down and been replaced by the Curlie directory. No relevant results for “Oppenheimer” and some other history-related queries. Allows submitting individual URLs for indexing, but requires solving a Google reCAPTCHA and entering an email address.
|
||||
* ExactSeek: small index, disproportionately dominated by big sites. Failed multiple tests. Allows submitting individual URLs for crawling, but requires entering an email address and receiving a newsletter. Webmaster tools seem to heavily push for paid SEO options. It also powers SitesOnDisplay and Blog-search.com.
|
||||
|
||||
=> https://peekr.org/
|
||||
=> https://infotiger.com/ Infotiger
|
||||
=> http://infotiger4xywbfq45mvd5drh43jpqeurakg2ya7gqwvjf2bbwnixzqd.onion/ Infotiger hidden service
|
||||
=> http://www.seekport.com/ seekport (HTTP only)
|
||||
|
@ -200,8 +202,9 @@ Brave Search partially powers Kagi (documented in 2023, unclear after docs remov
|
|||
=> https://kagifeedback.org/d/2808-reconsider-your-partnership-with-brave Kagi feedback ticket on partnership with Brave, allowing blatant homophobia in the discussion.
|
||||
=> https://kagifeedback.org/d/865-suicide-results-should-probably-have-a-dont-do-that-widget-like-google/50 Kagi feedback ticket on suicide results
|
||||
|
||||
* SVMetaSearch: A SearxNG metasearch engine that also includes results from its own index. All other sources can be turned off. Like most public Searx/SearxNG instances, reliability is very poor.
|
||||
=> https://svmetasearch.eu.org/
|
||||
* PriEco: A metasearch engine with one option for using its own index. Found in my access logs. All other sources can be turned off, allowing you to see its unique results. At the time of writing, its own index is unfortunately quite tiny.
|
||||
=> https://prieco.net/
|
||||
|
||||
|
||||
## Non-generalist search
|
||||
|
||||
|
@ -392,6 +395,11 @@ These engines were originally included in the article, but have since been disco
|
|||
=> https://www.parsijoo.ir/ Parsijoo
|
||||
=> https://www.moose.at Moose.at
|
||||
|
||||
## Upcoming engines
|
||||
|
||||
=> https://cyberfind.net/bot.html Cyberfind
|
||||
=> https://fynd.bot/ fynd
|
||||
|
||||
## Exclusions
|
||||
|
||||
Three engines were excluded from this list for having a far-right focus.
|
||||
|
|
|
@ -83,7 +83,7 @@ Bing
|
|||
- PrivacyWall
|
||||
- Lilo
|
||||
- Search­Scene
|
||||
- Peekier
|
||||
- Peekier (not to be confused with Peekr, a metasearch engine with its own index)
|
||||
- Oscobo
|
||||
- Million Short
|
||||
- Yippy search[^6]
|
||||
|
@ -137,6 +137,9 @@ Yep supports Open Graph and some JSON-LD at the moment. A look through the sourc
|
|||
These engines fail badly at a few important tests. Otherwise, they seem to work well enough for users who'd like some more serendipity in less-specific searches.
|
||||
|
||||
|
||||
[Peekr (formerly SvMetaSearch, not to be confused with Peekier)](https://peekr.org/)
|
||||
: Originally a SearxNG metasearch engine that also included results from its own index, it's since diverged. It now appears to return all results from its own growing ElasticSearch index. Open source, with an emphasis on self-hostability.
|
||||
|
||||
[Infotiger](https://alpha.infotiger.com/)
|
||||
: My favorite engine in this section. It offers advanced result filtering and sports a somewhat large index. It allows site submission for English and German pages. The fastest-improving engine in this section: I use it often to discover new sites, and look forward to the day it "graduates" to the previous section. [Infotiger also has a Tor hidden service](http://infotiger4xywbfq45mvd5drh43jpqeurakg2ya7gqwvjf2bbwnixzqd.onion/).
|
||||
|
||||
|
@ -227,8 +230,8 @@ Engines in this category fall back to GBY when their own indexes don't have enou
|
|||
[Kagi Search](https://kagi.com/)
|
||||
: The most interesting entry in this category, IMO. Like Neeva, it requires an account and limits use without payment. It's powered by its own Teclis index (Teclis can be used independently; see the [non-commercial section](#small-or-non-commercial-web) below), and claims to also use results from Google and Bing. The result seems somewhat unique: I'm able to recognize some results from the Teclis index mixed in with the mainstream ones. In addition to Teclis, Kagi's other products include the [Kagi.ai](https://kagi.ai/) intelligent answer service and the [TinyGem](https://tinygem.org/) social bookmarking service, both of which play a role in Kagi.com in the present or future. Unrelatedly: I'm concerned about the company's biases, as it seems happy to [use Brave's commercial API](https://kagifeedback.org/d/2808-reconsider-your-partnership-with-brave) (allowing blatant homophobia in the comments) and [allow its results to recommend suicide methods without intervention](https://kagifeedback.org/d/865-suicide-results-should-probably-have-a-dont-do-that-widget-like-google/50). I reject the idea that avoiding an option that may seem politically biased is the same as being unbiased if such a decision has real political implications.
|
||||
|
||||
[SVMetaSearch](https://svmetasearch.eu.org/)
|
||||
: A SearxNG metasearch engine that also includes results from its own index. All other sources can be turned off. Like most public Searx/SearxNG instances, reliability is very poor.
|
||||
[PriEco](https://prieco.net/)
|
||||
: A metasearch engine with one option for using its own index. Found in my access logs. All other sources can be turned off, allowing you to see its unique results. At the time of writing, its own index is unfortunately quite tiny.
|
||||
|
||||
## Non-generalist search
|
||||
|
||||
|
@ -419,6 +422,11 @@ Dead engines I don't have an extended description for:
|
|||
|
||||
- [Moose.at](https://www.moose.at): German (Austria-based). The site is still up but redirects searches to Brave.
|
||||
|
||||
## Upcoming engines
|
||||
|
||||
- [Cyberfind](https://cyberfind.net/bot.html)
|
||||
- [fynd](https://fynd.bot/)
|
||||
|
||||
## Exclusions
|
||||
|
||||
Three engines were excluded from this list for having a far-right focus.
|
||||
|
|
|
@ -16,7 +16,7 @@
|
|||
<!-- Only index the canonical locations, not the envs.net mirror. -->
|
||||
{{ if or (eq (trim site.BaseURL "/") site.Params.CanonicalBaseURL) (in site.BaseURL "wgq3bd2kqoybhstp77i3wrzbfnsyd27wt34psaja4grqiezqircorkyd.onion") -}}
|
||||
<!-- See https://noml.info/, https://www.deviantart.com/team/journal/UPDATE-All-Deviations-Are-Opted-Out-of-AI-Datasets-934500371 -->
|
||||
<meta name="robots" content="index,follow,max-image-preview:large,max-snippet=-1,noai,noimageai,noml" />
|
||||
<meta name="robots" content="index,follow,max-image-preview:large,max-snippet:-1,noai,noimageai,noml" />
|
||||
{{ else -}}
|
||||
<meta name="robots" content="noindex,nofollow,noimageindex,noai,noimageai" />
|
||||
{{ end -}}
|
||||
|
|
|
@ -87,6 +87,10 @@ Disallow: /
|
|||
User-agent: PiplBot
|
||||
Disallow: /
|
||||
|
||||
# Well-known overly-aggressive bot that claims to respect robots.txt: http://mj12bot.com/
|
||||
User-agent: MJ12bot
|
||||
Crawl-Delay: 10
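One way to check how a standards-following parser reads these rules is Python's built-in `urllib.robotparser`. The snippet below is a sketch using a two-entry excerpt inlined as a string rather than the full file; whether MJ12bot itself honours the delay is a separate question that this cannot answer.

```python
from urllib.robotparser import RobotFileParser

# Excerpt only; the real robots.txt contains many more entries.
ROBOTS_TXT = """\
User-agent: PiplBot
Disallow: /

User-agent: MJ12bot
Crawl-Delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

mj12_delay = parser.crawl_delay("MJ12bot")       # delay in seconds, or None
mj12_allowed = parser.can_fetch("MJ12bot", "/")  # no Disallow rule for this bot
pipl_allowed = parser.can_fetch("PiplBot", "/")  # blocked by "Disallow: /"
```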
|
||||
|
||||
## Gen-AI data scrapers ##
|
||||
|
||||
# Eat shit, OpenAI.
|
||||
|
@ -117,6 +121,9 @@ User-Agent: FacebookBot
|
|||
User-Agent: meta-externalagent
|
||||
Disallow: /
|
||||
|
||||
# This one doesn't support robots.txt: https://www.allenai.org/crawler
|
||||
# block it with your reverse-proxy or WAF or something.
|
||||
|
||||
# I'm not blocking CCBot for now. It publishes a free index for anyone to use.
|
||||
# Google used this to train the initial version of Bard (now called Gemini).
|
||||
# I allow CCBot since its index is also used for upstart/hobbyist search engines
|
||||
|
|