mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-09-19 20:02:10 +00:00

Compare commits

11 commits

Author SHA1 Message Date
Rohan Kumar
dd7462e26a
fix dead links 2024-05-24 11:05:12 -04:00
Rohan Kumar
9184e7897c
Normalize lobste.rs reply permalinks
Lobste.rs replies have real permalinks that are sort of hidden, which
work better than anchor links for IndieWeb purposes.
2024-05-24 10:48:25 -04:00
Rohan Kumar
9c52666c37
Fix dead links 2024-05-24 10:24:27 -04:00
Rohan Kumar
5583210c6e
Update retronaut prev/next links 2024-05-24 09:28:45 -04:00
Rohan Kumar
c423c3656d
Leave JFF webring
Webring home page is down
2024-05-24 09:25:05 -04:00
Rohan Kumar
2a7d51a5bc
merge two paragraphs 2024-05-24 09:02:29 -04:00
Rohan Kumar
79b8f13096
Syndicate 2024-05-24 08:52:21 -04:00
Rohan Kumar
a018ae6499
New note: next steps for search engine article 2024-05-24 08:43:41 -04:00
Rohan Kumar
5dda4c2df3
Update number of excluded engines
I discovered a third far-right engine a long time ago
2024-05-24 08:30:47 -04:00
Rohan Kumar
3928df3ab3
Elaborate on word substitutions 2024-05-24 08:27:17 -04:00
Rohan Kumar
d18e4862c8
Update methodology
Add info on word substitutions and deprecate existing word substitution
list.
2024-05-24 07:56:38 -04:00
16 changed files with 68 additions and 22 deletions

View file

@ -61,7 +61,7 @@ The {{<mention-work itemtype="WebSite">}}{{<cited-work name="1MB Club" url="http
- [Writer's Lane, Nightfall City](https://nightfall.city/writers-lane/)
- [Just Another Useless Page](https://www.geocities.ws/jaup/jaup.htm)
- [Webrings Fanlisting](https://fanlistings.nickifaulk.com/webrings/)
- ~~[Yesterlinks](https://links.yesterweb.org/)~~
- [Yesterlinks (archived)](https://web.archive.org/web/20230817122434/https://links.yesterweb.org/)
- [Gossip's Web](https://gossipsweb.net/personal-websites)
- [Nixers](https://github.com/nixers-projects/sites/wiki/List-of-nixers.net-user-sites)
- [Smooth Sailing](https://smoothsailing.asclaria.org/)

View file

@ -1,7 +1,7 @@
---
title: "JS-enabled engines"
date: 2022-06-02T18:30:30-07:00
replyURI: "https://mk.nixnet.social/notes/911asmc9rn"
replyURI: "http://archive.today/2022.09.10-195228/https://mk.nixnet.social/notes/911asmc9rn"
replyTitle: "if search engine crawlers didn't run JavaScript the Web would be better"
replyType: "SocialMediaPosting"
replyAuthor: "Alexandra"

View file

@ -0,0 +1,24 @@
---
title: "Next steps for my search engine collection"
date: 2024-05-24T08:48:41-04:00
replyURI: "https://seirdy.one/posts/2021/03/10/search-engines-with-own-indexes/"
replyTitle: "A look at search engines with their own indexes"
replyType: "BlogPosting"
replyAuthor: "Seirdy"
replyAuthorURI: "https://seirdy.one/"
syndicatedCopies:
- title: 'The Fediverse'
url: 'https://pleroma.envs.net/notice/AiDUlrCthb1fPcvSgi'
---
My search engine article blew up recently, as [yet another major publication linked it](https://arstechnica.com/gadgets/2024/05/bing-outage-shows-just-how-little-competition-google-search-really-has/2/) (yay! /gen), so I made some fixes:
- Moved a couple engines to the graveyard. h/t to {{<indieweb-person url="https://dequbed.space/" name="dequbed" itemprop="mentions">}} for telling me about moose.at's demise, and to my broken link checker for telling me about Siik being down for a while now.
- Updated my methodology section to emphasize how I now use word-substitutions to fingerprint an engine's source. Queries that lend themselves to word-substitution are now **my primary approach to uncovering which provider an engine uses for search results,** followed by some long-tail queries and search operator support.
The full list of changes is in [the Git log](https://git.sr.ht/~seirdy/seirdy.one/log/master/item/content/posts/search-engines-with-own-indexes.md). Things I should add in the future:
- I ought to add a section to describe why I don't personally like metasearch engines as much as I used to. TLDR: each engine has its own quirks and search operators that I learn to use, and throwing them all on one page forces me to use whatever quirks they have in common. This is really bad for long-tail queries.
- I should put more effort into categorizing engines by strength as well as index size. I'll have to come up with appropriate terms for things like "ability to find specific pages with specific topics" (less aggressive word substitutions, less focus on semantic search: Mojeek is one of the best at this, Yep is terrible), or "ability to discover pages about a given broad topic" (Yep is good at this once you learn to use it well, but Mojeek isn't).
That second bullet point is really important. Part of the point of this article is to show that **nobody can beat Google at being Google** (except perhaps Bing), but we can beat Google at more specific niches.

View file

@ -1,11 +1,12 @@
---
title: "RDF versus semantic HTML"
date: 2022-09-13T21:30:02-07:00
replyURI: "https://cybre.space/@jauntywunderkind420/108993489770129012"
lastMod: 2022-09-13T21:30:02-07:00
replyURI: "https://web.archive.org/web/20231201000536/https://cybre.space/@jauntywunderkind420/108993489770129012"
replyTitle: "Microdata and rdfa are excellent and wonderful ways to describe individual html elements"
replyType: "SocialMediaPosting"
replyAuthor: "@jauntywunderkind420@cybre.space"
replyAuthorURI: "https://cybre.space/@jauntywunderkind420/"
replyAuthorURI: "https://web.archive.org/web/20230202143104/https://cybre.space/@jauntywunderkind420/"
---
> microdata and rdfa both directly mark up existing html content.

View file

@ -5,7 +5,7 @@ replyURI: "https://archive.today/hxOsO"
replyTitle: "The amount of water other food need to produce 1kg of food"
replyType: "SocialMediaPosting"
replyAuthor: "Fristi"
replyAuthorURI: "https://comfitu.re/"
replyAuthorURI: "https://croc-monsieur.nl/"
---
I have mixed feelings about infographics that reduce ecological footprints to single scalar non-fungible values.

View file

@ -8,7 +8,7 @@ replyAuthor: "toastal"
replyAuthorURI: "https://toast.al"
syndicatedCopies:
- title: 'Lobsters'
url: 'https://lobste.rs/s/ehzhcw/semantic_markup_for_callouts#c_ddiqt8'
url: 'https://lobste.rs/comments/ddiqt8/reply'
- title: 'The Fediverse'
url: 'https://pleroma.envs.net/notice/AZFO77yIoQhSicea1I'
---

View file

@ -8,7 +8,7 @@ syndicatedCopies:
- title: 'The Fediverse'
url: 'https://pleroma.envs.net/notice/ASl0lOGNcl5GNJL6Jc'
- title: 'Lobsters'
url: 'https://lobste.rs/s/coy6gt/why_is_building_ui_rust_so_hard#c_f6rvfi'
url: 'https://lobste.rs/comments/f6rvfi/reply'
---
How does Warp stack against other toolkits when it comes to accessibility and system integration?

View file

@ -1,7 +1,8 @@
---
title: "User agents set the terms"
date: 2022-08-12T00:27:26-07:00
replyURI: "https://lobste.rs/s/dusuzt/let_websites_framebust_out_native_apps#c_dqolcq"
lastMod: 2022-08-12T00:27:26-07:00
replyURI: "https://lobste.rs/comments/dqolcq/reply"
replyTitle: "I have the freedom to set the terms on which I will offer access to a website of mine."
replyType: "DiscussionForumPosting"
replyAuthor: "James Bennet"

View file

@ -1,7 +1,8 @@
---
title: "User choice and progressive enhancement"
date: 2022-06-27T14:31:21-07:00
replyURI: "https://lobste.rs/s/mvw7zd/details_as_menu#c_lxwjcc"
lastMod: 2022-06-27T14:31:21-07:00
replyURI: "https://lobste.rs/comments/lxwjcc/reply"
replyTitle: "These browsers are mostly used by tech-savvy people"
replyType: "SocialMediaPosting"
replyAuthor: "Matt Campbell"

View file

@ -1,7 +1,8 @@
---
title: "Using BoringSSL"
date: 2022-10-30T13:10:29-07:00
replyURI: "https://lobste.rs/s/9eas9d/you_should_prepare_for_openssl_3_x_secvuln#c_sk5f3v"
lastMod: 2023-05-27T03:57:41Z
replyURI: "https://lobste.rs/comments/sk5f3v/reply"
replyTitle: "“BoringSSL…is not intended for general use”"
replyType: "Comment"
replyAuthor: "AJ Jordan"
@ -10,7 +11,7 @@ syndicatedCopies:
- title: 'The Fediverse'
url: 'https://pleroma.envs.net/notice/AUjf1wCr0xk0yCVpKK'
- title: 'Lobsters'
url: 'https://lobste.rs/s/9eas9d/you_should_prepare_for_openssl_3_x_secvuln#c_lreowa'
url: 'https://lobste.rs/comments/lreowa/reply'
---
Despite BoringSSL's "not intended for general use" warning, it's used by many projects:

View file

@ -1,11 +1,12 @@
---
title: "Website security scanners"
date: 2022-11-02T11:56:02-07:00
replyURI: "https://plem.sapphic.site/notice/APB6VSqinvWjm1yHgW"
replyURI: "https://pleroma.envs.net/notice/APB6Va7FFvgXN801L6"
replyTitle: "why does hardenize still check for Expect-CT when the header is deprecated"
replyType: "SocialMediaPosting"
replyAuthor: "r3g_5z"
replyAuthorURI: "https://blog.girlboss.ceo/"
replyAuthorURI: "https://girlboss.ceo/"
lastMod: 2022-11-26T19:20:46Z
---
Speaking generally: I think most website security scanners (Webbkoll, Observatory, et al) lend themselves to cargo-cults. You don't need [most Content Security Policy directives](https://w3c.github.io/webappsec-csp/#csp-directives) for a PNG file, for instance. Warning against a missing `X-Frame-Options` feels wrong: even the latest version of iOS 9---the oldest iOS release to support secure TLS 1.2 <abbr>ECDSA</abbr> ciphers---seems to support `frame-ancestors` (correct me if I'm wrong).

View file

@ -389,7 +389,7 @@ These engines were originally included in the article, but have since been disco
## Exclusions
Two engines were excluded from this list for having a far-right focus.
Three engines were excluded from this list for having a far-right focus.
One engine was excluded because it seems to be built using cryptocurrency in a way I'd rather not support.
@ -456,13 +456,21 @@ I tried to pick queries that should have a good number of results and show varia
* “vim”, “emacs”, “neovim”, and “nvimrc”: Search engines with relevant results for “nvimrc” typically have a big index. Finding relevant results for the text editors “vim” and “emacs” instead of other topics that share the name is a challenging task.
* “vim cleaner”: should return results related to a line of cleaning products rather than the Correct Text Editor.
* “Seirdy”: My site is relatively low-traffic, but my nickname is pretty unique and visible on several of the highest-traffic sites out there.
* “Project London”: a small movie made with volunteers and FLOSS without much advertising. If links related to the movie show up, the engines really good.
* “oppenheimer”: a name that could refer to many things. Without context, it should refer to the physicist who worked on the atomic bomb in Los Alamos. Other historical queries: “magna carta” (intermediate), “the prince” (very hard).
* “Project London”: a small movie made with volunteers and FLOSS without much advertising. If links related to small independent projects like this show up, the index has really good coverage of movies.
* “oppenheimer” versus "J Robert Oppenheimer": a name that could refer to many things. Without context, it could refer to a high-budget movie or the physicist who led the Manhattan Project in Los Alamos. Other historical queries: “magna carta” (intermediate), “the prince” (very hard).
(Update: I don't use these queries anymore; I've found better tests in recent months).
Some less-mainstream engines have noticed this article, which is great! I've had excellent discussions with people who work on several of these engines. Unfortunately, this article's visibility also incentivizes some engines to optimize specifically for any methodology I describe. I've addressed this by keeping a long list of test queries to myself. The simple queries above are a decent starting point for simple quick evaluations, but I also test for common search operators, keyword length, and types of domain-specific jargon. I also use queries designed to pull up specific pages with varying levels of popularity and recency to gauge the size, scope, and growth of an index.
Professional critics often work anonymously because personalization can damage the integrity of their reviews. For similar reasons, I attempt to try each engine anonymously at least once by using a VPN and/or my standard anonymous setup: an amnesiac Whonix VM with the Tor Browser. I also often test using a fresh profile when travelling, or via a Searx instance if it supports a given engine. When avoiding personalization, I use "varied" queries that I don't repeat verbatim across search engines; this reduces the likelihood of identifying me. I also attempt to spread these tests out over time so admins won't notice an unusual uptick in unpredictable and esoteric searches. This might seem overkill, but I already regularly employ similar methods for a variety of different scenarios.
### Unique results without unique indexes
Some engines, like Kagi and the Ask.com family of engines, have unique-looking results from external indexes. Unique results alone don't always imply independence, as an engine could alter ranking or add filters (something that very few engines are permitted to do; Google and Microsoft generally impose a strict ToS forbidding modification).
Another possible indicator I look for is word substitutions. Nearly every engine supports substitutions for verb tense or singular/plural word forms, but more advanced semantic substitutions are less common. Returning the same results for "matza gebrent" and "matzo brei" requires a deep understanding of related food topics. Google and Bing return nearly identical results for the two queries, but engines like Mojeek return entirely different results. I often compare an engine's word substitutions to see if they're similar to another engine's, and see how many results from the top 20 are not present in the top 30-40 on other engines. I have a working list of other word substitutions I test.
### Caveats
I didn't try to avoid personalization when testing engines that require account creation. Entries in the "hit-and-miss" and "unusable" sections got less attention: for instance, I didn't spend a lot of effort tracking results over time to see how new entries got added to them.
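
The comparison described in the methodology hunk above (how many of one engine's top 20 results are missing from another engine's top 30-40, and whether two spellings of a query return the same results) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the inputs are plain lists of result URLs gathered by hand, and the Jaccard overlap is a stand-in metric rather than anything the article says it uses.

```python
def unique_result_count(results_a, results_b, top_a=20, top_b=40):
    """Count how many of engine A's top results are absent from engine B's deeper list.

    results_a, results_b: ranked lists of result URLs (assumed already collected).
    top_a / top_b default to the "top 20" versus "top 30-40" windows from the text.
    """
    seen_b = set(results_b[:top_b])
    return sum(1 for url in results_a[:top_a] if url not in seen_b)


def substitution_overlap(results_q1, results_q2, top_n=10):
    """Jaccard overlap between the top results for two spellings of a query
    on the same engine (for example "matza gebrent" versus "matzo brei").

    A value near 1.0 suggests the engine treats the two queries as equivalent;
    a value near 0.0 suggests it does not substitute one for the other.
    """
    a, b = set(results_q1[:top_n]), set(results_q2[:top_n])
    if not a and not b:
        return 0.0  # nothing to compare
    return len(a & b) / len(a | b)
```

A high `unique_result_count` against every known provider hints at an independent index, while similar `substitution_overlap` values across two engines hint that they may share a provider. Neither number is conclusive on its own.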

View file

@ -414,7 +414,7 @@ Dead engines I don't have an extended description for:
## Exclusions
Two engines were excluded from this list for having a far-right focus.
Three engines were excluded from this list for having a far-right focus.
One engine was excluded because it seems to be built using cryptocurrency in a way I'd rather not support.
@ -480,14 +480,22 @@ I tried to pick queries that should have a good number of results and show varia
- "Seirdy": My site is relatively low-traffic, but my nickname is pretty unique and visible on several of the highest-traffic sites out there.
- "Project London": a small movie made with volunteers and <abbr title="Free, Libre, Open-Source Software">FLOSS</abbr> without much advertising. If links related to the movie show up, the engine's really good.
- "Project London": a small movie made with volunteers and <abbr title="Free, Libre, Open-Source Software">FLOSS</abbr> without much advertising. If links related to small independent projects like this show up, the index has really good coverage of movies.
- "oppenheimer": a name that could refer to many things. Without context, it should refer to the physicist who worked on the atomic bomb in Los Alamos. Other historical queries: "magna carta" (intermediate), "the prince" (very hard).
- “oppenheimer” versus "J Robert Oppenheimer": a name that could refer to many things. Without context, it could refer to a high-budget movie or the physicist who led the Manhattan Project in Los Alamos. Other historical queries: “magna carta” (intermediate), “the prince” (very hard).
(Update: I don't use these queries anymore; I've found better tests in recent months).
Some less-mainstream engines have noticed this article, which is great! I've had excellent discussions with people who work on several of these engines. Unfortunately, this article's visibility also incentivizes some engines to optimize specifically for any methodology I describe. I've addressed this by keeping a long list of test queries to myself. The simple queries above are a decent starting point for simple quick evaluations, but I also test for common search operators, keyword length, and types of domain-specific jargon. I also use queries designed to pull up specific pages with varying levels of popularity and recency to gauge the size, scope, and growth of an index.
Professional critics often work anonymously because personalization can damage the integrity of their reviews. For similar reasons, I attempt to try each engine anonymously at least once by using a VPN and/or my standard anonymous setup: an amnesiac Whonix VM with the Tor Browser. I also often test using a fresh profile when travelling, or via a Searx instance if it supports a given engine. When avoiding personalization, I use "varied" queries that I don't repeat verbatim across search engines; this reduces the likelihood of identifying me. I also attempt to spread these tests out over time so admins won't notice an unusual uptick in unpredictable and esoteric searches. This might seem overkill, but I already regularly employ similar methods for a variety of different scenarios.
### Unique results without unique indexes
Some engines, like Kagi and the Ask.com family of engines, have unique-looking results from external indexes. Unique results alone don't always imply independence, as an engine could alter ranking or add filters (something that very few engines are permitted to do; Google and Microsoft generally impose a strict ToS forbidding modification).
Another possible indicator I look for is word substitutions. Nearly every engine supports substitutions for verb tense or singular/plural word forms, but more advanced semantic substitutions are less common. Returning the same results for "matza gebrent" and "matzo brei" requires a deep understanding of related food topics. Google and Bing return nearly identical results for the two queries, but engines like Mojeek return entirely different results. I often compare an engine's word substitutions to see if they're similar to another engine's, and see how many results from the top 20 are not present in the top 30-40 on other engines. I have a working list of other word substitutions I test.
### Caveats
I didn't try to avoid personalization when testing engines that require account creation. Entries in the "hit-and-miss" and "unusable" sections got less attention: I didn't spend a lot of effort tracking results over time to see how new entries got added to them.

View file

@ -1612,7 +1612,7 @@ Imagine your typical "modern" website's deployment pipeline. It requires thousan
Ten years from now, how much of this will still work?
Try to ensure that your website can be archived, and/or easily re-built and served on an ordinary server. This way, your work can still be made accessible after you're gone. For example: all my site requires to build is a tarball of statically-linked binaries, a POSIX shell, and a decent Make implementation (bmake and GNU make work) to build; see [my build manifest](https://git.sr.ht/~seirdy/seirdy.one/tree/master/item/.build.yml). To serve, it just needs a static web server.
Try to ensure that your website can be archived, and/or easily re-built and served on an ordinary server. This way, your work can still be made accessible after you're gone. For example: all my site requires to build is a tarball of statically-linked binaries, a POSIX shell, and a decent Make implementation (bmake and GNU make work) to build; see [my build manifest](https://git.sr.ht/~seirdy/seirdy.one/tree/e591c9d1ee54c16c40f4b8f2c1eab9e830577681/item/.build.yml). To serve, it just needs a static web server.
Testing
-------

View file

@ -33,3 +33,5 @@ https://brid.gy/comment/reddit/Seirdy/sy3r4t/hxws016,https://old.reddit.com/comm
https://snowdin.town/notice/AOevybwoSx4xW4lX3w,https://web.archive.org/web/20230422173223/https://snowdin.town/notice/AOevybwoSx4xW4lX3w
https://aroace.space/opinions,https://web.archive.org/web/20231208204623/https://aroace.space/opinions
https://mastodon.art/@TerryHancock/108392881014059285,https://web.archive.org/web/20221117225952/https://mastodon.art/@TerryHancock/108392881014059285
https://lobste.rs/s/dmkw4d/how_back_up_your_git_repositories#c_ktr31d,https://lobste.rs/comments/ktr31d/reply
https://lobste.rs/s/rbocxw/my_thoughts_on_writing_minecraft_server#c_xsjz1w,https://lobste.rs/comments/xsjz1w/reply
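
The rewrite applied throughout these hunks (story URL plus `#c_<id>` anchor becomes `/comments/<id>/reply`) can be expressed as a small helper. This is a minimal sketch, not the script the author used: the function name is hypothetical, and the assumption that comment IDs are lowercase alphanumeric is inferred only from the examples in this diff.

```python
import re
from urllib.parse import urlsplit

# Assumption: comment anchors look like "c_" followed by a short
# lowercase alphanumeric ID, as in every example in this diff.
_COMMENT_FRAGMENT = re.compile(r"c_([a-z0-9]+)")


def normalize_lobsters_permalink(url: str) -> str:
    """Turn a lobste.rs story URL with a comment anchor into the comment permalink.

    URLs that are not lobste.rs story links with a comment anchor are
    returned unchanged.
    """
    parts = urlsplit(url)
    match = _COMMENT_FRAGMENT.fullmatch(parts.fragment)
    if parts.netloc != "lobste.rs" or not parts.path.startswith("/s/") or match is None:
        return url
    return f"https://lobste.rs/comments/{match.group(1)}/reply"
```

Returning unrelated URLs untouched makes the helper safe to map over a whole list of reply and syndication links.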


View file

@ -7,9 +7,8 @@ no ai,https://baccyflap.com/noai/?prv&s=srd,https://baccyflap.com/noai,https://b
TheOldNet,https://webring.theoldnet.com/member/ba438275f00f5df1a2e78e547424d05e/previous/navigate,https://webring.theoldnet.com/,https://webring.theoldnet.com/member/ba438275f00f5df1a2e78e547424d05e/next/navigate,https://webring.theoldnet.com/member/ba438275f00f5df1a2e78e547424d05e/random/navigate
geekring,https://geekring.net/site/167/previous,https://geekring.net/,https://geekring.net/site/167/next,https://geekring.net/site/167/random
Loop (JS),https://loop.graycot.dev/webring.html?action=prev,https://docs.graycot.dev/s/MFowZsw_F,https://loop.graycot.dev/webring.html?action=next,https://loop.graycot.dev/webring.html?action=rand
Retronaut,https://tonicfunk.neocities.org/,https://webring.dinhe.net/,https://members.vistaserv.net/jrs/index.html,null
Retronaut,https://webring.dinhe.net/prev/https://seirdy.one/,https://webring.dinhe.net/,https://webring.dinhe.net/next/https://seirdy.one/,null
Hotline,https://hotlinewebring.club/seirdy/previous,https://hotlinewebring.club,https://hotlinewebring.club/seirdy/next,null
Bucket (JS),https://webring.bucketfish.me/redirect.html?to=prev&name=seirdy,https://webring.bucketfish.me/,https://webring.bucketfish.me/redirect.html?to=next&name=seirdy,null
Devring,https://devring.club/sites/5/prev,https://devring.club,https://devring.club/sites/5/next,https://devring.club/random
Cuddler,https://cuddler-webring.netlify.app/seirdy/previous,https://cuddler-webring.netlify.app/,https://cuddler-webring.netlify.app/seirdy/next,null
Just For Fun,https://webri.ng/webring/emjustforfun/previous?via=https://seirdy.one/,https://em.elliotsmoon.net/webring/index.html,https://webri.ng/webring/emjustforfun/next?via=https://seirdy.one/,https://webri.ng/webring/emjustforfun/random?via=https://seirdy.one/
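
For readers unfamiliar with this data file, here is a rough sketch of how rows like the ones above could be loaded. It assumes the five columns are `name`, `prev`, `home`, `next`, and `random` (taken from the header row in the forge's rendered view of this file) and that the literal string `null` marks a missing link. The function name is hypothetical, and the site's real templates may consume the file quite differently.

```python
import csv
import io

# Assumed column order, based on the rendered header row of this file.
FIELDS = ["name", "prev", "home", "next", "random"]


def parse_webrings(csv_text):
    """Parse webring rows into dicts, mapping the literal "null" to None."""
    rings = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or row == FIELDS:
            continue  # skip blank lines and the header row
        # Pad short rows so every record has all five keys.
        row = (row + ["null"] * len(FIELDS))[: len(FIELDS)]
        rings.append({key: (None if value == "null" else value)
                      for key, value in zip(FIELDS, row)})
    return rings
```

For example, the Hotline row would yield a record whose `random` entry is `None`, which a template could use to omit the "random" link.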

CSV header (line 1): name, prev, home, next, random