mirror of
https://git.sr.ht/~seirdy/seirdy.one
synced 2024-11-23 21:02:09 +00:00
Compare commits
No commits in common. "dd7462e26a9e8e8df35773c1ba802672e00cd290" and "48992ff91a88b6a2edcd80105de495fb9b083e89" have entirely different histories.
dd7462e26a...48992ff91a
16 changed files with 22 additions and 68 deletions
@@ -61,7 +61,7 @@ The {{<mention-work itemtype="WebSite">}}{{<cited-work name="1MB Club" url="http
- [Writer's Lane, Nightfall City](https://nightfall.city/writers-lane/)
- [Just Another Useless Page](https://www.geocities.ws/jaup/jaup.htm)
- [Webrings Fanlisting](https://fanlistings.nickifaulk.com/webrings/)
- [Yesterlinks (archived)](https://web.archive.org/web/20230817122434/https://links.yesterweb.org/)
- ~~[Yesterlinks](https://links.yesterweb.org/)~~
- [Gossip's Web](https://gossipsweb.net/personal-websites)
- [Nixers](https://github.com/nixers-projects/sites/wiki/List-of-nixers.net-user-sites)
- [Smooth Sailing](https://smoothsailing.asclaria.org/)
@@ -1,7 +1,7 @@
---
title: "JS-enabled engines"
date: 2022-06-02T18:30:30-07:00
replyURI: "http://archive.today/2022.09.10-195228/https://mk.nixnet.social/notes/911asmc9rn"
replyURI: "https://mk.nixnet.social/notes/911asmc9rn"
replyTitle: "if search engine crawlers didn't run JavaScript the Web would be better"
replyType: "SocialMediaPosting"
replyAuthor: "Alexandra"
@@ -1,24 +0,0 @@
---
title: "Next steps for my search engine collection"
date: 2024-05-24T08:48:41-04:00
replyURI: "https://seirdy.one/posts/2021/03/10/search-engines-with-own-indexes/"
replyTitle: "A look at search engines with their own indexes"
replyType: "BlogPosting"
replyAuthor: "Seirdy"
replyAuthorURI: "https://seirdy.one/"
syndicatedCopies:
- title: 'The Fediverse'
url: 'https://pleroma.envs.net/notice/AiDUlrCthb1fPcvSgi'
---

My search engine article blew up recently, as [yet another major publication linked it](https://arstechnica.com/gadgets/2024/05/bing-outage-shows-just-how-little-competition-google-search-really-has/2/) (yay! /gen), so I made some fixes:

- Moved a couple engines to the graveyard. h/t to {{<indieweb-person url="https://dequbed.space/" name="dequbed" itemprop="mentions">}} for telling me about moose.at's demise, and to my broken link checker for telling me about Siik being down for a while now.
- Updated my methodology section to emphasize how I now use word-substitutions to fingerprint an engine's source. Queries that lend themselves to word-substitution are now **my primary approach to uncovering which provider an engine uses for search results,** followed by some long-tail queries and search operator support.

The full list of changes is in [the Git log](https://git.sr.ht/~seirdy/seirdy.one/log/master/item/content/posts/search-engines-with-own-indexes.md). Things I should add in the future:

- I ought to add a section to describe why I don't personally like metasearch engines as much as I used to. TLDR: each engine has its own quirks and search operators that I learn to use, and throwing them all on one page forces me to use whatever quirks they have in common. This is really bad for long-tail queries.
- I should put more effort into categorizing engines by strength as well as index size. I'll have to come up with appropriate terms for things like "ability to find specific pages with specific topics" (less aggressive word substitutions, less focus on semantic search: Mojeek is one of the best at this, Yep is terrible), or "ability to discover pages about a given broad topic" (Yep is good at this once you learn to use it well, but Mojeek isn't).

That second bullet point is really important. Part of the point of this article is to show that **nobody can beat Google at being Google** (except perhaps Bing), but we can beat Google at more specific niches.
@@ -1,12 +1,11 @@
---
title: "RDF versus semantic HTML"
date: 2022-09-13T21:30:02-07:00
lastMod: 2022-09-13T21:30:02-07:00
replyURI: "https://web.archive.org/web/20231201000536/https://cybre.space/@jauntywunderkind420/108993489770129012"
replyURI: "https://cybre.space/@jauntywunderkind420/108993489770129012"
replyTitle: "Microdata and rdfa are excellent and wonderful ways to describe individual html elements"
replyType: "SocialMediaPosting"
replyAuthor: "@jauntywunderkind420@cybre.space"
replyAuthorURI: "https://web.archive.org/web/20230202143104/https://cybre.space/@jauntywunderkind420/"
replyAuthorURI: "https://cybre.space/@jauntywunderkind420/"
---

> microdata and rdfa both directly mark up existing html content.
@@ -5,7 +5,7 @@ replyURI: "https://archive.today/hxOsO"
replyTitle: "The amount of water other food need to produce 1kg of food"
replyType: "SocialMediaPosting"
replyAuthor: "Fristi"
replyAuthorURI: "https://croc-monsieur.nl/"
replyAuthorURI: "https://comfitu.re/"
---
I have mixed feelings about infographics that reduce ecological footprints to single scalar non-fungible values.

@@ -8,7 +8,7 @@ replyAuthor: "toastal"
replyAuthorURI: "https://toast.al"
syndicatedCopies:
- title: 'Lobsters'
url: 'https://lobste.rs/comments/ddiqt8/reply'
url: 'https://lobste.rs/s/ehzhcw/semantic_markup_for_callouts#c_ddiqt8'
- title: 'The Fediverse'
url: 'https://pleroma.envs.net/notice/AZFO77yIoQhSicea1I'
---
@@ -8,7 +8,7 @@ syndicatedCopies:
- title: 'The Fediverse'
url: 'https://pleroma.envs.net/notice/ASl0lOGNcl5GNJL6Jc'
- title: 'Lobsters'
url: 'https://lobste.rs/comments/f6rvfi/reply'
url: 'https://lobste.rs/s/coy6gt/why_is_building_ui_rust_so_hard#c_f6rvfi'
---
How does Warp stack against other toolkits when it comes to accessibility and system integration?

@@ -1,8 +1,7 @@
---
title: "User agents set the terms"
date: 2022-08-12T00:27:26-07:00
lastMod: 2022-08-12T00:27:26-07:00
replyURI: "https://lobste.rs/comments/dqolcq/reply"
replyURI: "https://lobste.rs/s/dusuzt/let_websites_framebust_out_native_apps#c_dqolcq"
replyTitle: "I have the freedom to set the terms on which I will offer access to a website of mine."
replyType: "DiscussionForumPosting"
replyAuthor: "James Bennet"
@@ -1,8 +1,7 @@
---
title: "User choice and progressive enhancement"
date: 2022-06-27T14:31:21-07:00
lastMod: 2022-06-27T14:31:21-07:00
replyURI: "https://lobste.rs/comments/lxwjcc/reply"
replyURI: "https://lobste.rs/s/mvw7zd/details_as_menu#c_lxwjcc"
replyTitle: "These browsers are mostly used by tech-savvy people"
replyType: "SocialMediaPosting"
replyAuthor: "Matt Campbell"
@@ -1,8 +1,7 @@
---
title: "Using BoringSSL"
date: 2022-10-30T13:10:29-07:00
lastMod: 2023-05-27T03:57:41Z
replyURI: "https://lobste.rs/comments/sk5f3v/reply"
replyURI: "https://lobste.rs/s/9eas9d/you_should_prepare_for_openssl_3_x_secvuln#c_sk5f3v"
replyTitle: "“BoringSSL…is not intended for general use”"
replyType: "Comment"
replyAuthor: "AJ Jordan"
@@ -11,7 +10,7 @@ syndicatedCopies:
- title: 'The Fediverse'
url: 'https://pleroma.envs.net/notice/AUjf1wCr0xk0yCVpKK'
- title: 'Lobsters'
url: 'https://lobste.rs/comments/lreowa/reply'
url: 'https://lobste.rs/s/9eas9d/you_should_prepare_for_openssl_3_x_secvuln#c_lreowa'
---

Despite BoringSSL's "not intended for general use" warning, it's used by many projects:
@@ -1,12 +1,11 @@
---
title: "Website security scanners"
date: 2022-11-02T11:56:02-07:00
replyURI: "https://pleroma.envs.net/notice/APB6Va7FFvgXN801L6"
replyURI: "https://plem.sapphic.site/notice/APB6VSqinvWjm1yHgW"
replyTitle: "why does hardenize still check for Expect-CT when the header is deprecated"
replyType: "SocialMediaPosting"
replyAuthor: "r3g_5z"
replyAuthorURI: "https://girlboss.ceo/"
lastMod: 2022-11-26T19:20:46Z
replyAuthorURI: "https://blog.girlboss.ceo/"
---

Speaking generally: I think most website security scanners (Webbkoll, Observatory, et al) lend themselves to cargo-cults. You don't need [most Content Security Policy directives](https://w3c.github.io/webappsec-csp/#csp-directives) for a PNG file, for instance. Warning against a missing `X-Frame-Options` feels wrong: even the latest version of iOS 9---the oldest iOS release to support secure TLS 1.2 <abbr>ECDSA</abbr> ciphers---seems to support `frame-ancestors` (correct me if I'm wrong).
@@ -389,7 +389,7 @@ These engines were originally included in the article, but have since been disco

## Exclusions

Three engines were excluded from this list for having a far-right focus.
Two engines were excluded from this list for having a far-right focus.

One engine was excluded because it seems to be built using cryptocurrency in a way I'd rather not support.

@@ -456,21 +456,13 @@ I tried to pick queries that should have a good number of results and show varia
* “vim”, “emacs”, “neovim”, and “nvimrc”: Search engines with relevant results for “nvimrc” typically have a big index. Finding relevant results for the text editors “vim” and “emacs” instead of other topics that share the name is a challenging task.
* “vim cleaner”: should return results related to a line of cleaning products rather than the Correct Text Editor.
* “Seirdy”: My site is relatively low-traffic, but my nickname is pretty unique and visible on several of the highest-traffic sites out there.
* “Project London”: a small movie made with volunteers and FLOSS without much advertising. If links related to small independent projects like this show up, the index has really good coverage of movies.
* “oppenheimer” versus "J Robert Oppenheimer": a name that could refer to many things. Without context, it could refer to a high-budget movie or the physicist who led the Manhattan Project in Los Alamos. Other historical queries: “magna carta” (intermediate), “the prince” (very hard).

(Update: I don't use these queries anymore; I've found better tests in recent months).
* “Project London”: a small movie made with volunteers and FLOSS without much advertising. If links related to the movie show up, the engine’s really good.
* “oppenheimer”: a name that could refer to many things. Without context, it should refer to the physicist who worked on the atomic bomb in Los Alamos. Other historical queries: “magna carta” (intermediate), “the prince” (very hard).

Some less-mainstream engines have noticed this article, which is great! I've had excellent discussions with people who work on several of these engines. Unfortunately, this article's visibility also incentivizes some engines to optimize specifically for any methodology I describe. I've addressed this by keeping a long list of test queries to myself. The simple queries above are a decent starting point for simple quick evaluations, but I also test for common search operators, keyword length, and types of domain-specific jargon. I also use queries designed to pull up specific pages with varying levels of popularity and recency to gauge the size, scope, and growth of an index.

Professional critics often work anonymously because personalization can damage the integrity of their reviews. For similar reasons, I attempt to try each engine anonymously at least once by using a VPN and/or my standard anonymous setup: an amnesiac Whonix VM with the Tor Browser. I also often test using a fresh profile when travelling, or via a Searx instance if it supports a given engine. When avoiding personalization, I use "varied" queries that I don't repeat verbatim across search engines; this reduces the likelihood of identifying me. I also attempt to spread these tests out over time so admins won't notice an unusual uptick in unpredictable and esoteric searches. This might seem overkill, but I already regularly employ similar methods for a variety of different scenarios.

### Unique results without unique indexes

Some engines, like Kagi and the Ask.com family of engines, have unique-looking results from external indexes. Unique results alone don't always imply independence, as an engine could alter ranking or add filters (something that very few engines are permitted to do; Google and Microsoft generally impose a strict ToS forbidding modification).

Another possible indicator I look for is word substitutions. Nearly every engine supports substitutions for verb tense or singular/plural word forms, but more advanced semantic substitutions are less common. Returning the same results for "matza gebrent" and "matzo brei" requires a deep understanding of related food topics. Google and Bing return nearly identical results for the two queries, but engines like Mojeek return entirely different results. I often compare an engine's word substitutions to see if they're similar to another engine's, and see how many results from the top 20 are not present in the top 30-40 on other engines. I have a working list of other word substitutions I test.

### Caveats

I didn't try to avoid personalization when testing engines that require account creation. Entries in the "hit-and-miss" and "unusable" sections got less attention: for instance, I didn't spend a lot of effort tracking results over time to see how new entries got added to them.
@@ -414,7 +414,7 @@ Dead engines I don't have an extended description for:

## Exclusions

Three engines were excluded from this list for having a far-right focus.
Two engines were excluded from this list for having a far-right focus.

One engine was excluded because it seems to be built using cryptocurrency in a way I'd rather not support.

@@ -480,22 +480,14 @@ I tried to pick queries that should have a good number of results and show varia

- "Seirdy": My site is relatively low-traffic, but my nickname is pretty unique and visible on several of the highest-traffic sites out there.

- "Project London": a small movie made with volunteers and <abbr title="Free, Libre, Open-Source Software">FLOSS</abbr> without much advertising. If links related to small independent projects like this show up, the index has really good coverage of movies.
- "Project London": a small movie made with volunteers and <abbr title="Free, Libre, Open-Source Software">FLOSS</abbr> without much advertising. If links related to the movie show up, the engine's really good.

- “oppenheimer” versus "J Robert Oppenheimer": a name that could refer to many things. Without context, it could refer to a high-budget movie or the physicist who led the Manhattan Project in Los Alamos. Other historical queries: “magna carta” (intermediate), “the prince” (very hard).

(Update: I don't use these queries anymore; I've found better tests in recent months).
- "oppenheimer": a name that could refer to many things. Without context, it should refer to the physicist who worked on the atomic bomb in Los Alamos. Other historical queries: "magna carta" (intermediate), "the prince" (very hard).

Some less-mainstream engines have noticed this article, which is great! I've had excellent discussions with people who work on several of these engines. Unfortunately, this article's visibility also incentivizes some engines to optimize specifically for any methodology I describe. I've addressed this by keeping a long list of test queries to myself. The simple queries above are a decent starting point for simple quick evaluations, but I also test for common search operators, keyword length, and types of domain-specific jargon. I also use queries designed to pull up specific pages with varying levels of popularity and recency to gauge the size, scope, and growth of an index.

Professional critics often work anonymously because personalization can damage the integrity of their reviews. For similar reasons, I attempt to try each engine anonymously at least once by using a VPN and/or my standard anonymous setup: an amnesiac Whonix VM with the Tor Browser. I also often test using a fresh profile when travelling, or via a Searx instance if it supports a given engine. When avoiding personalization, I use "varied" queries that I don't repeat verbatim across search engines; this reduces the likelihood of identifying me. I also attempt to spread these tests out over time so admins won't notice an unusual uptick in unpredictable and esoteric searches. This might seem overkill, but I already regularly employ similar methods for a variety of different scenarios.

### Unique results without unique indexes

Some engines, like Kagi and the Ask.com family of engines, have unique-looking results from external indexes. Unique results alone don't always imply independence, as an engine could alter ranking or add filters (something that very few engines are permitted to do; Google and Microsoft generally impose a strict ToS forbidding modification).

Another possible indicator I look for is word substitutions. Nearly every engine supports substitutions for verb tense or singular/plural word forms, but more advanced semantic substitutions are less common. Returning the same results for "matza gebrent" and "matzo brei" requires a deep understanding of related food topics. Google and Bing return nearly identical results for the two queries, but engines like Mojeek return entirely different results. I often compare an engine's word substitutions to see if they're similar to another engine's, and see how many results from the top 20 are not present in the top 30-40 on other engines. I have a working list of other word substitutions I test.

### Caveats

I didn't try to avoid personalization when testing engines that require account creation. Entries in the "hit-and-miss" and "unusable" sections got less attention: I didn't spend a lot of effort tracking results over time to see how new entries got added to them.
@@ -1612,7 +1612,7 @@ Imagine your typical "modern" website's deployment pipeline. It requires thousan

Ten years from now, how much of this will still work?

Try to ensure that your website can be archived, and/or easily re-built and served on an ordinary server. This way, your work can still be made accessible after you're gone. For example: all my site requires to build is a tarball of statically-linked binaries, a POSIX shell, and a decent Make implementation (bmake and GNU make work) to build; see [my build manifest](https://git.sr.ht/~seirdy/seirdy.one/tree/e591c9d1ee54c16c40f4b8f2c1eab9e830577681/item/.build.yml). To serve, it just needs a static web server.
Try to ensure that your website can be archived, and/or easily re-built and served on an ordinary server. This way, your work can still be made accessible after you're gone. For example: all my site requires to build is a tarball of statically-linked binaries, a POSIX shell, and a decent Make implementation (bmake and GNU make work) to build; see [my build manifest](https://git.sr.ht/~seirdy/seirdy.one/tree/master/item/.build.yml). To serve, it just needs a static web server.

Testing
-------
@@ -33,5 +33,3 @@ https://brid.gy/comment/reddit/Seirdy/sy3r4t/hxws016,https://old.reddit.com/comm
https://snowdin.town/notice/AOevybwoSx4xW4lX3w,https://web.archive.org/web/20230422173223/https://snowdin.town/notice/AOevybwoSx4xW4lX3w
https://aroace.space/opinions,https://web.archive.org/web/20231208204623/https://aroace.space/opinions
https://mastodon.art/@TerryHancock/108392881014059285,https://web.archive.org/web/20221117225952/https://mastodon.art/@TerryHancock/108392881014059285
https://lobste.rs/s/dmkw4d/how_back_up_your_git_repositories#c_ktr31d,https://lobste.rs/comments/ktr31d/reply
https://lobste.rs/s/rbocxw/my_thoughts_on_writing_minecraft_server#c_xsjz1w,https://lobste.rs/comments/xsjz1w/reply
@@ -7,8 +7,9 @@ no ai,https://baccyflap.com/noai/?prv&s=srd,https://baccyflap.com/noai,https://b
TheOldNet,https://webring.theoldnet.com/member/ba438275f00f5df1a2e78e547424d05e/previous/navigate,https://webring.theoldnet.com/,https://webring.theoldnet.com/member/ba438275f00f5df1a2e78e547424d05e/next/navigate,https://webring.theoldnet.com/member/ba438275f00f5df1a2e78e547424d05e/random/navigate
geekring,https://geekring.net/site/167/previous,https://geekring.net/,https://geekring.net/site/167/next,https://geekring.net/site/167/random
Loop (JS),https://loop.graycot.dev/webring.html?action=prev,https://docs.graycot.dev/s/MFowZsw_F,https://loop.graycot.dev/webring.html?action=next,https://loop.graycot.dev/webring.html?action=rand
Retronaut,https://webring.dinhe.net/prev/https://seirdy.one/,https://webring.dinhe.net/,https://webring.dinhe.net/next/https://seirdy.one/,null
Retronaut,https://tonicfunk.neocities.org/,https://webring.dinhe.net/,https://members.vistaserv.net/jrs/index.html,null
Hotline,https://hotlinewebring.club/seirdy/previous,https://hotlinewebring.club,https://hotlinewebring.club/seirdy/next,null
Bucket (JS),https://webring.bucketfish.me/redirect.html?to=prev&name=seirdy,https://webring.bucketfish.me/,https://webring.bucketfish.me/redirect.html?to=next&name=seirdy,null
Devring,https://devring.club/sites/5/prev,https://devring.club,https://devring.club/sites/5/next,https://devring.club/random
Cuddler,https://cuddler-webring.netlify.app/seirdy/previous,https://cuddler-webring.netlify.app/,https://cuddler-webring.netlify.app/seirdy/next,null
Just For Fun,https://webri.ng/webring/emjustforfun/previous?via=https://seirdy.one/,https://em.elliotsmoon.net/webring/index.html,https://webri.ng/webring/emjustforfun/next?via=https://seirdy.one/,https://webri.ng/webring/emjustforfun/random?via=https://seirdy.one/