From a018ae6499351e7ed3785e1f0f9fd8494ea57fcb Mon Sep 17 00:00:00 2001
From: Rohan Kumar
Date: Fri, 24 May 2024 08:43:41 -0400
Subject: [PATCH] New note: next steps for search engine article

---
 ...t-steps-for-my-search-engine-collection.md | 23 +++++++++++++++++++
 1 file changed, 23 insertions(+)
 create mode 100644 content/notes/next-steps-for-my-search-engine-collection.md

diff --git a/content/notes/next-steps-for-my-search-engine-collection.md b/content/notes/next-steps-for-my-search-engine-collection.md
new file mode 100644
index 0000000..934f105
--- /dev/null
+++ b/content/notes/next-steps-for-my-search-engine-collection.md
@@ -0,0 +1,23 @@
+---
+title: "Next steps for my search engine collection"
+date: 2024-05-24T08:48:41-04:00
+replyURI: "https://seirdy.one/posts/2021/03/10/search-engines-with-own-indexes/"
+replyTitle: "A look at search engines with their own indexes"
+replyType: "BlogPosting"
+replyAuthor: "Seirdy"
+replyAuthorURI: "https://seirdy.one/"
+---
+
+My search engine article blew up recently, as [yet another major publication linked it](https://arstechnica.com/gadgets/2024/05/bing-outage-shows-just-how-little-competition-google-search-really-has/2/) (yay! /gen), so I made some fixes:
+
+- Moved a couple of engines to the graveyard. h/t to {{}} for telling me about moose.at's demise, and to my broken-link checker for telling me that Siik has been down for a while now.
+- Updated my methodology section to emphasize how I now use word substitutions to fingerprint an engine's source. Queries that lend themselves to word substitution are now **my primary approach to uncovering which provider an engine uses for search results,** followed by some long-tail queries and search operator support.
+
+The full list of changes is in [the Git log](https://git.sr.ht/~seirdy/seirdy.one/log/master/item/content/posts/search-engines-with-own-indexes.md).
+
+Things I should add in the future:
+
+- I ought to add a section describing why I don't personally like metasearch engines as much as I used to. TL;DR: each engine has its own quirks and search operators that I learn to use, and throwing them all onto one page limits me to whatever quirks they have in common. This is really bad for long-tail queries.
+- I should put more effort into categorizing engines by strength as well as index size. I'll have to come up with appropriate terms for things like "ability to find specific pages on specific topics" (less aggressive word substitutions, less focus on semantic search; Mojeek is one of the best at this, and Yep is terrible) or "ability to discover pages about a given broad topic" (Yep is good at this once you learn to use it well, but Mojeek isn't).
+
+That second bullet point is really important. Part of the point of my article is to show that **nobody can beat Google at being Google** (except perhaps Bing), but we can beat Google at more specific niches.