This is a cursory review of all the indexing search engines I have been able to find. Gemini engines are at the bottom; the rest of this post is about Web search engines.

The three dominant English search engines with their own indexes¹ are Google, Bing, and Yandex (GBY). Many alternatives to GBY exist, but almost none of them have their own results; instead, they just source their results from GBY.

With that in mind, I decided to test and catalog all the different indexing search engines I could find. I prioritized breadth over depth, and encourage readers to try the engines out themselves if they’d like more information.

This page is a “living document” that I plan on updating indefinitely. Check for updates once in a while if you find this page interesting. Feel free to send me suggestions, updates, and corrections; I’d especially appreciate help from those who speak languages besides English and can evaluate a non-English indexing search engine. Contact info is in the article footer.

I plan on updating the engines in the top two categories with more info comparing the structured/linked data the engines leverage (RDFa vocabularies, microdata, microformats, JSON-LD, etc.) to help authors determine which formats to use.

## Rationale

Google, Microsoft (the company behind Bing), and Yandex aren't just search engine companies; they're content and ad companies as well. For example, Google hosts video content on YouTube and Microsoft hosts social media content on LinkedIn. This gives these companies a powerful incentive to prioritize their own content. They are able to do so even if they claim that they treat their own content the same as any other: since they have complete access to their search engines' inner workings, they can tailor their content pages to better fit their algorithms and tailor their algorithms to work well on their own content. They can also index their own content without limitations but throttle indexing for other crawlers.²

One way to avoid this conflict of interest is to *use search engines that aren't linked to major content providers;* i.e., use engines with their own independent indexes.

There's also a practical, non-ideological reason to try other engines: different providers have different results. Websites that are hard to find on one search engine might be easy to find on another, so using more indexes and ranking algorithms results in access to more content.

## About the list

I primarily evaluated English-language search engines because English is my primary language. With some difficulty, I could probably evaluate a Spanish one; however, I wasn’t able to find many Spanish-language engines powered by their own crawlers.

The "Methodology" section at the bottom describes how I evaluated each one.

## General indexing search-engines

### Large indexes, good results

These are large engines that pass all the tests listed in the "Methodology" section and more.

1. Google: the biggest index. Allows submitting pages and sitemaps for crawling, but requires login. Powers a few other engines:

* Startpage
* GMX search
* (discontinued) Runnaroo
* SAPO (Portuguese interface, can work with English results)

2. Bing: the runner-up. Allows submitting pages and sitemaps for crawling, but requires login. Its index powers many other engines:

* Yahoo (and its sibling engine, OneSearch)
* DuckDuckGo³
* AOL
* Qwant (partial)⁴
* Ecosia
* Ekoru
* Privado
* Findx
* Disconnect Search⁵
* PrivacyWall
* Lilo
* SearchScene
* Peekier
* Oscobo
* Million Short
* Yippy search⁶
* Lycos
* Givero
* Swisscows
* Fireball
* Ask.moe⁷
* Partially powers MetaGer by default; this can be turned off
* At this point, I mostly stopped adding Bing-based search engines. There are just too many.

3. Yandex: originally a Russian search engine, it now has an English version. Some Russian results bleed into its English site. Allows submitting pages and sitemaps for crawling, but requires login. Powers:

* Epic Search (went paid-only by June 2021)
* Occasionally powers DuckDuckGo’s link results instead of Bing.

4. Mojeek: Seems privacy-oriented with a large index containing billions of pages. Quality isn’t at Google/Bing/Yandex’s level, but it’s not bad either. If I had to use Mojeek as my default general search engine, I’d live. Partially powers eTools.ch. At this moment, I think that Mojeek is the best alternative to GBY for general web search.

5. Petal search: A search engine by Huawei that recently switched from searching for Android apps to general search. Despite its surprisingly good results, I wouldn't recommend it due to privacy concerns. Requires an account to submit sites. I discovered this via my access logs. Be aware that in some jurisdictions, it doesn't use its own index: in Russia and some EU regions it uses Yandex and Qwant, respectively.

=> https://petalsearch.com/  petalsearch.com

### Smaller indexes, relevant results

These engines pass most of the tests listed in the "Methodology" section. All of them seem relatively privacy-friendly.

* Right Dao: very fast, good results. Passes the tests fairly well. It plans on including query-based ads if/when its userbase grows.⁸

=> https://rightdao.com  Right Dao

* Gigablast: It’s been around for a while and also sports a classic web directory. Searches are a bit slow, and it charges to submit sites for crawling. It powers Private.sh. Gigablast is tied with Right Dao for quality.

=> https://gigablast.com/  Gigablast
=> https://private.sh  Private.sh

* Alexandria: A pretty new "non-profit, ad free" engine, with freely-licensed code. Surprisingly good at finding recent pages. Its index is built from the Common Crawl; it isn't as big as Gigablast or Right Dao but its ranking is great.

=> https://www.alexandria.org/ Alexandria
=> https://github.com/alexandria-org/alexandria Alexandria engine source code

### Smaller indexes, hit-and-miss

These engines fail badly at a few important tests. Otherwise, they seem to work well enough.

* seekport: The interface is in German but it supports searching in English just fine. The default language is selected by your locale. It’s really good considering its small index; it hasn’t heard of less common terms (e.g. “Seirdy”), but it’s able to find relevant results in other tests. The server does not support TLS.
* Exalead: slow, quality is hit-and-miss. Its indexer claims to crawl the DMOZ directory, which has since shut down and been replaced by the Curlie directory. No relevant results for “Oppenheimer” and some other history-related queries. Allows submitting individual URLs for indexing, but requires solving a Google reCAPTCHA and entering an email address.
* ExactSeek: small index, disproportionately dominated by big sites. Failed multiple tests. Allows submitting individual URLs for crawling, but requires entering an email address and receiving a newsletter. Webmaster tools seem to heavily push for paid SEO options. It also powers SitesOnDisplay and Blog-search.com.

=> http://www.seekport.com/  seekport (HTTP only)
=> https://www.exalead.com/search/  Exalead
=> https://curlie.org  Curlie
=> https://www.exactseek.com/  ExactSeek

* Infotiger: A small index that seems to find relevant results. It allows site submission for English and German pages. It also features a "similarity" search to query pages similar to a given link, with mixed results.
* Kozmonavt: Has a small index of almost 5 million sites. If I want to find the website for a certain project, Kozmonavt works well (provided its crawler has indexed said website). It works poorly for learning things and finding general information. I cannot recommend it for anything serious since it lacks contact information, a privacy policy, or any other information about the org/people who made it. Discovered in the seirdy.one access logs.
* Burf.co: Very small index, but seems fine at ranking more relevant results higher. Allows site submission without any extra steps.
* Entfer: a newcomer that lets registered users upvote/downvote search results to customize ranking. Doesn't offer much information on who made it. Its index is small, but it does seem to return results related to the query.
* Siik: Lacks contact info, and the ToS and Privacy Policy links are dead. Seems to have PHP errors in the backend for some of its instant-answer widgets. If you scroll past all that, it does have web results powered by what seems to be its own index. These results do tend to be somewhat relevant, but the index seems too small for more specific queries.

=> https://alpha.infotiger.com/ Infotiger
=> https://kozmonavt.ml/ Kozmonavt
=> https://burf.co/ Burf.co
=> https://entfer.com/ Entfer
=> https://siik.co/ Siik

* ChatNoir: An experimental engine by researchers that uses the Common Crawl index. The engine is open source. There's more information in its announcement on the Common Crawl mailing list (Google Groups).
* Secret Search Engine Labs: Very small index with very little SEO spam; it toes the line between a "search engine" and a "surf engine". It's best for reading about broad topics that would otherwise be dominated by SEO spam, thanks to its CashRank algorithm. Allows site submission.

=> https://www.chatnoir.eu/ ChatNoir
=> https://commoncrawl.org/ Common Crawl
=> https://github.com/chatnoir-eu ChatNoir source code (GitHub)
=> https://groups.google.com/g/common-crawl/c/3o2dOHpeRxo/m/H2Osqz9dAAAJ ChatNoir Announcement
=> http://www.secretsearchenginelabs.com/ Secret Search Engine Labs
=> http://www.secretsearchenginelabs.com/tech/cashrank.php CashRank Algorithm

### Unusable engines, irrelevant results

Results from these search engines don’t seem at all useful.

* YaCy: community-made index; slow. Results are awful/irrelevant, but can be useful for intranet or custom search.
* Scopia: only seems to be available via the MetaGer metasearch engine after turning off Bing and news results. Tiny index, very low-quality.
* Artado Search: Primarily Turkish, but it also seems to support English results. Like Plumb, it uses client-side JS to fetch results from existing engines (Google, Bing, Yahoo, Petal, and others); like MetaGer, it has an option to use its own independent index. Results from its index are almost always empty. Very simple queries ("twitter", "wikipedia", "reddit") give some answers. Supports site submission and crowdsourced instant answers.
* Active Search Results: very poor quality
* Crawlson: young, slow. In this category because its index has a cap of 10 URLs per domain. I initially discovered Crawlson in the seirdy.one access logs.
* Anoox: Results are few and irrelevant; fails to find any results for basic terms. Allows site submission. It's also a lightweight social network and claims to be powered by its users, letting members vote on listings to alter rankings.
* Yioop!: A FLOSS search engine that boasts a very impressive feature-set: it can parse sitemaps, feeds, and a variety of markup formats; it can import pre-curated data in forms such as access logs, Usenet posts, and WARC archives; it also supports feed-based news search. Despite the impressive feature set, Yioop's results are few and irrelevant due to its small index. It allows submitting sites for crawling. Like Meorca, Yioop has social features such as blogs, wikis, and a chat bot API.

=> https://metager.org  MetaGer
=> https://www.artadosearch.com/ Artado Search
=> https://www.activesearchresults.com  Active Search Results
=> https://crawlson.com Crawlson
=> https://www.anoox.com/  Anoox
=> https://archive.is/oVAre Plumb CPO
=> https://www.yioop.com Yioop!

### Semi-independent indexes

Engines in this category fall back to GBY when their own indexes don't have enough results. As their own indexes grow, some claim that this should happen less often.

* Brave Search: Many tests (including all the tests I listed in the "Methodology" section) returned results identical to Google, as revealed by a side-by-side comparison with Google, Startpage, and a Searx instance with only Google enabled. Brave claims that this is due to how Cliqz (the discontinued engine acquired by Brave) used query logs to build its page models and was optimized to match Google.¹⁰ The index is independent, but optimizing against Google resulted in too much similarity for the real benefit of an independent index to show. Furthermore, many queries have Bing results mixed in; users can click an "info" button to see the percentage of results that came from its own index. The independent percentage is typically quite high (often close to 100%) but can drop for advanced queries.

=> https://search.brave.com/ Brave Search

* Plumb: Almost all queries return no results; when this happens, it falls back to Google. It's fairly transparent about the fallback process, but I'm concerned about *how* it does this: it loads Google's Custom Search scripts from "cse.google.com" onto the page to do a client-side Google search. This can be mitigated by using a browser addon to block "cse.google.com" from loading any scripts. Plumb claims that this is a temporary measure while its index grows, and they're planning on getting rid of this. Allows submitting URLs, but requires solving an hCaptcha. This engine is very new; hopefully as it improves, it could graduate from this section. Its Chief Product Officer previously founded the Gibiru search engine which shares the same affiliates and (for now) the same index; the indexes will diverge with time.

=> https://plumb.one/ Plumb

* Neeva: Combines Bing results with results from its own index. Bing normally isn't okay with this, but Neeva is one of few exceptions. As of right now, results are mostly identical to Bing but original links not found by Bing frequently pop up. Long and esoteric queries are less likely to feature original results. Requires signing up with an email address or OAuth to use, and offers a paid tier with additional benefits.

=> https://neeva.com/ Neeva

* Qwant: Qwant claims to use its own index, but it still relies on Bing for most results. It seems to be in a position similar to Neeva. Try a side-by-side comparison to see if or how it compares with Bing.

=> https://www.qwant.com Qwant

* Kagi Search: The most interesting entry in this category, IMO. Like Neeva, it requires an account; it will eventually require payment. It's powered by its own Teclis index (Teclis can be used independently; see the non-commercial section below), and claims to also use results from Google and Bing. The results seem somewhat unique: I'm able to recognize some results from the Teclis index mixed in with the mainstream ones. In addition to Teclis, Kagi's other products include the Kagi.ai intelligent answer service and the TinyGem social bookmarking service, both of which play a role in Kagi.com in the present or future.

=> https://kagi.com/ Kagi Search
=> https://kagi.ai/ Kagi.ai
=> https://tinygem.org/ TinyGem

## Non-generalist search

These indexing search engines don’t have a Google-like “ask me anything” endgame; they’re trying to do something different. You aren't supposed to use these engines the same way you use GBY.

### Small/non-commercial Web

* Wiby: I love this one. It focuses on smaller independent sites that capture the spirit of the “early” web. It’s more focused on “discovering” new interesting pages that match a set of keywords than finding a specific resource. I like to think of Wiby as an engine for surfing, not searching. The now-discontinued Runnaroo occasionally featured a hit from Wiby. If you have a small site or blog that isn’t very “commercial”, consider submitting it to the index.
* Marginalia Search: A recent addition similar to Wiby, and *my favorite entry on this page*. It has its own crawler but is strongly biased towards non-commercial, personal, and/or minimal sites. It's a great response to the increasingly SEO-spam-filled SERPs of GBY. Partially powers Teclis, which in turn partially powers Kagi.
* Search My Site: Similar to Wiby, but only indexes user-submitted personal and independent sites. It optionally supports IndieAuth.
* Teclis: A project by the creator of Kagi search. Uses its own crawler that measures content blocked by uBlock Origin, and extracts content with the open-source article scrapers Trafilatura and Readability.js. This is quite an interesting approach: tracking blocked elements discourages tracking and advertising; using Trafilatura and Readability.js encourages the use of semantic HTML and Semantic Web standards such as microformats, microdata, and RDFa. It claims to also use some results from Marginalia.

=> https://wiby.me  wiby.me
=> https://search.marginalia.nu/ search.marginalia.nu
=> https://searchmysite.net Search My site
=> http://teclis.com/ Teclis

### Site finders

These engines try to find a website, typically at the domain-name level. They don't focus on capturing particular pages within websites.

* search.tl: Generalist search for one TLD at a time (defaults to .com). I'm not sure why you'd want to always limit your searches to a single TLD, but now you can.⁹ There isn't any visible UI for changing the TLD for available results; you need to add/change the "tld" URL parameter. For example, to search .org sites, append "&tld=org" to the URL. It seems to be connected to Amidalla.de. Amidalla allows users to manually add URLs to its index and directory; I have yet to see if doing so impacts search.tl results.
* Thunderstone: A combined website catalog and search engine that focuses on categorization. Its about page claims: "We continuously survey all primary COM, NET, and ORG web-servers and distill their contents to produce this database. This is an index of *sites* not pages. It is very good at finding companies and organizations by purpose, product, subject matter, or location. If you’re trying to finding things like 'BillyBob's personal beer can page on AOL', try Yahoo or Dogpile." This seems to be the polar opposite of the engines in the "small or non-commercial Web" category.
* sengine.info: Developed by netEstate GmbH, which specializes in content extraction for imprints and job ads. Also has a German-only version available. Discovered in my access logs.

=> http://www.search.tl  search.tl
=> https://search.thunderstone.com/texis/websearch21/ Thunderstone
=> https://www.sengine.info/ sengine.info
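
The "tld" parameter tweak described for search.tl can also be scripted. Here is a small sketch that sets the parameter on a search URL copied from the address bar. The helper name "with_tld" is made up for illustration, and the code assumes nothing about search.tl's other parameters; it simply preserves whatever is already in the query string:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_tld(search_url: str, tld: str) -> str:
    """Return search_url with its "tld" query parameter set to `tld`."""
    parts = urlsplit(search_url)
    # Keep every existing parameter except a previous "tld" value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "tld"
    ]
    # Accept both "org" and ".org" as input.
    query.append(("tld", tld.lstrip(".")))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Calling with_tld() on a results-page URL with "org" returns the same URL with "tld=org" at the end of its query string, replacing any TLD that was set before.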

### Other

* Keybot: A must-have for anyone who does translation work. It crawls the web looking for multilingual websites. Translators who are unsure about how to translate a given word or phrase can see its usage in two given languages, to learn from other human translators. My parents are fluent English speakers but sometimes struggle to express a given Hindi idiom in English; something like this could be useful to them, since machine translation isn't nuanced enough for every situation. Part of the TTN Translation Network. Discovered in my access logs.
* Quor: seems to mainly index large news sites. Site is down as of June 2021. Originally available at www dot quor dot com.
* Ninfex: a "people-powered" search engine that combines aspects of link aggregators and search. It lets users vote on submissions and it also displays links to forums about submissions.
* Semantic Scholar: a search engine by the Allen Institute for AI focused on academic PDFs, with a couple hundred million papers indexed. Discovered in my access logs.

=> https://www.keybot.com/ Keybot Translation Search Machine.
=> https://ninfex.com Ninfex
=> https://www.semanticscholar.org/ Semantic Scholar

## Other languages

I’m unable to evaluate these engines properly since I don’t speak the necessary languages. English searches on these are hit-or-miss. I might have made a few mistakes in this category.

### Big indexes

* Baidu: Chinese. Very large index; it's a major engine alongside GBY. Offers webmaster tools for site submission.
* Qihoo 360: Chinese. I’m not sure how independent this one is.
* Toutiao: Chinese. Not sure how independent this one is either.
* Sogou: Chinese
* Yisou: Chinese
* Naver: Korean. Allows submitting sitemaps and feeds. Discovered via some Searx metasearch instances.
* Seznam: Czech, seems relatively privacy-friendly. Discovered in the seirdy.one access logs. It allows site submission with webmaster tools.
* Cốc Cốc: Vietnamese
* go.mail.ru: Russian

=> https://search.naver.com  Naver
=> https://www.seznam.cz/  Seznam
=> https://coccoc.com/search  Cốc Cốc
=> https://go.mail.ru/ go.mail.ru

### Smaller indexes

* Vuhuv: Turkish
* Parsijoo: Persian
* search.ch: Regional search engine for Switzerland; users can restrict searches to their local regions.
* fastbot: German
* Moose.at: German (Austria-based)

=> https://www.vuhuv.com.tr/ Vuhuv
=> https://tr.vuhuv.com/ Vuhuv (alternate domain)
=> https://www.parsijoo.ir/  Parsijoo
=> https://search.ch  search.ch
=> https://www.fastbot.de/  fastbot
=> https://www.moose.at  Moose.at

## Misc

* Ask.com: The site is back. They claim to outsource search results. The results seem similar to Google, Bing, and Yandex; however, I can’t pinpoint exactly where their results are coming from. Also, several sites from the "ask.com network" such as directhit.com, info.com, and kensaq.com have unique-looking results.
* Not evaluated: Apple’s search. It’s only accessible through a search widget in iOS and macOS and shows very few results. This might change; see the next section.
* Not evaluated: Kagi Search. It's in a closed beta and I haven't yet gotten an invitation.
* Partially evaluated: Infinity Search. It has a young, small index. It recently split into a paid offering with the main index and Infinity Decentralized, the latter of which allows users to select from community-hosted crawlers. I managed to try it out before it became a paid offering, and it seemed decent; however, I wasn’t able to run the tests listed in the “Methodology” section. Allows submitting URLs and sitemaps into a text box, no other work required.

=> https://uk.ask.com  uk.ask.com
=> https://infinitysearch.co  Infinity Search
=> https://infinitydecentralized.com/  Infinity Decentralized

## Upcoming engines

These engines aren’t ready yet; their indexes are either in a proof-of-concept phase with a handful of sites or aren’t available yet.

* Apple: given the activity of the Applebot crawler lately, their index will almost certainly grow to a size large enough to power a general search engine soon. Check your server’s access logs; there’s a good chance it’s crawled your site if you have a few backlinks.
* Ahrefs: Dmitry Gerasimenko from Ahrefs has announced plans for Ahrefs to release a search engine to "share ad revenue with content creators 90/10". This isn’t surprising: its crawlers are quite active and have probably built quite a large index.

=> https://twitter.com/botsbreeder/status/1110889488706760704 Initial announcement
=> https://medium.com/swlh/investor-money-vs-public-interest-did-google-fail-to-build-a-non-evil-platform-3a054f996ea9 Blog post on Ahrefs' motivation for a new engine
=> https://twitter.com/botsbreeder/status/1405920654877028357 An update on Ahrefs' search engine (June 2021)

## Gemini search engines

Time for my first Gemini-exclusive content! A Gemini page about search engines wouldn't be complete without a few search engines for the Gemini space.
* geminispace.info: A GUS instance, but with an updated index. Supports submitting content. The biggest search engine on Gemini.
=> gemini://geminispace.info/ geminispace.info

* AuraGem Search Engine: part of the Ponix capsule, written in Go. A relative newcomer.
=> gemini://auragem.space/searchengine/ AuraGem Search Engine
=> https://github.com/krixano/geminiserver Ponix source code
=> gemini://auragem.space/devlog Ponix devlog


## Graveyard

These engines were originally included in the article, but have since been discontinued.

* Meorca: A UK-based search engine that claimed not to "index pornography or illegal content websites". It also featured a public blog with a marketplace and free games. Allowed submitting URLs, but required a full name, email, phone number, and "business name" to do so. Discovered in the seirdy.one access logs. It seems to have dropped everything and pivoted to image-search, which is out of scope for this post.
* gus.guru: the original Gemini search engine. The index doesn't seem to be updated anymore.
* wbsrch: In addition to its generalist search, it also had many other utilities related to domain name statistics. Failed multiple tests. Its index was a bit dated; it had an old backlog of sites it hadn’t finished indexing. It also had several dedicated per-language indexes.
* Gowiki: Very young, small index, but showed promise. I discovered this in the seirdy.one access logs. It was only available in the US. Seems down as of early 2022.

=> https://meorca.com/  Meorca Search Engine
=> gemini://gus.guru/ gus.guru
=> https://xangis.com/the-wbsrch-experiment/ The Wbsrch Experiment
=> https://gowiki.com  Gowiki

## Exclusions

Two engines were excluded from this list for having a far-right focus.

One engine was excluded because it seems to be built using cryptocurrency in a way I'd rather not support.

Some fascinating little engines seem like hobbyist proofs-of-concept. I decided not to include them in this list, but watch them with interest to see if they can become something viable.

## Methodology

### Discovery

I find new engines by:

* Monitoring certain web directories for changes in their search engine listings.
* Checking other curated lists of "good/bad bots" to spot search engines.
* Using search engines to discover search engines: searching for the names of less-popular engines often pulls up similar lists.
* Receiving suggestions from readers
* Compiling a list of regular expressions for user-agent strings I'm familiar with. Before I delete my server access logs, I extract user-agents that don't match that list along with the pages they request.
* Checking the Searx and SearXNG projects for new integrations.
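
The access-log step in the list above could look something like the following sketch. It assumes logs in the common "combined" format; the short pattern list and the function name are hypothetical stand-ins, not the actual script behind this article:

```python
import re

# Hypothetical patterns for crawlers that are already familiar;
# a real list would be much longer.
KNOWN_AGENTS = [
    re.compile(r"Googlebot", re.IGNORECASE),
    re.compile(r"bingbot", re.IGNORECASE),
    re.compile(r"YandexBot", re.IGNORECASE),
]

# Captures the request path and the user-agent from a "combined" log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def unfamiliar_agents(lines):
    """Map each unrecognized user-agent to the set of paths it requested."""
    seen = {}
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that aren't in the expected format
        agent = match.group("agent")
        if any(pattern.search(agent) for pattern in KNOWN_AGENTS):
            continue  # already-familiar crawler
        seen.setdefault(agent, set()).add(match.group("path"))
    return seen
```

Feeding it the lines of a log file (for example, an open file object) before rotation leaves a short list of unknown agents to look up by hand.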

### Evaluation

I focused almost entirely on "organic results" (the classic link results), and didn't focus too much on (often glaring) privacy issues, "enhanced" or "instant" results (e.g. Wikipedia sidebars, related searches, Stack Exchange answers), or other elements.

I compared results for esoteric queries side-by-side; if the first 20 results were (nearly) identical to another engine’s results (though perhaps in a slightly different order), they were likely sourced externally and not from an independent index.
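
A rough sketch of this side-by-side check, assuming the result URLs for one query have already been collected from two engines by hand. The function names, the URL normalization, and the choice of Jaccard similarity are illustrative assumptions rather than the exact procedure used for this article:

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so trivial differences don't hide a match."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def overlap(results_a: list, results_b: list, depth: int = 20) -> float:
    """Jaccard similarity of the top `depth` results, ignoring their order."""
    top_a = {normalize(url) for url in results_a[:depth]}
    top_b = {normalize(url) for url in results_b[:depth]}
    if not top_a and not top_b:
        return 0.0  # nothing to compare
    return len(top_a & top_b) / len(top_a | top_b)
```

A score near 1.0 across several esoteric queries suggests that one engine is probably sourcing results from the other; how near is "near" remains a judgment call.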

I tried to pick queries that should have a good number of results and show variance between search engines. An incomplete selection of queries I tested:

* “vim”, “emacs”, “neovim”, and “nvimrc”: Search engines with relevant results for “nvimrc” typically have a big index. Finding relevant results for the text editors “vim” and “emacs” instead of other topics that share the name is a challenging task.
* “vim cleaner”: should return results related to a line of cleaning products rather than the Correct Text Editor.
* “Seirdy”: My site is relatively low-traffic, but my nickname is pretty unique and visible on several of the highest-traffic sites out there.
* “Project London”: a small movie made with volunteers and FLOSS without much advertising. If links related to the movie show up, the engine’s really good.
* “oppenheimer”: a name that could refer to many things. Without context, it should refer to the physicist who worked on the atomic bomb in Los Alamos. Other historical queries: “magna carta” (intermediate), “the prince” (very hard).

Some less-mainstream engines have noticed this article, which is great! I've had excellent discussions with people who work on several of these engines. Unfortunately, this article's visibility also incentivizes some engines to optimize specifically for any methodology I describe. I've addressed this by keeping a long list of test queries to myself. The simple queries above are a decent starting point for simple quick evaluations, but I also test for common search operators, keyword length, and types of domain-specific jargon. I also use queries designed to pull up specific pages with varying levels of popularity and recency to gauge the size, scope, and growth of an index.

Professional critics often work anonymously because personalization can damage the integrity of their reviews. For similar reasons, I attempt to try each engine anonymously at least once by using a VPN and/or my standard anonymous setup: an amnesiac Whonix VM with the Tor Browser. I also often test using a fresh profile when travelling, or via a Searx instance if it supports a given engine. When avoiding personalization, I use "varied" queries that I don't repeat verbatim across search engines; this reduces the likelihood of identifying me. I also attempt to spread these tests out over time so admins won't notice an unusual uptick in unpredictable and esoteric searches. This might seem overkill, but I already regularly employ similar methods for a variety of different scenarios.

### Caveats

I didn't try to avoid personalization when testing engines that require account creation. Entries in the "hit-and-miss" and "unusable" sections got less attention: for instance, I didn't spend a lot of effort tracking results over time to see how new entries got added to them.

I avoided "natural language" queries like questions, focusing instead on keyword searches and search operators. I also mostly ignored infoboxes (also known as "instant answers").

## Acknowledgements

Some of this content came from the Search Engine Map and Search Engine Party. A few web directories also proved useful.

=> https://www.searchenginemap.com/  Search Engine Map
=> https://searchengine.party/  Search Engine Party

Matt from Gigablast also gave me some helpful information on GBY which I included in the "Rationale" section. He's written more about big tech in the Gigablast blog:

=> https://gigablast.com/blog.html Gigablast blog

Nicholas A. Ferrell of The New Leaf Journal wrote a great post on alternative search engines.

=> https://thenewleafjournal.com/a-2021-list-of-alternative-search-engines-and-search-resources/ A 2021 List of Alternative Search Engines and Search Resources
=> gemini://gemlog.blue/users/naferrell/ N.A. Ferrell's Gemlog

He also gave me some useful details about Seznam, Naver, Baidu, and Goo:

=> https://lists.sr.ht/~seirdy/seirdy.one-comments/%3C20210618031450.rb2twu4ypek6vvl3%40rkumarlappie.attlocal.net%3E Re: Editor of The New Leaf Journal - Added Your Guestbook Comment Info to My Post + Feedback

## Notes

¹ Yes, “indexes” is an acceptable plural form of the word “index”. The word “indices” sounds weird to me outside a math class.

² Matt from Gigablast told me that indexing YouTube or LinkedIn will get you blocked if you aren't Google or Microsoft. I imagine that you could do so by getting special permission if you're a megacorporation.

³ DuckDuckGo has a crawler called DuckDuckBot. This crawler doesn't impact the linked results displayed; it just grabs favicons and scrapes data for a few instant answers. DuckDuckGo's help pages claim that the engine uses over 400 sources; my interpretation is that at least 398 sources don't impact organic results. I don't think DuckDuckGo is transparent enough about the fact that their organic results are proxied. Compare DuckDuckGo side-by-side with Bing and Yandex and you'll see it's sourcing organic results from one of them (probably Bing).

⁴ Qwant claims to also use its own crawler for results, but it’s still mostly Bing in my experience. See the "semi-independent" section.

⁵ Disconnect Search allows users to have results proxied from Bing or Yahoo, but Yahoo sources its results from Bing.

⁶ Yippy claims to be powered by a certain IBM brand (a brand that could correspond to any number of products) and annotates results with the phrase “Yippy Index”, but a side-by-side comparison with Bing and other Bing-based engines revealed results to be nearly identical.

⁷ Ask.moe was working on a FLOSS indexer; its search page stated an intention to switch to it from Bing at one point. This statement has since been removed.

=> https://git.sr.ht/~danskeren/spider.moe  FLOSS indexer

⁸ This is based on a statement Right Dao made on Reddit:

=> https://reddit.com/comments/k4clx1/_/ge9dwmh/?context=1 Right Dao on Reddit
=> https://web.archive.org/web/20210320042457/https://i.reddit.com/r/degoogle/comments/k4clx1/right_dao_a_new_independent_search_engine_that/ge9dwmh/?context=1 Archive of the Reddit thread

⁹ Some search engines support the "site:" search operator to limit searches to subpages/subdomains of a single site or TLD. "site:.one", for instance, limits searches to websites with the ".one" TLD.
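
As a rough illustration of how the "site:" operator ends up in a request, the Python sketch below appends the operator to a query and URL-encodes it. The base URL "search.example" is a placeholder rather than a real engine, and the "q" parameter name is an assumption; engines that support the operator may use a different endpoint or parameter.

```python
from urllib.parse import urlencode


def site_query(terms, site):
    """Append a site: filter to a plain-text query."""
    return f"{terms} site:{site}"


def search_url(base, terms, site, param="q"):
    """Build a search URL; `base` and `param` are assumptions that
    differ from engine to engine."""
    return base + "?" + urlencode({param: site_query(terms, site)})


# "search.example" is a placeholder, not a real engine.
print(search_url("https://search.example/search", "search engines", ".one"))
```

The colon in the operator is percent-encoded by urlencode, which is why it shows up as "%3A" in the address bar on engines that pass the query in the URL.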

¹⁰ More information can be found in a HN subthread and the Cliqz tech blog:

=> https://news.ycombinator.com/item?id=27593801 HN comment thread for "Introducing Brave Search Beta"
=> https://0x65.dev/blog/2019-12-06/building-a-search-engine-from-scratch.html Tech @ Cliqz: Building a search engine from scratch
=> https://0x65.dev/blog/2019-12-10/search-quality-at-cliqz.html Tech @ Cliqz: Search quality at Cliqz

## Rationale

Google, Microsoft (the company behind Bing), and Yandex aren't just search engine companies; they're content and ad companies as well. For example, Google hosts video content on YouTube and Microsoft hosts social media content on LinkedIn. This gives these companies a powerful incentive to prioritize their own content. They are able to do so even if they claim that they treat their own content the same as any other: since they have complete access to their search engines' inner workings, they can tailor their content pages to better fit their algorithms and tailor their algorithms to work well on their own content. They can also index their own content without limitations but throttle indexing for other crawlers.²

One way to avoid this conflict of interest is to *use search engines that aren't linked to major content providers;* i.e., use engines with their own independent indexes.

There's also a practical, non-ideological reason to try other engines: different providers have different results. Websites that are hard to find on one search engine might be easy to find on another, so using more indexes and ranking algorithms results in access to more content.

## About the list

I primarily evaluated English-speaking search engines because that’s my primary language. With some difficulty, I could probably evaluate a Spanish one; however, I wasn’t able to find many Spanish-language engines powered by their own crawlers.

See the "Methodology" section at the bottom to see how I evaluated each one.

## General indexing search-engines

### Large indexes, good results

These are large engines that pass all the above tests and more.

1. Google: the biggest index. Allows submitting pages and sitemaps for crawling, but requires login. Powers a few other engines:

* Startpage
* GMX search
* (discontinued) Runnaroo
* SAPO (Portuguese interface, can work with English results)

2. Bing: the runner-up. Allows submitting pages and sitemaps for crawling, but requires login. Its index powers many other engines:

* Yahoo (and its sibling engine, OneSearch)
* DuckDuckGo³
* AOL
* Qwant (partial)⁴
* Ecosia
* Ekoru
* Privado
* Findx
* Disconnect Search⁵
* PrivacyWall
* Lilo
* SearchScene
* Peekier
* Oscobo
* Million Short
* Yippy search⁶
* Lycos
* Givero
* Swisscows
* Fireball
* Ask.moe⁷
* Partially powers MetaGer by default; this can be turned off
* At this point, I mostly stopped adding Bing-based search engines. There are just too many.

3. Yandex: originally a Russian search engine, it now has an English version. Some Russian results bleed into its English site. Allows submitting pages and sitemaps for crawling, but requires login. Powers:

* Epic Search (went paid-only by June 2021)
* Occasionally powers DuckDuckGo’s link results instead of Bing.

4. Mojeek: Seems privacy-oriented with a large index containing billions of pages. Quality isn’t at Google/Bing/Yandex’s level, but it’s not bad either. If I had to use Mojeek as my default general search engine, I’d live. Partially powers eTools.ch. At this moment, I think that Mojeek is the best alternative to GBY for general web search.

5. Petal search: A search engine by Huawei that recently switched from searching for Android apps to general search. Despite its surprisingly good results, I wouldn't recommend it due to privacy concerns. Requires an account to submit sites. I discovered this via my access logs. Be aware that in some jurisdictions, it doesn't use its own index: in Russia and some EU regions it uses Yandex and Qwant, respectively.

=> https://petalsearch.com/  petalsearch.com

### Smaller indexes, relevant results

These engines pass most of the tests listed in the "methodology" section. All of them seem relatively privacy-friendly.

* Right Dao: very fast, good results. Passes the tests fairly well. It plans on including query-based ads if/when its userbase grows.⁸

=> https://rightdao.com  Right Dao

* Gigablast: It’s been around for a while and also sports a classic web directory. Searches are a bit slow, and it charges to submit sites for crawling. It powers Private.sh. Gigablast is tied with Right Dao for quality.

=> https://gigablast.com/  Gigablast
=> https://private.sh  Private.sh

* Alexandria: A pretty new "non-profit, ad free" engine, with freely-licensed code. Surprisingly good at finding recent pages. Its index is built from the Common Crawl; it isn't as big as Gigablast or Right Dao but its ranking is great.

=> https://www.alexandria.org/ Alexandria
=> https://github.com/alexandria-org/alexandria Alexandria engine source code

### Smaller indexes, hit-and-miss

These engines fail badly at a few important tests. Otherwise, they seem to work well enough.

* seekport: The interface is in German but it supports searching in English just fine. The default language is selected by your locale. It’s really good considering its small index; it hasn’t heard of less common terms (e.g. “Seirdy”), but it’s able to find relevant results in other tests. The server does not support TLS.
* Exalead: slow, quality is hit-and-miss. Its indexer claims to crawl the DMOZ directory, which has since shut down and been replaced by the Curlie directory. No relevant results for “Oppenheimer” and some other history-related queries. Allows submitting individual URLs for indexing, but requires solving a Google reCAPTCHA and entering an email address.
* ExactSeek: small index, disproportionately dominated by big sites. Failed multiple tests. Allows submitting individual URLs for crawling, but requires entering an email address and receiving a newsletter. Webmaster tools seem to heavily push for paid SEO options. It also powers SitesOnDisplay and Blog-search.com.

=> http://www.seekport.com/  seekport (HTTP only)
=> https://www.exalead.com/search/  Exalead
=> https://curlie.org  Curlie
 								=> https://www.exactseek.com/  ExactSeek
-												New engine: Kozmonavt

											
										
										
											2021-04-14 19:31:45 +00:00
-												Update Infotiger info

Infotiger has changed a lot in the past months, and my description of
the engine was quite outdated.

											
										
										
											2022-02-15 19:21:35 +00:00
+								* Infotiger: A small index that seems to find relevant results. It allows site submission for English and German pages. It also features a "similarity" search to query pages similar to a given link, with mixed results.
-												New engine: Kozmonavt

											
										
										
											2021-04-14 19:31:45 +00:00
+								* Kozmonavt: Has a small index of almost 5 million sites. If I want to find the website for a certain project, Kozmonavt works well (provided its index has crawled said website). It works poorly for learning things and finding general information. I cannot recommend it for anything serious since it lacks contact information, a privacy policy, or any other information about the org/people who made it. Discovered in the seirdy.one access logs.
-												Fix broken links/anchors

											
										
										
											2022-02-08 05:26:03 +00:00
+								* Burf.co: Very small index, but seems fine at ranking more relevant results higher. Allows site submission without any extra steps.
-												New engine: Entfer

											
										
										
											2022-02-14 00:07:16 +00:00
+								* Entfer: a newcomer that lets registered users upvote/downvote search results to customize ranking. Doesn't offer much information on who made it. Its index is small, but it does seem to return results related to the query.
-												New engines: semantic scholar, SSEL

Add Semantic Scholar and Secret Search Engine Labs.

											
										
										
											2022-03-01 08:35:54 +00:00
+								* Siik: Lacks contact info, and the ToS and Privacy Policy links are dead. Seems to have PHP errors in the backend for some of its instant-answer widgets. If you scroll past all that, it does have web results powered by what seems to be its own index. These results do tend to be somewhat relevant, but the index seems too small for more specific queries.
-												New engine: Kozmonavt

											
										
										
											2021-04-14 19:31:45 +00:00
-												New search engine: "Infotiger"

											
										
										
											2021-05-12 18:27:06 +00:00
+								=> https://alpha.infotiger.com/ Infotiger
-												New engine: Kozmonavt

											
										
										
											2021-04-14 19:31:45 +00:00
+								=> https://kozmonavt.ml/ Kozmonavt
-												Fix broken links/anchors

											
										
										
											2022-02-08 05:26:03 +00:00
+								=> https://burf.co/ Burf.co
-												New engine: Entfer

											
										
										
											2022-02-14 00:07:16 +00:00
+								=> https://entfer.com/ Entfer
-												New engines: semantic scholar, SSEL

Add Semantic Scholar and Secret Search Engine Labs.

											
										
										
											2022-03-01 08:35:54 +00:00
+								=> https://siik.co/ Siik
-												New article: search engines with their own indexes

Squashed commit of the following:

commit f04ef91062824feedd36fb73396d2e8136f7ce56
Author: Rohan Kumar <seirdy@seirdy.one>
Date:   Wed Mar 10 13:47:35 2021 -0800

    Final draft

commit db25b7346073faf866fed62161ddfe74cf2762c1
Author: Rohan Kumar <seirdy@seirdy.one>
Date:   Wed Mar 10 13:37:35 2021 -0800

    Add gemtext version

commit de8c4f1f2890dcbe569765d50f8cf884df2d7ea5
Author: Rohan Kumar <seirdy@seirdy.one>
Date:   Wed Mar 10 12:52:20 2021 -0800

    New article (draft): search engines

											
										
										
											2021-03-10 21:48:19 +00:00
-												Update search engines

- Add ChatNoir, Ninfex
- Mark Runnaroo as discontinued

											
										
										
											2021-05-29 22:57:12 +00:00
+								* ChatNoir: An experimental engine by researchers that uses the Common Crawl index. The engine is open source. There's more information in its announcement on the Common Crawl mailing list (Google Groups).
-												New engines: semantic scholar, SSEL

Add Semantic Scholar and Secret Search Engine Labs.

											
										
										
											2022-03-01 08:35:54 +00:00
+								* Secret Search Engine Labs: Very small index with very little SEO spam; it toes the line between a "search engine" and a "surf engine". It's best for reading about broad topics that would otherwise be dominated by SEO spam, thanks to its CashRank algorithm. Allows site submission.
 								=> https://www.chatnoir.eu/ ChatNoir
 								=> https://commoncrawl.org/ Common Crawl
 								=> https://github.com/chatnoir-eu ChatNoir source code (GitHub)
 								=> https://groups.google.com/g/common-crawl/c/3o2dOHpeRxo/m/H2Osqz9dAAAJ ChatNoir Announcement
+								=> http://www.secretsearchenginelabs.com/ Secret Search Engine Labs
 								=> http://www.secretsearchenginelabs.com/tech/cashrank.php CashRank Algorithm
+								### Unusable engines, irrelevant results
 								Results from these search engines don’t seem at all useful.
 								* YaCy: community-made index; slow. Results are awful/irrelevant, but can be useful for intranet or custom search.
 								* Scopia: only seems to be available via the MetaGer metasearch engine after turning off Bing and news results. Tiny index, very low-quality.
-												Search engines: add Artado Search

											
										
										
											2022-02-09 06:39:09 +00:00
+								* Artado Search: Primarily Turkish, but it also seems to support English results. Like Plumb, it uses client-side JS to fetch results from existing engines (Google, Bing, Yahoo, Petal, and others); like MetaGer, it has an option to use its own independent index. Results from its index are almost always empty. Very simple queries ("twitter", "wikipedia", "reddit") give some answers. Supports site submission and crowdsourced instant answers.
-												Add more details about some search engines

											
										
										
											2021-03-20 06:25:52 +00:00
+								* Active Search Results: very poor quality
-												Update search engines

- Remove goo.ne.jp since it uses Google
- Create "Graveyard" section and move wbsrch to it 🪦
- Restore link to Crawlson since it's back up

											
										
										
											2021-08-04 05:48:50 +00:00
+								* Crawlson: young, slow. In this category because its index has a cap of 10 URLs per domain. I initially discovered Crawlson in the seirdy.one access logs.
-												Add two search engines, minor fixes

- Two new engines: search.tl and Anoox
- Replace some HTTP with HTTPS
- Add an <abbr> tag
- Spelling/capitalization

											
										
										
											2021-03-17 20:38:00 +00:00
+								* Anoox: Results are few and irrelevant; fails to find any results for basic terms. Allows site submission. It's also a lightweight social network and claims to be powered by its users, letting members vote on listings to alter rankings.
-												New search engine: Yioop

											
										
										
											2021-03-30 06:08:28 +00:00
+								* Yioop!: A FLOSS search engine that boasts a very impressive feature-set: it can parse sitemaps, feeds, and a variety of markup formats; it can import pre-curated data in forms such as access logs, Usenet posts, and WARC archives; it also supports feed-based news search. Despite the impressive feature set, Yioop's results are few and irrelevant due to its small index. It allows submitting sites for crawling. Like Meorca, Yioop has social features such as blogs, wikis, and a chat bot API.
 								=> https://metager.org  MetaGer
+								=> https://www.artadosearch.com/ Artado Search
+								=> https://www.activesearchresults.com  Active Search Results
+								=> https://crawlson.com Crawlson
+								=> https://www.anoox.com/  Anoox
-												Add info on relationship between Plumb and Gibiru

											
										
										
											2021-03-20 02:10:27 +00:00
+								=> https://archive.is/oVAre Plumb CPO
+								=> https://www.yioop.com Yioop!
-												Add Brave, fix dated info, DDG misconceptions

- New category: "semi-independent indexes". Contains Brave and Plumb.
- Mention that Quor is down
- Fix outdated statement on Plumb's hCaptcha
- Clarify some misconceptions about DDG

											
										
										
											2021-06-22 15:42:34 +00:00
+								### Semi-independent indexes
-												New engine: Kagi Search

											
										
										
											2022-02-26 04:01:50 +00:00
+								Engines in this category fall back to GBY when their own indexes don't have enough results. As their own indexes grow, some claim that this should happen less often.
-												RIP Meorca, add info on 3p sources to Brave.

											
										
										
											2022-03-01 22:42:59 +00:00
+								* Brave Search: Many tests (including all the tests I listed in the "Methodology" section) returned results identical to Google's, as revealed by a side-by-side comparison with Google, Startpage, and a Searx instance with only Google enabled. Brave claims that this is due to how Cliqz (the discontinued engine acquired by Brave) used query logs to build its page models and was optimized to match Google.¹⁰ The index is independent, but optimizing against Google resulted in too much similarity for the real benefit of an independent index to show. Furthermore, many queries have Bing results mixed in; users can click an "info" button to see the percentage of results that came from its own index. The independent percentage is typically quite high (often close to 100%) but can drop for advanced queries.
 								=> https://search.brave.com/ Brave Search
 								* Plumb: Almost all queries return no results; when this happens, it falls back to Google. It's fairly transparent about the fallback process, but I'm concerned about *how* it does this: it loads Google's Custom Search scripts from "cse.google.com" onto the page to do a client-side Google search. This can be mitigated by using a browser addon to block "cse.google.com" from loading any scripts. Plumb claims that this is a temporary measure while its index grows, and they're planning on getting rid of this. Allows submitting URLs, but requires solving an hCaptcha. This engine is very new; hopefully as it improves, it could graduate from this section. Its Chief Product Officer previously founded the Gibiru search engine which shares the same affiliates and (for now) the same index; the indexes will diverge with time.
 								=> https://plumb.one/ Plumb
-												Update Neeva info to reflect free tier

Thanks Akumei!

											
										
										
											2022-02-23 02:24:36 +00:00
+								* Neeva: Combines Bing results with results from its own index. Bing normally isn't okay with this, but Neeva is one of few exceptions. As of right now, results are mostly identical to Bing but original links not found by Bing frequently pop up. Long and esoteric queries are less likely to feature original results. Requires signing up with an email address or OAuth to use, and offers a paid tier with additional benefits.
-												Search engines: add semi-independent engine Neeva

											
										
										
											2021-08-30 22:43:24 +00:00
 								=> https://neeva.com/ Neeva
-												Update information about Qwant

											
										
										
											2022-02-04 06:46:56 +00:00
+								* Qwant: Qwant claims to use its own index, but it still relies on Bing for most results. It seems to be in a position similar to Neeva. Try a side-by-side comparison to see if or how it compares with Bing.
 								=> https://www.qwant.com Qwant
+								* Kagi Search: The most interesting entry in this category, IMO. Like Neeva, it requires an account; it will eventually require payment. It's powered by its own Teclis index (Teclis can be used independently; see the non-commercial section below), and claims to also use results from Google and Bing. The result seems somewhat unique: I'm able to recognize some results from the Teclis index mixed in with the mainstream ones. In addition to Teclis, Kagi's other products include the Kagi.ai intelligent answer service and the TinyGem social bookmarking service, both of which play a role in Kagi.com in the present or future.
 								=> https://kagi.com/ Kagi Search
 								=> https://kagi.ai/ Kagi.ai
 								=> https://tinygem.org/ TinyGem
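Several entries above rely on a side-by-side comparison of result pages. The following is only a rough sketch of how such a comparison could be quantified, not the method used for this article: it assumes you have copied the top result URLs from two engines by hand, and the function names "normalize" and "overlap" are made up for illustration.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> tuple[str, str]:
    """Reduce a URL to (host, path) so trivial differences don't hide a match."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host, parts.path.rstrip("/")


def overlap(results_a: list[str], results_b: list[str], top_n: int = 10) -> float:
    """Jaccard similarity of the top_n results from two engines (0.0 to 1.0)."""
    a = {normalize(u) for u in results_a[:top_n]}
    b = {normalize(u) for u in results_b[:top_n]}
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)
```

A score near 1.0 for many different queries hints that two engines share a source; a low score does not prove independence, since ranking and filtering can differ even when the underlying index is the same.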
+								## Non-generalist search
+								These indexing search engines don’t have a Google-like “ask me anything” endgame; they’re trying to do something different. You aren't supposed to use these engines the same way you use GBY.
-												Re-vamp "non-generalist search" section.

- Add a couple new engines to non-gen search
- Move search.tl to non-gen search
- Split non-gen search into subsections.

											
										
										
											2022-02-26 03:49:11 +00:00
+								### Small/non-commercial Web
-												Typo: s/Runaroo/Runnaroo/g

											
										
										
											2021-03-14 05:23:29 +00:00
+								* Wiby: I love this one. It focuses on smaller independent sites that capture the spirit of the “early” web. It’s more focused on “discovering” new interesting pages that match a set of keywords than finding a specific resource. I like to think of Wiby as an engine for surfing, not searching. Runnaroo occasionally features a hit from Wiby. If you have a small site or blog that isn’t very “commercial”, consider submitting it to the index.
+								* Marginalia Search: A recent addition similar to Wiby, and *my favorite entry on this page*. It has its own crawler but is strongly biased towards non-commercial, personal, and/or minimal sites. It's a great response to the increasingly SEO-spam-filled SERPs of GBY. Partially powers Teclis, which in turn partially powers Kagi.
-												Add Search My Site

Reader-contributed. Thanks for the suggestion!

											
										
										
											2021-03-12 21:07:17 +00:00
+								* Search My Site: Similar to Wiby, but only indexes user-submitted personal and independent sites. It optionally supports IndieAuth.
+								* Teclis: A project by the creator of Kagi search. Uses its own crawler that measures content blocked by uBlock Origin, and extracts content with the open-source article scrapers Trafilatura and Readability.js. This is quite an interesting approach: tracking blocked elements discourages tracking and advertising; using Trafilatura and Readability.js encourages the use of semantic HTML and Semantic Web standards such as microformats, microdata, and RDFa. It claims to also use some results from Marginalia.
 								=> https://wiby.me  wiby.me
-												New search engine: search.marginalia.nu

Thanks Cadence.

											
										
										
											2022-02-06 02:08:45 +00:00
+								=> https://search.marginalia.nu/ search.marginalia.nu
+								=> https://searchmysite.net Search My Site
+								=> http://teclis.com/ Teclis
 								### Site finders
 								These engines try to find a website, typically at the domain-name level. They don't focus on capturing particular pages within websites.
 								* search.tl: Generalist search for one TLD at a time (defaults to .com). I'm not sure why you'd want to always limit your searches to a single TLD, but now you can.⁹ There isn't any visible UI for changing the TLD for available results; you need to add or change the "tld" URL parameter. For example, to search .org sites, append "&tld=org" to the URL. It seems to be connected to Amidalla.de. Amidalla allows users to manually add URLs to its index and directory; I have yet to see if doing so impacts search.tl results.
 								* Thunderstone: A combined website catalog and search engine that focuses on categorization. Its about page claims: "We continuously survey all primary COM, NET, and ORG web-servers and distill their contents to produce this database. This is an index of *sites* not pages. It is very good at finding companies and organizations by purpose, product, subject matter, or location. If you’re trying to finding things like 'BillyBob's personal beer can page on AOL', try Yahoo or Dogpile." This seems to be the polar opposite of the engines in the "small or non-commercial Web" category.
-												Add Toutiao

											
										
										
											2022-03-04 03:40:18 +00:00
+								* sengine.info: Developed by netEstate GmbH, which specializes in content extraction for imprints and job ads. Also has a German-only version available. Discovered in my access logs.
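The "tld" parameter described for search.tl can also be set programmatically. This is a minimal sketch, assuming you already have a results URL copied from the browser; the helper name "with_tld" and the example URL in the comment are placeholders, and only the "tld" parameter itself comes from the description above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_tld(results_url: str, tld: str) -> str:
    """Return results_url with its "tld" query parameter set to the given TLD."""
    parts = urlsplit(results_url)
    # Keep every existing parameter except a previous "tld" value.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "tld"
    ]
    params.append(("tld", tld.lstrip(".")))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Placeholder URL: the "q" parameter is an assumption, not a documented API.
# with_tld("http://www.search.tl/?q=test", "org")
```

Replacing an existing "tld" value rather than appending a second one avoids depending on how the site resolves duplicate parameters.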
 								=> http://www.search.tl  search.tl
 								=> https://search.thunderstone.com/texis/websearch21/ Thunderstone
+								=> https://www.sengine.info/ sengine.info
 								### Other
-												Add Keybot search engine

											
										
										
											2022-03-01 19:48:32 +00:00
+								* Keybot: A must-have for anyone who does translation work. It crawls the web looking for multilingual websites. Translators who are unsure about how to translate a given word or phrase can see its usage in two given languages, to learn from other human translators. My parents are fluent English speakers but sometimes struggle to express a given Hindi idiom in English; something like this could be useful to them, since machine translation isn't nuanced enough for every situation. Part of the TTN Translation Network. Discovered in my access logs.
+								* Quor: seems to mainly index large news sites. Site is down as of June 2021. Originally available at www dot quor dot com.
 								* Ninfex: a "people-powered" search engine that combines aspects of link aggregators and search. It lets users vote on submissions and it also displays links to forums about submissions.
+								* Semantic Scholar: a search engine by the Allen Institute for AI focused on academic PDFs, with a couple hundred million papers indexed. Discovered in my access logs.
+								=> https://www.keybot.com/ Keybot Translation Search Machine
+								=> https://ninfex.com Ninfex
+								=> https://www.semanticscholar.org/ Semantic Scholar
 								## Other languages
 								I’m unable to evaluate these engines properly since I don’t speak the necessary languages. English searches on these are hit-or-miss. I might have made a few mistakes in this category.
 								### Big indexes
-												New engine (Goo) + more engine info

Thanks N. A. Ferrell for feedback!

											
										
										
											2021-06-18 03:17:48 +00:00
+								* Baidu: Chinese. Very large index; it's a major engine alongside GBY. Offers webmaster tools for site submission.
+								* Qihoo 360: Chinese. I’m not sure how independent this one is.
+								* Toutiao: Chinese. Not sure how independent this one is either.
+								* Sogou: Chinese
 								* Yisou: Chinese
+								* Naver: Korean. Allows submitting sitemaps and feeds. Discovered via some Searx metasearch instances.
 								* Seznam: Czech, seems relatively privacy-friendly. Discovered in the seirdy.one access logs. It allows site submission with webmaster tools.
+								* Cốc Cốc: Vietnamese
-												New engine: go.mail.ru

											
										
										
											2021-03-31 20:21:38 +00:00
+								* go.mail.ru: Russian
 								=> https://search.naver.com  Naver
 								=> https://www.seznam.cz/  Seznam
 								=> https://coccoc.com/search  Cốc Cốc
+								=> https://go.mail.ru/ go.mail.ru
 								### Smaller indexes
-												New Turkish engine

											
										
										
											2022-03-02 04:25:38 +00:00
+								* Vuhuv: Turkish
+								* Parsijoo: Persian
 								* search.ch: Regional search engine for Switzerland; users can restrict searches to their local regions.
 								* fastbot: German
 								* Moose.at: German (Austria-based)
+								=> https://www.vuhuv.com.tr/ Vuhuv
 								=> https://tr.vuhuv.com/ Vuhuv (alternate domain)
+								=> https://www.parsijoo.ir/  Parsijoo
+								=> https://search.ch  search.ch
 								=> https://www.fastbot.de/  fastbot
 								=> https://www.moose.at  Moose.at
 								## Misc
-												Mention the ask.com network

											
										
										
											2022-02-28 21:42:15 +00:00
+								* Ask.com: The site is back. They claim to outsource search results. The results seem similar to Google, Bing, and Yandex; however, I can’t pinpoint exactly where their results are coming from. Also, several sites from the "ask.com network" such as directhit.com, info.com, and kensaq.com have unique-looking results.
+								* Not evaluated: Apple’s search. It’s only accessible through a search widget in iOS and macOS and shows very few results. This might change; see the next section.
-												Search engines: mention Kagi, rm dead link

											
										
										
											2021-10-14 19:25:27 +00:00
+								* Not evaluated: Kagi Search. It's in a closed beta and I haven't yet gotten an invitation.
-												Add note about Epic Search going paid-only

											
										
										
											2021-06-28 01:56:44 +00:00
+								* Partially evaluated: Infinity Search. It has a young, small index. It recently split into a paid offering with the main index and Infinity Decentralized, the latter of which allows users to select from community-hosted crawlers. I managed to try it out before it became a paid offering, and it seemed decent; however, I wasn’t able to run the tests listed in the “Methodology” section. Allows submitting URLs and sitemaps into a text box, no other work required.
 								=> https://uk.ask.com  uk.ask.com
 								=> https://infinitysearch.co  Infinity Search
 								=> https://infinitydecentralized.com/  Infinity Decentralized
 								## Upcoming engines
+								These engines aren’t ready yet; their indexes are either in a proof-of-concept phase with a handful of sites or aren’t available yet.
 								* Apple: given the activity of the AppleBot crawler lately, their index will almost certainly grow to a size large enough to power a general search engine soon. Check your server’s access logs; there’s a good chance it’s crawled your site if you have a few backlinks.
-												New upcoming search engine: Ahrefs

											
										
										
											2022-02-20 23:58:45 +00:00
+								* Ahrefs: Dmitry Gerasimenko from Ahrefs has announced plans for Ahrefs to release a search engine to "share ad revenue with content creators 90/10". This isn’t surprising: its crawlers are quite active and have probably built a large index.
 								=> https://twitter.com/botsbreeder/status/1110889488706760704 Initial announcement
 								=> https://medium.com/swlh/investor-money-vs-public-interest-did-google-fail-to-build-a-non-evil-platform-3a054f996ea9 Blog post on Ahrefs' motivation for a new engine
 								=> https://twitter.com/botsbreeder/status/1405920654877028357 An update on Ahrefs' search engine (June 2021)
-												Add search engines for the Gemini Space

											
										
										
											2021-03-11 02:38:56 +00:00
+								## Gemini search engines
 								Time for my first Gemini-exclusive content! A Gemini page about search engines wouldn't be complete without a few search engines for the Gemini space.
-												New Gemini search engine: AuraGem

											
										
										
											2022-02-21 07:59:02 +00:00
+								* geminispace.info: A GUS instance, but with an updated index. Supports submitting content. The biggest search engine on Gemini.
 								=> gemini://geminispace.info/ geminispace.info
+								* AuraGem Search Engine: part of the Ponix capsule, written in Go. A relative newcomer.
 								=> gemini://auragem.space/searchengine/ AuraGem Search Engine
 								=> https://github.com/krixano/geminiserver Ponix source code
 								=> gemini://auragem.space/devlog Ponix devlog
-												Update search engines

- Remove goo.ne.jp since it uses Google
- Create "Graveyard" section and move wbsrch to it 🪦
- Restore link to Crawlson since it's back up

											
										
										
											2021-08-04 05:48:50 +00:00
+								## Graveyard
 								These engines were originally included in the article, but have since been discontinued.
-												RIP Meorca, add info on 3p sources to Brave.

											
										
										
											2022-03-01 22:42:59 +00:00
+								* Meorca: A UK-based search engine that claimed not to "index pornography or illegal content websites". It also featured a public blog with a marketplace and free games. Allowed submitting URLs, but required a full name, email, phone number, and "business name" to do so. Discovered in the seirdy.one access logs. It seems to have dropped everything and pivoted to image-search, which is out of scope for this post.
+								* gus.guru: the original Gemini search engine. The index doesn't seem to be updated anymore.
+								* wbsrch: In addition to its generalist search, it also had many other utilities related to domain name statistics. Failed multiple tests. Its index was a bit dated; it had an old backlog of sites it hadn’t finished indexing. It also had several dedicated per-language indexes.
-												New engine: Alexandria.org

Gowiki is down, move it to the graveyard
Add Alexandria in its place.

											
										
										
											2022-02-12 00:07:15 +00:00
+								* Gowiki: Very young, small index, but showed promise. I discovered this in the seirdy.one access logs. It was only available in the US. Seems down as of early 2022.
+								=> https://meorca.com/  Meorca Search Engine
+								=> gemini://gus.guru/ gus.guru
+								=> https://xangis.com/the-wbsrch-experiment/ The Wbsrch Experiment
+								=> https://gowiki.com  Gowiki
-												Mention exclusions

											
										
										
											2022-02-26 07:43:22 +00:00
+								## Exclusions
 								Two engines were excluded from this list for having a far-right focus.
 								One engine was excluded because it seems to be built using cryptocurrency in a way I'd rather not support.
-												Elaborate on Mojeek and friendly exclusions.

											
										
										
											2022-02-27 23:09:28 +00:00
+								Some fascinating little engines seem like hobbyist proofs-of-concept. I decided not to include them in this list, but watch them with interest to see if they can become something viable.
-												Expand and re-locate "methodology" section

											
										
										
											2022-02-27 01:15:14 +00:00
+								## Methodology
 								### Discovery
 								I find new engines by:
 								* Monitoring certain web directories for changes in their search engine listings.
 								* Checking other curated lists of "good/bad bots" to spot search engines.
 								* Using search engines to discover search engines: searching for the names of less-popular engines often pulls up similar lists.
 								* Receiving suggestions from readers.
 								* Compiling a list of regular expressions for user-agent strings I'm familiar with. Before I delete my server access logs, I extract user-agents that don't match that list along with the pages they request.
 								* Checking the Searx and Searxng projects for new integrations.
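The access-log step above could be sketched roughly as follows. This is a minimal illustration, not the actual script used for this article: it assumes the server writes the common "combined" log format, and the three patterns in KNOWN_AGENTS are placeholders standing in for a much longer private list.

```python
import re

# Placeholder patterns for crawlers that are already familiar (assumption:
# the real list is longer and is not published in this article).
KNOWN_AGENTS = [
    re.compile(r"Googlebot", re.IGNORECASE),
    re.compile(r"bingbot", re.IGNORECASE),
    re.compile(r"YandexBot", re.IGNORECASE),
]

# Assumes the "combined" log format:
#   ... "METHOD /path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:[A-Z]+) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def unfamiliar_agents(lines):
    """Map each user-agent that matches no known pattern to the paths it requested."""
    unknown = {}
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that aren't in the assumed format
        agent = match.group("agent")
        if any(pattern.search(agent) for pattern in KNOWN_AGENTS):
            continue  # already familiar; nothing new to learn
        unknown.setdefault(agent, set()).add(match.group("path"))
    return unknown
```

Feeding it the lines of a log file (for example, an open file object) returns a dictionary of unfamiliar user-agents and the pages each one requested, which can be saved before the logs themselves are deleted.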
 								### Evaluation
 								I focused almost entirely on "organic results" (the classic link results), and didn't focus too much on (often glaring) privacy issues, "enhanced" or "instant" results (e.g. Wikipedia sidebars, related searches, Stack Exchange answers), or other elements.
 								I compared results for esoteric queries side-by-side; if the first 20 results were (nearly) identical to another engine’s results (though perhaps in a slightly different order), they were likely sourced externally and not from an independent index.
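The side-by-side comparison can be approximated with a small script once the top results from two engines have been collected by hand. The sketch below is only an illustration: the function names and the URL normalization rules are assumptions, and any cutoff for "nearly identical" remains a judgment call rather than something this article defines.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to host + path so trivial differences don't hide a match."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def overlap(results_a, results_b, depth=20):
    """Fraction of engine A's top results that also appear in engine B's, ignoring order."""
    top_a = {normalize(url) for url in results_a[:depth]}
    top_b = {normalize(url) for url in results_b[:depth]}
    if not top_a:
        return 0.0
    return len(top_a & top_b) / len(top_a)
```

An overlap close to 1.0 across several esoteric queries hints that one engine is sourcing results from the other; a low overlap is consistent with an independent index but doesn't prove it.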
 								I tried to pick queries that should have a good number of results and show variance between search engines. An incomplete selection of queries I tested:
 								* “vim”, “emacs”, “neovim”, and “nvimrc”: Search engines with relevant results for “nvimrc” typically have a big index. Finding relevant results for the text editors “vim” and “emacs” instead of other topics that share the name is a challenging task.
 								* “vim cleaner”: should return results related to a line of cleaning products rather than the Correct Text Editor.
 								* “Seirdy”: My site is relatively low-traffic, but my nickname is pretty unique and visible on several of the highest-traffic sites out there.
 								* “Project London”: a small movie made with volunteers and FLOSS without much advertising. If links related to the movie show up, the engine’s really good.
 								* “oppenheimer”: a name that could refer to many things. Without context, it should refer to the physicist who worked on the atomic bomb in Los Alamos. Other historical queries: “magna carta” (intermediate), “the prince” (very hard).
 								Some less-mainstream engines have noticed this article, which is great! I've had excellent discussions with people who work on several of these engines. Unfortunately, this article's visibility also incentivizes some engines to optimize specifically for any methodology I describe. I've addressed this by keeping a long list of test queries to myself. The simple queries above are a decent starting point for simple quick evaluations, but I also test for common search operators, keyword length, and types of domain-specific jargon. I also use queries designed to pull up specific pages with varying levels of popularity and recency to gauge the size, scope, and growth of an index.
-												New Turkish engine

											
										
										
											2022-03-02 04:25:38 +00:00
+								Professional critics often work anonymously because personalization can damage the integrity of their reviews. For similar reasons, I attempt to try each engine anonymously at least once by using a VPN and/or my standard anonymous setup: an amnesiac Whonix VM with the Tor Browser. I also often test using a fresh profile when travelling, or via a Searx instance if it supports a given engine. When avoiding personalization, I use "varied" queries that I don't repeat verbatim across search engines; this reduces the likelihood of identifying me. I also attempt to spread these tests out over time so admins won't notice an unusual uptick in unpredictable and esoteric searches. This might seem like overkill, but I already regularly employ similar methods for a variety of different scenarios.
 								### Caveats
 								I didn't try to avoid personalization when testing engines that require account creation. Entries in the "hit-and-miss" and "unusable" sections got less attention: for instance, I didn't spend a lot of effort tracking results over time to see how new entries got added to them.
 								I avoided "natural language" queries like questions, focusing instead on keyword searches and search operators. I also mostly ignored infoboxes (also known as "instant answers").
+								## Acknowledgements
 								Some of this content came from the Search Engine Map and Search Engine Party. A few web directories also proved useful.
 								=> https://www.searchenginemap.com/  Search Engine Map
 								=> https://searchengine.party/  Search Engine Party
-												Add "Rationale" and info from Matt

Matt from Gigablast answered some of my questions; I updated the article
with information from him. Some of that information found its way to the
"Rationale" section.

											
										
										
											2021-03-20 20:18:20 +00:00
+								Matt from Gigablast also gave me some helpful information on GBY which I included in the "Rationale" section. He's written more about big tech in the Gigablast blog:
 								=> https://gigablast.com/blog.html Gigablast blog
-												New engine (Goo) + more engine info

Thanks N. A. Ferrell for feedback!

											
										
										
											2021-06-18 03:17:48 +00:00
+								Nicholas A. Ferrell of The New Leaf Journal wrote a great post on alternative search engines.
 								=> https://thenewleafjournal.com/a-2021-list-of-alternative-search-engines-and-search-resources/ A 2021 List of Alternative Search Engines and Search Resources
 								=> gemini://gemlog.blue/users/naferrell/ N.A. Ferrell's Gemlog
 								He also gave me some useful details about Seznam, Naver, Baidu, and Goo:
 								=> https://lists.sr.ht/~seirdy/seirdy.one-comments/%3C20210618031450.rb2twu4ypek6vvl3%40rkumarlappie.attlocal.net%3E Re: Editor of The New Leaf Journal - Added Your Guestbook Comment Info to My Post + Feedback
 								## Notes
 								¹ Yes, “indexes” is an acceptable plural form of the word “index”. The word “indices” sounds weird to me outside a math class.
+								² Matt from Gigablast told me that indexing YouTube or LinkedIn will get you blocked if you aren't Google or Microsoft. I imagine that you could do so by getting special permission if you're a megacorporation.
-												Add Brave, fix dated info, DDG misconceptions

- New category: "semi-independent indexes". Contains Brave and Plumb.
- Mention that Quor is down
- Fix outdated statement on Plumb's hCaptcha
- Clarify some misconceptions about DDG

											
										
										
											2021-06-22 15:42:34 +00:00
+								³ DuckDuckGo has a crawler called DuckDuckBot. This crawler doesn't impact the linked results displayed; it just grabs favicons and scrapes data for a few instant answers. DuckDuckGo's help pages claim that the engine uses over 400 sources; my interpretation is that at least 398 sources don't impact organic results. I don't think DuckDuckGo is transparent enough about the fact that their organic results are proxied. Compare DuckDuckGo side-by-side with Bing and Yandex and you'll see it's sourcing organic results from one of them (probably Bing).
-												Update information about Qwant

											
										
										
											2022-02-04 06:46:56 +00:00
+								⁴ Qwant claims to also use its own crawler for results, but it’s still mostly Bing in my experience. See the "semi-independent" section.
+								⁵ Disconnect Search allows users to have results proxied from Bing or Yahoo, but Yahoo sources its results from Bing.
+								⁶ Yippy claims to be powered by a certain IBM brand (a brand that could correspond to any number of products) and annotates results with the phrase “Yippy Index”, but a side-by-side comparison with Bing and other Bing-based engines revealed results to be nearly identical.
+								⁷ Ask.moe was working on a FLOSS indexer; its search page stated an intention to switch to it from Bing at one point. This statement has since been removed.
 								=> https://git.sr.ht/~danskeren/spider.moe  FLOSS indexer
-												Clarify alternatives that support limiting by TLD

In the info for search.tl, clarify that its single-TLD search isn't a
unique feature; Google and Bing support a search operator to achieve the
same result.

											
										
										
											2021-03-17 21:35:13 +00:00
+								⁸ This is based on a statement Right Dao made on Reddit:
-												Add more details about some search engines

											
										
										
											2021-03-20 06:25:52 +00:00
 								=> https://reddit.com/comments/k4clx1/_/ge9dwmh/?context=1 Right Dao on Reddit
 								=> https://web.archive.org/web/20210320042457/https://i.reddit.com/r/degoogle/comments/k4clx1/right_dao_a_new_independent_search_engine_that/ge9dwmh/?context=1 Archive of the Reddit thread
+								⁹ Some search engines support the "site:" search operator to limit searches to subpages/subdomains of a single site or TLD. "site:.one", for instance, limits searches to websites with the ".one" TLD.
-												Correct error about Brave

New info from Solso on HN: https://news.ycombinator.com/item?id=27596830

											
										
										
											2021-06-23 05:31:35 +00:00
 								¹⁰ More information can be found in an HN subthread and the Cliqz tech blog:
 								=> https://news.ycombinator.com/item?id=27593801 HN comment thread for "Introducing Brave Search Beta"
 								=> https://0x65.dev/blog/2019-12-06/building-a-search-engine-from-scratch.html Tech @ Cliqz: Building a search engine from scratch
 								=> https://0x65.dev/blog/2019-12-10/search-quality-at-cliqz.html Tech @ Cliqz: Search quality at Cliqz