diff --git a/content/posts/search-engines-with-own-indexes.gmi b/content/posts/search-engines-with-own-indexes.gmi
index ac0ca4e..5b36069 100644
--- a/content/posts/search-engines-with-own-indexes.gmi
+++ b/content/posts/search-engines-with-own-indexes.gmi
@@ -121,22 +121,24 @@ These engines fail badly at a few important tests. Otherwise, they seem to work
 
 * Burf.co: Very small index, but seems fine at ranking more relevant results higher. Allows site submission without any extra steps.
 * Siik: Lacks contact info, and the ToS and Privacy Policy links are dead. Seems to have PHP errors in the backend for some of its instant-answer widgets. If you scroll past all that, it does have web results powered by what seems to be its own index. These results do tend to be somewhat relevant, but the index seems too small for more specific queries.
+* ChatNoir: An experimental engine by researchers that uses the Common Crawl index. The engine is open source. There's more information in its announcement on the Common Crawl mailing list (Google Groups).
 
+=> https://www.chatnoir.eu/ ChatNoir
+=> https://commoncrawl.org/ Common Crawl
 => https://burf.co/ Burf.co
 => https://siik.co/ Siik
 => https://inetdex.com inetdex.com
 
-* ChatNoir: An experimental engine by researchers that uses the Common Crawl index. The engine is open source. There's more information in its announcement on the Common Crawl mailing list (Google Groups).
 * Secret Search Engine Labs: Very small index with very little SEO spam; it toes the line between a "search engine" and a "surf engine". It's best for reading about broad topics that would otherwise be dominated by SEO spam, thanks to its CashRank algorithm. Allows site submission.
 * Gabanza: a search engine from a hosting company. I found few details abou the search engine itself, and the index was small, but it was suitable for discovering new pages related to short broad queries.
+* Jambot: docs, blog posts, etc. have not been updated since around 2006 but the engine continues to crawl and index new pages. Discovered in my access logs. Has a bias towards older content.
 
-=> https://www.chatnoir.eu/ ChatNoir
-=> https://commoncrawl.org/ Common Crawl
 => https://github.com/chatnoir-eu ChatNoir source code (GitHub)
 => https://groups.google.com/g/common-crawl/c/3o2dOHpeRxo/m/H2Osqz9dAAAJ ChatNoir Announcement
 => http://www.secretsearchenginelabs.com/ Secret Search Engine Labs
 => http://www.secretsearchenginelabs.com/tech/cashrank.php CashRank Algorithm
 => https://www.gabanza.com/ Gabanza - The new search engine.
+=> https://jambot.com/ Jambot
 
 ### Unusable engines, irrelevant results
 
diff --git a/content/posts/search-engines-with-own-indexes.md b/content/posts/search-engines-with-own-indexes.md
index c8b7ae8..b581cec 100644
--- a/content/posts/search-engines-with-own-indexes.md
+++ b/content/posts/search-engines-with-own-indexes.md
@@ -116,7 +116,7 @@ These engines pass most of the tests listed in the "methodology" section. All of
 : **My favorite generalist engine on this page.** Stract supports advanced ranking customization by allowing users ti import "optics" files, like a better version of Brave's "goggles" feature. [Stract is fully open-source](https://github.com/StractOrg/stract), with code released under an AGPL-3.0 license. The index is isn't massive but it's big enough to be a useful supplement to more major engines. Stract started with the Common Crawl index, but now uses its own crawler. Plans to add contextual ads and a subscription option for ad-free search. Discovered in my access logs.
 
 [Right Dao](https://rightdao.com)
-: Very fast, good results. Passes the tests fairly well. It plans on including query-based ads if/when its user base grows.[^8]
+: Very fast, good results. Passes the tests fairly well. It plans on including query-based ads if/when its user base grows.[^8] For the past few months, its index seems to have focused more on large, established sites rather than smaller, independent ones. It seems to be a bit lacking in more recent pages.
 
 [Alexandria](https://www.alexandria.org/)
 : A pretty new "non-profit, ad free" engine, with [freely-licensed code](https://github.com/alexandria-org/alexandria). Surprisingly good at finding recent pages. Its index is built from the Common Crawl; it isn't as big as Gigablast or Right Dao but its ranking is great.
@@ -164,6 +164,9 @@ These engines fail badly at a few important tests. Otherwise, they seem to work
 [Gabanza](https://www.gabanza.com/)
 : A search engine from a hosting company. I found few details abou the search engine itself, and the index was small, but it was suitable for discovering new pages related to short broad queries.
 
+[Jambot](https://jambot.com/)
+: Docs, blog posts, etc. have not been updated since around 2006 but the engine continues to crawl and index new pages. Discovered in my access logs. Has a bias towards older content.
+
 ### Fledgling engines
 
 Results from these search engines don't seem particularly relevant; indexes in this category tend to be small.