mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-11-23 21:02:09 +00:00

Compare commits

2 commits

Author  SHA1  Message  Date
Rohan Kumar  40585964fe  Fix dead link  2022-07-07 00:15:23 -07:00
Rohan Kumar  ca8736504a  Retire Meorca, fix dead links, add Bing syndicate  2022-07-07 00:07:08 -07:00
3 changed files with 9 additions and 7 deletions

View file

@ -56,6 +56,7 @@ These are large engines that pass all my standard tests and more.
* Givero * Givero
* Swisscows * Swisscows
* Fireball * Fireball
* Netzzappen
* You.com¹¹ * You.com¹¹
* Partially powers MetaGer by default; this can be turned off * Partially powers MetaGer by default; this can be turned off
* At this point, I mostly stopped adding Bing-based search engines. There are just too many. * At this point, I mostly stopped adding Bing-based search engines. There are just too many.
@ -78,7 +79,7 @@ Google, Bing, and Yandex support structured data such as microformats1, microdat
These engines pass most of the tests listed in the "methodology" section. All of them seem relatively privacy-friendly. These engines pass most of the tests listed in the "methodology" section. All of them seem relatively privacy-friendly.
* Right Dao: very fast, good results. Passes the tests fairly well. It plans on including query-based ads if/when its userbase grows.⁸ * Right Dao: very fast, good results. Passes the tests fairly well. It plans on including query-based ads if/when its userbase grows.⁸ For the past few months, its index seems to have focused more on large, established sites rather than smaller, independent ones.
=> https://rightdao.com Right Dao => https://rightdao.com Right Dao
@ -124,11 +125,9 @@ These engines fail badly at a few important tests. Otherwise, they seem to work
=> https://siik.co/ Siik => https://siik.co/ Siik
=> https://inetdex.com inetdex.com => https://inetdex.com inetdex.com
* Meorca: A UK-based search engine that claims not to "index pornography or illegal content websites". It also features an optional social network ("blog"). Discovered in the seirdy.one access logs.
* ChatNoir: An experimental engine by researchers that uses the Common Crawl index. The engine is open source. There's more information in its announcement on the Common Crawl mailing list (Google Groups). * ChatNoir: An experimental engine by researchers that uses the Common Crawl index. The engine is open source. There's more information in its announcement on the Common Crawl mailing list (Google Groups).
* Secret Search Engine Labs: Very small index with very little SEO spam; it toes the line between a "search engine" and a "surf engine". It's best for reading about broad topics that would otherwise be dominated by SEO spam, thanks to its CashRank algorithm. Allows site submission. * Secret Search Engine Labs: Very small index with very little SEO spam; it toes the line between a "search engine" and a "surf engine". It's best for reading about broad topics that would otherwise be dominated by SEO spam, thanks to its CashRank algorithm. Allows site submission.
=> https://meorca.com/ Meorca Search Engine
=> https://www.chatnoir.eu/ ChatNoir => https://www.chatnoir.eu/ ChatNoir
=> https://commoncrawl.org/ Common Crawl => https://commoncrawl.org/ Common Crawl
=> https://github.com/chatnoir-eu ChatNoir source code (GitHub) => https://github.com/chatnoir-eu ChatNoir source code (GitHub)
@ -329,10 +328,12 @@ These engines were originally included in the article, but have since been disco
* gus.guru: the original Gemini search engine. The index doesn't seem to be updated anymore. * gus.guru: the original Gemini search engine. The index doesn't seem to be updated anymore.
* wbsrch: In addition to its generalist search, it also had many other utilities related to domain name statistics. Failed multiple tests. Its index was a bit dated; it had an old backlog of sites it hadn't finished indexing. It also had several dedicated per-language indexes. * wbsrch: In addition to its generalist search, it also had many other utilities related to domain name statistics. Failed multiple tests. Its index was a bit dated; it had an old backlog of sites it hadn't finished indexing. It also had several dedicated per-language indexes.
* Gowiki: Very young, small index, but showed promise. I discovered this in the seirdy.one access logs. It was only available in the US. Seems down as of early 2022. * Gowiki: Very young, small index, but showed promise. I discovered this in the seirdy.one access logs. It was only available in the US. Seems down as of early 2022.
* Meorca: A UK-based search engine that claims not to "index pornography or illegal content websites". It also features an optional social network ("blog"). Discovered in the seirdy.one access logs.
=> gemini://gus.guru/ gus.guru => gemini://gus.guru/ gus.guru
=> https://xangis.com/the-wbsrch-experiment/ The Wbsrch Experiment => https://xangis.com/the-wbsrch-experiment/ The Wbsrch Experiment
=> https://gowiki.com Gowiki => https://gowiki.com Gowiki
=> https://web.archive.org/web/20220429143153/https://www.meorca.com/search/ Meorca Search Engine (Wayback Machine snapshot)
## Exclusions ## Exclusions

View file

@ -86,6 +86,7 @@ These are large engines that pass all my standard tests and more.
- Givero - Givero
- Swisscows - Swisscows
- Fireball - Fireball
- Netzzappen
- You.com[^6] - You.com[^6]
- Partially powers MetaGer by default; this can be turned off - Partially powers MetaGer by default; this can be turned off
- At this point, I mostly stopped adding Bing-<wbr />based search engines. There are just too many. - At this point, I mostly stopped adding Bing-<wbr />based search engines. There are just too many.
@ -136,8 +137,6 @@ These engines fail badly at a few important tests. Otherwise, they seem to work
- [websearchengine.org](https://websearchengine.org) and [tuxdex.com](https://tuxdex.com): Both are run by the same people, powered by their [inetdex.com](https://inetdex.com) index. Searches are fast, but crawls are a bit shallow. Claims to have an index of 10 million domains, and not to use cookies. - [websearchengine.org](https://websearchengine.org) and [tuxdex.com](https://tuxdex.com): Both are run by the same people, powered by their [inetdex.com](https://inetdex.com) index. Searches are fast, but crawls are a bit shallow. Claims to have an index of 10 million domains, and not to use cookies.
- [Meorca](https://meorca.com/): A UK-based search engine that claims not to "index pornography or illegal content websites". It also features an optional social network ("blog"). Discovered in the seirdy.one access logs.
- [ChatNoir](https://www.chatnoir.eu/): An experimental engine by researchers that uses the [Common Crawl](https://commoncrawl.org/) index. The engine is [open source](https://github.com/chatnoir-eu). See the [announcement](https://groups.google.com/g/common-crawl/c/3o2dOHpeRxo/m/H2Osqz9dAAAJ) on the Common Crawl mailing list (Google Groups). - [ChatNoir](https://www.chatnoir.eu/): An experimental engine by researchers that uses the [Common Crawl](https://commoncrawl.org/) index. The engine is [open source](https://github.com/chatnoir-eu). See the [announcement](https://groups.google.com/g/common-crawl/c/3o2dOHpeRxo/m/H2Osqz9dAAAJ) on the Common Crawl mailing list (Google Groups).
- [Secret Search Engine Labs](http://www.secretsearchenginelabs.com/): Very small index with very little SEO spam; it toes the line between a "search engine" and a "surf engine". It's best for reading about broad topics that would otherwise be dominated by SEO spam, thanks to its [CashRank algorithm](http://www.secretsearchenginelabs.com/tech/cashrank.php). Allows site submission. - [Secret Search Engine Labs](http://www.secretsearchenginelabs.com/): Very small index with very little SEO spam; it toes the line between a "search engine" and a "surf engine". It's best for reading about broad topics that would otherwise be dominated by SEO spam, thanks to its [CashRank algorithm](http://www.secretsearchenginelabs.com/tech/cashrank.php). Allows site submission.
@ -298,7 +297,9 @@ These engines were originally included in the article, but have since been disco
- [wbsrch](https://wbsrch.com/): In addition to its generalist search, it also had many other utilities related to domain name statistics. Failed multiple tests. Its index was a bit dated; it had an old backlog of sites it hadn't finished indexing. It also had several dedicated per-language indexes. - [wbsrch](https://wbsrch.com/): In addition to its generalist search, it also had many other utilities related to domain name statistics. Failed multiple tests. Its index was a bit dated; it had an old backlog of sites it hadn't finished indexing. It also had several dedicated per-language indexes.
- [Gowiki](https://gowiki.com): Very young, small index, but showed promise. I discovered this in the seirdy.one access logs. It was only available in the US. Seems down as of early 2022. - [Gowiki](https://web.archive.org/web/20211226043304/https://www.gowiki.com/): Very young, small index, but showed promise. I discovered this in the seirdy.one access logs. It was only available in the US. Seems down as of early 2022.
- [Meorca](https://web.archive.org/web/20220429143153/https://www.meorca.com/search/): A UK-based search engine that claims not to "index pornography or illegal content websites". It also features an optional social network ("blog"). Discovered in the seirdy.one access logs.
Exclusions Exclusions
---------- ----------

View file

@ -1,4 +1,4 @@
{{- $wbmLinks := (slice "https://si3t.ch/log/2021-04-18-entetes-floc.html" "https://xmpp.org/2021/02/newsletter-02-feburary/" "https://gurlic.com/technology/post/393626430212145157" "https://gurlic.com/technology/post/343249858599059461" "https://www.librepunk.club/@penryn/108411423190214816" "https://benign.town/@josias/108457015755310198") -}} {{- $wbmLinks := (slice "https://si3t.ch/log/2021-04-18-entetes-floc.html" "https://xmpp.org/2021/02/newsletter-02-feburary/" "https://gurlic.com/technology/post/393626430212145157" "https://gurlic.com/technology/post/343249858599059461" "https://www.librepunk.club/@penryn/108411423190214816" "https://benign.town/@josias/108457015755310198" "http://www.tuxmachines.org/node/148146") -}}
<hr /> <hr />
<section aria-labelledby="webmentions"> <section aria-labelledby="webmentions">
<h2 id="webmentions" tabindex="-1">Web&#173;mentions</h2> <h2 id="webmentions" tabindex="-1">Web&#173;mentions</h2>