1
0
Fork 0
mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-11-23 21:02:09 +00:00

Compare commits

...

5 commits

Author SHA1 Message Date
Rohan Kumar
1321f0bc35
Site design: refactor compatibility statement
- Mark Dillo as an abandoned browser
- Mention litehtml, Ultralight
- Move Goanna and Ultralight to own section
- Move Tor Browser to own section
- Add more details on current levels of (in)compatibility and the
  standards I hold myself to.
2022-08-05 07:22:42 -07:00
Rohan Kumar
8f11ec50e4
Add SeSe Engine, reformat to description-list
- Add newly-discovered SeSe Engine
- Mention that Yep was formerly known as FairSearch
- Mention that Right Dao and SeSe start their crawls at Wikipedia
- Refactor markup: turn unordered lists into description-lists. It
  makes sense to use them in this context.
2022-08-05 07:20:37 -07:00
Rohan Kumar
c8ab19dc14
Replace dead link 2022-08-03 21:33:51 -07:00
Rohan Kumar
c13e0fe690
Minor semantic tweaks
- Add missing p-name microformats to some citations
- Add download link for audio element
2022-08-03 21:33:20 -07:00
Rohan Kumar
d42f304f71
Many style tweaks
- Fix unnecessarily excessive spacing around "li > article" entry data
  (was due to containment)
- Aesthetic tweak: ugly underline between microformat u-photo and p-name
- Make CSS file smaller by using some microformats2 classnames instead
  of microdata attributes.
2022-08-03 21:31:05 -07:00
11 changed files with 225 additions and 110 deletions

View file

@ -66,7 +66,9 @@
}
/* Specified last so it overrides :visited. I chose this color because
* it's dimmer, so it's noticeable even without color vision. */
* it's dimmer, so it's noticeable even without color vision. However,
* it still meets the experimental SAPC-APCA threshold for "spot"
* contrast. */
a:active {
color: #f83;
}

View file

@ -4,18 +4,18 @@
* specific a11y, compatibility, or critical
* usability need.
*
* Two exceptions: I re-set the input styles so Safari wouldn't make
* them pill-shaped, and I tweaked some margins/paddings to make some
* things evenly aligned.
* Three exceptions:
* - I re-set the input styles so Safari wouldn't make them pill-shaped
* - I tweaked some margins/paddings to make some things evenly aligned
* - I made my IndieWeb profile photo align without an underline
* on the whitespace between the photo and my name.
*
* I also don't use any classes except when styling depends on
* *content* of an element rather than structure/semantics. Examples
* include images that look better with pixelated upscaling, and
* posts on the list of entries in the "notes" section that are tall
* I also don't use any non-microformats classes except when styling
* depends on *content* of an element rather than structure/semantics.
* Examples include images that look better with pixelated upscaling,
* and posts on the list of entries in the "notes" section that are tall
* and need a larger contain-intrinsic-size.
* One exception: a class for narrow width body text. My HTML contains
* microformats2 classnames for IndieWeb parsers, but I don't actually
* use those for styling.
* One exception: a class for narrow width body text.
*
* Some pages (e.g. post archives) are really long despite having a
* small download size. Rather than resort to pagination, I decided to
@ -104,7 +104,7 @@ html {
/* 45em is too wide for long body text.
* Typically meets SC 1.4.8, plus or minus a few characters. */
[itemprop="articleBody"],
.e-content,
[itemprop="dataFeedElement"],
.narrow {
margin: auto;
@ -140,6 +140,10 @@ html {
main > hr {
margin: 0 .5em;
}
/* containment has increased spacing of the first paragraph of h-feed
* entries; offset that. */
li .p-name + p,
header hr {
margin-bottom: 0;
}
@ -231,7 +235,7 @@ html {
aside > a, /* Used for section permalinks */
dt > a,
li > a,
[itemprop="comment"] dd > a,
.u-comment dd > a,
[itemprop="breadcrumb"] a,
[itemprop="breadcrumb"] > span {
padding: .75em .25em;
@ -262,7 +266,7 @@ html {
header > nav,
a[href="#h1"], /* skip link */
[itemprop="comment"] dd > a ,
.u-comment dd > a ,
footer > nav,
/* List items with direct hyperlink children should only have one
* hyperlink. */
@ -423,7 +427,7 @@ h1 {
/* Very narrow screens: full hyphenation.
* This is the typical width of a smart feature phone. */
@media (max-width: 250px) {
@media (max-width: 272px) {
body {
font-size: 100%;
-webkit-hyphens: auto;
@ -498,7 +502,7 @@ samp {
}
/* Remove list style from data feeds. */
[itemtype="https://schema.org/DataFeed"] > ol {
.h-feed > ol {
list-style-type: none;
padding: 0;
}
@ -507,7 +511,7 @@ samp {
* viewport, triggering horizontal scrolling. Allow breaking words in
* content I don't control (comments, names). For content I do control,
* I just add soft hyphens to the HTML. */
[itemprop="comment"],
.u-comment,
:not(pre) > code,
:not(pre) > samp,
span[itemtype="https://schema.org/Person"] {
@ -548,13 +552,27 @@ summary {
/* center standalone images; same justification as
* for centering the body contents. Also makes images easier to see
* for people holding a device with one hand. */
[itemprop="articleBody"] img {
.e-content img {
display: block;
height: auto;
margin: auto;
max-width: 100%;
}
.h-card .u-photo {
height: 1em;
width: 1em;
vertical-align: -.1em;
}
.h-card a.u-uid {
text-decoration: none;
}
a .p-name {
text-decoration: underline;
}
/* Stretch out audio elements so the progress meter is easier to use. */
audio {
width: 100%;

View file

@ -24,6 +24,11 @@
nav:not([itemprop="breadcrumb"]) a:not([rel="home"]) {
display: none;
}
[role="note"] p,
[role="doc-tip"] p {
margin: .25em 0;
}
}
/* Print: don't break up self-contained items. */

View file

@ -16,7 +16,7 @@ About me
<meta itemprop="url" content="https://seirdy.one" />
<div itemprop="author" itemscope="" itemtype="https://schema.org/Person" itemid="https://seirdy.one/#seirdy" class="p-author author h-card vcard" id="seirdy">
I'm <a itemprop="url" href="https://seirdy.one" rel="author me home canonical" class="u-url u-uid url"> {{% indieweb-icon %}} <span itemprop="name" class="p-name fn n"> <span itemprop="givenName" class="p-given-name given-name">Rohan</span>&#160;<span itemprop="familyName" class="p-family-name family-name">Kumar</span></span></a> (<span class="p-pronouns"><a href="https://pronoun.is/he/him" class="u-pronouns"><span class="p-pronoun">he</span>/<span class="p-pronoun">him</span></a></span>). I'm also known by my more casual online handle <span itemprop="alternateName" class="p-nickname nickname">Seirdy</span> (<span class="p-pronouns"><a href="https://pronoun.is/they/them" class="u-pronouns"><span class="p-pronoun">they</span>/<span class="p-pronoun">them</span></a></span>). Mixing them up is fine.
I'm <a itemprop="url" href="https://seirdy.one" rel="author me home canonical" class="u-url u-uid url">{{% indieweb-icon %}} <span itemprop="name" class="p-name fn n"> <span itemprop="givenName" class="p-given-name given-name">Rohan</span>&#160;<span itemprop="familyName" class="p-family-name family-name">Kumar</span></span></a> (<span class="p-pronouns"><a href="https://pronoun.is/he/him" class="u-pronouns"><span class="p-pronoun">he</span>/<span class="p-pronoun">him</span></a></span>). I'm also known by my more casual online handle <span itemprop="alternateName" class="p-nickname nickname">Seirdy</span> (<span class="p-pronouns"><a href="https://pronoun.is/they/them" class="u-pronouns"><span class="p-pronoun">they</span>/<span class="p-pronoun">them</span></a></span>). Mixing them up is fine.
The Director's Cut of my bio is at my [About page](./about/ "{itemprop='subjectOf'}").

View file

@ -46,7 +46,6 @@ I also go further than WCAG in many aspects:
- I ensure that the page works on extremely narrow viewports without triggering two-dimensional scaling. It should work at widths well below 200 CSS pixels.
### Assessment and evaluation
I test each WCAG success criterion myself using the mainstream browser engines (Blink, Gecko, WebKit). I test using multiple screen readers: Orca (primary, with Firefox and Epiphany), NVDA (with Firefox and Chromium), Windows Narrator (with Microsoft Edge), Apple VoiceOver (with desktop and mobile Safari), and Android TalkBack (with Chromium).
@ -78,28 +77,47 @@ This site sticks to Web standards. I regularly run a local build of [the Nu HTML
I also perform cross-browser testing for both HTML and XHTML versions of my pages. I test with, but do not necessarily endorse, a large variety of browsers:
- I maintain excellent compatibility with **mainstream engines:** Blink (Chromium and Edge), WebKit (Safari, Epiphany), and Gecko (Firefox). The hidden service also works well with the Tor Browser.
- The site works well with **textual browsers.** Lynx and Links2 are first-class citizens for which all features work as intended. [w3m doesn't support soft hyphens](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=830173), but the site is still otherwise usable in it. I maintain compatibility with these engines by making CSS a strictly-optional progressive enhancement and using semantic markup.
Mainstream engines
: I maintain excellent compatibility with mainstream engines: Blink (Chromium, Edge, QtWebEngine), WebKit (Safari, Epiphany), and Gecko (Firefox).
- I also regularly test compatibility with **current alternative engines:** the SerenityOS browser, Servo, NetSurf, Dillo, Kristall, and Goanna (Pale Moon's Gecko fork). I have excellent compatibility with Goanna and Servo. The site is usable in NetSurf, Dillo, and the SerenityOS browser; the always-expanded `<details>` elements might look odd. [The SerenityOS browser doesn't support ECDSA certificates](https://github.com/SerenityOS/serenity/issues/14160), but the Tildeverse mirror works fine. It also has some issues displaying my SVG avatar; it does not attempt to use the PNG fallback.
Tor Browser
: My Tor hidden service also works well with the Tor Browser, with the exception of [a page containing an `<audio>` element](http://wgq3bd2kqoybhstp77i3wrzbfnsyd27wt34psaja4grqiezqircorkyd.onion/posts/2022/07/01/experiment-copilot-legality/). The `<audio>` element can't play in the Tor Browser due to a bug involving NoScript and Firefox's handling of the `sandbox` CSP directive. To work around the issue, I include link to download the audio.
- I occasionally test **abandoned engines,** sometimes with a TLS-terminating proxy if necessary. These engines include Tkhtml, KHTML, Internet Explorer (with and without compatibility mode), Netscape Navigator, Opera Presto,[^1] and outdated versions of current browsers. The aforementioned issue with `<details>` applies to all of these choices. I use Linux, but testing in browsers like Internet Explorer depends on my access to a Windows machine.
Mainstream engine forks
: Pale Moon and recent versions of K-Meleon use Goanna, a single-threaded fork of Firefox's Gecko engine. Ultralight is a proprietary, source-available, fork of WebKit focused on lightweight embedded webviews. My site should work in both engines without any noticeable issues.
Alternative engines
: I test compatibility with current alternative engines: the SerenityOS browser, Servo, NetSurf, Kristall, and litehtml. I have excellent compatibility with litehtml and Servo. The site is usable in NetSurf, and the SerenityOS browser. Only Servo supports `<details>`. [The SerenityOS browser doesn't support ECDSA certificates](https://github.com/SerenityOS/serenity/issues/14160), but the Tildeverse mirror works fine. The SerenityOS browser also has some issues displaying my SVG avatar; it does not attempt to use the PNG fallback.
Textual browsers
: The site works well with textual browsers. Lynx and Links2 are first-class citizens for which all features work as intended. I also test in [felinks (an ELinks fork)](https://github.com/rkd77/elinks), edbrowse, and w3m. [w3m doesn't support soft hyphens](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=830173), but the site is still otherwise usable in it. I maintain compatibility with these engines by making CSS a strictly-optional progressive enhancement and using semantic markup. I also test with ; I occasionally try Edbrowse too. In all textual browsers, the aforementioned incomplete `<details>` handling applies.
Abandoned engines
: I occasionally test abandoned engines, sometimes with a TLS-terminating proxy if necessary. These engines include Tkhtml, KHTML, Dillo,[^1] Internet Explorer[^2] (with and without compatibility mode), Netscape Navigator, old Presto-based Opera versions,[^3] and outdated versions of current browsers. The aforementioned issue with `<details>` applies to all of these choices. I use Linux, but testing in browsers like Internet Explorer depends on my access to a Windows machine. Besides the `<details>` issues, the site works perfectly well in Internet Explorer 11 and Opera Presto. The site has layout issues but remains usable in Tkhtml, KHTML, and Netscape.
I strive to maintain compatibility to the following degrees:
- Works without major issues in mainstream engines, the Tor browser, Goanna, and Ultralight.
- Fully operable in textual browsers, litehtml, and NetSurf. Some issues (e.g. missing `<details>`) might make the experience unpleasant, but no major functionality will be disabled.
- Baseline functionality in abandoned engines, Dillo, the SerenityOS browser. Some ancillary features may not work (e.g. forms for Webmentions and search), but you'll still be able to browse and read.
Some engines I have not yet tested, but hope to try in the future:
- [Flow Browser](https://www.ekioh.com/flow-browser/)
- [gngr](https://gngr.info/)
- [WeasyPrint](https://weasyprint.org/)
- [Netzhaut](https://netzhaut.dev/)
- [Kozmonaut](https://github.com/twilco/kosmonaut)
- [Moon](https://github.com/ZeroX-DG/moon)
- [hastur](https://github.com/robinlinden/hastur)
- [Flow Browser](https://www.ekioh.com/flow-browser/)
Machine-friendliness
--------------------
I think making a site machine-friendly is a great alternative perspective to traditional SEO, the latter of which I think tends to incentivise low-quality content and makes searching difficult. It's a big part of what I've decided to call ["agent optimization"]({{<relref "notes/agent-optimization.md">}}).
This site is **parser-friendly.** It uses polygot (X)HTML5 markup containing schema.org microdata, microformats2, and legacy microformats. Microformats are useful for IndieWeb compatibility; schema.org microdata is useful for various forms of content-extraction (such as "reading mode" implementations) and search engines. I've also sprinkled in some Creative Commons vocabulary using RDFa syntax.
This site is **parser-friendly.** It uses well-formed polygot (X)HTML5 markup containing schema.org microdata, microformats2, and legacy microformats. Microformats are useful for IndieWeb compatibility; schema.org microdata is useful for various forms of content-extraction (such as "reading mode" implementations) and search engines. I've also sprinkled in some Creative Commons vocabulary using RDFa syntax.
I make Atom feeds available for articles and notes, and have a combined Atom feed for both. These feeds are enhanced with Ostatus and ActivityStreams XML namespaces.
@ -170,6 +188,10 @@ Privacy
This site is **privacy-respecting.** Its CSP blocks all scripts, third-party content, and other problematic features. I describe how I go out of my way to reduce the information you can transmit on this site in [my privacy policy](../privacy/).
[^1]: Opera Presto isn't really abandoned. Opera Mini's "Extreme" mode still uses a server-side Presto rendering engine; see {{<mention-work itemprop="citation" role="doc-credit" itemtype="Article">}}{{<cited-work name="Opera Browsers, Modes & Engines" url="https://dev.opera.com/articles/browsers-modes-engines/" extraName="headline">}}{{</mention-work>}}. That being said, I do test with the outdated desktop Presto engine in a sandboxed environment.
[^1]: Although there's no official announcement of Dillo's demise, the browser development has been inactive for a while. The official site, including its repository, is down; [I mirrored the Dillo repository.](../../notes/2022/06/14/dillo-repository-mirror/)
[^2]: [Internet Explorer's engine isn't abandoned](../../notes/2022/06/15/internet-explorer-is-almost-gone/), but the consumer version I have access to is.
[^3]: Opera Presto isn't really abandoned. Opera Mini's "Extreme" mode still uses a server-side Presto rendering engine; see {{<mention-work itemprop="citation" role="doc-credit" itemtype="Article">}}{{<cited-work name="Opera Browsers, Modes & Engines" url="https://dev.opera.com/articles/browsers-modes-engines/" extraName="headline">}}{{</mention-work>}}. That being said, I do test with the outdated desktop Presto engine in a sandboxed environment.

View file

@ -8,7 +8,7 @@ Firefox's multi-process architecture was overhauled, starting with a [utility pr
They've rolled out a separate GPU process on some platforms; the roll-out will likely finish this year.
Regarding toolchain hardening: Chromium official builds use [Clang's CFI sanitizer](https://clang.llvm.org/docs/ControlFlowIntegrity.html); Firefox doesn't. However, a subset of Firefox's libraries support [RLBox sandboxing](https://docs.rlbox.dev/). This isn't a complete solution, but is still a welcome change. [The Tor Browser disables libgraphite on the "safer" security level](https://gitweb.torproject.org/torbutton.git/tree/modules/security-prefs.js?id=c8f7cd3fec5d5845179fcf71ad46888f2d14c8b0) due to security concerns which RLBox may have addressed.
Regarding toolchain hardening: Chromium official builds use [Clang's CFI sanitizer](https://clang.llvm.org/docs/ControlFlowIntegrity.html); Firefox doesn't. However, a subset of Firefox's libraries support [RLBox sandboxing](https://hacks.mozilla.org/2021/12/webassembly-and-back-again-fine-grained-sandboxing-in-firefox-95/). This isn't a complete solution, but is still a welcome change. [The Tor Browser disables libgraphite on the "safer" security level](https://gitweb.torproject.org/torbutton.git/tree/modules/security-prefs.js?id=c8f7cd3fec5d5845179fcf71ad46888f2d14c8b0) due to security concerns which RLBox may have addressed.
I'm looking forward to seeing [PID namespace isolation](https://bugzilla.mozilla.org/show_bug.cgi?id=1151624) at some point.

View file

@ -75,30 +75,30 @@ These are large engines that pass all my standard tests and more.
Google, Bing, and Yandex support structured data such as microformats1, microdata, RDFa, Open Graph markup, and JSON-LD. Yandex's support for microformats1 is limited; for instance, it can parse h-card metadata for organizations but not people. Open Graph and Schema.org are the only supported vocabularies I'm aware of. Mojeek is evaluating structured data; it's interested in Open Graph and Schema.org vocabularies.
### Smaller indexes, relevant results
### Smaller indexes or less relevant results
These engines pass most of the tests listed in the "methodology" section. All of them seem relatively privacy-friendly.
These engines pass most of the tests listed in the "methodology" section. All of them seem relatively privacy-friendly. I wouldn't recommend using these engines to find specific answers; they're better for learning about a topic by finding interesting pages related to a set of keywords.
* Right Dao: very fast, good results. Passes the tests fairly well. It plans on including query-based ads if/when its userbase grows.⁸ For the past few months, its index seems to have focused more on large, established sites rather than smaller, independent ones.
=> https://rightdao.com Right Dao
* Gigablast: Its been around for a while and also sports a classic web directory. Searches are a bit slow, and it charges to submit sites for crawling. It powers Private.sh. Gigablast is tied with Right Dao for quality.
=> https://gigablast.com/ Gigablast
=> https://private.sh Private.sh
* Alexandria: A pretty new "non-profit, ad free" engine, with freely-licensed code. Surprisingly good at finding recent pages. Its index is built from the Common Crawl; it isn't as big as Gigablast or Right Dao but its ranking is great.
=> https://www.alexandria.org/ Alexandria
=> https://github.com/alexandria-org/alexandria Alexandria engine source code
* Yep: an ambitious engine from Ahrefs, an SEO/backlink-finder company, that "shares ad profit with creators and protects your privacy". Most engines show results that include keywords from or related to the query; Yep also shows results linked by pages containing the query. In other words, not all results contain relevant keywords. This makes it excellent for less precise searches and discovery of "related sites", especially with its index of *hundreds of billions of pages.* It's far worse at finding very specific information or recent events for now, but it will probably improve.
* Yep: an ambitious engine from Ahrefs, an SEO/backlink-finder company, that "shares ad profit with creators and protects your privacy". Most engines show results that include keywords from or related to the query; Yep also shows results linked by pages containing the query. In other words, not all results contain relevant keywords. This makes it excellent for less precise searches and discovery of "related sites", especially with its index of *hundreds of billions of pages.* It's far worse at finding very specific information or recent events for now, but it will probably improve. It was known as "FairSearch" before its official launch.
=> https://yep.com/ Yep
Yep supports Open Graph and some JSON-LD at the moment. A look through the source code for Alexandria and Gigablast didn't seem to reveal the use of any structured data
* SeSe Engine: Although it's a Chinese engine, its index seems to have a large-enough proportion of English content to fit here. The engine is open-source. It has surprisingly good results for such a low-budget project. Each result is annotated with detailed ranking metadata such as keyword relevance and backlink weight. Discovered in my access logs.
=> https://sese.yyj.moe/ SeSe Engine
=> https://github.com/RimoChan/sese-engine SeSe back-end Python code
=> https://github.com/YunYouJun/sese-engine-ui SeSe-UI Vue-based front-end
Yep supports Open Graph and some JSON-LD at the moment. A look through the source code for Alexandria and Gigablast didn't seem to reveal the use of any structured data. The surprising quality of results from SeSe and Right Dao seems influenced by the crawlers' high-quality starting locations (e.g. Wikipedia).
### Smaller indexes, hit-and-miss

View file

@ -50,8 +50,9 @@ General indexing search-engines
These are large engines that pass all my standard tests and more.
- Google: the biggest index. Allows submitting pages and sitemaps for crawling, and [even supports WebSub](https://developers.google.com/search/docs/advanced/sitemaps/build-sitemap#addsitemap) to automate the process. Powers a few other engines:
Google
: The biggest index. Allows submitting pages and sitemaps for crawling, and [even supports WebSub](https://developers.google.com/search/docs/advanced/sitemaps/build-sitemap#addsitemap) to automate the process. Powers a few other engines:
- [Startpage](https://www.startpage.com/), possibly the most popular Google proxy.
- [GMX Search](https://search.gmx.com/web), run by a popular German email provider.
@ -64,8 +65,8 @@ These are large engines that pass all my standard tests and more.
- A host of other engines using [Programmable Search Engine's](https://developers.google.com/custom-search/) client-side scripts.
- Bing: the runner-up. Allows submitting pages and sitemaps for crawling without login using [the IndexNow API](https://www.indexnow.org/). Its index powers many other engines:
Bing
: The runner-up. Allows submitting pages and sitemaps for crawling without login using [the IndexNow API](https://www.indexnow.org/). Its index powers many other engines:
- Yahoo (and its sibling engine, One&shy;Search)
- DuckDuck&shy;Go[^2]
- AOL
@ -91,91 +92,128 @@ These are large engines that pass all my standard tests and more.
- Partially powers MetaGer by default; this can be turned off
- At this point, I mostly stopped adding Bing-<wbr />based search engines. There are just too many.
- Yandex: originally a Russian search engine, it now has an English version. Some Russian results bleed into its English site. Like Bing, it allows submitting pages and sitemaps for crawling using the IndexNow API. Powers:
Yandex
: Originally a Russian search engine, it now has an English version. Some Russian results bleed into its English site. Like Bing, it allows submitting pages and sitemaps for crawling using the IndexNow API. Powers:
- Epic Search (went paid-only as of June 2021)
- Occasionally powers DuckDuck&shy;Go's link results instead of Bing <ins cite="https://energycommerce.house.gov/committee-activity/hearings/hearing-on-holding-big-tech-accountable-legislation-to-protect-online">(update: DuckDuckGo has "paused" its partnership with Yandex, confirmed in <cite>[Hearing on “Holding Big Tech Accountable: Legislation to Protect Online Users”](https://energycommerce.house.gov/committee-activity/hearings/hearing-on-holding-big-tech-accountable-legislation-to-protect-online)</cite></ins>
- Petal, for Russian users only.
- [Mojeek](https://www.mojeek.com/): Seems privacy-oriented with a large index containing billions of pages. Quality isn't at GBY's level, but its not bad either. If I had to use Mojeek as my default general search engine, I'd live. Partially powers [eTools.ch](https://www.etools.ch/). At this moment, _I think that Mojeek is the best alternative to GBY_ for general search.
[Mojeek](https://www.mojeek.com/)
: Seems privacy-oriented with a large index containing billions of pages. Quality isn't at GBY's level, but its not bad either. If I had to use Mojeek as my default general search engine, I'd live. Partially powers [eTools.ch](https://www.etools.ch/). At this moment, _I think that Mojeek is the best alternative to GBY_ for general search.
- [Petal Search](https://petalsearch.com/). A search engine by Huawei that recently switched from searching for Android apps to general search in order to reduce dependence on Western search providers. Despite its surprisingly good results, I wouldn't recommend it due to privacy concerns: its privacy policy describes advanced fingerprinting metrics, and it doesn't work without JavaScript. Requires an account to submit sites. I discovered this via my access logs. Be aware that in some jurisdictions, it doesn't use its own index: in Russia and some EU regions it uses Yandex and Qwant, respectively.
[Petal Search](https://petalsearch.com/)
: A search engine by Huawei that recently switched from searching for Android apps to general search in order to reduce dependence on Western search providers. Despite its surprisingly good results, I wouldn't recommend it due to privacy concerns: its privacy policy describes advanced fingerprinting metrics, and it doesn't work without JavaScript. Requires an account to submit sites. I discovered this via my access logs. Be aware that in some jurisdictions, it doesn't use its own index: in Russia and some EU regions it uses Yandex and Qwant, respectively.
Google, Bing, and Yandex support structured data such as microformats1, microdata, RDFa, Open Graph markup, and JSON-LD. Yandex's support for microformats1 is limited; for instance, it can parse `h-card` metadata for organizations but not people. Open Graph and Schema.org are the only supported vocabularies I'm aware of. Mojeek is evaluating structured data; it's interested in Open Graph and Schema.org vocabularies.
### Smaller indexes or less relevant results
These engines pass most of the tests listed in the "methodology" section. All of them seem relatively privacy-friendly.
These engines pass most of the tests listed in the "methodology" section. All of them seem relatively privacy-friendly. I wouldn't recommend using these engines to find specific answers; they're better for learning about a topic by finding interesting pages related to a set of keywords.
- [Right Dao](https://rightdao.com): very fast, good results. Passes the tests fairly well. It plans on including query-based ads if/when its user base grows.[^7]
- [Gigablast](https://gigablast.com/): It's been around for a while and also sports a classic web directory. Searches are a bit slow, and it charges to submit sites for crawling. It powers [Private.sh](https://private.sh). Gigablast is tied with Right Dao for quality.
[Right Dao](https://rightdao.com)
: Very fast, good results. Passes the tests fairly well. It plans on including query-based ads if/when its user base grows.[^7]
- [Alexandria](https://www.alexandria.org/): A pretty new "non-profit, ad free" engine, with [freely-licensed code](https://github.com/alexandria-org/alexandria). Surprisingly good at finding recent pages. Its index is built from the Common Crawl; it isn't as big as Gigablast or Right Dao but its ranking is great.
[Gigablast](https://gigablast.com/)
: It's been around for a while and also sports a classic web directory. Searches are a bit slow, and it charges to submit sites for crawling. It powers [Private.sh](https://private.sh). Gigablast is tied with Right Dao for quality.
- [Yep](https://yep.com/): an ambitious engine from Ahrefs, an SEO/backlink-finder company, that "shares ad profit with creators and protects your privacy". Most engines show results that include keywords from or related to the query; Yep also shows results linked by pages containing the query. In other words, not all results contain relevant keywords. This makes it excellent for less precise searches and discovery of "related sites", especially with its index of _hundreds of billions of pages._ It's far worse at finding very specific information or recent events for now, but it will probably improve.
[Alexandria](https://www.alexandria.org/)
: A pretty new "non-profit, ad free" engine, with [freely-licensed code](https://github.com/alexandria-org/alexandria). Surprisingly good at finding recent pages. Its index is built from the Common Crawl; it isn't as big as Gigablast or Right Dao but its ranking is great.
Yep supports Open Graph and some JSON-LD at the moment. A look through the source code for Alexandria and Gigablast didn't seem to reveal the use of any structured data.
[Yep](https://yep.com/)
: An ambitious engine from Ahrefs, an SEO/backlink-finder company, that "shares ad profit with creators and protects your privacy". Most engines show results that include keywords from or related to the query; Yep also shows results linked by pages containing the query. In other words, not all results contain relevant keywords. This makes it excellent for less precise searches and discovery of "related sites", especially with its index of _hundreds of billions of pages._ It's far worse at finding very specific information or recent events for now, but it will probably improve. It was known as "FairSearch" before its official launch.
[SeSe Engine](https://sese.yyj.moe/)
: Although it's a Chinese engine, its index seems to have a large-enough proportion of English content to fit here. The engine is open-source; see the [SeSe back-end Python code](https://github.com/RimoChan/sese-engine) and [the SeSe-ui Vue-based front-end](https://github.com/YunYouJun/sese-engine-ui). It has surprisingly good results for such a low-budget project. Each result is annotated with detailed ranking metadata such as keyword relevance and backlink weight. Discovered in my access logs.
Yep supports Open Graph and some JSON-LD at the moment. A look through the source code for Alexandria and Gigablast didn't seem to reveal the use of any structured data. The surprising quality of results from SeSe and Right Dao seems influenced by the crawlers' high-quality starting location: Wikipedia.
### Smaller indexes, hit-and-<wbr />miss {#smaller-indexes-hit-and-miss}
These engines fail badly at a few important tests. Otherwise, they seem to work well enough.
- [seekport](http://www.seekport.com/): The interface is in German but it supports searching in English just fine. The default language is selected by your locale. It's really good considering its small index; it hasn't heard of less common terms (e.g. "Seirdy"), but it's able to find relevant results in other tests.
- [Exalead](https://www.exalead.com/search/): slow, quality is hit-and-miss. Its indexer claims to crawl the DMOZ directory, which has since shut down and been replaced by the [Curlie](https://curlie.org) directory. No relevant results for "Oppenheimer" and some other history-related queries. Allows submitting individual URLs for indexing, but requires solving a Google reCAPTCHA and entering an email address.
[seekport](http://www.seekport.com/)
: The interface is in German but it supports searching in English just fine. The default language is selected by your locale. It's really good considering its small index; it hasn't heard of less common terms (e.g. "Seirdy"), but it's able to find relevant results in other tests.
- [ExactSeek](https://www.exactseek.com/): small index, disproportionately dominated by big sites. Failed multiple tests. Allows submitting individual URLs for crawling, but requires entering an email address and receiving a newsletter. Webmaster tools seem to heavily push for paid <abbr title="search-engine optimization">SEO</abbr> options. It also powers SitesOnDisplay and [Blog-<wbr />search.com](https://www.blog-search.com).
[Exalead](https://www.exalead.com/search/)
: Slow, quality is hit-and-miss. Its indexer claims to crawl the DMOZ directory, which has since shut down and been replaced by the [Curlie](https://curlie.org) directory. No relevant results for "Oppenheimer" and some other history-related queries. Allows submitting individual URLs for indexing, but requires solving a Google reCAPTCHA and entering an email address.
- [Infotiger](https://alpha.infotiger.com/): A small index that seems to find relevant results. It allows site submission for English and German pages. It also features a "similarity" search to query pages similar to a given link, with mixed results.
[ExactSeek](https://www.exactseek.com/)
: Small index, disproportionately dominated by big sites. Failed multiple tests. Allows submitting individual URLs for crawling, but requires entering an email address and receiving a newsletter. Webmaster tools seem to heavily push for paid <abbr title="search-engine optimization">SEO</abbr> options. It also powers SitesOnDisplay and [Blog-<wbr />search.com](https://www.blog-search.com).
- [Burf.co](https://burf.co/): Very small index, but seems fine at ranking more relevant results higher. Allows site submission without any extra steps.
[Infotiger](https://alpha.infotiger.com/)
: A small index that seems to find relevant results. It allows site submission for English and German pages. It also features a "similarity" search to query pages similar to a given link, with mixed results.
- [Entfer](https://entfer.com/): a newcomer that lets registered users upvote/downvote search results to customize ranking. Doesn't offer much information about who made it. Its index is small, but it does seem to return results related to the query.
[Burf.co](https://burf.co/)
: Very small index, but seems fine at ranking more relevant results higher. Allows site submission without any extra steps.
- [Siik](https://siik.co/): Lacks contact info, and the ToS and Privacy Policy links are dead. Seems to have PHP errors in the backend for some of its instant-answer widgets. If you scroll past all that, it does have web results powered by what seems to be its own index. These results do tend to be somewhat relevant, but the index seems too small for more specific queries.
[Entfer](https://entfer.com/)
: a newcomer that lets registered users upvote/downvote search results to customize ranking. Doesn't offer much information about who made it. Its index is small, but it does seem to return results related to the query.
- [websearchengine.org](https://websearchengine.org) and [tuxdex.com](https://tuxdex.com): Both are run by the same people, powered by their [inetdex.com](https://inetdex.com) index. Searches are fast, but crawls are a bit shallow. Claims to have an index of 10 million domains, and not to use cookies.
[Siik](https://siik.co/)
: Lacks contact info, and the ToS and Privacy Policy links are dead. Seems to have PHP errors in the backend for some of its instant-answer widgets. If you scroll past all that, it does have web results powered by what seems to be its own index. These results do tend to be somewhat relevant, but the index seems too small for more specific queries.
- [ChatNoir](https://www.chatnoir.eu/): An experimental engine by researchers that uses the [Common Crawl](https://commoncrawl.org/) index. The engine is [open source](https://github.com/chatnoir-eu). See the [announcement](https://groups.google.com/g/common-crawl/c/3o2dOHpeRxo/m/H2Osqz9dAAAJ) on the Common Crawl mailing list (Google Groups).
[websearchengine.org](https://websearchengine.org) OR [tuxdex.com](https://tuxdex.com)
: Both are run by the same people, powered by their [inetdex.com](https://inetdex.com) index. Searches are fast, but crawls are a bit shallow. Claims to have an index of 10 million domains, and not to use cookies.
- [Secret Search Engine Labs](http://www.secretsearchenginelabs.com/): Very small index with very little SEO spam; it toes the line between a "search engine" and a "surf engine". It's best for reading about broad topics that would otherwise be dominated by SEO spam, thanks to its [CashRank algorithm](http://www.secretsearchenginelabs.com/tech/cashrank.php). Allows site submission.
[ChatNoir](https://www.chatnoir.eu/)
: An experimental engine by researchers that uses the [Common Crawl](https://commoncrawl.org/) index. The engine is [open source](https://github.com/chatnoir-eu). See the [announcement](https://groups.google.com/g/common-crawl/c/3o2dOHpeRxo/m/H2Osqz9dAAAJ) on the Common Crawl mailing list (Google Groups).
[Secret Search Engine Labs](http://www.secretsearchenginelabs.com/)
: Very small index with very little SEO spam; it toes the line between a "search engine" and a "surf engine". It's best for reading about broad topics that would otherwise be dominated by SEO spam, thanks to its [CashRank algorithm](http://www.secretsearchenginelabs.com/tech/cashrank.php). Allows site submission.
### Unusable engines, irrelevant results
Results from these search engines don't seem at all useful.
- [Yessle](https://www.yessle.com/): seems new; allows page submission by pasting a page into the search box. Index is really small but it crawls new sites quickly. Claims to be private.
- [Bloopish](https://search.aibull.io/): extremely quick to update its index; site submissions show up in seconds. Unfortunately, its index only contains a few thousand documents (under 100 thousand at the time of writing). It's growing fast: if you search for a term, it'll start crawling related pages and grow its index.
[Yessle](https://www.yessle.com/)
: Seems new; allows page submission by pasting a page into the search box. Index is really small but it crawls new sites quickly. Claims to be private.
- YaCy: community-made index; slow. Results are awful/irrelevant, but can be useful for intranet or custom search.
[Bloopish](https://search.aibull.io/)
: Extremely quick to update its index; site submissions show up in seconds. Unfortunately, its index only contains a few thousand documents (under 100 thousand at the time of writing). It's growing fast: if you search for a term, it'll start crawling related pages and grow its index.
- Scopia: only seems to be available via the [MetaGer](https://metager.org) metasearch engine after turning off Bing and news results. Tiny index, very low-quality.
YaCy
: Community-made index; slow. Results are awful/irrelevant, but can be useful for intranet or custom search.
- [Artado Search](https://www.artadosearch.com/): Primarily Turkish, but it also seems to support English results. Like Plumb, it uses client-side JS to fetch results from existing engines (Google, Bing, Yahoo, Petal, and others); like MetaGer, it has an option to use its own independent index. Results from its index are almost always empty. Very simple queries ("twitter", "wikipedia", "reddit") give some answers. Supports site submission and crowdsourced instant answers.
Scopia
: only seems to be available via the [MetaGer](https://metager.org) metasearch engine after turning off Bing and news results. Tiny index, very low-quality.
- [Active Search Results](https://www.activesearchresults.com): very poor quality
[Artado Search](https://www.artadosearch.com/)
: Primarily Turkish, but it also seems to support English results. Like Plumb, it uses client-side JS to fetch results from existing engines (Google, Bing, Yahoo, Petal, and others); like MetaGer, it has an option to use its own independent index. Results from its index are almost always empty. Very simple queries ("twitter", "wikipedia", "reddit") give some answers. Supports site submission and crowdsourced instant answers.
- [Crawlson](https://www.crawlson.com): young, slow. In this category because its index has a cap of 10 URLs per domain. I initially discovered Crawlson in the seirdy.one access logs.
[Active Search Results](https://www.activesearchresults.com)
: Very poor quality. Results seem highly biased towards commercial sites.
- [Anoox](https://www.anoox.com/): Results are few and irrelevant; fails to find any results for basic terms. Allows site submission. It's also a lightweight social network and claims to be powered by its users, letting members vote on listings to alter rankings.
[Crawlson](https://www.crawlson.com)
: Young, slow. In this category because its index has a cap of 10 URLs per domain. I initially discovered Crawlson in the seirdy.one access logs.
- [Yioop!](https://www.yioop.com): A FLOSS search engine that boasts a very impressive [feature-set](https://www.seekquarry.com/): it can parse sitemaps, feeds, and a variety of markup formats; it can import pre-curated data in forms such as access logs, Usenet posts, and WARC archives; it also supports feed-based news search. Despite the impressive feature set, Yioop's results are few and irrelevant due to its small index. It allows submitting sites for crawling. Like Meorca, Yioop has social features such as blogs, wikis, and a chat bot API.
[Anoox](https://www.anoox.com/)
: Results are few and irrelevant; fails to find any results for basic terms. Allows site submission. It's also a lightweight social network and claims to be powered by its users, letting members vote on listings to alter rankings.
[Yioop!](https://www.yioop.com)
: A FLOSS search engine that boasts a very impressive [feature-set](https://www.seekquarry.com/): it can parse sitemaps, feeds, and a variety of markup formats; it can import pre-curated data in forms such as access logs, Usenet posts, and WARC archives; it also supports feed-based news search. Despite the impressive feature set, Yioop's results are few and irrelevant due to its small index. It allows submitting sites for crawling. Like Meorca, Yioop has social features such as blogs, wikis, and a chat bot API.
### Semi-independent indexes
Engines in this category fall back to GBY when their own indexes don't have enough results. As their own indexes grow, some claim that this should happen less often.
- [Brave Search](https://search.brave.com/): Many tests (including all the tests I listed in the "Methodology" section) resulted results identical to Google, revealed by a side-by-side comparison with Google, Startpage, and a Searx instance with only Google enabled. Brave claims that this is due to how Cliqz (the discontinued engine acquired by Brave) used query logs to build its page models and was optimized to match Google.[^8] The index is independent, but optimizing against Google resulted in too much similarity for the real benefit of an independent index to show. Furthermore, many queries have Bing results mixed in; users can click an "info" button to see the percentage of results that came from its own index. The independent percentage is typically quite high (often close to 100% independent) but can drop for advanced queries.
- [Plumb](https://plumb.one/): Almost all queries return no results; when this happens, it falls back to Google. It's fairly transparent about the fallback process, but I'm concerned about _how_ it does this: it loads Google's Custom Search scripts from `cse.google.com` onto the page to do a client-side Google search. This can be mitigated by using a browser addon to block `cse.google.com` from loading any scripts. Plumb claims that this is a temporary measure while its index grows, and they're planning on getting rid of this. Allows submitting URLs, but requires solving an hCaptcha. This engine is very new; hopefully as it improves, it could graduate from this section. Its Chief Product Officer [previously founded](https://archive.is/oVAre) the Gibiru search engine which shares the same affiliates and (for now) the same index; the indexes will diverge with time.
[Brave Search](https://search.brave.com/)
: Many tests (including all the tests I listed in the "Methodology" section) resulted results identical to Google, revealed by a side-by-side comparison with Google, Startpage, and a Searx instance with only Google enabled. Brave claims that this is due to how Cliqz (the discontinued engine acquired by Brave) used query logs to build its page models and was optimized to match Google.[^8] The index is independent, but optimizing against Google resulted in too much similarity for the real benefit of an independent index to show. Furthermore, many queries have Bing results mixed in; users can click an "info" button to see the percentage of results that came from its own index. The independent percentage is typically quite high (often close to 100% independent) but can drop for advanced queries.
- [Neeva](https://neeva.com): Combines Bing results with results from its own index. Bing normally isn't okay with this, but Neeva is one of few exceptions. As of right now, results are mostly identical to Bing but original links not found by Bing frequently pop up. Long and esoteric queries are less likely to feature original results. Requires signing up with an email address or OAuth to use, and offers a paid tier with additional benefits.
[Plumb](https://plumb.one/)
: Almost all queries return no results; when this happens, it falls back to Google. It's fairly transparent about the fallback process, but I'm concerned about _how_ it does this: it loads Google's Custom Search scripts from `cse.google.com` onto the page to do a client-side Google search. This can be mitigated by using a browser addon to block `cse.google.com` from loading any scripts. Plumb claims that this is a temporary measure while its index grows, and they're planning on getting rid of this. Allows submitting URLs, but requires solving an hCaptcha. This engine is very new; hopefully as it improves, it could graduate from this section. Its Chief Product Officer [previously founded](https://archive.is/oVAre) the Gibiru search engine which shares the same affiliates and (for now) the same index; the indexes will diverge with time.
- [Qwant](https://www.qwant.com): Qwant claims to use its own index, but it still relies on Bing for most results. It seems to be in a position similar to Neeva. Try a side-by-side comparison to see if or how it compares with Bing.
[Neeva](https://neeva.com)
: Combines Bing results with results from its own index. Bing normally isn't okay with this, but Neeva is one of few exceptions. As of right now, results are mostly identical to Bing but original links not found by Bing frequently pop up. Long and esoteric queries are less likely to feature original results. Requires signing up with an email address or OAuth to use, and offers a paid tier with additional benefits.
- [Kagi Search](https://kagi.com/): The most interesting entry in this category, IMO. Like Neeva, it requires an account; it will eventually require payment. It's powered by its own Teclis index (Teclis can be used independently; see the [non-commercial section](#small-or-non-commercial-web) below), and claims to also use results from Google and Bing. The result seems somewhat unique: I'm able to recognize some results from the Teclis index mixed in with the mainstream ones. In addition to Teclis, Kagi's other products include the [Kagi.ai](https://kagi.ai/) intelligent answer service and the [TinyGem](https://tinygem.org/) social bookmarking service, both of which play a role in Kagi.com in the present or future.
[Qwant](https://www.qwant.com)
: Qwant claims to use its own index, but it still relies on Bing for most results. It seems to be in a position similar to Neeva. Try a side-by-side comparison to see if or how it compares with Bing.
[Kagi Search](https://kagi.com/)
: The most interesting entry in this category, IMO. Like Neeva, it requires an account; it will eventually require payment. It's powered by its own Teclis index (Teclis can be used independently; see the [non-commercial section](#small-or-non-commercial-web) below), and claims to also use results from Google and Bing. The result seems somewhat unique: I'm able to recognize some results from the Teclis index mixed in with the mainstream ones. In addition to Teclis, Kagi's other products include the [Kagi.ai](https://kagi.ai/) intelligent answer service and the [TinyGem](https://tinygem.org/) social bookmarking service, both of which play a role in Kagi.com in the present or future.
Non-generalist search
---------------------
@ -184,35 +222,50 @@ These indexing search engines dont have a Google-like “ask me anything” e
### Small or non-commercial Web
- [Marginalia Search](https://search.marginalia.nu/): _My favorite entry on this page_. It has its own crawler but is strongly biased towards non-commercial, personal, and/or minimal sites. It's a great response to the increasingly SEO-spam-filled SERPs of GBY. Partially powers Teclis, which in turn partially powers Kagi. <ins cite="https://memex.marginalia.nu/log/58-marginalia-open-source.gmi" datetime="2022-05-28T14:09:00-07:00">Update 2022-05-28: [Marginalia.nu is now open source.](https://memex.marginalia.nu/log/58-marginalia-open-source.gmi)</ins>
- [Teclis](http://teclis.com/): A project by the creator of Kagi search. Uses its own crawler that measures content blocked by uBlock Origin, and extracts content with the open-source article scrapers Trafilatura and Readability.js. This is quite an interesting approach: tracking blocked elements discourages tracking and advertising; using Trafilatura and Readability.js encourages the use of semantic HTML and Semantic Web standards such as [microformats](https://microformats.org/), [microdata](https://html.spec.whatwg.org/multipage/microdata.html), and [RDFa](https://www.w3.org/TR/rdfa-primer/). It claims to also use some results from Marginalia.
[Marginalia Search](https://search.marginalia.nu/)
: _My favorite entry on this page_. It has its own crawler but is strongly biased towards non-commercial, personal, and/or minimal sites. It's a great response to the increasingly SEO-spam-filled SERPs of GBY. Partially powers Teclis, which in turn partially powers Kagi. <ins cite="https://memex.marginalia.nu/log/58-marginalia-open-source.gmi" datetime="2022-05-28T14:09:00-07:00">Update 2022-05-28: [Marginalia.nu is now open source.](https://memex.marginalia.nu/log/58-marginalia-open-source.gmi)</ins>
[Teclis](http://teclis.com/)
: A project by the creator of Kagi search. Uses its own crawler that measures content blocked by uBlock Origin, and extracts content with the open-source article scrapers Trafilatura and Readability.js. This is quite an interesting approach: tracking blocked elements discourages tracking and advertising; using Trafilatura and Readability.js encourages the use of semantic HTML and Semantic Web standards such as [microformats](https://microformats.org/), [microdata](https://html.spec.whatwg.org/multipage/microdata.html), and [RDFa](https://www.w3.org/TR/rdfa-primer/). It claims to also use some results from Marginalia.
### Site finders
These engines try to find a website, typically at the domain-name level. They don't focus on capturing particular pages within websites.
- [Kozmonavt](https://kozmonavt.ml/): The best in this category. Has a small but growing index of over 8 million sites. If I want to find the website for a certain project, Kozmonavt works well (provided its index has crawled said website). It works poorly for learning things and finding general information. I cannot recommend it for anything serious since it lacks contact information, a privacy policy, or any other information about the org/people who made it. Discovered in the seirdy.one access logs.
- [search.tl](http://www.search.tl/): Generalist search for one <abbr title="top-level domain">TLD</abbr> at a time (defaults to .com). I'm not sure why you'd want to always limit your searches to a single TLD, but now you can.[^9] There isn't any visible UI for changing the TLD for available results; you need to add/change the `tld` URL parameter. For example, to search .org sites, append `&tld=org` to the URL. It seems to be connected to [Amidalla](http://www.amidalla.de/). Amidalla allows users to manually add URLs to its index and directory; I have yet to see if doing so impacts search.tl results.
[Kozmonavt](https://kozmonavt.ml/)
: The best in this category. Has a small but growing index of over 8 million sites. If I want to find the website for a certain project, Kozmonavt works well (provided its index has crawled said website). It works poorly for learning things and finding general information. I cannot recommend it for anything serious since it lacks contact information, a privacy policy, or any other information about the org/people who made it. Discovered in the seirdy.one access logs.
- [Thunderstone](https://search.thunderstone.com/): A combined website catalog and search engine that focuses on categorization. Its [about page](https://search.thunderstone.com/texis/websearch19/about.html) claims: <q cite="https://search.thunderstone.com/texis/websearch19/about.html">We continuously survey all primary COM, NET, and ORG web-servers and distill their contents to produce this database. This is an index of _sites_ not pages. It is very good at finding companies and organizations by purpose, product, subject matter, or location. If you're trying to finding things like _'BillyBob's personal beer can page on AOL'_, try Yahoo or Dogpile.</q> This seems to be the polar opposite of the engines in the ["small or non-commercial Web" category](#small-or-non-commercial-web).
[search.tl](http://www.search.tl/)
: Generalist search for one <abbr title="top-level domain">TLD</abbr> at a time (defaults to .com). I'm not sure why you'd want to always limit your searches to a single TLD, but now you can.[^9] There isn't any visible UI for changing the TLD for available results; you need to add/change the `tld` URL parameter. For example, to search .org sites, append `&tld=org` to the URL. It seems to be connected to [Amidalla](http://www.amidalla.de/). Amidalla allows users to manually add URLs to its index and directory; I have yet to see if doing so impacts search.tl results.
- [sengine.info](https://www.sengine.info/): only shows domains, not individual pages. Developed by netEstate GmbH, which specializes in content extraction for inprints and job ads. Also has a German-only version available. Discovered in my access logs.
[Thunderstone](https://search.thunderstone.com/)
: A combined website catalog and search engine that focuses on categorization. Its [about page](https://search.thunderstone.com/texis/websearch19/about.html) claims: <q cite="https://search.thunderstone.com/texis/websearch19/about.html">We continuously survey all primary COM, NET, and ORG web-servers and distill their contents to produce this database. This is an index of _sites_ not pages. It is very good at finding companies and organizations by purpose, product, subject matter, or location. If you're trying to finding things like _'BillyBob's personal beer can page on AOL'_, try Yahoo or Dogpile.</q> This seems to be the polar opposite of the engines in the ["small or non-commercial Web" category](#small-or-non-commercial-web).
- [Gnomit](https://www.gnomit.com/): Allows single-keyword queries and returns sites that seem to cover a related topic. I actually kind of enjoy using it; results are old (typically from 2009) and a bit random, but make for a nice way to discover something new. For instance, searching for "IRC" helped me discover new IRC networks I'd never heard of.
[sengine.info](https://www.sengine.info/)
: Only shows domains, not individual pages. Developed by netEstate GmbH, which specializes in content extraction for inprints and job ads. Also has a German-only version available. Discovered in my access logs.
[Gnomit](https://www.gnomit.com/)
: Allows single-keyword queries and returns sites that seem to cover a related topic. I actually kind of enjoy using it; results are old (typically from 2009) and a bit random, but make for a nice way to discover something new. For instance, searching for "IRC" helped me discover new IRC networks I'd never heard of.
### Other
- [Keybot](https://www.keybot.com/): A must-have for anyone who does translation work. It crawls the web looking for multilingual websites. Translators who are unsure about how to translate a given word or phrase can see its usage in two given languages, to learn from other human translators. My parents are fluent English speakers but sometimes struggle to express a given Hindi idiom in English; something like this could be useful to them, since machine translation isn't nuanced enough for every situation. Part of the [TTN Translation Network](https://www.ttn.ch/). Discovered in my access logs.
- Quor: Seems to mainly index large news sites. Site is down as of June 2021; originally available at www dot quor dot com.
[Keybot](https://www.keybot.com/)
: A must-have for anyone who does translation work. It crawls the web looking for multilingual websites. Translators who are unsure about how to translate a given word or phrase can see its usage in two given languages, to learn from other human translators. My parents are fluent English speakers but sometimes struggle to express a given Hindi idiom in English; something like this could be useful to them, since machine translation isn't nuanced enough for every situation. Part of the [TTN Translation Network](https://www.ttn.ch/). Discovered in my access logs.
- [Semantic Scholar](https://www.semanticscholar.org/): a search engine by the Allen Institute for AI focused on academic PDFs, with a couple hundred million papers indexed. Discovered in my access logs.
Quor
: Seems to mainly index large news sites. Site is down as of June 2021; originally available at www dot quor dot com.
- [Bonzamate](https://bonzamate.com.au/): a search engine specifically for Australian websites. Boyter wrote [an interesting blog post about Bonzamate](https://boyter.org/posts/abusing-aws-to-make-a-search-engine/).
[Semantic Scholar](https://www.semanticscholar.org/)
: A search engine by the Allen Institute for AI focused on academic PDFs, with a couple hundred million papers indexed. Discovered in my access logs.
- [searchcode](https://searchcode.com/): A code-search engine by the developer of Bonzamate. Searches a hand-picked list of code forges for source code, supporting many search operators.
[Bonzamate](<https://bonzamate.com.au/>)
: A search engine specifically for Australian websites. Boyter wrote [an interesting blog post about Bonzamate](https://boyter.org/posts/abusing-aws-to-make-a-search-engine/).
[searchcode](https://searchcode.com/)
: A code-search engine by the developer of Bonzamate. Searches a hand-picked list of code forges for source code, supporting many search operators.
Other languages
---------------
@ -264,20 +317,25 @@ Almost qualified
These engines come close enough to passing my inclusion criteria that I felt I had to mention them. They all display original organic results that you can't find on other engines, and maintain their own indexes. Unfortunately, they don't quite pass.
- Wiby: [wiby.me](https://wiby.me) and [wiby.org](https://wiby.org): I love this one. It focuses on smaller independent sites that capture the spirit of the "early" web. It's more focused on "discovering" new interesting pages that match a set of keywords than finding a specific resources. I like to think of Wiby as an engine for surfing, not searching. Runnaroo occasionally features a hit from Wiby. If you have a small site or blog that isn't very "commercial", consider submitting it to the index. Does not qualify because it seems to be powered only by user-submitted sites; it doesn't try to "crawl the Web".
- [Mwmbl](https://mwmbl.org/): like YaCy, it's an open-source engine whose crawling is community-driven. Users can install a Firefox addon to crawl pages in its backlog. Unfortunately, it doesn't qualify because it only crawls pages linked by hand-picked sites (e.g. Wikipedia, GitHub, domains that rank well on Hacker News). The crawl-depth is "1", so it doesn't crawl the whole Web (yet).
[wiby.me](https://wiby.me) OR [wiby.org](https://wiby.org)
: I love this one. It focuses on smaller independent sites that capture the spirit of the "early" web. It's more focused on "discovering" new interesting pages that match a set of keywords than finding a specific resources. I like to think of Wiby as an engine for surfing, not searching. Runnaroo occasionally features a hit from Wiby. If you have a small site or blog that isn't very "commercial", consider submitting it to the index. Does not qualify because it seems to be powered only by user-submitted sites; it doesn't try to "crawl the Web".
- [Search My Site](https://searchmysite.net): Similar to Marginalia and Teclis, but only indexes user-submitted personal and independent sites. It optionally supports IndieAuth. Its API powers this site's search results; try it out using the search bar at the bottom of this page. Does not qualify because it's limited to user-submitted and/or hand-picked sites.
[Mwmbl](https://mwmbl.org/)
: Like YaCy, it's an open-source engine whose crawling is community-driven. Users can install a Firefox addon to crawl pages in its backlog. Unfortunately, it doesn't qualify because it only crawls pages linked by hand-picked sites (e.g. Wikipedia, GitHub, domains that rank well on Hacker News). The crawl-depth is "1", so it doesn't crawl the whole Web (yet).
[Search My Site](https://searchmysite.net)
: Similar to Marginalia and Teclis, but only indexes user-submitted personal and independent sites. It optionally supports IndieAuth. Its API powers this site's search results; try it out using the search bar at the bottom of this page. Does not qualify because it's limited to user-submitted and/or hand-picked sites.
Misc
----
- Ask.com: The site is back. They claim to outsource search results. The results seem similar to Google, Bing, and Yandex; however, I cant pinpoint exactly where their results are coming from. Also, several sites from the "ask.com network" such as directhit.com, info.com, and kensaq.com have uniqe-looking results.
- Not evaluated: Apple's search. It's only accessible through a search widget in iOS and macOS and shows very few results. This might change; see the next section.
Ask.com
: The site is back. They claim to outsource search results. The results seem similar to Google, Bing, and Yandex; however, I cant pinpoint exactly where their results are coming from. Also, several sites from the "ask.com network" such as directhit.com, info.com, and kensaq.com have uniqe-looking results.
- Partially evaluated: [Infinity Search](https://infinitysearch.co): young, small index. It recently split into a paid offering with the main index and [Infinity Decentralized](https://infinitydecentralized.com/), the latter of which allows users to select from community-hosted crawlers. I managed to try it out before it became a paid offering, and it seemed decent; however, I wasn't able to run the tests listed in the "Methodology" section. Allows submitting URLs and sitemaps into a text box, no other work required.
[Infinity Search](https://infinitysearch.co)
: Partially evaluated. Young, small index. It recently split into a paid offering with the main index and [Infinity Decentralized](https://infinitydecentralized.com/), the latter of which allows users to select from community-hosted crawlers. I managed to try it out before it became a paid offering, and it seemed decent; however, I wasn't able to run the tests listed in the "Methodology" section. Allows submitting URLs and sitemaps into a text box, no other work required.
Search engines without a web interface
--------------------------------------
@ -293,13 +351,18 @@ Graveyard
These engines were originally included in the article, but have since been discontinued.
- [wbsrch](https://xangis.com/the-wbsrch-experiment/): In addition to its generalist search, it also had many other utilities related to domain name statistics. Failed multiple tests. Its index was a bit dated; it had an old backlog of sites it hadn't finished indexing. It also had several dedicated per-language indexes.
- [Gowiki](https://web.archive.org/web/20211226043304/https://www.gowiki.com/): Very young, small index, but showed promise. I discovered this in the seirdy.one access logs. It was only available in the US. Seems down as of early 2022.
[wbsrch](https://xangis.com/the-wbsrch-experiment/)
: In addition to its generalist search, it also had many other utilities related to domain name statistics. Failed multiple tests. Its index was a bit dated; it had an old backlog of sites it hadn't finished indexing. It also had several dedicated per-language indexes.
- [Meorca](https://web.archive.org/web/20220429143153/https://www.meorca.com/search/): A UK-based search engine that claims not to "index pornography or illegal content websites". It also features an optional social network ("blog"). Discovered in the seirdy.one access logs.
[Gowiki](https://web.archive.org/web/20211226043304/https://www.gowiki.com/)
: Very young, small index, but showed promise. I discovered this in the seirdy.one access logs. It was only available in the US. Seems down as of early 2022.
- [Ninfex](https://web.archive.org/web/20220624172257/https://ninfex.com/): a "people-powered" search engine that combines aspects of link aggregators and search. It lets users vote on submissions and it also displays links to forums about submissions.
[Meorca](https://web.archive.org/web/20220429143153/https://www.meorca.com/search/)
: A UK-based search engine that claims not to "index pornography or illegal content websites". It also features an optional social network ("blog"). Discovered in the seirdy.one access logs.
[Ninfex](https://web.archive.org/web/20220624172257/https://ninfex.com/)
: A "people-powered" search engine that combines aspects of link aggregators and search. It lets users vote on submissions and it also displays links to forums about submissions.
Exclusions
----------

View file

@ -813,7 +813,7 @@ Some users have trouble when controls have a different look, color, or shape tha
</blockquote>
{{< quotecaption partOfType="TechArticle" >}}
<cite itemprop="name headline">Making Content Usable for People with Cognitive and Learning Disabilities</cite>, section 4.2.5.3:
<cite itemprop="name headline" class="p-name">Making Content Usable for People with Cognitive and Learning Disabilities</cite>, section 4.2.5.3:
<a href="https://www.w3.org/TR/coga-usable/#how-it-helps-3">Clearly Identify Controls and Their Use: How it Helps</a>
{{< /quotecaption >}}
{{</quotation>}}
@ -842,7 +842,7 @@ When this is not possible, provide instructions that explain how to use the cont
</blockquote>
{{< quotecaption partOfType="TechArticle" >}}
<cite itemprop="name headline">Making Content Usable for People with Cognitive and Learning Disabilities</cite>, section 4.2.5.2:
<cite itemprop="name headline" class="p-name">Making Content Usable for People with Cognitive and Learning Disabilities</cite>, section 4.2.5.2:
<a href="https://www.w3.org/TR/coga-usable/#what-to-do-3">Clearly Identify Controls and Their Use: What to Do</a>
{{< /quotecaption >}}
{{</quotation>}}
@ -1169,7 +1169,7 @@ This includes:
</blockquote>
{{< quotecaption partOfType="TechArticle" >}}
<cite itemprop="name headline">Making Content Usable for People with Cognitive and Learning Disabilities</cite>, section 4.3.1:
<cite itemprop="name headline" class="p-name">Making Content Usable for People with Cognitive and Learning Disabilities</cite>, section 4.3.1:
<a href="https://www.w3.org/TR/coga-usable/#what-to-do-6">Make it Easy to Find the Most Important Tasks and Features of the Site</a>
{{< /quotecaption >}}
{{</quotation>}}

View file

@ -2,6 +2,7 @@
{{- $opus := resources.GetMatch (printf "/a/%s.opus" $name) -}}
{{- $mp3 := resources.GetMatch (printf "/a/%s.mp3" $name) -}}
{{- $isTranscribed := false -}}
{{- $download_url := "" -}}
{{- /* preloading an in-page audio asset under 32 KiB is fine. */ -}}
{{- $shouldPreload := lt (len $opus.Content) 32768 -}}
<audio
@ -19,21 +20,24 @@
{{- end -}}
>
{{ with $opus -}}
{{- $download_url = $opus.RelPermalink -}}
{{ $opus_src := . | resources.Fingerprint "md5" -}}
<source
src="{{ $opus_src.RelPermalink }}"
type='audio/ogg; codecs="opus"' />
{{ end -}}
{{ with $mp3 -}}
{{- $download_url = $mp3.RelPermalink -}}
{{ $mp3_src := . | resources.Fingerprint "md5" -}}
<source
src="{{ $mp3_src.RelPermalink }}"
type="audio/mpeg" />
<p role="note">Your browser does not support HTML5 audio. Heres a <a
href="{{ $mp3_src.RelPermalink }}"
{{ if $isTranscribed -}}
itemprop="audio contentUrl"
{{- end -}}
>link to download the audio file <samp translate="no">{{ $name }}.mp3</samp></a> instead.</p>
<p role="note">Your browser does not support HTML5 audio.</p>
{{ end -}}
</audio>{{- /* Strip trailing newline: https://github.com/gohugoio/hugo/issues/1753 */ -}}
</audio>
<a
href="{{ $download_url }}" download=""
{{ if $isTranscribed -}}
itemprop="audio contentUrl"
{{- end -}}>Download audio file <samp translate="no">{{ $name }}.mp3</samp></a>
{{- /* Strip trailing newline: https://github.com/gohugoio/hugo/issues/1753 */ -}}

View file

@ -44,14 +44,15 @@
"ignore": [
"time",
"picture",
"meta[name=color-scheme]",
"meta[name=theme-color]",
"img[decoding]",
"a[download]",
"a[referrerpolicy]",
"code[translate]",
"a[translate]",
"code[translate]",
"samp[translate]",
"span[translate]",
"samp[translate]"
"img[decoding]",
"meta[name=color-scheme]",
"meta[name=theme-color]"
]
}
],