Typo

Fix webrings and webmentions
Update info on existing search engines
2024-11-27 22:12:10 +00:00 · 2022-09-16 21:20:56 -07:00 · 2022-09-16 20:24:43 -07:00 · 2022-09-13 22:18:23 -07:00
4 changed files with 21 additions and 21 deletions
--- a/content/posts/search-engines-with-own-indexes.gmi
+++ b/content/posts/search-engines-with-own-indexes.gmi
@ -79,7 +79,7 @@ Google, Bing, and Yandex support structured data such as microformats1, microdat

 These engines pass most of the tests listed in the "methodology" section. All of them seem relatively privacy-friendly. I wouldn't recommend using these engines to find specific answers; they're better for learning about a topic by finding interesting pages related to a set of keywords.

-* Right Dao: very fast, good results. Passes the tests fairly well. It plans on including query-based ads if/when its userbase grows.⁸ For the past few months, its index seems to have focused more on large, established sites rather than smaller, independent ones.
+* Right Dao: very fast, good results. Passes the tests fairly well. It plans on including query-based ads if/when its userbase grows.⁸ For the past few months, its index seems to have focused more on large, established sites rather than smaller, independent ones. It seems to be a bit lacking in more recent pages.
 => https://rightdao.com  Right Dao

 * Gigablast: It’s been around for a while and also sports a classic web directory. Searches are a bit slow, and it charges to submit sites for crawling. It powers Private.sh. Gigablast is tied with Right Dao for quality.
@ -104,16 +104,17 @@ Yep supports Open Graph and some JSON-LD at the moment. A look through the sourc

 These engines fail badly at a few important tests. Otherwise, they seem to work well enough.

-* seekport: The interface is in German but it supports searching in English just fine. The default language is selected by your locale. It’s really good considering its small index; it hasn’t heard of less common terms (e.g. “Seirdy”), but it’s able to find relevant results in other tests. The server does not support TLS.
+* Infotiger: My favorite engine in this section. It offers advanced result filtering and sports a somewhat large index. It allows site submission for English and German pages. The fastest-improving engine in this section; I look forward to the day it "graduates" to the previous section. Infotiger also has a Tor hidden service.
+* seekport: The interface is in German but it supports searching in English just fine. The default language is selected by your locale. It’s really good considering its small index; it hasn’t heard of less common terms, but it’s able to find relevant results in other tests. It’s the second-fastest-improving engine in this section.
 * Exalead: slow, quality is hit-and-miss. Its indexer claims to crawl the DMOZ directory, which has since shut down and been replaced by the Curlie directory. No relevant results for “Oppenheimer” and some other history-related queries. Allows submitting individual URLs for indexing, but requires solving a Google reCAPTCHA and entering an email address.
 * ExactSeek: small index, disproportionately dominated by big sites. Failed multiple tests. Allows submitting individual URLs for crawling, but requires entering an email address and receiving a newsletter. Webmaster tools seem to heavily push for paid SEO options. It also powers SitesOnDisplay and Blog-search.com.
-* Infotiger: A small index that seems to find relevant results. It allows site submission for English and German pages. It also features a "similarity" search to query pages similar to a given link, with mixed results.

+=> https://infotiger.com/ Infotiger
+=> http://infotiger4xywbfq45mvd5drh43jpqeurakg2ya7gqwvjf2bbwnixzqd.onion/ Infotiger hidden service
 => http://www.seekport.com/ seekport (HTTP only)
 => https://www.exalead.com/search/ Exalead
 => https://curlie.org Curlie
 => https://www.exactseek.com/ ExactSeek
-=> https://alpha.infotiger.com/ Infotiger

 * Burf.co: Very small index, but seems fine at ranking more relevant results higher. Allows site submission without any extra steps.
 * Entfer: a newcomer that lets registered users upvote/downvote search results to customize ranking. Doesn't offer much information about who made it. Its index is small, but it does seem to return results related to the query.
@ -380,7 +381,7 @@ I find new engines by:

 ### Criteria for inclusion

-Engines in this list should have their own indexes powered by by web crawlers. Original results should not be limited to a set of websites hand-picked by the engine creators; indexes should be built from sites from across the Web. An engine should discover new interesting places around the Web.
+Engines in this list should have their own indexes powered by web crawlers. Original results should not be limited to a set of websites hand-picked by the engine creators; indexes should be built from sites from across the Web. An engine should discover new interesting places around the Web.

 Here's an oversimplified example to illustrate what I'm looking for: imagine someone self-hosts their own personal or interest-specific website and happens to get some recognition. Could they get *automatically* discovered by your crawler, indexed, and included in the first page of results for a certain query?

--- a/content/posts/search-engines-with-own-indexes.md
+++ b/content/posts/search-engines-with-own-indexes.md
@ -130,11 +130,13 @@ Yep supports Open Graph and some JSON-LD at the moment. A look through the sourc

 ### Smaller indexes, hit-and-<wbr />miss {#smaller-indexes-hit-and-miss}

-These engines fail badly at a few important tests. Otherwise, they seem to work well enough.
+These engines fail badly at a few important tests. Otherwise, they seem to work well enough for users who'd like some more serendipity in less-specific searches.

+[Infotiger](https://alpha.infotiger.com/)
+: My favorite engine in this section. It offers advanced result filtering and sports a somewhat large index. It allows site submission for English and German pages. The fastest-improving engine in this section: I use it often to discover new sites, and look forward to the day it "graduates" to the previous section. [Infotiger has a Tor hidden service](http://infotiger4xywbfq45mvd5drh43jpqeurakg2ya7gqwvjf2bbwnixzqd.onion/).

 [seekport](http://www.seekport.com/)
-: The interface is in German but it supports searching in English just fine. The default language is selected by your locale. It's really good considering its small index; it hasn't heard of less common terms (e.g. "Seirdy"), but it's able to find relevant results in other tests.
+: The interface is in German but it supports searching in English just fine. The default language is selected by your locale. It's really good considering its small index; it hasn't heard of less common terms, but it's able to find relevant results in other tests. It's the second-fastest-improving engine in this section.

 [Exalead](https://www.exalead.com/search/)
 : Slow, quality is hit-and-miss. Its indexer claims to crawl the DMOZ directory, which has since shut down and been replaced by the [Curlie](https://curlie.org) directory. No relevant results for "Oppenheimer" and some other history-related queries. Allows submitting individual URLs for indexing, but requires solving a Google reCAPTCHA and entering an email address.
@ -142,9 +144,6 @@ These engines fail badly at a few important tests. Otherwise, they seem to work
 [ExactSeek](https://www.exactseek.com/)
 : Small index, disproportionately dominated by big sites. Failed multiple tests. Allows submitting individual URLs for crawling, but requires entering an email address and receiving a newsletter. Webmaster tools seem to heavily push for paid <abbr title="search-engine optimization">SEO</abbr> options. It also powers SitesOnDisplay and [Blog-<wbr />search.com](https://www.blog-search.com).

-[Infotiger](https://alpha.infotiger.com/)
-: A small index that seems to find relevant results. It allows site submission for English and German pages. It also features a "similarity" search to query pages similar to a given link, with mixed results.
-
 [Burf.co](https://burf.co/)
 : Very small index, but seems fine at ranking more relevant results higher. Allows site submission without any extra steps.

@ -409,7 +408,7 @@ I find new engines by:

 ### Criteria for inclusion

-Engines in this list should have their own indexes powered by by web crawlers. Original results should not be limited to a set of websites hand-picked by the engine creators; indexes should be built from sites from across the Web. An engine should discover new interesting places around the Web.
+Engines in this list should have their own indexes powered by web crawlers. Original results should not be limited to a set of websites hand-picked by the engine creators; indexes should be built from sites from across the Web. An engine should discover new interesting places around the Web.

 Here's an oversimplified example to illustrate what I'm looking for: imagine someone self-hosts their own personal or interest-specific website and happens to get some recognition. Could they get _automatically_ discovered by your crawler, indexed, and included in the first page of results for a certain query?

--- a/layouts/partials/webmentions.html
+++ b/layouts/partials/webmentions.html
@ -81,7 +81,8 @@
 									{{- if findRE `^https://brid.gy/[^/]*/mastodon` $webmention.source -}}
 									<p role="doc-tip" itemprop="accessibilitySummary">This comment may have major formatting errors that could impact screen reader comprehension.</p>
 									{{- end -}}
-									<p><q itemprop="text" class="p-content">{{ $webmention.content | replaceRE `^@Seirdy(@pleroma.envs.net)? ?` ""}}</q></p>
+									<p><q itemprop="text" class="p-content">{{ $webmention.content | replaceRE `^@Seirdy(@pleroma.envs.net)? ?` "" | replaceRE ` \n` `
+`}}</q></p>
 								{{- end -}}
 							{{- end }}
 						</dd>
--- a/scripts/populate-webrings.sh
+++ b/scripts/populate-webrings.sh
@ -89,11 +89,10 @@ netizens() {
 	printf 'Netizens,'
 	{
 		curl -sSL --compressed https://netizensring.link/onionring-variables.js \
-		| grep -C 1 https://seirdy.one/ | sd -s '];' "'https://netizensring.link/,'"
-		echo "'null',"
-	} | sd 'https://seirdy.one/,?' 'https://netizensring.link/,' \
-		| sd "\n|'" '' | trim_trailing_comma 
-		echo
+		| grep -C 1 https://seirdy.one/
+	} | sd 'https://seirdy.one/,?' 'https://netizensring.link/' \
+		| sd "\n|'|\r" '' | trim_trailing_comma 
+		echo ',null'
 }

 print_csv_values() {
Author	SHA1	Message	Date
Rohan Kumar	a4694fbaf9	Typo	2022-09-16 21:20:56 -07:00
Rohan Kumar	0fb45529c7	Fix webrings and webmentions	2022-09-16 20:24:43 -07:00
Rohan Kumar	931229e5e3	Update info on existing search engines - Infotiger and seekport are improving. - Right Dao seems to be missing recent results compared to others in its section.	2022-09-13 22:18:23 -07:00
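For readers tracing the revised `netizens()` hunk in `scripts/populate-webrings.sh`, here is a rough, hypothetical approximation of its pipeline in portable shell, with `sed` and `tr` standing in for `sd`. It is a sketch under stated assumptions, not the actual script: it assumes `onionring-variables.js` lists each member URL as a single-quoted string on its own line, it reads that file from stdin instead of fetching it with `curl`, and it assumes `trim_trailing_comma` strips one trailing comma. The name `netizens_row` is made up for illustration.

```sh
#!/bin/sh
# Hypothetical offline approximation of netizens(): reads the ring's
# onionring-variables.js from stdin rather than downloading it.
# Assumption: each member URL is a single-quoted string on its own line.
netizens_row() {
	# Keep our own entry plus one line of context on each side (grep -C 1),
	# swap our URL for the ring's home page, then delete newlines,
	# single quotes, and carriage returns so everything joins into one line.
	neighbors=$(grep -C 1 'https://seirdy.one/' \
		| sed 's|https://seirdy\.one/|https://netizensring.link/|' \
		| tr -d "\n'\r")
	# Stand-in for trim_trailing_comma (assumed to drop one trailing comma).
	neighbors=${neighbors%,}
	# Emit the CSV row: ring name, previous, home, next, then a null column.
	printf 'Netizens,%s,null\n' "$neighbors"
}
```

For example, piping lines `'https://a.example/',`, `'https://seirdy.one/',`, and `'https://b.example/',` through `netizens_row` should print `Netizens,https://a.example/,https://netizensring.link/,https://b.example/,null`. Like the revised hunk, the sketch does not special-case the site being first or last in the ring's list.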