mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-11-27 22:12:10 +00:00

Compare commits

3 commits

Author       SHA1        Message                                 Date
Rohan Kumar  a4694fbaf9  Typo                                    2022-09-16 21:20:56 -07:00
Rohan Kumar  0fb45529c7  Fix webrings and webmentions            2022-09-16 20:24:43 -07:00
Rohan Kumar  931229e5e3  Update info on existing search engines  2022-09-13 22:18:23 -07:00
    - Infotiger and seekport are improving.
    - Right Dao seems to be missing recent results compared to others in its
      section.
4 changed files with 21 additions and 21 deletions

View file

@@ -79,7 +79,7 @@ Google, Bing, and Yandex support structured data such as microformats1, microdat
These engines pass most of the tests listed in the "methodology" section. All of them seem relatively privacy-friendly. I wouldn't recommend using these engines to find specific answers; they're better for learning about a topic by finding interesting pages related to a set of keywords.
* Right Dao: very fast, good results. Passes the tests fairly well. It plans on including query-based ads if/when its userbase grows.⁸ For the past few months, its index seems to have focused more on large, established sites rather than smaller, independent ones.
* Right Dao: very fast, good results. Passes the tests fairly well. It plans on including query-based ads if/when its userbase grows.⁸ For the past few months, its index seems to have focused more on large, established sites rather than smaller, independent ones. It seems to be a bit lacking in more recent pages.
=> https://rightdao.com Right Dao
* Gigablast: It’s been around for a while and also sports a classic web directory. Searches are a bit slow, and it charges to submit sites for crawling. It powers Private.sh. Gigablast is tied with Right Dao for quality.
@@ -104,16 +104,17 @@ Yep supports Open Graph and some JSON-LD at the moment. A look through the sourc
These engines fail badly at a few important tests. Otherwise, they seem to work well enough.
* seekport: The interface is in German but it supports searching in English just fine. The default language is selected by your locale. It’s really good considering its small index; it hasn’t heard of less common terms (e.g. “Seirdy”), but it’s able to find relevant results in other tests. The server does not support TLS.
* Infotiger: My favorite engine in this section. It offers advanced result filtering and sports a somewhat large index. It allows site submission for English and German pages. The fastest-improving engine in this section; I look forward to the day it "graduates" to the previous section. Infotiger also has a Tor hidden service.
* seekport: The interface is in German but it supports searching in English just fine. The default language is selected by your locale. It’s really good considering its small index; it hasn’t heard of less common terms. but it’s able to find relevant results in other tests. It's the second-fastest-improving engines in this section.
* Exalead: slow, quality is hit-and-miss. Its indexer claims to crawl the DMOZ directory, which has since shut down and been replaced by the Curlie directory. No relevant results for “Oppenheimer” and some other history-related queries. Allows submitting individual URLs for indexing, but requires solving a Google reCAPTCHA and entering an email address.
* ExactSeek: small index, disproportionately dominated by big sites. Failed multiple tests. Allows submitting individual URLs for crawling, but requires entering an email address and receiving a newsletter. Webmaster tools seem to heavily push for paid SEO options. It also powers SitesOnDisplay and Blog-search.com.
* Infotiger: A small index that seems to find relevant results. It allows site submission for English and German pages. It also features a "similarity" search to query pages similar to a given link, with mixed results.
=> https://infotiger.com/ Infotiger
=> http://infotiger4xywbfq45mvd5drh43jpqeurakg2ya7gqwvjf2bbwnixzqd.onion/ Infotiger hidden service
=> http://www.seekport.com/ seekport (HTTP only)
=> https://www.exalead.com/search/ Exalead
=> https://curlie.org Curlie
=> https://www.exactseek.com/ ExactSeek
=> https://alpha.infotiger.com/ Infotiger
* Burf.co: Very small index, but seems fine at ranking more relevant results higher. Allows site submission without any extra steps.
* Entfer: a newcomer that lets registered users upvote/downvote search results to customize ranking. Doesn't offer much information about who made it. Its index is small, but it does seem to return results related to the query.
@@ -380,7 +381,7 @@ I find new engines by:
### Criteria for inclusion
Engines in this list should have their own indexes powered by by web crawlers. Original results should not be limited to a set of websites hand-picked by the engine creators; indexes should be built from sites from across the Web. An engine should discover new interesting places around the Web.
Engines in this list should have their own indexes powered by web crawlers. Original results should not be limited to a set of websites hand-picked by the engine creators; indexes should be built from sites from across the Web. An engine should discover new interesting places around the Web.
Here's an oversimplified example to illustrate what I'm looking for: imagine somone self-hosts their own personal or interest-specific website and happens to get some recognition. Could they get *automatically* discovered by your crawler, indexed, and included in the first page of results for a certain query?

View file

@@ -130,11 +130,13 @@ Yep supports Open Graph and some JSON-LD at the moment. A look through the sourc
### Smaller indexes, hit-and-<wbr />miss {#smaller-indexes-hit-and-miss}
These engines fail badly at a few important tests. Otherwise, they seem to work well enough.
These engines fail badly at a few important tests. Otherwise, they seem to work well enough for users who'd like some more serendipity in less-specific searches.
[Infotiger](https://alpha.infotiger.com/)
: My favorite engine in this section. It offers advanced result filtering and sports a somewhat large index. It allows site submission for English and German pages. The fastest-improving engine in this section: I use it often to discover new sites, and look forward to the day it "graduates" to the previous section. [Infotier has a Tor hidden service](http://infotiger4xywbfq45mvd5drh43jpqeurakg2ya7gqwvjf2bbwnixzqd.onion/).
[seekport](http://www.seekport.com/)
: The interface is in German but it supports searching in English just fine. The default language is selected by your locale. It's really good considering its small index; it hasn't heard of less common terms (e.g. "Seirdy"), but it's able to find relevant results in other tests.
: The interface is in German but it supports searching in English just fine. The default language is selected by your locale. It’s really good considering its small index; it hasn’t heard of less common terms. but it’s able to find relevant results in other tests. It's the second-fastest-improving engines in this section.
[Exalead](https://www.exalead.com/search/)
: Slow, quality is hit-and-miss. Its indexer claims to crawl the DMOZ directory, which has since shut down and been replaced by the [Curlie](https://curlie.org) directory. No relevant results for "Oppenheimer" and some other history-related queries. Allows submitting individual URLs for indexing, but requires solving a Google reCAPTCHA and entering an email address.
@@ -142,9 +144,6 @@ These engines fail badly at a few important tests. Otherwise, they seem to work
[ExactSeek](https://www.exactseek.com/)
: Small index, disproportionately dominated by big sites. Failed multiple tests. Allows submitting individual URLs for crawling, but requires entering an email address and receiving a newsletter. Webmaster tools seem to heavily push for paid <abbr title="search-engine optimization">SEO</abbr> options. It also powers SitesOnDisplay and [Blog-<wbr />search.com](https://www.blog-search.com).
[Infotiger](https://alpha.infotiger.com/)
: A small index that seems to find relevant results. It allows site submission for English and German pages. It also features a "similarity" search to query pages similar to a given link, with mixed results.
[Burf.co](https://burf.co/)
: Very small index, but seems fine at ranking more relevant results higher. Allows site submission without any extra steps.
@@ -409,7 +408,7 @@ I find new engines by:
### Criteria for inclusion
Engines in this list should have their own indexes powered by by web crawlers. Original results should not be limited to a set of websites hand-picked by the engine creators; indexes should be built from sites from across the Web. An engine should discover new interesting places around the Web.
Engines in this list should have their own indexes powered by web crawlers. Original results should not be limited to a set of websites hand-picked by the engine creators; indexes should be built from sites from across the Web. An engine should discover new interesting places around the Web.
Here's an oversimplified example to illustrate what I'm looking for: imagine somone self-hosts their own personal or interest-specific website and happens to get some recognition. Could they get _automatically_ discovered by your crawler, indexed, and included in the first page of results for a certain query?

View file

@@ -81,7 +81,8 @@
{{- if findRE `^https://brid.gy/[^/]*/mastodon` $webmention.source -}}
<p role="doc-tip" itemprop="accessibilitySummary">This comment may have major formatting errors that could impact screen reader comprehension.</p>
{{- end -}}
<p><q itemprop="text" class="p-content">{{ $webmention.content | replaceRE `^@Seirdy(@pleroma.envs.net)? ?` ""}}</q></p>
<p><q itemprop="text" class="p-content">{{ $webmention.content | replaceRE `^@Seirdy(@pleroma.envs.net)? ?` "" | replaceRE ` \n` `
`}}</q></p>
{{- end -}}
{{- end }}
</dd>

View file

@@ -89,11 +89,10 @@ netizens() {
printf 'Netizens,'
{
curl -sSL --compressed https://netizensring.link/onionring-variables.js \
| grep -C 1 https://seirdy.one/ | sd -s '];' "'https://netizensring.link/,'"
echo "'null',"
} | sd 'https://seirdy.one/,?' 'https://netizensring.link/,' \
| sd "\n|'" '' | trim_trailing_comma
echo
| grep -C 1 https://seirdy.one/
} | sd 'https://seirdy.one/,?' 'https://netizensring.link/' \
| sd "\n|'|\r" '' | trim_trailing_comma
echo ',null'
}
print_csv_values() {