mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-09-19 20:02:10 +00:00

Compare commits

4 commits

Author       SHA1        Message                                 Date
Rohan Kumar  b061e1b63f  Fix bad schema.org markup               2022-06-21 12:59:09 -07:00
Rohan Kumar  70962529bf  Mention paused DDG-Yandex partnership   2022-06-21 09:44:48 -07:00
Rohan Kumar  d0dc380438  Increase cache lifetime for htmltest    2022-06-21 09:42:02 -07:00
Rohan Kumar  eeef22ac61  New engine: searchcode                  2022-06-21 09:41:46 -07:00
3 changed files with 11 additions and 5 deletions

@@ -60,7 +60,8 @@ These are large engines that pass all my standard tests and more.
 3. Yandex: originally a Russian search engine, it now has an English version. Some Russian results bleed into its English site. Like Bing, it allows submitting pages and sitemaps for crawling using the IndexNow API. Powers:
 * Epic Search (went paid-only by June 2021)
-* Occasionally powers DuckDuckGo's link results instead of Bing.
+* Occasionally powers DuckDuckGo's link results instead of Bing. (update: DuckDuckGo has "paused" its partnership with Yandex)
+* Petal for Russian users only.
 4. Mojeek: Seems privacy-oriented with a large index containing billions of pages. Quality isn't at Google/Bing/Yandex's level, but its not bad either. If I had to use Mojeek as my default general search engine, I'd live. Partially powers eTools.ch. At this moment, I think that Mojeek is the best alternative to GBY for general web search.
@@ -222,12 +223,14 @@ These engines try to find a website, typically at the domain-name level. They do
 * Ninfex: a "people-powered" search engine that combines aspects of link aggregators and search. It lets users vote on submissions and it also displays links to forums about submissions.
 * Semantic Scholar: a search engine by the Allen Institute for AI focused on academic PDFs, with a couple hundred million papers indexed. Discovered in my access logs.
 * Bonzamate: a search engine specifically for Australian websites.
+* searchcode: A code-search engine by the developer of Bonzamate. Searches a hand-picked list of code forges for source code, supporting many search operators.
 => https://www.keybot.com/ Keybot Translation Search Machine.
 => https://ninfex.com Ninfex
 => https://www.semanticscholar.org/ Semantic Scholar
 => https://bonzamate.com.au/ Bonzamate
 => https://boyter.org/posts/abusing-aws-to-make-a-search-engine/ Blog post about Bonzamate: "Abuzing AWS to make a search engine".
+=> https://searchcode.com/ searchcode
 ## Other languages
@@ -435,7 +438,7 @@ He also gave me some useful details about Seznam, Naver, Baidu, and Goo:
 ² Matt from Gigablast told me that indexing YouTube or LinkedIn will get you blocked if you aren't Google or Microsoft. I imagine that you could do so by getting special permission if you're a megacorporation.
-³ DuckDuckGo has a crawler called DuckDuckBot. This crawler doesn't impact the linked results displayed; it just grabs favicons and scrapes data for a few instant answers. DuckDuckGo's help pages claim that the engine uses over 400 sources; my interpretation is that at least 398 sources don't impact organic results. I don't think DuckDuckGo is transparent enough about the fact that their organic results are proxied. Compare DuckDuckGo side-by-side with Bing and Yandex and you'll see it's sourcing organic results from one of them (probably Bing). Update 2022: DuckDuckGo has the ability to downrank results on its own; it was previously working with Bing to get Bing to remove misinformation and spam:
+³ DuckDuckGo has a crawler called DuckDuckBot. This crawler doesn't impact the linked results displayed; it just grabs favicons and scrapes data for a few instant answers. DuckDuckGo's help pages claim that the engine uses over 400 sources; my interpretation is that at least 398 sources don't impact organic results. I don't think DuckDuckGo is transparent enough about the fact that their organic results are proxied. Compare DuckDuckGo side-by-side with Bing and you'll see it's sourcing organic results from one of them (probably Bing). Update 2022: DuckDuckGo has the ability to downrank results on its own; it was previously working with Bing to get Bing to remove misinformation and spam:
 => https://web.archive.org/web/20220310222014/https://nitter.pussthecat.org/yegg/status/1501716484761997318 Gabriel Weinberg on Twitter
 => https://www.nytimes.com/2022/02/23/technology/duckduckgo-conspiracy-theories.html DuckDuckGo's prior approach to moderation

@@ -89,7 +89,8 @@ These are large engines that pass all my standard tests and more.
 - Yandex: originally a Russian search engine, it now has an English version. Some Russian results bleed into its English site. Like Bing, it allows submitting pages and sitemaps for crawling using the IndexNow API. Powers:
 - Epic Search (went paid-only as of June 2021)
-- Occasionally powers DuckDuck&shy;Go's link results instead of Bing.
+- Occasionally powers DuckDuck&shy;Go's link results instead of Bing <ins cite="https://energycommerce.house.gov/committee-activity/hearings/hearing-on-holding-big-tech-accountable-legislation-to-protect-online">(update: DuckDuckGo has "paused" its partnership with Yandex, confirmed in {{<mention-work itemtype="Event" itemprop="mentions" role="doc-credit">}}{{<cited-work name="Hearing on “Holding Big Tech Accountable: Legislation to Protect Online Users”" url="https://energycommerce.house.gov/committee-activity/hearings/hearing-on-holding-big-tech-accountable-legislation-to-protect-online" >}}{{</mention-work>}})</ins>
+- Petal, for Russian users only.
 - [Mojeek](https://www.mojeek.com/): Seems privacy-oriented with a large index containing billions of pages. Quality isn't at GBY's level, but its not bad either. If I had to use Mojeek as my default general search engine, I'd live. Partially powers [eTools.ch](https://www.etools.ch/). At this moment, _I think that Mojeek is the best alternative to GBY_ for general search.
@@ -212,6 +213,8 @@ These engines try to find a website, typically at the domain-name level. They do
 - [Bonzamate](https://bonzamate.com.au/): a search engine specifically for Australian websites. Boyter wrote [an interesting blog post about Bonzamate](https://boyter.org/posts/abusing-aws-to-make-a-search-engine/).
+- [searchcode](https://searchcode.com/): A code-search engine by the developer of Bonzamate. Searches a hand-picked list of code forges for source code, supporting many search operators.
 Other languages
 ---------------
@@ -376,7 +379,7 @@ When building webpages, authors need to consider the barriers to entry for a new
 Try a "bad" engine from lower in the list. It might show you utter crap. But every garbage heap has an undiscovered treasure. I'm sure that some hidden gems you'll find will be worth your while. Let's add some serendipity to the SEO-filled Web.
 Acknow&shy;ledgements {#acknowledgements}
--------------------------------
+---------------------
 Some of this content came from the [Search Engine Map](https://www.searchenginemap.com/) and [Search Engine Party](https://searchengine.party/). A few web directories also proved useful.

@@ -1,7 +1,7 @@
 DirectoryPath: "public"
 IgnoreDirs:
 - "search"
-CacheExpires: "72h" # three days
+CacheExpires: "96h" # four days
 CheckFavicon: true
 EnforceHTML5: true
 IgnoreAltMissing: true # an empty alt makes presentation-role explicit, it's not a defect.