Mirror of https://git.sr.ht/~seirdy/seirdy.one, synced 2024-11-23 04:42:10 +00:00
Compare commits: 26484342a2...8d712639ea (4 commits)
Author | SHA1 | Date | |
---|---|---|---|
| 8d712639ea | | |
| ecdb95b5eb | | |
| 1da6fc470e | | |
| aea602e88a | | |
5 changed files with 13 additions and 5 deletions
@@ -35,6 +35,7 @@ first = "Rohan"
last = "Kumar"
nick = "Seirdy"
email = "seirdy@seirdy.one"
fediverse = "@Seirdy@pleroma.envs.net"

[menu]
[[menu.main]]

@@ -316,7 +316,6 @@ These engines come close enough to passing my inclusion criteria that I felt I h
* Wiby: I love this one. It focuses on smaller independent sites that capture the spirit of the “early” web. It’s more focused on “discovering” new interesting pages that match a set of keywords than finding specific resources. I like to think of Wiby as an engine for surfing, not searching. Runnaroo occasionally featured a hit from Wiby (Runnaroo has since shut down). If you have a small site or blog that isn’t very “commercial”, consider submitting it to the index. Does not qualify because it seems to be powered only by user-submitted sites; it doesn't try to "crawl the Web".
* Mwmbl: like YaCy, it's an open-source engine whose crawling is community-driven. Users can install a Firefox addon to crawl pages in its backlog. Unfortunately, it doesn't qualify because it only crawls pages linked by hand-picked sites (e.g. Wikipedia, GitHub, domains that rank well on Hacker News). The crawl-depth is "1", so it doesn't crawl the whole Web (yet).
* Search My Site: Similar to Marginalia and Teclis, but only indexes user-submitted personal and independent sites. It optionally supports IndieAuth. Its API powers this site's search results; try it out using the search bar at the bottom of this page. Does not qualify because it's limited to user-submitted and/or hand-picked sites.
* Blog Surf: a search engine for blogs with RSS/Atom feeds. Does not qualify because all blogs submitted to the index require manual review, but it seems interesting. Its "MarketRank" algorithm seems to give it a bias towards sites popular on "Hacker" "News".
* Kukei.eu: a curated search engine for web developers, which crawls a hand-picked list of sites. As it does not index the whole Web, it doesn't qualify. I still find it interesting.
Unobtanium Search: A fledgling search engine by Slatian. At the time of writing, it crawls hand-curated sites: personal, technical, indie wiki, and German hacker community sites. It may eventually crawl government/public-service sites. More documentation will be on its website.

@@ -324,7 +323,6 @@ Unobtanium Search: A fledgling search engine by Slatian. At the time of writing,
=> https://wiby.me wiby.me
=> https://mwmbl.org/ Mwmbl
=> https://searchmysite.net Search My Site
=> https://blogsurf.io/ Blog Surf
=> https://kukei.eu/ Kukei.eu
=> https://unobtanium.rocks/

@@ -393,16 +391,19 @@ These engines were originally included in the article, but have since been disco
* Siik: Lacked contact info, and the ToS and Privacy Policy links were dead. Seemed to have PHP errors in the backend for some of its instant-answer widgets. If you scrolled past all that, you'd find web results powered by what seems to be its own index. These results did tend to be somewhat relevant, but the index seemed too small for more specific queries.
* Parsijoo: Persian search engine
* Moose.at: German (Austria-based). The site is still up but redirects searches to Brave.
* Blog Surf: a search engine for blogs with RSS/Atom feeds. Originally in "almost qualified". It did not qualify because all blogs submitted to the index require manual review, but it seemed interesting. Its "MarketRank" algorithm gave it a bias towards sites popular on "Hacker" "News".

=> https://web.archive.org/web/20221002041725/https://siik.co/ Siik
=> https://www.parsijoo.ir/ Parsijoo
=> https://www.moose.at Moose.at
=> https://blogsurf.io/ Blog Surf

## Upcoming engines

=> https://cyberfind.net/bot.html Cyberfind
=> https://fynd.bot/ fynd
=> https://www.wepch.com/search-engine Wepch Search Engine
=> https://www.weblogdb.com/ Weblog DataBase

## Exclusions

@@ -355,9 +355,6 @@ These engines come close enough to passing my inclusion criteria that I felt I h
[Search My Site](https://searchmysite.net)
: Similar to Marginalia and Teclis, but only indexes user-submitted personal and independent sites. It optionally supports IndieAuth. Its API powers this site's search results; try it out using the search bar at the bottom of this page. Does not qualify because it's limited to user-submitted and/or hand-picked sites.

[Blog Surf](https://blogsurf.io/)
: A search engine for blogs with RSS/Atom feeds. Does not qualify because all blogs submitted to the index require manual review, but it seems interesting. Its "MarketRank" algorithm seems to give it a bias towards sites popular on "Hacker" "News".

[Kukei.eu](https://kukei.eu/)
: A curated search engine for web developers, which crawls [a hand-picked list of sites](https://github.com/Kukei-eu/spider/blob/914b8dfffc10cb3a948561aef2bf86937d3a0b2e/index-sources.js). As it does not index the whole Web, it doesn't qualify. I still find it interesting.

@@ -419,6 +416,9 @@ websearchengine.org OR tuxdex.com
[Siik](https://web.archive.org/web/20221002041725/https://siik.co/)
: Lacked contact info, and the ToS and Privacy Policy links were dead. Seemed to have PHP errors in the backend for some of its instant-answer widgets. If you scrolled past all that, you'd find web results powered by what seems to be its own index. These results did tend to be somewhat relevant, but the index seemed too small for more specific queries.

[Blog Surf](https://blogsurf.io/)
: A search engine for blogs with RSS/Atom feeds. Originally in "almost qualified". It did not qualify because all blogs submitted to the index require manual review, but it seemed interesting. Its "MarketRank" algorithm gave it a bias towards sites popular on "Hacker" "News".

Dead engines I don't have an extended description for:

- [Parsijoo](https://www.parsijoo.ir/): Persian search engine.
@@ -430,6 +430,7 @@ Dead engines I don't have an extended description for:
- [Cyberfind/find.tf](https://cyberfind.net/bot.html)
- [fynd](https://fynd.bot/)
- [Wepch](https://www.wepch.com/search-engine)
- [Weblog DataBase](https://www.weblogdb.com/)

## Exclusions

@@ -54,6 +54,7 @@
{{- end -}}
<meta name="description" content="{{ $description }}" />
<meta name="author" content="{{ .Site.Author.name }}" />
<meta name="fediverse:creator" content="{{ .Site.Author.fediverse }}" />
<meta property="article:author" content="{{ .Site.Author.name }}" />
{{ if and (gt .Date 0) (not .Params.evergreen) -}}
<meta property="article:published_time" content="{{ .Date.UTC.Format "2006-01-02T15:04:05Z07:00" }}" />

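The template hunk above emits a `fediverse:creator` meta tag from `.Site.Author.fediverse`. As a rough sketch of how a consumer could read that tag back out of a rendered page, the following Python uses only the standard library. The sample HTML is a hypothetical rendering (the handle is copied from the config hunk, the `article:author` value is a placeholder), and the names `MetaCollector` and `read_meta` are assumptions made for illustration, not part of the site's code.

```python
from html.parser import HTMLParser

# Hypothetical rendered output of the template lines in the hunk above.
SAMPLE_HTML = """
<meta name="fediverse:creator" content="@Seirdy@pleroma.envs.net" />
<meta property="article:author" content="Placeholder Author" />
"""


class MetaCollector(HTMLParser):
    """Collects <meta> tags into a dict keyed by their name or property."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<meta ... />) are also routed here by HTMLParser.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        key = attr_map.get("name") or attr_map.get("property")
        content = attr_map.get("content")
        if key and content is not None:
            self.meta[key] = content


def read_meta(html):
    """Return a mapping of meta name/property to content for the given HTML."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.meta


if __name__ == "__main__":
    print(read_meta(SAMPLE_HTML).get("fediverse:creator"))
```

A check like this could run against the built site to confirm that the new config key actually reaches the page head.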
@@ -129,6 +129,10 @@ Disallow: /
User-agent: Cotoyogi
Disallow: /

# https://webz.io/bot.html
User-agent: Webzio-extended
Disallow: /

# I'm not blocking CCBot for now. It publishes a free index for anyone to use.
# Google used this to train the initial version of Bard (now called Gemini).
# I allow CCBot since its index is also used for upstart/hobbyist search engines
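To see how the rules in the robots.txt hunk above behave, here is a minimal sketch using Python's standard `urllib.robotparser`. It parses only the excerpt visible in the diff, so it assumes no other rule in the full file (for example a `User-agent: *` group) applies to these agents. The helper name `is_allowed` and the test URL are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Only the rules visible in the hunk above; the full file contains more groups.
ROBOTS_EXCERPT = """\
User-agent: Cotoyogi
Disallow: /

# https://webz.io/bot.html
User-agent: Webzio-extended
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative URL; any path on the site is covered by "Disallow: /".
    for agent in ("Cotoyogi", "Webzio-extended", "CCBot"):
        print(agent, is_allowed(ROBOTS_EXCERPT, agent, "https://seirdy.one/"))
```

Within this excerpt, the two listed agents are denied everywhere, while CCBot matches no group and is therefore allowed, which is consistent with the comment about not blocking it.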