mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-11-27 06:12:09 +00:00

Compare commits

4 commits

Author  SHA1        Message                      Date
Seirdy  8d712639ea  Move Blog Surf to graveyard  2024-10-23 01:39:41 -04:00
Seirdy  ecdb95b5eb  Add upcoming search engine   2024-10-23 01:31:50 -04:00
Seirdy  1da6fc470e  Add fediverse:creator tag    2024-10-23 00:29:16 -04:00
Seirdy  aea602e88a  Block webz GenAI scraping    2024-10-23 00:09:46 -04:00
5 changed files with 13 additions and 5 deletions

@@ -35,6 +35,7 @@ first = "Rohan"
 last = "Kumar"
 nick = "Seirdy"
 email = "seirdy@seirdy.one"
+fediverse = "@Seirdy@pleroma.envs.net"
 [menu]
 [[menu.main]]

@@ -316,7 +316,6 @@ These engines come close enough to passing my inclusion criteria that I felt I h
 * Wiby: I love this one. It focuses on smaller independent sites that capture the spirit of the “early” web. It’s more focused on “discovering” new interesting pages that match a set of keywords than finding a specific resources. I like to think of Wiby as an engine for surfing, not searching. Runnaroo occasionally featured a hit from Wiby (Runnaroo has since shut down). If you have a small site or blog that isn’t very “commercial”, consider submitting it to the index. Does not qualify because it seems to be powered only by user-submitted sites; it doesn't try to "crawl the Web".
 * Mwmbl: like YaCy, it's an open-source engine whose crawling is community-driven. Users can install a Firefox addon to crawl pages in its backlog. Unfortunately, it doesn't qualify because it only crawls pages linked by hand-picked sites (e.g. Wikipedia, GitHub, domains that rank well on Hacker News). The crawl-depth is "1", so it doesn't crawl the whole Web (yet).
 * Search My Site: Similar to Marginalia and Teclis, but only indexes user-submitted personal and independent sites. It optionally supports IndieAuth. Its API powers this site's search results; try it out using the search bar at the bottom of this page. Does not qualify because it's limited to user-submitted and/or hand-picked sites.
-* Blog Surf: a search engine for blogs with RSS/Atom feeds. Does not qualify because all blogs submitted to the index require manual review, but it seems interesting. Its "MarketRank" algorithm seems to give it a bias towards sites popular on "Hacker" "News".
 * Kukei.eu: a curated search engine for web developers, which crawls a hand-picked list of sites. As it does not index the whole Web, it doesn't qualify. I still find it interesting.
 Unobtanium Search: A fledgling search engine by Slatian. At the time of writing, it crawls hand-curated sites: personal, technical, indie wiki, and German hacker community sites. It may eventually crawl government/public-service sites. More documentation will be on its website.
@@ -324,7 +323,6 @@ Unobtanium Search: A fledgling search engine by Slatian. At the time of writing,
 => https://wiby.me wiby.me
 => https://mwmbl.org/ Mwmbl
 => https://searchmysite.net Search My site
-=> https://blogsurf.io/ Blog Surf
 => https://kukei.eu/ Kukei.eu
 => https://unobtanium.rocks/
@@ -393,16 +391,19 @@ These engines were originally included in the article, but have since been disco
 * Siik: Lacked contact info, and the ToS and Privacy Policy links were dead. Seemed to have PHP errors in the backend for some of its instant-answer widgets. If you scrolled past all that, you'd find web results powered by what seems to be its own index. These results did tend to be somewhat relevant, but the index seemed too small for more specific queries.
 * Parsijoo: Persian search engine
 * Moose.at: German (Austria-based). The site is still up but redirects searches to Brave.
+* Blog Surf: a search engine for blogs with RSS/Atom feeds. Originally in "almost qualified". It did not qualify because all blogs submitted to the index require manual review, but it seemed interesting. Its "MarketRank" algorithm gave it a bias towards sites popular on "Hacker" "News".
 => https://web.archive.org/web/20221002041725/https://siik.co/ Siik
 => https://www.parsijoo.ir/ Parsijoo
 => https://www.moose.at Moose.at
+=> https://blogsurf.io/ Blog Surf
 ## Upcoming engines
 => https://cyberfind.net/bot.html Cyberfind
 => https://fynd.bot/ fynd
 => https://www.wepch.com/search-engine Wepch Search Engine
+=> https://www.weblogdb.com/ Weblog DataBase
 ## Exclusions

@@ -355,9 +355,6 @@ These engines come close enough to passing my inclusion criteria that I felt I h
 [Search My Site](https://searchmysite.net)
 : Similar to Marginalia and Teclis, but only indexes user-submitted personal and independent sites. It optionally supports IndieAuth. Its API powers this site's search results; try it out using the search bar at the bottom of this page. Does not qualify because it's limited to user-submitted and/or hand-picked sites.
-[Blog Surf](https://blogsurf.io/)
-: A search engine for blogs with RSS/Atom feeds. Does not qualify because all blogs submitted to the index require manual review, but it seems interesting. Its "MarketRank" algorithm seems to give it a bias towards sites popular on "Hacker" "News".
 [Kukei.eu](https://kukei.eu/)
 : A curated search engine for web developers, which crawls [a hand-picked list of sites](https://github.com/Kukei-eu/spider/blob/914b8dfffc10cb3a948561aef2bf86937d3a0b2e/index-sources.js). As it does not index the whole Web, it doesn't qualify. I still find it interesting.
@@ -419,6 +416,9 @@ websearchengine.org OR tuxdex.com
 [Siik](https://web.archive.org/web/20221002041725/https://siik.co/)
 : Lacked contact info, and the ToS and Privacy Policy links were dead. Seemed to have PHP errors in the backend for some of its instant-answer widgets. If you scrolled past all that, you'd find web results powered by what seems to be its own index. These results did tend to be somewhat relevant, but the index seemed too small for more specific queries.
+[Blog Surf](https://blogsurf.io/)
+: A search engine for blogs with RSS/Atom feeds. Originally in "almost qualified". It did not qualify because all blogs submitted to the index require manual review, but it seemed interesting. Its "MarketRank" algorithm gave it a bias towards sites popular on "Hacker" "News".
 Dead engines I don't have an extended description for:
 - [Parsijoo](https://www.parsijoo.ir/): Persian search engine.
@@ -430,6 +430,7 @@ Dead engines I don't have an extended description for:
 - [Cyberfind/find.tf](https://cyberfind.net/bot.html)
 - [fynd](https://fynd.bot/)
 - [Wepch](https://www.wepch.com/search-engine)
+- [Weblog DataBase](https://www.weblogdb.com/)
 ## Exclusions

@@ -54,6 +54,7 @@
 {{- end -}}
 <meta name="description" content="{{ $description }}" />
 <meta name="author" content="{{ .Site.Author.name }}" />
+<meta name="fediverse:creator" content="{{ .Site.Author.fediverse }}" />
 <meta property="article:author" content="{{ .Site.Author.name }}" />
 {{ if and (gt .Date 0) (not .Params.evergreen) -}}
 <meta property="article:published_time" content="{{ .Date.UTC.Format "2006-01-02T15:04:05Z07:00" }}" />
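The template line above emits a `fediverse:creator` meta tag from the `fediverse` value added in the config hunk. Fediverse software can use this tag to credit an article's author on link previews. The sketch below shows how a consumer might read the tag from rendered HTML using only Python's standard `html.parser`. It is a minimal sketch, not any particular server's implementation. `SAMPLE_HEAD` is a hypothetical rendering, and it assumes the template receives `@Seirdy@pleroma.envs.net` as `.Site.Author.fediverse`.

```python
from html.parser import HTMLParser


class FediverseCreatorParser(HTMLParser):
    """Collects the content of every <meta name="fediverse:creator"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.creators: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names. Self-closing
        # <meta ... /> tags also arrive here, because the default
        # handle_startendtag() calls handle_starttag().
        if tag != "meta":
            return
        attributes = dict(attrs)
        content = attributes.get("content")
        if attributes.get("name") == "fediverse:creator" and content:
            self.creators.append(content)


def fediverse_creators(html: str) -> list[str]:
    """Return all fediverse:creator handles found in `html`."""
    parser = FediverseCreatorParser()
    parser.feed(html)
    parser.close()
    return parser.creators


# Hypothetical rendered <head> fragment (assumed, not taken from the site).
SAMPLE_HEAD = """\
<meta name="description" content="Example page" />
<meta name="fediverse:creator" content="@Seirdy@pleroma.envs.net" />
"""

if __name__ == "__main__":
    print(fediverse_creators(SAMPLE_HEAD))
```

Running it prints `['@Seirdy@pleroma.envs.net']`. Pages without the tag yield an empty list.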

@@ -129,6 +129,10 @@ Disallow: /
 User-agent: Cotoyogi
 Disallow: /
+# https://webz.io/bot.html
+User-agent: Webzio-extended
+Disallow: /
 # I'm not blocking CCBot for now. It publishes a free index for anyone to use.
 # Googe used this to train the initial version of Bard (now called Gemini).
 # I allow CCBot since its index is also used for upstart/hobbyist search engines
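The robots.txt hunk adds a group that disallows every path for the `Webzio-extended` user agent and leaves CCBot unblocked. One way to sanity-check rules like these is Python's standard-library `urllib.robotparser`. This sketch parses only the lines visible in the hunk, so the excerpt is an assumption: the rest of the file is not shown here. The `/posts/` path is likewise a hypothetical example URL. Note that `robotparser` uses a simplified prefix matcher, which may not match how every crawler interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the rules visible in the robots.txt hunk. The real file has
# more groups that this sketch does not reproduce.
ROBOTS_EXCERPT = """\
User-agent: Cotoyogi
Disallow: /

# https://webz.io/bot.html
User-agent: Webzio-extended
Disallow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the excerpt."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded,
    # so can_fetch() works without fetching anything over the network.
    parser.parse(ROBOTS_EXCERPT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical page URL used only for illustration.
    for agent in ("Webzio-extended", "Cotoyogi", "CCBot"):
        print(agent, allowed(agent, "https://seirdy.one/posts/"))
```

This prints `Webzio-extended False`, `Cotoyogi False`, and `CCBot True`. An agent with no matching group (and no `User-agent: *` group in the excerpt) is allowed by default.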