mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-12-17 22:32:10 +00:00

Compare commits

2 commits

Author | SHA1 | Message | Date
Rohan Kumar | c68fb1f2f8 | Add privacy info to Petal | 2022-06-12 21:52:47 -07:00
Rohan Kumar | 95a1685567 | Kang VLC's robots.txt commentary | 2022-06-12 21:52:28 -07:00
3 changed files with 15 additions and 2 deletions

@@ -64,7 +64,7 @@ These are large engines that pass all my standard tests and more.
 4. Mojeek: Seems privacy-oriented with a large index containing billions of pages. Quality isn't at Google/Bing/Yandex's level, but it's not bad either. If I had to use Mojeek as my default general search engine, I'd live. Partially powers eTools.ch. At this moment, I think that Mojeek is the best alternative to GBY for general web search.
-5. Petal search: A search engine by Huawei that recently switched from searching for Android apps to general search. Despite its surprisingly good results, I wouldn't recommend it due to privacy concerns. Requires an account to submit sites. I discovered this via my access logs. Be aware that in some jurisdictions, it doesn't use its own index: in Russia and some EU regions it uses Yandex and Qwant, respectively.
+5. Petal search: A search engine by Huawei that recently switched from searching for Android apps to general search. Despite its surprisingly good results, I wouldn't recommend it due to privacy concerns: its privacy policy describes advanced fingerprinting metrics, and it doesn't work without JavaScript. Requires an account to submit sites. I discovered this via my access logs. Be aware that in some jurisdictions, it doesn't use its own index: in Russia and some EU regions it uses Yandex and Qwant, respectively.
 => https://petalsearch.com/ petalsearch.com

@@ -93,7 +93,7 @@ These are large engines that pass all my standard tests and more.
 - [Mojeek](https://www.mojeek.com/): Seems privacy-oriented with a large index containing billions of pages. Quality isn't at GBY's level, but it's not bad either. If I had to use Mojeek as my default general search engine, I'd live. Partially powers [eTools.ch](https://www.etools.ch/). At this moment, _I think that Mojeek is the best alternative to GBY_ for general search.
-- [Petal Search](https://petalsearch.com/). A search engine by Huawei that recently switched from searching for Android apps to general search in order to reduce dependence on Western search providers. Despite its surprisingly good results, I wouldn't recommend it due to privacy concerns. Requires an account to submit sites. I discovered this via my access logs. Be aware that in some jurisdictions, it doesn't use its own index: in Russia and some EU regions it uses Yandex and Qwant, respectively.
+- [Petal Search](https://petalsearch.com/). A search engine by Huawei that recently switched from searching for Android apps to general search in order to reduce dependence on Western search providers. Despite its surprisingly good results, I wouldn't recommend it due to privacy concerns: its privacy policy describes advanced fingerprinting metrics, and it doesn't work without JavaScript. Requires an account to submit sites. I discovered this via my access logs. Be aware that in some jurisdictions, it doesn't use its own index: in Russia and some EU regions it uses Yandex and Qwant, respectively.
 Google, Bing, and Yandex support structured data such as microformats1, microdata, RDFa, Open Graph markup, and JSON-LD. Yandex's support for microformats1 is limited; for instance, it can parse `h-card` metadata for organizations but not people. Open Graph and Schema.org are the only supported vocabularies I'm aware of. Mojeek is evaluating structured data; it's interested in Open Graph and Schema.org vocabularies.

@@ -3,15 +3,28 @@ Disallow: /noindex/
 Disallow: /misc/
 Disallow: /webmentions/
+# "This robot collects content from the Internet for the sole purpose of
+# helping educational institutions prevent plagiarism. [...] we compare
+# student papers against the content we find on the Internet to see if we
+# can find similarities." (http://www.turnitin.com/robot/crawlerinfo.html)
+# --> fuck off.
 User-Agent: TurnitinBot
 Disallow: /
+# "NameProtect engages in crawling activity in search of a wide range of
+# brand and other intellectual property violations that may be of interest
+# to our clients." (http://www.nameprotect.com/botinfo.html)
+# --> fuck off.
 User-Agent: NPBot
 Disallow: /
+# "iThenticate is a new service we have developed to combat the piracy of
+# intellectual property and ensure the originality of written work for
+# publishers, non-profit agencies, corporations, and newspapers."
+# (http://www.slysearch.com/)
+# --> fuck off.
 User-Agent: SlySearch
 Disallow: /
+# "BLEXBot assists internet marketers to get information on the link
+# structure of sites and their interlinking on the web, to avoid any
+# technical and possible legal issues and improve overall online
+# experience." (http://webmeup-crawler.com/)
+# --> fuck off.
+User-Agent: BLEXBot
+Disallow: /
 User-Agent: Adsbot
 Disallow: /
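
Rules like these are easy to get subtly wrong, because parsers skip directive names they do not recognize instead of reporting an error. The following is a minimal sketch of a sanity check, assuming Python's standard-library `urllib.robotparser`; the two robots.txt excerpts, the `is_allowed` helper, and the `example.com` URL are hypothetical and only modelled on the rules above. Real crawlers may parse the file differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt modelled on the rules above.
# A blank line separates the two user-agent groups.
CORRECT = """\
User-Agent: BLEXBot
Disallow: /

User-Agent: Adsbot
Disallow: /
"""

# Same excerpt, but with the first directive name misspelled.
MISSPELLED = CORRECT.replace("Disallow: /", "Dissalow: /", 1)


def is_allowed(robots_txt: str, user_agent: str,
               url: str = "https://example.com/") -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for label, text in (("correct", CORRECT), ("misspelled", MISSPELLED)):
        print(label, "-> BLEXBot allowed:", is_allowed(text, "BLEXBot"))
```

With this parser, the misspelled group ends up with no rules, so BLEXBot is reported as allowed even though the intent was to block it. A check like this can run before deploying a changed robots.txt.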