mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-11-23 21:02:09 +00:00

Compare commits

No commits in common. "e4e020649d12ee3879c4b26e06e6b25da1a713ab" and "265b032c6a548f4e44cb750104e404d014945a61" have entirely different histories.

4 changed files with 16 additions and 59 deletions

Binary file not shown. Size before: 260 B

Binary file not shown. Size before: 2.4 KiB

@ -42,13 +42,13 @@ Read more about the design of this site in my [site design standards page]({{<re
<a href="https://anybrowser.org/campaign/">{{<picture name="b/any_browser" alt="The text “any browser you like.” next to a light prism." width="162" height="62" class="pix">}}</a> <a href="https://anybrowser.org/campaign/">{{<picture name="b/any_browser" alt="The text “any browser you like.” next to a light prism." width="162" height="62" class="pix">}}</a>
<a href="https://dd-b.net/lynx-enhanced.html">{{<picture name="b/lynx_enh" alt="Lynx Enhanced." width="162" height="62" class="pix">}}</a> <a href="https://dd-b.net/lynx-enhanced.html">{{<picture name="b/lynx_enh" alt="Lynx Enhanced." width="162" height="62" class="pix">}}</a>
<a href="https://www.torproject.org/">{{<picture name="b/tor" alt="The Tor Project." width="162" height="62" class="pix">}}</a>
<a href="https://web.archive.org/web/20230607005614/http://www.ermel.org/handcoded/">{{<picture name="b/handcoded" alt="100% hand-coded HTML." width="162" height="62" class="pix">}}</a> <a href="https://web.archive.org/web/20230607005614/http://www.ermel.org/handcoded/">{{<picture name="b/handcoded" alt="100% hand-coded HTML." width="162" height="62" class="pix">}}</a>
{{<picture name="b/cookie_free" alt="This site is certified 100% cookie free!" width="162" height="62" class="pix">}} {{<picture name="b/cookie_free" alt="This site is certified 100% cookie free!" width="162" height="62" class="pix">}}
{{<picture name="b/javascript-zero" alt="Proudly zero JavaScript!" width="162" height="62" class="pix">}} {{<picture name="b/javascript-zero" alt="Proudly zero JavaScript!" width="162" height="62" class="pix">}}
{{<picture name="b/web11" alt="Web 1.1." width="162" height="62" class="pix">}} {{<picture name="b/web11" alt="Web 1.1." width="162" height="62" class="pix">}}
{{<picture name="b/is_it_slow_say_so" alt="Is it slow? Say so!" width="162" height="62" class="pix">}} {{<picture name="b/is_it_slow_say_so" alt="Is it slow? Say so!" width="162" height="62" class="pix">}}
{{<picture name="b/dark-mode" alt="Made for Dark Mode!" width="162" height="62" class="pix">}} {{<picture name="b/dark-mode" alt="Made for Dark Mode!" width="162" height="62" class="pix">}}
<a href="https://www.w3.org/developers/tools/">{{<picture name="b/heartvalidator" alt="I heart validator." width="162" height="62" class="pix">}}</a>
<a href="https://git.sr.ht/~seirdy/seirdy.one/tree/master/item/linter-configs/vnu_filter.jq">{{<picture name="b/html5" alt="W3C valid HTML5." width="162" height="62" class="pix">}}</a> <a href="https://git.sr.ht/~seirdy/seirdy.one/tree/master/item/linter-configs/vnu_filter.jq">{{<picture name="b/html5" alt="W3C valid HTML5." width="162" height="62" class="pix">}}</a>
<a href="https://git.sr.ht/~seirdy/seirdy.one/tree/master/item/linter-configs/vnu_filter.jq">{{<picture name="b/valid-css" alt="W3C valid CSS." width="162" height="62" class="pix">}}</a> <a href="https://git.sr.ht/~seirdy/seirdy.one/tree/master/item/linter-configs/vnu_filter.jq">{{<picture name="b/valid-css" alt="W3C valid CSS." width="162" height="62" class="pix">}}</a>
<a href="https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fseirdy.one%2Fatom.xml">{{<picture name="b/valid-atom" alt="Valid Atom feed." width="162" height="62" class="pix">}}</a> <a href="https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fseirdy.one%2Fatom.xml">{{<picture name="b/valid-atom" alt="Valid Atom feed." width="162" height="62" class="pix">}}</a>
@ -59,12 +59,11 @@ Read more about the design of this site in my [site design standards page]({{<re
<a href="https://creativecommons.org/licenses/by-sa/4.0/">{{<picture name="b/cc-by-sa" alt="Creative Commons BY-SA license." width="162" height="62" class="pix">}}</a> <a href="https://creativecommons.org/licenses/by-sa/4.0/">{{<picture name="b/cc-by-sa" alt="Creative Commons BY-SA license." width="162" height="62" class="pix">}}</a>
<a href="https://www.gnu.org/licenses/agpl-3.0.en.html">{{<picture name="b/agplv3" alt="AGPL v3: Free Software. It stands for GNU Affero General Public Licence, version 3." width="162" height="62" class="pix">}}</a> <a href="https://www.gnu.org/licenses/agpl-3.0.en.html">{{<picture name="b/agplv3" alt="AGPL v3: Free Software. It stands for GNU Affero General Public Licence, version 3." width="162" height="62" class="pix">}}</a>
### Software that seirdy.one runs on ### Software used by seirdy.one
<a href="https://fedoraproject.org/">{{<picture name="b/fedora" alt="Powered by Fedora™." width="162" height="62" class="pix">}}</a> <a href="https://fedoraproject.org/">{{<picture name="b/fedora" alt="Powered by Fedora™." width="162" height="62" class="pix">}}</a>
<a href="https://nginx.org/">{{<picture name="b/nginx" alt="Nginx powered." width="162" height="62" class="pix">}}</a> <a href="https://nginx.org/">{{<picture name="b/nginx" alt="Nginx powered." width="162" height="62" class="pix">}}</a>
<a href="https://llvm.org/">{{<picture name="b/llvm" alt="LLVM compiler infrastructure." width="162" height="62" class="pix">}}</a> <a href="https://llvm.org/">{{<picture name="b/llvm" alt="LLVM compiler infrastructure." width="162" height="62" class="pix">}}</a>
<a href="https://www.torproject.org/">{{<picture name="b/tor" alt="The Tor Project." width="162" height="62" class="pix">}}</a>
### Notes on "About this site" badges ### Notes on "About this site" badges
@ -150,6 +149,5 @@ The Yesterweb is winding down its social activity after admin burnout, but it re
## Other ## Other
{{<picture name="b/lain" alt="Close-up of Lain Iwakuras eyes with a static filter." width="162" height="62" class="pix">}} {{<picture name="b/lain" alt="Close-up of Lain Iwakuras eyes with a static filter." width="162" height="62" class="pix">}}
<a href="https://ooo.eeeee.ooo/">{{<picture name="b/miku" alt="The text “This site is Miku-approved” next to Hatsune Miku." width="162" height="62" class="pix">}}</a>
{{<picture name="b/graphicdesign" alt="The words “graphic design is my passion” next to a bad drawing of a frog. Sarcasm implied." width="162" height="62" class="pix">}} {{<picture name="b/graphicdesign" alt="The words “graphic design is my passion” next to a bad drawing of a frog. Sarcasm implied." width="162" height="62" class="pix">}}
{{<picture name="b/ilovehorror" alt="I heart horror." width="162" height="62" class="pix">}} {{<picture name="b/ilovehorror" alt="I heart horror." width="162" height="62" class="pix">}}

@ -2,88 +2,52 @@ User-agent: *
Disallow: /noindex/ Disallow: /noindex/
Disallow: /misc/ Disallow: /misc/
# I opt out of online advertising so malware that injects ads on my site won't # I opt out of online advertising so malware that injects ads on my site won't get paid.
# get paid. You should do the same. my ads.txt file contains a standard # You should do the same. my ads.txt file contains a standard placeholder to forbid any
# placeholder to forbid any compliant ad networks from paying for ad placement # compliant ad networks from paying for ad placement on my domain.
# on my domain.
User-Agent: Adsbot User-Agent: Adsbot
Disallow: / Disallow: /
Allow: /ads.txt Allow: /ads.txt
Allow: /app-ads.txt Allow: /app-ads.txt
# Enabling our crawler to access your site offers several significant benefits
# to you as a publisher. By allowing us access, you enable the maximum number
# of advertisers to confidently purchase advertising space on your pages. Our
# comprehensive data insights help advertisers understand the suitability and
# context of your content, ensuring that their ads align with your audience's
# interests and needs. This alignment leads to improved user experiences,
# increased engagement, and ultimately, higher revenue potential for your
# publication. (https://www.peer39.com/crawler-notice)
# --> fuck off.
User-agent: peer39_crawler
User-Agent: peer39_crawler/1.0
Disallow: /
## IP-violation scanners ## ## IP-violation scanners ##
# The next three are borrowed from https://www.videolan.org/robots.txt # The next three are borrowed from https://www.videolan.org/robots.txt
# > This robot collects content from the Internet for the sole purpose of # # > This robot collects content from the Internet for the sole purpose of # helping educational institutions prevent plagiarism. [...] we compare student papers against the content we find on the Internet to see if we # can find similarities. (http://www.turnitin.com/robot/crawlerinfo.html)
# helping educational institutions prevent plagiarism. [...] we compare student
# papers against the content we find on the Internet to see if we # can find
# similarities. (http://www.turnitin.com/robot/crawlerinfo.html)
# --> fuck off. # --> fuck off.
User-Agent: TurnitinBot User-Agent: TurnitinBot
Disallow: / Disallow: /
# > NameProtect engages in crawling activity in search of a wide range of brand # > NameProtect engages in crawling activity in search of a wide range of brand and other intellectual property violations that may be of interest to our clients. (http://www.nameprotect.com/botinfo.html)
# and other intellectual property violations that may be of interest to our
# clients. (http://www.nameprotect.com/botinfo.html)
# --> fuck off. # --> fuck off.
User-Agent: NPBot User-Agent: NPBot
Disallow: / Disallow: /
# iThenticate is a new service we have developed to combat the piracy of # iThenticate is a new service we have developed to combat the piracy of intellectual property and ensure the originality of written work for# publishers, non-profit agencies, corporations, and newspapers. (http://www.slysearch.com/)
# intellectual property and ensure the originality of written work for#
# publishers, non-profit agencies, corporations, and newspapers.
# (http://www.slysearch.com/)
# --> fuck off. # --> fuck off.
User-Agent: SlySearch User-Agent: SlySearch
Disallow: / Disallow: /
# BLEXBot assists internet marketers to get information on the link structure # BLEXBot assists internet marketers to get information on the link structure of sites and their interlinking on the web, to avoid any technical and possible legal issues and improve overall online experience. (http://webmeup-crawler.com/)
# of sites and their interlinking on the web, to avoid any technical and
# possible legal issues and improve overall online experience.
# (http://webmeup-crawler.com/)
# --> fuck off. # --> fuck off.
User-Agent: BLEXBot User-Agent: BLEXBot
Disallow: / Disallow: /
# Providing Intellectual Property professionals with superior brand protection # Providing Intellectual Property professionals with superior brand protection services by artfully merging the latest technology with expert analysis. (https://www.checkmarknetwork.com/spider.html/)
# services by artfully merging the latest technology with expert analysis.
# (https://www.checkmarknetwork.com/spider.html/)
# "The Internet is just way to big to effectively police alone." (ACTUAL quote) # "The Internet is just way to big to effectively police alone." (ACTUAL quote)
# --> fuck off. # --> fuck off.
User-agent: CheckMarkNetwork/1.0 (+https://www.checkmarknetwork.com/spider.html) User-agent: CheckMarkNetwork/1.0 (+https://www.checkmarknetwork.com/spider.html)
Disallow: / Disallow: /
# Stop trademark violations and affiliate non-compliance in paid search. # Stop trademark violations and affiliate non-compliance in paid search. Automatically monitor your partner and affiliates online marketing to protect yourself from harmful brand violations and regulatory risks. We regularly crawl websites on behalf of our clients to ensure content compliance with brand and regulatory guidelines. (https://www.brandverity.com/why-is-brandverity-visiting-me)
# Automatically monitor your partner and affiliates online marketing to
# protect yourself from harmful brand violations and regulatory risks. We
# regularly crawl websites on behalf of our clients to ensure content
# compliance with brand and regulatory guidelines.
# (https://www.brandverity.com/why-is-brandverity-visiting-me)
# --> fuck off. # --> fuck off.
User-agent: BrandVerity/1.0 User-agent: BrandVerity/1.0
Disallow: / Disallow: /
## Misc. icky stuff ## ## Misc. icky stuff ##
# Pipl assembles online identity information from multiple independent sources # Pipl assembles online identity information from multiple independent sources to create the most complete picture of a digital identity and connect it to real people and their offline identity records. When all the fragments of online identity data are collected, connected, and corroborated, the result is a more trustworthy identity.
# to create the most complete picture of a digital identity and connect it to
# real people and their offline identity records. When all the fragments of
# online identity data are collected, connected, and corroborated, the result
# is a more trustworthy identity.
# --> fuck off. # --> fuck off.
User-agent: PiplBot User-agent: PiplBot
Disallow: / Disallow: /
@ -92,6 +56,7 @@ Disallow: /
# Eat shit, OpenAI. # Eat shit, OpenAI.
User-agent: ChatGPT-User User-agent: ChatGPT-User
Disallow: /
User-agent: GPTBot User-agent: GPTBot
Disallow: / Disallow: /
@ -103,15 +68,11 @@ Disallow: /
# There isn't any public documentation for this AFAICT. # There isn't any public documentation for this AFAICT.
# Reuters thinks this works so I might as well give it a shot. # Reuters thinks this works so I might as well give it a shot.
User-agent: anthropic-ai User-agent: anthropic-ai
Disallow: /
User-agent: Claude-Web User-agent: Claude-Web
Disallow: / Disallow: /
# Extremely aggressive crawling with no documentation. people had to email the
# company about this for robots.txt guidance.
User-agent: ClaudeBot
Disallow: /
# FacebookBot crawls public web pages to improve language models for our speech # FacebookBot crawls public web pages to improve language models for our speech recognition technology.
# recognition technology.
# <https://developers.facebook.com/docs/sharing/bot/?_fb_noscript=1> # <https://developers.facebook.com/docs/sharing/bot/?_fb_noscript=1>
User-Agent: FacebookBot User-Agent: FacebookBot
Disallow: / Disallow: /
@ -127,9 +88,7 @@ Disallow: /
# I'm not familiar enough with Omgili to make a call here. # I'm not familiar enough with Omgili to make a call here.
# In the long run, my embedded robots meta-tags and headers could cover gen-AI # In the long run, my embedded robots meta-tags and headers could cover gen-AI
# I don't block cohere-ai or Perplexitybot: they don't appear to actually # I don't block cohere-ai or Perplexitybot: they don't appear to actually scrape data for LLM training purposes. The crawling powers search engines with integrated pre-trained LLMs.
# scrape data for LLM training purposes. The crawling powers search engines
# with integrated pre-trained LLMs.
# TODO: investigate whether YouBot scrapes to train its own in-house LLM. # TODO: investigate whether YouBot scrapes to train its own in-house LLM.
Sitemap: https://seirdy.one/sitemap.xml Sitemap: https://seirdy.one/sitemap.xml
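
A quick way to sanity-check rules like the ones in this robots.txt is to feed them to a parser and ask which user agents may fetch a URL. The sketch below uses Python's standard `urllib.robotparser`. The `ROBOTS_EXCERPT` string and the `is_allowed` helper are illustrative assumptions: the excerpt is hand-assembled from a few groups shown in the diff, not the full file from either commit, and `SomeBrowser` is a made-up agent name. Python's parser applies the first matching rule in a group, which is not how every crawler resolves Allow/Disallow conflicts, so results for groups that mix the two (such as the Adsbot group) may differ from what a real crawler does.

```python
from urllib.robotparser import RobotFileParser

# Hand-assembled excerpt for illustration only (an assumption, not the
# complete robots.txt from either side of the diff).
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /noindex/
Disallow: /misc/

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`.

    Hypothetical helper: parses the text in memory instead of
    downloading a live robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Print the verdict for two blocked crawlers and one generic agent.
    for agent in ("GPTBot", "ClaudeBot", "SomeBrowser"):
        verdict = is_allowed(ROBOTS_EXCERPT, agent, "https://seirdy.one/")
        print(f"{agent}: {'allowed' if verdict else 'blocked'}")
```

Running the same check against each side of the diff would be one way to confirm whether dropping or adding a `User-agent` group changes what a compliant crawler is permitted to fetch. It says nothing about crawlers that ignore robots.txt.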