mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-11-23 21:02:09 +00:00

Compare commits

...

3 commits

Author SHA1 Message Date
Rohan Kumar
0b64e1cf45
New note: user agents set the terms 2022-08-12 00:28:17 -07:00
Rohan Kumar
f2df224e6c
New engine: Marlo 2022-08-11 21:30:22 -07:00
Rohan Kumar
36c9092073
Fix bad links 2022-08-11 21:30:20 -07:00
6 changed files with 35 additions and 7 deletions

View file

@ -0,0 +1,17 @@
---
title: "User agents set the terms"
date: 2022-08-12T00:27:26-07:00
replyURI: "https://lobste.rs/s/dusuzt/let_websites_framebust_out_native_apps#c_dqolcq"
replyTitle: "I have the freedom to set the terms on which I will offer access to a website of mine."
replyType: "DiscussionForumPosting"
replyAuthor: "James Bennett"
replyAuthorURI: "https://www.b-list.org/about/"
---
The Web is not built around advance informed consent; there's no agreement to terms before downloading a public file (besides basic protocol negotiations). This is one reason why "by using this site, you agree to our cookies, privacy policy, kidney harvesting, etc." notices won't fly under the GDPR.
A website admin can't set terms for downloading a linked document; the user-agent just makes a request and the server works with that data to deny or accept it. There's no obligation for the UA to be honest or accurate.
Ultimately, nobody is forcing you to run a Web server; however, plenty of people have to use the Web. Respect for the UA is part of the agreement you make when joining a UA-centric network.
Should you disagree with the precedent set by the HTML Living Standard, nearly every Web Accessibility Initiative standard (users must be able to override and replace stylesheets, colors, distracting elements), the exceptions to e.g. the Content Security Policy in Webappsec standards to allow UA-initiated script injection, etc.: you're always free to build your own alternative to the Web with your own server-centric standards.

View file

@ -155,11 +155,13 @@ Results from these search engines don't seem at all useful.
* Crawlson: young, slow. In this category because its index has a cap of 10 URLs per domain. I initially discovered Crawlson in the seirdy.one access logs.
* Anoox: Results are few and irrelevant; fails to find any results for basic terms. Allows site submission. It's also a lightweight social network and claims to be powered by its users, letting members vote on listings to alter rankings.
* Yioop!: A FLOSS search engine that boasts a very impressive feature-set: it can parse sitemaps, feeds, and a variety of markup formats; it can import pre-curated data in forms such as access logs, Usenet posts, and WARC archives; it also supports feed-based news search. Despite the impressive feature set, Yioop's results are few and irrelevant due to its small index. It allows submitting sites for crawling. Like Meorca, Yioop has social features such as blogs, wikis, and a chat bot API.
* Marlo: Another FLOSS engine, written in Haskell. Has a small index that's good enough for surfing broad topics, but not good enough for specific research.
=> https://crawlson.com Crawlson
=> https://www.anoox.com/ Anoox
=> https://archive.is/oVAre Plumb CPO
=> https://www.yioop.com Yioop!
=> https://marlo.sandymaguire.me/ Marlo
### Semi-independent indexes

View file

@ -195,6 +195,9 @@ Scopia
[Yioop!](https://www.yioop.com)
: A FLOSS search engine that boasts a very impressive [feature-set](https://www.seekquarry.com/): it can parse sitemaps, feeds, and a variety of markup formats; it can import pre-curated data in forms such as access logs, Usenet posts, and WARC archives; it also supports feed-based news search. Despite the impressive feature set, Yioop's results are few and irrelevant due to its small index. It allows submitting sites for crawling. Like Meorca, Yioop has social features such as blogs, wikis, and a chat bot API.
[Marlo](https://marlo.sandymaguire.me/)
: Another FLOSS engine: [Marlo is written in Haskell]. Has a small index that's good enough for surfing broad topics, but not good enough for specific research.
### Semi-independent indexes
Engines in this category fall back to GBY when their own indexes don't have enough results. As their own indexes grow, some claim that this should happen less often.
@ -261,7 +264,7 @@ Quor
[Semantic Scholar](https://www.semanticscholar.org/)
: A search engine by the Allen Institute for AI focused on academic PDFs, with a couple hundred million papers indexed. Discovered in my access logs.
[Bonzamate](<https://bonzamate.com.au/>)
[Bonzamate](https://bonzamate.com.au/)
: A search engine specifically for Australian websites. Boyter wrote [an interesting blog post about Bonzamate](https://boyter.org/posts/abusing-aws-to-make-a-search-engine/).
[searchcode](https://searchcode.com/)

View file

@ -425,7 +425,7 @@ Long pages with many DOM nodes may benefit from CSS containment, a more recently
Leveraging containment and `content-visibility` is a progressive enhancement, so there aren't any serious implications for older browsers. I use `content-visibility` to defer rendering off-screen entries in my archives. Doing so allows me to serve long archive pages instead of resorting to pagination, with page-length limited only by download size. In my tests using Lighthouse with Chromium Devtools' simulated CPU throttling,[^11] this article rendered faster _with_ containment-enabled CSS than without any custom stylesheets at all.
Using containment for content at the end of the page is relatively safe. Using it for content earlier in the page risks introducing [layout shifts](#layout-shifts). Eliminate the layout shifts by calculating a value for the `contain-intrinsic-size` property. {{<mention-work itemtype="TechArticle">}}{{<cited-work url="https://www.terluinwebdesign.nl/en/css/calculating-contain-intrinsic-size-for-content-visibility/" name="Calculating 'contain-intrinsic-size' for 'content-visibility'" extraName="headline">}}, by {{<indieweb-person first-name="Thijs" last-name="Terluin" url="https://www.terluinwebdesign.nl/en/about-us/thijs-terluin/" org="Teluin Webdesign" org-url="https://www.terluinwebdesign.nl/en/" itemprop="author">}}{{</mention-work>}}, is a comprehensive guide to calculating intrinsic size values.
Using containment for content at the end of the page is relatively safe. Using it for content earlier in the page risks introducing [layout shifts](#layout-shifts). Eliminate the layout shifts by calculating a value for the `contain-intrinsic-size` property. {{<mention-work itemtype="TechArticle">}}{{<cited-work url="https://www.terluinwebdesign.nl/en/css/calculating-contain-intrinsic-size-for-content-visibility/" name="Calculating 'contain-intrinsic-size' for 'content-visibility'" extraName="headline">}}, by {{<indieweb-person first-name="Thijs" last-name="Terluin" url="https://www.terluinwebdesign.nl/en/author/thijs-terluin/" org="Teluin Webdesign" org-url="https://www.terluinwebdesign.nl/en/" itemprop="author">}}{{</mention-work>}}, is a comprehensive guide to calculating intrinsic size values.
### Performance of assistive technologies

View file

@ -1,4 +1,4 @@
{{- $wbmLinks := (slice "https://si3t.ch/log/2021-04-18-entetes-floc.html" "https://xmpp.org/2021/02/newsletter-02-feburary/" "https://gurlic.com/technology/post/393626430212145157" "https://gurlic.com/technology/post/343249858599059461" "https://www.librepunk.club/@penryn/108411423190214816" "https://benign.town/@josias/108457015755310198" "http://www.tuxmachines.org/node/148146") -}}
{{- $wbmLinks := (slice "https://si3t.ch/log/2021-04-18-entetes-floc.html" "https://xmpp.org/2021/02/newsletter-02-feburary/" "https://gurlic.com/technology/post/393626430212145157" "https://gurlic.com/technology/post/343249858599059461" "https://www.librepunk.club/@penryn/108411423190214816" "https://benign.town/@josias/108457015755310198" "http://www.tuxmachines.org/node/148146" "https://i.reddit.com/r/web_design/comments/k0dmpj/an_opinionated_list_of_best_practices_for_textual/gdmxy4u/" "https://bbbhltz.space/posts/thoughts-on-tech-feb2021/") -}}
<hr />
<section aria-labelledby="webmentions">
<h2 id="webmentions" tabindex="-1">Web&#173;mentions</h2>

View file

@ -36,13 +36,19 @@ IgnoreURLs:
- "https://seirdy.one/webmentions/"
- "http://creativecommons.org/ns"
- "https://seirdy.one/search/"
- "https://www.reddit.com/user/Seirdy" # reddit doesn't like htmltest
# - "https://i.reddit.com/r/web_design/comments/k0dmpj/an_opinionated_list_of_best_practices_for_textual/gdmxy4u/"
- "https://i.reddit.com"
- "https://fediring.net/(previous|next)" # redir
- "https://forum.palemoon.org/" # manual check: blocks crawlers
# - "https://forum.palemoon.org/viewtopic.php?f=1&t=25473" # manual check: blocks crawlers
- "https://forum.palemoon.org/viewtopic.php"
- "https://queue.acm.org/detail" # manual check: blocks crawlers
- "https://www.geocities.ws/jaup/jaup.htm" # manual check: blocks crawlers
- "https://plausible.io/blog/google-floc#" # manual check: I block this domain
- "https://twitter.com/" # manual check: 404 for some reason, using curl works fine.
- "https://bugs.debian.org/cgi-bin/bugreport.cgi" # manual check: 400 for some reason, using curl works fine.
- "https://forum.kuketz-blog.de/" # manual check: blocks crawlers
- "https://web.archive.org/web/0/http" # the wayback machine.
# - "https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=830173" # manual check: 400 for some reason, using curl works fine.
- "https://bugs.debian.org/cgi-bin/bugreport.cgi"
# - "https://forum.kuketz-blog.de/viewtopic.php?p=78202" # manual check: blocks crawlers
- "https://forum.kuketz-blog.de/viewtopic.php"
- "https://web.archive.org/web/0/http" # the wayback machine itself.
OutputDir: "linter-configs/htmltest"
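
The `IgnoreURLs` change above swaps full URLs for shorter prefixes such as `https://forum.palemoon.org/viewtopic.php`. A minimal sketch of why that works, assuming each entry is treated as an unanchored regular expression searched against the link being checked; the `is_ignored` helper is hypothetical and only illustrates the matching, and Python's `re` syntax is not identical to the Go regex engine a Go-based linter would use:

```python
import re

# Patterns copied from the IgnoreURLs list in the hunk above.
IGNORE_URLS = [
    r"https://fediring.net/(previous|next)",
    r"https://forum.palemoon.org/viewtopic.php",
    r"https://web.archive.org/web/0/http",
]


def is_ignored(url, patterns=IGNORE_URLS):
    """Hypothetical helper: True if any pattern matches anywhere in the URL.

    Assumes unanchored regex search, so a prefix pattern also covers
    every URL that merely starts with (or contains) that prefix.
    """
    return any(re.search(pattern, url) for pattern in patterns)


if __name__ == "__main__":
    # The commented-out full URL from the config is covered by the shorter prefix.
    print(is_ignored("https://forum.palemoon.org/viewtopic.php?f=1&t=25473"))
    # The alternation group matches either direction of the webring link.
    print(is_ignored("https://fediring.net/next?host=example"))
    # URLs that match no pattern are still checked by the linter.
    print(is_ignored("https://seirdy.one/"))
```

Under that assumption, one prefix entry covers every thread on a forum that blocks crawlers, which is why the full URLs can stay behind as comments for reference. Note that an unescaped `.` in a pattern matches any character, so these entries are slightly broader than literal prefixes.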