mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-11-27 14:12:09 +00:00

4 commits

Author | SHA1 | Message | Date
Rohan Kumar | e4592387a3 | syndicate | 2023-04-21 22:45:43 -07:00
Rohan Kumar | 2a8d60b896 | New note: opting out of LLM indexing | 2023-04-21 22:40:04 -07:00
Rohan Kumar | fb66d67114 | syndicate | 2023-04-20 14:41:17 -07:00
Rohan Kumar | 2d230e21db | New note: re: automated workflows for websites | 2023-04-20 14:36:16 -07:00
2 changed files with 40 additions and 0 deletions

@@ -0,0 +1,17 @@
---
title: "Opting out of LLM indexing"
date: 2023-04-21T22:40:04-07:00
replyURI: "https://chriscoyier.net/2023/04/21/the-secret-list-of-websites/"
replyTitle: "“the secret list of websites”"
replyType: "BlogPosting"
replyAuthor: "Chris Coyier"
replyAuthorURI: "https://chriscoyier.net/"
syndicatedCopies:
  - title: 'The Fediverse'
    url: 'https://pleroma.envs.net/notice/AUttq9kpOmeYZDHRTc'
---
I added an entry to [my robots.txt](https://seirdy.one/robots.txt) to block ChatGPT's crawler, but blocking crawling isn't the same as blocking indexing: it looks like Google chose to use the [Common Crawl](https://commoncrawl.org/) for its LLM training data and sidestep doing any crawling of its own. That's a strange decision; after all, Google has a much larger proprietary index at its disposal.

A "secret list of websites" was an ironic choice of words, given that the list comes from the Common Crawl, a public dataset. It's sad to see Common Crawl (ab)used for this, but I suppose we should have seen it coming.

I know Google tells authors how to qualify for (or disqualify themselves from) rich results, but I don't see any docs for opting a site out of LLM/Bard training.
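
A robots.txt rule only asks well-behaved crawlers to skip future fetches; it does nothing about pages already sitting in a dataset like the Common Crawl. As a minimal sketch of what such rules do express, the Python below uses the standard library's `urllib.robotparser` to check a hypothetical robots.txt (not a copy of any real site's file). The user-agent tokens `ChatGPT-User` (OpenAI) and `CCBot` (the Common Crawl's crawler) are assumptions about current names, `example.com` is a placeholder, and `fetch_permissions` is a hypothetical helper. Python's parser matches the first group whose name appears in the agent string; real crawlers implement their own matching, so treat this as illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not a copy of any real site's file.
# "ChatGPT-User" and "CCBot" are assumed user-agent tokens: check each
# operator's documentation for the current names.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def fetch_permissions(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Report, per user agent, whether the robots.txt rules let it fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and also marks the rules as loaded,
    # so can_fetch() works without a network read().
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # Googlebot has no dedicated group, so it falls through to "User-agent: *".
    results = fetch_permissions(
        ROBOTS_TXT, ["ChatGPT-User", "CCBot", "Googlebot"], "https://example.com/notes/"
    )
    for agent, allowed in results.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```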

@@ -0,0 +1,23 @@
---
title: "Re: automated workflows for websites"
date: 2023-04-20T14:36:16-07:00
replyURI: "https://blog.lea.lgbt/posts/2023-04-20-automated-workflows-for-websites/"
replyTitle: "Automated workflows for websites"
replyType: "BlogPosting"
replyAuthor: "Lea Rosema"
replyAuthorURI: "https://blog.lea.lgbt/"
syndicatedCopies:
  - title: 'The Fediverse'
    url: 'https://pleroma.envs.net/notice/AUr8PE6SK6jXl3XaE4'
---
This is so similar to [my setup!]({{<relref "/meta/_index.md">}}) I run Stylelint and v.Nu too. I [send v.Nu output through a jq filter](https://git.sr.ht/~seirdy/seirdy.one/tree/master/item/linter-configs/vnu_filter.jq) to drop false positives (after reporting them upstream); you might eventually do something similar, since there are a _lot_ of these. Your blog post reminds me that I need something better than regex substitutions for customizing footnote and section links; Hugo's parallel nature prevents it from post-processing fully assembled pages. Other tools I use:

- `xmllint` to check that the markup is well-formed XHTML5 syntax; it runs much faster than v.Nu and does light auto-formatting, but it's also more limited.
- The [W3C feed validator](https://github.com/w3c/feedvalidator), written in Python, for my Atom feeds; it's worth checking out.
- `axe-core`, the IBM Equal Access checker, and Webhint, which I run on every page in headless browsers.
- In the future, I'll need to figure out a good workflow for validating JSON against a schema, and add some Microformats and Microdata validation too (maybe using Schemarama?).
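
For a rough idea of what the well-formedness check in the `xmllint` entry covers, here's a minimal sketch using Python's standard-library `xml.etree.ElementTree` parser (expat), not libxml2 or `xmllint` itself. It only catches XML syntax errors such as unclosed or mis-nested tags and unquoted attributes; it says nothing about HTML's content model, which is what v.Nu checks. The function name `well_formed` and the sample snippets are hypothetical.

```python
import xml.etree.ElementTree as ET


def well_formed(markup: str) -> tuple[bool, str]:
    """Return (True, "") if `markup` parses as XML, else (False, parser error).

    This checks XML well-formedness only (closed and properly nested tags,
    quoted attributes, etc.), not HTML validity.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


if __name__ == "__main__":
    samples = [
        '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>ok</p></body></html>',
        "<p><b>mis-nested</p></b>",
        "<p>unclosed",
    ]
    for sample in samples:
        ok, error = well_formed(sample)
        print("well-formed" if ok else f"malformed: {error}")
```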

The whole thing takes several minutes to run, so I don't run it on every commit; just building my site (no linting or validation) requires only a tarball with some statically linked binaries. That's more in line with the ["designed to last"](https://jeffhuang.com/designed_to_last/) philosophy; I'm curious whether you have any thoughts on it.
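
The linked filter is written in jq; as a rough Python analogue (not a port of that file), here's a minimal sketch of dropping known false positives from v.Nu's JSON report. It assumes the report shape that `vnu --format json` emits: a top-level `messages` array whose entries carry `type`, `url`, and `message` fields. The pattern list, the `filter_vnu` helper, and the sample report are all made up for illustration.

```python
import json
import re

# Hypothetical false-positive patterns; a real list would hold messages that
# have already been reported upstream. Not taken from vnu_filter.jq.
KNOWN_FALSE_POSITIVES = [
    re.compile(r"^Hypothetical false positive"),
]


def filter_vnu(report_json: str, patterns=KNOWN_FALSE_POSITIVES) -> list[dict]:
    """Return the v.Nu messages that match none of the known false positives."""
    report = json.loads(report_json)
    kept = []
    for msg in report.get("messages", []):
        text = msg.get("message", "")
        if any(pattern.search(text) for pattern in patterns):
            continue  # known false positive: drop it
        kept.append(msg)
    return kept


if __name__ == "__main__":
    # Made-up sample in the assumed report shape.
    sample = json.dumps({
        "messages": [
            {"type": "error", "url": "file:/public/index.html",
             "message": "Hypothetical false positive about an attribute."},
            {"type": "error", "url": "file:/public/index.html",
             "message": 'Stray end tag "div".'},
        ]
    })
    for msg in filter_vnu(sample):
        print(f'{msg["url"]}: {msg["type"]}: {msg["message"]}')
```

In CI, a non-empty result from a filter like this could be treated as a build failure, while everything matching the known list stays quiet.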