mirror of
https://git.sr.ht/~seirdy/seirdy.one
synced 2024-11-23 21:02:09 +00:00
Compare commits
4 commits
c39d1cd616...e4592387a3
Author | SHA1 | Date | |
---|---|---|---|
| e4592387a3 | ||
| 2a8d60b896 | ||
| fb66d67114 | ||
| 2d230e21db | ||
2 changed files with 40 additions and 0 deletions
17
content/notes/opting-out-of-llm-indexing.md
Normal file
@@ -0,0 +1,17 @@
---
title: "Opting out of LLM indexing"
date: 2023-04-21T22:40:04-07:00
replyURI: "https://chriscoyier.net/2023/04/21/the-secret-list-of-websites/"
replyTitle: "“the secret list of websites”"
replyType: "BlogPosting"
replyAuthor: "Chris Coyier"
replyAuthorURI: "https://chriscoyier.net/"
syndicatedCopies:
  - title: 'The Fediverse'
    url: 'https://pleroma.envs.net/notice/AUttq9kpOmeYZDHRTc'
---
I added an entry to [my robots.txt](https://seirdy.one/robots.txt) to block ChatGPT's crawler, but blocking crawling isn't the same as blocking indexing; it looks like Google chose to use the [Common Crawl](https://commoncrawl.org/) for this and sidestep the need to do crawling of its own. That's a strange decision; after all, Google has a much larger proprietary index at its disposal.

A "secret list of websites" was an ironic choice of words, given that this originates from the Common Crawl. It's sad to see Common Crawl (ab)used for this, but I suppose we should have seen it coming.

I know Google tells authors how to qualify/disqualify from rich results, but I don't see any docs for opting a site out of LLM/Bard training.
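
The robots.txt entry mentioned in the note above can be sanity-checked locally. What follows is a minimal sketch, not the author's actual configuration: the `ChatGPT-User` token, the sample rules, and the `is_blocked` helper are all assumptions made for illustration. It only verifies crawl rules; as the note points out, blocking a crawler does not keep a page out of a dataset built from someone else's crawl.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The "ChatGPT-User" token is an assumption;
# check the crawler operator's documentation for the real user-agent string.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules forbid user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://seirdy.one/notes/"
    for agent in ("ChatGPT-User", "SomeOtherBot"):
        print(agent, "blocked:", is_blocked(ROBOTS_TXT, agent, page))
```

A compliant crawler that identifies itself with the listed token would skip the site under these rules, while any other agent falls through to the wildcard entry.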
23
content/notes/re-automated-workflows-for-websites.md
Normal file
@@ -0,0 +1,23 @@
---
title: "Re: automated workflows for websites"
date: 2023-04-20T14:36:16-07:00
replyURI: "https://blog.lea.lgbt/posts/2023-04-20-automated-workflows-for-websites/"
replyTitle: "Automated workflows for websites"
replyType: "BlogPosting"
replyAuthor: "Lea Rosema"
replyAuthorURI: "https://blog.lea.lgbt/"
syndicatedCopies:
  - title: 'The Fediverse'
    url: 'https://pleroma.envs.net/notice/AUr8PE6SK6jXl3XaE4'
---
This is so similar to [my setup!]({{<relref "/meta/_index.md">}}) I run Stylelint and v.Nu too. I [send v.Nu output through a jq filter](https://git.sr.ht/~seirdy/seirdy.one/tree/master/item/linter-configs/vnu_filter.jq) to filter out false positives (after reporting them upstream); you might eventually do something similar, since there are a _lot_ of these. Your blog post reminds me that I need something better than regex substitutions for customizing footnote and section links; Hugo's parallel nature prevents it from doing post-processing of fully-assembled pages. Other tools I use:

- `xmllint` to validate that the markup is well-formed XHTML5 syntax; it runs much more quickly than v.Nu and does light auto-formatting, but is also more limited.

- There's also [a W3C feed validator](https://github.com/w3c/feedvalidator) written in Python worth checking out; I send my Atom feeds through that.

- I run `axe-core`, IBM Equal Access checker, and Webhint on every page with headless browsers.

- In the future: I'll need to figure out a good workflow for easily validating JSON according to a schema, and adding some Microformats + Microdata validation too (maybe using Schemarama?).

The whole thing takes several minutes to run, so I don't run it on every commit; just building my site (no linting or validation) requires only a tarball with some statically-linked binaries. It's more in line with the ["built to last"](https://jeffhuang.com/designed_to_last/) philosophy; I'm curious if you have any thoughts about it.
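
The note above describes filtering v.Nu's output to drop known false positives. The author does this with a jq filter; the Python below is a swapped-in sketch of the same idea, not a port of that filter. It assumes the checker's JSON report has a top-level `messages` array whose items carry a `message` string. The pattern list and the `drop_false_positives` helper are hypothetical.

```python
import json
import re
import sys

# Hypothetical patterns for messages already reported upstream as false
# positives. "." stands in for the typographic quotes used in the messages.
FALSE_POSITIVE_PATTERNS = [
    re.compile(r"Possible misuse of .aria-label."),
]


def drop_false_positives(report: dict, patterns: list) -> dict:
    """Return a copy of the report without messages matching any pattern."""
    kept = [
        msg
        for msg in report.get("messages", [])
        if not any(p.search(msg.get("message", "")) for p in patterns)
    ]
    return {**report, "messages": kept}


if __name__ == "__main__":
    # Reads a JSON report on stdin and writes the filtered report to stdout.
    filtered = drop_false_positives(json.load(sys.stdin), FALSE_POSITIVE_PATTERNS)
    json.dump(filtered, sys.stdout, indent=2, ensure_ascii=False)
    # A non-zero exit status lets a CI step fail when real problems remain.
    sys.exit(1 if filtered["messages"] else 0)
```

Keeping the patterns in one list makes it easy to delete an entry once the upstream bug is fixed, so the filter does not silently hide real regressions.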