Mirror of https://git.sr.ht/~seirdy/seirdy.one, synced 2024-11-24 05:02:10 +00:00
---
title: "Opting out of LLM indexing"
date: 2023-04-21T22:40:04-07:00
replyURI: "https://chriscoyier.net/2023/04/21/the-secret-list-of-websites/"
replyTitle: "“the secret list of websites”"
replyType: "BlogPosting"
replyAuthor: "Chris Coyier"
replyAuthorURI: "https://chriscoyier.net/"
syndicatedCopies:
- title: 'The Fediverse'
url: 'https://pleroma.envs.net/notice/AUttq9kpOmeYZDHRTc'
---
I added an entry to [my robots.txt](https://seirdy.one/robots.txt) to block ChatGPT's crawler, but blocking crawling isn't the same as blocking indexing; it looks like Google chose to use the [Common Crawl](https://commoncrawl.org/) for this, sidestepping the need to crawl on its own. That's a strange decision: after all, Google has a much larger proprietary index at its disposal.
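
As a concrete sketch, a crawler-blocking entry of this sort could look like the following. The user-agent tokens here are illustrative assumptions (CCBot is Common Crawl's crawler); the exact tokens in the linked robots.txt may differ:

```
# Illustrative tokens; check each vendor's documentation for the
# current user-agent names. CCBot is Common Crawl's crawler.
User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
```

Note that robots.txt only asks crawlers not to fetch pages; it doesn't remove already-collected copies from an existing corpus like the Common Crawl.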
A "secret list of websites" was an ironic choice of words, given that this originates from the Common Crawl. It's sad to see Common Crawl (ab)used for this, but I suppose we should have seen it coming.
I know Google tells authors how to qualify/disqualify from rich results, but I don't see any docs for opting a site out of LLM/Bard training.
---
title: "Re: automated workflows for websites"
date: 2023-04-20T14:36:16-07:00
replyURI: "https://blog.lea.lgbt/posts/2023-04-20-automated-workflows-for-websites/"
replyTitle: "Automated workflows for websites"
replyType: "BlogPosting"
replyAuthor: "Lea Rosema"
replyAuthorURI: "https://blog.lea.lgbt/"
syndicatedCopies:
- title: 'The Fediverse'
url: 'https://pleroma.envs.net/notice/AUr8PE6SK6jXl3XaE4'
---
This is so similar to [my setup!]({{<relref "/meta/_index.md">}}) I run Stylelint and v.Nu too. I [send v.Nu output through a JQ filter](https://git.sr.ht/~seirdy/seirdy.one/tree/master/item/linter-configs/vnu_filter.jq) to filter out false-positives (after reporting them upstream); you might eventually do something similar, since there are a _lot_ of these. Your blog post reminds me that I need something better than regex substitutions for customizing footnote and section links; Hugo's parallel nature prevents it from doing post-processing of fully-assembled pages. Other tools I use:
- `xmllint`, to validate that the markup is well-formed XHTML5; it runs much faster than v.Nu and does light auto-formatting, but it's also more limited.
- There's also [a W3C feed validator](https://github.com/w3c/feedvalidator) written in Python worth checking out; I send my Atom feeds through that.
- I run `axe-core`, IBM Equal Access checker, and Webhint on every page with headless browsers.
- In the future: I'll need to figure out a good workflow for easily validating JSON against a schema, and add some Microformats + Microdata validation too (maybe using Schemarama?).
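
The v.Nu filtering step linked above is a jq script; the same idea can be sketched in Python. The false-positive substrings below are illustrative assumptions, not the real filter's list, and it reads v.Nu's `--format json` report (a `{"messages": [...]}` object):

```python
import json
import sys

# Substrings marking known false positives. These two are illustrative
# assumptions; the real jq filter linked above maintains its own list.
FALSE_POSITIVES = (
    "Trailing slash on void elements",
    "Self-closing syntax",
)


def filter_vnu(report: dict) -> dict:
    """Return a copy of a v.Nu JSON report without known false positives."""
    filtered = dict(report)
    filtered["messages"] = [
        msg for msg in report.get("messages", [])
        if not any(pat in msg.get("message", "") for pat in FALSE_POSITIVES)
    ]
    return filtered


def main() -> int:
    """Read v.Nu's JSON output on stdin; fail if any messages remain."""
    filtered = filter_vnu(json.load(sys.stdin))
    json.dump(filtered, sys.stdout, indent=2)
    return 1 if filtered["messages"] else 0
```

Returning a nonzero exit status when messages survive the filter lets a CI job fail the build on real validation errors.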
The whole thing takes several minutes to run, so I don't run it on every commit; just building my site (no linting or validation) requires only a tarball with some statically-linked binaries. That's more in line with the ["built to last"](https://jeffhuang.com/designed_to_last/) philosophy; I'm curious whether you have any thoughts about it.