
New post: Resilient Git, part 1

rohan kumar 2020-11-18 18:25:33 -08:00
parent e614497a36
commit 9fb0545ebc
4 changed files with 179 additions and 12 deletions


@ -1,20 +1,25 @@
Recently, GitHub reinstated the youtube-dl git repository, which it had taken down after the RIAA filed a takedown request under the DMCA.
=> https://github.com/ytdl-org/youtube-dl ytdl-org/youtube-dl
Shortly after the takedown, many members of the community showed great interest in "decentralizing git" and setting up a more resilient forge. What many of these people fail to understand is that the Git-based project setup is designed to support decentralization by being fully distributed.
Following the drama, I'm putting together a multi-part guide on how to leverage the decentralized, distributed nature of git and its ecosystem. I made every effort to include all parts of a typical project.
I'll update this post as I add articles to the series.
Articles in this series:
=> ../../../2020/11/18/git-workflow-1.gmi 1. Hydra Hosting
Articles yet to be written:
2. Community feedback (issues, support, etc.)
3. Community contributions (patches)
4. CI/CD
5. Distribution
The result of the workflows this series covers will be minimal dependence on outside parties; all members of the community will easily be able to get a copy of the software, its code, development history, issues, and patches offline on their machines, in formats that follow implementation-neutral open standards. Following open standards is the killer feature: nothing in this workflow depends on a specific platform (GitHub, GitLab, Gitea, Bitbucket, Docker, Nix, Jenkins, etc.), almost eliminating your project's "bus factor".
Providing a way to get everything offline, in a format that won't go obsolete if a project dies, is the key to a resilient git workflow.
@ -34,4 +39,7 @@ A: "Difficult" is subjective. I recommend TRYING this before jumping to conclusi
Q: I'm not interested in trying anything new, no matter what the benefits are.
A: First of all, that wasn't a question. Second, this series isn't for you. You should not read this. I recommend doing literally anything else.
=> ../../../2020/11/18/git-workflow-1.gmi Next: Resilient Git, Part 1: Hydra Hosting


@ -23,7 +23,7 @@ include all parts of a typical project.
I'll update this post as I add articles to the series. At the moment, I've planned to
write the following articles:
1. [Hydra Hosting](../../../2020/11/18/git-workflow-1.html): repository hosting.
2. Community feedback (issues, support, etc.)
3. Community contributions (patches)
4. CI/CD
@ -32,11 +32,10 @@ write the following articles:
The result of the workflows this series covers will be minimal dependence on outside
parties; all members of the community will easily be able to get a copy of the
software, its code, development history, issues, and patches offline on their
machines, in formats that follow implementation-neutral open standards. Following open
standards is the killer feature: nothing in this workflow depends on a specific
platform (GitHub, GitLab, Gitea, Bitbucket, Docker, Nix, Jenkins, etc.), almost
eliminating your project's "bus factor".
Providing a way to get everything offline, in a format that won't go obsolete if a
project dies, is the key to a resilient git workflow.
@ -62,5 +61,7 @@ it).
Q: I'm not interested in trying anything new, no matter what the benefits are.
A: First of all, that wasn't a question. Second, this series isn't for you. You
should not read this. I recommend doing literally anything else.
Next: Resilient Git, Part 1: [Hydra Hosting](../../../2020/11/18/git-workflow-1.html)


@ -0,0 +1,64 @@
This is Part 1 of a series called Resilient Git:
=> ../../../2020/11/17/git-workflow-0.gmi Resilient Git
The most important part of a project is its code. Resilient projects should have their code in multiple places of equal weight so that work continues normally if a single remote goes down.
Many projects already do something similar: they have one "primary" remote and several mirrors. I'm suggesting something different. Treating a remote as a "mirror" implies that the remote is a second-class citizen. Mirrors are often out of date and aren't usually the preferred place to fetch code. Instead of setting up a primary remote and mirrors, I propose hydra hosting: setting up multiple primary remotes of equal status and pushing to/fetching from them in parallel.
Having multiple primary remotes of equal status might sound like a bad idea. If there are multiple remotes, how do people know which one to use? Where do they file bug reports, get code, or send patches? Do maintainers need to check multiple places?
No. Of course not. A good distributed system should automatically keep its nodes in sync to avoid the hassle of checking multiple places for updates.
## Adding remotes
This process should be pretty straightforward. You can run git remote add (see git-remote(1)) or edit your repo's .git/config directly:
``` gitconfig
[remote "origin"]
	url = git@git.sr.ht:~seirdy/seirdy.one
	fetch = +refs/heads/*:refs/remotes/origin/*
[remote "gl_upstream"]
	url = git@gitlab.com:seirdy/seirdy.one.git
	fetch = +refs/heads/*:refs/remotes/gl_upstream/*
[remote "gh_upstream"]
	url = git@github.com:seirdy/seirdy.one.git
	fetch = +refs/heads/*:refs/remotes/gh_upstream/*
```
If that's too much work--a perfectly understandable complaint--automating the process is trivial. Here's an example from my dotfiles:
=> https://git.sr.ht/%7Eseirdy/dotfiles/tree/master/Executables/shell-scripts/bin/git-remote-setup git-remote-setup
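If you'd rather not edit .git/config by hand, git remote add writes the same sections for you. Below is a minimal sketch, assuming a brand-new repository with no remotes yet; so that it is safe to run anywhere, it works in a throwaway repository created with mktemp -d, which is not part of the workflow itself. In an existing clone, run only the git remote add lines from the repository root and skip any remote name that already exists.
```shell
#!/bin/sh
# Sketch: register the three example remotes with `git remote add`
# instead of editing .git/config by hand. The scratch repository only
# exists so the example runs anywhere; it is not cleaned up afterwards.
set -eu

repo=$(mktemp -d)
git init -q "$repo"

# Same remote names and URLs as the .git/config example above.
git -C "$repo" remote add origin      'git@git.sr.ht:~seirdy/seirdy.one'
git -C "$repo" remote add gl_upstream 'git@gitlab.com:seirdy/seirdy.one.git'
git -C "$repo" remote add gh_upstream 'git@github.com:seirdy/seirdy.one.git'

# `git remote add` also writes the default fetch refspec,
# +refs/heads/*:refs/remotes/<name>/*, matching the hand-written config.
git -C "$repo" config --get-regexp '^remote\.'
```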
## Seamless pushing and pulling
Having multiple remotes is fine, but pushing to and fetching from all of them can be slow. Two simple git aliases fix that:
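``` gitconfig
[alias]
	# Push every branch, plus annotated tags reachable from what is pushed, to each
	# remote whose name contains "origin" or "upstream". xargs -L1 runs one command
	# per remote; -P 0 lets xargs run them all at the same time.
	pushall = !git remote | grep -E 'origin|upstream' | xargs -L1 -P 0 git push --all --follow-tags
	# Fetch from the same set of remotes, also in parallel.
	fetchall = !git remote | grep -E 'origin|upstream' | xargs -L1 -P 0 git fetch
```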
Now, git pushall and git fetchall will push to and fetch from all remotes in parallel, respectively. Only one remote needs to be online for project members to keep working.
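To see the aliases work without touching a real forge, here is a minimal sketch that substitutes two local bare repositories for sr.ht and GitLab. The directory names, the dummy commit, and the "Example" committer identity are all hypothetical. It installs the same two aliases with git config, swapping the inner single quotes for double quotes so each value fits in one single-quoted shell word; the command each alias runs is unchanged.
```shell
#!/bin/sh
# Sketch: exercise the pushall/fetchall aliases end to end, offline.
# Two local bare repositories stand in for real forges; all paths and
# the committer identity below are hypothetical.
set -eu

work=$(mktemp -d)
git init -q --bare "$work/srht.git"   # plays the role of sr.ht
git init -q --bare "$work/gitlab.git" # plays the role of GitLab
git init -q "$work/project"
cd "$work/project"

# Remote names match the pattern the aliases grep for.
git remote add origin      "$work/srht.git"
git remote add gl_upstream "$work/gitlab.git"

# The aliases from above, stored in this repository's config only.
git config --local alias.pushall \
  '!git remote | grep -E "origin|upstream" | xargs -L1 -P 0 git push --all --follow-tags'
git config --local alias.fetchall \
  '!git remote | grep -E "origin|upstream" | xargs -L1 -P 0 git fetch'

# One commit, so there is something to push.
echo hello > README
git add README
git -c user.name=Example -c user.email=example@example.com \
  commit -q -m 'Initial commit'

git pushall   # xargs starts one `git push` per matching remote, in parallel
git fetchall  # likewise, one `git fetch` per matching remote
```
Swapping the local paths for real remote URLs, and --local for --global, gives the everyday setup.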
## Advertising remotes
I'd recommend advertising at least three remotes in your README: your personal favorite and two determined by popularity. Tell users to run git remote set-url to switch remote locations if one goes down.
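In the README, that instruction can be a single command. The sketch below assumes a reader who cloned over HTTPS from sr.ht; the HTTPS URLs are assumed read-only counterparts of the SSH URLs in the config example above, and the throwaway repository exists only so the example runs anywhere.
```shell
#!/bin/sh
# Sketch: what a README might tell readers to do when the remote they
# cloned from goes down. The HTTPS URLs are assumed equivalents of the
# SSH URLs used earlier; the scratch repository is illustrative only.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git remote add origin 'https://git.sr.ht/~seirdy/seirdy.one'

# sr.ht unreachable? Point origin at the GitLab copy instead.
git remote set-url origin 'https://gitlab.com/seirdy/seirdy.one.git'
git remote get-url origin   # shows the new URL
```
Because git remote set-url only changes the URL stored in .git/config, local branches and history stay as they were, so any advertised remote can take over.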
## Before you ask...
Q: Why not use a cloud service to automate mirroring?
A: Such a setup depends upon the cloud service and a primary repo for that service to watch, defeating the purpose (resiliency). Hydra hosting automates this without introducing new tools, dependencies, or closed platforms to the mix.
Q: What about issues, patches, etc.?
A: Stay tuned for Parts 2 and 3, coming soon to a weblog/gemlog near you™.
Q: Why did you call this "hydra hosting"?
A: It's a reference to the Hydra of Lerna from Greek mythology, famous for keeping its brain in a nested RAID array to protect against disk failures and beheading. It could also be a reference to a fictional organization of the same name from Marvel Comics named after the Greek monster for similar reasons:
=> https://www.youtube.com/watch?v=assccoyvntI&t=37 Hail Hydra!
=> https://seirdy.one/misc/hail_hydra.webm (direct webm)


@ -0,0 +1,94 @@
---
date: "2020-11-18T22:46:15-08:00"
outputs:
- html
- gemtext
tags:
- git
- foss
title: "Resilient Git, Part 1: Hydra Hosting"
---
This is Part 1 of a series called [Resilient
Git](../../../2020/11/17/git-workflow-0.html).
The most important part of a project is its code. Resilient projects should have
their code in multiple places of equal weight so that work continues normally if a
single remote goes down.
Many projects already do something similar: they have one "primary" remote and
several mirrors. I'm suggesting something different. Treating a remote as a "mirror"
implies that the remote is a second-class citizen. Mirrors are often out of date and
aren't usually the preferred place to fetch code. Instead of setting up a primary
remote and mirrors, I propose **hydra hosting:** setting up multiple primary remotes
of equal status and pushing to/fetching from them in parallel.
Having multiple primary remotes of equal status might sound like a bad idea. If there
are multiple remotes, how do people know which one to use? Where do they file bug
reports, get code, or send patches? Do maintainers need to check multiple places?
No. Of course not. A good distributed system should automatically keep its nodes in
sync to avoid the hassle of checking multiple places for updates.
## Adding remotes
This process should be pretty straightforward. You can run `git remote add` (see
`git-remote(1)`) or edit your repo's `.git/config` directly:
``` gitconfig
[remote "origin"]
	url = git@git.sr.ht:~seirdy/seirdy.one
	fetch = +refs/heads/*:refs/remotes/origin/*
[remote "gl_upstream"]
	url = git@gitlab.com:seirdy/seirdy.one.git
	fetch = +refs/heads/*:refs/remotes/gl_upstream/*
[remote "gh_upstream"]
	url = git@github.com:seirdy/seirdy.one.git
	fetch = +refs/heads/*:refs/remotes/gh_upstream/*
```
If that's too much work--a perfectly understandable complaint--automating the process
is trivial. Here's [an example from my
dotfiles](https://git.sr.ht/~seirdy/dotfiles/tree/master/Executables/shell-scripts/bin/git-remote-setup).
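If you'd rather not edit `.git/config` by hand, `git remote add` writes the same
sections for you. Below is a minimal sketch, assuming a brand-new repository with no
remotes yet; so that it is safe to run anywhere, it works in a throwaway repository
created with `mktemp -d`, which is not part of the workflow itself. In an existing
clone, run only the `git remote add` lines from the repository root and skip any
remote name that already exists.

```shell
#!/bin/sh
# Sketch: register the three example remotes with `git remote add`
# instead of editing .git/config by hand. The scratch repository only
# exists so the example runs anywhere; it is not cleaned up afterwards.
set -eu

repo=$(mktemp -d)
git init -q "$repo"

# Same remote names and URLs as the .git/config example above.
git -C "$repo" remote add origin      'git@git.sr.ht:~seirdy/seirdy.one'
git -C "$repo" remote add gl_upstream 'git@gitlab.com:seirdy/seirdy.one.git'
git -C "$repo" remote add gh_upstream 'git@github.com:seirdy/seirdy.one.git'

# `git remote add` also writes the default fetch refspec,
# +refs/heads/*:refs/remotes/<name>/*, matching the hand-written config.
git -C "$repo" config --get-regexp '^remote\.'
```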
## Seamless pushing and pulling
Having multiple remotes is fine, but pushing to and fetching from all of them can be
slow. Two simple git aliases fix that:
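``` gitconfig
[alias]
	# Push every branch, plus annotated tags reachable from what is pushed, to each
	# remote whose name contains "origin" or "upstream". xargs -L1 runs one command
	# per remote; -P 0 lets xargs run them all at the same time.
	pushall = !git remote | grep -E 'origin|upstream' | xargs -L1 -P 0 git push --all --follow-tags
	# Fetch from the same set of remotes, also in parallel.
	fetchall = !git remote | grep -E 'origin|upstream' | xargs -L1 -P 0 git fetch
```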
Now, `git pushall` and `git fetchall` will push to and fetch from all remotes in
parallel, respectively. Only one remote needs to be online for project members to
keep working.
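To see the aliases work without touching a real forge, here is a minimal sketch that
substitutes two local bare repositories for sr.ht and GitLab. The directory names, the
dummy commit, and the `Example` committer identity are all hypothetical. It installs
the same two aliases with `git config`, swapping the inner single quotes for double
quotes so each value fits in one single-quoted shell word; the command each alias runs
is unchanged.

```shell
#!/bin/sh
# Sketch: exercise the pushall/fetchall aliases end to end, offline.
# Two local bare repositories stand in for real forges; all paths and
# the committer identity below are hypothetical.
set -eu

work=$(mktemp -d)
git init -q --bare "$work/srht.git"   # plays the role of sr.ht
git init -q --bare "$work/gitlab.git" # plays the role of GitLab
git init -q "$work/project"
cd "$work/project"

# Remote names match the pattern the aliases grep for.
git remote add origin      "$work/srht.git"
git remote add gl_upstream "$work/gitlab.git"

# The aliases from above, stored in this repository's config only.
git config --local alias.pushall \
  '!git remote | grep -E "origin|upstream" | xargs -L1 -P 0 git push --all --follow-tags'
git config --local alias.fetchall \
  '!git remote | grep -E "origin|upstream" | xargs -L1 -P 0 git fetch'

# One commit, so there is something to push.
echo hello > README
git add README
git -c user.name=Example -c user.email=example@example.com \
  commit -q -m 'Initial commit'

git pushall   # xargs starts one `git push` per matching remote, in parallel
git fetchall  # likewise, one `git fetch` per matching remote
```

Swapping the local paths for real remote URLs, and `--local` for `--global`, gives the
everyday setup.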
## Advertising remotes
I'd recommend advertising at least three remotes in your README: your personal
favorite and two determined by popularity. Tell users to run `git remote set-url` to
switch remote locations if one goes down.
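In the README, that instruction can be a single command. The sketch below assumes a
reader who cloned over HTTPS from sr.ht; the HTTPS URLs are assumed read-only
counterparts of the SSH URLs in the config example above, and the throwaway repository
exists only so the example runs anywhere.

```shell
#!/bin/sh
# Sketch: what a README might tell readers to do when the remote they
# cloned from goes down. The HTTPS URLs are assumed equivalents of the
# SSH URLs used earlier; the scratch repository is illustrative only.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git remote add origin 'https://git.sr.ht/~seirdy/seirdy.one'

# sr.ht unreachable? Point origin at the GitLab copy instead.
git remote set-url origin 'https://gitlab.com/seirdy/seirdy.one.git'
git remote get-url origin   # shows the new URL
```

Because `git remote set-url` only changes the URL stored in `.git/config`, local
branches and history stay as they were, so any advertised remote can take over.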
## Before you ask...
Q: Why not use a cloud service to automate mirroring?
A: Such a setup depends upon the cloud service and a primary repo for that service to
watch, defeating the purpose (resiliency). Hydra hosting automates this without
introducing new tools, dependencies, or closed platforms to the mix.
Q: What about issues, patches, etc.?
A: Stay tuned for Parts 2 and 3, coming soon to a weblog/gemlog near you™.
Q: Why did you call this "hydra hosting"?
A: It's a reference to the Hydra of Lerna from Greek mythology, famous for keeping
its brain in a nested RAID array to protect against disk failures and beheading. It
could also be a reference to a fictional organization of the same name from Marvel
Comics named after the Greek monster for [similar
reasons](https://www.youtube.com/watch?v=assccoyvntI&t=37) ([direct
webm](https://seirdy.one/misc/hail_hydra.webm)).