mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-11-23 21:02:09 +00:00

Compare commits

...

2 commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Rohan Kumar | de6a37b092 | Document the need for intentional moderation | 2023-07-24 21:41:32 -07:00 |
| Rohan Kumar | 287a0a5dc0 | More robots.txt exclusions (For shitty services that at least respect robots.txt) | 2023-07-24 15:33:02 -07:00 |
3 changed files with 15 additions and 1 deletions

@ -70,6 +70,8 @@ I pared that down to FediNuke.txt, which contains instances that were both reall
I think if you're starting a well-moderated instance, Tier 0 is a decent place to start (that's why it's in the standard CSV format). You should add and remove entries as you see fit. If you're making a client and want to give it a built-in blocklist, or are looking for a good "default" blocklist, then FediNuke is a good option.
However: if your instance grows larger (or if you intend to grow): you should be intentional about your moderation decisions, present and past. Your members ostensibly trust you, but not me. See the "trust but verify" section for more information.
### Rationale for creating two subsets
I used to just make a Tier-0 list. Later, I added the FediNuke list. Some people have asked why I don't just use one or the other; if Tier-0 was big enough to warrant FediNuke, why publish Tier-0 at all?

@ -78,6 +78,8 @@ I pared that down to `FediNuke.txt`, which contains instances that were both rea
I think if you're starting a well-moderated instance, Tier 0 is a decent place to start (that's why it's in the standard CSV format). You should add and remove entries as you see fit. If you're making a client and want to give it a built-in blocklist, or are looking for a good "default" blocklist: FediNuke is a good option.
However: if your instance grows larger (or if you intend to grow): you should be intentional about your moderation decisions, present and past. Your members ostensibly trust you, but not me. See [the "trust but verify" section](#trust-but-verify) for more information.
### Rationale for creating two subsets
I used to just make a Tier-0 list. Later, I added the FediNuke list. Some people have asked why I don't just use one or the other; if Tier-0 was big enough to warrant FediNuke, why publish Tier-0 at all?

@ -3,10 +3,14 @@ Disallow: /noindex/
Disallow: /misc/
# I opt out of online advertising so malware that injects ads on my site won't get paid.
# You should do the same. My ads.txt file contains a standard placeholder to forbid any
# compliant ad networks from paying for ad placement on my domain.
User-Agent: Adsbot
Disallow: /
Allow: /ads.txt
Allow: /app-ads.txt
# The next three are borrowed from https://www.videolan.org/robots.txt
# > This robot collects content from the Internet for the sole purpose of
# helping educational institutions prevent plagiarism. [...] we compare student papers against the content we find on the Internet to see if we
# can find similarities. (http://www.turnitin.com/robot/crawlerinfo.html)
# --> fuck off.
@ -28,6 +32,12 @@ Disallow: /
User-Agent: BLEXBot
Disallow: /
# Providing Intellectual Property professionals with superior brand protection services by artfully merging the latest technology with expert analysis. (https://www.checkmarknetwork.com/spider.html/)
# "The Internet is just way to big to effectively police alone." (ACTUAL quote)
# --> fuck off.
User-agent: CheckMarkNetwork/1.0 (+https://www.checkmarknetwork.com/spider.html)
Disallow: /
# Eat shit, OpenAI.
User-agent: ChatGPT-User
Disallow: /