mirror of https://git.sr.ht/~seirdy/seirdy.one synced 2024-11-10 00:12:09 +00:00

Compare commits


4 commits

Author SHA1 Message Date
Rohan Kumar
734fc16df4
New note: opt-out telemetry 2022-06-03 02:27:43 -07:00
Rohan Kumar
03e292f57b
More whitespace fixes 2022-06-02 21:48:23 -07:00
Rohan Kumar
576ab1453c
Remove draft:true, typo 2022-06-02 21:00:51 -07:00
Rohan Kumar
2e0babc6de
New note: DuckDuckGo and Bing 2022-06-02 20:59:39 -07:00
3 changed files with 36 additions and 1 deletions


@@ -0,0 +1,19 @@
---
title: "DuckDuckGo and Bing"
date: 2022-06-02T20:59:38-07:00
replyURI: "https://www.librepunk.club/@penryn/108411423190214816"
replyTitle: "how would html.duckduckgo.com fit into this?"
replyType: "SocialMediaPosting"
replyAuthor: "@penryn@www.librepunk.club"
replyAuthorURI: "https://www.librepunk.club/@penryn"
---
I was referring to crawlers that build indexes for search engines to use. DuckDuckGo does have a crawler---DuckDuckBot---but it's only used for fetching favicons and scraping certain sites for infoboxes ("instant answers", the fancy widgets next to/above the classic link results).
DuckDuckGo and other engines that use Bing's commercial API have contractual arrangements that typically include a clause that says something like "don't you dare change our results, we don't want to create a competitor to Bing that has better results than us". Very few companies manage to negotiate an exception; DuckDuckGo is not one of those companies, to my knowledge.
So to answer your question: it's irrelevant. "html.duckduckgo.com" is a JS-free front-end to DuckDuckGo's backend, and mostly serves as a proxy to Bing results.
For the record, Google isn't any different when it comes to their API. That's why Ixquick shut down and pivoted to Startpage; Google wasn't happy with Ixquick integrating multiple sources.
[More info on search engines](https://seirdy.one/posts/2021/03/10/search-engines-with-own-indexes/).


@@ -0,0 +1,15 @@
---
title: "Opt in telemetry"
date: 2022-06-03T02:27:05-07:00
replyURI: "https://news.ycombinator.com/item?id=31604932"
replyTitle: "As far as I am concerned, telemetry is a good thing"
replyType: "SocialMediaPosting"
replyAuthor: "eterevsky"
replyAuthorURI: "https://news.ycombinator.com/user?id=eterevsky"
---
Being enrolled in a study should require prior informed consent. Terms of the data collection, including what data can be collected and how that data will be used, must be presented to all participants in language they can understand. Only then can they provide informed consent.
Harvesting data without permission is just exploitation. Software improvements and user engagement are not more important than basic respect for user agency.
Moreover, not everyone is like you. People who do have reason to care about data collection should not have their critical needs outweighed for the mere convenience of the majority. This type of rhetoric is often used to dismiss accessibility concerns, which is why we have to turn to legislation.


@@ -31,7 +31,8 @@ sed 7d "$html_file" | xmllint --format --encode UTF-8 --noent - -o "$tmp_file"
tail -n +8 "$tmp_file" \
| sd '<pre(?: tabindex="0")?>\n\t*<code ' '<pre tabindex="0"><code ' \
| sd '(?:\n)?</code>\n(?:[\t\s]*)?</pre>' '</code></pre>' \
- | sd '</span>.span itemprop="familyName"' '</span> <span itemprop="familyName"'
+ | sd '</span>.span itemprop="familyName"' '</span> <span itemprop="familyName"' \
+ | sd '([a-z])<(data|time)' '$1 <$2'
} >>"$xhtml_file"
# replace the html file with the formatted xhtml5 file, excluding the xml declaration
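
The added `sd '([a-z])<(data|time)' '$1 <$2'` step appears to put back a space between a lowercase letter and a following `<data` or `<time` tag, presumably one lost during `xmllint` formatting. Here is a minimal sketch of that substitution, assuming `sed -E` as a stand-in for `sd`. The function name and the sample input are hypothetical and not part of the repository.

```shell
# Hypothetical helper: a sed -E stand-in for the sd step added in this hunk.
# It inserts a space between a lowercase letter and a following <data or <time tag.
add_space_before_data_time() {
	sed -E 's/([a-z])<(data|time)/\1 <\2/g'
}

# Made-up sample input; the real pipeline reads xmllint's output instead.
printf '%s\n' 'Posted on<time datetime="2022-06-03">June 3</time>' \
	| add_space_before_data_time
```

Only a lowercase ASCII letter directly before the tag triggers the replacement, so digits, punctuation, and uppercase letters in front of `<data` or `<time` are left as they are.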