diff --git a/assets/p/xkcd_2501.png b/assets/p/xkcd_2501.png new file mode 100644 index 0000000..49b138f Binary files /dev/null and b/assets/p/xkcd_2501.png differ diff --git a/assets/p/xkcd_2501.webp b/assets/p/xkcd_2501.webp new file mode 100644 index 0000000..4d98a36 Binary files /dev/null and b/assets/p/xkcd_2501.webp differ diff --git a/assets/p/xkcd_2501_dark.png b/assets/p/xkcd_2501_dark.png new file mode 100644 index 0000000..ab50471 Binary files /dev/null and b/assets/p/xkcd_2501_dark.png differ diff --git a/assets/p/xkcd_2501_dark.webp b/assets/p/xkcd_2501_dark.webp new file mode 100644 index 0000000..e9cfe55 Binary files /dev/null and b/assets/p/xkcd_2501_dark.webp differ diff --git a/content/posts/post-ocsp-revocation.md b/content/posts/post-ocsp-revocation.md new file mode 100644 index 0000000..6ff7c03 --- /dev/null +++ b/content/posts/post-ocsp-revocation.md @@ -0,0 +1,448 @@ +--- +title: "Post-OCSP certificate revocation in the Web PKI" +description: "OCSP, including OCSP Stapling, is leaving the Web PKI. Here's a complete look at revocation beyond OCSP: its past, present, and possible futures." +date: "2024-09-25T11:29:38-04:00" +outputs: + - html +# - gemtext +syndicatedCopies: + - title: 'The Fediverse' + url: 'https://pleroma.envs.net/objects/4371b456-e37f-487f-886d-8b4fd7b705c2' + - title: 'Bluesky' + url: 'https://bsky.app/profile/seirdy.one/post/3l4yhmtm7it2a' +--- +
+ +## Introduction + +Today, TLS certificates in the Web public key infrastructure (PKI) have long validity: almost all remain valid for at least _three months!_ An attacker compromising a certificate early enough in its lifetime[^1] keeps it compromised for months. Certificate revocation addresses this problem: a client must know to distrust a certain key for a domain, even if the valid key hasn't expired yet. + +The issue? Billions of clients use the Web PKI: browsers, crawlers, link-preview generators, chatbots, email servers, email clients, etc. The easy part for a CA is knowing when to revoke a certificate.[^2] The hard part is telling every client to ignore a certain compromised certificate. All approaches to revocation trace their roots to at least one of the following: + +- Certificate Revocation Lists (CRLs) +- Online Certificate Status Protocol (OCSP) +- Short-lived certificates + +Initial approaches to each option showed major shortcomings. All three evolved: + +- CRLs became sharded CRLs and combined client-side "summarized CRLs". + +- OCSP became OCSP stapling and OCSP Must-Staple. It almost evolved into OCSP Expect-Staple. The shutdown of OCSP resolvers impacts all three; we need alternatives. + +- Short-lived certificates became ACME-STAR and delegated credentials. + +I'll break down each of these, along with some of my proposals. We should double down on client-side summarized CRLs[^3] and work towards reducing certificate lifetimes. + +Most solutions still have caveats or won't gain mainstream adoption for several years. Until then, I propose offering six-week certificates. + +### Motivation + +Why discuss revocation now? + +In August 2023, the CA/Browser Forum voted in favor of [Ballot SC-063 v4](https://cabforum.org/2023/07/14/ballot-sc-063-v4-make-ocsp-optional-require-crls-and-incentivize-automation/). This ballot made the Online Certificate Status Protocol (OCSP) optional and outlined where CAs should focus instead. 
{{}}Let's Encrypt published {{}} on {{}}, and will likely shut down its OCSP service sometime in late 2025 or 2026. Let's Encrypt issues TLS certificates for almost 60% of websites.[^4] This means most websites will soon have no more OCSP: while it might live on in other networks, OCSP will leave the Web PKI. + +The announcement from Let's Encrypt merely declares intent. Some staff members show openness to a migration period for stapling, with at least a year before any changes.[^5] After the migration period, I expect webmasters to lose the option for OCSP Stapling or working OCSP Must-Staple. Without OCSP, most clients either trust all certificates from their issuance until expiry or have to constantly update compressed revocation filters ("summarized CRLs") to check the revocation status of a certificate. We can also look to other solutions on the horizon. + +Note: I made this article ACME-centric.[^6] Most websites use ACME to automate the generation and renewal of certificates validated by a certificate authority (CA). Where non-ACME setups are relevant, I disregard them. Use ACME! + +### Target audience + +I wrote this for people with some basic familiarity with TLS, certificate authorities, and maybe OCSP; this represents what I knew before I started writing this article. I did my best to link other resources and define terms where appropriate.[^7] The [Let's Encrypt glossary](https://letsencrypt.org/docs/glossary/) might assist you. + +For readers less familiar, I recommend {{}}{{}} by {{}} for a primer on how TLS certificates and Web PKI work. + +{{< transcribed-image type="comic" itemtype="VisualArtwork" id="xkcd-2501" >}} + +#### xkcd comic: Average Familiarity {#average-familiarity} + +{{< transcribed-image-figure id="xkcd-2501" has-transcript="true" >}} + +{{< picture name="xkcd_2501" alt="Comic: two stick figures talking to each other. Transcript follows." >}} + +
+ +I'm not an "expert" but this comic captures how the first draft of this article read. From + +
+ +{{< /transcribed-image-figure >}} {{< transcribed-image-transcript >}} + +

Ponytail and Cueball are talking. Ponytail has her hand raised, palm up, towards Cueball.

+ + +Ponytail +: Silicate chemistry is second nature to us geochemists, so it's easy to forget that the average person only knows the formulas for olivine and one or two feldspars. + +Cueball +: And quartz, of course. + +Ponytail +: Of course. + +A caption below the panel reads, Even when they're trying to compensate for it, experts in anything wildly overestimate the average person's familiarity with their field. + +

Transcript from the explain xkcd wiki entry for xkcd #2501.

+ +{{< /transcribed-image-transcript >}} {{< /transcribed-image >}} + +
+ +{{}} + +## Certificate Revocation Lists + +CAs regularly publish live lists of revoked certificates called Certificate Revocation Lists (CRLs). When revoking a certificate, CAs update OCSP responses and push an entry to a CRL. They have up to a seven-day deadline to do so (less in certain cases[^8]). CAs annotate entries with one of ten possible reasons for revocation. See {{}}{{RFC 5280: Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile" extraName="headline" url="https://www.rfc-editor.org/rfc/rfc5280.html">}}{{}} for details on the contents of CRLs. + +CRLs' large size and short validity make them a poor fit for client-side revocation checking: most clients can't regularly re-download such large lists for revocation lookups. CAs started offering OCSP as a lightweight alternative. CRLs are far from obsolete, though: they support transparency, research, and other revocation technologies. + +CRLs, the oldest revocation method in this article, only recently joined the CA/Browser Baseline Requirements in Ballot SC-063 (the ballot I mentioned in [the "motivation" section](#motivation)).[^9] + +## Online Certificate Status Protocol (OCSP) + +Before trusting a certificate, browsers can ask its issuing CA if the CA has revoked the certificate by using the Online Certificate Status Protocol (OCSP). OCSP has a host of problems: + + +Performance +: Asking a CA about a new certificate before trusting it and loading a page can take hundreds of milliseconds or even seconds. That slows down page loads. + +Privacy +: This leaks domain names to the CAs in question. Users can always switch their DNS providers to options they trust, but they can't select a different place to check for OCSP status. + +CA reliability +: A CA needs good uptime and needs to respond to a massive number of requests from HTTPS clients worldwide. OCSP-supporting clients query every certificate from the CA they encounter and expect a low-latency response. 
+ +Robustness +: A bad firewall or a poor connection could prevent successful and timely connections to the CA for an OCSP check. For this reason, **failed live OCSP checks typically soft-fail.** If the OCSP check doesn't succeed or takes too long, the client trusts the certificate. An attacker can also block connections to the CA to get a victim to trust a compromised certificate. + +### OCSP Stapling and Must-Staple + +OCSP Stapling addresses performance, privacy, and CA reliability with a TLS extension. A web server can run an OCSP check on its own certificates and attach the successful responses to them, refreshing them periodically. Clients receiving a fresh signed OCSP response with a certificate need not fetch a copy themselves. It does slightly increase the size of the TLS handshake, but the increase pales in comparison to the footprint of live OCSP lookups.[^10] + +OCSP Must-Staple addresses OCSP's robustness concerns. A certificate can include metadata instructing clients to reject the certificate if it doesn't include a successful stapled OCSP response. With OCSP Must-Staple, certificates behave almost as if they have a short one-week lifetime. + +Must-Staple has limited adoption: fewer than 1% of all unexpired TLS certificates use this feature.[^11] Any failure in the OCSP pipeline means downtime. This risk prevents most ACME clients from making it the default setting. Most webmasters won't find the benefits worth the cost of adoption and the risks involved. See {{}}{{}} by {{}} for Mozilla's rationale. + +### OCSP Expect-Staple + +The OCSP Expect-Staple proposal gained some adoption from a limited number of websites and received some browser-side experimentation but never took off. {{}}{{}} by {{}} introduces the concept. + +The design of OCSP Expect-Staple resembles HTTP Strict Transport Security (HSTS), so let's examine HSTS first. HTTPS-only websites almost always keep an HTTP site live on port 80 for redirection to the HTTPS site. 
An attacker can still intercept unencrypted HTTP traffic on the HTTP request before the HTTPS redirection. HSTS mitigates this: the website instructs the browser to prefer HTTPS for all future navigation. Websites can opt in to a client-side list of domains forbidding plaintext HTTP, called the HSTS Preload list. Membership in the preload list ensures that TLS protects the initial request to a website, too.[^12] + +OCSP Expect-Staple applies the concepts behind HSTS to OCSP Must-Staple instead of HTTPS. In a response header, the website can tell browsers to expect stapling on all future renewals of a given certificate. Browsers will know to reject any certificates without a stapled OCSP response, should a malicious server send a revoked certificate and omit the staple. + +[Chromium experimented with Expect-Staple in 2016](https://docs.google.com/document/d/1aISglJIIwglcOAhqNfK-2vtQl-_dWAapc-VLDh-9-BE/edit) and even had an initial preloaded list, but [Chromium later retired all support](https://issues.chromium.org/issues/41230705). Then, it retired all support for OCSP (including stapling and Must-Staple). + +### Poor implementations + +[The most popular Web servers (uncustomized Apache and Nginx) have poor support for stapling](https://blog.hboeck.de/archives/886-The-Problem-with-OCSP-Stapling-and-Must-Staple-and-why-Certificate-Revocation-is-still-broken.html). I'm unfamiliar with Apache, but Nginx will happily serve a stale OCSP response. [Nginx has a broken DNS resolver that can cache the wrong IP of an OCSP stapling server](https://github.com/mozilla/server-side-tls/issues/283). I worked around the issue with a custom version of [certbot-ocsp-fetcher](https://github.com/tomwassenberg/certbot-ocsp-fetcher), but I don't expect most system administrators to use a shell script to work around broken servers or to migrate to something that gets it right (such as Caddy). 
When the most popular web servers ship broken stapling implementations for _years,_ we need to try something else. + +## Querying CRL shards + +Recall that most clients can't query and update giant CRLs directly. This inefficiency motivated the creation of OCSP. Ballot SC-063 also permits CAs to partition their CRLs into smaller "shards". Each shard has a publicly available URL. + +OCSP-enabled certificates contain an OCSP URL in their metadata for clients to submit OCSP queries. Let's Encrypt plans to replace these OCSP URLs with the URL to a CRL shard containing revocation information for that certificate. Instead of querying a CA's OCSP endpoint, clients download the relevant CRL shard containing the revocation status of several certificates at a time.[^13] + +Downloading the revocation status for many certificates represents a partial privacy improvement. Querying entire shards instead of individual domains adds entropy: a CA doesn't know which entry on a CRL shard a client needs to check. + +Clients will also see bigger downloads, as shards have footprints several times the size of single OCSP lookups. Remember that "modern" websites often contain sub-resources from many domains. As CRL shard downloads accumulate, the small size savings over summarized CRLs (see next section) might not justify the privacy risk. + +CAs find this solution easier than OCSP, but I don't think this resolves most of the OCSP issues clients experience. Directly querying CRL shards only partially mitigates each of the four OCSP issues I enumerated. + +## Client-side summarized CRLs + +Browsers subset or compress the giant set of all CRLs into compressed client-side lists with frequent push-based updates to keep browsing traffic private. Chromium CRLSets subset the complete set of CRLs with some inclusion criteria, while Mozilla's CRLite uses Bloom filters to compress a list of all current and revoked certificates. 
[Let's Encrypt refers to both technologies as summarized CRLs](https://letsencrypt.org/2022/09/07/new-life-for-crls.html). + +### Google's approach: CRLSets + +Chromium has a data set called "CRLSets". [CRLSets](https://www.chromium.org/Home/chromium-security/crlsets/) initially contained a small fraction of all unexpired and currently-revoked certificates, preferring high-impact and Extended-Validation sites over smaller, unimportant domains such as `seirdy.one`. It later expanded to cover certain reasons for revocation. + +My search through scattered documentation ended with three excellent articles: + +1. {{}}{{}} by {{}}, introduces the concept and motivation behind CRLSets. + +2. {{}}{{}} by {{}}, explains why the Chromium team didn't design CRLSets to scale to all revoked certs. He suggests probabilistic data structures but cites their inherent error rate. + +3. {{}}{{PKI" extraName="headline" url="https://doi.org/10.1145/2815675.2815685">}}, published on {{}}, evaluates several certificate revocation mechanisms and finds that CRLSets covered 0.35% of all revoked certificates. + +In 2024 (this year, at the time of publishing), Chromium expanded CRLSets to cover compromised keys to improve the situation. CRLSets still don't cover mass revocation events due to scalability issues. + +{{}} + +
+ +We've added support for certificate revocations due to key compromise to CRLSet, and enabled enforcement. Any certificate revoked with the key compromise reason code should now be blocked by Chrome clients within 24-48 hours. This approach should work for day-to-day revocation, but will not work for mass revocation events, due to a limit on the max size of a CRLSet. + +
+{{}} +{{}} +{{}} +{{
}} + +Given the subset of revoked certificates covered by CRLSets, I don't consider them a robust solution to certificate revocation. This remains Chromium's single current method for certificate revocation. Firefox used a similar mechanism called OneCRL, but it later migrated to a new approach called CRLite with full coverage. + +### Mozilla's approach: CRLite + +The second article by Adam Langley mentioned compressing certificate revocation lists into probabilistic error-prone data structures for better coverage with a small size. Firefox's [CRLite](https://blog.mozilla.org/security/2020/01/09/crlite-part-2-end-to-end-design/) takes this approach with Bloom filters but without the errors. + +Certain browsers (Safari, Chromium/Edge) require CAs to participate in Certificate Transparency (CT). Certificate Transparency logs are append-only distributed ledgers that publicize all CA-issued certificates for public scrutiny.[^14] All certificates in CT receive a signed certificate timestamp (SCT) cryptographically proving their presence in a CT log, which browsers can check. Safari, Chromium, and derivatives perform these checks; Firefox currently lacks support for CT.[^15] + +The CRLite Bloom filter generation process can compare the filter against actual CT logs to remove all errors. The result has _complete_ coverage of certificate revocations with a tiny footprint and an error rate of zero! + +CRLite has a much smaller storage footprint than the hundreds of megabytes CRLSets would need for complete coverage. I still find this footprint too large for sufficiently constrained clients. When it first rolled out, it weighed 10 megabytes with a 580 kilobyte daily download for updates. CRLite's footprint has at least doubled since then and continues to grow.[^16] + +### Apple's approach: valid.apple.com + +Apple adopted an approach similar to CRLite using Bloom filters; it's known as "valid.apple.com", or simply "valid". 
I found {{}} a now-unlisted WWDC 2017 presentation called {{}} (video/quicktime); the relevant part begins around .{{}} According to the 2017 presentation, when a lookup against Apple's summarized CRL has a positive signal, clients perform an OCSP lookup when no stapled response exists. + +This approach reduces the number of live OCSP queries to leak less traffic and slow down fewer connections. At the time of writing, the device I tested still makes regular live OCSP checks. + +I won't discuss valid.apple.com in detail because I'm unfamiliar with Apple's approach, but this article would be incomplete without mentioning it. Braver souls can [browse Apple's maze of public source code](https://opensource.apple.com/source/Security/) or [Apple's API documentation for Web PKI policies](https://developer.apple.com/documentation/security/policies). valid.apple.com lacks any meaningful documentation. + +The documentation I cited comes from seven years ago. I suspect the approach has changed: the devices I tested seemed to make OCSP requests for every domain, and [a thread on Apple Developer Forums](https://forums.developer.apple.com/forums/thread/706629) seems to indicate that Apple stopped using client-side summarized CRLs. I expect major changes coming soon: Apple voted to make OCSP optional in SC-063. + +### Shortcomings of summarized CRLs + +As shown in the earlier section, revocation data sets such as CRLSets can't scale to cover all revoked certificates. Probabilistic filters can, but both have additional issues. + +Summarized CRLs make sense for browsers, but not for smaller clients. cURL, small chat apps, mail clients, feed readers, weather notifications, etc. all use HTTPS and all need a summarized CRL updated every few hours. This creates the need for an operating-system-managed summarized CRL, which I doubt would work well if OS-managed certificate bundles serve any indication. 
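
To make the cost of these filters more concrete, here is a minimal sketch of the Bloom filter cascade idea behind CRLite, as described in the CRLite section above. It is an illustration under assumptions, not Mozilla's implementation: the names `BloomFilter`, `build_cascade`, and `is_revoked` are hypothetical, the SHA-256-based hashing and the sizing parameters are arbitrary choices, and certificates are stood in for by plain strings. The key point it shows is that a complete list of unrevoked certificates (which CT logs provide) lets the builder find every false positive and encode it in another layer until no errors remain.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; `level` salts the hashes so each layer is independent."""

    def __init__(self, items, level, bits_per_item=8, hashes=3):
        # Sizing is an arbitrary assumption for this sketch.
        self.size = max(64, bits_per_item * len(items))
        self.level = level
        self.hashes = hashes
        self.bits = bytearray((self.size + 7) // 8)
        for item in items:
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def _positions(self, item):
        # Derive `hashes` bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{self.level}:{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def build_cascade(revoked, valid):
    """Build layers until a layer produces no false positives.

    Layer 0 encodes the revoked set. Each following layer encodes the
    false positives of the previous layer, alternating between the two sets.
    """
    include, exclude = set(revoked), set(valid)
    levels = []
    while include:
        layer = BloomFilter(include, level=len(levels))
        levels.append(layer)
        false_positives = {item for item in exclude if item in layer}
        include, exclude = false_positives, include
    return levels


def is_revoked(levels, item):
    """Exact answer, but only for items that were in `revoked` or `valid`."""
    for depth, layer in enumerate(levels):
        if item not in layer:
            # Missing from an even layer: not revoked. From an odd layer: revoked.
            return depth % 2 == 1
    # Matched every layer; the last layer has no false positives.
    return len(levels) % 2 == 1
```

The cascade only gives exact answers for certificates the builder knew about, which is why this design depends on the whole set of unexpired certificates being enumerable. Every client still has to store and refresh all the layers, which is the footprint problem this section describes.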
+ +Tiny embedded clients, Internet of Things devices, machines running ancient releases of stable-release distros,[^17] retro computers, etc. won't constantly update an evergreen multi-megabyte revocation filter. Getting a live-updating compressed summarized CRL on short-lived spin-up-spin-down servers might require effort from cloud vendors. Short-lived certificates make for a more realistic option to cater to such clients. + +### Let's Revoke + +Unlike the other options in this section, [Let's Revoke](https://www.ndss-symposium.org/ndss-paper/lets-revoke-scalable-global-certificate-revocation/) is just a proposal. Like CRLite, Let's Revoke involves pushing all active and revoked certificates to a client in a highly-compressed form. However, it scales far better than existing options. + +Let's Revoke requires changes from CAs: they need to generate Certificate Revocation Vectors (CRVs), small bit-vectors, and include them in every issued certificate. Under Let's Revoke, a CA pushes an incrementally-updated, highly-compressed archive of all CRVs for active revoked certificates to clients. Clients then use the archive to look up every certificate from that CA. Let's Revoke uses a fraction of CRLite's storage footprint. + +For extra-constrained clients, the CA can offer pull-based Let's Revoke checks similar to OCSP and querying CRL shards. I'll disregard live Let's Revoke lookups in this article, as CAs have already settled on using CRL shards for live lookups as the successor to OCSP. + +Getting all CAs on board will prove difficult: getting them on board with certificate transparency, signed certificate timestamps, and CRLs took a long time. Let's Revoke only offers binary information about revocation status (no reasons, no revocation timestamp, etc.), so clients capable of handling CRLite's footprint might find Let's Revoke less useful. 
However, its ability to scale with a tiny footprint makes it promising for lightweight clients such as Internet of Things devices. + +## ACME Renewal Information (ARI) + +ACME Renewal Information (ARI) is a protocol that notifies a server's ACME client when a certificate needs renewal. It can tell an ACME client when a certificate grows stale (e.g., after for a certificate) or upon certificate revocation to facilitate quick renewal. + +Other revocation checks on this page ensure clients don't trust revoked certificates. ARI ensures servers don't offer them. Even though ARI isn't a revocation check, an article describing the complete landscape of revocation checking should at least mention it. + +ARI complements other forms of revocation. Both ARI and OCSP Stapling get servers with long-lived certificates to check in with CAs more often to ensure that their certificates remain unrevoked. + +The [lego](https://go-acme.github.io/lego/) ACME client supports ARI thanks to contributions from Let's Encrypt. See {{}}{{}} by {{}} for more on the implementation. + +## Short-lived certificates + +Recall that CAs have a deadline to revoke a certificate; seven days represents the longest possible deadline. A certificate with a lifetime shorter than this can abstain from both OCSP and CRLs.[^18] + +If we reduce a certificate's lifetime to less than one week, revocation becomes much less of a problem. Instead of "revoking" a certificate, a CA stops re-issuing certificates for a domain and waits for the most recent certificate to expire. The European Telecommunications Standards Institute (ETSI) calls certificates with a lifetime of no longer than short-lived certificates.[^19] + +Live OCSP checks, sharded CRL lookups, and summarized CRLs add centralized points of failure for client-side revocation look-ups. Attackers can block component updates, online OCSP checks, or CRL shard fetches.[^20] Short-lived certificates add no more points of failure. 
+ +Short-lived certificates come with their own set of challenges: + +- Webmasters have narrow time windows to fix certificate issues. They'll need to set up monitoring, alerts, and backup CAs should they not want a CA's reliability to bottleneck their own. + +- CAs need to issue certificates _far_ more often, increasing their load. Phasing out OCSP frees up some resources for them to do this, although CAs need far fewer resources for serving an OCSP response than for renewing a certificate. + +- CAs need to improve their availability. Large ones can take down millions of services by going offline long enough. + +Overcoming these obstacles will take much work. ACME-STAR and delegated credentials for TLS look like promising attempts to shorten the longevity of certificate trust in the Web PKI. + +### ACME-STAR + +{{}}{{RFC 8739: Support for Short-Term, Automatically Renewed (STAR) Certificates in the Automated Certificate Management Environment (ACME)" extraName="headline" url="https://www.rfc-editor.org/rfc/rfc8739.html">}}{{}} outlines how ACME-STAR improves efficiency when issuing short-lived certificates at scale. + +Requesting a certificate from a CA requires the CA to issue a certificate immediately, even during heavy load. Shrinking the issuance window from two months to mere days would dramatically increase this load. We require an alternative to issuing certificates on-demand. CAs already generate OCSP responses ahead of time and offer them for ACME clients to download and staple, so why should they handle short-lived certificates differently? + +With ACME-STAR, CAs can schedule certificate re-issuance, generating them ahead of time for ACME clients to download later. ACME clients no longer need to request certificate generation; instead, they regularly re-fetch the most recently generated certificate. + +### Delegated credentials + +An Identifier Owner (IdO) is a party that operates and controls an identifier (usually a domain name[^21]). 
Traditionally, CAs issue a TLS certificate to an ACME client operated by the IdO. Often, the IdO delegates another party with more infrastructure to serve responses on its behalf. How do we handle certificate issuance when the IdO delegates a separate ACME client operator? + +Servers fronted by Content Delivery Networks (CDNs) generally trust the CDN to handle TLS. The CDN has a TLS certificate that verifies a trusted link between ownership of the certificate and use of the domain name. The TLS certificate doesn't, however, prove any involvement from the IdO, the one who holds and controls the domain name. The IdO ostensibly communicates with the CDN over one TLS connection, and the CDN communicates with clients using a separate TLS connection. We extend the trusted CDN-to-domain link to include the IdO-to-CDN link with a delegated credential. + +A delegated credential is a short-lived certificate that the delegate (such as a CDN) frequently re-generates on behalf of its IdO. A CA's private keys sign a normal TLS certificate to show approval from the CA. A delegated credential receives one more signature: the IdO's longer-lived TLS key _also_ signs the delegated credential. + +Delegated credentials solve three problems: + +1. Clients can now verify that the TLS key sent by a CDN has approval from both the CDN and the IdO behind the CDN. + +2. Clients receive the benefits of short-lived certificates: they don't have to worry about revocation. + +3. We have a standard, vendor-neutral way for CDNs to generate certificates on behalf of an IdO without access to the IdO's private keys, in a more efficient manner than current approaches such as [Cloudflare's "Keyless SSL"](https://blog.cloudflare.com/announcing-keyless-ssl-all-the-benefits-of-cloudflare-without-having-to-turn-over-your-private-ssl-keys/).[^22] + +Delegated credentials solve revocation of the certificate issued to the CDN, but not to the IdO. 
**Should a CA revoke an IdO's long-lived certificate, we need another revocation solution.** Delegated credentials address the need to _distrust_ CDNs, so we can't rely on CDNs to respond to the revocation of the IdO certificate. + +I still find delegated credentials worthwhile in the context of revocation. They could lay a foundation for future advancements to enable revocation checking of both certificates: the delegate's and the IdO's. + +{{}}{{RFC 9345: Delegated Credentials for TLS and DTLS" extraName="headline" url="https://www.rfc-editor.org/rfc/rfc9345.html">}}{{}} outlines the mechanism that Cloudflare, Facebook, and Firefox use for delegated credentials with other rationales. + +### ACME-STAR Delegation + +ACME-STAR lays its own foundation for delegated credentials in {{}}{{}}{{}}. + +Don't feel too optimistic about this proposal: the version of delegated credentials used by Cloudflare, Facebook, and Firefox doesn't use ACME-STAR, let alone ACME-STAR Delegation. Its publication came almost two years after the final revision of ACME-STAR Delegation, and it doesn't mention ACME-STAR Delegation anywhere in its "Related Work" section. + +For an introduction to the proposal, see {{}}{{}} by {{}}, one of its co-authors. + +## My proposals {#own-proposals} + +### Eliminating single points of failure {#no-spof} + +Browser-summarized CRLs and live checks have a single point of failure: if a malicious party blocks component updates or live checks, they can block revocation information for a compromised certificate. We can mitigate this issue by decentralizing the points of failure. I like the idea of using [Signed HTTP Exchanges (SXGs)](https://wicg.github.io/webpackage/draft-yasskin-http-origin-signed-responses.html) to enable other parties to serve revocation filters while verifying their authenticity. I recall hearing proposals for serving them over a WebRTC or WebTransport-based peer-to-peer swarm resembling WebTorrent. 
+ +We can also use SXGs to distribute CRLs or CRL shards. + +### Enforcing short-term certificates {#enforcing-stc} + +I have two proposals that can work alongside existing proposals for short-lived certificates. + +Today, Certificate Authority Authorization (CAA) DNS records restrict the issuance of certificates in several ways: + +- Restrict CAs permitted to issue certificates to a given domain. +- Restrict domain validation methods accepted (RFC 8657). +- Restrict ACME account URIs for a given CA (also RFC 8657). + +CAA records should receive two more extensions: + +- Restrict issuance to short-lived certificates. +- Restrict approved delegates for delegated credentials. + +With the first extension, an attacker who triggers a misissuance would compromise it for a few days or hours instead of months. The second extension limits the potential for rogue delegates to serve traffic on behalf of an IdO. + +I want to see the protections offered by Expect-Staple preloading for short-lived certificates. HTTPS Resource Records (RRs) or client-side preload lists can proactively tell clients to distrust any long-lived certificate for a domain.[^12] + +These two proposals might initially seem identical and redundant, but they serve different purposes. A CAA extension tells CAs not to issue long-lived certificates; HTTPS clients ignore these. An HTTPS RR or client-side preload list tells clients not to trust a long-lived certificate if one already exists. Both proposals work together to protect against long-lived compromises on two fronts. + +## Incremental change: shorter long-lived certificates + +Every week we shave from the average certificate's lifetime means: + +- One less week an attacker has to exfiltrate a key. +- One less week a compromised cert has to stay in a revocation list. + +Every reduction we make to certificate lifetimes translates to smaller, more manageable revocation lists. 
Client-side Bloom filters can shrink, CAs can more gradually scale up and address difficulties with shorter lifetimes, and everybody has a lower likelihood of trusting a compromised certificate. + +We don't have to go down to ten-day lifetimes right away. I propose starting by **shrinking lifetimes from three months to six weeks,** with biweekly renewal. This would encourage webmasters to set up alerting systems for renewal failures, as they'll have just two weeks to notice failures. It'll also potentially reduce the growth rate of revocation filters until we adopt better options. + +### Requesting a shorter lifetime + +ACME clients can set a `notBefore` and `notAfter` parameter in their certificate request to customize the exact certificate lifetimes. CAs that support this feature include: + +- Sectigo (through ZeroSSL) +- Google Trust Services + +ACME clients that support these parameters include: + +- lego (recommended) +- acme.sh[^23] + +
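
As a rough illustration of what such a request carries, the sketch below builds the JSON payload of an ACME `newOrder` request with the optional `notBefore` and `notAfter` fields from RFC 8555, asking for a six-week lifetime. The helper name `new_order_payload` and the domain `example.org` are placeholders of mine. A real ACME client still has to wrap this payload in a signed JWS and POST it to the CA's `newOrder` URL, and a CA remains free to reject or adjust the requested dates.

```python
import json
from datetime import datetime, timedelta, timezone


def new_order_payload(domains, lifetime, now=None):
    """Build the unsigned newOrder payload for the given DNS names.

    `now` must be a timezone-aware datetime; it defaults to the current time.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339 timestamp in UTC
    return {
        "identifiers": [{"type": "dns", "value": name} for name in domains],
        "notBefore": now.strftime(fmt),
        "notAfter": (now + lifetime).strftime(fmt),
    }


if __name__ == "__main__":
    payload = new_order_payload(["example.org"], timedelta(weeks=6))
    print(json.dumps(payload, indent=2))
```

In practice you would not build this by hand: an ACME client that supports these parameters exposes them through its own options, so check its documentation for the exact flag names.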
+ +## Conclusion + +With OCSP on its way out, the end draws near even for robust OCSP-based options such as OCSP Must-Staple. The future lies in: + +1. Live lookups using sharded CRLs +2. Summarized CRLs such as CRLite and Let's Revoke +3. Short-lived certificates using ACME-STAR. Delegated credentials only partially address revocation. + +Live sharded-CRL lookups resemble live OCSP lookups. They differ in their lower operational complexity for CAs, added noise for a slight privacy improvement, and larger downloads. Summarized CRLs seem like the opposite approach to OCSP, downloading all current revocations in advance and checking them locally. + +I find short-lived certificates a spiritual successor to both by applying OCSP's approach to certificate issuance. All three options represent improvements, but I find short-lived certificates more robust. Unfortunately, without better tools, short-lived certificates place a greater burden on webmasters. The fact that `seirdy.one` has a three-month certificate at the time of writing (admittedly, with OCSP Must-Staple) illustrates the height of this barrier as of late 2024. + +That said, not all sites need to adopt short-lived certificates. Client-side revocation filters increase in total size and daily download footprint every year, but a large enough share of sites adopting short-lived certificates would mitigate that trend. + +We have so much more work to do. Most non-browser clients support _none_ of the revocation solutions outlined on this page. All BoringSSL-based clients, including Chromium, have no support for OCSP (including OCSP stapling). This means most Web traffic lacks robust revocation checking. We need libraries that support delegated credentials. We need CA support and server tools for ACME extensions, such as ACME-STAR. 
+ +### Aside: browser wars + +[Firefox continues (rapidly) playing catch-up to Chromium on important aspects of browser security](https://seirdy.one/notes/2022/07/12/firefox-hardening-progress/), but its security edges ahead in some areas. It's the best at revocation checking, but the worst at alerting webmasters that they need to revoke. + +- Firefox has the best support for OCSP (including OCSP Must-Staple and an option to require OCSP and make it hard-fail), years after Chromium removed all OCSP support. + +- Firefox uses complete and error-free Bloom filters in CRLite. Chromium covers a fraction of all revoked certs in CRLSets. I'm too unfamiliar with Safari's approach to comment. + +- At the time of writing, Firefox stands alone as the single browser with support for delegated credentials. + +For years, Firefox has also been the only one of the three major browsers to lack any support for Certificate Transparency enforcement. Without enforcing CT, domain owners won't know when misissued certificates exist. All the revocation _checking_ in the world won't tell a domain owner when they need to revoke. + +
+ +
+ +## Ack­nowledge­ments {#acknowledgements} + +Thanks to {{}} for giving me detailed feedback on an early draft. He offered feedback that significantly improved this article by: + +- Pointing out several areas needing elaboration. +- Linking me to Chromium's Q2 2024 update to fix my outdated information. +- Highlighting at least six factual errors. +- Introducing me to valid.apple.com. +- Telling me about the relevance of Firefox lacking CT support. + +{{}} also reviewed a draft. Among other feedback, they encouraged me to mention pre-OCSP CRLs; this was the only part of the history of revocation I hadn't covered. + +
+ + +[^1]: See [the section with my proposals](#enforcing-stc) for an example of how an attacker can compromise certificate issuance for a domain unprotected by CAA DNS records. An attacker might also steal your private keys, but at that point, you have worse things to worry about. + +[^2]: Relatively speaking, this part is _easier_ than revocation but that doesn't make it _easy._ High-volume free CAs such as Let's Encrypt can't handle individual requests for revocation. They usually revoke certificates in bulk when they experience an issue. + +[^3]: I noticed people referring to CRLSets and CRLite as types of CRLs but I find the term inaccurate. A Certificate Revocation List contains revoked certificates for a given CA complete with metadata about reasons for revocation; browsers' revocation databases accumulate and subset or compress all CRLs into a single list or filter with limited metadata. + + I still needed a term that referred to CRLSets, CRLite, and Let's Revoke. Let's Encrypt picked "summarized CRLs", so I decided to compromise and use their term despite my reservations. + +[^4]: Source: [W3Techs](https://w3techs.com/technologies/history_overview/ssl_certificate). + +[^5]: For more staff comments, see some forum replies: + + - {{}}{{}} by {{}} + - {{}}{{}} by {{}}. + - {{}}{{}} by {{}}. + + Both staff members find OCSP, even with Must-Staple, the wrong long-term solution. + +[^6]: ACME stands for [Automatic Certificate Management Environment](https://en.wikipedia.org/wiki/Automatic_Certificate_Management_Environment), the standard protocol for automating certificate management between certificate authorities and servers. + +[^7]: I even used the `dfn` HTML element like a good little HTML author. 🥺 + +[^8]: See section 4.9.1.1 of the CA/B Baseline Requirements[^9] for and deadlines. The next section (section 4.9.1.2) describes seven-day deadlines. 
Past compromise of a subscriber key, authorization during certificate requests, or domain validation leaves a CA with to revoke. + + The Baseline Requirements exempt short-lived certificates from this deadline. I find the tradeoff acceptable, given how many compromised certificates still won't experience revocation in time but will be renewed. + +[^9]: For all the requirements CAs must obey, read [the CA/Browser Baseline Requirements](https://cabforum.org/working-groups/server/baseline-requirements/documents/). I recommend reading through them for a fascinating overview of what CAs do. + +[^10]: I oversimplified the nuance of the size increase. HTTPS clients that support OCSP cache live OCSP responses for a week, so later sessions will have a lower footprint than they would if the certificate had a stapled OCSP response. The footprint of a webpage might eclipse the footprint of a stapled OCSP response, but remember that HTTPS responses besides webpages exist. Instant messages, API responses, etc. tend to weigh less and often have significant latency constraints. + +[^11]: [Ballot SC-063's Google Doc for background, rationale, and considerations](https://docs.google.com/document/d/180T6cDSWPy54Rb5d6R4zN7MuLEMShaZ4IRLQgdPqE98/mobilebasic) cites [a survey of all unexpired certificates at the time in CT logs](https://docs.google.com/document/d/1C0i0pOaI84gNccGzREPOrr5kMfpYkUEr87cBMZ09q_4/mobilebasic), finding that 0.0622% of all certificates used Must-Staple. + +[^12]: I always found protecting the initial request with a client-side domain list a horrible, but effective, hack. We should have a way to query available connection methods for a domain using the correct venue for querying domain metadata set by a domain owner: DNS. + + We have a better solution today with HTTPS resource records! These specify how to access an HTTPS service before making an HTTPS connection. 
They list the availability of HTTP/2 and HTTP/3 (both require TLS), an IPv4/IPv6 address, and Encrypted Client Hello keys. + + In an alternate timeline where OCSP Expect-Staple took off, perhaps HTTPS resource records could include Expect-Staple metadata. + +[^13]: If you're too impatient to wait for shard URLs embedded in certificates to roll out, look up the CRL for a given certificate in [the Common CA Database (CCADB)](https://www.ccadb.org/resources). All three major browser vendors require CA participation in the CCADB; it's _the_ place to look for CRLs today, and therefore critical to their generation of summarized CRLs. + +[^14]: Note that CT logs don't specify revocation status. Revocation happens _after_ certificate issuance; the CT log already lists the certificate by then. + +[^15]: Follow Firefox's progress for implementing CT in [Bug 1281469](https://bugzilla.mozilla.org/show_bug.cgi?id=1281469). + +[^16]: Updating summarized CRLs requires browsers to make regular automatic connections by default. [I find browser patchsets to turn off all automated connections misguided](https://seirdy.one/notes/2024/07/19/on-a-more-selective-google/) partly because they break revocation. + +[^17]: Here's your regular reminder that [Debian Extended Long-Term Support exists](https://wiki.debian.org/LTS/Extended). Debian 8 "Jessie", released on , will continue to keep security professionals up at night with officially-endorsed third-party limited commercial support from Freexian through . Debian 12 support lasts till 2033. + +[^18]: Yes, one week. Firefox skips revocation checking for certificates with a validity period shorter than . Until recently, CAs and browsers agreed that the definition of a short-lived certificate required a validity period shorter than . 
The CA/Browser Ballot SC-063 v4 specifies adopting the European Telecommunications Standards Institute (ETSI) specification for short-lived certificates,[^19] constraining their lifetime to the maximum time to process a revocation request. This will shorten the maximum lifetime of a short-lived certificate from 10 to by . + +[^19]: Fittingly, the canonical location of this ETSI specification lives on a server experiencing live OCSP failures. [I saved an archived copy of the ETSI specification for certificate profiles](https://web.archive.org/web/20240902154502if_/https://www.etsi.org/deliver/etsi_en/319400_319499/31941201/01.04.04_60/en_31941201v010404p.pdf). + +[^20]: See [my own proposals](#no-spof) for a mitigation to summarized-CRLs' single point of failure. + +[^21]: Valid identifiers in the Web PKI also include IP addresses and `.onion` addresses, but most CAs don't offer free certificates for those. + +[^22]: {{}}{{}} on The Cloudflare Blog{{}} describes this motivation in more detail. Essentially, a delegated credential gets rapidly pushed to the edge server. It represents an efficiency improvement over edge servers periodically requesting new certificates. + +[^23]: I don't recommend acme.sh. Sandboxing complex shell scripts proves difficult. acme.sh has had severe arbitrary remote-code-execution vulnerabilities exploited by a CA: see [CVE-2023-38198](https://nvd.nist.gov/vuln/detail/CVE-2023-38198). While acme.sh developers fixed the vulnerabilities, they revealed the difficulty of securing a complex shell script that handles untrusted content. + +