Feature Request - Machine readable version of /certificates #1162

Open
jvanasco opened this issue Dec 4, 2020 · 9 comments

@jvanasco
Contributor

jvanasco commented Dec 4, 2020

I believe it would be useful to have a machine-readable version of the information in /certificates. That would let client developers and integrators quickly check whether anything has changed, and help automate tracking of certificate lineage.

The data could be maintained in a JSON file, similar to those in /data, and it could serve two purposes:

  1. It could be published statically.
  2. The data file could be used for templating all the language translations of "/certificates". The Let's Encrypt staff would only have to update one JSON file to update every translated page.
@tdelmas
Collaborator

tdelmas commented Dec 4, 2020

I agree that using a model similar to the one for CT would be interesting (https://github.com/letsencrypt/website/blob/master/data/transparency.json)

@aarongable
Contributor

This is a good idea! I don't expect to get to it very soon, since I'm pretty focused on both the upcoming chain switch and ECDSA issuance, but will keep this in mind (or would be happy to review if someone else tackled it!).

@GriffinSoftware
Contributor

GriffinSoftware commented Dec 9, 2020

The numerous PRs I've made recently against the Chain of Trust page have been directed towards supplementing and structuring the information therein with an eye towards a PR to overhaul the entire page to streamline the presentation and remove redundancies. What say you all towards construction of a JSON tree structure corresponding to the diagram (that @aarongable is hopefully going to commit soon ; ) )?

The initial vision I have is that of an array of root certificate objects (including DST Root CA X3) with each having amongst its properties an array of "signed" intermediate certificate objects.

@aarongable
Contributor

I'd recommend a structure more like this:

[
  {
    "displayname": "ISRG Root X1",
    "algorithm": "RSA 4096",
    "o": "Internet Security Research Group",
    "cn": "ISRG Root X1",
    "type": "root",
    "status": "active",
    "certificates": [
      {
        "displayname": "Self-signed by ISRG Root X1",
        "crtsh": <url>,
        "txt": <url>,
        "pem": <url>,
        "der": <url>
      },
      { <repeat for cross-sign> }
    ]
  },
  { <repeat for other certs> }
]

The types would be root and intermediate, the statuses would be active, upcoming, backup, and retired.

I suggest this format because the JSON files in the data directory are consumed by Hugo, and Hugo's templating isn't good at arbitrary-depth descent through a tree of nested roots and intermediates. Of course, you could reflect the structure of the current page more strongly by hauling type and status up to be dictionary keys, but then an update means moving entire sections rather than changing a single value.
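To illustrate why the flat shape is easy to consume, here is a minimal Python sketch (not Hugo templating) that selects entries by `type` and `status` in one pass. The sample entries other than ISRG Root X1 and the `select` helper are hypothetical; only the field names come from the structure above.

```python
import json

# Hypothetical sample data; only the field names follow the proposed structure.
SAMPLE = json.loads("""
[
  {"displayname": "ISRG Root X1", "type": "root", "status": "active", "certificates": []},
  {"displayname": "Example Intermediate A", "type": "intermediate", "status": "active", "certificates": []},
  {"displayname": "Example Intermediate B", "type": "intermediate", "status": "backup", "certificates": []}
]
""")


def select(entries, type_, status):
    """Return the display names of entries matching both type and status."""
    return [
        entry["displayname"]
        for entry in entries
        if entry["type"] == type_ and entry["status"] == status
    ]


print(select(SAMPLE, "intermediate", "active"))
```

With this layout, retiring an intermediate would be a one-value change to its `status` field rather than a move between sections.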

@petercooperjr

Just a couple random thoughts:

  1. If "status" is going to be something machine-understandable, it may make sense to distinguish "retired" between "this intermediate no longer signs certificates" and "all certificates signed by this intermediate have already expired".
  2. Does it make sense to somehow integrate with or get data from CCADB? I guess I just like the idea of there only being one authoritative place for information to be, so maybe this should extract data from CCADB or this should feed its data into CCADB or something like that?

Just brainstorming; these may be terrible ideas.

@jvanasco
Contributor Author

jvanasco commented Dec 9, 2020

I have been tracking this manually for a while. I am far from a final form, but I do have a small preference/learning: a flat structure worked better when dealing with the cross-signs, with each intermediate listing the roots that signed it. I also list the IdenTrust (TrustID/DST) root. This lets me build the full chain, including the trusted root, for extensive tests.
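A minimal Python sketch of that idea, assuming each flat record carries a hypothetical `id` and a hypothetical `signed_by` list of issuer ids (a self-signed root lists itself). The record names below are placeholders, not a statement of which certificates actually signed which.

```python
def all_chains(records, start):
    """Return every path from `start` up to a root, following `signed_by`."""
    by_id = {record["id"]: record for record in records}

    def walk(node_id, path):
        path = path + [node_id]
        # Skip issuers already on the path; this also stops at self-signed roots.
        issuers = [i for i in by_id[node_id]["signed_by"] if i not in path]
        if not issuers:
            yield path
            return
        for issuer in issuers:
            yield from walk(issuer, path)

    return list(walk(start, []))


# Hypothetical flat records: one intermediate signed by two roots.
RECORDS = [
    {"id": "example-root-a", "signed_by": ["example-root-a"]},
    {"id": "example-root-b", "signed_by": ["example-root-b"]},
    {"id": "example-intermediate", "signed_by": ["example-root-a", "example-root-b"]},
]

for chain in all_chains(RECORDS, "example-intermediate"):
    print(" -> ".join(chain))
```

Because the records stay flat, a cross-sign is just one more id in `signed_by`, and each possible chain falls out of the walk.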

IMHO, the payload should also have the notAfter/expiry date for each cert.

@aarongable
Contributor

This page should not feed its data into CCADB; CCADB is authoritative and only a few people have the right/ability to disclose certificates into it. Having this data file be autogenerated from CCADB would be nice, but doing so requires getting someone to create a new public report (similar to https://ccadb-public.secure.force.com/mozilla/CACertificatesInFirefoxReport) listing all certificates owned by Internet Security Research Group (ISRG), so let's save that idea for future improvements.

@jvanasco
Contributor Author

jvanasco commented Dec 10, 2020

I generated a quick proof-of-concept here: master...jvanasco:feature-machine_readable_certificates

I don't expect this to work as-is, but changes are trivial, as certificates_build.py generates the certificates.json file. The bulk of the work was generating the input data of certificates.

General overview:

I split the certificate payload out from issuer, and split the algorithm into separate type and bits fields. Why? Python generates this data and exposes it as two fields, so it makes more sense to keep it that way. This script has the same Python requirements as Certbot.

The input is a human-curated file, "_certificate_data.json". It holds basic information about the certs that cannot be derived, such as the URLs and status/type. "_name" is only for editing the input (which could be another file with a "lastmod" date).

The script checks that all the URLs are valid and are not duplicated within the payload. It also checks that all the certificate URLs are online.
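This is not the code from the proof-of-concept branch, only a rough sketch of what the offline part of such a check could look like. The `check_urls` helper and the https-only rule are assumptions, and the example.org URLs are placeholders. The online check is left out because it needs network access.

```python
from urllib.parse import urlparse

# URL fields of one input entry. "signed_by" is excluded on purpose,
# because it legitimately repeats another entry's "pem" URL.
URL_FIELDS = ("crtsh", "txt", "pem", "der")


def check_urls(entries):
    """Return a list of problems: malformed URLs and URLs used more than once."""
    problems = []
    seen = set()
    for entry in entries:
        for field in URL_FIELDS:
            url = entry.get(field)
            if url is None:
                continue
            parsed = urlparse(url)
            if parsed.scheme != "https" or not parsed.netloc:  # assumed rule
                problems.append(f"{entry['_name']}: {field} is not a valid https URL")
            elif url in seen:
                problems.append(f"{entry['_name']}: {field} duplicates {url}")
            seen.add(url)
    return problems


# Placeholder entries: the first reuses one URL, the second has a malformed one.
EXAMPLE = [
    {"_name": "A", "pem": "https://example.org/a.pem", "der": "https://example.org/a.pem"},
    {"_name": "B", "pem": "not a url"},
]
for problem in check_urls(EXAMPLE):
    print(problem)
```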

It derives data from the PEM version; it could also check the versions against one another. "type" and "status" are copied over. "signed_by" tracks the issuer and is pegged to the issuer's "pem". If there is a cross-signed version, that is tracked too. The URL of the PEM is used as a unique identifier to link certificates together.
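Checking the versions against one another could be as simple as decoding the PEM and comparing it byte-for-byte with the DER download. A sketch using only the Python standard library follows; the helper names are hypothetical, and the bytes in the demo are a placeholder rather than a real certificate.

```python
import hashlib
import ssl


def same_certificate(pem_text, der_bytes):
    """True if the PEM text decodes to exactly the given DER bytes.

    ssl.PEM_cert_to_DER_cert only strips the armor and base64-decodes;
    it does not verify that the payload is a well-formed certificate.
    """
    return ssl.PEM_cert_to_DER_cert(pem_text) == der_bytes


def sha256_fingerprint(pem_text):
    """Hex SHA-256 of the DER encoding, usable as a stable identifier."""
    return hashlib.sha256(ssl.PEM_cert_to_DER_cert(pem_text)).hexdigest()


# Demo with placeholder bytes standing in for a downloaded .der file.
der = b"placeholder bytes, not a real certificate"
pem = ssl.DER_cert_to_PEM_cert(der)
print(same_certificate(pem, der))
print(sha256_fingerprint(pem))
```

A fingerprint like this is one possible alternative to using the PEM URL as the linking key, since it would survive a URL change.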

Why the flat, non-nested approach?

I keep thinking about how I, and others, would use this data. Keeping it flat seems easier and more database-like.

The workflow I envision is that a Let's Encrypt staff member alters an input file containing minimal information and runs a script, which generates a machine-readable version whose data has been checked and tested.

In any event, I'd be happy to submit a PR for this if Let's Encrypt wants to take it over for the reformatting. Otherwise, people can feel free to fork and work on it. If keeping a flat structure, the real customization will be in the output template (lines 193+).

input (this entry is self-signed, so "signed_by" repeats its own "pem" URL):

    {
      "_name": "ISRG Root X1",
      "type": "root",
      "status": "active",
      "crtsh": "https://crt.sh/?id=9314791",
      "txt": "https://letsencrypt.org/certs/isrgrootx1.txt",
      "pem": "https://letsencrypt.org/certs/isrgrootx1.pem",
      "der": "https://letsencrypt.org/certs/isrgrootx1.der",
      "signed_by": "https://letsencrypt.org/certs/isrgrootx1.pem"
    },

output:

    {
      "certificate": {
        "algorithm": "RSA", 
        "bits": 4096, 
        "cn": "ISRG Root X1", 
        "notAfter": "20350604110438Z", 
        "notBefore": "20150604110438Z", 
        "o": "Internet Security Research Group", 
        "selfsigned": true
      }, 
      "issuer": {
        "cn": "ISRG Root X1", 
        "o": "Internet Security Research Group", 
        "url_pem": "https://letsencrypt.org/certs/isrgrootx1.pem"
      }, 
      "status": "active", 
      "type": "root", 
      "urls": {
        "crtsh": "https://crt.sh/?id=9314791", 
        "der": "https://letsencrypt.org/certs/isrgrootx1.der", 
        "pem": "https://letsencrypt.org/certs/isrgrootx1.pem", 
        "txt": "https://letsencrypt.org/certs/isrgrootx1.txt"
      }
    }, 
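For consumers, the `notBefore`/`notAfter` strings in this output can be parsed with the standard library. A small sketch, assuming the values are always in the `YYYYMMDDHHMMSSZ` form shown above; `parse_cert_time` and `is_expired` are hypothetical helper names.

```python
from datetime import datetime, timezone


def parse_cert_time(value):
    """Parse a 'YYYYMMDDHHMMSSZ' string into a timezone-aware UTC datetime."""
    return datetime.strptime(value, "%Y%m%d%H%M%SZ").replace(tzinfo=timezone.utc)


def is_expired(record, now=None):
    """True if the record's certificate notAfter is at or before `now`."""
    now = now or datetime.now(timezone.utc)
    return parse_cert_time(record["certificate"]["notAfter"]) <= now


# Only the field needed here, copied from the output example above.
record = {"certificate": {"notAfter": "20350604110438Z"}}
print(parse_cert_time(record["certificate"]["notAfter"]).isoformat())
```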

@aarongable
Contributor

That's pretty cool! A couple notes:

  • Yes, the flat approach is definitely best. But IMO it should be a flat listing of public/private key pairs, and then the list of certificates corresponding to that keypair should be contained within each entry. This reflects the graph structure of the WebPKI, where nodes are keypairs and edges are certificates representing trust relationships between keypairs.
  • If we want to incorporate this or something like it into the website repo itself, it should be Go, rather than Python.
  • I'm pretty opposed to checking in auto-generated files. It would be best if the only file actually checked in to the repo is the human-editable input file, and the rest of the generation happens at Hugo build-time.
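As a rough illustration of the keypair-centric layout from the first point: the sketch below regroups a flat list of certificates into one entry per subject keypair. It is written in Python only to match the proof of concept (the second point asks for Go in the website repo), and the `subject_key` and `issuer_key` fields, along with all sample values, are hypothetical.

```python
def group_by_keypair(certificates):
    """Turn a flat list of certificates into one entry per subject keypair.

    Each keypair is a node; each certificate is an edge from the issuer's
    keypair to the subject's keypair.
    """
    grouped = {}
    for cert in certificates:
        entry = grouped.setdefault(
            cert["subject_key"],
            {"key": cert["subject_key"], "certificates": []},
        )
        entry["certificates"].append(
            {"issuer_key": cert["issuer_key"], "pem": cert["pem"]}
        )
    return list(grouped.values())


# Hypothetical data: key-a has a self-signed cert and a cross-sign by key-x.
CERTS = [
    {"subject_key": "key-a", "issuer_key": "key-a", "pem": "a-self.pem"},
    {"subject_key": "key-a", "issuer_key": "key-x", "pem": "a-cross.pem"},
    {"subject_key": "key-b", "issuer_key": "key-a", "pem": "b.pem"},
]

for keypair in group_by_keypair(CERTS):
    print(keypair["key"], len(keypair["certificates"]))
```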

5 participants