-
Notifications
You must be signed in to change notification settings - Fork 176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add encryptionScheme parameter to htsget #808
base: master
Are you sure you want to change the base?
Conversation
How exactly are you implementing crypt4gh in There's basically two ways you could do it - either encrypt each segment linked in the htsget response separately, or have them all join together to make a single encrypted file. HTSlib has historically supported the latter, and also looks for the magic number so it can tell an encrypted payload apart from a plain one without having to make any changes to the htsget specification. |
It's implemented by joining segments together into a single encrypted file. It calculates the bytes to return to the client, and then re-encrypts the Crypt4GH header and adds an edit list packet using a configured public/private key pair.
This works, however the The advantage of adding an additional parameter is that it allows the client to directly request encrypted or non-encrypted files without having to inspect the response header type. I think this lines up with the logic of specifying CRAM/BAM or BCF/VCF in the Similar discussion to #581, where clients can give hints on the data they are expecting to make their logic easier. |
@daviesrob If needed, shall we discuss this over a the 4-weekly htsget APAC call?:
|
I agree with @daviesrob that detecting what you got back is and should be the primary way of selecting how to decode the response. However it does seem like there would be some benefit in enabling the client to say it wants encryption and having a way for the client and server to negotiate that. Somewhat unclear that that should be in the initial request URL rather than something negotiated via the ticket… 🤔 |
Where do these 4-character codes come from? Are they in common with some other protocol, or have they been invented for this PR? Are letters really in such short supply that this has to be specifically 4 characters? |
htsget.md
Outdated
|
||
If `encryptionScheme` is specified a 4 letter code for the encryption standard MUST be returned. When `c4gh` is specified, Crypt4GH payload for a particular object will be returned (if available). | ||
|
||
The server SHOULD reply with a `NotFound` error if the requested reference does not exist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't make sense and appears to have been copy/pasted from the referenceName
section. Do you mean if the server does not recognise a requested encryption code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I meant (encrypted) object Id, fixed in 4e6fa1a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is about the resource's <id>
, i.e., what forms the bulk of the path part of the URL, then it should be in the URL parameters
section (in fact, it seems this has always been missing there!). That would be a separate PR.
If this is about the 4-letter code designating the encryption method, then it should be here and use the same code
terminology.
I was hesitant to go for free-form string and I borrowed the idea of restricting format names to (at most) a 4 letter enum: BAM, CRAM, VCF, BCF. Admittedly a lot of encryption schemes might require more than 4 letters to declare, I wish there was a standard that defined this (IETF? W3C? OpenSSL source code?) that enumerates the most common encryption scheme strings. Anyways, very good point, what would you suggest @jmarshall? |
Why? |
This is a non-breaking, optional change for htsget.
This is already supported as
experimental
feature in htsget-rs since umccr/htsget-rs#298. It specifically allows for Crypt4GH requests for each underlying bioinformatics format.There's also the corresponding work in progress in htsget-compliance testsuite to support this extra parameter/schemes in ga4gh/htsget-compliance#11.
/cc @ohofmann @mmalenic @mlin @jppay