Feature: Storage backend for ID-based APIs (DRS-style resolution)

### Feature

A new storage backend that supports APIs where files are addressed by opaque
IDs rather than direct paths. The backend performs a resolution step —
translating the htsget request path into a file ID — before constructing
ticket URLs.

### Motivation

htsget and [DRS (Data Repository Service)](https://ga4gh.github.io/data-repository-service-schemas/)
are both GA4GH standards, but htsget-rs currently has no way to serve data
from DRS-style repositories where file access requires an ID lookup.

The existing backends all assume a direct mapping from the htsget request ID
to a storage location:
- **FileStorage** — ID maps to filesystem path
- **S3Storage** — ID maps to S3 key
- **UrlStorage** — ID maps to `{base_url}/{key}`

This works when the data backend uses the same addressing scheme as htsget.
But a growing number of genomic data repositories use ID-based APIs, where
mapping a human-readable path to its download endpoint requires a lookup
step:

```
htsget request:    GET /reads/{dataset}/{filepath}
                         ↓
resolution step:   dataset + filepath  →  fileId     (via API call)
                         ↓
ticket URLs:       GET /files/{fileId}/content  (with Range header)
```

UrlStorage cannot do this — it constructs URLs by concatenating a base URL
with the key, with no intermediate resolution.

Note: the ID-based data endpoint (`/files/{fileId}/content`) already provides
streaming and Range support. The missing piece is the `{dataset}/{filepath}`
→ `fileId` resolution step before URL construction.
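
To make the flow above concrete, here is a minimal sketch in plain Rust (std only). The names `Key`, `parse_key`, and `content_url` are hypothetical and not part of htsget-rs, and the `lookup` closure stands in for the real API call:

```rust
/// Hypothetical parsed form of an htsget request ID such as `ds1/dir/sample.bam`.
#[derive(Debug, PartialEq)]
pub struct Key<'a> {
    pub dataset: &'a str,
    pub filepath: &'a str,
}

/// Split `{dataset}/{filepath}` at the first slash; both parts must be non-empty.
pub fn parse_key(id: &str) -> Option<Key<'_>> {
    let (dataset, filepath) = id.split_once('/')?;
    if dataset.is_empty() || filepath.is_empty() {
        return None;
    }
    Some(Key { dataset, filepath })
}

/// Resolve the key via `lookup` (a stand-in for the API call), then build
/// the ID-based content URL that a ticket would point at.
pub fn content_url<F>(response_url: &str, id: &str, lookup: F) -> Option<String>
where
    F: Fn(&Key<'_>) -> Option<String>,
{
    let key = parse_key(id)?;
    let file_id = lookup(&key)?;
    Some(format!(
        "{}/files/{}/content",
        response_url.trim_end_matches('/'),
        file_id
    ))
}
```

UrlStorage effectively does only the final `format!` step with the key itself; the sketch differs by placing a lookup between parsing and URL construction.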

**Concrete use case:** The [NeIC Sensitive Data Archive](https://github.com/neicnordic/sensitive-data-archive)
(SDA) is a federated genomic archive used by Nordic research institutions.
We are building a new download API ([v2 spec](https://github.com/neicnordic/sensitive-data-archive/blob/main/sda/cmd/download/swagger_v2.yml))
with a DRS-inspired design — ID-based file access, split header/content
endpoints, and GA4GH service-info. The API is not a standalone DRS service
today but is designed to be easily separable into one in the future.

The v2 API endpoints relevant for htsget:

- `GET /datasets/{datasetId}/files` → list files (returns fileId per file)
- `GET /files/{fileId}/content` → encrypted data segments (Range-capable)
- `GET /files/{fileId}/header` → Crypt4GH header

htsget-rs is already used with SDA via UrlStorage pointed at an internal
path-based endpoint (`/s3-encrypted/{dataset}/{filepath}`). This path-based
endpoint is being retired in the v2 API. An ID-resolving backend would let
htsget-rs work with the new API without maintaining a legacy endpoint.

### Proposed approach

A new backend (working name: `DrsStorage` or `ResolverStorage`) that:

1. **Resolves** the key to a file ID via a configurable API call
   (e.g. `GET /datasets/{datasetId}/files?filePath={filepath}` → returns fileId)
2. **Caches** the resolution (genomic archives typically treat files as immutable)
3. **Constructs ticket URLs** using the resolved ID
   (e.g. `{response_url}/files/{fileId}/content`)
4. **Implements `get()`/`head()`** by proxying to the ID-based data endpoint
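
Steps 1 and 2 could be layered roughly as follows. This is only a sketch under assumptions: the `Resolver` trait and `CachingResolver` type are hypothetical names, the real backend would be async and would probably want TTLs or a bounded cache, and here only successful lookups are cached so that files added later can still be found:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical abstraction over the resolution API call.
pub trait Resolver {
    fn resolve(&self, dataset: &str, filepath: &str) -> Option<String>;
}

/// Wraps any resolver with an in-memory cache keyed by (dataset, filepath).
pub struct CachingResolver<R> {
    inner: R,
    cache: Mutex<HashMap<(String, String), String>>,
}

impl<R: Resolver> CachingResolver<R> {
    pub fn new(inner: R) -> Self {
        Self {
            inner,
            cache: Mutex::new(HashMap::new()),
        }
    }
}

impl<R: Resolver> Resolver for CachingResolver<R> {
    fn resolve(&self, dataset: &str, filepath: &str) -> Option<String> {
        let key = (dataset.to_string(), filepath.to_string());
        // The lock guard is dropped at the end of this statement, so the
        // (potentially slow) inner lookup below does not hold the lock.
        let cached = self.cache.lock().unwrap().get(&key).cloned();
        if let Some(id) = cached {
            return Some(id);
        }
        // Cache miss: ask the wrapped resolver and remember only successes.
        let id = self.inner.resolve(dataset, filepath)?;
        self.cache.lock().unwrap().insert(key, id.clone());
        Some(id)
    }
}
```

Because the wrapper implements the same trait as the thing it wraps, the caching policy can be swapped or disabled without touching the URL construction code.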

It would reuse the existing `response_url` pattern and the `forward_headers`
mechanism from UrlStorage (forwarding client request headers to the backend
calls and to the ticket fetch path, as configured), and could be feature-gated
like the S3 and URL backends.
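
As an illustration of the header-forwarding behaviour we have in mind, the sketch below filters incoming request headers against a blacklist before they are passed on. The function name `forwardable_headers` is hypothetical and the case-insensitive matching is our assumption about the desired semantics, not a description of the existing UrlStorage code:

```rust
/// Keep only the request headers whose names are not blacklisted.
/// Header names are compared case-insensitively, as HTTP requires.
pub fn forwardable_headers<'a>(
    request_headers: &'a [(String, String)],
    blacklist: &[&str],
) -> Vec<(&'a str, &'a str)> {
    request_headers
        .iter()
        .filter(|(name, _)| !blacklist.iter().any(|b| b.eq_ignore_ascii_case(name)))
        .map(|(name, value)| (name.as_str(), value.as_str()))
        .collect()
}
```

The same filtered set would be used for both the resolution call and the headers attached to ticket URLs, so that credentials reach the data endpoint while `Host` and similar hop-specific headers do not.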

### Example config

```toml
[[locations]]
regex = "^(?P<dataset>[^/]+)/(?P<filepath>.+)$"
substitution_string = "$dataset/$filepath"

backend.kind = "Drs"
backend.api_url = "http://download-internal:8080"
backend.response_url = "https://download.example.org"
backend.resolve_endpoint = "/datasets/{dataset}/files?filePath={filepath}"
backend.content_endpoint = "/files/{fileId}/content"
backend.forward_headers = true
backend.header_blacklist = ["Host"]
```
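
One possible way to expand the `{dataset}`, `{filepath}`, and `{fileId}` placeholders in `resolve_endpoint` and `content_endpoint` is sketched below (std only). The `expand` and `percent_encode` helpers are hypothetical, and encoding every substituted value is an assumption on our side; a real implementation would more likely use an existing URL crate:

```rust
/// Percent-encode everything except RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Replace each `{name}` placeholder in `template` with its encoded value.
/// Encoded values never contain `{` or `}`, so they cannot introduce new
/// placeholders that a later substitution would pick up.
pub fn expand(template: &str, vars: &[(&str, &str)]) -> String {
    let mut out = template.to_string();
    for (name, value) in vars {
        out = out.replace(&format!("{{{}}}", name), &percent_encode(value));
    }
    out
}
```

With the config above, `expand` would be called once with the regex captures to build the resolution request, and once with the resolved `fileId` to build the content URL.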

### Alternatives considered

1. **Keep a path-based endpoint in the data repository** — works but forces
   repositories to maintain legacy endpoints just for htsget compatibility
2. **External sidecar that pre-resolves paths** — operationally complex,
   static mapping breaks when files are added
3. **Regex/substitution in UrlStorage** — cannot do HTTP lookups, only
   string transformations

### Scope

We are happy to contribute an implementation if there's interest. The SDA team
has experience with htsget-rs (we maintain a deployment using UrlStorage with
Crypt4GH) and can provide integration testing against a real archive.

We would like to hear your thoughts on:
- Whether this fits htsget-rs scope or is better as an external plugin
- Naming: `DrsStorage` vs `ResolverStorage` vs something else
- Any architectural preferences for the resolution/caching layer

### References

- [SDA repository](https://github.com/neicnordic/sensitive-data-archive)
- [SDA Download API v2 spec](https://github.com/neicnordic/sensitive-data-archive/blob/main/sda/cmd/download/swagger_v2.yml)
- [GA4GH DRS spec](https://ga4gh.github.io/data-repository-service-schemas/)
