
Return storage document header with reads to allow external sorting by sequenceNumber or time64 #150

Open
albe opened this issue Feb 22, 2021 · 1 comment
Labels
enhancement P: Storage Affects the storage layer

Comments

@albe
Owner

albe commented Feb 22, 2021

Since 0.7, the storage layer stores an external sequence number and a monotonic time64 timestamp in every document. So far, that information is not returned when reading from the storage.

Returning it requires an API change, which is breaking.

@albe albe added enhancement P: Storage Affects the storage layer labels Feb 22, 2021
@albe
Owner Author

albe commented May 30, 2021

The EventStore read API should not deal with internal document sequence numbers and timestamps, so that part should not change. The goal, though, is that the storage sequence number can replace the storage-level global index for cross-stream (partition) reads. At the very least, the global index should be an optional performance optimization rather than a requirement for reconstructing the document order. See #24, which requires iterating all partitions in insertion order to reindex documents.

The Storage read API currently consists of two methods:

read(number, index): document

This API method does not need to change. When reading a single document from the storage, the sequence number is already known and the timestamp is likely not of interest. In case they are needed, a new method can be added.

*readRange(from, until = -1, index = null): Generator<document>

This API method is supposed to return all documents in the order they were written to the storage. If an index is specified, only the documents in that index (stream) should be returned. Hence, technically this API shouldn't change either: a reader is likely not concerned with an individual document's sequence number or timestamp and only wants the documents in the given range, in order. Again, an additional API method can be added to cover that use case.

So effectively, the *iterateRange(from, until, index) implementation should not read from the global index, but instead iterate over all partitions and return the documents in sequenceNumber order.
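
To illustrate the idea of iterating all partitions and yielding in sequenceNumber order, here is a minimal sketch of a k-way merge. It is not the library's implementation: the function name `mergeBySequenceNumber`, the `{ header: { sequenceNumber, time64 }, document }` entry shape, and modeling a partition as a plain iterable are all assumptions made for illustration. It also assumes each partition is already ordered by sequenceNumber, since documents are appended in write order.

```javascript
// Hypothetical sketch: each partition is modeled as an iterable of
// { header: { sequenceNumber, time64 }, document } entries that is already
// sorted by sequenceNumber. None of these names come from the actual storage API.
function* mergeBySequenceNumber(partitions, from = 0, until = Infinity) {
    // Open one iterator per partition and buffer its current head entry.
    const heads = [];
    for (const partition of partitions) {
        const iterator = partition[Symbol.iterator]();
        const next = iterator.next();
        if (!next.done) {
            heads.push({ iterator, entry: next.value });
        }
    }

    while (heads.length > 0) {
        // Pick the partition whose head has the lowest sequenceNumber.
        // A linear scan is fine for a handful of partitions; a min-heap
        // would scale better for many.
        let lowest = 0;
        for (let i = 1; i < heads.length; i++) {
            if (heads[i].entry.header.sequenceNumber < heads[lowest].entry.header.sequenceNumber) {
                lowest = i;
            }
        }

        const { iterator, entry } = heads[lowest];
        if (entry.header.sequenceNumber > until) {
            return; // every remaining head is at least as high
        }
        if (entry.header.sequenceNumber >= from) {
            yield entry;
        }

        // Advance the partition we just consumed from, or drop it when exhausted.
        const next = iterator.next();
        if (next.done) {
            heads.splice(lowest, 1);
        } else {
            heads[lowest].entry = next.value;
        }
    }
}
```

With this approach the global index is not needed to reconstruct the write order, only the per-document header. Skipping to `from` by scanning, as done here, is the slow path; a real implementation would presumably seek within each partition instead.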

A potential additional API method could be something like

*readTimeRange(fromTime, untilTime): Generator<document>

which would return all documents within a given time range rather than a sequence number range. Once a method that iterates all documents and orders them by document metadata is implemented, adding this API should be straightforward. The biggest issue to solve is how to efficiently find the start and end points of the range. That could be solved by indexing the document time.
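
As a rough sketch of how a time index could locate the start of a range: because time64 is monotonic, an index of entries ordered by time can be binary-searched. Everything below is hypothetical, including the in-memory `timeIndex` array of `{ time64, sequenceNumber }` entries, the helper names, and the choice to treat `untilTime` as inclusive.

```javascript
// Hypothetical time index: an array of { time64, sequenceNumber } entries,
// sorted ascending by time64 (which holds if the timestamp is monotonic).
// Returns the position of the first entry with time64 >= time.
function lowerBound(timeIndex, time) {
    let low = 0;
    let high = timeIndex.length;
    while (low < high) {
        const mid = (low + high) >>> 1;
        if (timeIndex[mid].time64 < time) {
            low = mid + 1;
        } else {
            high = mid;
        }
    }
    return low;
}

// Yields the sequence numbers of all entries with fromTime <= time64 <= untilTime.
// Treating untilTime as inclusive is an assumption, not something the issue specifies.
function* sequenceNumbersInTimeRange(timeIndex, fromTime, untilTime) {
    for (let i = lowerBound(timeIndex, fromTime); i < timeIndex.length; i++) {
        if (timeIndex[i].time64 > untilTime) {
            return;
        }
        yield timeIndex[i].sequenceNumber;
    }
}
```

The resulting sequence numbers could then be fed into the existing sequence-number-based read path. The comparisons work the same whether time64 is represented as a Number or a BigInt.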
