
Conversation

@jnicoulaud-ledger (Contributor)

  • create log directory if missing
  • retry on some 5XX statuses
  • add request retry to unclaimed_sv_rewards.py
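
A rough sketch of the retry behaviour described above (a sketch only, not this PR's actual code: `aiohttp` as the HTTP client, the concrete status set, attempt count, and backoff schedule are all assumptions; the real helper `__post_with_retry_on_statuses` appears in the diff further down):

```python
import asyncio
import os

import aiohttp

# "some 5XX statuses": the concrete set retried here is an assumption.
RETRIED_STATUSES = {500, 502, 503, 504}


def ensure_log_dir(log_dir: str) -> None:
    """Create the log directory if it does not exist yet."""
    os.makedirs(log_dir, exist_ok=True)


async def post_with_retry_on_statuses(
    session: aiohttp.ClientSession,
    url: str,
    payload: dict,
    max_attempts: int = 5,
    backoff_s: float = 1.0,
) -> dict:
    """POST `payload` to `url`, retrying on transient 5XX responses."""
    for attempt in range(1, max_attempts + 1):
        async with session.post(url, json=payload) as resp:
            if resp.status in RETRIED_STATUSES and attempt < max_attempts:
                await asyncio.sleep(backoff_s * attempt)  # linear backoff
                continue
            resp.raise_for_status()  # non-retried errors still raise
            return await resp.json()
```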

Pull Request Checklist

Cluster Testing

  • If a cluster test is required, comment /cluster_test on this PR to request it, and ping someone with access to the DA-internal system to approve it.
  • If a hard-migration test is required (from the latest release), comment /hdm_test on this PR to request it, and ping someone with access to the DA-internal system to approve it.

PR Guidelines

  • Include in the release notes any change that might be observable by our partners or affect their deployment.
  • Specify fixed issues with Fixes #n, and mention issues worked on using #n
  • Include a screenshot for frontend-related PRs - see README or use your favorite screenshot tool

Merge Guidelines

  • Make the git commit message look sensible when squash-merging on GitHub (most likely: just copy your PR description).

@martinflorian-da (Contributor)

@jnicoulaud-ledger please let us know if/once you'd like us to trigger a full CI run! You also might need to force-push to rewrite your git history so that your commit messages carry Signed-off-by trailers... (see https://github.com/hyperledger-labs/splice/blob/main/CONTRIBUTING.md#testing).

f"{self.url}/api/scan/v0/updates", json=payload

json = await self.__post_with_retry_on_statuses(
f"{self._get_current_url()}/api/scan/v0/updates",

probably best to switch to /v2/updates as a prudent engineering measure:

```yaml
/v1/updates/{update_id}:
  get:
    deprecated: true
    tags: [ deprecated ]
    x-jvm-package: scan
    operationId: "getUpdateByIdV1"
    description: |
      Returns the update with the given update_id.
      Unlike /v0/updates/{update_id}, this endpoint returns responses that are consistent across different
      scan instances. Event ids returned by this endpoint are not comparable to event ids returned by /v0/updates.
      The order of items in events_by_id is not defined.
/v2/updates:
  post:
    tags: [external, scan]
    x-jvm-package: scan
    operationId: "getUpdateHistoryV2"
    description: |
      Returns the update history in ascending order, paged, from ledger begin or optionally starting after a record time.
      Compared to `/v1/updates`, the `/v2/updates` removes the `offset` field in responses,
      which was hardcoded to 1 in `/v1/updates` for compatibility, and is now removed.
      `/v2/updates` sorts events lexicographically in `events_by_id` by `ID` for convenience, which should not be confused with the
      order of events in the transaction, for this you should rely on the order of `root_event_ids` and `child_event_ids`.
      Updates are ordered lexicographically by `(migration id, record time)`.
      For a given migration id, each update has a unique record time.
      The record time ranges of different migrations may overlap, i.e.,
      it is not guaranteed that the maximum record time of one migration is smaller than the minimum record time of the next migration,
      and there may be two updates with the same record time but different migration ids.
```
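
For reference, sequential paging against `/v2/updates` might look roughly like the sketch below (not the script's actual code: the request body shape with `page_size`, `after_record_time`, and `after_migration_id`, the `transactions` response field, and `aiohttp` are assumptions pieced together from this thread):

```python
import aiohttp


async def fetch_all_updates(base_url: str, page_size: int = 100) -> list[dict]:
    """Page through /v2/updates from ledger begin, cursoring each request on
    the (migration id, record time) of the last update of the previous page."""
    updates: list[dict] = []
    after = None  # None => start from ledger begin
    async with aiohttp.ClientSession() as session:
        while True:
            payload: dict = {"page_size": page_size}
            if after is not None:
                payload["after"] = after  # assumed field names, see below
            async with session.post(
                f"{base_url}/api/scan/v2/updates", json=payload
            ) as resp:
                resp.raise_for_status()
                page = (await resp.json()).get("transactions", [])
            if not page:
                return updates
            updates.extend(page)
            last = page[-1]
            after = {
                "after_record_time": last["record_time"],
                "after_migration_id": last["migration_id"],
            }
```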

@jose-velasco-ieu (Contributor)

Maybe we could increase the default page-size from 100 to 1000. I've been testing locally (connected through a VPN with 20 Mb/s download speed), and using 1000 improves the timing a bit.
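
For instance, the knob could be exposed as a CLI flag (hypothetical flag name and default; the script's actual CLI is not shown in this thread):

```python
import argparse

parser = argparse.ArgumentParser(description="unclaimed_sv_rewards.py options (sketch)")
# Hypothetical: lets operators override how many updates each /v2/updates page returns.
parser.add_argument(
    "--page-size", type=int, default=1000,
    help="number of updates requested per Scan API page",
)
```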

@jose-velasco-ieu (Contributor) commented Dec 2, 2025

I've been thinking about improving performance by issuing parallel calls to the Scan API, but I’m not sure that’s possible. From what I can tell, pagination in /v2/updates is strictly sequential: each request needs the after_record_time and after_migration_id returned by the last transaction of the previous page. That creates a hard dependency chain where page N+1 cannot be requested until page N has been fetched. In other words, it seems like the Scan API enforces linear pagination, which would prevent issuing multiple updates() calls in parallel — unless there’s some alternative pagination mechanism I’m missing.

@meiersi-da (Contributor)

> I've been thinking about improving performance by issuing parallel calls to the Scan API, but I'm not sure that's possible. From what I can tell, pagination in /v2/updates is strictly sequential: each request needs the after_record_time and after_migration_id returned by the last transaction of the previous page. That creates a hard dependency chain where page N+1 cannot be requested until page N has been fetched. In other words, it seems like the Scan API enforces linear pagination, which would prevent issuing multiple updates() calls in parallel — unless there's some alternative pagination mechanism I'm missing.

I believe you could organize the parallel fetching by time intervals. For example, run ingestion for every day separately. The ingestion can start at the beginning of the day, and then fetch all the pages until it sees the first update after the current day.
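
A rough sketch of that idea (assumptions throughout: the same request/response shape as the paging sketch above, ISO-8601 record times, and a single migration id of 0; real data may span multiple migrations with overlapping record-time ranges, as the /v2/updates description notes):

```python
import asyncio
from datetime import datetime, timedelta

import aiohttp


async def fetch_day(
    session: aiohttp.ClientSession, base_url: str, day_start: datetime
) -> list[dict]:
    """Ingest one day's updates: start the cursor at the beginning of the day
    and page sequentially until the first update past the end of the day."""
    day_end = day_start + timedelta(days=1)
    updates: list[dict] = []
    after_record_time = day_start.isoformat()
    while True:
        payload = {
            "page_size": 1000,
            "after": {
                "after_record_time": after_record_time,
                "after_migration_id": 0,  # assumption: single migration
            },
        }
        async with session.post(
            f"{base_url}/api/scan/v2/updates", json=payload
        ) as resp:
            resp.raise_for_status()
            page = (await resp.json()).get("transactions", [])
        if not page:
            return updates
        for tx in page:
            if datetime.fromisoformat(tx["record_time"]) >= day_end:
                return updates  # first update after this day: stop
            updates.append(tx)
        after_record_time = page[-1]["record_time"]


async def fetch_range(base_url: str, first_day: datetime, num_days: int) -> list[dict]:
    """Run one ingestion task per day in parallel, concatenating in day order."""
    async with aiohttp.ClientSession() as session:
        days = [first_day + timedelta(days=i) for i in range(num_days)]
        per_day = await asyncio.gather(
            *(fetch_day(session, base_url, day) for day in days)
        )
    return [tx for day in per_day for tx in day]
```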

