[ci] bulk-storage support for acs snapshots #3423
Conversation
ray-roestenburg-da
left a comment
Going in the right direction IMHO, some initial comments. The most important ones are the possible memory leak, exception handling, and always making sure the zstd object is closed.
| 1000,
| (64 * 1024 * 1024).toLong,
| )
| val bulkStorageTestConfig = BulkStorageConfig(
This can be defined in a test instead of here?
.../src/main/scala/org/lfdecentralizedtrust/splice/scan/store/bulk/AcsSnapshotBulkStorage.scala (resolved)
| import scala.concurrent.ExecutionContext
| import scala.util.{Failure, Success}
|
| case class AcsSnapshotSource(
I don't think you need to write your own GraphStage for this; you can do it with unfoldAsync (or statefulMapConcat, but unfoldAsync is probably easier).
Argh, sorry, this is an unused leftover. I indeed already reimplemented it with unfoldAsync.
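For illustration, a minimal sketch of the unfoldAsync approach (readChunk, the element type, and the object name are hypothetical stand-ins, not code from this PR):

```scala
import org.apache.pekko.NotUsed
import org.apache.pekko.stream.scaladsl.Source
import scala.concurrent.{ExecutionContext, Future}

object AcsPagingSketch {
  // Hypothetical page reader: returns the next chunk of rows starting at `offset`,
  // or an empty Seq once the snapshot is exhausted.
  def readChunk(offset: Long, chunkSize: Int): Future[Seq[String]] = ???

  // unfoldAsync threads the offset through as state and stops on None, so no
  // hand-written GraphStage is needed for simple paging.
  def snapshotSource(chunkSize: Int)(implicit ec: ExecutionContext): Source[Seq[String], NotUsed] =
    Source.unfoldAsync(0L) { offset =>
      readChunk(offset, chunkSize).map {
        case rows if rows.isEmpty => None
        case rows                 => Some((offset + rows.size, rows))
      }
    }
}
```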
| }
|
| // Writes a full object from memory into an s3 object
| def writeFullObject(key: String, content: ByteBuffer)(implicit tc: TraceContext) = {
Would be nice to add the return type here since it's a public def. Is this a blocking call?
The reason I omitted the return type is that I didn't find anything useful in it, so I don't actually care about returning it. Would it be cleaner to add a () at the end to make that explicit?
Indeed, it is blocking, hence the Future { } wrapper in the call site. Added to the comment.
Yes, it's easier to read with ().
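A tiny sketch of the agreed style, with the TraceContext parameter omitted and the body elided (trait name is illustrative):

```scala
import java.nio.ByteBuffer

trait ObjectWriterSketch {
  // Explicit Unit return type plus a trailing () make it obvious that the
  // result of the S3 call is intentionally discarded.
  def writeFullObject(key: String, content: ByteBuffer): Unit = {
    // ... blocking S3 put elided ...
    ()
  }
}
```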
| val compressionLevel: Int = 3,
| ) {
|
| val tmpBuffer = ByteBuffer.allocateDirect(tmpBufferSize)
This is allocated outside of the JVM heap, not GC-ed.
I think you are likely currently leaking native memory and not cleaning up the compressingStream correctly
You need to at least deref the tmpBuffer at some point so that it becomes unreachable and can get deleted (there is still the problem that you don't know when the JVM will actually free it).
There is cleanup code for this kind of thing, but I don't think it's portable. And once you use direct buffers you really need to set -XX:MaxDirectMemorySize so you know when this happens.
The problem was that zstd asserts that the buffer is direct, but I'll definitely look into not leaking it at least.
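For context, a minimal sketch of the direct-buffer lifecycle being discussed (class name and size are illustrative, not from the PR):

```scala
import java.nio.ByteBuffer

// Direct buffers live outside the Java heap: the GC only releases the native
// memory once the wrapping ByteBuffer object itself becomes unreachable, and
// -XX:MaxDirectMemorySize caps the total, so exhaustion fails fast instead of
// silently growing the process RSS.
final class DirectBufferHolder(size: Int) extends AutoCloseable {
  private var tmpBuffer: ByteBuffer = ByteBuffer.allocateDirect(size)

  def buffer: ByteBuffer = tmpBuffer

  override def close(): Unit =
    tmpBuffer = null // drop the last reference so the JVM can eventually free it
}
```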
| /** Limit with no constraints. Must not be used for production, use only for testing.
|   */
| case class UnboundLimit private (limit: Int) extends Limit
Maybe rather do // TODO (#767): make configurable on Limit.MaxPage so you don't need this?
Fair. I half-did #767 by making it an argument, but not fully configurable (e.g. by an app config), so this tackles only the use case of overriding the max per usage in code.
Nice, thanks
| CompactJsonScanHttpEncodings.javaToHttpCreatedEvent(event.eventId, event.event)
| )
| val contractsStr = encoded.map(_.asJson.noSpacesSortKeys).mkString("\n") + "\n"
| val contractsBytes = ByteString(contractsStr)
contractsStr.getBytes(StandardCharsets.UTF_8)
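A sketch of that suggestion, assuming contractsStr from the quoted hunk above (object and method names are illustrative):

```scala
import java.nio.charset.StandardCharsets
import org.apache.pekko.util.ByteString

object ContractEncodingSketch {
  // Encode explicitly as UTF-8 rather than relying on the platform default charset.
  def toBytes(contractsStr: String): ByteString =
    ByteString(contractsStr.getBytes(StandardCharsets.UTF_8))
}
```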
| def dumpAcsSnapshot(migrationId: Long, timestamp: CantonTimestamp): Future[Unit] = {
|
| @SuppressWarnings(Array("org.wartremover.warts.Var"))
| var idx = 0
This needs to be a volatile member, or an atomic integer or reference; you are mutating it concurrently from a future (from a thread on the ExecutionContext that runs the stream).
Hmm, I originally had it as an AtomicInteger, but I'm mutating it from a future with parallelism 1, so I convinced myself that it can be a var. But you're probably right that it's better to just be prudent and revert to that.
Just for my education, is my reasoning that parallelism 1 is good enough solid, or am I still missing something?
The 1 does not matter; you are still accessing the var from that one other thread, which isn't allowed in principle. I shoot myself in the foot every time I try to be too clever or loose around these rules. There's also the fact that code changes, and the next person doesn't see the special case, or starts copying a bad way of doing things.
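A minimal sketch of the atomic-counter variant (writeObject and the object name are hypothetical stand-ins for the S3 upload in the PR):

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.pekko.stream.scaladsl.Source
import org.apache.pekko.util.ByteString
import scala.concurrent.Future

object SnapshotUploadSketch {
  // Hypothetical write, standing in for the S3 upload.
  def writeObject(key: String, content: ByteString): Future[Unit] = ???

  // The counter is incremented from whatever thread runs the mapAsync future,
  // so an AtomicInteger (or a @volatile var) is needed rather than a plain var.
  def uploadAll(objects: Source[ByteString, _]): Source[Unit, _] = {
    val idx = new AtomicInteger(0)
    objects.mapAsync(1) { zstdObj =>
      val objectKey = s"snapshot_${idx.getAndIncrement()}.zstd"
      writeObject(objectKey, zstdObj)
    }
  }
}
```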
| }
|
| override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
| new GraphStageLogic(shape) with InHandler with OutHandler {
So this needs a preStart to create the buffer and a postStop to deref it (and close the zstd stream).
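A sketch of that shape, with the compression logic elided and only the buffer lifecycle shown (class and port names are illustrative, not the PR's stage):

```scala
import java.nio.ByteBuffer

import org.apache.pekko.stream.{Attributes, FlowShape, Inlet, Outlet}
import org.apache.pekko.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}
import org.apache.pekko.util.ByteString

final class BufferedCompressStageSketch(bufferSize: Int)
    extends GraphStage[FlowShape[ByteString, ByteString]] {
  val in: Inlet[ByteString] = Inlet("BufferedCompressStageSketch.in")
  val out: Outlet[ByteString] = Outlet("BufferedCompressStageSketch.out")
  override val shape: FlowShape[ByteString, ByteString] = FlowShape(in, out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) with InHandler with OutHandler {
      private var tmpBuffer: ByteBuffer = _

      // Allocate only once the stage actually starts running.
      override def preStart(): Unit =
        tmpBuffer = ByteBuffer.allocateDirect(bufferSize)

      // Runs on completion, failure, or cancellation: drop the buffer reference
      // (a real stage would also close its zstd stream here).
      override def postStop(): Unit = {
        tmpBuffer = null
        super.postStop()
      }

      // A real stage would compress via tmpBuffer here.
      override def onPush(): Unit = push(out, grab(in))
      override def onPull(): Unit = pull(in)

      setHandlers(in, out, this)
    }
}
```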
| }
|
| object S3BucketConnection {
| def apply(s3Config: S3Config, bucketName: String, loggerFactory: NamedLoggerFactory) = {
- def apply(s3Config: S3Config, bucketName: String, loggerFactory: NamedLoggerFactory) = {
+ def apply(s3Config: S3Config, bucketName: String, loggerFactory: NamedLoggerFactory): S3BucketConnection = {
@ray-roestenburg-da I believe I addressed all your comments, mind taking another look please?
| object BulkStorageConfigs {
| val bulkStorageConfigV1 = BulkStorageConfig(
| 1000,
| (64 * 1024 * 1024).toLong,
nitpick but you can also do 64L * 1024 * 1024
done
| )
| }
|
| sealed trait Position
nitpick: Maybe nice to put these in an object
done
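A minimal sketch of the nitpick (object and variant names are illustrative, not the actual ADT):

```scala
// Group the position ADT inside an object instead of leaving it at the top
// level of the file.
object SnapshotPosition {
  sealed trait Position
  case object Start extends Position
  final case class After(offset: Long) extends Position
}
```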
| snapshot = timestamp,
| after,
| limit = HardLimit.tryCreate(config.dbReadChunkSize),
| Seq.empty,
nit, since you do limit = , maybe nice to do partyIds = Seq.empty, templates = Seq.empty for clarity
Fixed (removed the limit =)
I think it makes sense to default partyIds: Seq[PartyId] = Seq.empty and templates: Seq[PackageQualifiedName] = Seq.empty on def queryAcsSnapshot(...), which is nice and clean, but not necessary in this PR.
| )
| .mapAsync(1) { zstdObj =>
| val objectKey = s"snapshot_$idx.zstd"
| Future {
You should use a blocking threadpool for this future, since it blocks; right now you're likely using the default dispatcher. Sadly we can't use the connector, which does use proper non-blocking I/O. So you need to do Future { }(ec) and get the ec that is used for blocking I/O in Canton, or get one from Pekko configured as a dispatcher.
Of course it depends on the implicit ec on this class, but that way it's not visible that this class needs a separate threadpool for blocking I/O. Maybe better to just create one fixed threadpool for this.
Probably should wrap it (the writeFullObject) in blocking { } anyway. And then if it uses Canton's executor service machinery, it might know what to do.
If this were a separate app/service, I'd just give it a fixed threadpool to work with.
(But later on I noticed that there is an AWS S3AsyncClient you should just use instead and not worry about the blocking, if it works well.)
Fixed with the asyncClient that you pointed to.
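For reference, a sketch of the blocking-dispatcher pattern that was being discussed before the switch to the async client (the pool size, object name, and writeFullObjectBlocking are hypothetical):

```scala
import java.util.concurrent.Executors
import scala.concurrent.{blocking, ExecutionContext, Future}

object BlockingUploadSketch {
  // Dedicated pool so the blocking S3 call never starves the stream's dispatcher.
  private val blockingEc: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))

  // Hypothetical blocking write, standing in for the synchronous S3 put.
  private def writeFullObjectBlocking(key: String, bytes: Array[Byte]): Unit = ???

  def upload(key: String, bytes: Array[Byte]): Future[Unit] =
    Future {
      blocking { // hints managed pools that this task parks a thread
        writeFullObjectBlocking(key, bytes)
      }
    }(blockingEc)
}
```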
| case None =>
| logger.debug(s"$prefixMsg No failure, normal restart.")
| }
| // Always retry (TODO(#3429): consider a max number of retries?)
This is one of the reasons why I would think it is nice for this thing to be separate from the scan app pod. Let it crash when it can't write, recover on startup from where you were, and let k8s restart the pod; failures then also show up far more easily in monitoring, and you don't have to crash the scan app. And the reboot effect sometimes works far better than staying in-process. My two cents.
yes, but as discussed, breaking scan up into multiple microservices has its own downsides.
ok fair, also maybe not the best comment I made on this particular PR 😄
| // TODO(#3429): tweak the retry parameters here
| val delay = FiniteDuration(5, "seconds")
| val policy = new RetrySourcePolicy[Unit, Int] {
Is there a test that retry works, that writeFullObject can overwrite?
I tested it manually for now, added a TODO to add a unit test for it.
...scan/src/main/scala/org/lfdecentralizedtrust/splice/scan/store/bulk/S3BucketConnection.scala (resolved)
| def readFullObject(key: String): ByteBuffer = {
| val obj = s3Client.getObject(GetObjectRequest.builder().bucket(bucketName).key(key).build())
| val bytes = obj.readAllBytes()
| val ret = ByteBuffer.allocateDirect(bytes.length)
Not necessary for this to be a direct buffer, right? (I mean as part of readFullObject.)
Fixed (as part of moving to the asyncClient)
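A minimal sketch of the heap-buffer alternative (object and method names are illustrative):

```scala
import java.nio.ByteBuffer

object BufferSketch {
  // A heap buffer is enough for bytes read back from S3; ByteBuffer.wrap avoids
  // the copy entirely. Only the zstd JNI calls insist on direct buffers.
  def toHeapBuffer(bytes: Array[Byte]): ByteBuffer = ByteBuffer.wrap(bytes)
}
```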
| .bucket(bucketName)
| .key(key)
| .build()
| s3Client.putObject(
Have you looked at S3AsyncClient? You can convert a Java CompletableFuture to a Scala Future with FutureConverters, and I read that this is non-blocking (based on Netty); worth a try.
Nice, adopted that indeed.
Thanks!
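A minimal sketch of the S3AsyncClient plus FutureConverters combination, mirroring the quoted hunks above (object and method names are illustrative):

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.jdk.FutureConverters._

import software.amazon.awssdk.core.async.AsyncRequestBody
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.services.s3.model.PutObjectRequest

object AsyncUploadSketch {
  // putObject on the async client returns a Java CompletableFuture; FutureConverters
  // turns it into a Scala Future, so no blocking threadpool is needed for the upload.
  def putObjectAsync(
      s3Client: S3AsyncClient,
      bucketName: String,
      key: String,
      bytes: Array[Byte],
  )(implicit ec: ExecutionContext): Future[Unit] =
    s3Client
      .putObject(
        PutObjectRequest.builder().bucket(bucketName).key(key).build(),
        AsyncRequestBody.fromBytes(bytes),
      )
      .asScala
      .map(_ => ())
}
```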
| }
|
| override def close(): Unit = {
| compressingStream.close()
Make this class own the tmpBuffer as well: let it create the buffer in the constructor body and assign it to a private var, and also add tmpBuffer = null here in close(). Then everything nicely follows AutoCloseable.
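A sketch of that ownership model (class name is illustrative and the compression methods are elided):

```scala
import java.nio.ByteBuffer

import com.github.luben.zstd.ZstdDirectBufferCompressingStreamNoFinalizer

// The wrapper allocates the direct buffer itself, closes the zstd stream, and
// drops the buffer reference in close(), so callers manage a single AutoCloseable.
final class ZstdCompressorSketch(tmpBufferSize: Int, compressionLevel: Int) extends AutoCloseable {
  private var tmpBuffer: ByteBuffer = ByteBuffer.allocateDirect(tmpBufferSize)
  private val compressingStream =
    new ZstdDirectBufferCompressingStreamNoFinalizer(tmpBuffer, compressionLevel)

  // ... compression methods elided ...

  override def close(): Unit = {
    compressingStream.close()
    tmpBuffer = null // let the native memory become reclaimable
  }
}
```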
ray-roestenburg-da
left a comment
Left some more comments, especially to try out S3AsyncClient
Thanks, that indeed seems to work nicely.
| def dumpAcsSnapshot(migrationId: Long, timestamp: CantonTimestamp): Future[Unit] = {
|
| // TODO(#3429): currently, if this crashes half-way through, there is no indication in the S3 objects that
Curious, why not do that right now?
(or is this just a next PR that you'll work on, fine by me of course)
Yeah, just followup work in coming PRs, this one's already quite large.
| AsyncRequestBody.fromBytes(content.array()),
| )
| .asScala
| .map(_ => ())
Maybe at a later stage you can get the result and check the ETag against an MD5 hash of what you wrote, or something like that, to guarantee it's really there. (Not meant for this PR.)
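A sketch of such a check, assuming a single-part PUT where the ETag equals the body's MD5 (multipart uploads use a different ETag scheme; names are illustrative):

```scala
import java.math.BigInteger
import java.security.MessageDigest

import software.amazon.awssdk.services.s3.model.PutObjectResponse

object UploadVerificationSketch {
  // For a single-part PUT the returned ETag is the (quoted) hex MD5 of the body,
  // so a cheap integrity check is possible.
  def uploadMatchesEtag(response: PutObjectResponse, bytes: Array[Byte]): Boolean = {
    val md5 = MessageDigest.getInstance("MD5").digest(bytes)
    val md5Hex = String.format("%032x", new BigInteger(1, md5))
    response.eTag().replace("\"", "") == md5Hex
  }
}
```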
| .endpointOverride(s3Config.endpoint)
| .region(s3Config.region)
| .credentialsProvider(StaticCredentialsProvider.create(s3Config.credentials))
| // TODO(#3429): mockS3 and GCS support only path style access. Do we need to make this configurable?
Only if some other S3-compatible storage does not work with this; we could investigate that, or wait until someone asks for it. In any case, if we make it configurable it should default to this, and then you would need to explicitly turn it off, IMHO.
Yup, that's what I meant (check if it's needed), and that's the plan.
| case class ZstdGroupedWeight(minSize: Long) extends GraphStage[FlowShape[ByteString, ByteString]] {
| require(minSize > 0, "minSize must be greater than 0")
|
| val zstdTmpBufferSize = 10 * 1024 * 1024; // TODO(#3429): make configurable?
I think it makes sense to configure this as part of BulkStorageConfig, with a sensible default, so it can be tuned if necessary (though this is likely a fine number).
| override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
| new GraphStageLogic(shape) with InHandler with OutHandler {
| // TODO(#3429): consider implementing a pool of tmp buffers to avoid allocating a new one for each stage
Yes, good idea; it's quite standard to use a pool for this (and then manage cleanup / deref there).
| override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
| new GraphStageLogic(shape) with InHandler with OutHandler {
| // TODO(#3429): consider implementing a pool of tmp buffers to avoid allocating a new one for each stage
| private val tmpBuffer = ByteBuffer.allocateDirect(zstdTmpBufferSize)
Doing this in preStart is better, otherwise you allocate even if the stream never runs or fails before starting.
👍 added to the todo for the pool, since we'll revisit this code then anyway.
| override def postStop(): Unit = {
| super.postStop()
| if (zstd.get() != null) {
| zstd.get().close()
Since you allocate the buffer in this GraphStageLogic, you also need to deref it here (or move everything to Zstd).
Will be improved in a followup PR that will add a buffer pool
| new ZstdDirectBufferCompressingStreamNoFinalizer(tmpBuffer, compressionLevel)
|
| def compress(input: ByteString): ByteString = {
| val inputBB = ByteBuffer.allocateDirect(input.size)
Can you not reuse this buffer instead of creating a new one for every compression? allocateDirect is slow, and the GC does not see native memory, so it can take a while before it gets cleaned up; in the meantime native memory piles up. Maybe better, since you always use this from the GraphStageLogic, to have a reusable direct buffer there, pass it into this method, and reuse it. (Otherwise you risk running out of memory by piling up native memory while a GC is not yet necessary, if you start streaming a lot of files.)
direct buffers are a PITA.
https://openjdk.org/jeps/442 (Project Panama) in JDK 22 should make things a lot better, but likely that still would not work with this library; they'll need to update it, or of course we could consider contributing...
Added another TODO for a buffer pool here as well.
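A very small sketch of the pooling idea from the TODOs (unbounded and illustrative only, not the planned implementation):

```scala
import java.nio.ByteBuffer
import java.util.concurrent.ConcurrentLinkedQueue

// Hand out reusable direct buffers instead of calling allocateDirect per use.
final class DirectBufferPool(bufferSize: Int) {
  private val free = new ConcurrentLinkedQueue[ByteBuffer]()

  def borrow(): ByteBuffer =
    Option(free.poll()).getOrElse(ByteBuffer.allocateDirect(bufferSize))

  def release(buffer: ByteBuffer): Unit = {
    buffer.clear() // reset position/limit before the next borrower
    free.offer(buffer)
    ()
  }
}
```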
@ray-roestenburg-da I believe I addressed all your comments (or put TODOs to address in followup PRs). Ready for an approve?
ray-roestenburg-da
left a comment
Very nice, thanks!
Part of #3429
Implements dumping an ACS snapshot to S3 storage, with a unit test. Not yet integrated with anything else.