
feat: onnx runtime upgrade #478

Open · wants to merge 19 commits into base: develop
Conversation

kallebysantos
Contributor

@kallebysantos kallebysantos commented Jan 23, 2025

What kind of change does this PR introduce?

Refactor, upgrade

What is the current behavior?

Currently the ort Rust backend uses ort rc-2 and onnx v1.19.2.

What is the new behavior?

This PR introduces:

  • ort: library upgrade from rc-2 to rc-9
  • onnx: support from 1.19.2 to 1.20.1
  • ndarray_linalg: removal of an unnecessary library whose last release was about 2 years ago
  • some code refactoring and cleanup

TODO:

  • Add support for String tensors.
  • Upgrade to the upcoming ort rc-10. Let's use rc-9 for now; I'll push another PR for rc-10.

Need help:
I would like to ask @nyannyacha 💚, if possible, to add updated test snapshots for the other platforms, as well as to run k6 tests comparing against the latest version.

@nyannyacha
Contributor

> I would like to ask @nyannyacha 💚, if possible, to add updated test snapshots for the other platforms, as well as to run k6 tests comparing against the latest version.

Sure! I will do it for you.

@nyannyacha
Contributor

nyannyacha commented Feb 4, 2025

@kallebysantos I think your inference api branch is ahead of this PR, what do you think? If you don't intend to merge this PR, I think it's probably a good idea to close it.

(But if that was a temporary branch for testing, I think it would be fine to leave this PR open.)

@kallebysantos
Contributor Author

> @kallebysantos I think your inference api branch is ahead of this PR, what do you think? If you don't intend to merge this PR, I think it's probably a good idea to close it.
>
> (But if that was a temporary branch for testing, I think it would be fine to leave this PR open.)

Hi @nyannyacha, thank you for reviewing it. You're partially right, let me explain:

  • In this PR I put all the onnx runtime version upgrades. If you check the TODO bullet point, I'm still waiting for pyke to release ort rc-10, but if you think it's fine to keep the version I already set, we can merge it and in the future I'll push another PR for rc-10.

The ort maintainer told me they were planning to release rc-10 probably this month (February).

  • The inference API branch is planned to be a future PR after this one. So I started it from here, since I had already established the rc-9 version.

One more thing: I would like to support String tensors, since I will probably need them for the inference API. Maybe I could move this task to the next PR while we discuss this topic further. I ran into some issues adding String tensors to the current code base without using serde.

@kallebysantos kallebysantos marked this pull request as ready for review February 4, 2025 10:41
@kallebysantos kallebysantos force-pushed the feat-ort-upgrade branch 2 times, most recently from bfe239b to 7970f6a Compare February 4, 2025 20:51
@nyannyacha
Contributor

Sidenote:
There was no significant change in request throughput when k6 was run locally.


@nyannyacha nyannyacha left a comment


LGTM

@nyannyacha
Contributor

I believe the reason Coveralls reported reduced coverage for this PR is the tests related to the S3 FS. They require a GitHub secret, which is not accessible on the fork branch; that is why the step to set the secret was disabled and the S3 FS tests were skipped.

kallebysantos and others added 19 commits February 22, 2025 13:42
- `ndarray_linalg`: Since `ort` depends on `ndarray=0.16`, decided to
remove this extra dependency. Normalization can be done in a simple
small function instead of deriving it from an external crate.
using the `rc-9` APIs to get the same results as the older `rc-2`
Since the `ONNX` version has been upgraded, the snapshots need to be
recreated
- Since `ndarray_linalg` has been removed, we need to manually apply L2
normalization
- Applying the `dot product` between two embeddings ensures that
`mean_pool` and `normalize` are working properly
Co-authored-by: Nyannyacha <[email protected]>
Co-authored-by: kallebysantos <[email protected]>
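The commit messages above describe replacing `ndarray_linalg` with a small hand-rolled normalization function and sanity-checking the result with a dot product. A minimal, dependency-free Rust sketch of those two steps (the `Vec<f32>`-based signatures are illustrative only; the PR itself operates on `ndarray` arrays):

```rust
/// Average the per-token embeddings into a single sentence embedding.
/// (Illustrative shape: one `Vec<f32>` per token.)
fn mean_pool(token_embeddings: &[Vec<f32>]) -> Vec<f32> {
    let dim = token_embeddings[0].len();
    let n = token_embeddings.len() as f32;
    let mut pooled = vec![0.0f32; dim];
    for token in token_embeddings {
        for (acc, v) in pooled.iter_mut().zip(token) {
            *acc += v;
        }
    }
    for acc in pooled.iter_mut() {
        *acc /= n;
    }
    pooled
}

/// L2 normalization: divide every component by the Euclidean norm.
/// This is the kind of "simple small function" that replaces `ndarray_linalg`.
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    v.iter().map(|x| x / norm).collect()
}

fn main() {
    let tokens = vec![vec![1.0f32, 2.0], vec![3.0, 4.0]];
    let embedding = l2_normalize(&mean_pool(&tokens));
    // The dot product of a normalized embedding with itself should be ~1.0,
    // which is the sanity check the commit messages mention.
    let dot: f32 = embedding.iter().map(|x| x * x).sum();
    println!("self dot product: {dot}");
}
```

Because `l2_normalize` scales the embedding to unit length, the self dot product coming out at roughly 1.0 confirms both pooling and normalization behave as expected.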
@kallebysantos kallebysantos changed the base branch from main to develop February 22, 2025 15:36
@kallebysantos
Contributor Author

kallebysantos commented Feb 22, 2025

Hello @nyannyacha, I rebased onto develop and the integration tests are passing, but I'm facing some issues when executing with lower memory limits (default main.ts limits):

Deno v1
$ ORT_DYLIB_PATH="$(pwd)/temp/libonnxruntime.1.20.dylib" ./scripts/run.sh
event manager running
main function started
2025-02-22T16:02:56.831585Z ERROR ThreadId(32) cpu_timer: CPU timer: not enabled (need Linux)
{
  timestamp: "2025-02-22T16:02:56.832Z",
  event_type: "Boot",
  event: [Object: null prototype] { boot_time: 105 },
  metadata: [Object: null prototype] {
    service_path: "./examples/ort-rust-backend",
    execution_id: "10bb7da6-05c9-4498-8386-cbbf66d4ef21"
  }
}
dtype not specified for "model". Using the default dtype (fp32) for this device (auto).

{
  timestamp: "2025-02-22T16:03:01.009Z",
  event_type: "Shutdown",
  event: [Object: null prototype] {
    reason: "EarlyDrop",
    cpu_time_used: 1313,
    memory_used: [Object: null prototype] {
      total: 145202195,
      heap: 9199016,
      external: 136003179,
      mem_check_captured: [Object: null prototype] {
        current: [Object: null prototype] {
          totalHeapSize: 12025856,
          totalHeapSizeExecutable: 524288,
          totalPhysicalSize: 11206656,
          totalAvailableSize: 148545372,
          totalGlobalHandlesSize: 24576,
          usedGlobalHandlesSize: 20000,
          usedHeapSize: 9200480,
          mallocedMemory: 540772,
          externalMemory: 136003179,
          peakMallocedMemory: 8513536
        },
        exceeded: false
      }
    }
  },
  metadata: [Object: null prototype] {
    service_path: "./examples/ort-rust-backend",
    execution_id: "10bb7da6-05c9-4498-8386-cbbf66d4ef21"
  }
}
$ curl localhost:9998/ort-rust-backend
{"ort_tensor":{"cpuData":{"0":-0.03076750412583351,"1":0.0007690626662224531, ... 
Deno v2
$ ORT_DYLIB_PATH="$(pwd)/temp/libonnxruntime.1.20.dylib" ./scripts/run.sh
event manager running
main function started
{
  deno: "supabase-edge-runtime-0.1.0 (compatible with Deno v2.1.4)",
  v8: "11.6.189.12",
  typescript: "5.1.6"
}
2025-02-22T15:39:12.856039Z ERROR ThreadId(35) cpu_timer: CPU timer: not enabled (need Linux)
{
  timestamp: "2025-02-22T15:39:12.856Z",
  event_type: "Boot",
  event: [Object: null prototype] { boot_time: 107 },
  metadata: [Object: null prototype] {
    service_path: "./examples/ort-rust-backend",
    execution_id: "a85bfb82-cd8c-4ce6-ac14-4fc5ae207d1e"
  }
}
dtype not specified for "model". Using the default dtype (fp32) for this device (auto).

2025-02-22T15:39:15.709103Z ERROR sb-supervisor base::worker::supervisor::strategy_per_worker: memory limit reached for the worker: isolate: a85bfb82-cd8c-4ce6-ac14-4fc5ae207d1e
2025-02-22T15:39:15.776976Z ERROR          main base::worker::pool: failed to send request to user worker: request has been cancelled by supervisor
2025-02-22T15:39:15.777106Z ERROR ThreadId(03) ext_workers: user worker failed to respond: request has been cancelled by supervisor
WorkerRequestCancelled: request has been cancelled by supervisor
    at async Function.allSettled (<anonymous>)
    at async UserWorker.fetch (ext:user_workers/user_workers.js:78:63)
    at async callWorker (file:///var/tmp/sb-compile-edge-runtime/edge-runtime/examples/main/index.ts:131:14)
    at async respond (ext:runtime/http.js:202:16) {
  name: "WorkerRequestCancelled"
}
{
  timestamp: "2025-02-22T15:39:16.352Z",
  event_type: "Shutdown",
  event: [Object: null prototype] {
    reason: "Memory",
    cpu_time_used: 440,
    memory_used: [Object: null prototype] {
      total: 151218656,
      heap: 15113016,
      external: 136105640,
      mem_check_captured: [Object: null prototype] {
        current: [Object: null prototype] {
          totalHeapSize: 25051136,
          totalHeapSizeExecutable: 524288,
          totalPhysicalSize: 24936448,
          totalAvailableSize: 135253480,
          totalGlobalHandlesSize: 24576,
          usedGlobalHandlesSize: 20640,
          usedHeapSize: 21983728,
          mallocedMemory: 1867952,
          externalMemory: 136817301,
          peakMallocedMemory: 8480768
        },
        exceeded: true
      }
    }
  },
  metadata: [Object: null prototype] {
    service_path: "./examples/ort-rust-backend",
    execution_id: "a85bfb82-cd8c-4ce6-ac14-4fc5ae207d1e"
  }
}

The same happens if I check out develop without this PR.

@nyannyacha
Contributor

Yeah, when we upgraded to Deno 2, the used heap size increased by about 2.4x (9.2 MB -> 21.9 MB). So it exceeded the 150 MiB memory limit, which was already tight.
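The 2.4x figure can be reproduced directly from the `usedHeapSize` values logged in the two Shutdown events above; a quick sketch:

```rust
/// Heap growth from the Deno v1 run to the Deno v2 run, using the
/// `usedHeapSize` byte counts logged in the two Shutdown events.
fn heap_growth_ratio() -> f64 {
    let v1_used_heap = 9_200_480_f64; // Deno v1 Shutdown event, bytes
    let v2_used_heap = 21_983_728_f64; // Deno v2 Shutdown event, bytes
    v2_used_heap / v1_used_heap
}

fn main() {
    println!("heap growth: {:.2}x", heap_growth_ratio()); // prints "heap growth: 2.39x"
}
```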
