
feat: onnx runtime upgrade #478

Open · wants to merge 19 commits into base: develop
Conversation

kallebysantos
Contributor

@kallebysantos kallebysantos commented Jan 23, 2025

What kind of change does this PR introduce?

Refactor, upgrade

What is the current behavior?

Currently the ort Rust backend uses ort rc-2 and onnx v1.19.2.

What is the new behavior?

This PR introduces:

  • ort: library upgrade from rc-2 to rc-9
  • onnx: support from 1.19.2 to 1.20.1
  • ndarray_linalg: removal of an unnecessary library whose last release was about 2 years ago
  • some code refactoring and cleanup

TODO:

  • Add support for String tensors.
  • Upgrade to the upcoming ort rc-10. Let's use rc-9 for now; I'll push another PR for rc-10.

Need help:
I would like to ask @nyannyacha 💚, if possible, to add updated test snapshots for the other platforms, as well as to run k6 tests comparing against the latest version.

@nyannyacha
Contributor

> I would like to ask @nyannyacha 💚, if possible, to add updated test snapshots for the other platforms, as well as to run k6 tests comparing against the latest version.

Sure! I will do it for you.

@nyannyacha
Contributor

nyannyacha commented Feb 4, 2025

@kallebysantos I think your inference api branch is ahead of this PR, what do you think? If you don't intend to merge this PR, I think it's probably a good idea to close it.

(But if that was a temporary branch for testing, I think it would be fine to leave this PR open.)

@kallebysantos
Contributor Author

> @kallebysantos I think your inference api branch is ahead of this PR, what do you think? If you don't intend to merge this PR, I think it's probably a good idea to close it.
>
> (But if that was a temporary branch for testing, I think it would be fine to leave this PR open.)

Hi @nyannyacha, thank you for reviewing it. You're partially right, let me explain:

  • In this PR I put all the onnx runtime version upgrades. If you check the TODO bullet point, I'm still waiting for pyke to release ort rc-10, but if you think it's fine to keep the version I already set, we can merge it and in the future I'll push another PR for rc-10.

The ort maintainer told me they were planning to release rc-10 probably this month (February).

  • The inference API branch is planned to be a future PR after this one. So I started it from here, since I had already established the rc-9 version.

One more thing: I would like to support String tensors, since I will probably need them for the inference API. Maybe I could move this task to the next PR while we discuss this topic further. I ran into some issues adding String tensors to the current code base without using serde.

@kallebysantos kallebysantos marked this pull request as ready for review February 4, 2025 10:41
@kallebysantos kallebysantos force-pushed the feat-ort-upgrade branch 2 times, most recently from bfe239b to 7970f6a Compare February 4, 2025 20:51
@nyannyacha
Contributor

Sidenote:
There was no significant change in request throughput when k6 was run locally.


@nyannyacha nyannyacha left a comment


LGTM

@nyannyacha
Contributor

I believe the reason Coveralls reported reduced coverage for this PR is the tests related to the S3 FS. They require a GitHub secret, which is not accessible on the fork branch; that is why the step to set the secret was disabled and the S3 FS tests were skipped.

kallebysantos and others added 19 commits February 22, 2025 13:42
- `ndarray_linalg`: Since `ort` depends on `ndarray=0.16`, decided to
remove this extra dependency. Normalization can be done in a simple
small function instead of deriving it from an external crate.
using the `rc-9` APIs to get the same results as the older `rc-2`
Since the `ONNX` version has been upgraded, the snapshots need to be
recreated
- Since `ndarray_linalg` has been removed, we need to manually apply L2
normalization
- Applying the `dot product` between two embeddings ensures that
`mean_pool` and `normalize` are working properly
Co-authored-by: Nyannyacha <[email protected]>
Co-authored-by: kallebysantos <[email protected]>
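The commit messages above describe replacing `ndarray_linalg` with a small hand-rolled normalization function and sanity-checking the result with a dot product. A minimal, dependency-free Rust sketch of those two steps (the `Vec<f32>`-based signatures are illustrative only; the PR itself operates on `ndarray` arrays):

```rust
/// Average the per-token embeddings into a single sentence embedding.
/// (Illustrative shape: one `Vec<f32>` per token.)
fn mean_pool(token_embeddings: &[Vec<f32>]) -> Vec<f32> {
    let dim = token_embeddings[0].len();
    let n = token_embeddings.len() as f32;
    let mut pooled = vec![0.0f32; dim];
    for token in token_embeddings {
        for (acc, v) in pooled.iter_mut().zip(token) {
            *acc += v;
        }
    }
    for acc in pooled.iter_mut() {
        *acc /= n;
    }
    pooled
}

/// L2 normalization: divide every component by the Euclidean norm.
/// This is the kind of "simple small function" that replaces `ndarray_linalg`.
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    v.iter().map(|x| x / norm).collect()
}

fn main() {
    let tokens = vec![vec![1.0f32, 2.0], vec![3.0, 4.0]];
    let embedding = l2_normalize(&mean_pool(&tokens));
    // The dot product of a normalized embedding with itself should be ~1.0,
    // which is the sanity check the commit messages mention.
    let dot: f32 = embedding.iter().map(|x| x * x).sum();
    println!("self dot product: {dot}");
}
```

Because `l2_normalize` scales the embedding to unit length, the self dot product coming out at roughly 1.0 confirms both pooling and normalization behave as expected.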
@kallebysantos kallebysantos changed the base branch from main to develop February 22, 2025 15:36
@kallebysantos
Contributor Author

kallebysantos commented Feb 22, 2025

Hello @nyannyacha, I rebased onto develop and the integration tests are passing, but I'm facing some issues when executing with lower memory limits (default main.ts limits):

Deno v1
$ ORT_DYLIB_PATH="$(pwd)/temp/libonnxruntime.1.20.dylib" ./scripts/run.sh
event manager running
main function started
2025-02-22T16:02:56.831585Z ERROR ThreadId(32) cpu_timer: CPU timer: not enabled (need Linux)
{
  timestamp: "2025-02-22T16:02:56.832Z",
  event_type: "Boot",
  event: [Object: null prototype] { boot_time: 105 },
  metadata: [Object: null prototype] {
    service_path: "./examples/ort-rust-backend",
    execution_id: "10bb7da6-05c9-4498-8386-cbbf66d4ef21"
  }
}
dtype not specified for "model". Using the default dtype (fp32) for this device (auto).

{
  timestamp: "2025-02-22T16:03:01.009Z",
  event_type: "Shutdown",
  event: [Object: null prototype] {
    reason: "EarlyDrop",
    cpu_time_used: 1313,
    memory_used: [Object: null prototype] {
      total: 145202195,
      heap: 9199016,
      external: 136003179,
      mem_check_captured: [Object: null prototype] {
        current: [Object: null prototype] {
          totalHeapSize: 12025856,
          totalHeapSizeExecutable: 524288,
          totalPhysicalSize: 11206656,
          totalAvailableSize: 148545372,
          totalGlobalHandlesSize: 24576,
          usedGlobalHandlesSize: 20000,
          usedHeapSize: 9200480,
          mallocedMemory: 540772,
          externalMemory: 136003179,
          peakMallocedMemory: 8513536
        },
        exceeded: false
      }
    }
  },
  metadata: [Object: null prototype] {
    service_path: "./examples/ort-rust-backend",
    execution_id: "10bb7da6-05c9-4498-8386-cbbf66d4ef21"
  }
}
$ curl localhost:9998/ort-rust-backend
{"ort_tensor":{"cpuData":{"0":-0.03076750412583351,"1":0.0007690626662224531, ... 
Deno v2
$ ORT_DYLIB_PATH="$(pwd)/temp/libonnxruntime.1.20.dylib" ./scripts/run.sh
event manager running
main function started
{
  deno: "supabase-edge-runtime-0.1.0 (compatible with Deno v2.1.4)",
  v8: "11.6.189.12",
  typescript: "5.1.6"
}
2025-02-22T15:39:12.856039Z ERROR ThreadId(35) cpu_timer: CPU timer: not enabled (need Linux)
{
  timestamp: "2025-02-22T15:39:12.856Z",
  event_type: "Boot",
  event: [Object: null prototype] { boot_time: 107 },
  metadata: [Object: null prototype] {
    service_path: "./examples/ort-rust-backend",
    execution_id: "a85bfb82-cd8c-4ce6-ac14-4fc5ae207d1e"
  }
}
dtype not specified for "model". Using the default dtype (fp32) for this device (auto).

2025-02-22T15:39:15.709103Z ERROR sb-supervisor base::worker::supervisor::strategy_per_worker: memory limit reached for the worker: isolate: a85bfb82-cd8c-4ce6-ac14-4fc5ae207d1e
2025-02-22T15:39:15.776976Z ERROR          main base::worker::pool: failed to send request to user worker: request has been cancelled by supervisor
2025-02-22T15:39:15.777106Z ERROR ThreadId(03) ext_workers: user worker failed to respond: request has been cancelled by supervisor
WorkerRequestCancelled: request has been cancelled by supervisor
    at async Function.allSettled (<anonymous>)
    at async UserWorker.fetch (ext:user_workers/user_workers.js:78:63)
    at async callWorker (file:///var/tmp/sb-compile-edge-runtime/edge-runtime/examples/main/index.ts:131:14)
    at async respond (ext:runtime/http.js:202:16) {
  name: "WorkerRequestCancelled"
}
{
  timestamp: "2025-02-22T15:39:16.352Z",
  event_type: "Shutdown",
  event: [Object: null prototype] {
    reason: "Memory",
    cpu_time_used: 440,
    memory_used: [Object: null prototype] {
      total: 151218656,
      heap: 15113016,
      external: 136105640,
      mem_check_captured: [Object: null prototype] {
        current: [Object: null prototype] {
          totalHeapSize: 25051136,
          totalHeapSizeExecutable: 524288,
          totalPhysicalSize: 24936448,
          totalAvailableSize: 135253480,
          totalGlobalHandlesSize: 24576,
          usedGlobalHandlesSize: 20640,
          usedHeapSize: 21983728,
          mallocedMemory: 1867952,
          externalMemory: 136817301,
          peakMallocedMemory: 8480768
        },
        exceeded: true
      }
    }
  },
  metadata: [Object: null prototype] {
    service_path: "./examples/ort-rust-backend",
    execution_id: "a85bfb82-cd8c-4ce6-ac14-4fc5ae207d1e"
  }
}

The same happens if I check out develop without this PR.

@nyannyacha
Contributor

Yeah, when we upgraded to Deno 2, the used heap size increased by about 2.4x (9.2 MB -> 21.9 MB). So it exceeded the 150 MiB memory limit, which was already tight.
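The 2.4x figure can be reproduced directly from the `usedHeapSize` values logged in the two Shutdown events above; a quick sketch:

```rust
/// Heap growth from the Deno v1 run to the Deno v2 run, using the
/// `usedHeapSize` byte counts logged in the two Shutdown events.
fn heap_growth_ratio() -> f64 {
    let v1_used_heap = 9_200_480_f64; // Deno v1 Shutdown event, bytes
    let v2_used_heap = 21_983_728_f64; // Deno v2 Shutdown event, bytes
    v2_used_heap / v1_used_heap
}

fn main() {
    println!("heap growth: {:.2}x", heap_growth_ratio()); // prints "heap growth: 2.39x"
}
```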
