# ONNX Embedding Model Thread-Safety Issue #2152

## Bug description
When using the default ONNX embedding model in Spring AI (`all-MiniLM-L6-v2`), running the embedding process asynchronously on a `ThreadPoolTaskExecutor` fails intermittently with an `ai.onnxruntime.OrtException` and, in some runs, crashes the JVM inside `onnxruntime.dll`. The issue does not occur when the same process is executed synchronously.

## Environment
- **Java Version:** 17
- **Spring Boot Version:** Latest
- **Spring AI Version:** 1.0.0-M5
- **Vector Store:** Qdrant (though likely unrelated)
- **ONNX Model:** Default (`all-MiniLM-L6-v2`)

## Steps to reproduce
Configure a `ThreadPoolTaskExecutor` to run the embedding work asynchronously:

```java
@Bean(name = "embeddingThreadPoolTaskExecutor")
public ThreadPoolTaskExecutor threadPoolTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(2);
    executor.setMaxPoolSize(4);
    executor.setQueueCapacity(25);
    executor.setThreadNamePrefix("EmbeddingAsyncExecutor-");
    executor.setWaitForTasksToCompleteOnShutdown(true);
    executor.setAwaitTerminationSeconds(60);
    executor.initialize();
    return executor;
}
``` 
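Run the embedding process asynchronously. The stack trace under Logs indicates that the real `@Async` entry point is `loadAsync`, which delegates to the `load` method shown here: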
```java
@Async("embeddingThreadPoolTaskExecutor")
public UploadResponse load(String tenant, MultipartFile file, DocumentMetadata metadata) {
    try {
        byte[] fileBytes = file.getBytes();
        ByteArrayResource resource = new ByteArrayResource(fileBytes);
        TikaDocumentReader documentReader = new TikaDocumentReader(resource);

        List<Document> documents = documentReader.get();
        documents.forEach(document -> document.getMetadata().put("entityId", metadata.getEntityId()));
        List<Document> splitDocuments = tokenTextSplitter.apply(documents);

        vectorStoreService.getVectorStore(tenant).add(splitDocuments);
        return new UploadResponse(true, "OK");
    } catch (Exception e) {
        log.error("Error processing file: {}", file.getOriginalFilename(), e);
        return new UploadResponse(false, e.getMessage());
    }
}

``` 
Process multiple files in parallel from a test:
```java
@Test
@SneakyThrows
public void load() {
    Resource folderResource = new ClassPathResource("foo");
    File folder = folderResource.getFile();
    for (File fileEntry : Objects.requireNonNull(folder.listFiles())) {
        InputStream inputStream = new FileInputStream(fileEntry);
        String mimeType = URLConnection.guessContentTypeFromName(fileEntry.getName());
        MultipartFile file = new MockMultipartFile("file", fileEntry.getName(), mimeType, inputStream);
        documentVectorService.loadAsync("foo", file, new DocumentMetadata());
    }
}

```
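
One detail of the test above is that it discards whatever `loadAsync` returns and finishes as soon as the loop ends, so the test (and with it the application context) may start shutting down while uploads are still running. Whether this contributes to the failure is not established in this report. The following is a minimal sketch of a variant that waits for every task, assuming the async method were changed to return a `CompletableFuture`. The `AsyncUploads` class and its `embedFile` method are hypothetical stand-ins and are not part of Spring AI or of the reporter's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the reporter's service; not a Spring AI API.
class AsyncUploads {

    // Placeholder for the real work (read, split, embed, store).
    static String embedFile(String fileName) {
        return "OK:" + fileName;
    }

    // Submits one task per file and blocks until all of them have finished.
    static List<String> loadAllAndWait(List<String> fileNames, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (String name : fileNames) {
                futures.add(CompletableFuture.supplyAsync(() -> embedFile(name), executor));
            }
            // join() throws a CompletionException if any task failed,
            // so errors surface in the calling thread.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

            // Every future is complete here; collect results in submission order.
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> future : futures) {
                results.add(future.join());
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }
}
```

Spring's `@Async` supports `CompletableFuture` as a return type, so a test written this way could assert on each upload result instead of returning before the work is done.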

## Expected behavior
The embedding process should complete without errors when invoked from multiple threads, as it does when run synchronously.

## Observed behavior 

- Running the method synchronously works fine.
- Running it asynchronously causes intermittent failures.
- Running the embedding model on a single-threaded executor produces the same error, which suggests the failure does not depend on calls actually overlapping.
- Switching to OpenAI embeddings works fine, which suggests the problem is ONNX-related.
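
To compare the configurations listed above on identical inputs, a small harness can run the embedding call on a pool of a chosen size and count how many calls fail. This is only a diagnostic sketch: `EmbeddingStress` and `countFailures` are hypothetical names, and the `embed` function is an assumed stand-in for whatever invokes the real model (for example a lambda wrapping the model's embed call).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical diagnostic harness; `embed` stands in for the real embedding call.
class EmbeddingStress {

    // Runs `embed` on every input using `threads` workers and
    // returns the number of calls that threw an exception.
    static int countFailures(Function<String, float[]> embed, List<String> inputs, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<float[]>> tasks = new ArrayList<>();
            for (String input : inputs) {
                tasks.add(() -> embed.apply(input));
            }
            int failures = 0;
            // invokeAll blocks until every task has either returned or thrown.
            for (Future<float[]> future : executor.invokeAll(tasks)) {
                try {
                    future.get();
                } catch (ExecutionException e) {
                    failures++;
                }
            }
            return failures;
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report the aborted run.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        } finally {
            executor.shutdown();
        }
    }
}
```

Calling `countFailures` with `threads` set to 1 mirrors the single-threaded executor case, while a larger value mirrors the pooled case; a non-zero count in both would be consistent with the observations above.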

## Logs 
```
2025-01-31 17:06:11.001 ERROR [semantic-search-server,,] [EmbeddingAsyncExecutor-1] i.c.w.s.s.etl.DocumentVectorService     : Errore durante l'upload del file: Svizzera.pdf (load DocumentVectorService.java 70)
java.lang.RuntimeException: ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION - message: Non-zero status code returned while running Add node. Name:'/encoder/layer.0/attention/self/Add' Status Message: D:\a\_work\1\s\include\onnxruntime\core/common/logging/logging.h:340 onnxruntime::logging::LoggingManager::DefaultLogger Attempt to use DefaultLogger but none has been registered.

	at org.springframework.ai.transformers.TransformersEmbeddingModel.lambda$call$3(TransformersEmbeddingModel.java:351)
	at io.micrometer.observation.Observation.observe(Observation.java:564)
	at org.springframework.ai.transformers.TransformersEmbeddingModel.call(TransformersEmbeddingModel.java:298)
	at org.springframework.ai.embedding.EmbeddingModel.embed(EmbeddingModel.java:91)
	at org.springframework.ai.vectorstore.qdrant.QdrantVectorStore.doAdd(QdrantVectorStore.java:220)
	at org.springframework.ai.vectorstore.observation.AbstractObservationVectorStore.lambda$add$1(AbstractObservationVectorStore.java:91)
	at io.micrometer.observation.Observation.observe(Observation.java:498)
	at org.springframework.ai.vectorstore.observation.AbstractObservationVectorStore.add(AbstractObservationVectorStore.java:91)
	at it.cegeka.wemaind.semantic_search_server.service.etl.DocumentVectorService.load(DocumentVectorService.java:65)
	at it.cegeka.wemaind.semantic_search_server.service.etl.DocumentVectorService.loadAsync(DocumentVectorService.java:42)
	at jdk.internal.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:359)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:196)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.springframework.aop.interceptor.AsyncExecutionInterceptor.lambda$invoke$0(AsyncExecutionInterceptor.java:114)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION - message: Non-zero status code returned while running Add node. Name:'/encoder/layer.0/attention/self/Add' Status Message: D:\a\_work\1\s\include\onnxruntime\core/common/logging/logging.h:340 onnxruntime::logging::LoggingManager::DefaultLogger Attempt to use DefaultLogger but none has been registered.

	at ai.onnxruntime.OrtSession.run(Native Method)
	at ai.onnxruntime.OrtSession.run(OrtSession.java:395)
	at ai.onnxruntime.OrtSession.run(OrtSession.java:242)
	at ai.onnxruntime.OrtSession.run(OrtSession.java:210)
	at org.springframework.ai.transformers.TransformersEmbeddingModel.lambda$call$3(TransformersEmbeddingModel.java:327)
	... 20 common frames omitted
``` 

## Additional information
In some test runs the JVM itself crashed with an access violation in `onnxruntime.dll`:
```
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007fff9109bef3, pid=6540, tid=18268
#
# JRE version: OpenJDK Runtime Environment (17.0.10+13) (build 17.0.10+13-LTS)
# Java VM: OpenJDK 64-Bit Server VM (17.0.10+13-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, windows-amd64)
# Problematic frame:
# C[thread 29912 also had an error]
  [onnxruntime.dll+0x71bef3]
#
# No core dump will be written. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# C:\Users\alfredog\IdeaProjects\Microservices\semantic-search-server\hs_err_pid6540.log
#
# If you would like to submit a bug report, please visit:
#   https://bell-sw.com/support
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
``` 




