
Ollama Portable Zip on Windows with Intel ARC B580 and nomic-embed-text: wsarecv: an existing connection was forcibly closed by the remote host #12914

DediCATeD88 opened this issue Feb 28, 2025 · 6 comments

Ollama Portable Zip on Windows with Intel ARC B580 and nomic-embed-text:

ERROR source=routes.go:479 msg="embedding generation failed" error="do embedding request: Post \"http://127.0.0.1:55895/embedding\": read tcp 127.0.0.1:55905->127.0.0.1:55895: wsarecv: An existing connection was forcibly closed by the remote host."
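For reference, this error is the response to a POST against Ollama's /api/embed endpoint (see the GIN log lines further down); a minimal sketch of a reproduction from Windows cmd follows. The model name is the one from this issue and 11434 is Ollama's default API port; the inner 127.0.0.1:55895/embedding request is one the server makes to its own runner process.

curl http://localhost:11434/api/embed -d "{\"model\": \"nomic-embed-text\", \"input\": \"test sentence\"}"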

Found 1 SYCL devices:

| ID | Device Type        | Name                    | Version | Max compute units | Max work group | Max sub group | Global mem size | Driver version |
|----|--------------------|-------------------------|---------|-------------------|----------------|---------------|-----------------|----------------|
| 0  | [level_zero:gpu:0] | Intel Arc B580 Graphics | 20.1    | 160               | 1024           | 32            | 12508M          | 1.6.31896      |
llama_kv_cache_init: SYCL0 KV buffer size = 288.00 MiB
llama_new_context_with_model: KV self size = 288.00 MiB, K (i8): 144.00 MiB, V (i8): 144.00 MiB
llama_new_context_with_model: SYCL_Host output buffer size = 0.00 MiB
llama_new_context_with_model: SYCL0 compute buffer size = 17.50 MiB
llama_new_context_with_model: SYCL_Host compute buffer size = 3.50 MiB
llama_new_context_with_model: graph nodes = 429
llama_new_context_with_model: graph splits = 4 (with bs=512), 2 (with bs=1)
time=2025-02-28T17:21:20.882+01:00 level=WARN source=runner.go:892 msg="%s: warming up the model with an empty run - please wait ... " !BADKEY=loadModel
time=2025-02-28T17:21:20.922+01:00 level=INFO source=server.go:610 msg="llama runner started in 3.02 seconds"
llama_load_model_from_file: using device SYCL0 (Intel(R) Arc(TM) B580 Graphics) - 8392 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from C:\Users\Admin\.ollama\models\blobs\sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = nomic-bert
llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5
llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12
llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048
llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768
llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072
llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12
llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000
llama_model_loader: - kv 8: general.file_type u32 = 1
llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false
llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1
llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000
llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2
llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101
llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102
llama_model_loader: - kv 15: tokenizer.ggml.model str = bert
llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1,1, 1, 1, ...
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100
llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101
llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103
llama_model_loader: - type f32: 51 tensors
llama_model_loader: - type f16: 61 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 5
llm_load_vocab: token to piece cache size = 0.2032 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = nomic-bert
llm_load_print_meta: vocab type = WPM
llm_load_print_meta: n_vocab = 30522
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 1
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = all F32
llm_load_print_meta: model params = 136.73 M
llm_load_print_meta: model size = 260.86 MiB (16.00 BPW)
llm_load_print_meta: general.name = nomic-embed-text-v1.5
llm_load_print_meta: BOS token = 101 '[CLS]'
llm_load_print_meta: EOS token = 102 '[SEP]'
llm_load_print_meta: UNK token = 100 '[UNK]'
llm_load_print_meta: SEP token = 102 '[SEP]'
llm_load_print_meta: PAD token = 0 '[PAD]'
llm_load_print_meta: CLS token = 101 '[CLS]'
llm_load_print_meta: MASK token = 103 '[MASK]'
llm_load_print_meta: LF token = 0 '[PAD]'
llm_load_print_meta: EOG token = 102 '[SEP]'
llm_load_print_meta: max token length = 21
llama_model_load: vocab only - skipping tensors
Exception 0xc000001d 0x0 0x0 0x7ffa1a5c2a16
PC=0x7ffa1a5c2a16
signal arrived during external code execution

runtime.cgocall(0x7ff77812c940, 0xc00040bb90)
runtime/cgocall.go:167 +0x3e fp=0xc00040bb68 sp=0xc00040bb00 pc=0x7ff777579c1e
ollama/llama/llamafile._Cfunc_llama_decode(0x21cfa01d520, {0x1, 0x21cf9ea45d0, 0x0, 0x0, 0x21cfa0c62e0, 0x21cfa24c550, 0x21cf9ea7620, 0x21cf9e5dd70})
_cgo_gotypes.go:550 +0x55 fp=0xc00040bb90 sp=0xc00040bb68 pc=0x7ff777950735
ollama/llama/llamafile.(*Context).Decode.func1(0x7ff77795f8eb?, 0x21cfa01d520?)
ollama/llama/llamafile/llama.go:143 +0xf5 fp=0xc00040bc80 sp=0xc00040bb90 pc=0x7ff777953715
ollama/llama/llamafile.(*Context).Decode(0x7ff778da3400?, 0x0?)
ollama/llama/llamafile/llama.go:143 +0x13 fp=0xc00040bcc8 sp=0xc00040bc80 pc=0x7ff777953593
ollama/llama/runner.(*Server).processBatch(0xc000121560, 0xc0002002a0, 0xc00040bf20)
ollama/llama/runner/runner.go:434 +0x23f fp=0xc00040bee0 sp=0xc00040bcc8 pc=0x7ff77795e5bf
ollama/llama/runner.(*Server).run(0xc000121560, {0x7ff77856c360, 0xc0000f12c0})
ollama/llama/runner/runner.go:342 +0x1d5 fp=0xc00040bfb8 sp=0xc00040bee0 pc=0x7ff77795dff5
ollama/llama/runner.Execute.gowrap2()
ollama/llama/runner/runner.go:1006 +0x28 fp=0xc00040bfe0 sp=0xc00040bfb8 pc=0x7ff777963268
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00040bfe8 sp=0xc00040bfe0 pc=0x7ff777588901
created by ollama/llama/runner.Execute in goroutine 1
ollama/llama/runner/runner.go:1006 +0xde5

goroutine 1 gp=0xc00005c000 m=nil [IO wait]:
runtime.gopark(0x7ff77758a0c0?, 0x7ff778d32ac0?, 0x20?, 0xf?, 0xc000530fcc?)
runtime/proc.go:424 +0xce fp=0xc00037d418 sp=0xc00037d3f8 pc=0x7ff7775803ce
runtime.netpollblock(0x3ec?, 0x77518366?, 0xf7?)
runtime/netpoll.go:575 +0xf7 fp=0xc00037d450 sp=0xc00037d418 pc=0x7ff777544f97
internal/poll.runtime_pollWait(0x21c9e5db670, 0x72)
runtime/netpoll.go:351 +0x85 fp=0xc00037d470 sp=0xc00037d450 pc=0x7ff77757f645
internal/poll.(*pollDesc).wait(0x7ff777612bd5?, 0x7ff77757ae7d?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00037d498 sp=0xc00037d470 pc=0x7ff777614207
internal/poll.execIO(0xc000530f20, 0xc000375540)
internal/poll/fd_windows.go:177 +0x105 fp=0xc00037d510 sp=0xc00037d498 pc=0x7ff777615645
internal/poll.(*FD).acceptOne(0xc000530f08, 0x3d4, {0xc0005580f0?, 0xc0003755a0?, 0x7ff77761d3c5?}, 0xc0003755d4?)
internal/poll/fd_windows.go:946 +0x65 fp=0xc00037d570 sp=0xc00037d510 pc=0x7ff777619c85
internal/poll.(*FD).Accept(0xc000530f08, 0xc00037d720)
internal/poll/fd_windows.go:980 +0x1b6 fp=0xc00037d628 sp=0xc00037d570 pc=0x7ff777619fb6
net.(*netFD).accept(0xc000530f08)
net/fd_windows.go:182 +0x4b fp=0xc00037d740 sp=0xc00037d628 pc=0x7ff77768082b
net.(*TCPListener).accept(0xc0002b6dc0)
net/tcpsock_posix.go:159 +0x1e fp=0xc00037d790 sp=0xc00037d740 pc=0x7ff77769699e
net.(*TCPListener).Accept(0xc0002b6dc0)
net/tcpsock.go:372 +0x30 fp=0xc00037d7c0 sp=0xc00037d790 pc=0x7ff777695750
net/http.(*onceCloseListener).Accept(0xc000121b00?)
<autogenerated>:1 +0x24 fp=0xc00037d7d8 sp=0xc00037d7c0 pc=0x7ff777910044
net/http.(*Server).Serve(0xc0003c53b0, {0x7ff77856a0d0, 0xc0002b6dc0})
net/http/server.go:3330 +0x30c fp=0xc00037d908 sp=0xc00037d7d8 pc=0x7ff7778e7fcc
ollama/llama/runner.Execute({0xc0000c6010?, 0x0?, 0x0?})
ollama/llama/runner/runner.go:1027 +0x11a9 fp=0xc00037dca8 sp=0xc00037d908 pc=0x7ff777962f49
ollama/cmd.NewCLI.func2(0xc0004ca008?, {0x7ff7783ac40e?, 0x4?, 0x7ff7783ac412?})
ollama/cmd/cmd.go:1430 +0x45 fp=0xc00037dcd0 sp=0xc00037dca8 pc=0x7ff77812bda5
github.com/spf13/cobra.(*Command).execute(0xc0004ca008, {0xc0004ee0f0, 0xf, 0xf})
github.com/spf13/[email protected]/command.go:985 +0xaaa fp=0xc00037de58 sp=0xc00037dcd0 pc=0x7ff77771a4ea
github.com/spf13/cobra.(*Command).ExecuteC(0xc00053c308)
github.com/spf13/[email protected]/command.go:1117 +0x3ff fp=0xc00037df30 sp=0xc00037de58 pc=0x7ff77771adbf
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/[email protected]/command.go:1041
github.com/spf13/cobra.(*Command).ExecuteContext(...)
github.com/spf13/[email protected]/command.go:1034
main.main()
ollama/main.go:12 +0x4d fp=0xc00037df50 sp=0xc00037df30 pc=0x7ff77812c40d
runtime.main()
runtime/proc.go:272 +0x27d fp=0xc00037dfe0 sp=0xc00037df50 pc=0x7ff77754df9d
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00037dfe8 sp=0xc00037dfe0 pc=0x7ff777588901

goroutine 2 gp=0xc00005c700 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00005ffa8 sp=0xc00005ff88 pc=0x7ff7775803ce
runtime.goparkunlock(...)
runtime/proc.go:430
runtime.forcegchelper()
runtime/proc.go:337 +0xb8 fp=0xc00005ffe0 sp=0xc00005ffa8 pc=0x7ff77754e2b8
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00005ffe8 sp=0xc00005ffe0 pc=0x7ff777588901
created by runtime.init.7 in goroutine 1
runtime/proc.go:325 +0x1a

goroutine 3 gp=0xc00005ca80 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000061f80 sp=0xc000061f60 pc=0x7ff7775803ce
runtime.goparkunlock(...)
runtime/proc.go:430
runtime.bgsweep(0xc00006c000)
runtime/mgcsweep.go:317 +0xdf fp=0xc000061fc8 sp=0xc000061f80 pc=0x7ff777536f9f
runtime.gcenable.gowrap1()
runtime/mgc.go:204 +0x25 fp=0xc000061fe0 sp=0xc000061fc8 pc=0x7ff77752b5c5
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000061fe8 sp=0xc000061fe0 pc=0x7ff777588901
created by runtime.gcenable in goroutine 1
runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc00005cc40 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x7ff778559520?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000073f78 sp=0xc000073f58 pc=0x7ff7775803ce
runtime.goparkunlock(...)
runtime/proc.go:430
runtime.(*scavengerState).park(0x7ff778d56940)
runtime/mgcscavenge.go:425 +0x49 fp=0xc000073fa8 sp=0xc000073f78 pc=0x7ff777534969
runtime.bgscavenge(0xc00006c000)
runtime/mgcscavenge.go:658 +0x59 fp=0xc000073fc8 sp=0xc000073fa8 pc=0x7ff777534ef9
runtime.gcenable.gowrap2()
runtime/mgc.go:205 +0x25 fp=0xc000073fe0 sp=0xc000073fc8 pc=0x7ff77752b565
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000073fe8 sp=0xc000073fe0 pc=0x7ff777588901
created by runtime.gcenable in goroutine 1
runtime/mgc.go:205 +0xa5

goroutine 18 gp=0xc000086380 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00006fe20 sp=0xc00006fe00 pc=0x7ff7775803ce
runtime.runfinq()
runtime/mfinal.go:193 +0x107 fp=0xc00006ffe0 sp=0xc00006fe20 pc=0x7ff77752a687
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00006ffe8 sp=0xc00006ffe0 pc=0x7ff777588901
created by runtime.createfing in goroutine 1
runtime/mfinal.go:163 +0x3d

goroutine 19 gp=0xc000087500 m=nil [chan receive]:
runtime.gopark(0xc000063f60?, 0x7ff77766a2e5?, 0xe0?, 0x27?, 0x7ff778580260?)
runtime/proc.go:424 +0xce fp=0xc000063f18 sp=0xc000063ef8 pc=0x7ff7775803ce
runtime.chanrecv(0xc000088380, 0x0, 0x1)
runtime/chan.go:639 +0x41e fp=0xc000063f90 sp=0xc000063f18 pc=0x7ff77751ac9e
runtime.chanrecv1(0x7ff77754e100?, 0xc000063f76?)
runtime/chan.go:489 +0x12 fp=0xc000063fb8 sp=0xc000063f90 pc=0x7ff77751a852
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
runtime/mgc.go:1784 +0x2f fp=0xc000063fe0 sp=0xc000063fb8 pc=0x7ff77752e6af
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000063fe8 sp=0xc000063fe0 pc=0x7ff777588901
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
runtime/mgc.go:1779 +0x96

goroutine 34 gp=0xc0003ce1c0 m=nil [GC worker (idle)]:
runtime.gopark(0x44ed30f908?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00040df38 sp=0xc00040df18 pc=0x7ff7775803ce
runtime.gcBgMarkWorker(0xc000406000)
runtime/mgc.go:1412 +0xe9 fp=0xc00040dfc8 sp=0xc00040df38 pc=0x7ff77752d9a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00040dfe0 sp=0xc00040dfc8 pc=0x7ff77752d885
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00040dfe8 sp=0xc00040dfe0 pc=0x7ff777588901
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105

goroutine 5 gp=0xc00005d180 m=nil [GC worker (idle)]:
runtime.gopark(0x44ed2a5bac?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000075f38 sp=0xc000075f18 pc=0x7ff7775803ce
runtime.gcBgMarkWorker(0xc000406000)
runtime/mgc.go:1412 +0xe9 fp=0xc000075fc8 sp=0xc000075f38 pc=0x7ff77752d9a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc000075fe0 sp=0xc000075fc8 pc=0x7ff77752d885
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000075fe8 sp=0xc000075fe0 pc=0x7ff777588901
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105

goroutine 20 gp=0xc0000876c0 m=nil [GC worker (idle)]:
runtime.gopark(0x44ed2a5bac?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000071f38 sp=0xc000071f18 pc=0x7ff7775803ce
runtime.gcBgMarkWorker(0xc000406000)
runtime/mgc.go:1412 +0xe9 fp=0xc000071fc8 sp=0xc000071f38 pc=0x7ff77752d9a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x7ff77752d885
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x7ff777588901
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105

goroutine 35 gp=0xc0003ce380 m=nil [GC worker (idle)]:
runtime.gopark(0x44ed2a5bac?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00040ff38 sp=0xc00040ff18 pc=0x7ff7775803ce
runtime.gcBgMarkWorker(0xc000406000)
runtime/mgc.go:1412 +0xe9 fp=0xc00040ffc8 sp=0xc00040ff38 pc=0x7ff77752d9a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00040ffe0 sp=0xc00040ffc8 pc=0x7ff77752d885
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00040ffe8 sp=0xc00040ffe0 pc=0x7ff777588901
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105

goroutine 15 gp=0xc0003ce000 m=nil [chan receive]:
runtime.gopark(0x7ff777586917?, 0xc00024b898?, 0xd6?, 0xfd?, 0xc00024b880?)
runtime/proc.go:424 +0xce fp=0xc00024b860 sp=0xc00024b840 pc=0x7ff7775803ce
runtime.chanrecv(0xc000354150, 0xc00024ba10, 0x1)
runtime/chan.go:639 +0x41e fp=0xc00024b8d8 sp=0xc00024b860 pc=0x7ff77751ac9e
runtime.chanrecv1(0xc000146240?, 0xc00020c808?)
runtime/chan.go:489 +0x12 fp=0xc00024b900 sp=0xc00024b8d8 pc=0x7ff77751a852
ollama/llama/runner.(*Server).embeddings(0xc000121560, {0x7ff77856a2b0, 0xc0000bc0e0}, 0xc000014140)
ollama/llama/runner/runner.go:791 +0x746 fp=0xc00024bac0 sp=0xc00024b900 pc=0x7ff777961086
ollama/llama/runner.(*Server).embeddings-fm({0x7ff77856a2b0?, 0xc0000bc0e0?}, 0x7ff7778f1da7?)
<autogenerated>:1 +0x36 fp=0xc00024baf0 sp=0xc00024bac0 pc=0x7ff777963a96
net/http.HandlerFunc.ServeHTTP(0xc0004376c0?, {0x7ff77856a2b0?, 0xc0000bc0e0?}, 0x67c1e282?)
net/http/server.go:2220 +0x29 fp=0xc00024bb18 sp=0xc00024baf0 pc=0x7ff7778e45c9
net/http.(*ServeMux).ServeHTTP(0x7ff777521b65?, {0x7ff77856a2b0, 0xc0000bc0e0}, 0xc000014140)
net/http/server.go:2747 +0x1ca fp=0xc00024bb68 sp=0xc00024bb18 pc=0x7ff7778e64ca
net/http.serverHandler.ServeHTTP({0x7ff778566e50?}, {0x7ff77856a2b0?, 0xc0000bc0e0?}, 0x6?)
net/http/server.go:3210 +0x8e fp=0xc00024bb98 sp=0xc00024bb68 pc=0x7ff777903a2e
net/http.(*conn).serve(0xc000121b00, {0x7ff77856c328, 0xc000456c30})
net/http/server.go:2092 +0x5d0 fp=0xc00024bfb8 sp=0xc00024bb98 pc=0x7ff7778e2f70
net/http.(*Server).Serve.gowrap3()
net/http/server.go:3360 +0x28 fp=0xc00024bfe0 sp=0xc00024bfb8 pc=0x7ff7778e83c8
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00024bfe8 sp=0xc00024bfe0 pc=0x7ff777588901
created by net/http.(*Server).Serve in goroutine 1
net/http/server.go:3360 +0x485

goroutine 66 gp=0xc000174000 m=nil [IO wait]:
runtime.gopark(0x0?, 0xc0005311a0?, 0x48?, 0x12?, 0xc00053124c?)
runtime/proc.go:424 +0xce fp=0xc0004f7d20 sp=0xc0004f7d00 pc=0x7ff7775803ce
runtime.netpollblock(0x3d8?, 0x77518366?, 0xf7?)
runtime/netpoll.go:575 +0xf7 fp=0xc0004f7d58 sp=0xc0004f7d20 pc=0x7ff777544f97
internal/poll.runtime_pollWait(0x21c9e5db558, 0x72)
runtime/netpoll.go:351 +0x85 fp=0xc0004f7d78 sp=0xc0004f7d58 pc=0x7ff77757f645
internal/poll.(*pollDesc).wait(0xc0004f7dd8?, 0x7ff777526065?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0004f7da0 sp=0xc0004f7d78 pc=0x7ff777614207
internal/poll.execIO(0xc0005311a0, 0x7ff77842e0a0)
internal/poll/fd_windows.go:177 +0x105 fp=0xc0004f7e18 sp=0xc0004f7da0 pc=0x7ff777615645
internal/poll.(*FD).Read(0xc000531188, {0xc000192fd1, 0x1, 0x1})
internal/poll/fd_windows.go:438 +0x2a7 fp=0xc0004f7ec0 sp=0xc0004f7e18 pc=0x7ff777616347
net.(*netFD).Read(0xc000531188, {0xc000192fd1?, 0xc0004f7f48?, 0x7ff777581d10?})
net/fd_posix.go:55 +0x25 fp=0xc0004f7f08 sp=0xc0004f7ec0 pc=0x7ff77767e945
net.(*conn).Read(0xc000498518, {0xc000192fd1?, 0x0?, 0x7ff778da3400?})
net/net.go:189 +0x45 fp=0xc0004f7f50 sp=0xc0004f7f08 pc=0x7ff77768df25
net.(*TCPConn).Read(0x7ff778d0aa50?, {0xc000192fd1?, 0x0?, 0x0?})
<autogenerated>:1 +0x25 fp=0xc0004f7f80 sp=0xc0004f7f50 pc=0x7ff77769f945
net/http.(*connReader).backgroundRead(0xc000192fc0)
net/http/server.go:690 +0x37 fp=0xc0004f7fc8 sp=0xc0004f7f80 pc=0x7ff7778dd8f7
net/http.(*connReader).startBackgroundRead.gowrap2()
net/http/server.go:686 +0x25 fp=0xc0004f7fe0 sp=0xc0004f7fc8 pc=0x7ff7778dd825
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004f7fe8 sp=0xc0004f7fe0 pc=0x7ff777588901
created by net/http.(*connReader).startBackgroundRead in goroutine 15
net/http/server.go:686 +0xb6
rax 0x21c87c118e4
rbx 0x300
rcx 0x0
rdx 0xfffffffffffff400
rdi 0x21c87c10ce0
rsi 0x0
rbp 0x56f10fe6e0
rsp 0x56f10fe6b0
r8 0x21c87c100e0
r9 0x300
r10 0x0
r11 0x0
r12 0x0
r13 0x300
r14 0x0
r15 0x300
rip 0x7ffa1a5c2a16
rflags 0x10203
cs 0x33
fs 0x53
gs 0x2b
time=2025-02-28T17:21:23.030+01:00 level=ERROR source=routes.go:479 msg="embedding generation failed" error="do embedding request: Post \"http://127.0.0.1:55895/embedding\": read tcp 127.0.0.1:55905->127.0.0.1:55895: wsarecv: An existing connection was forcibly closed by the remote host."
[GIN] 2025/02/28 - 17:21:23 | 500 | 5.1642929s | 172.27.100.205 | POST "/api/embed"
time=2025-02-28T17:21:24.625+01:00 level=INFO source=server.go:104 msg="system memory" total="31.9 GiB" free="20.4 GiB" free_swap="17.9 GiB"
time=2025-02-28T17:21:24.626+01:00 level=INFO source=memory.go:356 msg="offload to device" layers.requested=-1 layers.model=13 layers.offload=0 layers.split="" memory.available="[20.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="352.9 MiB" memory.required.partial="0 B" memory.required.kv="24.0 MiB" memory.required.allocations="[352.9 MiB]" memory.weights.total="240.1 MiB" memory.weights.repeating="195.4 MiB" memory.weights.nonrepeating="44.7 MiB" memory.graph.full="48.0 MiB" memory.graph.partial="48.0 MiB"
time=2025-02-28T17:21:24.631+01:00 level=INFO source=server.go:392 msg="starting llama server" cmd="C:\Users\Admin\ollama-0.5.4-ipex-llm\ollama-lib.exe runner --model C:\Users\Admin\.ollama\models\blobs\sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 --ctx-size 16384 --batch-size 512 --n-gpu-layers 999 --threads 4 --no-mmap --parallel 1 --port 55960"
time=2025-02-28T17:21:24.639+01:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2025-02-28T17:21:24.640+01:00 level=INFO source=server.go:571 msg="waiting for llama runner to start responding"
time=2025-02-28T17:21:24.640+01:00 level=INFO source=server.go:605 msg="waiting for server to become available" status="llm server error"

sgwhat (Contributor) commented Mar 3, 2025

Which version of portable zip are you running?

DediCATeD88 (Author) commented Mar 3, 2025

> Which version of portable zip are you running?

With ollama-0.5.4-ipex-llm-2.2.0b20250226-win.zip I get:

level=ERROR source=routes.go:479 msg="embedding generation failed" error="do embedding request: Post \"http://127.0.0.1:51510/embedding\": read tcp 127.0.0.1:51518->127.0.0.1:51510: wsarecv: An existing connection was forcibly closed by the remote host."

which I believe is the problem:

Exception 0xc000001d 0x0 0x0 0x7ffccaf42a16
PC=0x7ffccaf42a16
signal arrived during external code execution

With ollama-0.5.4-ipex-llm-2.2.0b20250220-win.zip I get a 100% GPU loop which never ends (doing an embedding of 3 websites via the OpenWebUI Web Search feature, not very huge PDF documents which might take longer):

Screenshots: 2025-03-03 10_44_26.png and 2025-03-03 10_44_35.png

I am on Windows 11 24H2 Build 26100.3194. No AntiVirus besides Defender.

I've tested Ollama directly from the Ollama website. No problem doing embeddings whatsoever, besides being really, really slow because it runs on the CPU.

sgwhat (Contributor) commented Mar 4, 2025

Hi @DediCATeD88, I cannot reproduce your issue; nomic-embed-text works well on my Intel B580 Windows desktop, and we have exactly the same GPU driver version.

Btw, how did you run this model? Via Windows cmd?

DediCATeD88 (Author) commented Mar 4, 2025

> Hi @DediCATeD88, I cannot reproduce your issue; nomic-embed-text works well on my Intel B580 Windows desktop, and we have exactly the same GPU driver version.
>
> Btw, how did you run this model? Via Windows cmd?

Strange. For example, DeepSeek R1 7B/8B runs totally fine with both portable versions. It seems to affect only the embed API / embedding models: Nomic, AllMini and Snowflake V2, for example. Maybe there's some problem with an AVX-only CPU (no AVX2), or with Resizable BAR not being supported by the BIOS/UEFI (the purpose of the machine is only RAG, so it's an old HP Z420 with an Intel Xeon and the B580).
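To check the AVX theory, Sysinternals Coreinfo can list the instruction sets the CPU supports; a minimal sketch, assuming coreinfo.exe is on the PATH. An asterisk next to AVX2 in its output means the CPU supports it, a dash means it does not:

coreinfo.exe -accepteula | findstr /i "AVX"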

Here's my command, yes, Windows cmd:

Ollama autorun:
cd C:\Users\Admin\ollama-0.5.4-ipex-llm
set ONEAPI_DEVICE_SELECTOR=level_zero:0
call start-ollama.bat

Ollama serve (start-ollama.bat):
@echo off
setlocal
set OLLAMA_NUM_GPU=999
set no_proxy=localhost,127.0.0.1
set ZES_ENABLE_SYSMAN=1
set SYCL_CACHE_PERSISTENT=1
@Rem This environment variable might improve performance.
@Rem You could uncomment it and test whether it brings benefit for your case.
@Rem set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
set OLLAMA_KEEP_ALIVE=10m
set OLLAMA_HOST=0.0.0.0
cd /d %~dp0 && ollama.exe serve

The model, for example nomic, is then called via OpenWebUI and the Ollama API on http://ipofthehpz420machine:11434.

I will have a further look with OLLAMA_DEBUG="1", or with the Docker container for IPEX and WSL2.
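A minimal sketch of such a debug run, assuming the same start-ollama.bat as above with one extra variable (OLLAMA_DEBUG=1 only raises the log level; it should not change behaviour otherwise):

@echo off
setlocal
@Rem Assumption: identical to the script above, plus debug logging to narrow
@Rem down where the embedding request dies.
set OLLAMA_DEBUG=1
set OLLAMA_NUM_GPU=999
set no_proxy=localhost,127.0.0.1
set ZES_ENABLE_SYSMAN=1
set SYCL_CACHE_PERSISTENT=1
set OLLAMA_KEEP_ALIVE=10m
set OLLAMA_HOST=0.0.0.0
cd /d %~dp0 && ollama.exe serve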

DediCATeD88 (Author) commented Mar 4, 2025


Very strange, btw: I can run "embeddings" fine with Llama 3.1 8B or even Llama 3.2 1B. Only the real embedding models like nomic, allmini, snowflake give me the problems above.

DediCATeD88 (Author) commented:

FYI: since I've found no possible cause or solution on my side for

Exception 0xc000001d 0x0 0x0 0x7ffccaf42a16
PC=0x7ffccaf42a16
signal arrived during external code execution

I've found and tried

ollama/ollama#5059 (comment)

which runs fine with nomic-embed-text. No 100% GPU copy loop like in ollama-0.5.4-ipex-llm-2.2.0b20250220-win.zip and no crash like in ollama-0.5.4-ipex-llm-2.2.0b20250226-win.zip.

The speed is very slow (4-6 minutes), which may be due to the early Vulkan support, but there are no crashes or loops.

Still hoping the next IPEX release will fix this and speed things up for me.
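For anyone wanting to try the same workaround: judging from the server config dump in the log below, that build selects the Vulkan code path via OLLAMA_LLM_LIBRARY. A hedged launch sketch, with the variable name and value taken from that dump rather than from any official documentation:

@Rem Assumption: OLLAMA_LLM_LIBRARY=vulkan is what the config dump below shows;
@Rem whether it must be set manually depends on the build from ollama/ollama#5059.
set OLLAMA_LLM_LIBRARY=vulkan
ollama.exe serve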

Full Vulkan log:
2025/03/07 17:56:46 routes.go:1186: INFO server config env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:vulkan variant OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\Users\Admin\.ollama\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-03-07T17:56:46.491+01:00 level=INFO source=images.go:432 msg="total blobs: 21"
time=2025-03-07T17:56:46.495+01:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-07T17:56:46.499+01:00 level=INFO source=routes.go:1237 msg="Listening on [::]:11434 (version 0.5.11-2d443b3-vulkan)"
time=2025-03-07T17:56:46.499+01:00 level=INFO source=gpu.go:254 msg="looking for compatible GPUs"
time=2025-03-07T17:56:46.499+01:00 level=INFO source=gpu_windows.go:176 msg=packages count=1
time=2025-03-07T17:56:46.499+01:00 level=INFO source=gpu_windows.go:223 msg="" package=0 cores=4 efficiency=0 threads=4
time=2025-03-07T17:56:46.512+01:00 level=INFO source=gpu.go:701 msg="Unable to load cudart library C:\WINDOWS\system32\nvcuda.dll: symbol lookup for cuDeviceGetUuid failed: The specified procedure could not be found.\r\n"
time=2025-03-07T17:56:46.522+01:00 level=INFO source=gpu.go:199 msg="vulkan: load libvulkan and libcap ok"
Fri Mar 7 17:56:49 2025: IGSC: (D:\qb\workspace\31779\source\igsc-master\lib\igsc_lib.c:gsc_driver_init():218) Error in HECI init (3)
Fri Mar 7 17:56:49 2025: IGSC: (D:\qb\workspace\31779\source\igsc-master\lib\igsc_lib.c:igsc_device_subsystem_ids():1597) Failed to init HECI driver
time=2025-03-07T17:56:49.953+01:00 level=INFO source=types.go:137 msg="inference compute" id=0 library=vulkan variant="" compute=1.4 driver=1.4 name="Intel(R) Arc(TM) B580 Graphics" total="11.9 GiB" available="11.4 GiB"
Fri Mar 7 17:57:59 2025: IGSC: (D:\qb\workspace\31779\source\igsc-master\lib\igsc_lib.c:gsc_driver_init():218) Error in HECI init (3)
Fri Mar 7 17:57:59 2025: IGSC: (D:\qb\workspace\31779\source\igsc-master\lib\igsc_lib.c:igsc_device_subsystem_ids():1597) Failed to init HECI driver
Fri Mar 7 17:58:03 2025: IGSC: (D:\qb\workspace\31779\source\igsc-master\lib\igsc_lib.c:gsc_driver_init():218) Error in HECI init (3)
Fri Mar 7 17:58:03 2025: IGSC: (D:\qb\workspace\31779\source\igsc-master\lib\igsc_lib.c:igsc_device_subsystem_ids():1597) Failed to init HECI driver
time=2025-03-07T17:58:03.038+01:00 level=INFO source=server.go:100 msg="system memory" total="31.9 GiB" free="17.9 GiB" free_swap="14.0 GiB"
time=2025-03-07T17:58:03.039+01:00 level=INFO source=memory.go:356 msg="offload to vulkan" layers.requested=999 layers.model=13 layers.offload=13 layers.split="" memory.available="[10.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="352.9 MiB" memory.required.partial="352.9 MiB" memory.required.kv="24.0 MiB" memory.required.allocations="[352.9 MiB]" memory.weights.total="240.1 MiB" memory.weights.repeating="195.4 MiB" memory.weights.nonrepeating="44.7 MiB" memory.graph.full="48.0 MiB" memory.graph.partial="48.0 MiB"
time=2025-03-07T17:58:03.053+01:00 level=INFO source=server.go:380 msg="starting llama server" cmd="C:\Users\Admin\AppData\Local\Programs\Ollama\ollama.exe runner --model C:\Users\Admin\.ollama\models\blobs\sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 --ctx-size 8192 --batch-size 512 --n-gpu-layers 999 --threads 4 --parallel 1 --port 63544"
time=2025-03-07T17:58:03.060+01:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2025-03-07T17:58:03.061+01:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-07T17:58:03.062+01:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-03-07T17:58:03.115+01:00 level=INFO source=runner.go:936 msg="starting go runner"
time=2025-03-07T17:58:03.189+01:00 level=INFO source=runner.go:937 msg=system info="CPU : SSE3 = 1 | LLAMAFILE = 1 | CPU : SSE3 = 1 | LLAMAFILE = 1 | cgo(gcc)" threads=4
time=2025-03-07T17:58:03.191+01:00 level=INFO source=runner.go:995 msg="Server listening on 127.0.0.1:63544"
load_backend: loaded CPU backend from C:\Users\Admin\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-sandybridge.dll
load_backend: loaded Vulkan backend from C:\Users\Admin\AppData\Local\Programs\Ollama\lib\ollama\vulkan\ggml-vulkan.dll
time=2025-03-07T17:58:03.314+01:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
Fri Mar 7 17:58:06 2025: IGSC: (D:\qb\workspace\31779\source\igsc-master\lib\igsc_lib.c:gsc_driver_init():218) Error in HECI init (3)
Fri Mar 7 17:58:06 2025: IGSC: (D:\qb\workspace\31779\source\igsc-master\lib\igsc_lib.c:igsc_device_subsystem_ids():1597) Failed to init HECI driver
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) B580 Graphics (Intel Corporation) | uma: 0 | fp16: 1 | warp size: 32 | matrix cores: none
llama_load_model_from_file: using device Vulkan0 (Intel(R) Arc(TM) B580 Graphics) - 11916 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from C:\Users\Admin\.ollama\models\blobs\sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = nomic-bert
llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5
llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12
llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048
llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768
llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072
llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12
llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000
llama_model_loader: - kv 8: general.file_type u32 = 1
llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false
llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1
llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000
llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2
llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101
llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102
llama_model_loader: - kv 15: tokenizer.ggml.model str = bert
llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100
llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101
llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103
llama_model_loader: - type f32: 51 tensors
llama_model_loader: - type f16: 61 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 5
llm_load_vocab: token to piece cache size = 0.2032 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = nomic-bert
llm_load_print_meta: vocab type = WPM
llm_load_print_meta: n_vocab = 30522
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 768
llm_load_print_meta: n_layer = 12
llm_load_print_meta: n_head = 12
llm_load_print_meta: n_head_kv = 12
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 64
llm_load_print_meta: n_embd_head_v = 64
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 768
llm_load_print_meta: n_embd_v_gqa = 768
llm_load_print_meta: f_norm_eps = 1.0e-12
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 3072
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 0
llm_load_print_meta: pooling type = 1
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 137M
llm_load_print_meta: model ftype = F16
llm_load_print_meta: model params = 136.73 M
llm_load_print_meta: model size = 260.86 MiB (16.00 BPW)
llm_load_print_meta: general.name = nomic-embed-text-v1.5
llm_load_print_meta: BOS token = 101 '[CLS]'
llm_load_print_meta: EOS token = 102 '[SEP]'
llm_load_print_meta: UNK token = 100 '[UNK]'
llm_load_print_meta: SEP token = 102 '[SEP]'
llm_load_print_meta: PAD token = 0 '[PAD]'
llm_load_print_meta: CLS token = 101 '[CLS]'
llm_load_print_meta: MASK token = 103 '[MASK]'
llm_load_print_meta: LF token = 0 '[PAD]'
llm_load_print_meta: EOG token = 102 '[SEP]'
llm_load_print_meta: max token length = 21
ggml_vulkan: Compiling shaders............................................Done!
llm_load_tensors: offloading 12 repeating layers to GPU
llm_load_tensors: offloading output layer to GPU
llm_load_tensors: offloaded 13/13 layers to GPU
llm_load_tensors: Vulkan0 model buffer size = 216.14 MiB
llm_load_tensors: CPU_Mapped model buffer size = 44.72 MiB
llama_new_context_with_model: n_seq_max = 1
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: n_ctx_per_seq = 8192
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 1000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: n_ctx_per_seq (8192) > n_ctx_train (2048) -- possible training context overflow
llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 12, can_shift = 1
llama_kv_cache_init: Vulkan0 KV buffer size = 288.00 MiB
llama_new_context_with_model: KV self size = 288.00 MiB, K (f16): 144.00 MiB, V (f16): 144.00 MiB
llama_new_context_with_model: Vulkan_Host output buffer size = 0.00 MiB
llama_new_context_with_model: Vulkan0 compute buffer size = 23.50 MiB
llama_new_context_with_model: Vulkan_Host compute buffer size = 3.50 MiB
llama_new_context_with_model: graph nodes = 453
llama_new_context_with_model: graph splits = 4 (with bs=512), 2 (with bs=1)
time=2025-03-07T17:58:09.077+01:00 level=INFO source=server.go:596 msg="llama runner started in 6.02 seconds"
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from C:\Users\Admin\.ollama\models\blobs\sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = nomic-bert
llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5
llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12
llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048
llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768
llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072
llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12
llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000
llama_model_loader: - kv 8: general.file_type u32 = 1
llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false
llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1
llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000
llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2
llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101
llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102
llama_model_loader: - kv 15: tokenizer.ggml.model str = bert
llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100
llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101
llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103
llama_model_loader: - type f32: 51 tensors
llama_model_loader: - type f16: 61 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 5
llm_load_vocab: token to piece cache size = 0.2032 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = nomic-bert
llm_load_print_meta: vocab type = WPM
llm_load_print_meta: n_vocab = 30522
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 1
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = all F32
llm_load_print_meta: model params = 136.73 M
llm_load_print_meta: model size = 260.86 MiB (16.00 BPW)
llm_load_print_meta: general.name = nomic-embed-text-v1.5
llm_load_print_meta: BOS token = 101 '[CLS]'
llm_load_print_meta: EOS token = 102 '[SEP]'
llm_load_print_meta: UNK token = 100 '[UNK]'
llm_load_print_meta: SEP token = 102 '[SEP]'
llm_load_print_meta: PAD token = 0 '[PAD]'
llm_load_print_meta: CLS token = 101 '[CLS]'
llm_load_print_meta: MASK token = 103 '[MASK]'
llm_load_print_meta: LF token = 0 '[PAD]'
llm_load_print_meta: EOG token = 102 '[SEP]'
llm_load_print_meta: max token length = 21
llama_model_load: vocab only - skipping tensors
[GIN] 2025/03/07 - 17:58:57 | 200 | 1m0s | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 17:59:45 | 200 | 48.2871992s | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:00:13 | 200 | 28.0140703s | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:00:52 | 200 | 38.9209724s | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:01:10 | 200 | 18.400538s | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:01:36 | 200 | 25.5411385s | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:02:20 | 200 | 44.2000303s | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:02:56 | 200 | 34.0947889s | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:03:18 | 200 | 21.9562093s | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:03:31 | 200 | 13.048495s | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:03:57 | 200 | 25.9365147s | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:04:16 | 200 | 19.0796961s | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:04:29 | 200 | 13.1584853s | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:04:31 | 200 | 414.5571ms | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:04:32 | 200 | 348.7407ms | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:04:32 | 200 | 331.5015ms | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:04:32 | 200 | 363.4372ms | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:04:33 | 200 | 346.9706ms | ...205 | POST "/api/embed"
[GIN] 2025/03/07 - 18:04:33 | 200 | 444.0134ms | ...205 | POST "/api/embed"
