{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":631215961,"defaultBranch":"main","name":"ggml-sys-bleedingedge","ownerLogin":"KerfuffleV2","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2023-04-22T10:03:31.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/44031344?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1716252616.0","currentOid":""},"activityList":{"items":[{"before":"e6aade32b17280f525b46d6191ddf020c3dc22c0","after":"923caca811af80d5633ce3012e0c0ab235f5b955","ref":"refs/heads/main","pushedAt":"2024-05-21T00:50:14.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405210049.0.0+llamacpp-release.b2953\n\n== Relevant log messages from source repo:\n\ncommit 917dc8cfa67a72fb7c8bf7392270da3bf4833af4\nAuthor: jaime-m-p <167997752+jaime-m-p@users.noreply.github.com>\nDate: Mon May 20 20:15:57 2024 +0200\n\n Tokenizer SPM fixes for phi-3 and llama-spm (#7375)\n\n * Update brute force test: special tokens\n * Fix added tokens\n - Try to read 'added_tokens.json'.\n - Try to read 'tokenizer_config.json'.\n - Try to read 'tokenizer.json'.\n * Fix special tokens rtrim\n\n Co-authored-by: Georgi Gerganov \n * server : fix test regexes","shortMessageHtmlLink":"[auto] Sync version 2405210049.0.0+llamacpp-release.b2953"}},{"before":"cadffc0b60428da1c383dd3cc89d49e8649fad09","after":"e6aade32b17280f525b46d6191ddf020c3dc22c0","ref":"refs/heads/main","pushedAt":"2024-05-20T18:14:43.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405201813.0.0+llamacpp-release.b2952\n\n== Relevant log messages from source repo:\n\ncommit fabf30b4c4fca32e116009527180c252919ca922\nAuthor: Georgi Gerganov \nDate: Mon May 20 19:35:28 2024 +0300\n\n llama : remove Persimmon (#7408)\n\n * llama : remove Persimmon\n\n * requirements : remove\n\ncommit db10f01310beea8a1ef7798651b9d692fd1149d0\nAuthor: Radoslav Gerganov \nDate: Mon May 20 16:36:55 2024 +0300\n\n rpc : track allocated buffers (#7411)\n\n * rpc : track allocated buffers\n\n ref: #7407\n\n * rpc : pack rpc_tensor tightly\n\ncommit 6bf9b66fa3f263ca2175dcb5f6d0a658581e1dfb\nAuthor: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>\nDate: Mon May 20 12:08:23 2024 +0100\n\n [SYCL] Update SYCL upscale operation (#7321)\n\n * Update SYCL upscale operation\n\n * Formatting\n\n * Remove messages","shortMessageHtmlLink":"[auto] Sync version 2405201813.0.0+llamacpp-release.b2952"}},{"before":"6e502cff23d5fd08543d41de29943b144058cad2","after":"cadffc0b60428da1c383dd3cc89d49e8649fad09","ref":"refs/heads/main","pushedAt":"2024-05-20T12:22:00.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405201221.0.0+llamacpp-release.b2946\n\n== Relevant log messages from source repo:\n\ncommit 213e90ed73f8ac3cd3026dc3f086beae0d414f96\nAuthor: Herman Semenov \nDate: Mon May 20 07:33:21 2024 +0000\n\n ggml-opencl, llama: using reserve() if count already known (#7272)\n\ncommit 65c58207ece92ad213f4bfd0f91dcb2dfb664f5b\nAuthor: junchao-loongson <68935141+junchao-loongson@users.noreply.github.com>\nDate: Mon May 20 15:19:21 2024 +0800\n\n ggml : add loongarch lsx and lasx support (#6454)\n\n * add loongarch lsx and lasx optimize code\n\n * Add loongarch compilation support to makefile\n\n * revert stb_image.h\n\n * opt bytes_from_nibbles_32 and sum_i16_pairs_float\n\n * fix undeclared\n\n * format code\n\n * update\n\n * update 2\n\n ---------\n\n Co-authored-by: Jinyang He ","shortMessageHtmlLink":"[auto] Sync version 2405201221.0.0+llamacpp-release.b2946"}},{"before":"23ae0ee0ea19423f489142dbe08e60695ed276e1","after":"6e502cff23d5fd08543d41de29943b144058cad2","ref":"refs/heads/main","pushedAt":"2024-05-20T06:15:32.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405200614.0.0+llamacpp-release.b2941\n\n== Relevant log messages from source repo:\n\ncommit 33c8d50accd6dca73c9c4af00a05e24209c160fe\nAuthor: Srihari-mcw <96763064+Srihari-mcw@users.noreply.github.com>\nDate: Sun May 19 19:18:39 2024 -0700\n\n Add provisions for windows support for BF16 code including CMake provision for enabling AVX512_BF16 (#7258)","shortMessageHtmlLink":"[auto] Sync version 2405200614.0.0+llamacpp-release.b2941"}},{"before":"ebf2db53bc323815fab45f780ccf6013b6cf8fe5","after":"23ae0ee0ea19423f489142dbe08e60695ed276e1","ref":"refs/heads/main","pushedAt":"2024-05-20T00:51:06.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405200050.0.0+llamacpp-release.b2940\n\n== Relevant log messages from source repo:\n\ncommit d359f30921a9f62a0fd299c412ff3f270286fea6\nAuthor: slaren \nDate: Mon May 20 01:17:03 2024 +0200\n\n llama : remove MPI backend (#7395)\n\ncommit f030ec1f7a72aa825b2104823946551b9ec5dfc1\nAuthor: 0cc4m \nDate: Sun May 19 17:19:53 2024 +0200\n\n Vulkan Embedding Fix (#7360)\n\n * Fix empty Vulkan host buffers\n\n Add fp32 fp16 matmul shader\n\n Fix matmul shader alignment\n\n * Remove deprecated tensor->backend uses\n\n * Fix Vulkan validation errors on embedding models with no offloaded layers\n\n * Fix Vulkan llava segfault when not offloading layers\n\ncommit e4e6f67be6a8a697f5f89a28c98934e53c99c359\nAuthor: slaren \nDate: Sun May 19 17:08:46 2024 +0200\n\n ggml : fix another case of quants nans (#7387)\n\ncommit 5ca49cbecda27ce0a7266658fc3b640bff3ed386\nAuthor: Johannes Gäßler \nDate: Sun May 19 16:46:13 2024 +0200\n\n ggml: implement quantized KV cache for FA (#7372)\n\ncommit 6aade19ee74b896c59929676629340b36be3e22c\nAuthor: Anas Ahouzi <112881240+aahouzi@users.noreply.github.com>\nDate: Sun May 19 14:46:46 2024 +0200\n\n Add StableLM2 pre-tokenizer (#7349)\n\n * Add StableLM pre-tokenizer\n\n * Fix space\n\n * Fix trailing whitespace","shortMessageHtmlLink":"[auto] Sync version 2405200050.0.0+llamacpp-release.b2940"}},{"before":"d83ab1dded1348eae897c0e2688e81547ddd75a4","after":"ebf2db53bc323815fab45f780ccf6013b6cf8fe5","ref":"refs/heads/main","pushedAt":"2024-05-19T18:13:44.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405191813.0.0+llamacpp-release.b2932\n\n== Relevant log messages from source repo:\n\ncommit ab33f7a338593f6cf1ae98b10b6f8684f63bd72c\nAuthor: slaren \nDate: Sun May 19 14:19:37 2024 +0200\n\n cuda : clear error after buffer allocation failure (#7376)","shortMessageHtmlLink":"[auto] Sync version 2405191813.0.0+llamacpp-release.b2932"}},{"before":"c7de716cae2d34152ff60e45be443c111f0dafa2","after":"d83ab1dded1348eae897c0e2688e81547ddd75a4","ref":"refs/heads/main","pushedAt":"2024-05-19T00:53:31.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405190052.0.0+llamacpp-release.b2929\n\n== Relevant log messages from source repo:\n\ncommit f5bf761747988ee1832766f7d1433739aff810da\nAuthor: fraxy-v <65565042+fraxy-v@users.noreply.github.com>\nDate: Sun May 19 01:44:42 2024 +0300\n\n Capture CUDA logging output (#7298)\n\n * logging: output capture in cuda module\n\n * fix compile error\n\n * fix: vsnprintf terminates with 0, string use not correct\n\n * post review\n\n * Update llama.cpp\n\n Co-authored-by: slaren \n\n * Update llama.cpp\n\n Co-authored-by: slaren \n\n ---------\n\n Co-authored-by: slaren ","shortMessageHtmlLink":"[auto] Sync version 2405190052.0.0+llamacpp-release.b2929"}},{"before":"ecf3d3e51633563fd845a312d0ab71cd9c07394f","after":"c7de716cae2d34152ff60e45be443c111f0dafa2","ref":"refs/heads/main","pushedAt":"2024-05-18T18:14:44.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405181813.0.0+llamacpp-release.b2928\n\n== Relevant log messages from source repo:\n\ncommit 059031b8c40e1f4ba60586842c5b1ed3ddf61842\nAuthor: Georgi Gerganov \nDate: Sat May 18 18:55:54 2024 +0300\n\n ci : re-enable sanitizer runs (#7358)\n\n * Revert \"ci : temporary disable sanitizer builds (#6128)\"\n\n This reverts commit 4f6d1337ca5a409dc74aca8c479b7c34408a69c0.\n\n * ci : trigger\n\ncommit 511182eabb36f6ec9776e2b3c4d7e16d93d0ac0d\nAuthor: Georgi Gerganov \nDate: Sat May 18 13:40:39 2024 +0300\n\n android : use \"ci-android\" branch for CI (#7341)\n\n * android : use \"ci-android\" branch for CI\n\n * ggml : disable SIMD exp and silu for 32-bit ARM\n\n ggml-ci\n\n * android : do not fetch, use add_subdirectory instead\n\n * cmake : provide binary dir\n\ncommit 0f98acfac6cc561dc57586bfff778405e42b576b\nAuthor: Steffen Röcker \nDate: Sat May 18 10:04:55 2024 +0200\n\n llama : add support for larger Granite Code Models (20B, 34B) (#7324)\n\n Tie the weights for ARCH_STARCODER to support the larger Granite code models.\n Partially addresses ggerganov/issues/7116\n\n There still remains to be a few things to fix.\n Currently requires `--override-kv tokenizer.ggml.add_bos_token=bool:false`","shortMessageHtmlLink":"[auto] Sync version 2405181813.0.0+llamacpp-release.b2928"}},{"before":"3c4bb04093263e558f76c76b1d11b199e739a2dc","after":"ecf3d3e51633563fd845a312d0ab71cd9c07394f","ref":"refs/heads/main","pushedAt":"2024-05-18T12:20:04.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405181218.0.0+llamacpp-release.b2921\n\n== Relevant log messages from source repo:\n\ncommit c1b295eea5c49887a066559527a74e8b94fe9db0\nAuthor: 0cc4m \nDate: Sat May 18 08:10:58 2024 +0200\n\n Update and fix Vulkan soft_max and argsort implementations (#7237)\n\n * Update and fix Vulkan softmax implementation\n\n * Update and fix Vulkan argsort implementation\n\ncommit 05834841dcb4f922983ea976539c70472272df9a\nAuthor: slaren \nDate: Sat May 18 02:39:54 2024 +0200\n\n ggml : fix quants nans when all the group weights are very close to zero (#7313)","shortMessageHtmlLink":"[auto] Sync version 2405181218.0.0+llamacpp-release.b2921"}},{"before":"0546c88891b26b728ffc106f35748f23ddcf0614","after":"3c4bb04093263e558f76c76b1d11b199e739a2dc","ref":"refs/heads/main","pushedAt":"2024-05-18T06:13:58.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405180613.0.0+llamacpp-release.b2917\n\n== Relevant log messages from source repo:\n\ncommit ef277de2add255a08b2b909ebfbf70364d1f4dc4\nAuthor: Engininja2 <139037756+Engininja2@users.noreply.github.com>\nDate: Fri May 17 18:39:25 2024 -0600\n\n cmake : fix typo in AMDGPU_TARGETS (#7356)","shortMessageHtmlLink":"[auto] Sync version 2405180613.0.0+llamacpp-release.b2917"}},{"before":"7373dbcdcbae82a09fdc5f6d7c2d650d0bf88638","after":"0546c88891b26b728ffc106f35748f23ddcf0614","ref":"refs/heads/main","pushedAt":"2024-05-18T00:49:33.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405180048.0.0+llamacpp-release.b2916\n\n== Relevant log messages from source repo:\n\ncommit b43272afa29a64dcb8bcf26a96a05bac40792b92\nAuthor: jaime-m-p <167997752+jaime-m-p@users.noreply.github.com>\nDate: Sat May 18 01:09:13 2024 +0200\n\n Unicode codepoint flags for custom regexs (#7245)\n\n * Replace CODEPOINT_TYPE_* with codepoint_flags\n * Update and bugfix brute force random test\n * Deterministic brute force random test\n * Unicode normalization NFD\n * Get rid of BOM","shortMessageHtmlLink":"[auto] Sync version 2405180048.0.0+llamacpp-release.b2916"}},{"before":"535ebebca899ecfe9563cb5b6816862aed669c3f","after":"7373dbcdcbae82a09fdc5f6d7c2d650d0bf88638","ref":"refs/heads/main","pushedAt":"2024-05-17T18:14:45.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405171814.0.0+llamacpp-release.b2915\n\n== Relevant log messages from source repo:\n\ncommit 82ca83db3c8d45df559c03a4225b6eb34808a2db\nAuthor: Gavin Zhao \nDate: Fri May 17 11:03:03 2024 -0400\n\n ROCm: use native CMake HIP support (#5966)\n\n Supercedes #4024 and #4813.\n\n CMake's native HIP support has become the\n recommended way to add HIP code into a project (see\n [here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)).\n This PR makes the following changes:\n\n 1. The environment variable `HIPCXX` or CMake option\n `CMAKE_HIP_COMPILER` should be used to specify the HIP\n compiler. Notably this shouldn't be `hipcc`, but ROCm's clang,\n which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously\n this was control by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`.\n Note that since native CMake HIP support is not yet available on\n Windows, on Windows we fall back to the old behavior.\n\n 2. CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the\n GPU architectures to build for. Previously this was controled by\n `GPU_TARGETS`.\n\n 3. Updated the Nix recipe to account for these new changes.\n\n 4. The GPU targets to build against in the Nix recipe is now\n consistent with the supported GPU targets in nixpkgs.\n\n 5. Added CI checks for HIP on both Linux and Windows. On Linux, we test\n both the new and old behavior.\n\n The most important part about this PR is the separation of the\n HIP compiler and the C/C++ compiler. This allows users to choose\n a different C/C++ compiler if desired, compared to the current\n situation where when building for ROCm support, everything must be\n compiled with ROCm's clang.\n\n ~~Makefile is unchanged. Please let me know if we want to be\n consistent on variables' naming because Makefile still uses\n `GPU_TARGETS` to control architectures to build for, but I feel\n like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're\n calling `make`.~~ Makefile used `GPU_TARGETS` but the README says\n to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of\n `GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`.\n\n Thanks to the suggestion of @jin-eld, to maintain backwards\n compatibility (and not break too many downstream users' builds), if\n `CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using\n the original behavior and emit a warning that recommends switching\n to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but\n `CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS`\n to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new\n HIP support.\n\n Signed-off-by: Gavin Zhao \n\ncommit f4bd8b3d260bb09491ba63c77ab7012b744362ef\nAuthor: Radoslav Gerganov \nDate: Fri May 17 17:25:44 2024 +0300\n\n rpc : set SO_REUSEADDR for the server socket (#7320)\n\n ref: #7293","shortMessageHtmlLink":"[auto] Sync version 2405171814.0.0+llamacpp-release.b2915"}},{"before":"f963752739c2ea6aac0a23bc593610e4a1370ff6","after":"535ebebca899ecfe9563cb5b6816862aed669c3f","ref":"refs/heads/main","pushedAt":"2024-05-17T12:19:42.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405171218.0.0+llamacpp-release.b2910\n\n== Relevant log messages from source repo:\n\ncommit 27b040691cbe45314147c2745e891a38e9c048d4\nAuthor: fairydreaming <166155368+fairydreaming@users.noreply.github.com>\nDate: Fri May 17 13:24:38 2024 +0200\n\n llama : use n_embd_head_v when reshaping kqv (#7327)\n\n * llama : use n_embd_head_v instead of n_embd_head_k when reshaping kqv\n\n * llama : use n_embd_v_gqa and n_embd_head_v instead of n_embd_k_gqa and n_embd_head_k when making a view of cached value vectors.\n\n ---------\n\n Co-authored-by: Stanisław Szymczyk \n\ncommit 29c60d8cddcfd14fa8a6bf023a6c4eb8692c76ba\nAuthor: Johannes Gäßler \nDate: Fri May 17 09:59:57 2024 +0200\n\n tokenization: add warning for double BOS (#7332)\n\ncommit 359cbe3f46c90ce6f5151005e411b8fb74f8139e\nAuthor: Herman Semenov \nDate: Fri May 17 07:08:49 2024 +0000\n\n ggml-quants, llama : removed excess checks (#7274)\n\ncommit 934266c0e0b2aa9781fdba2deb112c161ff038a9\nAuthor: Justine Tunney \nDate: Fri May 17 02:58:52 2024 -0400\n\n ggml : rewrite silu and softmax for cpu (#7154)\n\n This change upstreams llamafile's vectorized expf() functions. This lets\n us compute softmax and silu more accurately than the short[65536] lookup\n table that GGML previously used to make this operation go faster. We can\n support aarch64 and sse2+ with the worst case rounding error of 2ulp. It\n makes make -j8 tests && ./tests/test-backend-ops -o SOFT_MAX -b CPU perf\n go 1.5x faster for SSE2+FMA, 1.9x faster for AVX2+FMA and 2.1x on AVX512","shortMessageHtmlLink":"[auto] Sync version 2405171218.0.0+llamacpp-release.b2910"}},{"before":"afa5b698a17de264d6715a8eada1711cedc7631f","after":"f963752739c2ea6aac0a23bc593610e4a1370ff6","ref":"refs/heads/main","pushedAt":"2024-05-16T12:20:35.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405161219.0.0+llamacpp-release.b2901\n\n== Relevant log messages from source repo:\n\ncommit 3b3963c55c8332e33533c44b2aa882b0e45f8292\nAuthor: Radoslav Gerganov \nDate: Wed May 15 15:29:07 2024 +0300\n\n rpc : add command line arg for specifying backend memory\n\n ref: #7293\n\ncommit 0350f5815218c483fb3026a86adc44a115481625\nAuthor: Herman Semenov \nDate: Thu May 16 06:14:24 2024 +0000\n\n grammar, json, llama: replace push on emplace if it possible (#7273)\n\ncommit 13ad16af1231ab2d245d35df3295bcfa23de1305\nAuthor: Max Krasnyansky \nDate: Wed May 15 19:47:36 2024 -0700\n\n Add support for properly optimized Windows ARM64 builds with LLVM and MSVC (#7191)\n\n * logging: add proper checks for clang to avoid errors and warnings with VA_ARGS\n\n * build: add CMake Presets and toolchian files for Windows ARM64\n\n * matmul-int8: enable matmul-int8 with MSVC and fix Clang warnings\n\n * ci: add support for optimized Windows ARM64 builds with MSVC and LLVM\n\n * matmul-int8: fixed typos in q8_0_q8_0 matmuls\n\n Co-authored-by: Georgi Gerganov \n\n * matmul-int8: remove unnecessary casts in q8_0_q8_0\n\n ---------\n\n Co-authored-by: Georgi Gerganov ","shortMessageHtmlLink":"[auto] Sync version 2405161219.0.0+llamacpp-release.b2901"}},{"before":"f0a7b0cb083fa01b8d6eeec07268ac7d96fa47ae","after":"afa5b698a17de264d6715a8eada1711cedc7631f","ref":"refs/heads/main","pushedAt":"2024-05-16T00:51:11.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405160049.0.0+llamacpp-release.b2894\n\n== Relevant log messages from source repo:\n\ncommit e1b40ac3b94824d761b5e26ea1bc5692706029d9\nAuthor: kunnis \nDate: Wed May 15 12:59:12 2024 -0500\n\n ggml : use dynamic thread scheduling for matrix multiplication (#6915)\n\n * Just reordering some structs.\n\n * Adding in the calls to mm_pause\n\n * Passing around the state\n\n * Renaming and moving a bunch of variables around.\n\n * Extracting the logic to it's own function.\n\n * Moving some variable definitions into the chunk function.\n\n * Moving some variables around\n\n * moving src1_cont inside\n\n * Moving row_size\n\n * adding the current_chunk\n\n * Reorg the code.\n\n * Formatting to match the orig patch\n\n * starting to setup the chunking variables\n\n * Starting the buildup of the loop\n\n * The yield shouldn't be necessary.\n\n * adding the looping structure based on the chunk configuration.\n\n * Add in the re-chunking code.\n\n * Making it much more likely to rechunk.\n\n * disable resizing if numa is enabled.\n\n * Updating comments with what we've learned.\n\n * Fix formatting\n\n * Couple more formatting fixes.\n\n * More style fixes.\n\n * Fix Warnings\n\n * Going with unused because there's conditional logic that needs it.\n\n * Update ggml.c\n\n * Update ggml.c\n\n ---------\n\ncommit dc020985b8755dd6aa93a2f002f43c3ede808cce\nAuthor: agray3 \nDate: Wed May 15 14:44:49 2024 +0100\n\n Avoid unnecessarily disabling CUDA graphs (#7302)\n\n As discussed in PR #6766, CUDA graphs were being disabled in the presence of long prompts.\n This fixes the issue by avoiding the consective update counter from incrementing unnecessarily\n for tokens in which cuda graphs are disabled due to batch size > 1.\n\ncommit 344f9126cc0d15891fde9472fe40b8572628ad7d\nAuthor: slaren \nDate: Wed May 15 15:08:48 2024 +0200\n\n ggml : tag ggml_tensor::backend as deprecated (#7290)\n\ncommit 9a17ab914b0aa7353389c656a3f2a0f086726868\nAuthor: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>\nDate: Wed May 15 13:26:30 2024 +0100\n\n Add missing \" (#7303)\n\ncommit 48aa8fd1f213a69b41569f809cc954f24dbc4366\nAuthor: John Balis \nDate: Wed May 15 03:52:33 2024 -0500\n\n ggml : add `ggml_upscale_ext` (ggml/814)\n\n * initial commit with CPU implementation of upscale to shape and test, cuda implementation next\n\n * experimental commit to see if dst shape is correct\n\n * test version\n\n * test\n\n * removed unnecessary params\n\n * refactor\n\n * fixed tests\n\n * ggml : metal impl + cleanup + sycl dev warnings\n\n * patched ggml_upscale cuda op to handle non-contiguous tensors, added test for non-contiguous behavior\n\n * metal : fix upsacle op to support nb00 + style\n\n ---------\n\n Co-authored-by: Georgi Gerganov ","shortMessageHtmlLink":"[auto] Sync version 2405160049.0.0+llamacpp-release.b2894"}},{"before":"4b1eefd41654676f077b6aeaac80ae86c0b62f32","after":"f0a7b0cb083fa01b8d6eeec07268ac7d96fa47ae","ref":"refs/heads/main","pushedAt":"2024-05-15T12:21:40.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405151221.0.0+llamacpp-release.b2885\n\n== Relevant log messages from source repo:\n\ncommit e8a7fd4fb06d82f663850c21fcf86c0fb98ad9b4\nAuthor: Georgi Gerganov \nDate: Tue May 14 19:09:30 2024 +0300\n\n metal : support FA without mask + add asserts (#7278)\n\n * ggml : fa without mask + add asserts\n\n ggml-ci\n\n * metal : support non-contiguous KV\n\n ggml-ci\n\ncommit f308ea705974dff62a1fe5367d776ad9d5109239\nAuthor: Georgi Gerganov \nDate: Mon May 13 11:01:07 2024 +0300\n\n metal : tune soft_max number of threads (whisper/0)\n\ncommit c3c88f296a72432edb697ac8026dbf2ec18f2b21\nAuthor: Georgi Gerganov \nDate: Sun May 12 20:36:31 2024 +0300\n\n ggml : try fix ppc64 (whisper/0)\n\ncommit 182adefcf36fc5f4263082ff032c0796fda65578\nAuthor: Przemysław Pawełczyk \nDate: Wed May 8 17:33:43 2024 +0200\n\n ggml : expose SSE3 and SSSE3 for MSVC when AVX is available (whisper/2128)\n\ncommit 0d26d8ccd8caebab75af697c0275f599075fdacf\nAuthor: Hong Bo PENG \nDate: Sun May 12 17:17:18 2024 +0800\n\n ggml : optimize for ppc64le using VSX intrinsics (ggml/784)\n\n * optimize for ppc64le using VSX intrinsics\n\n * 1. code clean up by removing comments about overflow concern.\n\n 2. fix typo in suffix of scaling.\n\n * Continue to fix typo in suffix of scaling for QK_K <> 256\n\n ---------\n\n Co-authored-by: Georgi Gerganov ","shortMessageHtmlLink":"[auto] Sync version 2405151221.0.0+llamacpp-release.b2885"}},{"before":"8bfa3c6e8db9e5a977a459a272ea3aa562ff5275","after":"4b1eefd41654676f077b6aeaac80ae86c0b62f32","ref":"refs/heads/main","pushedAt":"2024-05-14T18:14:05.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405141813.0.0+llamacpp-release.b2877\n\n== Relevant log messages from source repo:\n\ncommit 5e31828d3e35c76ecfee665bc23771a4bec1d130\nAuthor: Radoslav Gerganov \nDate: Tue May 14 14:27:19 2024 +0300\n\n ggml : add RPC backend (#6829)\n\n * ggml : add RPC backend\n\n The RPC backend proxies all operations to a remote server which runs a\n regular backend (CPU, CUDA, Metal, etc).\n\n * set TCP_NODELAY\n\n * add CI workflows\n\n * Address review comments\n\n * fix warning\n\n * implement llama_max_devices() for RPC\n\n * Address review comments\n\n * Address review comments\n\n * wrap sockfd into a struct\n\n * implement get_alignment and get_max_size\n\n * add get_device_memory\n\n * fix warning\n\n * win32 support\n\n * add README\n\n * readme : trim trailing whitespace\n\n * Address review comments\n\n * win32 fix\n\n * Address review comments\n\n * fix compile warnings on macos\n\ncommit 541600201e6480f54ae09e58d16b154d4b4b331d\nAuthor: slaren \nDate: Tue May 14 09:33:42 2024 +0200\n\n llama : disable pipeline parallelism with nkvo (#7265)","shortMessageHtmlLink":"[auto] Sync version 2405141813.0.0+llamacpp-release.b2877"}},{"before":"706336c3e009dcf32c40a89a159c6857d529e6c4","after":"8bfa3c6e8db9e5a977a459a272ea3aa562ff5275","ref":"refs/heads/main","pushedAt":"2024-05-14T12:20:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405141220.0.0+llamacpp-release.b2874\n\n== Relevant log messages from source repo:\n\ncommit e0f556186b6e1f2b7032a1479edf5e89e2b1bd86\nAuthor: Haggai Nuchi \nDate: Mon May 13 22:25:56 2024 -0700\n\n Add left recursion check: quit early instead of going into an infinite loop (#7083)\n\n * Add left recursion check: quit early instead of going into an infinite loop\n\n * Remove custom enum, rename left recursion check and move to \"grammar internal\" section, add handling for edge case where a leftmost nonterminal may be empty\n\n * Remove unnecessary declaration","shortMessageHtmlLink":"[auto] Sync version 2405141220.0.0+llamacpp-release.b2874"}},{"before":"af570835d801e759eeb7090937041713cde530a4","after":"706336c3e009dcf32c40a89a159c6857d529e6c4","ref":"refs/heads/main","pushedAt":"2024-05-14T00:49:53.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405140049.0.0+llamacpp-release.b2871\n\n== Relevant log messages from source repo:\n\ncommit 614d3b914e1c3e02596f869649eb4f1d3b68614d\nAuthor: Georgi Gerganov \nDate: Mon May 13 17:15:15 2024 +0300\n\n llama : less KV padding when FA is off (#7257)\n\n ggml-ci\n\ncommit 948f4ec7c5bff92b18e63303f2b2d1645bccd943\nAuthor: Neo Zhang <14088817+arthw@users.noreply.github.com>\nDate: Mon May 13 18:11:26 2024 +0800\n\n [SYCL] rm wait() (#7233)","shortMessageHtmlLink":"[auto] Sync version 2405140049.0.0+llamacpp-release.b2871"}},{"before":"2c3c9a5058e29ed60fa222cb4321b0a00646a579","after":"af570835d801e759eeb7090937041713cde530a4","ref":"refs/heads/main","pushedAt":"2024-05-13T18:12:43.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405131812.0.0+llamacpp-release.b2867\n\n== Relevant log messages from source repo:\n\ncommit 9aa672490c848e45eaa704a554e0f1f6df995fc8\nAuthor: Joan Fontanals \nDate: Mon May 13 10:35:14 2024 +0200\n\n llama : rename jina tokenizers to v2 (#7249)\n\n * refactor: rename jina tokenizers to v2\n\n * refactor: keep refactoring non-breaking","shortMessageHtmlLink":"[auto] Sync version 2405131812.0.0+llamacpp-release.b2867"}},{"before":"29c9e7bfd7d5347726f9435ac0de13f5ae7a6eee","after":"2c3c9a5058e29ed60fa222cb4321b0a00646a579","ref":"refs/heads/main","pushedAt":"2024-05-13T00:51:29.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405130050.0.0+llamacpp-release.b2862\n\n== Relevant log messages from source repo:\n\ncommit dc685be46622a8fabfd57cfa804237c8f15679b8\nAuthor: Johannes Gäßler \nDate: Sun May 12 19:40:45 2024 +0200\n\n CUDA: add FP32 FlashAttention vector kernel (#7188)\n\n * CUDA: add FP32 FlashAttention vector kernel\n\n * fixup! CUDA: add FP32 FlashAttention vector kernel\n\n * fixup! fixup! CUDA: add FP32 FlashAttention vector kernel\n\n * fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel","shortMessageHtmlLink":"[auto] Sync version 2405130050.0.0+llamacpp-release.b2862"}},{"before":"b8d5fe6b14e828c73f560f53c823f8520ab36177","after":"29c9e7bfd7d5347726f9435ac0de13f5ae7a6eee","ref":"refs/heads/main","pushedAt":"2024-05-12T18:15:02.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405121814.0.0+llamacpp-release.b2861\n\n== Relevant log messages from source repo:\n\ncommit 6f1b63606fc68a09d62d1d74dbd156c35219026d\nAuthor: Georgi Gerganov \nDate: Sun May 12 18:30:23 2024 +0300\n\n cmake : fix version cmp (#7227)","shortMessageHtmlLink":"[auto] Sync version 2405121814.0.0+llamacpp-release.b2861"}},{"before":"f5881353113e4fe4c2d25d469f30a16e2e2f7534","after":"b8d5fe6b14e828c73f560f53c823f8520ab36177","ref":"refs/heads/main","pushedAt":"2024-05-12T06:14:36.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405120614.0.0+llamacpp-release.b2860\n\n== Relevant log messages from source repo:\n\ncommit b228aba91ac2cd9eb90e9d423ba1d0d20e0117e2\nAuthor: slaren \nDate: Sun May 12 02:29:33 2024 +0200\n\n remove convert-lora-to-ggml.py (#7204)","shortMessageHtmlLink":"[auto] Sync version 2405120614.0.0+llamacpp-release.b2860"}},{"before":"fa8aed85cec5c4e55edf01be877c140a76cbf3b6","after":"f5881353113e4fe4c2d25d469f30a16e2e2f7534","ref":"refs/heads/main","pushedAt":"2024-05-12T00:53:41.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405120053.0.0+llamacpp-release.b2859\n\n== Relevant log messages from source repo:\n\ncommit 7bd4ffb78062587e4012a1c24186223f09b1bc70\nAuthor: Georgi Gerganov \nDate: Sat May 11 21:36:20 2024 +0300\n\n metal : fix warnings (skipme) (#0)\n\ncommit 6aeff24f8b91e145e92d17ec7ce3adc4ef60b8e9\nAuthor: Georgi Gerganov \nDate: Sat May 11 16:57:53 2024 +0300\n\n metal : fix indent (ggml/0)\n\ncommit 325756d28df7d018a7bac424e1b3bc8acb4ecf07\nAuthor: Georgi Gerganov \nDate: Sat May 11 16:25:50 2024 +0300\n\n ggml : resolve merge (ggml/0)\n\n ggml-ci","shortMessageHtmlLink":"[auto] Sync version 2405120053.0.0+llamacpp-release.b2859"}},{"before":"825ca044592eeeca4eea627d175da39722d25147","after":"fa8aed85cec5c4e55edf01be877c140a76cbf3b6","ref":"refs/heads/main","pushedAt":"2024-05-11T18:14:05.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405111813.0.0+llamacpp-release.b2854\n\n== Relevant log messages from source repo:\n\ncommit f5ef34e428f3886544590ecb2d532e4d333c114c\nAuthor: Justina Cho \nDate: Wed May 1 14:44:26 2024 -0700\n\n feat: implemented sigmoid function (ggml/806)\n\n * added sigmoid function\n\n * implemented metal kernel for sigmoid\n\n * implemented cuda kernel for sigmoid\n\n * added sigmoid unary op and incremented count\n\ncommit ef0d5e3ec9f99003af3ff326384816c02850ea3f\nAuthor: Borislav Stanimirov \nDate: Thu Apr 25 17:24:07 2024 +0300\n\n build: fix and ignore msvc warnings (ggml/805)\n\ncommit f99e1e456eaf69cc38c1982a2693ce41c0f897ef\nAuthor: Haoxiang Fei \nDate: Sat May 11 16:12:06 2024 +0800\n\n llama : lookup word in vocab before doing BPE merges (#7193)\n\n * fix: llama-3 ignore_merges\n\n * test: add test for llama-3 bpe ignore_merges\n\n * fix: set ignore_merges only for llama-3\n\n * fix: test-tokenizer-1-bpe --ingore-merges detection\n\n * fix: copy to fix fallthrough\n\n * fix: change ignore_merges to bool\n\n * fix: add ignore merges tests to cmake\n\n * llama : alternative merge ignore logic\n\n ---------\n\n Co-authored-by: Haoxiang Fei \n Co-authored-by: Georgi Gerganov ","shortMessageHtmlLink":"[auto] Sync version 2405111813.0.0+llamacpp-release.b2854"}},{"before":"0a885b81253a5c5b6af8249e99fcc0aee8d4fe23","after":"825ca044592eeeca4eea627d175da39722d25147","ref":"refs/heads/main","pushedAt":"2024-05-11T12:18:17.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405111217.0.0+llamacpp-release.b2846\n\n== Relevant log messages from source repo:\n\ncommit b83cc3f5b303ff30c52874b2d5864dc6385ebf9f\nAuthor: Joan Fontanals \nDate: Sat May 11 09:46:09 2024 +0200\n\n llama : add Jina Embeddings architecture (#6826)\n\n * feat: first things to do\n\n * feat: create tensors for Jina architecture\n\n * fix: use other tensors\n\n * feat: embedding gets results\n\n * fix: fix usage of ALIBI\n\n * fix: clean prints\n\n * fix: do some cleanup unused vars\n\n * fix: revert changes to Makefile and CMakeLists\n\n * fix: revert some changes\n\n * fix: fix small detail\n\n * fix: fix convert formatting\n\n * fix: fix linting and editor\n\n * feat: set proper vocab settings\n\n * fix: JinaBertForMaskedLM registration\n\n * feat: support q_normalization and k_normalization in Jina arch\n\n * feat: handle gpt2 tokenizer with Jina architecture\n\n * feat: example comments in embedding\n\n * feat: rename Jina Bert to Jina Bert V2\n\n * fix: add some changes as per review\n\n * feat: proper KQ_pos for Jina embeddings\n\n * feat: add capacity to load models ES and DE for Spanish\n\n * llama : fix pre-tokenizers\n\n * ggml : full ALiBi support\n\n * ggml : update ggml_soft_max_ext() CUDA, SYCL\n\n * ggml : ggml_flash_attn_ext() support ALiBi (CPU)\n\n * ggml : ggml_flash_attn_ext() support ALiBi (Metal)\n\n * ggml : fix warning\n\n * ggml : ggml_flash_attn_ext() support ALiBi (CUDA)\n\n ggml-ci\n\n * minor : clean-up\n\n * embedding : add warning about missing SEP\n\n ---------\n\n Co-authored-by: Georgi Gerganov \n\ncommit 9cb317f77e53067f7a138cc89ef7657148eae8e6\nAuthor: Georgi Gerganov \nDate: Sat May 11 10:32:41 2024 +0300\n\n ggml : full ALiBi support (#7192)\n\n * ggml : full ALiBi support\n\n * ggml : update ggml_soft_max_ext() CUDA, SYCL\n\n * ggml : ggml_flash_attn_ext() support ALiBi (CPU)\n\n * ggml : ggml_flash_attn_ext() support ALiBi (Metal)\n\n * ggml : fix warning\n\n * ggml : ggml_flash_attn_ext() support ALiBi (CUDA)\n\n ggml-ci\n\n * ggml : fix assert message\n\n * vulkan : add dev notes\n\n * ggml : require mask when using ALiBi\n\n ggml-ci\n\n * convert : fix convert for refact models","shortMessageHtmlLink":"[auto] Sync version 2405111217.0.0+llamacpp-release.b2846"}},{"before":"9c7005494dbc30aaa9e2248c24e00e03b6d5a65c","after":"0a885b81253a5c5b6af8249e99fcc0aee8d4fe23","ref":"refs/heads/main","pushedAt":"2024-05-11T00:49:02.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405110047.0.0+llamacpp-release.b2843\n\n== Relevant log messages from source repo:\n\ncommit 18e437665ce626dddbd79119aa7498493e7cb13b\nAuthor: Georgi Gerganov \nDate: Fri May 10 18:20:10 2024 +0300\n\n metal : fix flash attention kernel requirements (#7169)\n\n * metal : fix flash attention kernel requirements\n\n ggml-ci\n\n * metal : fix ggml_metal_supports_op\n\n ggml-ci","shortMessageHtmlLink":"[auto] Sync version 2405110047.0.0+llamacpp-release.b2843"}},{"before":"e350ec4303df0657f52230b0933d13dc5b2f62be","after":"9c7005494dbc30aaa9e2248c24e00e03b6d5a65c","ref":"refs/heads/main","pushedAt":"2024-05-10T18:14:09.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405101813.0.0+llamacpp-release.b2840\n\n== Relevant log messages from source repo:\n\ncommit 25c6e82e7a1ad25a42b0894e87d9b5c557409516\nAuthor: slaren \nDate: Fri May 10 14:28:01 2024 +0200\n\n llama : use n_vocab to differentiate between mistral 7B and llama3 8B (#7200)","shortMessageHtmlLink":"[auto] Sync version 2405101813.0.0+llamacpp-release.b2840"}},{"before":"317fa6b702a6ecaa23d7b848e50322b29f5e22ef","after":"e350ec4303df0657f52230b0933d13dc5b2f62be","ref":"refs/heads/main","pushedAt":"2024-05-10T06:14:53.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405100614.0.0+llamacpp-release.b2836\n\n== Relevant log messages from source repo:\n\ncommit 8c570c9496212073079476651c7517c02581101f\nAuthor: Ouadie EL FAROUKI \nDate: Fri May 10 01:32:15 2024 +0100\n\n Minor arithmetic improvement to mmvq wrapper kernel (#7172)","shortMessageHtmlLink":"[auto] Sync version 2405100614.0.0+llamacpp-release.b2836"}},{"before":"01956f1f0551cd158e239eaa02ae20a84d7765b3","after":"317fa6b702a6ecaa23d7b848e50322b29f5e22ef","ref":"refs/heads/main","pushedAt":"2024-05-10T00:48:59.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"[auto] Sync version 2405100048.0.0+llamacpp-release.b2835\n\n== Relevant log messages from source repo:\n\ncommit befddd0f15de6efb15d7e7f5b527dfb671f4196f\nAuthor: 0cc4m \nDate: Thu May 9 20:39:54 2024 +0200\n\n Vulkan Bugfixes and Improvements (#7084)\n\n * Modify mat mat mul shader for mul_mat_id, modify mat vec mul shaders for single call batch operation\n\n * Further work towards MoE, disabled for now\n\n * Disable MoE code (not ready yet), fix a number of bugs in shaders and Vulkan code\n\n * Add softmax with f16 mask and pos buffer support\n\n * Disable mul_mat_id shaders for now\n\n * Fix flake8\n\n * Fix validation errors caused by empty buffers on larger batch sizes\n\ncommit 43248e559472556f368988575d9fba906b3eb139\nAuthor: jaime-m-p <167997752+jaime-m-p@users.noreply.github.com>\nDate: Thu May 9 15:30:44 2024 +0200\n\n llama3 custom regex split (#6965)\n\n * merged the changes from deepseeker models to main branch\n\n * Moved regex patterns to unicode.cpp and updated unicode.h\n\n * Moved header files\n\n * Resolved issues\n\n * added and refactored unicode_regex_split and related functions\n\n * Updated/merged the deepseek coder pr\n\n * Refactored code\n\n * Adding unicode regex mappings\n\n * Adding unicode regex function\n\n * Added needed functionality, testing remains\n\n * Fixed issues\n\n * Fixed issue with gpt2 regex custom preprocessor\n\n * unicode : fix? unicode_wstring_to_utf8\n\n * lint : fix whitespaces\n\n * tests : add tokenizer tests for numbers\n\n * unicode : remove redundant headers\n\n * tests : remove and rename tokenizer test scripts\n\n * tests : add sample usage\n\n * gguf-py : reader prints warnings on duplicate keys\n\n * llama : towards llama3 tokenization support (wip)\n\n * unicode : shot in the dark to fix tests on Windows\n\n * unicode : first try custom implementations\n\n * convert : add \"tokenizer.ggml.pre\" GGUF KV (wip)\n\n * llama : use new pre-tokenizer type\n\n * convert : fix pre-tokenizer type writing\n\n * lint : fix\n\n * make : add test-tokenizer-0-llama-v3\n\n * wip\n\n * models : add llama v3 vocab file\n\n * llama : adapt punctuation regex + add llama 3 regex\n\n * minor\n\n * unicode : set bomb\n\n * unicode : set bomb\n\n * unicode : always use std::wregex\n\n * unicode : support \\p{N}, \\p{L} and \\p{P} natively\n\n * unicode : try fix windows\n\n * unicode : category support via std::regex\n\n * unicode : clean-up\n\n * unicode : simplify\n\n * llama3 custom regex split\n\n * convert : add convert-hf-to-gguf-update.py\n\n ggml-ci\n\n * lint : update\n\n * convert : add falcon\n\n ggml-ci\n\n * unicode : normalize signatures\n\n * lint : fix\n\n * lint : fix\n\n * convert : remove unused functions\n\n * convert : add comments\n\n * convert : exercise contractions\n\n ggml-ci\n\n * Using char32_t for codepoints\n\n * lint : fix\n\n * already exists unicode_tolower()\n\n * Typing\n\n * Restore BOM\n\n * cmake : refactor test targets\n\n * tests : refactor vocab tests\n\n ggml-ci\n\n * tests : add more vocabs and tests\n\n ggml-ci\n\n * unicode : cleanup\n\n * scripts : ignore new update script in check-requirements.sh\n\n * Fix merge\n\n * models : add phi-3, mpt, gpt-2, starcoder\n\n * tests : disable obsolete\n\n ggml-ci\n\n * tests : use faster bpe test\n\n ggml-ci\n\n * llama : more prominent warning for old BPE models\n\n * tests : disable test-tokenizer-1-bpe due to slowness\n\n ggml-ci\n\n * Move unused variable value\n\n * GPT2 custom regex split\n\n * Add alternative regex for custom aplit llama3\n\n Co-authored-by: Georgi Gerganov \n\n * Style\n\n * Add bruteforce random tests for token encoding\n\n * wip: fixing unicode codepoint ranges\n\n * Fix merge\n\n * Unicode tables: separator, lowercase, uppercase and whitespace\n\n * llama3 custom regex split: fix \\s\n\n * Restore BOM\n\n * Style\n\n * wip: generate NDF table\n\n * Ignore special tokens for testing\n\n * Clean gen-unicode-data.py\n\n * Refactor random tokenizer test\n\n * lint : fix\n\n * tests : add fail test for llama-bpe\n\n ---------\n\n Co-authored-by: Jaggzh \n Co-authored-by: Kazim Abrar Mahi \n Co-authored-by: Georgi Gerganov \n Co-authored-by: jaime-m-p <>\n\ncommit a743d76a01f23038b2c85af1e9048ee836767b44\nAuthor: Johannes Gäßler \nDate: Thu May 9 14:32:02 2024 +0200\n\n CUDA: generalize FP16 fattn vec kernel (#7061)\n\n * CUDA: generalize FP16 fattn vec kernel\n\n * disable unsupported head sizes for AMD in test\n\n * try AMD fix\n\n * fix batch size 2-8\n\n * partially revert changes","shortMessageHtmlLink":"[auto] Sync version 2405100048.0.0+llamacpp-release.b2835"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAET08f_QA","startCursor":null,"endCursor":null}},"title":"Activity · KerfuffleV2/ggml-sys-bleedingedge"}