starcoder2 : change rope type to neox #1

ggerganov · 2024-03-01T13:14:25Z

I ran some anecdotal tests, and this change seems to improve the results. I will cross-check against the reference implementation to confirm that it is correct.
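For readers unfamiliar with the distinction: rotary position embeddings (RoPE) rotate pairs of dimensions, and the "neox" layout differs from the default one only in which dimensions are paired. The sketch below is a minimal Python illustration, assuming the commonly described definitions (adjacent pairs for the default layout, split-half pairs for the NeoX layout). The function names are hypothetical, and this is not the llama.cpp implementation.

```python
import math

def rope_angles(pos, dim, base=10000.0):
    # One rotation angle per pair of dimensions: pos * base^(-2i/dim).
    return [pos * base ** (-2.0 * i / dim) for i in range(dim // 2)]

def rotate_pair(a, b, theta):
    # Standard 2D rotation of the vector (a, b) by angle theta.
    c, s = math.cos(theta), math.sin(theta)
    return a * c - b * s, a * s + b * c

def rope_interleaved(x, pos, base=10000.0):
    # Default layout (assumed): pairs are adjacent, (x[0], x[1]), (x[2], x[3]), ...
    out = list(x)
    for i, theta in enumerate(rope_angles(pos, len(x), base)):
        out[2 * i], out[2 * i + 1] = rotate_pair(x[2 * i], x[2 * i + 1], theta)
    return out

def rope_neox(x, pos, base=10000.0):
    # NeoX layout (assumed): pairs span the two halves, (x[i], x[i + dim/2]).
    half = len(x) // 2
    out = list(x)
    for i, theta in enumerate(rope_angles(pos, len(x), base)):
        out[i], out[i + half] = rotate_pair(x[i], x[i + half], theta)
    return out

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print("interleaved:", rope_interleaved(x, pos=1))
    print("neox:       ", rope_neox(x, pos=1))
```

Both variants apply the same rotations and preserve the vector norm. They differ only by a fixed permutation of the dimensions, so applying the wrong one to a model trained with the other still produces output, just of lower quality, which is consistent with the anecdotal improvement reported here.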

make -j && ./main -m models/starcoder2-3b/ggml-model-f16.gguf -p "#python code for efficient implemetation of two_sum\ndef two_sum(arr, target_sum):\n" -n 256 -e --temp 0 -ngl 99 --verbose-prompt

system_info: n_threads = 16 / 24 | AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | 

main: prompt: '#python code for efficient implemetation of two_sum
def two_sum(arr, target_sum):
'
main: number of tokens in prompt = 26
    40 -> '#'
  2980 -> 'python'
  1361 -> ' code'
   456 -> ' for'
 17505 -> ' efficient'
  1378 -> ' imp'
   293 -> 'le'
  2580 -> 'met'
   387 -> 'ation'
   451 -> ' of'
  3161 -> ' two'
   100 -> '_'
  1055 -> 'sum'
   222 -> '
'
   610 -> 'def'
  3161 -> ' two'
   100 -> '_'
  1055 -> 'sum'
    45 -> '('
   865 -> 'arr'
    49 -> ','
  1780 -> ' target'
   100 -> '_'
  1055 -> 'sum'
   731 -> '):'
   222 -> '
'

sampling: 
	repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
	top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.000
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature 
generate: n_ctx = 512, n_batch = 512, n_predict = 256, n_keep = 0


#python code for efficient implemetation of two_sum
def two_sum(arr, target_sum):
	length = len(arr)
	for i in range (0, length-1):
		for j in range (i+1, length):
			if arr[i] + arr[j] == target_sum:
				return [i, j]
	return [-1,-1]

#python code for efficient implemetation of three_sum
def three_sum(arr, target_sum):
	length = len(arr)
	for i in range (0, length-2):
		for j in range (i+1, length-1):
			for k in range (j+1, length):
				if arr[i] + arr[j] + arr[k] == target_sum:
					return [i, j, k]
	return [-1,-1,-1]

#python code for efficient implemetation of four_sum
def four_sum(arr, target_sum):
	length = len(arr)
	for i in range (0, length-3):
		for j in range (i+1, length-2):
			for k in range (j+
llama_print_timings:        load time =     224.24 ms
llama_print_timings:      sample time =      33.05 ms /   256 runs   (    0.13 ms per token,  7745.84 tokens per second)
llama_print_timings: prompt eval time =      41.90 ms /    26 tokens (    1.61 ms per token,   620.51 tokens per second)
llama_print_timings:        eval time =    4009.60 ms /   255 runs   (   15.72 ms per token,    63.60 tokens per second)
llama_print_timings:       total time =    4118.97 ms /   281 tokens

pacman100 · 2024-03-01T13:35:53Z

Thank you @ggerganov! 😄

llama : change starcoder2 rope type

9862d59

pacman100 merged commit 15f233b into pacman100:smangrul/add-starcoder2-support Mar 1, 2024
