starcoder2 : change rope type to neox #1

ggerganov commented Mar 1, 2024

Did some anecdotal tests, and switching the StarCoder2 rope type to NeoX seems to improve the results (example run below). Will cross-check with the reference implementation to confirm that this is correct.
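For context on what "rope type" means here: as far as I understand ggml, the "normal" RoPE mode rotates adjacent dimension pairs (x[2i], x[2i+1]), while the NeoX mode pairs dimension i with dimension i + n_dims/2. The plain-Python sketch below illustrates that assumption for a single head with no frequency scaling. The function names are hypothetical and are not llama.cpp API. It shows that both modes apply the same rotations and differ only in which dimensions get paired, so picking the mode that does not match the checkpoint's layout mixes the wrong dimensions together.

```python
import math


def rope_normal(x, pos, base=10000.0):
    """Hypothetical helper: rotate adjacent pairs (x[2i], x[2i+1]) by pos * theta_i."""
    d = len(x)
    out = list(x)
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)  # same frequency schedule in both modes
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def rope_neox(x, pos, base=10000.0):
    """Hypothetical helper: rotate half-split pairs (x[i], x[i + d/2]) by pos * theta_i."""
    d = len(x)
    half = d // 2
    out = list(x)
    for i in range(half):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out


def interleave_halves(x):
    """Reorder [x0..x(h-1), xh..x(2h-1)] into [x0, xh, x1, x(h+1), ...]."""
    assert len(x) % 2 == 0, "RoPE needs an even number of dimensions"
    half = len(x) // 2
    out = []
    for i in range(half):
        out += [x[i], x[i + half]]
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# NeoX RoPE equals normal RoPE applied to a reordered vector (and reordered back).
lhs = rope_normal(interleave_halves(x), 7)
rhs = interleave_halves(rope_neox(x, 7))
print(all(math.isclose(u, v) for u, v in zip(lhs, rhs)))
```

Because the two modes agree only up to that reordering of dimensions, the rope type has to match how the Q/K weights were laid out when the model was trained, which is why it is a per-model setting that this PR changes for starcoder2.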

make -j && ./main -m models/starcoder2-3b/ggml-model-f16.gguf -p "#python code for efficient implemetation of two_sum\ndef two_sum(arr, target_sum):\n" -n 256 -e --temp 0 -ngl 99 --verbose-prompt
system_info: n_threads = 16 / 24 | AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | 

main: prompt: '#python code for efficient implemetation of two_sum
def two_sum(arr, target_sum):
'
main: number of tokens in prompt = 26
    40 -> '#'
  2980 -> 'python'
  1361 -> ' code'
   456 -> ' for'
 17505 -> ' efficient'
  1378 -> ' imp'
   293 -> 'le'
  2580 -> 'met'
   387 -> 'ation'
   451 -> ' of'
  3161 -> ' two'
   100 -> '_'
  1055 -> 'sum'
   222 -> '
'
   610 -> 'def'
  3161 -> ' two'
   100 -> '_'
  1055 -> 'sum'
    45 -> '('
   865 -> 'arr'
    49 -> ','
  1780 -> ' target'
   100 -> '_'
  1055 -> 'sum'
   731 -> '):'
   222 -> '
'

sampling: 
	repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
	top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.000
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature 
generate: n_ctx = 512, n_batch = 512, n_predict = 256, n_keep = 0


#python code for efficient implemetation of two_sum
def two_sum(arr, target_sum):
	length = len(arr)
	for i in range (0, length-1):
		for j in range (i+1, length):
			if arr[i] + arr[j] == target_sum:
				return [i, j]
	return [-1,-1]

#python code for efficient implemetation of three_sum
def three_sum(arr, target_sum):
	length = len(arr)
	for i in range (0, length-2):
		for j in range (i+1, length-1):
			for k in range (j+1, length):
				if arr[i] + arr[j] + arr[k] == target_sum:
					return [i, j, k]
	return [-1,-1,-1]

#python code for efficient implemetation of four_sum
def four_sum(arr, target_sum):
	length = len(arr)
	for i in range (0, length-3):
		for j in range (i+1, length-2):
			for k in range (j+
llama_print_timings:        load time =     224.24 ms
llama_print_timings:      sample time =      33.05 ms /   256 runs   (    0.13 ms per token,  7745.84 tokens per second)
llama_print_timings: prompt eval time =      41.90 ms /    26 tokens (    1.61 ms per token,   620.51 tokens per second)
llama_print_timings:        eval time =    4009.60 ms /   255 runs   (   15.72 ms per token,    63.60 tokens per second)
llama_print_timings:       total time =    4118.97 ms /   281 tokens
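The per-token figures in the timing lines follow from dividing each printed total by its count, and the 281 tokens on the total line are the 26 prompt tokens plus the 255 eval runs. A small sketch that recomputes them from the totals above (the helper name is hypothetical); the last digit can differ slightly from the log because the printed totals are themselves rounded.

```python
def per_token_stats(total_ms: float, n: int) -> tuple[float, float]:
    """Return (ms per token, tokens per second) for n items processed in total_ms."""
    return total_ms / n, 1000.0 * n / total_ms


# Totals copied from the llama_print_timings lines above.
timings = {
    "sample": (33.05, 256),
    "prompt eval": (41.90, 26),
    "eval": (4009.60, 255),
}

for name, (ms, n) in timings.items():
    ms_per_tok, tok_per_s = per_token_stats(ms, n)
    print(f"{name:12s} {ms_per_tok:6.2f} ms/token {tok_per_s:9.2f} tokens/s")
```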

pacman100 merged commit 15f233b into pacman100:smangrul/add-starcoder2-support Mar 1, 2024

pacman100 (Owner)

Thank you @ggerganov! 😄
