
taking long time to give response (around 2 min) #1896

Open
mbbutt opened this issue Nov 7, 2024 · 5 comments

mbbutt commented Nov 7, 2024

Hello

I am running on the following machine:

CPU: 12th Gen Intel(R) Core(TM) i7-12700
RAM: 32GB, speed: 4400MT/s
GPU: NVIDIA RTX A2000 12GB

The model is:
llama-2-7b-chat.Q6_K.gguf

It takes around 2 minutes to start giving a response.
Is that reasonable, or should it be faster?
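For scale, CPU-only token generation is roughly memory-bandwidth bound, since every generated token streams the whole quantized model from RAM. A back-of-envelope sketch (the model size is from the loader log; the bandwidth figure is an assumption, not a measurement):

```python
# Back-of-envelope estimate of CPU-only generation speed for this model.
# Model size comes from the loader log (5.15 GiB); the effective memory
# bandwidth is an assumed figure for dual-channel DDR5, not a measurement.
model_size_gb = 5.15 * 1.074   # GiB -> GB (approximate factor)
mem_bw_gbs = 60.0              # assumed effective bandwidth, GB/s
tok_per_s = mem_bw_gbs / model_size_gb
print(round(tok_per_s, 1))     # roughly 10.8 tokens/s generation
```

Prompt processing (the delay before the first token appears) is compute-bound rather than bandwidth-bound and is far slower on CPU than GPU, which would fit a time-to-first-token measured in minutes.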

The .bat command used to start the bot:

"C:\Users\Public\pyenv-win\pyenv-win\bin\.h2o\Scripts\python.exe"^
 "generate.py"^
 --share=False ^
 --auth=[('jon','password')] ^
 --auth_access=closed ^
 --gradio_offline_level=1 ^
 --base_model="llama" ^
 --prompt_type=llama2 ^
 --model_path_llama=C:\Users\Public\git\h2ogpt\llama-2-7b-chat.Q6_K.gguf^
 --score_model=None ^
 --langchain_mode="LLLM" ^
 --user_path=user_path ^
 --load_4bit=True ^
 --llamacpp_dict="{'n_gpu_layers':5}"
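Note that --llamacpp_dict="{'n_gpu_layers':5}" offloads only 5 of the model's 32 transformer layers; the rest stay on the CPU. A quick sketch of what that covers (layer count and model size taken from the loader log further down):

```python
# Sketch: how much of the 32-layer, 5.15 GiB Q6_K model is offloaded
# when n_gpu_layers=5 (layer count and size taken from the loader log).
n_layers_total = 32
n_gpu_layers = 5
model_size_gib = 5.15
offloaded = model_size_gib * n_gpu_layers / n_layers_total
print(round(offloaded, 2), "GiB on GPU,",
      round(model_size_gib - offloaded, 2), "GiB left on CPU")
```

On a 12 GB card the whole model would normally fit (e.g. an n_gpu_layers value at least the layer count, commonly 33 for this model to include the output layer), but per the log below the CUDA build's llama.dll fails to load, so this setting currently has no effect at all.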

While idle:
7 GB of GPU memory in use (unchanged while running a query)
24.4 GB of RAM in use (unchanged while running a query)
CPU utilization stays at 2 to 3%

While running a query, CPU utilization goes close to 100%
and GPU utilization stays at 1 to 2%.

Again, it takes around 2 minutes to start giving a response.

It seems the GPU is not being utilized at all.
Could you please point out what I am doing wrong here?
I want to get faster responses.

The CUDA version is:

C:\Windows\System32>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Below is my pip list:

Package                                  Version
---------------------------------------- ---------------
absl-py                                  2.1.0
accelerate                               0.32.1
aiofiles                                 23.2.1
aiohappyeyeballs                         2.4.3
aiohttp                                  3.10.9
aiosignal                                1.3.1
altair                                   5.4.1
annotated-types                          0.7.0
anthropic                                0.8.1
antlr4-python3-runtime                   4.9.3
anyio                                    4.6.0
appdirs                                  1.4.4
APScheduler                              3.10.4
argcomplete                              3.5.1
arxiv                                    1.4.8
asgiref                                  3.8.1
async-timeout                            4.0.3
attributedict                            0.3.0
attrs                                    24.2.0
audioread                                3.0.1
Authlib                                  1.3.1
auto-gptq                                0.6.0
autoawq                                  0.1.8+cu118
autoawq_kernels                          0.0.3+cu118
babel                                    2.16.0
backoff                                  2.2.1
backports.tarfile                        1.2.0
bcrypt                                   4.2.0
beautifulsoup4                           4.12.3
bioc                                     2.1
bitsandbytes                             0.41.1
blessings                                1.7
boto3                                    1.35.35
botocore                                 1.35.35
Brotli                                   1.1.0
bs4                                      0.0.2
build                                    1.2.2.post1
cachetools                               5.5.0
certifi                                  2024.8.30
cffi                                     1.17.1
chardet                                  5.2.0
charset-normalizer                       3.3.2
chroma-bullet                            2.2.0
chroma-hnswlib                           0.7.3
chroma-migrate                           0.0.7
chromadb                                 0.4.23
chromamigdb                              0.3.26
click                                    8.1.7
clickhouse-connect                       0.6.6
codecov                                  2.1.13
colorama                                 0.4.6
coloredlogs                              15.0.1
colour-runner                            0.1.1
contourpy                                1.3.0
coverage                                 7.6.1
cryptography                             43.0.1
cssselect2                               0.7.0
cutlet                                   0.3.0
cycler                                   0.12.1
dacite                                   1.7.0
dataclasses-json                         0.6.7
DataProperty                             1.0.1
datasets                                 2.16.1
dateparser                               1.1.8
decorator                                5.1.1
deepdiff                                 8.0.1
defusedxml                               0.7.1
Deprecated                               1.2.14
diffusers                                0.24.0
dill                                     0.3.7
diskcache                                5.6.3
distlib                                  0.3.8
distro                                   1.9.0
dnspython                                2.7.0
docopt                                   0.6.2
docutils                                 0.20.1
duckdb                                   0.7.1
duckduckgo_search                        6.3.0
durationpy                               0.9
effdet                                   0.4.1
einops                                   0.8.0
emoji                                    2.14.0
et-xmlfile                               1.1.0
eval_type_backport                       0.2.0
evaluate                                 0.4.0
exceptiongroup                           1.2.2
execnet                                  2.1.1
exllama                                  0.0.18+cu118
fastapi                                  0.115.0
feedparser                               6.0.11
ffmpeg                                   1.4
ffmpy                                    0.4.0
fiftyone                                 1.0.0
fiftyone-brain                           0.17.0
fiftyone_db                              1.1.6
filelock                                 3.16.1
filetype                                 1.2.0
fire                                     0.5.0
flatbuffers                              24.3.25
fonttools                                4.54.1
frozenlist                               1.4.1
fsspec                                   2023.10.0
ftfy                                     6.2.3
fugashi                                  1.3.2
future                                   1.0.0
g2pkk                                    0.1.2
gekko                                    1.2.1
glob2                                    0.7
google-ai-generativelanguage             0.4.0
google-api-core                          2.20.0
google-auth                              2.35.0
google-generativeai                      0.3.2
google_search_results                    2.4.2
googleapis-common-protos                 1.65.0
gpt4all                                  1.0.5
gradio                                   3.50.2
gradio_client                            0.6.1
gradio_pdf                               0.0.15
gradio_tools                             0.0.9
graphql-core                             3.2.4
greenlet                                 3.0.3
grpcio                                   1.66.2
grpcio-health-checking                   1.62.3
grpcio-status                            1.62.3
grpcio-tools                             1.62.3
gruut                                    2.2.3
gruut-ipa                                0.13.0
gruut-lang-de                            2.0.1
gruut-lang-en                            2.0.1
gruut-lang-es                            2.0.1
gruut_lang_fr                            2.0.2
h11                                      0.14.0
h2                                       4.1.0
h5py                                     3.12.1
hf_transfer                              0.1.8
hnswlib                                  0.8.0
hnswmiglib                               0.7.0
hpack                                    4.0.0
html2text                                2024.2.26
html5lib                                 1.1
httpcore                                 1.0.6
httptools                                0.6.1
httpx                                    0.27.0
huggingface-hub                          0.25.1
humanfriendly                            10.0
humanize                                 4.11.0
Hypercorn                                0.17.3
hyperframe                               6.0.1
idna                                     3.10
imageio                                  2.35.1
importlib_metadata                       8.4.0
importlib_resources                      6.4.5
imutils                                  0.5.4
inflate64                                1.0.0
iniconfig                                2.0.0
inspecta                                 0.1.3
InstructorEmbedding                      1.0.1
intervaltree                             3.1.0
iopath                                   0.1.10
jaconv                                   0.4.0
jamo                                     0.4.1
jaraco.context                           6.0.1
jieba                                    0.42.1
Jinja2                                   3.1.4
jiter                                    0.6.1
jmespath                                 1.0.1
joblib                                   1.4.2
jsonlines                                1.2.0
jsonpatch                                1.33
jsonpath-python                          1.0.6
jsonpointer                              3.0.0
jsonschema                               4.23.0
jsonschema-specifications                2024.10.1
kaleido                                  0.2.1
kiwisolver                               1.4.7
kubernetes                               31.0.0
langchain                                0.0.354
langchain-community                      0.0.8
langchain-core                           0.1.6
langchain-experimental                   0.0.47
langchain-google-genai                   0.0.6
langchain-mistralai                      0.0.2
langdetect                               1.0.9
langid                                   1.1.6
langsmith                                0.0.77
layoutparser                             0.3.4
lazy_loader                              0.4
librosa                                  0.10.1
llama_cpp_python                         0.2.26+cpuavx2
llama_cpp_python_cuda                    0.2.26+cu121avx
llvmlite                                 0.43.0
lm-dataformat                            0.0.20
lm_eval                                  0.4.4
loralib                                  0.1.2
lxml                                     5.3.0
lz4                                      4.3.3
Markdown                                 3.7
markdown-it-py                           3.0.0
MarkupSafe                               2.1.5
marshmallow                              3.22.0
matplotlib                               3.9.2
mbstrdecoder                             1.1.3
mdurl                                    0.1.2
mistralai                                0.0.8
mmh3                                     5.0.1
mojimoji                                 0.0.13
mongoengine                              0.24.2
monotonic                                1.6
more-itertools                           10.5.0
motor                                    3.5.3
mplcursors                               0.5.3
mpmath                                   1.3.0
msg-parser                               1.2.0
msgpack                                  1.1.0
multidict                                6.1.0
multiprocess                             0.70.15
multivolumefile                          0.2.3
mutagen                                  1.47.0
mypy-extensions                          1.0.0
narwhals                                 1.9.1
nest-asyncio                             1.6.0
networkx                                 2.8.8
nltk                                     3.9.1
num2words                                0.5.13
numba                                    0.60.0
numexpr                                  2.10.1
numpy                                    1.23.4
oauthlib                                 3.2.2
olefile                                  0.47
omegaconf                                2.3.0
onnx                                     1.17.0
onnxruntime                              1.15.1
onnxruntime-gpu                          1.15.0
openai                                   1.51.2
opencv-python                            4.10.0.84
opencv-python-headless                   4.10.0.84
openpyxl                                 3.1.5
opentelemetry-api                        1.27.0
opentelemetry-exporter-otlp-proto-common 1.27.0
opentelemetry-exporter-otlp-proto-grpc   1.27.0
opentelemetry-instrumentation            0.48b0
opentelemetry-instrumentation-asgi       0.48b0
opentelemetry-instrumentation-fastapi    0.48b0
opentelemetry-proto                      1.27.0
opentelemetry-sdk                        1.27.0
opentelemetry-semantic-conventions       0.48b0
opentelemetry-util-http                  0.48b0
openvino                                 2022.3.0
optimum                                  1.16.1
orderly-set                              5.2.2
orjson                                   3.10.7
outcome                                  1.3.0.post0
overrides                                7.7.0
packaging                                24.1
pandas                                   2.0.2
pathvalidate                             3.2.1
pdf2image                                1.17.0
pdfminer.six                             20221105
pdfplumber                               0.10.4
peft                                     0.13.1
pikepdf                                  9.3.0
pillow                                   10.4.0
pillow_heif                              0.18.0
pip                                      23.0.1
pip-licenses                             5.0.0
platformdirs                             4.3.6
playwright                               1.47.0
plotly                                   5.24.1
pluggy                                   1.5.0
pooch                                    1.8.2
portalocker                              2.10.1
posthog                                  3.7.0
pprintpp                                 0.4.0
prettytable                              3.11.0
primp                                    0.6.3
priority                                 2.0.0
propcache                                0.2.0
proto-plus                               1.24.0
protobuf                                 4.25.5
psutil                                   6.0.0
pulsar-client                            3.5.0
py7zr                                    0.22.0
pyarrow                                  17.0.0
pyarrow-hotfix                           0.6
pyasn1                                   0.6.1
pyasn1_modules                           0.4.1
pybcj                                    1.0.2
pybind11                                 2.13.6
pyclipper                                1.3.0.post5
pycocotools                              2.0.8
pycparser                                2.22
pycryptodomex                            3.21.0
pydantic                                 2.9.2
pydantic_core                            2.23.4
pydantic-settings                        2.1.0
pydash                                   8.0.3
pydub                                    0.25.1
pydyf                                    0.11.0
pyee                                     12.0.0
Pygments                                 2.18.0
pymongo                                  4.8.0
PyMuPDF                                  1.24.11
pynvml                                   11.5.3
pypandoc                                 1.14
pypandoc_binary                          1.14
pyparsing                                3.1.4
pypdf                                    5.0.1
pypdfium2                                4.30.0
pyphen                                   0.16.0
PyPika                                   0.48.9
pyppmd                                   1.1.0
pyproject-api                            1.8.0
pyproject_hooks                          1.2.0
pyreadline3                              3.5.4
PySocks                                  1.7.1
pytablewriter                            1.2.0
pytesseract                              0.3.13
pytest                                   8.3.3
pytest-xdist                             3.6.1
python-crfsuite                          0.9.11
python-dateutil                          2.8.2
python-doctr                             0.5.4a0
python-docx                              1.1.2
python-dotenv                            1.0.1
python-iso639                            2024.4.27
python-magic                             0.4.27
python-magic-bin                         0.4.14
python-multipart                         0.0.12
python-pptx                              0.6.23
pytube                                   15.0.0
pytz                                     2024.2
pywin32                                  307
PyYAML                                   6.0.2
pyzstd                                   0.16.1
RapidFuzz                                3.10.0
rarfile                                  4.2
referencing                              0.35.1
regex                                    2024.9.11
replicate                                0.20.0
requests                                 2.32.3
requests-file                            2.1.0
requests-oauthlib                        2.0.0
requests-toolbelt                        1.0.0
responses                                0.18.0
retrying                                 1.3.4
rich                                     13.9.2
rootpath                                 0.1.1
rouge                                    1.0.1
rouge_score                              0.1.2
rpds-py                                  0.20.0
rsa                                      4.9
ruff                                     0.6.9
s3transfer                               0.10.2
sacrebleu                                2.3.1
safetensors                              0.4.5
scikit-image                             0.24.0
scikit-learn                             1.2.2
scipy                                    1.13.1
selenium                                 4.25.0
semantic-version                         2.10.0
semanticscholar                          0.8.4
sentence-transformers                    2.2.2
sentencepiece                            0.1.99
setuptools                               65.5.0
sgmllib3k                                1.0.0
Shapely                                  1.8.5.post1
shellingham                              1.5.4
six                                      1.16.0
sniffio                                  1.3.1
sortedcontainers                         2.4.0
soundfile                                0.12.1
soupsieve                                2.6
soxr                                     0.5.0.post1
SQLAlchemy                               2.0.35
sqlitedict                               2.1.0
sse-starlette                            0.10.3
sseclient-py                             1.8.0
starlette                                0.38.6
strawberry-graphql                       0.246.0
sympy                                    1.13.3
tabledata                                1.3.3
tabulate                                 0.9.0
taskgroup                                0.0.0a4
tcolorpy                                 0.1.6
tenacity                                 8.5.0
termcolor                                2.5.0
text-generation                          0.7.0
textstat                                 0.7.4
texttable                                1.7.0
threadpoolctl                            3.5.0
tifffile                                 2024.9.20
tiktoken                                 0.8.0
timm                                     1.0.9
tinycss2                                 1.3.0
tokenizers                               0.19.1
toml                                     0.10.2
tomli                                    2.0.2
tomlkit                                  0.12.0
torch                                    2.1.2+cu118
torchvision                              0.16.2+cu118
tox                                      4.21.2
tqdm                                     4.66.5
tqdm-multiprocess                        0.0.11
transformers                             4.40.2
trio                                     0.26.2
trio-websocket                           0.11.1
typepy                                   1.3.2
typer                                    0.12.5
typing_extensions                        4.12.2
typing-inspect                           0.9.0
tzdata                                   2024.2
tzlocal                                  5.2
ujson                                    5.10.0
Unidecode                                1.3.8
universal-analytics-python3              1.1.1
unstructured                             0.12.5
unstructured-client                      0.26.0
unstructured-inference                   0.7.23
unstructured.pytesseract                 0.3.13
urllib3                                  2.2.3
uvicorn                                  0.31.0
validators                               0.34.0
virtualenv                               20.26.6
voxel51-eta                              0.13.0
watchfiles                               0.24.0
wavio                                    0.0.8
wcwidth                                  0.2.13
weasyprint                               62.3
weaviate-client                          4.8.1
webencodings                             0.5.1
websocket-client                         1.8.0
websockets                               11.0.3
wikipedia                                1.4.0
wolframalpha                             5.1.3
word2number                              1.1
wrapt                                    1.16.0
wsproto                                  1.2.0
xlrd                                     2.0.1
XlsxWriter                               3.2.0
xmltodict                                0.13.0
xxhash                                   3.5.0
yarl                                     1.14.0
yt-dlp                                   2023.10.13
zipp                                     3.20.2
zopfli                                   0.2.3
zstandard                                0.23.0
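Two entries in the list above stand out: llama_cpp_python 0.2.26+cpuavx2 (a CPU-only build) and llama_cpp_python_cuda 0.2.26+cu121avx (built against CUDA 12.1, while nvcc above reports CUDA 11.8). A quick way to check which of these wheels is installed, using only the standard library (package names as they appear in the list):

```python
# List the installed llama-cpp wheels and their local version tags,
# using only the standard library (package names as in the pip list).
import importlib.metadata as md

for name in ("llama_cpp_python", "llama_cpp_python_cuda"):
    try:
        print(name, md.version(name))
    except md.PackageNotFoundError:
        print(name, "not installed")
```

A +cu121 wheel generally needs the CUDA 12.x runtime DLLs on the PATH; with only CUDA 11.8 installed, the "Could not find module ... llama.dll (or one of its dependencies)" error in the log below is the expected symptom.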

mbbutt commented Nov 7, 2024

Below is the log, in case it helps.


-----

WeasyPrint could not import some external libraries. Please carefully follow the installation steps before reporting an issue:
https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#installation
https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#troubleshooting

-----


-----

WeasyPrint could not import some external libraries. Please carefully follow the installation steps before reporting an issue:
https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#installation
https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#troubleshooting

-----

STT enabled, may use more GPU, set --enable_stt=False for low-memory systems
TTS enabled, may use more GPU, set --enable_tts=False for low-memory systems
Using Model llama
load INSTRUCTOR_Transformer
max_seq_length  512
Must install DocTR and LangChain installed if enabled DocTR, disabling
C:\Users\Public\pyenv-win\pyenv-win\bin\.h2o\lib\site-packages\huggingface_hub\file_download.py:1142: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
C:\Users\Public\pyenv-win\pyenv-win\bin\.h2o\lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Starting get_model: llama
Failed to listen to n_gpus: Failed to load shared library 'C:\Users\Public\pyenv-win\pyenv-win\bin\.h2o\lib\site-packages\llama_cpp_cuda\llama.dll': Could not find module 'C:\Users\Public\pyenv-win\pyenv-win\bin\.h2o\lib\site-packages\llama_cpp_cuda\llama.dll' (or one of its dependencies). Try using the full path with constructor syntax.
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from llama-2-7b-chat.Q6_K.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  10:                          general.file_type u32              = 18
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  18:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q6_K:  226 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q6_K
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 5.15 GiB (6.56 BPW)
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size       =    0.11 MiB
llm_load_tensors: system memory used  = 5272.45 MiB
....................................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size  = 1024.00 MiB, K (f16):  512.00 MiB, V (f16):  512.00 MiB
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 42.19 MiB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
Auto-detected LLaMa n_ctx=4096, will unload then reload with this setting.
warning: failed to VirtualUnlock buffer: The segment is already unlocked.

Failed to listen to n_gpus: Failed to load shared library 'C:\Users\Public\pyenv-win\pyenv-win\bin\.h2o\lib\site-packages\llama_cpp_cuda\llama.dll': Could not find module 'C:\Users\Public\pyenv-win\pyenv-win\bin\.h2o\lib\site-packages\llama_cpp_cuda\llama.dll' (or one of its dependencies). Try using the full path with constructor syntax.
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from llama-2-7b-chat.Q6_K.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  10:                          general.file_type u32              = 18
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  18:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q6_K:  226 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q6_K
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 5.15 GiB (6.56 BPW)
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size       =    0.11 MiB
llm_load_tensors: system memory used  = 5272.45 MiB
....................................................................................................
llama_new_context_with_model: n_ctx      = 4096
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size  = 2048.00 MiB, K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 75.19 MiB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
Model {'base_model': 'llama', 'base_model0': 'llama', 'tokenizer_base_model': '', 'lora_weights': '', 'inference_server': '', 'prompt_type': 'llama2', 'prompt_dict': {'promptA': '', 'promptB': '', 'PreInstruct': "<s>[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n", 'PreInput': None, 'PreResponse': '[/INST]', 'terminate_response': ['[INST]', '</s>'], 'chat_sep': ' ', 'chat_turn_sep': ' </s>', 'humanstr': '[INST]', 'botstr': '[/INST]', 'generates_leading_space': False, 'system_prompt': "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. 
If you don't know the answer to a question, please don't share false information.", 'can_handle_system_prompt': True}, 'visible_models': None, 'h2ogpt_key': None, 'load_8bit': 'pause', 'load_4bit': True, 'low_bit_mode': 1, 'load_half': True, 'use_flash_attention_2': False, 'load_gptq': '', 'load_awq': '', 'load_exllama': False, 'use_safetensors': False, 'revision': None, 'use_gpu_id': True, 'gpu_id': 0, 'compile_model': None, 'use_cache': None, 'llamacpp_dict': {'n_gpu_layers': 5, 'n_gqa': 0, 'model_path_llama': 'C:\\Users\\Public\\git\\h2ogpt\\llamacpp_path\\llama-2-7b-chat.Q6_K.gguf', 'model_name_gptj': '', 'model_name_gpt4all_llama': '', 'model_name_exllama_if_no_config': '', 'n_batch': 128}, 'rope_scaling': {}, 'max_seq_len': 4096, 'max_output_seq_len': None, 'exllama_dict': {}, 'gptq_dict': {}, 'attention_sinks': False, 'sink_dict': {}, 'truncation_generation': False, 'hf_model_dict': {}}
Begin auto-detect HF cache text generation models
C:\Users\Public\pyenv-win\pyenv-win\bin\.h2o\lib\site-packages\huggingface_hub\file_download.py:1142: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
No loading model openai/whisper-base.en because is_encoder_decoder=True
No loading model microsoft/speecht5_hifigan because The checkpoint you are trying to load has model type `hifigan` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
No loading model microsoft/speecht5_tts because is_encoder_decoder=True
End auto-detect HF cache text generation models
Begin auto-detect llama.cpp models
End auto-detect llama.cpp models
C:\Users\Public\git\h2ogpt\gradio_utils\prompt_form.py:211: GradioUnusedKwargWarning: You have unused kwarg parameters in Chatbot, please remove them: {'likeable': True}
  text_output = gr.Chatbot(label=output_label0,
C:\Users\Public\git\h2ogpt\gradio_utils\prompt_form.py:216: GradioUnusedKwargWarning: You have unused kwarg parameters in Chatbot, please remove them: {'likeable': True}
  text_output2 = gr.Chatbot(label=output_label0_model2,
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Started Gradio Server and/or GUI: server_name: localhost port: None
Use local URL: http://localhost:7860/
C:\Users\Public\pyenv-win\pyenv-win\bin\.h2o\lib\site-packages\pydantic\_internal\_fields.py:132: UserWarning: Field "model_name" in ModelInfoResponse has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
C:\Users\Public\pyenv-win\pyenv-win\bin\.h2o\lib\site-packages\pydantic\_internal\_fields.py:132: UserWarning: Field "model_names" in ModelListResponse has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
OpenAI API URL: http://0.0.0.0:5000
OpenAI API key: EMPTY

@pseudotensor
Collaborator

The logs don't mention the GPU at all, which is probably why it's slow. Something is wrong with the llama_cpp_python installation.
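One quick way to confirm which backend was actually picked up is to check which module imports cleanly. The "Failed to load shared library ... llama_cpp_cuda\llama.dll" warning in the log shows the CUDA module was tried first and the loader silently fell back to the CPU build. A minimal sketch (`first_importable` is a hypothetical helper, not h2oGPT code):

```python
import importlib


def first_importable(candidates):
    """Return the first module name that imports cleanly, else None."""
    for name in candidates:
        try:
            importlib.import_module(name)
            return name
        except Exception:
            continue
    return None


# The CUDA build is preferred, with a fallback to the CPU build; if this
# prints "llama_cpp", generation will run entirely on the CPU.
print("active backend:", first_importable(["llama_cpp_cuda", "llama_cpp"]))
```

If this prints `llama_cpp` (or `None`), the 2-minute latency is expected: the whole Q6_K model is being evaluated on the CPU, which matches the ~100% CPU / ~1% GPU utilization you observed.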

@mbbutt
Author

mbbutt commented Nov 8, 2024

Hello,

Thanks for the response. In `pip list` I can see that both the CPU build and the CUDA build of llama_cpp_python are installed. Could this be the reason?

llama_cpp_python                         0.2.26+cpuavx2
llama_cpp_python_cuda                    0.2.26+cu121avx

Should I uninstall one of them, or install a different version?

This is the error:

Failed to listen to n_gpus: Failed to load shared library
'C:\Users\Public\pyenv-win\pyenv-win\bin\.h2o\lib\site-packages\llama_cpp_cuda\llama.dll': Could not find module
'C:\Users\Public\pyenv-win\pyenv-win\bin\.h2o\lib\site-packages\llama_cpp_cuda\llama.dll' (or one of its dependencies). Try using the full path with constructor syntax.


@mbbutt
Author

mbbutt commented Nov 8, 2024

I have now done a fresh installation, but it fails while loading the model

with

Using Model h2oai/h2ogpt-4096-llama2-7b-chat
git failed to run: [WinError 2] The system cannot find the file specified
USER_AGENT environment variable not set, consider setting it to identify your requests.
Windows fatal exception: code 0xc0000139

Below is the complete error log:

(.myh2o) C:\Users\Public\h2ogpt_Nov24>python generate.py --base_model=h2oai/h2ogpt-4096-llama2-7b-chat --score_model=None --prompt_type=human_bot --cli=True

-----

WeasyPrint could not import some external libraries. Please carefully follow the installation steps before reporting an issue:
https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#installation
https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#troubleshooting

-----

C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)

-----

WeasyPrint could not import some external libraries. Please carefully follow the installation steps before reporting an issue:
https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#installation
https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#troubleshooting

-----

Using Model h2oai/h2ogpt-4096-llama2-7b-chat
git failed to run: [WinError 2] The system cannot find the file specified
USER_AGENT environment variable not set, consider setting it to identify your requests.
Windows fatal exception: code 0xc0000139

Current thread 0x00002ea8 (most recent call first):
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1176 in create_module
  File "<frozen importlib._bootstrap>", line 571 in module_from_spec
  File "<frozen importlib._bootstrap>", line 674 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\awq\modules\linear.py", line 4 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\awq\models\base.py", line 16 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\awq\models\mpt.py", line 1 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\awq\models\__init__.py", line 1 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 992 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\awq\__init__.py", line 2 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 992 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 992 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\peft\tuners\lora\awq.py", line 26 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\peft\tuners\lora\model.py", line 50 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\peft\tuners\lora\__init__.py", line 20 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\peft\tuners\__init__.py", line 21 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 992 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 992 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\peft\mapping.py", line 22 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\peft\auto.py", line 32 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\peft\__init__.py", line 22 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\transformers\trainer.py", line 226 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1050 in _gcd_import
  ...
Traceback (most recent call last):
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\transformers\utils\import_utils.py", line 1778, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\transformers\trainer.py", line 226, in <module>
    from peft import PeftModel
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\peft\__init__.py", line 22, in <module>
    from .auto import (
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\peft\auto.py", line 32, in <module>
    from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\peft\mapping.py", line 22, in <module>
    from peft.tuners.xlora.model import XLoraModel
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\peft\tuners\__init__.py", line 21, in <module>
    from .lora import LoraConfig, LoraModel, LoftQConfig, LoraRuntimeConfig
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\peft\tuners\lora\__init__.py", line 20, in <module>
    from .model import LoraModel
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\peft\tuners\lora\model.py", line 50, in <module>
    from .awq import dispatch_awq
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\peft\tuners\lora\awq.py", line 26, in <module>
    from awq.modules.linear import WQLinear_GEMM
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\awq\__init__.py", line 2, in <module>
    from awq.models.auto import AutoAWQForCausalLM
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\awq\models\__init__.py", line 1, in <module>
    from .mpt import MptAWQForCausalLM
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\awq\models\mpt.py", line 1, in <module>
    from .base import BaseAWQForCausalLM
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\awq\models\base.py", line 16, in <module>
    from awq.modules.linear import WQLinear_GEMM, WQLinear_GEMV
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\awq\modules\linear.py", line 4, in <module>
    import awq_inference_engine  # with CUDA kernels
ImportError: DLL load failed while importing awq_inference_engine: The specified procedure could not be found.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Public\h2ogpt_Nov24\generate.py", line 20, in <module>
    entrypoint_main()
  File "C:\Users\Public\h2ogpt_Nov24\generate.py", line 16, in entrypoint_main
    H2O_Fire(main)
  File "C:\Users\Public\h2ogpt_Nov24\src\utils.py", line 79, in H2O_Fire
    fire.Fire(component=component, command=args)
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\fire\core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\fire\core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\fire\core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "C:\Users\Public\h2ogpt_Nov24\src\gen.py", line 2060, in main
    model=get_embedding(use_openai_embedding, hf_embedding_model=hf_embedding_model,
  File "C:\Users\Public\h2ogpt_Nov24\src\gpt_langchain.py", line 552, in get_embedding
    embedding = HuggingFaceBgeEmbeddings(model_name=hf_embedding_model,
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\langchain_community\embeddings\huggingface.py", line 287, in __init__
    import sentence_transformers
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\sentence_transformers\__init__.py", line 18, in <module>
    from sentence_transformers.trainer import SentenceTransformerTrainer
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\sentence_transformers\trainer.py", line 12, in <module>
    from transformers import EvalPrediction, PreTrainedTokenizerBase, Trainer, TrainerCallback
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\transformers\utils\import_utils.py", line 1766, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "C:\Users\Public\h2ogpt_Nov24\.myh2o\lib\site-packages\transformers\utils\import_utils.py", line 1780, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
DLL load failed while importing awq_inference_engine: The specified procedure could not be found.

(.myh2o) C:\Users\Public\h2ogpt_Nov24>

@mbbutt
Author

mbbutt commented Nov 12, 2024

Somehow it is working now after reinstalling the Visual Studio Build Tools.
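As a sanity check once the GPU is actually in use: single-stream decoding is roughly memory-bandwidth bound, so the expected tokens/sec is on the order of bandwidth divided by model size. Using rough spec-sheet figures (RTX A2000 12GB ≈ 288 GB/s, dual-channel DDR5-4400 ≈ 70 GB/s; both are assumptions, not measurements on this machine):

```python
MODEL_GIB = 5.15    # Q6_K 7B size reported in the load log above
GIB_TO_GB = 1.0737  # 2**30 / 10**9


def est_tok_per_s(bandwidth_gb_s, model_gib=MODEL_GIB):
    """Upper-bound decode speed if each token reads all weights once."""
    return bandwidth_gb_s / (model_gib * GIB_TO_GB)


print(f"GPU (~288 GB/s): ~{est_tok_per_s(288.0):.0f} tok/s")
print(f"CPU (~70 GB/s):  ~{est_tok_per_s(70.4):.0f} tok/s")
```

So even fully offloaded, this setup tops out around ~50 tok/s for generation; a multi-minute wait before the first token points at something else (no offload, model reload, or long prompt processing on the CPU).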
