This is my favorite and most used tool #735
clearsitedesigns started this conversation in Show and tell
Replies: 1 comment
-
@clearsitedesigns this looks really useful. We can integrate it into the main code.
-
Hi all, I had to fork off the main repo. I use this tool a lot in my research and am building up some complexity on top of it. Since I am ingesting a ton of material, I wrote a custom_ingest.py that gives me a bit more insight into what is going on and a bit more control over token ingestion. It tells me which files are being processed and gives a running indication of how much work is left. If there is interest, I could merge this. It's not perfect. I usually throw a hundred files or so at it at a time, but large PDFs can be problematic.
For example, I used to sit here and wonder how many hours something was going to take. Now I can see roughly how long it will take to ingest these 195 documents, or at least get a better idea of how far along we are. I did have to update matlib + charset and a few other libs to get this to work.
2024-02-05 16:14:26,396 - WARNING - text_splitter.py:176 - Created a chunk of size 1713, which is longer than the specified 1000
2024-02-05 16:14:26,970 - INFO - SentenceTransformer.py:66 - Load pre trained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length 512
2024-02-05 16:14:34,784 - INFO - custom_ingest.py:151 - Ingested batch 1/195, 0.51% complete
2024-02-05 16:14:40,216 - INFO - custom_ingest.py:151 - Ingested batch 2/195, 1.03% complete
2024-02-05 16:14:45,999 - INFO - custom_ingest.py:151 - Ingested batch 3/195, 1.54% complete
2024-02-05 16:14:52,932 - INFO - custom_ingest.py:151 - Ingested batch 4/195, 2.05% complete
2024-02-05 16:14:58,689 - INFO - custom_ingest.py:151 - Ingested batch 5/195, 2.56% complete
2024-02-05 16:15:04,510 - INFO - custom_ingest.py:151 - Ingested batch 6/195, 3.08% complete
2024-02-05 16:15:10,455 - INFO - custom_ingest.py:151 - Ingested batch 7/195, 3.59% complete
2024-02-05 16:15:16,187 - INFO - custom_ingest.py:151 - Ingested batch 8/195, 4.10% complete
2024-02-05 16:15:21,968 - INFO - custom_ingest.py:151 - Ingested batch 9/195, 4.62% complete
2024-02-05 16:15:28,333 - INFO - custom_ingest.py:151 - Ingested batch 10/195, 5.13% complete
2024-02-05 16:15:33,562 - INFO - custom_ingest.py:151 - Ingested batch 11/195, 5.64% complete
2024-02-05 16:17:34,903 - INFO - custom_ingest.py:151 - Ingested batch 31/195, 15.90% complete
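For anyone curious how progress lines like the ones above could be produced, here is a minimal sketch. It is not the actual custom_ingest.py; the names `make_batches`, `format_progress`, `estimate_remaining`, and `ingest_with_progress` are hypothetical, and the `ingest_batch` callback stands in for whatever actually embeds and stores a batch.

```python
import logging
import time
from typing import Callable, List, Sequence

logger = logging.getLogger("custom_ingest_sketch")  # hypothetical logger name


def make_batches(items: Sequence, batch_size: int) -> List[Sequence]:
    """Split items into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def format_progress(done: int, total: int) -> str:
    """Build a progress message in the same shape as the log lines above."""
    return f"Ingested batch {done}/{total}, {100 * done / total:.2f}% complete"


def estimate_remaining(elapsed: float, done: int, total: int) -> float:
    """Estimate seconds left, assuming later batches cost the same as the average so far."""
    if done < 1:
        raise ValueError("need at least one finished batch to estimate")
    return elapsed / done * (total - done)


def ingest_with_progress(
    batches: Sequence[Sequence],
    ingest_batch: Callable[[Sequence], None],  # assumption: does the real embedding work
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run ingest_batch on every batch, logging progress and a rough ETA after each one."""
    total = len(batches)
    start = clock()
    for done, batch in enumerate(batches, start=1):
        ingest_batch(batch)
        remaining = estimate_remaining(clock() - start, done, total)
        logger.info("%s, about %.0f s left", format_progress(done, total), remaining)
    return total
```

Applied by hand to the log above: batch 1 finished at 16:14:34.784 and batch 31 at 16:17:34.903, so 30 batches took about 180 s, or roughly 6 s per batch. With 164 batches still to go, that suggests something like 16 minutes remaining, assuming the remaining batches are of similar size.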