Description
Describe the bug
I am partitioning and then chunking an HTML file. The HTML has 12357 characters including spaces, but even with very large values for
max_characters, combine_text_under_n_chars and new_after_n_chars, chunk_by_title still gives me 9 chunks.
To Reproduce
Use an HTML document, roughly 1 page in length with numerous titles, then partition and chunk it as follows.
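from unstructured.chunking.title import chunk_by_title
from unstructured.partition.html import partition_html
elements = partition_html(file=bytes_io)
chunks = chunk_by_title(elements, multipage_sections=True, new_after_n_chars=100000, combine_text_under_n_chars=75000, max_characters=100000)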
Expected behavior
Given these parameter values, I would expect a single chunk. This is purely exploratory, to understand an issue. In the real scenario I wouldn't use these values or want a single chunk.
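To make the expectation concrete, here is a toy model of title-based chunking. It is a sketch under my own assumptions, not the library's implementation: `Element` and `toy_chunk_by_title` are hypothetical names, the merge rule is only my reading of what the parameter names suggest, and oversized sections are not split. It shows why I expected the large thresholds to collapse nine titled sections into one chunk.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a partitioned element (not the library class)."""
    category: str  # e.g. "Title" or "NarrativeText"
    text: str


def toy_chunk_by_title(elements, max_characters, combine_text_under_n_chars):
    """Toy model: open a section at every Title, then merge small sections."""
    # Step 1: split into sections, starting a new one at each Title element.
    sections = []
    for el in elements:
        if el.category == "Title" or not sections:
            sections.append([])
        sections[-1].append(el.text)
    texts = ["\n\n".join(section) for section in sections]

    # Step 2: append a section to the previous chunk while that chunk is
    # under the combine threshold and the merged text fits max_characters.
    chunks = []
    for text in texts:
        if (
            chunks
            and len(chunks[-1]) < combine_text_under_n_chars
            and len(chunks[-1]) + 2 + len(text) <= max_characters
        ):
            chunks[-1] = chunks[-1] + "\n\n" + text
        else:
            chunks.append(text)
    return chunks


# Nine titled sections of about 1300 characters each, similar in scale
# to the document in this report.
elements = []
for i in range(9):
    elements.append(Element("Title", f"Heading {i}"))
    elements.append(Element("NarrativeText", "x" * 1300))

small = toy_chunk_by_title(elements, max_characters=1500, combine_text_under_n_chars=0)
large = toy_chunk_by_title(elements, max_characters=100000, combine_text_under_n_chars=75000)
print(len(small), len(large))  # prints: 9 1
```

With combining disabled the toy model yields one chunk per title. With the values from this report it yields a single chunk, which is the behavior I expected from chunk_by_title.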
Additional context
We are using unstructured version 0.12.6.