bug/chunk_by_title disregarding combine_text_under_n_chars

**Describe the bug**
I am partitioning and then chunking an HTML file. The HTML has 12357 characters including spaces, but even with very large values for
`max_characters`, `combine_text_under_n_chars`, and `new_after_n_chars`, `chunk_by_title` still gives me 9 chunks.

**To Reproduce**
Use an HTML-based document, roughly one page in length with numerous titles, then partition and chunk it as follows.

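from unstructured.chunking.title import chunk_by_title
from unstructured.partition.html import partition_html
elements = partition_html(file=bytes_io)
chunks = chunk_by_title(elements, multipage_sections=True, new_after_n_chars=100000, combine_text_under_n_chars=75000, max_characters=100000)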
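
To make the result easier to inspect, a small helper like the one below could list the size of each returned chunk. This is only a sketch: `summarize_chunks` is a hypothetical name, and it assumes nothing more than that each chunk exposes a `.text` attribute. The stand-in objects are invented for illustration and are not output from the actual document.

```python
from types import SimpleNamespace


def summarize_chunks(chunks):
    """Return a (class name, text length) pair for each chunk-like object."""
    return [(type(chunk).__name__, len(chunk.text)) for chunk in chunks]


if __name__ == "__main__":
    # Stand-in objects; in the real reproduction, pass the list returned
    # by chunk_by_title instead.
    fake_chunks = [
        SimpleNamespace(text="Title\n\nBody text"),
        SimpleNamespace(text="Another section"),
    ]
    for name, length in summarize_chunks(fake_chunks):
        print(name, length)
```

Running this on the real `chunks` list shows how many chunks came back and how far each one is below the configured limits.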

**Expected behavior**
Given the parameter values, I would expect a single chunk. This is just an exploratory test to understand the issue. In the real scenario I wouldn't use these values or want a single chunk.
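
The expectation can be illustrated with a toy model of greedy section combining. This is a sketch of the behavior I expected, not the library's actual implementation: `combine_small_sections`, its parameters, the two-character separator, and the section lengths are all assumptions made up for the example.

```python
def combine_small_sections(section_lengths, combine_under, max_chars, sep_len=2):
    """Greedily merge consecutive sections (toy model, lengths only).

    The next section is merged into the current chunk while the current
    chunk is shorter than `combine_under` and the merged result still
    fits within `max_chars`. `sep_len` is the assumed separator length.
    """
    chunks = []
    current = None
    for length in section_lengths:
        if current is None:
            current = length
        elif current < combine_under and current + sep_len + length <= max_chars:
            current += sep_len + length
        else:
            chunks.append(current)
            current = length
    if current is not None:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Nine invented section lengths, each far below both thresholds.
    sections = [1000] * 9
    merged = combine_small_sections(sections, combine_under=75000, max_chars=100000)
    print(len(merged))
```

Under this model, nine small sections collapse into one chunk when both thresholds are far larger than the document, which is why 9 chunks looked wrong to me.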

**Screenshots**
![image](https://github.com/Unstructured-IO/unstructured/assets/32558947/5ef0fe45-5d01-4ca4-838a-d3f335796ea0)

**Environment Info**
Please run `python scripts/collect_env.py` and paste the output here. 
This will help us understand more about the environment in which the bug occurred.

**Additional context**
We are using unstructured version 0.12.6.

bug/chunk_by_title disregarding combine_text_under_n_chars #2699
