I encountered an issue where the JSON output gets truncated partway when running sensemaking on a file that contains long Japanese text. Here is a snippet of the truncated output:
It appears that categorizationBatchSize is currently fixed at 100, which might be causing the model to exceed its output token limit, especially for languages like Japanese that consume more tokens per character, or for comments that are very long.
Proposed Solution
It would be helpful if categorizationBatchSize could be passed as a parameter upon invocation, so users can adjust it according to their language or the size of their dataset. This way, we can avoid hitting the model’s output token limit and prevent truncated JSON outputs.
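For illustration, here is a rough sketch of how the batch size could be threaded through as an optional parameter. The names below (categorizeComments, CategorizationOptions, callModel) are placeholders I made up for this example, not the library's actual API:

```typescript
// Hypothetical sketch only: function and option names are illustrative,
// not the library's real interface.

interface CategorizationOptions {
  // Comments per model request; falls back to the current fixed value of 100.
  categorizationBatchSize?: number;
}

// Placeholder for the existing model call that returns categorized JSON.
async function callModel(batch: string[]): Promise<string[]> {
  return batch.map((c) => JSON.stringify({ comment: c, topics: [] }));
}

export async function categorizeComments(
  comments: string[],
  options: CategorizationOptions = {},
): Promise<string[]> {
  const batchSize = options.categorizationBatchSize ?? 100;
  const results: string[] = [];
  for (let i = 0; i < comments.length; i += batchSize) {
    // Smaller batches keep each JSON response within the model's output token limit.
    results.push(...(await callModel(comments.slice(i, i + batchSize))));
  }
  return results;
}

// Example: use smaller batches for long Japanese comments.
// await categorizeComments(comments, { categorizationBatchSize: 20 });
```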
Would it be possible to make categorizationBatchSize configurable? If you have any suggestions or alternative approaches, I'd be happy to hear them. Thank you in advance!