Refine/llm api op unittest #528

Open
wants to merge 178 commits into base: main

BeachWang
Collaborator

  1. Add a global skip_op_error parameter that enables fault tolerance when running the Data-Juicer analyzer and executor, while keeping fault tolerance disabled for unit tests.
  2. Enhance the unit tests for OPs that call APIs.
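
The behavior described in item 1 could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `fault_tolerant` wrapper, the `text_length` OP, and the choice to return the sample unchanged on failure are all hypothetical; only the `skip_op_error` name comes from the description above.

```python
import logging
from functools import wraps

logger = logging.getLogger(__name__)


def fault_tolerant(op_func, skip_op_error=True):
    """Hypothetical wrapper: swallow OP errors only when skip_op_error is True."""

    @wraps(op_func)
    def wrapper(sample, *args, **kwargs):
        try:
            return op_func(sample, *args, **kwargs)
        except Exception as exc:
            if not skip_op_error:
                # Strict mode (e.g. unit tests): let the failure surface.
                raise
            # Tolerant mode: log the error and pass the sample through unchanged.
            logger.error("Skipping error in %s: %s", op_func.__name__, exc)
            return sample

    return wrapper


def text_length(sample):
    """Toy OP (assumed for illustration): record the length of the text field."""
    sample["length"] = len(sample["text"])  # raises TypeError if text is None
    return sample


tolerant = fault_tolerant(text_length, skip_op_error=True)
strict = fault_tolerant(text_length, skip_op_error=False)
```

Under these assumptions, `tolerant({"text": None})` logs the error and returns the sample as-is, while `strict({"text": None})` raises, which is the behavior a unit test would want so that failures are not hidden.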

@BeachWang BeachWang added the enhancement (New feature or request) and dj:core (issues/PRs about the core functions of Data-Juicer) labels Jan 3, 2025
@BeachWang BeachWang requested review from HYLcool and yxdyc January 3, 2025 11:18
@BeachWang BeachWang self-assigned this Jan 3, 2025
@@ -13,6 +13,8 @@ np: 4 # number of subproce
text_keys: 'text' # the key name of field where the sample texts to be processed, e.g., `text`, `instruction`, `output`, ...
# Note: currently, we support specify only ONE key for each op, for cases requiring multiple keys, users can specify the op multiple times. We will only use the first key of `text_keys` when you set multiple keys.
suffixes: [] # the suffix of files that will be read. For example: '.txt', 'txt' or ['txt', '.pdf', 'docx']
turbo: false # Enable Turbo mode to maximize processing speed when batch size is 1.
skip_op_error: true # Skip errors in OPs caused by unexpected unvalid samples.
Collaborator

unvalid --> invalid

'--skip_op_error',
type=bool,
default=True,
help='Skip errors in OPs caused by unexpected unvalid samples.')
Collaborator

same typo
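
For reference, a boolean option like the one in this hunk could be parsed along these lines with the standard-library `argparse`. This is only a sketch under assumptions: the parser used in this PR is not shown in the hunk and may handle `type=bool` differently, and the `str2bool` and `parse_skip_op_error` helpers are hypothetical. The sketch uses an explicit converter because plain `argparse` with `type=bool` treats any non-empty string, including `'false'`, as `True`.

```python
import argparse


def str2bool(value):
    """Hypothetical converter: map common true/false spellings to a bool."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes", "y"):
        return True
    if lowered in ("false", "0", "no", "n"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = argparse.ArgumentParser()
parser.add_argument(
    "--skip_op_error",
    type=str2bool,
    default=True,
    help="Skip errors in OPs caused by unexpected invalid samples.",
)


def parse_skip_op_error(argv):
    """Return the parsed skip_op_error value for a list of CLI arguments."""
    return parser.parse_args(argv).skip_op_error
```

With this sketch, omitting the option keeps the default of `True`, and `--skip_op_error false` yields `False`.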

Labels
dj:core issues/PRs about the core functions of Data-Juicer enhancement New feature or request
3 participants