Refine/llm api op unittest #528
Status: Open
BeachWang wants to merge 178 commits into main from refine/llm_api_op_unittest

Commits
63d430a
add api call
drcege 6720da4
add call_api ops
drcege 8daa6e1
clean
drcege ef11951
minor update
drcege 5597d5c
more tests
drcege 4b6e769
update tests
drcege 835be22
Merge branch 'main' into dev/api_model
drcege 325a753
update prompts
drcege 4f04bdd
fix unittest
drcege 0adbdcd
update tests
drcege 0aa4069
add docs
drcege f007532
minor fix
drcege 9aa7390
Merge branch 'main' into dev/api_model
drcege ee4f461
add API processor
drcege 9bbfe47
Merge branch 'main' into dev/api_model
drcege b00b182
refine API processor
drcege b718de7
refine
drcege 6d1d433
chunk and extract events
BeachWang 4d1670f
fix bugs
drcege 9e11aa3
fix tests
drcege cc40fc0
extract attribute
BeachWang 4c262ad
Merge branch 'dev/api_model' of github.com:alibaba/data-juicer into d…
BeachWang 347bc0f
refine tests
drcege c9d5051
extract nickname
BeachWang 8a128ca
Merge branch 'dev/api_model' of github.com:alibaba/data-juicer into d…
BeachWang 9262777
nickname test done
BeachWang 58fc020
merge main
BeachWang c7dc28e
lightRAG to OP
BeachWang 238869e
merge main
BeachWang 0e51a43
doc done
BeachWang 6d9d8a5
remove extra test
BeachWang a637a64
relavant -> relevant
BeachWang 56e7988
fix minor error
BeachWang 03880b7
group by op done
BeachWang 23174fd
ValueError -> Exception
BeachWang e82cc06
merge main
BeachWang 20a8dee
fix config_all error
BeachWang 38a9511
fix prepare_api_model
BeachWang 35f0eb3
fix rank sample None
BeachWang 155d3dd
constant fix key
BeachWang f862897
aggregator op
BeachWang 2d4da5e
merge llm_info_extract
BeachWang 7e66057
init python_lambda_mapper
drcege a61859b
set default arg
drcege 8031a31
fix init
drcege 67711f9
add python_file_mapper
drcege cdeb692
support text & most relavant entities
BeachWang 125a8f3
coverage ignore_errors
drcege 0c68089
index sample
BeachWang 651789d
role_playing_system_prompt_yaml
BeachWang c5d7b9e
merge python_file_mapper
BeachWang cf6a53a
Merge branch 'main' of github.com:alibaba/data-juicer into dev/group_…
BeachWang 222790e
system_prompt begin
BeachWang 75f2911
support batched
drcege 11fa852
remove unforkable
BeachWang 4af2bfb
support batched & add docs
drcege 8867580
Merge branch 'main' into op/python_lambda
drcege 553d5ad
add docs
drcege 470ca19
fix docs
drcege 399a238
update docs
drcege 706365f
Merge branch 'main' into op/python_file
drcege 115ab9a
pre-commit done
BeachWang ecb8635
fix batch bug
BeachWang 03e3469
fix batch bug
BeachWang 1788fa6
merge fix_batch_bug
BeachWang 735ff4d
Merge branch 'main' of github.com:alibaba/data-juicer into debug/fix_…
BeachWang 00ff624
fix filter batch
BeachWang 8601519
fix filter batch
BeachWang eeefcab
system prompt recipe done
BeachWang 6eaa50c
Merge branch 'main' of github.com:alibaba/data-juicer into dev/group_…
BeachWang 1575717
not rank for filter
BeachWang 2c5c4a1
limit pyav version
BeachWang 5c96dd5
Merge branch 'debug/fix_batch_bug' of github.com:alibaba/data-juicer …
BeachWang 49be467
add test for op
BeachWang 9ab02fe
tmp
BeachWang ba086de
tmp
BeachWang f712131
doc done
BeachWang 12b7616
Merge branch 'op/python_lambda' of github.com:alibaba/data-juicer int…
BeachWang e57b64a
merge python_lambda
BeachWang 5f463cd
merge python_lambda
BeachWang a786070
skip api test
BeachWang 73f4e77
merge main
BeachWang 4b6f0b9
merge main
BeachWang 788a212
add env dependency
BeachWang 10242c4
install by recipe
BeachWang 6a43eec
dialog sent intensity
BeachWang 621a693
add query
BeachWang b46d105
change to dj_install
BeachWang a0da444
change to dj_install
BeachWang 02f8dda
developer doc done
BeachWang 635a8a9
merge dj_install
BeachWang 083b665
+ add auto mode for analyzer: load all filters that produce stats to …
HYLcool 662df5e
+ add default mem_required for those model-based OPs
HYLcool 3b04908
query sent_int mapper
BeachWang 6b4d525
query sentiment test done
BeachWang 926c3da
- support wordcloud drawing for str or str list fields in stats
HYLcool 27347c0
- take the minimum one of dataset length and auto num
HYLcool d19f92f
* update default export path
HYLcool fbd6726
* set version limit for wandb to avoid exception
HYLcool 58288f7
change meta pass
BeachWang 9f9f85b
+ add docs for auto mode
HYLcool b665c10
doc done
BeachWang 07be552
merge main
BeachWang 8ba4156
sentiment detection
BeachWang 48b1761
diff label
BeachWang 8160725
sentiment
BeachWang 01846d1
test done
BeachWang 566eb5b
+ support t-test for Measure
HYLcool 7b8ee5c
* fix some bugs
HYLcool a76d975
dialog intent label
BeachWang 2fb9fe4
fix typo
BeachWang 324467f
prompt adjust
BeachWang 4a3ad39
add more test
BeachWang 937b3f1
query intent detection
BeachWang d4ca87b
for test
BeachWang 8109c71
for test
BeachWang c749dcd
change model
BeachWang c7df0bc
fix typo
BeachWang c7662cb
fix typo
BeachWang 6f44ec0
for test
BeachWang 9b6652d
for test
BeachWang fa306dc
doc done
BeachWang 601d9a2
- support analyze a dataset object
HYLcool 34f2ab6
- support analysis on tags in meta
HYLcool 8531a01
- support analysis with tagging OPs
HYLcool 4d6b701
- move tags into the meta field
HYLcool 767b2f0
dialog topic detection
BeachWang c088cb1
dialog topic detection
BeachWang 12351db
dialog topic detection
BeachWang 4b4e946
dialog topic detection
BeachWang 4506a8e
dialog topic detection
BeachWang d21db85
dialog topic detection
BeachWang 6f394ee
query topic detection
BeachWang abee815
query topic detection
BeachWang 0494741
query topic detection
BeachWang 38523a1
query topic detection
BeachWang b03a33a
query topic detection
BeachWang 35aa6bd
- do not tell tags using their suffix
HYLcool ad226b1
doc done
BeachWang 85e1392
- add insight mining
HYLcool b02745b
meta tags aggregator
BeachWang f2654f1
meta tags aggregator
BeachWang 23e5d6f
meta tags aggregator
BeachWang 1c74709
meta tags aggregator
BeachWang a997726
meta tags aggregator
BeachWang 2642847
meta tags aggregator
BeachWang 2dae3b8
meta tags aggregator
BeachWang 8bb2509
meta tags aggregator
BeachWang 90303ee
meta tags aggregator
BeachWang e4c6ff1
meta tags aggregator
BeachWang 12f8946
meta tags aggregator
BeachWang 09b1599
meta tags aggregator
BeachWang 203bc64
naive reverse grouper
BeachWang cf01e7e
naive reverse grouper
BeachWang e3d7b8b
* resolve the bugs when running insight mining in multiprocessing mode
HYLcool 3ca9994
Merge branch 'main' into feat/insight_mining
HYLcool 16ca358
* update unittests
HYLcool dfb0bca
* update unittests
HYLcool f8b9539
* update unittests
HYLcool 0ba6459
tags specified field
BeachWang 45259e5
* update readme for analyzer
HYLcool 174ee05
Merge branch 'main' into feat/insight_mining
HYLcool 4ad8b8d
merge main
BeachWang 9f098bd
doc done
BeachWang 51f53dc
* use more detailed key
HYLcool 58001ca
+ add reference
HYLcool 892cb48
Merge branch 'feat/insight_mining' of github.com:alibaba/data-juicer …
BeachWang 19fd15b
move mm tags
BeachWang 8fec0f7
move meta key
BeachWang 6fdc95b
done
BeachWang 8e01f7e
merge main
BeachWang af9e14d
test done
BeachWang f57f454
rm nested set
BeachWang 4188150
enable op error for unittest
BeachWang fad48f5
merge main
BeachWang e6f4564
enhance api unittest
BeachWang a572f5a
merge main
BeachWang 97f4642
expose skip_op_error
BeachWang
@@ -219,8 +219,13 @@ def init_configs(args: Optional[List[str]] = None, which_entry: object = None):
         '--turbo',
         type=bool,
         default=False,
-        help='Enable Turbo mode to maximize processing speed. Stability '
-        'features like fault tolerance will be disabled.')
+        help='Enable Turbo mode to maximize processing speed when batch size '
+        'is 1.')
+    parser.add_argument(
+        '--skip_op_error',
+        type=bool,
+        default=True,
+        help='Skip errors in OPs caused by unexpected unvalid samples.')
     parser.add_argument(
         '--use_cache',
         type=bool,

Review comment on the '--skip_op_error' help text: same typo

@@ -550,6 +555,7 @@ def init_setup_from_cfg(cfg: Namespace):
         'video_key': cfg.video_key,
         'num_proc': cfg.np,
         'turbo': cfg.turbo,
+        'skip_op_error': cfg.skip_op_error,
         'work_dir': cfg.work_dir,
     }
     cfg.process = update_op_attr(cfg.process, op_attrs)
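
For readers following the diff, here is a rough sketch of what a flag like skip_op_error could control once it reaches an OP. It is not taken from this PR: the helper catch_sample_errors and the toy word_count OP are hypothetical names, and returning the sample unchanged on failure is an assumption made only for illustration.

```python
import logging
from functools import wraps

logger = logging.getLogger(__name__)


def catch_sample_errors(process_fn, skip_op_error=True):
    """Wrap a per-sample function (hypothetical helper, not the project's API)."""

    @wraps(process_fn)
    def wrapper(sample, *args, **kwargs):
        try:
            return process_fn(sample, *args, **kwargs)
        except Exception as exc:
            if not skip_op_error:
                # Strict mode, e.g. in unit tests: surface the failure.
                raise
            logger.error("Skipping sample after OP error: %s", exc)
            # Assumption: the failing sample is passed through unchanged.
            return sample

    return wrapper


def word_count(sample):
    # Toy OP: raises AttributeError when 'text' is not a string.
    sample["num_words"] = len(sample["text"].split())
    return sample


lenient = catch_sample_errors(word_count, skip_op_error=True)
strict = catch_sample_errors(word_count, skip_op_error=False)
```

With this sketch, lenient({"text": None}) logs the error and hands the sample back untouched, while strict({"text": None}) re-raises it. That matches the intent suggested by the commit titles "enable op error for unittest" and "expose skip_op_error": tolerate bad samples by default, but let tests fail loudly.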
Review comment: unvalid --> invalid