Exception: 'TypeError("\'WordCountFilter\' object is not callable")'
Expected behavior
Since all the filters are now implemented as child classes of `DocumentFilter` (except for the bitext filter), the `Score` and `Filter` modules should be consistent with the `ScoreFilter` module and take `filter_obj: DocumentFilter` as input instead of `filter_fn` (`Callable`).
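A minimal sketch of the requested interface, written with stand-in classes so it runs without NeMo-Curator installed. Everything here is hypothetical and only mirrors the names used in this issue: the real `DocumentFilter`, `Score`, and `Filter` operate on Dask-backed datasets, whereas this sketch uses plain lists of dicts, and the default word limits are placeholders.

```python
class DocumentFilter:
    """Stand-in base class (hypothetical): score a text, then decide to keep it."""

    def score_document(self, text):
        raise NotImplementedError

    def keep_document(self, score):
        raise NotImplementedError


class WordCountFilter(DocumentFilter):
    """Stand-in word-count filter; the default limits are placeholders."""

    def __init__(self, min_words=50, max_words=100_000):
        self.min_words = min_words
        self.max_words = max_words

    def score_document(self, text):
        return len(text.split())

    def keep_document(self, score):
        return self.min_words <= score <= self.max_words


class Score:
    """Sketch of the requested Score: takes a filter object, not a callable."""

    def __init__(self, filter_obj, score_field, text_field="text", score_type=None):
        self.filter_obj = filter_obj
        self.score_field = score_field
        self.text_field = text_field
        self.score_type = score_type

    def __call__(self, rows):
        scored = []
        for row in rows:
            value = self.filter_obj.score_document(row[self.text_field])
            if self.score_type is not None:
                value = self.score_type(value)
            scored.append({**row, self.score_field: value})
        return scored


class Filter:
    """Sketch of the requested Filter: keeps rows via filter_obj.keep_document."""

    def __init__(self, filter_obj, filter_field, invert=False):
        self.filter_obj = filter_obj
        self.filter_field = filter_field
        self.invert = invert

    def __call__(self, rows):
        kept = []
        for row in rows:
            keep = self.filter_obj.keep_document(row[self.filter_field])
            if keep != self.invert:
                kept.append(row)
        return kept


# The same filter object drives both steps, as it would with ScoreFilter.
rows = [{"text": "a b c"}, {"text": "a"}]
word_filter = WordCountFilter(min_words=2, max_words=5)
scored = Score(word_filter, score_field="word_count", score_type=int)(rows)
kept = Filter(word_filter, filter_field="word_count")(scored)
```

With this shape, one filter object can be handed to either module, and scoring and filtering split along its `score_document` and `keep_document` methods.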
Environment overview (please complete the following information)
Environment location: local dev env
Method of NeMo-Curator install: pip install --extra-index-url https://pypi.nvidia.com nemo-curator[cuda12x]
If method of install is [Docker], provide docker pull & docker run commands used: not applicable
Environment details
If NVIDIA docker image is used you don't need to specify these.
Otherwise, please provide:
OS version: Ubuntu 24.04
Dask version: 2024.9.0
Python version: Python 3.10.16 (main, Jan 13 2025, 16:25:23) [GCC 13.3.0] on linux
Additional context
None
Describe the bug
The `Score` module and `Filter` module do not behave the same as the `ScoreFilter` module. They are here: https://github.com/NVIDIA/NeMo-Curator/blob/main/nemo_curator/modules/filter.py
Steps/Code to reproduce bug
from nemo_curator import Score
from nemo_curator.filters import WordCountFilter
# Load your dataset as `dataset`
filter = Score(
    WordCountFilter(min_words=80, max_words=200_000),
    score_field="word_count",
    text_field="text",
    score_type=int,
)
filtered_dataset = filter(dataset)
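The `TypeError` reported in this issue can be reproduced in miniature without NeMo-Curator. The sketch below uses hypothetical stand-in classes (not the library source) and assumes that the scorer simply calls whatever it was given as a function. Under that assumption, passing the filter instance fails because it defines no `__call__`, while passing its bound `score_document` method works. Whether the real `Score` accepts the bound method in the same way is an assumption to verify against the installed version.

```python
class WordCountFilter:
    """Stand-in filter (hypothetical): has methods but no __call__."""

    def __init__(self, min_words=50, max_words=100_000):
        self.min_words = min_words
        self.max_words = max_words

    def score_document(self, text):
        return len(text.split())

    def keep_document(self, score):
        return self.min_words <= score <= self.max_words


class Score:
    """Stand-in scorer (hypothetical): treats its first argument as a function."""

    def __init__(self, score_fn, score_field, text_field="text", score_type=None):
        self.score_fn = score_fn
        self.score_field = score_field
        self.text_field = text_field
        self.score_type = score_type

    def __call__(self, rows):
        scored = []
        for row in rows:
            # Raises TypeError when score_fn is an object without __call__.
            value = self.score_fn(row[self.text_field])
            if self.score_type is not None:
                value = self.score_type(value)
            scored.append({**row, self.score_field: value})
        return scored


dataset = [{"text": "one two three"}, {"text": "four five"}]
word_filter = WordCountFilter(min_words=1, max_words=10)

# Passing the filter object itself, as in the reproduction above, fails.
error_message = None
try:
    Score(word_filter, score_field="word_count")(dataset)
except TypeError as exc:
    error_message = str(exc)

# Passing the bound method satisfies the callable-based signature.
scored = Score(
    word_filter.score_document,
    score_field="word_count",
    score_type=int,
)(dataset)
```

This is only a workaround for a callable-based signature; it does not address the inconsistency with `ScoreFilter` that the issue describes.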