The following snippet shows how to download a single Parquet file from the FineWeb dataset:

```python
import os
import urllib.request

url = "https://huggingface.co/datasets/HuggingFaceFW/fineweb/resolve/main/data/CC-MAIN-2013-20/000_00000.parquet"

# Create the target folder if needed, then download the file into it
os.makedirs("input-folder", exist_ok=True)
urllib.request.urlretrieve(url, "input-folder/000_00000.parquet")
```
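To try other files, a small helper can build the download URL and skip files that were already fetched, so re-running a notebook cell does not download again. This is a minimal sketch: `hf_dataset_file_url` and `fetch_once` are hypothetical names, and the URL layout (`/datasets/<repo_id>/resolve/<revision>/<path>`) is assumed from the FineWeb URL above rather than taken from documentation.

```python
import os
import urllib.request


def hf_dataset_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for a file in a Hugging Face dataset repo.

    Assumption: the .../datasets/<repo_id>/resolve/<revision>/<filename> layout
    used by the FineWeb URL above applies to other dataset repos as well.
    """
    return f"https://huggingface.co/datasets/{repo_id}/resolve/{revision}/{filename}"


def fetch_once(url: str, folder: str = "input-folder") -> str:
    """Download `url` into `folder` unless a file with the same name already exists.

    Returns the local path of the file.
    """
    os.makedirs(folder, exist_ok=True)
    dest = os.path.join(folder, os.path.basename(url))
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest
```

For example, `fetch_once(hf_dataset_file_url("HuggingFaceFW/fineweb", "data/CC-MAIN-2013-20/000_00000.parquet"))` reproduces the download above and returns `input-folder/000_00000.parquet`.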
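Another option is to use the Hugging Face Hub API, as follows:

```python
from huggingface_hub import hf_hub_download
import pandas as pd

REPO_ID = "wikimedia/wikipedia"
FILENAME = "20231101.en/train-00000-of-00041.parquet"

# Download the file into the local Hugging Face cache (or reuse a cached copy)
# and get back its local path
local_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, repo_type="dataset")

# Load the Parquet shard into a DataFrame for testing
df = pd.read_parquet(local_path)
```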
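Since large datasets are split into shards, testing a notebook usually needs only the first one or two. The sketch below builds shard names from the `<split>-XXXXX-of-YYYYY.parquet` pattern visible in the Wikipedia filename above and downloads them with `hf_hub_download`. The helper names `shard_filenames` and `download_shards` are hypothetical, and the assumptions are that other configs follow the same naming and that the total shard count (41 here) is read from the repo's file listing.

```python
def shard_filenames(config_dir: str, split: str, num_shards: int, first_n: int) -> list[str]:
    """Return the first `first_n` shard paths, assuming the naming
    <config_dir>/<split>-XXXXX-of-YYYYY.parquet (zero-padded to 5 digits)."""
    return [
        f"{config_dir}/{split}-{i:05d}-of-{num_shards:05d}.parquet"
        for i in range(min(first_n, num_shards))
    ]


def download_shards(repo_id: str, filenames: list[str]) -> list[str]:
    """Download each shard from a dataset repo and return the local cached paths."""
    # Imported here so shard_filenames can be used without huggingface_hub installed
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id=repo_id, filename=name, repo_type="dataset")
        for name in filenames
    ]
```

For instance, `download_shards("wikimedia/wikipedia", shard_filenames("20231101.en", "train", 41, 2))` would fetch `train-00000-of-00041.parquet` and `train-00001-of-00041.parquet`.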
When building a new recipe, it is often desirable to test the notebook with an existing dataset from Hugging Face. What is the easiest way to do that?