huggingface · Vaibhavs10 · Feb 3, 2025 · Jan 20, 2025
diff --git a/chapters/en/chapter5/4.mdx b/chapters/en/chapter5/4.mdx
@@ -87,13 +87,13 @@ RAM used: 5678.33 MB
 Here the `rss` attribute refers to the _resident set size_, which is the fraction of memory that a process occupies in RAM. This measurement also includes the memory used by the Python interpreter and the libraries we've loaded, so the actual amount of memory used to load the dataset is a bit smaller. For comparison, let's see how large the dataset is on disk, using the `dataset_size` attribute. Since the result is expressed in bytes like before, we need to manually convert it to gigabytes:
 
 ```py
-print(f"Number of files in dataset : {pubmed_dataset.dataset_size}")
+print(f"Dataset size in bytes : {pubmed_dataset.dataset_size}")
 size_gb = pubmed_dataset.dataset_size / (1024**3)
 print(f"Dataset size (cache file) : {size_gb:.2f} GB")
 ```
 
 ```python out
-Number of files in dataset : 20979437051
+Dataset size in bytes : 20979437051
 Dataset size (cache file) : 19.54 GB
 ```
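
Review note: the conversion in this hunk divides by `1024**3`, so the value labelled "GB" is strictly a binary gigabyte (GiB). The sketch below reproduces the arithmetic on the byte count shown in the output above, without loading the dataset. The helper names `bytes_to_gib` and `bytes_to_gb` are hypothetical and exist only for illustration; they are not part of 🤗 Datasets.

```py
def bytes_to_gib(num_bytes: int) -> float:
    """Convert bytes to binary gigabytes (GiB, 1024**3 bytes each)."""
    return num_bytes / (1024**3)


def bytes_to_gb(num_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (GB, 1000**3 bytes each)."""
    return num_bytes / (1000**3)


# Byte count copied from the output block in the diff above.
size_bytes = 20979437051

print(f"Binary  (1024**3): {bytes_to_gib(size_bytes):.2f} GiB")  # 19.54
print(f"Decimal (1000**3): {bytes_to_gb(size_bytes):.2f} GB")    # 20.98
```

Either unit works as long as the label matches the divisor; the chapter's 19.54 figure corresponds to the binary convention.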