Add BigEarthNet Version2 #2531
torchgeo/datasets/bigearthnet.py
@@ -495,7 +504,7 @@ def _download(self, url: str, filename: Path, md5: str) -> None:
 filename: output filename to write downloaded file
 md5: md5 of downloaded file
 """
- if not os.path.exists(filename):
+ if not os.path.exists(os.path.join(self.root, filename)):
Saw this, and I actually think this change is required, because it should check whether the file already exists in root, right?
Good catch. Actually, this line isn't necessary because we already check if the zipfile exists and extract it before we reach the download. Can you open a separate PR to fix this so we can backport it to 0.6.3?
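For illustration, the difference between the two checks in the diff can be sketched as below. This is a minimal standalone sketch, not torchgeo code: `archive_present` and the archive filename are hypothetical, and a temporary directory stands in for the dataset root.

```python
import os
import tempfile


def archive_present(root: str, filename: str) -> bool:
    """Return True if `filename` exists inside `root`.

    Hypothetical helper for illustration only; not part of torchgeo.
    """
    return os.path.exists(os.path.join(root, filename))


with tempfile.TemporaryDirectory() as root:
    # Placeholder archive name, chosen only for this sketch.
    name = "bigearthnet_sketch_archive.zip"
    open(os.path.join(root, name), "wb").close()

    # A bare relative path is resolved against the current working
    # directory, so this only finds the file if cwd happens to be `root`.
    found_relative = os.path.exists(name)

    # Joining with `root` checks the location the dataset actually uses.
    found_in_root = archive_present(root, name)

# The temporary directory is gone now, so the file is no longer found.
found_after_cleanup = archive_present(root, name)
```

Note that `os.path.join` returns the second argument unchanged when it is already an absolute path, so the joined check also tolerates callers that pass a full path.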
torchgeo/datasets/bigearthnet.py
Returns:
the target label
"""
indices = self.metadata_df.iloc[index]['labels']
I also need to check the labels. In the V1 class it only seems possible to get the 19-label version, because when selecting 43 the labels also get mapped to 19, if I understand correctly.
In Version2 there appear to be only 19 labels in the parquet file. I ran:
unique_labels = self.metadata_df['labels'].explode().unique().tolist()
to get these unique labels:
['Broad-leaved forest', 'Coniferous forest', 'Inland waters', 'Mixed forest', 'Pastures', 'Urban fabric', 'Arable land', 'Industrial or commercial units', 'Land principally occupied by agriculture, with significant areas of natural vegetation', 'Complex cultivation patterns', 'Transitional woodland, shrub', 'Inland wetlands', 'Natural grassland and sparsely vegetated areas', 'Moors, heathland and sclerophyllous vegetation', 'Marine waters', 'Coastal wetlands', 'Beaches, dunes, sands', 'Permanent crops', 'Agro-forestry areas']
So for V2 I will remove the option to specify 43 classes.
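As a rough sketch of what a 19-class target could look like, the label names listed above can be mapped to indices and turned into a multi-hot vector. The alphabetical index order and the `multi_hot` helper are assumptions made for this example, not necessarily how torchgeo orders or encodes the classes.

```python
# The 19 label names reported for the V2 parquet file in the comment above.
# Sorting alphabetically is an assumption made for this sketch.
CLASSES = sorted(
    [
        "Broad-leaved forest",
        "Coniferous forest",
        "Inland waters",
        "Mixed forest",
        "Pastures",
        "Urban fabric",
        "Arable land",
        "Industrial or commercial units",
        "Land principally occupied by agriculture, with significant areas of natural vegetation",
        "Complex cultivation patterns",
        "Transitional woodland, shrub",
        "Inland wetlands",
        "Natural grassland and sparsely vegetated areas",
        "Moors, heathland and sclerophyllous vegetation",
        "Marine waters",
        "Coastal wetlands",
        "Beaches, dunes, sands",
        "Permanent crops",
        "Agro-forestry areas",
    ]
)
CLASS_TO_IDX = {name: idx for idx, name in enumerate(CLASSES)}


def multi_hot(labels: list[str]) -> list[int]:
    """Encode a list of label names as a 0/1 vector over CLASSES.

    Hypothetical helper for illustration only.
    """
    target = [0] * len(CLASSES)
    for name in labels:
        target[CLASS_TO_IDX[name]] = 1
    return target


# Example patch carrying two labels.
example_target = multi_hot(["Pastures", "Mixed forest"])
```

An unknown label name raises a `KeyError` here, which would surface any leftover 43-class names early rather than silently dropping them.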
@ando-shah in case you want to have a look and spot anything, since you already have experience with the dataset.
Supersedes #2371 after talking to Ando.
After taking a look, the new version comes with a metadata.parquet file, which makes data handling quite a bit more straightforward. With a version=2 argument, I felt like there would be many nested if statements, so it is cleaner to do it this way. If there were a similar metadata.parquet file for V1, this could be made more condensed.