
Add BigEarthNet Version2 #2531

Open: wants to merge 18 commits into base: main


nilsleh
Collaborator

@nilsleh nilsleh commented Jan 24, 2025

Supersedes #2371, after talking to Ando.

After taking a look, I found that the new version comes with a metadata.parquet file, which makes data handling quite a bit more straightforward. Adding a version=2 argument to the existing class would have meant many nested if statements, so it felt cleaner to do it this way. If there were a similar metadata.parquet file for V1, the two could be condensed into one.

[Image: example_benv2]
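A minimal sketch of why a single metadata table simplifies the dataset class, assuming each row of metadata.parquet carries a patch ID, a split, and a list of label names. The `patch_id` and `split` column names, and the `MetadataRow` / `MetadataIndexedDataset` names, are hypothetical; only the `labels` column appears in this PR's code. Plain Python lists stand in for the pandas DataFrame so the example runs without extra dependencies.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class MetadataRow:
    # Hypothetical stand-in for one row of metadata.parquet;
    # the real column names may differ.
    patch_id: str
    split: str
    labels: List[str]


class MetadataIndexedDataset:
    """Sketch of a dataset whose sample index comes from one metadata table.

    Every row already carries its split and label names, so no directory
    walking or ``if version == ...`` branching is needed to build the index.
    """

    def __init__(self, rows: Sequence[MetadataRow], split: str = "train") -> None:
        # Keep only the rows of the requested split, in table order.
        self.rows = [row for row in rows if row.split == split]

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, index: int) -> Dict[str, object]:
        # One positional lookup per sample, analogous to metadata_df.iloc[index].
        row = self.rows[index]
        return {"patch_id": row.patch_id, "labels": row.labels}


# Toy rows with hypothetical patch IDs, for illustration only.
rows = [
    MetadataRow("patch_a", "train", ["Pastures"]),
    MetadataRow("patch_b", "test", ["Marine waters"]),
    MetadataRow("patch_c", "train", ["Arable land", "Inland waters"]),
]
ds = MetadataIndexedDataset(rows, split="train")
print(len(ds), ds[1]["patch_id"])  # 2 patch_c
```

With a per-version class, the V1 code path can keep its own file-discovery logic while V2 only reads the table.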

@nilsleh nilsleh marked this pull request as draft January 24, 2025 19:00
@nilsleh nilsleh added this to the 0.7.0 milestone Jan 24, 2025
@github-actions github-actions bot added documentation Improvements or additions to documentation datasets Geospatial or benchmark datasets testing Continuous integration testing labels Jan 24, 2025
@github-actions github-actions bot added the dependencies Packaging and dependencies label Jan 27, 2025
@@ -495,7 +504,7 @@ def _download(self, url: str, filename: Path, md5: str) -> None:
filename: output filename to write downloaded file
md5: md5 of downloaded file
"""
- if not os.path.exists(filename):
+ if not os.path.exists(os.path.join(self.root, filename)):
Collaborator Author

Saw this, and I think this change is actually required, because the check should look for the file inside root, right?

Collaborator

Good catch. Actually, this line isn't necessary because we already check if the zipfile exists and extract it before we reach the download. Can you open a separate PR to fix this so we can backport it to 0.6.3?
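To make the first comment's point concrete, assuming `filename` is a name relative to `root`: `os.path.exists(filename)` alone resolves that name against the current working directory, so it can miss a file already downloaded into `root`. The helper `needs_download` and the archive name below are hypothetical, and per the reply above the check may be redundant in this code path anyway.

```python
import os
import tempfile


def needs_download(root: str, filename: str) -> bool:
    """Return True if ``filename`` is not yet present inside ``root``.

    Joining with ``root`` checks the intended location instead of the
    current working directory.
    """
    return not os.path.exists(os.path.join(root, filename))


with tempfile.TemporaryDirectory() as root:
    name = "example.tar.zst"  # hypothetical archive name
    before = needs_download(root, name)
    # Simulate an earlier download into root by creating an empty file.
    open(os.path.join(root, name), "wb").close()
    after = needs_download(root, name)

print(before, after)  # True False
```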

Returns:
the target label
"""
indices = self.metadata_df.iloc[index]['labels']
Collaborator Author

I also need to check the labels. In the V1 class it only seems possible to get the 19-label version, because selecting 43 classes also gets mapped to 19, if I understand correctly.

Collaborator Author

In Version 2, there appear to be only 19 labels in the parquet file. I ran:

unique_labels = self.metadata_df['labels'].explode().unique().tolist()

to get these unique labels:

['Broad-leaved forest', 'Coniferous forest', 'Inland waters', 'Mixed forest', 'Pastures', 'Urban fabric', 'Arable land', 'Industrial or commercial units', 'Land principally occupied by agriculture, with significant areas of natural vegetation', 'Complex cultivation patterns', 'Transitional woodland, shrub', 'Inland wetlands', 'Natural grassland and sparsely vegetated areas', 'Moors, heathland and sclerophyllous vegetation', 'Marine waters', 'Coastal wetlands', 'Beaches, dunes, sands', 'Permanent crops', 'Agro-forestry areas']

So for V2, I will remove the option to specify 43 classes.
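A stdlib-only sketch of the two steps discussed in this thread: collecting the unique label names (what `explode().unique().tolist()` computes) and turning one sample's label names into a 19-class multi-hot target. The class list is copied from the output above. The names `CLASSES_19`, `unique_labels`, and `multi_hot` are hypothetical, and encoding targets as a multi-hot vector is an assumption about how V2 would build its labels; a real dataset would likely return a tensor rather than a list.

```python
from typing import Dict, Iterable, List, Sequence

# The 19 class names reported above for the V2 metadata.parquet.
CLASSES_19: List[str] = [
    "Broad-leaved forest", "Coniferous forest", "Inland waters", "Mixed forest",
    "Pastures", "Urban fabric", "Arable land", "Industrial or commercial units",
    "Land principally occupied by agriculture, with significant areas of natural vegetation",
    "Complex cultivation patterns", "Transitional woodland, shrub", "Inland wetlands",
    "Natural grassland and sparsely vegetated areas",
    "Moors, heathland and sclerophyllous vegetation", "Marine waters",
    "Coastal wetlands", "Beaches, dunes, sands", "Permanent crops",
    "Agro-forestry areas",
]


def unique_labels(label_lists: Iterable[Sequence[str]]) -> List[str]:
    """Flatten per-sample label lists and deduplicate them.

    dict.fromkeys keeps first-occurrence order, like pandas ``unique``.
    """
    return list(dict.fromkeys(label for labels in label_lists for label in labels))


def multi_hot(labels: Sequence[str], classes: Sequence[str]) -> List[int]:
    """Encode one sample's label names as a 0/1 vector over ``classes``."""
    class_to_idx: Dict[str, int] = {name: i for i, name in enumerate(classes)}
    target = [0] * len(classes)
    for name in labels:
        # Raises KeyError for a label outside the 19-class scheme.
        target[class_to_idx[name]] = 1
    return target


sample = ["Pastures", "Inland waters"]
target = multi_hot(sample, CLASSES_19)
print(sum(target), len(target))  # 2 19
```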

@nilsleh nilsleh marked this pull request as ready for review January 27, 2025 18:49
@nilsleh nilsleh marked this pull request as draft January 27, 2025 19:11
@nilsleh nilsleh marked this pull request as ready for review January 28, 2025 16:16
@nilsleh
Collaborator Author

nilsleh commented Jan 28, 2025

@ando-shah, in case you want to have a look: you already have experience with the dataset, so let me know if you find anything.
