Releases: plantnet/malpolon
v2.1.2
Main changes
- Updated all examples to fix an inference path issue. Previously, PyTorch Lightning overwrote checkpoint_paths with the values contained in the saved checkpoint files. However, the checkpoints provided by the Malpolon team for pure inference purposes contained absolute paths that were incompatible with other people's machines. Now, only relative paths are stored.
- Updated all examples to prevent downloading the model weights twice when running models in inference mode
- Updated the URL and MD5 checksum used to download the glc24_pre_extracted pre-trained weights
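The inference path fix in the first item comes down to storing checkpoint paths relative to a project root instead of as absolute paths. Below is a minimal standard-library sketch of that idea. The helpers `to_portable_path` and `resolve_portable_path` and the example directories are hypothetical and are not part of Malpolon's API.

```python
import os


def to_portable_path(abs_path: str, root: str) -> str:
    """Hypothetical helper: express an absolute checkpoint path relative to a project root."""
    rel = os.path.relpath(abs_path, start=root)
    # Store forward slashes so the saved value is the same on every operating system.
    return rel.replace(os.sep, "/")


def resolve_portable_path(rel_path: str, root: str) -> str:
    """Hypothetical helper: rebuild an absolute path on whichever machine loads the checkpoint."""
    return os.path.normpath(os.path.join(root, *rel_path.split("/")))


# The path is saved on one machine...
stored = to_portable_path("/home/alice/malpolon/examples/outputs/last.ckpt", "/home/alice/malpolon")
# ...and resolved on another one, under a different root directory.
restored = resolve_portable_path(stored, "/data/bob/malpolon")
```

An absolute path saved on the first machine would point to a directory that does not exist on the second. The relative form only assumes that both users share the same layout below the project root.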
Other changes
- Added Malpolon QR code in project resources
v2.1.1
v2.1.0
What's Changed
Main changes
- Added the possibility for users to choose their optimizer(s) and scheduler(s) via their config file:
  - malpolon.models.utils: changed the behavior of check_optimizer() and added check_scheduler() to allow users to input one or several optimizers (and optionally one scheduler per optimizer, possibly with an lr_scheduler_config descriptor) via their config files.
  - malpolon.models.standard_prediction_systems: changed the instantiation of optimizer(s) and scheduler(s) in class GenericPredictionSystem. The class attributes are now lists of instantiated optimizers (respectively, of lr_scheduler_config dictionaries). Updated the behavior of method configure_optimizers() to return a dictionary containing all the optimizers and schedulers (cf. https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.core.LightningModule.html#lightning.pytorch.core.LightningModule.configure_optimizers).
- Updated all examples and added all corresponding unit tests, testing both valid scenarios and edge cases of incorrect user inputs in the config file.
Others
- In the glc24_pre_extracted example: added a habitat version of the dataset, which consists of symbolic links to the species version. Running the habitat example's main script triggers the download of the predictor data (rasters, satellite images, time series).
- Updated split_obs_per_species_frequency() to include more input arguments
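The habitat dataset reuses the species files through symbolic links instead of copies. A minimal sketch of that pattern with `pathlib` follows. The function name `link_dataset` and the directory and file names are hypothetical; Malpolon's actual layout may differ.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def link_dataset(src_dir: Path, dst_dir: Path) -> List[Path]:
    """Mirror every file under src_dir into dst_dir as symbolic links (no data is copied)."""
    created = []
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        if dst.exists() or dst.is_symlink():
            continue  # keep existing files or links untouched
        # A relative target keeps the links valid if both trees are moved together.
        dst.symlink_to(os.path.relpath(src.resolve(), dst.parent.resolve()))
        created.append(dst)
    return created


with tempfile.TemporaryDirectory() as tmp:
    species = Path(tmp, "species")
    (species / "sub").mkdir(parents=True)
    (species / "obs.csv").write_text("surveyId,speciesId\n1,42\n")
    (species / "sub" / "patch.tif").write_bytes(b"\x00\x01")
    links = link_dataset(species, Path(tmp, "habitat"))
    summary = [(p.relative_to(tmp).as_posix(), p.is_symlink(), p.read_bytes()) for p in links]
```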
v2.0.0
What's Changed
Main changes
- Added the GLC24 pre_extracted habitat dataset and example (see PR 58 in the Links section)
- Changed the way checkpoints are loaded: instead of loading the state_dict of the model object, Malpolon now loads the state_dict of the LightningModule. This is a breaking change: the examples had to be updated by removing the replacement of the "model." string in the loaded state_dict.
- Added the possibility to download model weights for any Malpolon model given a URL and a few file paths
- Updated the way checkpoint_path is passed on to models. Added a checkpoint_path attribute to all Malpolon models
  - Updated all examples accordingly
- Added Malpolon as a (local) model provider
  - Created a new module malpolon.models.custom_models, which will host custom models proposed by Malpolon
  - Split classes from geolifeclef2024_multimodal_ensemble.py into glc2024_multimodal_ensemble_model.py and glc2024_pre_extracted_prediction_system.py in custom_models to prevent a circular import from malpolon.models.model_builder after adding Malpolon as a (local) provider
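For context, the step that v2.0.0 removed from the examples looked roughly like the sketch below: a LightningModule that stores its network under an attribute named `model` saves keys such as `model.fc.weight`, which had to be renamed before being loaded into the bare network. The function name and the keys are hypothetical, and this is a prefix-aware variant using `str.removeprefix` (Python 3.9+), shown on a plain dictionary instead of tensors.

```python
from typing import Dict, TypeVar

V = TypeVar("V")


def strip_model_prefix(state_dict: Dict[str, V], prefix: str = "model.") -> Dict[str, V]:
    """Return a copy of state_dict whose keys lose a leading `prefix` (other keys are kept)."""
    # str.removeprefix only touches the start of the key, unlike str.replace,
    # which would also rewrite a "model." occurring in the middle of a key.
    return {key.removeprefix(prefix): value for key, value in state_dict.items()}


lightning_state = {"model.fc.weight": 1.0, "model.fc.bias": 0.5, "backbone.model.conv": 2.0}
inner_state = strip_model_prefix(lightning_state)
```

Loading the LightningModule's own state_dict, as v2.0.0 does, keeps the saved keys and the module attributes aligned, so no renaming is needed.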
Others
- Updated malpolon.data.data_module.export_predict_csv to enable more flexibility when outputting the prediction CSV for a single data point.
- Added GLC24 pre-extracted examples (habitat and species) using the MultiModalEnsemble (MME) model
  - Automatic download of the dataset from Kaggle (depending on the value of the boolean config parameter data.download_data)
  - Automatic download of the model weights from Seafile if not already on disk, via a new model.model_kwargs.pretrained key in the config file. The weights enable users to directly run our MME model on our GLC24_pre_extracted Test set and reach ~30% micro F1-score with ~26% micro precision and ~36% micro recall, as well as ~96% micro AUC.
- Added and updated unit tests for GLC24 pre-extracted examples (habitat and species)
- Added new content in online documentation and tutorial files
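The MME scores quoted above are mutually consistent: micro F1 is the harmonic mean of micro precision and micro recall, and 2 × 0.26 × 0.36 / (0.26 + 0.36) ≈ 0.30. The sketch below computes micro-averaged precision, recall and F1 for multilabel predictions represented as sets of species IDs per survey. It is an illustration of the metric, not Malpolon's metric code, which may rely on a metrics library; the function name and toy data are hypothetical.

```python
from typing import List, Set, Tuple


def micro_prf(y_true: List[Set[int]], y_pred: List[Set[int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over multilabel predictions (sets of species IDs)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of samples.")
    tp = fp = fn = 0
    for true, pred in zip(y_true, y_pred):
        tp += len(true & pred)  # species correctly predicted
        fp += len(pred - true)  # species predicted but absent
        fn += len(true - pred)  # species present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Two surveys: ground-truth species vs. predicted species.
precision, recall, f1 = micro_prf([{1, 2, 3}, {4}], [{1, 2, 5}, {4, 6}])
# Harmonic mean of the reported micro precision and micro recall.
reported_f1 = 2 * 0.26 * 0.36 / (0.26 + 0.36)
```

Micro averaging pools true positives, false positives and false negatives over all surveys and species before dividing, so frequent species weigh more than they would in a macro average.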
Full Changelog: v1.3.0...v2.0.0
v1.3.0
What's Changed
Main changes
- Created a new module malpolon.models.custom_models, which will host custom models proposed by Malpolon
  - Split the datamodule and model from geolifeclef2024_multimodal_ensemble.py into glc2024_multimodal_ensemble_model.py and glc2024_pre_extracted_prediction_system.py in custom_models.
- Added malpolon as a model provider. Currently we only provide the MultiModalEnsemble (MME) model, which can be selected in config files via the "model_name" key as: glc24_multimodal_ensemble (see repository examples/benchmark/geolifeclef/geolifeclef2024_pre_extracted/config/glc24_cnn_multimodal_ensemble.yaml)
- Added the possibility to download model weights for any Malpolon model given a URL and a few file paths via malpolon.standard_prediction_system.download_weights
  - Added model weight download info for the MME model. The example experiment file of MME now automatically downloads the weights from Seafile if not already on disk, via the model.model_kwargs.pretrained key in the config file
- Updated the way checkpoint_path is passed on to models. Added a checkpoint_path attribute to all Malpolon models
  - Updated all examples accordingly
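Downloading weights "given a URL and a few file paths" typically pairs a download-if-missing check with a checksum comparison (the v2.1.2 notes mention an MD5 checksum). The sketch below uses `hashlib` for that check; MD5 serves here as an integrity check, not a security measure. The function names, the injectable `fetch` callable and the example.org URL are hypothetical placeholders; in real use `fetch` could wrap `urllib.request.urlretrieve`.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in chunks so large checkpoints need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_weights(url: str, dest: Path, expected_md5: str,
                   fetch: Callable[[str, Path], None]) -> Path:
    """Download `url` to `dest` unless an intact copy is already on disk, then verify it."""
    if dest.exists() and md5sum(dest) == expected_md5:
        return dest  # already downloaded and intact: skip the network call
    dest.parent.mkdir(parents=True, exist_ok=True)
    fetch(url, dest)
    actual = md5sum(dest)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch for {dest}: expected {expected_md5}, got {actual}")
    return dest


# Stand-in for a real download; it records each call instead of using the network.
calls = []


def fake_fetch(url: str, dest: Path) -> None:
    calls.append(url)
    dest.write_bytes(b"fake weights")


expected = hashlib.md5(b"fake weights").hexdigest()
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, "weights", "model.ckpt")
    for _ in range(2):  # the second call finds the verified file and does not fetch again
        ensure_weights("https://example.org/model.ckpt", target, expected, fake_fetch)
```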
Others
- MME: changed the way the loss parameter loss.pos_weight is used in the model's _step() method so that its state_dict object stays the same before and after running the model in train mode.
- GLC22 examples in benchmark and custom_train have been updated to include an inference run option. This led to changing the return values of the class getter for the test dataset. The class now always returns a {data, label} pair, with label set to -1 for the test dataset (inference run)
  - Updated malpolon/tests/test_examples.py accordingly
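Below is a minimal sketch of the getter convention described above: the dataset always yields a (data, label) pair and uses -1 as a placeholder label when running inference on the test split, so callers can unpack every item the same way. The class name, constructor arguments and tuple return type are assumptions for illustration; Malpolon's GLC22 datasets may differ in their details.

```python
from typing import Any, Optional, Sequence, Tuple

TEST_LABEL = -1  # placeholder label used when ground truth is unavailable (inference run)


class ObservationDataset:
    """Hypothetical dataset whose getter always returns a (data, label) pair."""

    def __init__(self, observations: Sequence[Any],
                 labels: Optional[Sequence[int]] = None, subset: str = "train") -> None:
        if subset != "test" and labels is None:
            raise ValueError("Labels are required for train/val subsets.")
        if labels is not None and len(labels) != len(observations):
            raise ValueError("observations and labels must have the same length.")
        self.observations = observations
        self.labels = labels
        self.subset = subset

    def __len__(self) -> int:
        return len(self.observations)

    def __getitem__(self, index: int) -> Tuple[Any, int]:
        data = self.observations[index]
        if self.subset == "test" or self.labels is None:
            return data, TEST_LABEL
        return data, self.labels[index]


train_ds = ObservationDataset(["obs_a", "obs_b"], labels=[3, 7], subset="train")
test_ds = ObservationDataset(["obs_c"], subset="test")
```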
v1.2.1
v1.2.0
New features
- Datasets
  - Added a new dataset geolifeclef2024_pre_extracted following the 2024 edition of the Kaggle challenge GeoLifeCLEF
  - Computed rolling mean and rolling std values of the GeoLifeCLEF2024 dataset for each modality. These values are stored in this dataset's transform functions
- Models
  - Added a new model "MultimodalEnsemble" in geolifeclef2024_multimodal_ensemble based on @picekl's work on GeoLifeCLEF2024
- Scripts
  - Added new scripts:
    - split_obs_spatially.py: splits a CSV observation dataset into a training and a val subset where val observation plots are spatially separated from training ones. This script uses the newly introduced verde package.
    - sort_files_glc_fashion.sh: re-organizes files in one folder into folders and sub-folders in the same way as for the GeoLifeCLEF challenge. A file named 'ABCDWXYZ.pt' located at 'root_path/' will be moved to 'root_path/YZ/WX/ABCDWXYZ.pt'. Each file name must be at least 3 characters long; for instance, a file named 'XYZ.pt' located at 'root_path/' will be moved to 'root_path/YZ/X/XYZ.pt'.
    - split_obs_per_species_frequency: splits a CSV observation dataset into a training and a val subset based on species frequency
  - Added the split_obs_spatially.py and split_obs_per_species_frequency.py scripts to Malpolon as modules in malpolon.data.utils
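The per-modality mean and std values listed under Datasets can be accumulated in a single pass, without loading a whole modality in memory. The sketch below uses Welford's online algorithm on successive batches of scalar values (for example the pixel values of one modality). It is one possible technique; whether Malpolon computed its statistics this way is an assumption, and the class name is hypothetical.

```python
import math
from typing import Iterable


class RunningStats:
    """Welford's online algorithm: numerically stable running mean and standard deviation."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, values: Iterable[float]) -> None:
        for x in values:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self._m2 += delta * (x - self.mean)  # uses the already-updated mean

    @property
    def std(self) -> float:
        """Population standard deviation (divides by N)."""
        return math.sqrt(self._m2 / self.count) if self.count else float("nan")


stats = RunningStats()
for batch in ([2.0, 4.0, 4.0], [4.0, 5.0, 5.0, 7.0, 9.0]):
    stats.update(batch)
```

Feeding the data batch by batch gives the same result as computing the statistics over the concatenated values, which is what makes this approach suitable for datasets larger than memory.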
Changes
- Renamed scripts folder to toolbox
- Renamed scenarios from {"Ecologists", "Inference", "Kaggle"} to {"Custom_train", "Inference", "Benchmarks"} and re-organized experiments
- Fixed examples-related bugs, file links, duplicate files and cleaned config files
- Updated code documentation, repository READMEs and examples tutorial files
v1.1.0
New features
- New dataset ConcatPatchRasterDataset to handle both satellite image patches and geolocalized rasters in the same model
  - Added example using this new dataset
- Added standalone scripts
  - crop_rasters.py: This script crops a window from raster files based on coordinates and outputs it as a new file.
  - split_obs_per_species_frequency.py: This script splits an obs CSV into val/train subsets based on the frequency of occurrences in the whole dataset. It does NOT perform a spatial split.
  - split_obs_spatially.py: This script splits an obs CSV into val/train subsets based on the observations' geographic locations, using the Verde package.
  - sort_files_glc_fashion.sh: This script re-organizes files in one folder into folders and sub-folders in the same way as for the GeoLifeCLEF challenge. A file named 'ABCDWXYZ.pt' located at 'root_path/' will be moved to 'root_path/YZ/WX/ABCDWXYZ.pt'. Each file name must be at least 3 characters long; for instance, a file named 'XYZ.pt' located at 'root_path/' will be moved to 'root_path/YZ/X/XYZ.pt'.
- Added CIFAR-10 example
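The re-organization rule of sort_files_glc_fashion.sh can be stated as a pure function of the file name: the last two characters of the name (without extension) give the first folder level, and the two characters before them give the second level. The Python sketch below reproduces the two examples given for that script; it is not the shell script itself, the function name is hypothetical, and applying the 3-character minimum to the name without its extension is an assumption.

```python
from pathlib import PurePosixPath


def glc_fashion_path(filename: str, root: str = "root_path") -> str:
    """Hypothetical helper: target path of `filename` in the GeoLifeCLEF folder layout."""
    stem = PurePosixPath(filename).stem
    if len(stem) < 3:  # assumption: the minimum length applies to the name without extension
        raise ValueError(f"File name must be at least 3 characters long: {filename!r}")
    level1 = stem[-2:]    # last two characters, e.g. 'YZ'
    level2 = stem[-4:-2]  # the two characters before them ('WX'), or just one ('X') for 3-char names
    return str(PurePosixPath(root, level1, level2, filename))
```

Because the target folder depends only on the file name, files can be located later without scanning the directory tree.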
Changes
- Harmonized dataset classes' arguments and kwargs
- Reduced redundancy of values in example config files by using variable interpolation
- Changed metric logging parameters for the TensorBoard logger to include more details
- Fixed multilabel inference export for test_dataset
v1.0.3
First release of the Malpolon framework.
Try it out now!
https://pypi.org/project/malpolon/
(Versions 1.0.0 to 1.0.2 do not exist)