
Flood data for Colombia, Nigeria, Sudan and Venezuela has an array of float 1.0 for .date attribute #850

Open
IanHopkinson opened this issue Feb 14, 2024 · 12 comments
Labels
Data API (Related to the Data API, not to this code base directly), task (something that needs to be done)

Comments

@IanHopkinson

IanHopkinson commented Feb 14, 2024

Flood hazard data for Colombia, Nigeria, Sudan and Venezuela has an array of floats (all 1.0) as the .date attribute, which cannot be parsed as dates.

To replicate:

from climada.util.api_client import Client
client = Client()
flood = client.get_hazard("flood", properties={"country_name": "Colombia"})
flood.date

Produces the result:

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1.])

By contrast the same code for Haiti produces the result:

array([731529, 733948, 733021, 732238, 732977, 735649, 731839, 732826,
       736580, 733439, 734601], dtype=int64)
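For context, CLIMADA documents Hazard.date as proleptic Gregorian ordinals (the format of Python's datetime.date.toordinal), which is why the Haiti values are six-digit integers. Below is a minimal stdlib-only sketch, assuming that convention, that decodes the array and flags placeholder values such as 1.0. decode_event_dates and MIN_PLAUSIBLE_YEAR are hypothetical names, and the 1900 cut-off is an arbitrary choice for illustration:

```python
import datetime

# Hypothetical cut-off: anything decoding to a year before this is treated
# as a placeholder rather than a real event date.
MIN_PLAUSIBLE_YEAR = 1900
MAX_ORDINAL = datetime.date.max.toordinal()


def decode_event_dates(ordinals):
    """Decode ordinal dates (day 1 = 0001-01-01) into datetime.date objects.

    Returns None for entries that are not whole numbers in the valid ordinal
    range, or that decode to a year before MIN_PLAUSIBLE_YEAR (such as the
    placeholder 1.0). Accepts plain lists as well as NumPy arrays like flood.date.
    """
    decoded = []
    for value in ordinals:
        value = float(value)
        if not value.is_integer() or value < 1 or value > MAX_ORDINAL:
            decoded.append(None)
            continue
        day = datetime.date.fromordinal(int(value))
        decoded.append(day if day.year >= MIN_PLAUSIBLE_YEAR else None)
    return decoded


print(decode_event_dates([731529, 736580, 1.0]))
```

Returning None rather than raising keeps a batch run over many countries going, which matches the "catch and continue" approach described later in the thread.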
@IanHopkinson IanHopkinson changed the title Flood data for Colombia has an array of float 1.0 for .date attribute Flood data for Colombia, Nigeria and Sudan has an array of float 1.0 for .date attribute Feb 14, 2024
@IanHopkinson IanHopkinson changed the title Flood data for Colombia, Nigeria and Sudan has an array of float 1.0 for .date attribute Flood data for Colombia, Nigeria, Sudan and Venezuela has an array of float 1.0 for .date attribute Feb 14, 2024
@peanutfun
Member

@IanHopkinson Thanks for reporting this. I can confirm that the .date attribute of the flood data retrieved for Colombia, Nigeria and Sudan is all floating-point ones. However, using your code I cannot retrieve data for Venezuela (NoResult error).

As an immediate workaround, you can try casting the dates to ints; this way you should at least not run into value or data type errors. Of course, this will not give you more sensible data, but at least all Climada operations should run smoothly:

flood.date = flood.date.astype("int")
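To show why the cast avoids type errors without producing meaningful dates, here is a small stdlib-only illustration. It assumes the dates are eventually decoded with Python's datetime.date.fromordinal (the ordinal convention CLIMADA uses for Hazard.date). ordinal_to_date is a hypothetical helper that mirrors astype("int") for a single value:

```python
import datetime


def ordinal_to_date(value):
    """Cast a (possibly float) ordinal to int, as astype("int") does, then decode it."""
    return datetime.date.fromordinal(int(value))


# A float ordinal is rejected outright by the datetime API ...
try:
    datetime.date.fromordinal(1.0)
except TypeError as err:
    print("float rejected:", err)

# ... while the cast value is accepted, but ordinal 1 is just 0001-01-01,
# a placeholder rather than a real flood date.
print(ordinal_to_date(1.0))  # 0001-01-01
```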

@emanuel-schmid Do you see a way of updating the datasets and adding the correct date information?

@peanutfun peanutfun added the Data API Related to the Data API, not to this code base directly label Feb 15, 2024
@IanHopkinson
Author

@peanutfun - thanks! Currently I catch the exception, which is specific to my code, to allow operations to continue. For Venezuela I retrieve the data using the iso3alpha code:

flood = client.get_hazard("flood", properties={"country_iso3alpha": "VEN"})

To give you some idea of where these issues are coming from, I'm uploading data to the Humanitarian Data Exchange for the Humanitarian Response Plan countries listed here: https://github.com/OCHA-DAP/hdx-scraper-climada/blob/main/src/hdx_scraper_climada/metadata/countries.csv

I'm working my way through exposures and hazards. So far I've done litpop, crop_production, earthquakes and floods, and next I'm going to do River_flood, Tropical_cyclone and Relative_cropyield.

@emanuel-schmid
Collaborator

@emanuel-schmid Do you see a way of updating the datasets and adding the correct date information?

Sure, I'm gonna give it a try, but I can't tell right away when it will be done.

@peanutfun
Member

@emanuel-schmid Great to hear, thank you! I was mostly wondering if the data is available at all.

@peanutfun
Member

@IanHopkinson Please bear in mind that these datasets are provided on a best-effort basis, with no guarantees whatsoever on correctness or completeness. We see them as "demonstrator" datasets for Climada applications and recommend that users use their own data for specialized applications as much as possible. See the disclaimer on the website of the API service: https://climada.ethz.ch/disclaimer/ In the data types section, you will also find more detailed information on the datasets.

@emanuel-schmid
Collaborator

I was mostly wondering if the data is available at all.

That is indeed a very good question. 🤔

@IanHopkinson
Author

@peanutfun - no problem - that is understood!

@Evelyn-M
Collaborator

@IanHopkinson
There's a two-part answer to this:

  1. General: If you upload data to HDX, please do not use these files but rather the original ones from The Global Flood Database, available at https://global-flood-database.cloudtostreet.ai/, to avoid the data being copied over and over and important metadata (such as source, purpose and methods) getting lost. What we did was collect these files, which are event-based but sometimes span several countries, and regroup them into country-wise files (each covering various events instead). However, other users will not be able to tell where these files come from, how they have been post-processed, etc.

  2. Specific: If some of the dates are missing, you can use the attached file, which collects the metadata of the original cloudtostreet files. Match the id column in the CSV against the event ID in the HDF5 file (this should be event_id) to look up the corresponding date in the CSV. Most files should have been updated correctly, but some metadata may have been lost.
    flood_metainfo.csv
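A minimal stdlib-only sketch of that matching step. Only the id column is named above; the date column name (DATE_COLUMN = "date") and its ISO YYYY-MM-DD format are assumptions to check against the real flood_metainfo.csv. Ids are compared as strings of integers, which assumes the CSV ids are numeric like CLIMADA's event_id. Placeholder dates (<= 1) are replaced by the ordinal of the CSV date, and rows with an empty date are skipped, since some metadata may be missing. load_date_lookup and patch_dates are hypothetical helpers:

```python
import csv
import datetime
import io

ID_COLUMN = "id"        # named in the comment above
DATE_COLUMN = "date"    # hypothetical: check the real CSV header


def load_date_lookup(csv_file):
    """Map event id (as a string) -> date ordinal, skipping rows without a date."""
    lookup = {}
    for row in csv.DictReader(csv_file):
        raw_date = (row.get(DATE_COLUMN) or "").strip()
        if not raw_date:
            continue  # metadata lost for this event
        day = datetime.date.fromisoformat(raw_date)
        lookup[row[ID_COLUMN].strip()] = day.toordinal()
    return lookup


def patch_dates(event_ids, dates, lookup):
    """Replace placeholder dates (<= 1) with the CSV date for the same event id.

    Events without a match keep their original value, cast to int.
    """
    patched = []
    for event_id, current in zip(event_ids, dates):
        key = str(int(event_id))
        if current <= 1 and key in lookup:
            patched.append(lookup[key])
        else:
            patched.append(int(current))
    return patched


# Offline example with an in-memory CSV (made-up ids and dates for illustration):
sample_csv = io.StringIO("id,date\n4001,2017-08-15\n4002,\n")
lookup = load_date_lookup(sample_csv)
print(patch_dates([4001, 4002], [1.0, 1.0], lookup))
```

With a CLIMADA hazard, this would presumably be applied as `flood.date = np.array(patch_dates(flood.event_id, flood.date, lookup))`, assuming NumPy is imported as `np`.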

@IanHopkinson
Author

Thanks @Evelyn-M, that should fix my issue. The link to the original source is also very useful, since I was checking the cloudtostreet.ai website and it was redirecting to floodbase.com.

@peanutfun
Member

@Evelyn-M @emanuel-schmid Do you intend to update the dates in the dataset on the API according to the "metainfo" file you provided? If not, I will close this issue.

@emanuel-schmid
Collaborator

@peanutfun: yes, eventually. But it's not yet clear when. 🤷

@peanutfun
Member

No worries, will leave it open then ✌️

@peanutfun peanutfun added the task something that needs to be done label Jun 17, 2024

4 participants