Ingest of CSV file #118

Open
falkamelung opened this issue Oct 20, 2024 · 5 comments
@falkamelung
Member

Below is an example CSV/XLS file we want to ingest. I'm not sure which is better, CSV or XLS; XLS is handy because you can open and modify it in a spreadsheet. I added the metadata manually. Are these the critical metadata for the ingest to work? This file is produced by different software (sarvey), which starts from miaplpy data products. I am just getting started with this. Once we have decided on the format and confirmed that the ingest works, I will create a Python script to generate these files as part of the sarvey workflow.

The key parameter we have not been able to examine properly is the estimated elevation: if it agrees with the real elevation, the pixel is reliable. I will probably add another column, lidar_elevation; if it exists, the site should display it as well.

(screenshot of the attached example file)

Here the "needed" attributes in hdf5*_2_json_mbtiles.py. Many of them don't seem critical. Can we just say unknown for now? I will add add them once I fine them. But it will be good to make it work with as few needed data as possible.

needed_attributes = {
    "prf", "first_date", "mission", "WIDTH", "X_STEP", "processing_software",
    "wavelength", "processing_type", "beam_swath", "Y_FIRST", "look_direction",
    "flight_direction", "last_frame", "post_processing_method", "min_baseline_perp",
    "unwrap_method", "relative_orbit", "beam_mode", "LENGTH", "max_baseline_perp",
    "X_FIRST", "atmos_correct_method", "last_date", "first_frame", "frame", "Y_STEP", "history",
    "scene_footprint", "data_footprint", "downloadUnavcoUrl", "referencePdfUrl", "areaName", "referenceText",
    "REF_LAT", "REF_LON", "CENTER_LINE_UTC", "insarmaps_download_flag", "mintpy.subset.lalo"
}

SunnyIslesSenA48_20190101-20233110.csv
SunnyIslesSenA48_20190101-20233110.xlsx
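
For reference, a minimal sketch of how such a file could be parsed, assuming the metadata are stored as leading key,value rows ahead of the per-pixel table (an assumption; the attached file's actual layout may differ):

```python
import csv

def read_sarvey_csv(path):
    """Read a CSV with leading 'key,value' metadata rows followed by a
    header row and per-pixel data rows. This layout is an assumption,
    not the confirmed sarvey output format."""
    metadata = {}
    rows = []
    with open(path, newline="") as f:
        header = None
        for row in csv.reader(f):
            if not row:
                continue
            if header is None and len(row) == 2:
                # metadata row: key,value
                metadata[row[0]] = row[1]
            elif header is None:
                # first wide row is taken as the column header
                header = row
            else:
                rows.append(dict(zip(header, row)))
    return metadata, rows
```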

@stackTom
Contributor

I'm confused. So we need to ingest csv files on top of h5 files now?
Can't this data just be put inside the h5 files as extra attributes? We already have a system for ingesting extra attributes.
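
For context, attaching extra metadata to an existing HDF5 file is a one-liner with h5py; a generic sketch (not the project's specific "extra attributes" system, and with a hypothetical file name):

```python
import h5py

# Generic sketch: attach extra metadata as root-level attributes of an
# existing HDF5 file.
with h5py.File("SunnyIslesSenA48.he5", "a") as f:  # hypothetical file name
    f.attrs["areaName"] = "SunnyIsles"
    f.attrs["processing_software"] = "sarvey"
```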

@falkamelung
Member Author

It would be good to have the ability to ingest CSV. The alternative is to convert a CSV into an HDF5EOS file, but this is not smart, as nobody uses HDF5EOS. But I can do this myself; your time is better spent on InSARmaps. We should just keep this in mind when adding the checks to hdf*2json_mbtiles.

I am just not sure which is better: creating a new ingest script (csv_2json_mbtiles.py) or adding a --csv option to the current script. I think I prefer the second, even though the name then no longer matches; see the sketch below.
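
A minimal sketch of what the --csv option could look like, assuming the current script uses argparse; the reader functions are stubs standing in for the existing and proposed code paths, and all names are illustrative, not the actual hdf5_2json_mbtiles.py interface:

```python
import argparse

def read_hdf5eos(path):
    raise NotImplementedError  # stands in for the existing HDF5EOS reader

def read_sarvey_csv(path):
    raise NotImplementedError  # stands in for a CSV reader like the sketch above

def build_parser():
    # Illustrative subset of the CLI; the real hdf5_2json_mbtiles.py
    # arguments may differ.
    parser = argparse.ArgumentParser(
        description="Convert an InSAR product to json/mbtiles")
    parser.add_argument("file", help="input HDF5EOS file (or CSV with --csv)")
    parser.add_argument("--csv", action="store_true",
                        help="treat the input as a sarvey CSV instead of HDF5EOS")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    reader = read_sarvey_csv if args.csv else read_hdf5eos
    metadata, points = reader(args.file)
    # ... downstream json/mbtiles generation is shared by both paths
```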

@stackTom
Contributor

stackTom commented Oct 20, 2024

The second is probably better. I am surprised CSVs are frequently used; see my reply here: #117 (comment)
H5 files seem much better suited to holding this amount of data than CSV files, which are rudimentary and inefficient.

@stackTom
Contributor

stackTom commented Oct 20, 2024

Here the "needed" attributes in hdf5*_2_json_mbtiles.py. Many of them don't seem critical. Can we just say unknown for now? I will add add them once I fine them. But it will be good to make it work with as few needed data as possible.


needed_attributes = {

    "prf", "first_date", "mission", "WIDTH", "X_STEP", "processing_software",

    "wavelength", "processing_type", "beam_swath", "Y_FIRST", "look_direction",

    "flight_direction", "last_frame", "post_processing_method", "min_baseline_perp"

    "unwrap_method", "relative_orbit", "beam_mode", "LENGTH", "max_baseline_perp",

    "X_FIRST", "atmos_correct_method", "last_date", "first_frame", "frame", "Y_STEP", "history",

    "scene_footprint", "data_footprint", "downloadUnavcoUrl", "referencePdfUrl", "areaName", "referenceText",

    "REF_LAT", "REF_LON", "CENTER_LINE_UTC", "insarmaps_download_flag", "mintpy.subset.lalo"

}

SunnyIslesSenA48_20190101-20233110.csv

SunnyIslesSenA48_20190101-20233110.xlsx

Here "needed" doesn't mean "critical or necessary". At the time, I meant "these are the ones we should have on the database on the site". Ambiguous naming, I know.

Off the top of my head, some of the critical ones are scene_footprint and data_footprint; without them the site has no way of showing the swaths. areaName might also be critical. I will just not display the datasets missing this info so the site doesn't crash. I'm just a little confused why some ingests are missing this info now when they haven't been for the past 7-8 years.

@falkamelung
Member Author

Yes, so maybe just separate them into needed_attributes and optional_attributes: if a needed one is missing, the script exits. A sketch is below.
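
A minimal sketch of that split. The partition shown is illustrative; which attributes are truly required still needs to be decided (scene_footprint, data_footprint, and possibly areaName, per the comment above):

```python
import sys

# Illustrative partition of the existing needed_attributes set; the real
# split between required and optional is still to be decided.
needed_attributes = {"scene_footprint", "data_footprint", "areaName"}
optional_attributes = {"prf", "beam_swath", "atmos_correct_method"}  # ...

def validate_attributes(attributes):
    """Exit on missing required attributes; default missing optional ones."""
    missing_needed = needed_attributes - attributes.keys()
    if missing_needed:
        sys.exit(f"Missing required attributes: {sorted(missing_needed)}")
    for attr in optional_attributes - attributes.keys():
        print(f"Warning: optional attribute {attr!r} missing; storing as 'unknown'")
        attributes[attr] = "unknown"
    return attributes
```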
