Ingest of CSV file #118

Open
falkamelung opened this issue Oct 20, 2024 · 5 comments
@falkamelung
Member

Below is an example CSV/XLS file we want to ingest. I'm not sure which is better, CSV or XLS; XLS is handy because you can open and modify it in a spreadsheet. I added the metadata manually. Are these the critical metadata for the ingest to work? This file is produced by different software (sarvey), which starts from miaplpy data products. I am just getting started with this. Once we have decided on the format and confirmed that the ingest works, I will create a Python script to generate these files as part of the sarvey workflow.

The key parameter we have not been able to examine properly is the estimated elevation: if it agrees with the real elevation, the pixel is reliable. I will probably add another column, lidar_elevation; if it exists, the site should display it as well.

(screenshot of the attached example file)

Here the "needed" attributes in hdf5*_2_json_mbtiles.py. Many of them don't seem critical. Can we just say unknown for now? I will add add them once I fine them. But it will be good to make it work with as few needed data as possible.

needed_attributes = {
    "prf", "first_date", "mission", "WIDTH", "X_STEP", "processing_software",
    "wavelength", "processing_type", "beam_swath", "Y_FIRST", "look_direction",
    "flight_direction", "last_frame", "post_processing_method", "min_baseline_perp",
    "unwrap_method", "relative_orbit", "beam_mode", "LENGTH", "max_baseline_perp",
    "X_FIRST", "atmos_correct_method", "last_date", "first_frame", "frame", "Y_STEP", "history",
    "scene_footprint", "data_footprint", "downloadUnavcoUrl", "referencePdfUrl", "areaName", "referenceText",
    "REF_LAT", "REF_LON", "CENTER_LINE_UTC", "insarmaps_download_flag", "mintpy.subset.lalo"
}

SunnyIslesSenA48_20190101-20233110.csv
SunnyIslesSenA48_20190101-20233110.xlsx
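
For reference, a minimal sketch of how such a file could be parsed, assuming the metadata are stored as leading key,value rows ahead of the per-pixel table (an assumption; the attached file's actual layout may differ):

```python
import csv

def read_sarvey_csv(path):
    """Read a CSV with leading 'key,value' metadata rows followed by a
    header row and per-pixel data rows. This layout is an assumption,
    not the confirmed sarvey output format."""
    metadata = {}
    rows = []
    with open(path, newline="") as f:
        header = None
        for row in csv.reader(f):
            if not row:
                continue
            if header is None and len(row) == 2:
                # metadata row: key,value
                metadata[row[0]] = row[1]
            elif header is None:
                # first wide row is taken as the column header
                header = row
            else:
                rows.append(dict(zip(header, row)))
    return metadata, rows
```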

@stackTom
Contributor

I'm confused. So we need to ingest csv files on top of h5 files now?
Can't this data just be put inside the h5 files as extra attributes? We already have a system for ingesting extra attributes.
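
For context, attaching extra metadata to an existing HDF5 file is a one-liner with h5py; a generic sketch (not the project's specific "extra attributes" system, and with a hypothetical file name):

```python
import h5py

# Generic sketch: attach extra metadata as root-level attributes of an
# existing HDF5 file.
with h5py.File("SunnyIslesSenA48.he5", "a") as f:  # hypothetical file name
    f.attrs["areaName"] = "SunnyIsles"
    f.attrs["processing_software"] = "sarvey"
```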

@falkamelung
Member Author

It would be good to have the ability to ingest CSV. The alternative is to convert a CSV into an HDF5EOS file, but this is not smart, as nobody uses HDF5EOS. But I can do this myself; your time is better spent on InSARmaps. We should just keep this in mind when adding the checks to hdf*2json_mbtiles.

I am just not sure which is better: creating a new ingest script (csv_2json_mbtiles.py) or adding a --csv option to the current script. I think I prefer the second, even though the name then no longer matches; see the sketch below.
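
A minimal sketch of what the --csv option could look like, assuming the current script uses argparse; the reader functions are stubs standing in for the existing and proposed code paths, and all names are illustrative, not the actual hdf5_2json_mbtiles.py interface:

```python
import argparse

def read_hdf5eos(path):
    raise NotImplementedError  # stands in for the existing HDF5EOS reader

def read_sarvey_csv(path):
    raise NotImplementedError  # stands in for a CSV reader like the sketch above

def build_parser():
    # Illustrative subset of the CLI; the real hdf5_2json_mbtiles.py
    # arguments may differ.
    parser = argparse.ArgumentParser(
        description="Convert an InSAR product to json/mbtiles")
    parser.add_argument("file", help="input HDF5EOS file (or CSV with --csv)")
    parser.add_argument("--csv", action="store_true",
                        help="treat the input as a sarvey CSV instead of HDF5EOS")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    reader = read_sarvey_csv if args.csv else read_hdf5eos
    metadata, points = reader(args.file)
    # ... downstream json/mbtiles generation is shared by both paths
```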

@stackTom
Contributor

stackTom commented Oct 20, 2024

The second is probably better. I am surprised CSVs are frequently used; see my reply here: #117 (comment)
H5 files seem much better suited to holding this amount of data than CSV files, which are rudimentary and inefficient.

@stackTom
Contributor

stackTom commented Oct 20, 2024

Here the "needed" attributes in hdf5*_2_json_mbtiles.py. Many of them don't seem critical. Can we just say unknown for now? I will add add them once I fine them. But it will be good to make it work with as few needed data as possible.


needed_attributes = {

    "prf", "first_date", "mission", "WIDTH", "X_STEP", "processing_software",

    "wavelength", "processing_type", "beam_swath", "Y_FIRST", "look_direction",

    "flight_direction", "last_frame", "post_processing_method", "min_baseline_perp"

    "unwrap_method", "relative_orbit", "beam_mode", "LENGTH", "max_baseline_perp",

    "X_FIRST", "atmos_correct_method", "last_date", "first_frame", "frame", "Y_STEP", "history",

    "scene_footprint", "data_footprint", "downloadUnavcoUrl", "referencePdfUrl", "areaName", "referenceText",

    "REF_LAT", "REF_LON", "CENTER_LINE_UTC", "insarmaps_download_flag", "mintpy.subset.lalo"

}

SunnyIslesSenA48_20190101-20233110.csv

SunnyIslesSenA48_20190101-20233110.xlsx

Here "needed" doesn't mean "critical or necessary". At the time, I meant "these are the ones we should have on the database on the site". Ambiguous naming, I know.

Off the top of my head, some of the critical ones are scene_footprint and data_footprint; without them the site has no way of showing the swaths. areaName might also be critical. I will just not display the datasets missing this info so the site doesn't crash. I'm just a little confused why some ingests are missing this info now when they haven't been for the past 7-8 years.

@falkamelung
Member Author

Yes, so maybe just separate them into needed_attributes and optional_attributes: if a needed one is missing, the script exits. A sketch is below.
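
A minimal sketch of that split. The partition shown is illustrative; which attributes are truly required still needs to be decided (scene_footprint, data_footprint, and possibly areaName, per the comment above):

```python
import sys

# Illustrative partition of the existing needed_attributes set; the real
# split between required and optional is still to be decided.
needed_attributes = {"scene_footprint", "data_footprint", "areaName"}
optional_attributes = {"prf", "beam_swath", "atmos_correct_method"}  # ...

def validate_attributes(attributes):
    """Exit on missing required attributes; default missing optional ones."""
    missing_needed = needed_attributes - attributes.keys()
    if missing_needed:
        sys.exit(f"Missing required attributes: {sorted(missing_needed)}")
    for attr in optional_attributes - attributes.keys():
        print(f"Warning: optional attribute {attr!r} missing; storing as 'unknown'")
        attributes[attr] = "unknown"
    return attributes
```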
