Issue in running the tracking algorithm for CM1 output #75

ealucy · 2023-12-05T20:08:47Z

My goal is to track vorticity features in tropical cyclone model output from CM1. When I run the code, I get an error, and the algorithm does not seem to see the files in the directory where I have stored them:

(aug23_env) el381212@turing:/nfs/tcdynasty/lucy$ python run_generic_tracking.py config.yml
2023-12-05 20:01:29,557 - pyflextrkr.idfeature_driver - INFO - Identifying features from raw data
2023-12-05 20:01:30,181 - pyflextrkr.idfeature_driver - INFO - Total number of files to process: 0
2023-12-05 20:01:30,184 - pyflextrkr.idfeature_driver - INFO - Done with features from raw data.
2023-12-05 20:01:30,184 - pyflextrkr.tracksingle_driver - INFO - Tracking sequential pairs of idfeature files
2023-12-05 20:01:30,185 - pyflextrkr.tracksingle_driver - INFO - Total number of files to process: 0
2023-12-05 20:01:30,185 - pyflextrkr.tracksingle_driver - INFO - Done with tracking sequential pairs of idfeature files
2023-12-05 20:01:30,185 - pyflextrkr.gettracks - INFO - Tracking features sequentially from single track files
2023-12-05 20:01:30,186 - pyflextrkr.gettracks - INFO - Total number of files to process: 0
Traceback (most recent call last):
  File "/nfs/tcdynasty/lucy/run_generic_tracking.py", line 61, in <module>
    tracknumbers_filename = gettracknumbers(config)
                            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/nfs/knight/mamba_aug23/envs/aug23_env/lib/python3.11/site-packages/pyflextrkr/gettracks.py", line 74, in gettracknumbers
    logger.debug(f"files[0]: {files[0]}")
                              ~~~~~^^^
IndexError: list index out of range

I'll also attach my config file:

# ERA5 vorticity anomaly tracking configuration file

# Identify features to track
run_idfeature: True
# Track single consecutive feature files
run_tracksingle: True
# Run tracking for all files
run_gettracks: True
# Calculate feature statistics
run_trackstats: True
# Link merge/split tracks
run_mergesplit: True
# Map tracking to pixel files
run_mapfeature: True

# Start/end date and time
startdate: '20000101_000004'
enddate: '20000101_000008'

# Parallel processing set up
# run_parallel: 1 (local cluster), 2 (Dask MPI)
run_parallel: 1
nprocesses: 32 # Number of processors to use if run_parallel=1

databasename: 'cm1out_'
#databasename: ERA5_SFvortPV_

# Specify date/time string format in the file name
# E.g., radar_20181101.011503.nc --> yyyymodd.hhmmss
# E.g., wrfout_2018-11-01_01:15:00 --> yyyy-mo-dd_hh:mm:ss
time_format: 'yyyymodd_hhmmss'

# Input files directory
clouddata_path: '/nfs/tcdynasty/lucy/cm1/'
# Working directory for the tracking data
root_path: '/nfs/tcdynasty/lucy/cm1_tracking/'
# root_path: '/pscratch/sd/j/jmarquis/ERA5_waccem/Bandpassed/'

# Working sub-directory names
tracking_path_name: 'vtracking'
stats_path_name: 'vortstats'
pixel_path_name: 'vortracking'

# Specify types of feature being tracked
# This adds additional feature-specific statistics to be computed
feature_type: 'generic'

# Specify data structure
datatimeresolution: 1/3600 # hours
pixel_radius: .015625 # km
x_dimname: 'ni'
y_dimname: 'nj'
time_dimname: 'time'
time_coordname: 'time'
x_coordname: 'x'
y_coordname: 'y'
field_varname: 'rel_vort'

# Feature detection parameters
label_method: 'skimage.watershed'
# peak_local_max params:
plm_min_distance: 15 # min_distance - distance buffer between maxima; num grid points
plm_exclude_border: 5 # exclude_border - distance buffer between maxima and the domain sides; num grid points
plm_threshold_abs: 0 # threshold_abs - minimum magnitude of PSI' required to define a maxima
# watershed params:
cont_thresh: 0.00002 # PSI' contour defining outermost of flood-filled object area
compa: 0 #"compactness factor" - (how much you'll let a flood fill spread into a neighbor's domain. Zero or < 100 seemed ok.)
# field_thresh: [1.6, 1000] # variable thresholds
min_size: .1 # Min area to define a feature (km^2)
R_earth: 6378.0 # Earth radius (km)

# Tracking parameters
timegap: 1/3600 # hour
othresh: 0.3 # overlap percentage threshold
maxnclouds: 100 # Maximum number of features in one snapshot
nmaxlinks: 10 # Maximum number of overlaps that any single feature can be linked to
duration_range: [6, 800] # A vector [minlength,maxlength] to specify the duration range for the tracks
# Flag to remove short-lived tracks [< min(duration_range)] that are not mergers/splits with other tracks
# 0:keep all tracks; 1:remove short tracks
remove_shorttracks: 1
# Set this flag to 1 to write a dense (2D) trackstats netCDF file
# Note that for datasets with lots of tracks, the memory consumption could be very large
trackstats_dense_netcdf: 1
# Minimum time difference threshold to match track stats with cloudid pixel files
match_pixel_dt_thresh: 60.0 # seconds

# Link merge/split parameters to main tracks
maintrack_area_thresh: .1 # [km^2] Main track area threshold
maintrack_lifetime_thresh: 60/3600 # [hour] Main track duration threshold
split_duration: 30/3600 # [hour] Split tracks <= this length is linked to the main tracks
merge_duration: 30/3600 # [hour] Merge tracks <= this length is linked to the main tracks

# Define tracked feature variable names
feature_varname: 'feature_number'
nfeature_varname: 'nfeatures'
featuresize_varname: 'npix_feature'

# Track statistics output file dimension names
tracks_dimname: 'tracks'
times_dimname: 'times'
fillval: -9999

# Output file base names
finalstats_filebase: 'trackstats_final_'
pixeltracking_filebase: 'vort_tracks_'

# List of variable names to pass from input to tracking output data
pass_varname:
  - 'rel_vort'

All the files are housed in the /nfs/tcdynasty/lucy/cm1/ directory, but it seems to me that they're not being found by the code. Any assistance is much appreciated!

feng045 · 2023-12-07T23:19:51Z

Based on your config, the code would be searching for input files like this:
/nfs/tcdynasty/lucy/cm1/cm1out_yyyymodd_hhmmss.nc

And the files' dates/times must fall within this range:
startdate: '20000101_000004'
enddate: '20000101_000008'

You should check to make sure that matches your input files.
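One way to run that check outside the tracker is sketched below. This is not PyFLEXTRKR's own file search; the helper names `parse_file_time` and `files_in_window`, the `.nc` suffix default, and the mapping of `'yyyymodd_hhmmss'` to the `strptime` format `%Y%m%d_%H%M%S` are assumptions made for illustration. It lists the files in a directory that start with the base name and keeps those whose timestamp, parsed at full seconds precision, lies between the start and end dates.

```python
from datetime import datetime
from pathlib import Path

# Assumed strptime equivalent of the config's time_format 'yyyymodd_hhmmss'
STRPTIME_FORMAT = "%Y%m%d_%H%M%S"


def parse_file_time(filename, basename="cm1out_", suffix=".nc"):
    """Return the datetime encoded in a name like cm1out_20000101_000004.nc."""
    name = Path(filename).name
    if not (name.startswith(basename) and name.endswith(suffix)):
        raise ValueError(f"unexpected file name: {name}")
    # Strip the base name and the suffix, leaving only the timestamp string
    stamp = name[len(basename):-len(suffix)]
    return datetime.strptime(stamp, STRPTIME_FORMAT)


def files_in_window(directory, startdate, enddate, basename="cm1out_", suffix=".nc"):
    """List files in `directory` whose timestamp lies in [startdate, enddate]."""
    start = datetime.strptime(startdate, STRPTIME_FORMAT)
    end = datetime.strptime(enddate, STRPTIME_FORMAT)
    candidates = sorted(Path(directory).glob(f"{basename}*{suffix}"))
    return [
        path
        for path in candidates
        if start <= parse_file_time(path, basename, suffix) <= end
    ]
```

Calling `files_in_window('/nfs/tcdynasty/lucy/cm1/', '20000101_000004', '20000101_000008')` should return a non-empty list if the directory, base name, and time format in the config all line up with the files on disk.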

ealucy · 2023-12-08T01:20:18Z

Yes, they match. The files are titled like this: 'cm1out_20000101_000004.nc'. Curious!

feng045 · 2023-12-08T05:21:54Z

I just realized that your startdate and enddate only differ by 4 seconds. The code calculates the date/times from your filenames (hence the specified datetime format 'yyyymodd_hhmmss' in the config), and then only keeps those that fall within your specified startdate and enddate for processing.

What do your file names look like? Can you post the full list of file names here?

ealucy · 2023-12-08T20:26:09Z

Yes, that is correct. This is the file list:
cm1out_20000101_000004.nc
cm1out_20000101_000005.nc
cm1out_20000101_000006.nc
cm1out_20000101_000007.nc
cm1out_20000101_000008.nc
There are only these five files, as the entire dataset is not housed locally. I was hoping to test the tracker on these few to get an idea of how it works before attempting to do so on the entire dataset.

feng045 · 2023-12-08T21:08:50Z

I think I may know why. The function in PyFLEXTRKR that converts input file datetimes does not use the digits down to seconds precision. See the code at this line.

You can try making a larger datetime window that includes all the files you have, e.g.,
startdate: '20000101_000000'
enddate: '20000101_001000'
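To illustrate how dropping the seconds could lead to "Total number of files to process: 0", here is a small model of the behavior described above. It is not the PyFLEXTRKR code: `file_time_without_seconds` is a hypothetical parser, and the sketch assumes the start and end dates keep their seconds while the file times do not. Under that assumption all five files collapse to 00:00:00, which falls before the original start date, so nothing is kept, while the wider window keeps all five.

```python
from datetime import datetime

# The five file names listed earlier in this thread
FILES = [f"cm1out_20000101_00000{s}.nc" for s in range(4, 9)]


def file_time_without_seconds(filename, basename="cm1out_"):
    """Hypothetical parser that ignores the seconds digits of yyyymodd_hhmmss."""
    stamp = filename[len(basename):len(basename) + 15]  # e.g. '20000101_000004'
    # Parse only 'yyyymodd_hhmm'; the seconds field defaults to 0
    return datetime.strptime(stamp[:13], "%Y%m%d_%H%M")


def count_in_window(startdate, enddate):
    """Count files whose truncated time lies in [startdate, enddate]."""
    start = datetime.strptime(startdate, "%Y%m%d_%H%M%S")
    end = datetime.strptime(enddate, "%Y%m%d_%H%M%S")
    return sum(start <= file_time_without_seconds(f) <= end for f in FILES)


print(count_in_window("20000101_000004", "20000101_000008"))  # prints 0
print(count_in_window("20000101_000000", "20000101_001000"))  # prints 5
```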

ealucy closed this as not planned Dec 8, 2023

ealucy reopened this Dec 8, 2023
