Revisit xmls - cdscan optimization and investigation of psql as alternative #1059

Open
durack1 opened this issue Feb 21, 2015 · 6 comments

@durack1
Member

durack1 commented Feb 21, 2015

As discussed earlier by @doutriaux1 @williams13 @painter1, there is a need to revisit cdscan, as it has become difficult to maintain.

As noted in #124, in addition to cdscan, the psql package by Charlie O'Connor (https://github.com/UV-CDAT/uvcdat/tree/master/Packages/psql) may also be worth revisiting when rethinking xmls.

@doutriaux1 @williams13 @painter1 - I think it's better to have a dedicated issue here, rather than tacking comments onto a bugfix issue for cdscan.

@doutriaux1 is 2.4 a likely milestone?

@durack1
Member Author

durack1 commented Jul 17, 2015

@dnadeau4 @doutriaux1 I've been running cdscan across most of the local CMIP5 archive and have some interesting errors and warnings to share. Some of these will be further documented at durack1/cmip5#6.

I'd be happy to discuss insights and provide feedback on any changes you're proposing.

@durack1
Member Author

durack1 commented Apr 24, 2016

It would be great to also add the ability to generate aliases for coordinate variables, that is, to extend the existing -a alias_file functionality so that coordinates can be renamed to time, lat, lon, etc.

This would allow data that is not CF-compliant to be remapped in a CF-compliant xml file.
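A minimal sketch of what such a coordinate alias step might look like, assuming an alias file with two whitespace-separated fields per line (original name, then alias). The helpers `parse_aliases` and `rename_coords` are hypothetical names used for illustration only; they are not part of cdscan:

```python
def parse_aliases(text):
    """Parse 'original alias' pairs, one per line; blank lines and # comments are skipped."""
    aliases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError("expected 'name alias', got: %r" % line)
        original, alias = fields
        aliases[original] = alias
    return aliases


def rename_coords(dim_names, aliases):
    """Return dimension names with aliases applied; unknown names pass through unchanged."""
    return [aliases.get(name, name) for name in dim_names]


# Hypothetical alias file contents for a dataset with non-CF coordinate names.
alias_text = """
TIME time
LATITUDE lat
LONGITUDE lon
"""
aliases = parse_aliases(alias_text)
print(rename_coords(["TIME", "depth", "LATITUDE", "LONGITUDE"], aliases))
# prints ['time', 'depth', 'lat', 'lon']
```

Names without an alias are left untouched, so an alias file would only need to list the coordinates that deviate from the CF-style names.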

@covey1

covey1 commented May 27, 2016

PS Paul has encouraged me to add a cdscan "bug" encountered on sierra.llnl.gov (a Linux cluster at the Livermore Computing facility). I find that cdscan can scan over netCDF files and produce a set of XMLs with no problem, but it cannot go to the next level and scan over those XMLs:

bash-4.1$ date
Fri May 27 16:30:44 PDT 2016
bash-4.1$ pwd
/g/g18/covey1/CMIP5/Tides/OtherFields/GridpointTimeseries/CMORPH
bash-4.1$ ls -l CMORPH_V1.0_ADJ_0.25degHLYJul2[1-3]Z2008composite.xml
-rw------- May 25 09:48 CMORPH_V1.0_ADJ_0.25degHLYJul21Z2008composite.xml
-rw------- May 25 09:48 CMORPH_V1.0_ADJ_0.25degHLYJul22Z2008composite.xml
-rw------- May 25 09:48 CMORPH_V1.0_ADJ_0.25degHLYJul23Z2008composite.xml
bash-4.1$ cdscan -x testcase.xml CMORPH_V1.0_ADJ_0.25degHLYJul2[1-3]Z2008composite.xml
Finding common directory ...
Traceback (most recent call last):
  File "/usr/gapps/uvcdat/chaos_5_x86_64_ib/default/2015-04-28/bin/cdscan", line 1680, in <module>
    main(sys.argv)
  File "/usr/gapps/uvcdat/chaos_5_x86_64_ib/default/2015-04-28/bin/cdscan", line 834, in main
    for t0, t1, lev0, lev1, path in slicelist:
ValueError: too many values to unpack
bash-4.1$
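
For context, this ValueError is what Python raises when a loop unpacks each item into five names but an item holds more than five values. The sketch below reproduces only that mechanism; the six-element entry is a made-up example, and what cdscan actually stores in slicelist when its inputs are XMLs has not been checked here:

```python
def paths_from(slicelist):
    """Unpack (t0, t1, lev0, lev1, path) entries, as the loop in the traceback does."""
    return [path for t0, t1, lev0, lev1, path in slicelist]


# Five-element entries unpack cleanly.
good = [(0, 24, None, None, "a.nc")]
print(paths_from(good))

# Hypothetical entry with an extra field: unpacking into five names fails.
bad = [(0, 24, None, None, "a.nc", "extra")]
try:
    paths_from(bad)
except ValueError as err:
    print("ValueError:", err)
```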

@durack1
Member Author

durack1 commented May 27, 2016

Just for clarity, the specific issue that @covey1 has hit (and that I have also encountered previously) involves very high resolution data (hourly, 0.1 degree spatial resolution), on which cdscan is unable to complete.

To work around this, he created xmls that each span a subset of the high frequency data, and then tried to cdscan that set of xmls so that the full data record is available.

We can both provide a test case of this high resolution data when work begins on revising cdscan.

@doutriaux1
Contributor

I know about the xml-of-xmls issue: it gets confused between full and relative paths.
@dnadeau4 we need to revive Psql (pcmdi sql); it was so much better.

@durack1
Member Author

durack1 commented Jun 9, 2016

@dnadeau4 @doutriaux1 this is something else to consider: the docs for netCDF4.MFDataset are here (thanks to @jkrasting for pointing this out).

Here are some other links:

http://stackoverflow.com/questions/20340977/using-mfdataset-to-combine-netcdf-files-in-python
http://stackoverflow.com/questions/23116570/how-to-use-mfdataset-to-read-multiple-files-in-opendap-dataset-with-python-netcd

From: John Krasting - NOAA Federal
Date: Thursday, June 9, 2016 at 12:31 PM
To: Paul Durack
Subject: Quick cdms2 question ...

Hi Paul - 

Hope all is well.  I have a quick cdms2 question.  Right now, I use cdscan to aggregate files using wildcards (e.g.):

   cdscan -x some.xml files*.nc

Is it possible to do this directly in python/cdms2 and somehow skip the cdscan step:

   f = cdms2.open('files*.nc')

The vanilla netCDF4.MFDataset supports this and it is very very convenient when trying to do cgi server-side scripting:

  f = netCDF4.MFDataset('files*.nc')

Any ideas how I can open multiple files, use cdms2 data types, all without a separate cdscan step?  Any possibility of somehow combining MFDataset with cdms2 in my scripts? 

Thanks,
John
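
One possible shape for this, sketched under assumptions: expand the wildcard with glob and hand the sorted file list to netCDF4.MFDataset. The helper names `expand_files` and `open_aggregate` are hypothetical. Note that this returns netCDF4 objects rather than cdms2 data types, so it only covers the "skip the cdscan step" half of the question, and as far as I know MFDataset aggregates along the unlimited dimension by default and does not accept files in NETCDF4 (non-classic) format:

```python
import glob


def expand_files(pattern):
    """Expand a shell-style wildcard into a sorted list of paths."""
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError("no files match %r" % pattern)
    return files


def open_aggregate(pattern):
    """Open all matching files as one aggregated dataset (requires the netCDF4 package)."""
    # Imported lazily so that expand_files can be used without netCDF4 installed.
    from netCDF4 import MFDataset
    return MFDataset(expand_files(pattern))
```

Usage would then look like `f = open_aggregate('files*.nc')`, with variables read from `f.variables` as for a single netCDF4 dataset.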


5 participants