1 parent 5e867ef, commit 8886a86. Showing 6 changed files with 336 additions and 6 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
# Working With CAOM2 | ||
|
||
For observations to appear in [CADC search services](http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/), an observation must first be described by a CAOM record. That description will then need to be loaded into the CADC CAOM repository, using a CADC web service. This web service will create a corresponding database record. | ||
|
||
Once an Observation has been described and loaded, it is searchable from CADC's UI. | ||
|
||
* If you are interested in using CADC Python Data Engineering tools, you should start [here](./user/cli_description.md). | ||
|
||
* If you are interested in scripting with the CADC Python Data Engineering tools, you should start [here](./user/script_description.md). | ||
|
||
## Preconditions | ||
|
||
1. These descriptions assume: | |
   1. a working knowledge of Python. [Prefer python3, please](https://pythonclock.org/), | |
   1. a linux-type environment, | |
   1. a working directory location, where all files discussed are placed, and | |
   1. that you have a [CADC account](http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/en/auth/request.html), which is configured by CADC to have read and write access to a CAOM `COLLECTION`. | |
|
||
1. This description uses the parameters `TEST_FILE.FITS`, `TEST_OBS.XML` and `COLLECTION`. Replace these values appropriately when executing the commands. | ||
|
||
1. Copy the file `TEST_FILE.FITS` into the working directory. The metadata in this file will be described in the CAOM Observation created during this example. | |
|
||
1. The example will cause an instance to be created in the [CAOM2 sandbox](http://sc2.canfar.net/search/). The CAOM2 sandbox is a site that mimics the production CADC CAOM2 storage service and search UI. It allows developers and scientists to debug collection-specific code for creating and updating CAOM2 Observations, and to immediately view CAOM2 records as they will appear to users in the search interface. If you open the CAOM2 sandbox URL before the first CAOM instance for a `COLLECTION` has been created, that `COLLECTION` will not be listed under `Additional Constraints -> Collection`. Even after successful creation of a CAOM instance, it can take up to one day for the `COLLECTION` to be selectable from the UI. | |
|
||
To use production CADC services, remove `resource-id` parameters in `caom2-repo` commands. | ||
|
||
1. Install the following python dependencies: | ||
|
||
``` | ||
pip install caom2repo | ||
pip install caom2utils | ||
``` | ||
1. Get credentials organized. The examples assume the use of a [./.netrc file](https://www.systutorials.com/docs/linux/man/5-netrc/). The examples expect this file to be named `./.netrc`, located in the working directory, and with permissions set to `-rw-------`. The `./.netrc` file content should include the following, with `canfarusername` and `canfarpassword` replaced with your CADC username and password values: | |
```` | ||
machine www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca login canfarusername password canfarpassword | ||
machine www.canfar.net login canfarusername password canfarpassword | ||
machine ws-cadc.canfar.net login canfarusername password canfarpassword | ||
machine sc2.canfar.net login canfarusername password canfarpassword | ||
```` | ||
To set the `./.netrc` file permissions: | ||
``` | ||
chmod 600 ./.netrc | ||
``` | ||
1. The caom2-repo client also supports username/password and X509 certificates. If you want to use X509 certificates, use the `--cert` parameter instead of the `-n` parameter in all the commands. The command line client `cadc-get-cert` is installed with the prerequisites for the `caom2repo` package, and `cadc-get-cert --help` from a terminal prompt will describe how to obtain a CADC certificate. | |
1. Test the install. Commands are case-sensitive. | ||
``` | ||
caom2-repo read --netrc ./.netrc --resource-id ivo://cadc.nrc.ca/sc2repo COLLECTION abc | ||
``` | ||
If the install was successful, this will report an error: | ||
``` | ||
Client Error: Not Found for url: http://sc2.canfar.net/sc2repo/auth-observations/COLLECTION/abc. | ||
``` | ||
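Before running the install test, the credentials file can be sanity-checked from Python. The following is a minimal sketch using only the standard library; the function name `check_netrc` is hypothetical, and the host list is simply copied from the `./.netrc` example above.

```python
import netrc
import os
import stat

# Hosts copied from the ./.netrc example in this document.
EXPECTED_HOSTS = (
    'www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca',
    'www.canfar.net',
    'ws-cadc.canfar.net',
    'sc2.canfar.net',
)


def check_netrc(path, expected_hosts=EXPECTED_HOSTS):
    """Return a list of problems found with the netrc file at `path`.

    Hypothetical helper: it checks the permission bits and that every
    expected host has an entry. It does not contact any CADC service.
    """
    problems = []
    # Compare the permission bits with -rw------- (0o600).
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode != 0o600:
        problems.append(f'permissions are {oct(mode)}, expected 0o600')
    # netrc.netrc parses the file; .hosts maps machine name to credentials.
    hosts = netrc.netrc(path).hosts
    for host in expected_hosts:
        if host not in hosts:
            problems.append(f'no entry for {host}')
    return problems
```

Calling `check_netrc('./.netrc')` from the working directory returns an empty list when the file has the expected permissions and one entry per host.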
## Troubleshooting | ||
1. If `pip install caom2utils` fails with the following error: | ||
``` | ||
AttributeError: module 'enum' has no attribute 'IntFlag' | |
``` | ||
Ensure the version of vos is >= 3.1.1: | ||
``` | ||
pip list | grep vos | ||
``` | ||
Upgrade vos if necessary: | ||
``` | ||
pip install --upgrade vos | ||
``` | ||
Uninstall `enum34`, the package raising the AttributeError: | |
``` | ||
pip uninstall enum34 | ||
``` | ||
Then retry the `caom2utils` install: | ||
``` | ||
pip install caom2utils | ||
``` |
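The same checks can be scripted. This is a rough sketch, assuming Python 3.8 or later for `importlib.metadata`; the helper names `parse_version`, `installed_version` and `diagnose` are hypothetical, and the version parsing is deliberately simplistic rather than a full PEP 440 implementation.

```python
from importlib import metadata


def parse_version(text):
    """Turn a dotted version such as '3.1.1' into a tuple of integers.

    Parsing stops at the first piece with no leading digits, and
    suffixes such as 'rc1' are ignored. This is a simplification.
    """
    parts = []
    for piece in text.split('.'):
        digits = ''
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string of `package`, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def diagnose(vos_version, enum34_version, minimum=(3, 1, 1)):
    """Suggest the pip commands from this section that still apply."""
    suggestions = []
    if vos_version is not None and parse_version(vos_version) < minimum:
        suggestions.append('upgrade vos: pip install --upgrade vos')
    if enum34_version is not None:
        suggestions.append('uninstall enum34: pip uninstall enum34')
    return suggestions
```

For example, `diagnose(installed_version('vos'), installed_version('enum34'))` returns an empty list when neither step is needed.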
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
This is the developer documentation for the CADC Python Data Engineering Tools. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
# How to describe and load a CAOM2 Observation using the Command Line | ||
|
||
1. Ensure the preconditions described [here](https://github.com/opencadc/caom2tools/blob/master/doc#preconditions) are met. | |
|
||
1. Use the file [test_obs.blueprint](https://github.com/opencadc-metadata-curation/collection2caom2/blob/master/test_obs.blueprint) as the initial version of the blueprint file. For more information on the concept of blueprints, and their use, see [here](https://github.com/opencadc/caom2tools/blob/master/doc/user/script_description.md#observation-blueprints). | ||
|
||
1. Run caom2gen. The value provided for the `--local` parameter must be a fully qualified path name. | ||
|
||
``` | ||
caom2gen --out TEST_OBS.XML --observation COLLECTION TEST_OBS --blueprint ./test_obs.blueprint \ | |
--local /fully/qualified/path/TEST_FILE.FITS --lineage test_file/ad:COLLECTION/TEST_FILE.FITS | |
``` | ||
1. There should be a file named `TEST_OBS.XML` in the working directory. | ||
1. Run caom2-repo. There will be no output if the command succeeds. | ||
``` | ||
caom2-repo create --netrc ./.netrc --resource-id ivo://cadc.nrc.ca/sc2repo TEST_OBS.XML | ||
``` | ||
1. Everything after this is making refinements to the mapping between file content and CAOM2 instance members. This means issuing `caom2-repo update` commands, instead of `caom2-repo create` commands, to make changes on the server. However, caom2-repo is particular about its ids, so after the first successful execution of caom2-repo create, do this: | ||
``` | ||
caom2-repo read --netrc ./.netrc --resource-id ivo://cadc.nrc.ca/sc2repo COLLECTION TEST_OBS > TEST_OBS_READ.XML | ||
``` | ||
1. There should be a file named `TEST_OBS_READ.XML` on disk. It will be different than the `TEST_OBS.XML` used for `create`, because the service generates a parallel set of keys that must be honoured. For each of the observation, plane, artifact, part, and chunk elements there are `id`, `lastModified`, `maxLastModified`, `metaChecksum`, and `accMetaChecksum` values. In particular, the `id` values must be consistent when doing `caom2-repo update` calls, or a "This observation already exists" error will occur. | ||
1. After you've generated this output file, use the following commands to iteratively make and view changes to the mapping between the `COLLECTION` data and the CAOM2 instance: | ||
``` | ||
caom2gen -o TEST_OBS.XML --in TEST_OBS_READ.XML --blueprint ./test_obs.blueprint --local /fully/qualified/path/TEST_FILE.FITS \ | |
--lineage TEST_FILE/ad:COLLECTION/TEST_FILE.FITS | |
caom2-repo update --netrc ./.netrc --resource-id ivo://cadc.nrc.ca/sc2repo TEST_OBS.XML | ||
``` | ||
1. In your browser, go to http://sc2.canfar.net/search, enter `TEST_OBS` into the `Observation ID` search field, click search, then click the `TEST_OBS` link in the `Obs. ID` column of the `Results` tab. This will display the details of the CAOM2 instance for `TEST_OBS` in a new tab. | ||
1. Modify the blueprint to change mappings between the `COLLECTION` data model and the CAOM2 data model. If more complicated metadata mappings are required, investigate the use of the `--module` and `--plugin` parameters to [caom2gen](https://github.com/opencadc/caom2tools/tree/master/caom2utils). Additional `caom2gen` parameters are described there as well. | |
1. Should entries ever need to be deleted from the CAOM2 repository, replace `COLLECTION` with the appropriate value, and replace `TEST_OBS` with the observation ID that is being deleted: | ||
``` | ||
caom2-repo delete --netrc ./.netrc --resource-id ivo://cadc.nrc.ca/sc2repo COLLECTION TEST_OBS | ||
``` |
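The `caom2-repo` invocations in the steps above differ only in the action and the trailing arguments, so they can be assembled by a small helper when scripting the workflow. This is a sketch only: `repo_command` is a hypothetical name, and the helper just builds argument lists from the flags shown in this document. It does not run anything.

```python
import shlex

# Sandbox resource identifier used throughout this document.
SANDBOX_RESOURCE_ID = 'ivo://cadc.nrc.ca/sc2repo'


def repo_command(action, *args, netrc='./.netrc',
                 resource_id=SANDBOX_RESOURCE_ID):
    """Build the argument list for one caom2-repo invocation.

    Pass resource_id=None to leave out --resource-id, which the
    preconditions describe as the way to use production services.
    """
    command = ['caom2-repo', action, '--netrc', netrc]
    if resource_id is not None:
        command += ['--resource-id', resource_id]
    command += list(args)
    return command


# Print the create, read, update and delete commands from the steps above.
for command in (
    repo_command('create', 'TEST_OBS.XML'),
    repo_command('read', 'COLLECTION', 'TEST_OBS'),
    repo_command('update', 'TEST_OBS.XML'),
    repo_command('delete', 'COLLECTION', 'TEST_OBS'),
):
    print(shlex.join(command))
```

Each list can be handed to `subprocess.run`. For `read`, the standard output would need to be captured and written to `TEST_OBS_READ.XML`, and `caom2gen` still has to be re-run between `read` and `update`.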
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,186 @@ | ||
# How to describe and load a CAOM2 Observation using Python scripts | ||
|
||
Ensure the preconditions described [here](../README.md) are met. | |
|
||
The method `caom2utils.fits2caom2.augment` uses the concept of a blueprint to capture the description of a CAOM2 Observation as a | |
mapping of a Telescope Data Model (TDM) to the CAOM2 data model. This document describes how to extend that application to customize the mapping for a `COLLECTION`. | |
|
||
`augment` works by creating or augmenting a CAOM2 Observation record, which can then be loaded via the CADC service. | ||
|
||
`augment` creates the Observation record using information contained in a FITS file. The python module `fits2caom2`, from the python package `caom2utils`, | ||
examines the FITS file and uses a blueprint, embodied in an instance of the ObsBlueprint class, to define default values, override values, and mappings to augment the FITS header. The keywords and values in the augmented FITS header are then used to fill in corresponding CAOM2 entities and attributes. | ||
|
||
There are two alternative ways to provide input file metadata to the caom2gen application: | |
* have the file located on disk, and use the --local parameter | ||
* have the file located in a CADC archive. The artifact URI portion of the lineage parameter will be used to resolve the archive and file name. | ||
|
||
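To make the second option concrete, the sketch below splits a lineage value such as the `test_file/ad:COLLECTION/TEST_FILE.FITS` used in the command-line examples into its parts. It assumes that the portion before the first `/` is a product identifier and that the artifact URI has the form `scheme:archive/file_name`. The function `split_lineage` is hypothetical and is not how `caom2utils` parses the parameter.

```python
def split_lineage(lineage):
    """Split 'PRODUCT_ID/SCHEME:ARCHIVE/FILE_NAME' into named parts.

    Illustrative only: the layout is an assumption based on the
    lineage values shown in this documentation.
    """
    # Everything before the first '/' is treated as the product ID.
    product_id, _, artifact_uri = lineage.partition('/')
    # The artifact URI is assumed to look like 'ad:COLLECTION/FILE'.
    scheme, _, path = artifact_uri.partition(':')
    archive, _, file_name = path.partition('/')
    if not (product_id and scheme and archive and file_name):
        raise ValueError(f'unexpected lineage format: {lineage!r}')
    return {
        'product_id': product_id,
        'artifact_uri': artifact_uri,
        'scheme': scheme,
        'archive': archive,
        'file_name': file_name,
    }


parts = split_lineage('test_file/ad:COLLECTION/TEST_FILE.FITS')
print(parts['archive'], parts['file_name'])
```
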
## Observation Blueprints | ||
|
||
The blueprint is one way to capture the mapping of the TDM to the CAOM2 data model. The blueprint can identify: | ||
* what information to obtain from the FITS header, | ||
* defaults in case the FITS header is incomplete, | ||
* hard-coded values when the FITS header should be ignored, or doesn't have information, and | |
* python functions which will be loaded and executed at run-time to augment FITS keyword values. See [this section](https://github.com/opencadc/caom2tools/blob/master/doc/user/script_description.md#putting-it-all-together) for an example. | ||
|
||
The blueprint is a set of key-value pairs, where the values have three possible representations. | ||
|
||
The three representations are: defaults, overrides, and FITS keyword mappings. | ||
|
||
There is a sample blueprint in [this file](https://github.com/opencadc-metadata-curation/collection2caom2/blob/master/test_obs.blueprint). | ||
|
||
The keys are the long-form names for the CAOM2 model elements and attributes. The complete set of valid keys can be found by executing the following: | ||
|
||
pydoc caom2utils.fits2caom2.ObsBlueprint | ||
|
||
### Changing What a Blueprint Looks Like, By Extension | ||
|
||
A blueprint may be provided in one of two ways: as a file on disk, or programmatically. | |
|
||
#### File Blueprint Usage | ||
|
||
Observation.observationID = ['OBSID'], default = TEST_OBS | ||
Plane.dataRelease = 2017-08-31T00:00:00 | ||
Chunk.position.coordsys = ['RADECSYS,RADESYS'] | ||
|
||
* Observation.observationID provides a default value of `TEST_OBS`, which is used if the `OBSID` keyword does not exist in the FITS file. | ||
* Plane.dataRelease provides an override value, which is always used. | ||
* Chunk.position.coordsys provides a list of FITS keywords to try. If the first keyword is not in the FITS header, the second one is queried. If neither of them exists, there will be no value for Chunk.position.coordsys in the CAOM2 observation. | |
|
||
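The three representations can be pictured with a toy resolver. This is not the `caom2utils` implementation, only a sketch of the behaviour the bullets above describe, with a plain dictionary standing in for a FITS header and a hypothetical function named `resolve`.

```python
def resolve(header, keywords=(), default=None, override=None):
    """Pick the value for one blueprint key.

    Toy model: an override always wins, otherwise the first keyword
    present in the header is used, otherwise the default applies.
    """
    if override is not None:
        return override
    for keyword in keywords:
        if keyword in header:
            return header[keyword]
    return default


# A stand-in header that has RADESYS but neither OBSID nor RADECSYS.
header = {'RADESYS': 'ICRS'}

# OBSID is missing, so the default is used.
observation_id = resolve(header, keywords=['OBSID'], default='TEST_OBS')
# An override ignores the header entirely.
data_release = resolve(header, override='2017-08-31T00:00:00')
# RADECSYS is missing, so the second keyword supplies the value.
coordsys = resolve(header, keywords=['RADECSYS', 'RADESYS'])
```
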
#### Programmatic Blueprint Usage | ||
|
||
An example of this implementation is in [vlass2caom2](https://github.com/opencadc-metadata-curation/vlass2caom2). | |
|
||
bp = ObsBlueprint(position_axes=(1,2), time_axis=3, energy_axis=4, polarization_axis=5, observable_axis=6) | ||
bp.set_default('Observation.observationID', 'TEST_OBS') | ||
bp.set('Plane.dataRelease', '2017-08-31T00:00:00') | ||
bp.add_fits_attribute('Chunk.position.coordsys', 'RADECSYS') | ||
bp.add_fits_attribute('Chunk.position.coordsys', 'RADESYS') | ||
|
||
* Observation.observationID provides a default value of `TEST_OBS`, which is used if the `OBSID` keyword does not exist in the FITS file. | ||
* Plane.dataRelease provides an override value, which is always used when setting the plane-level data release date in the CAOM2 instance. | ||
* Chunk.position.coordsys provides a list of FITS keywords to try. The last keyword listed will be tried first, and the first keyword found will be used to set the value. | ||
|
||
To make WCS content available in the blueprint, instead of setting the axis indices in the ObsBlueprint constructor, call any of the following functions on a blueprint instance. Call only those for which there is metadata in the FITS file: | |
|
||
bp = ObsBlueprint() | ||
bp.configure_position_axes((1, 2)) | ||
bp.configure_energy_axis(3) | ||
bp.configure_time_axis(4) | ||
bp.configure_polarization_axis(5) | ||
bp.configure_observable_axis(6) | ||
bp.configure_custom_axis(7) | ||
|
||
## Putting It All Together | ||
|
||
The following script is an end-to-end example of describing and loading a CAOM2 Observation to the CADC service, given a FITS file and a programmatically constructed blueprint. | |
|
||
import importlib | ||
import os | ||
from cadcutils import net | ||
from caom2 import obs_reader_writer, DataProductType, CalibrationLevel | ||
from caom2repo import CAOM2RepoClient | ||
from caom2utils import fits2caom2 | ||
|
||
|
||
def get_meta_release(header): | |
    """ | |
    Use functions when the values of many header keywords are needed | |
    to set one CAOM2 attribute. | |
    """ | |
    obs_type = header.get('OBSTYPE') | |
    if obs_type == 'OBJECT': | |
        # science observation | |
        rel_date = header.get('REL_DATE') | |
    else: | |
        # calibration observation | |
        rel_date = header.get('DATE-OBS') | |
    return rel_date | |
|
||
|
||
# configure and create the CADC service client | ||
this_dir = os.path.dirname(os.path.realpath(__file__)) | ||
netrc_fqn = f'{this_dir}/netrc' | ||
subject = net.Subject(netrc=netrc_fqn) | ||
# remove the resource_id parameter to use production resources | ||
repo_client = CAOM2RepoClient(subject, resource_id='ivo://cadc.nrc.ca/sc2repo') | ||
|
||
# describe the Observation by setting up the mapping between the | ||
# COLLECTION and CAOM2, which is captured in an instance of | ||
# ObsBlueprint | ||
|
||
# so functions can be used in the blueprint | ||
module = importlib.import_module(__name__) | ||
|
||
bp = fits2caom2.ObsBlueprint(module=module) | ||
bp.configure_position_axes((1, 2)) | ||
# set a default value that will be used if FITS header values are not | ||
# available | ||
bp.set_default('Observation.observationID', 'TEST_OBS') | ||
# set a hard-coded value | ||
bp.set('Plane.dataRelease', '2017-08-31T00:00:00') | ||
# use an enumerated value for a hard-coded value | ||
bp.set('Plane.calibrationLevel', CalibrationLevel.RAW_STANDARD) | ||
bp.set('Plane.dataProductType', DataProductType.IMAGE) | ||
# add the FITS keyword 'RADECSYS' to the list of FITS keywords | ||
# checked for a value | ||
bp.add_fits_attribute('Chunk.position.coordsys', 'RADECSYS') | ||
# execute a function to set a value - parameter may be either | ||
# 'header' or 'uri' | ||
bp.set('Plane.metaRelease', 'get_meta_release(header)') | ||
|
||
# apply the mapping to the FITS file, which writes the Observation to | ||
# an xml file on disk | ||
kwargs = {} | ||
uri = 'ad:COLLECTION/TEST_FILE.FITS' | ||
blueprints = {uri: bp} | ||
fits2caom2.augment(blueprints=blueprints, | |
                   no_validate=False, | |
                   dump_config=False, | |
                   plugin=None, | |
                   out_obs_xml='./TEST_OBS.XML', | |
                   in_obs_xml=None, | |
                   collection='COLLECTION', | |
                   observation='TEST_OBS', | |
                   product_id='TEST_PRODUCT_ID', | |
                   uri=uri, | |
                   netrc=netrc_fqn, | |
                   file_name='file:///test_files/TEST_FILE.FITS', | |
                   verbose=False, | |
                   debug=True, | |
                   quiet=False, | |
                   caom_namespace=obs_reader_writer.CAOM23_NAMESPACE, | |
                   **kwargs) | |
|
||
# load the observation into memory | ||
reader = obs_reader_writer.ObservationReader(False) | ||
observation = reader.read('./TEST_OBS.XML') | ||
|
||
# create the observation record with the service | ||
# | ||
# use 'update' if the observation has already been loaded to the CADC service. | ||
# The service generates a parallel set of database keys that must be honoured. | ||
# The id values must be consistent when doing 'create' and 'update' calls, or | ||
# a "This observation already exists" error will occur. | ||
# | ||
# existing_obs = repo_client.read('COLLECTION', 'TEST_OBS_ID') | ||
# writer = obs_reader_writer.ObservationWriter() | ||
# writer.write(existing_obs, '/fully/qualified/EXISTING.XML') | ||
# fits2caom2.augment(... | ||
# in_obs_xml='/fully/qualified/EXISTING.XML', | ||
# ...) | ||
# load the observation into memory | ||
# then use: | ||
# repo_client.update(observation) | ||
repo_client.create(observation) | ||
|
||
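Blueprint functions such as `get_meta_release` only rely on the header's `get` method, so they can be exercised on their own before running the full script. The sketch below repeats the function from the script and calls it with plain dictionaries standing in for FITS headers. The keyword values, including the `FLAT` observation type, are made up for illustration.

```python
def get_meta_release(header):
    """Copy of the blueprint function from the script above."""
    obs_type = header.get('OBSTYPE')
    if obs_type == 'OBJECT':
        # science observation
        rel_date = header.get('REL_DATE')
    else:
        # calibration observation
        rel_date = header.get('DATE-OBS')
    return rel_date


# Hypothetical header content; real values come from the FITS file.
science = {
    'OBSTYPE': 'OBJECT',
    'REL_DATE': '2018-01-01T00:00:00',
    'DATE-OBS': '2017-01-01T00:00:00',
}
calibration = {
    'OBSTYPE': 'FLAT',
    'DATE-OBS': '2017-01-01T00:00:00',
}

# A science observation is released on REL_DATE.
science_release = get_meta_release(science)
# Anything else falls back to DATE-OBS.
calibration_release = get_meta_release(calibration)
```

Checking the function this way keeps mapping mistakes separate from problems with credentials or the CADC service.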
## More Information | ||
|
||
If you want: | ||
|
||
* to add direct CAOM2 model manipulation to your script, see [here](https://github.com/opencadc/caom2tools/tree/master/caom2) for an introduction to the possibilities. | ||
|
||
* a description of the latest version of the CAOM2 model, see [here](http://www.opencadc.org/caom2/). | ||
|
||
* a description of the operational version of the CADC Archive Metadata Service, see [here](http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/ams/). | ||
|
||
* examples of the model and the client embedded in end-to-end workflows, see [here](https://github.com/opencadc-metadata-curation). Each application in this repository uses the tactic of programmatically creating a unique blueprint for each file that is ingested, and then creating or updating the resulting CAOM2 Observation. |