-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
get_hz_data_from_NASIS_db - issues with join to phsample #120
Comments
Recently resolved issue #119 has some lab pedon userpedonIDs that would be a good test case for this.
|
Breadcrumbs for picking up this issue in a bit. Thanks to @phytoclast and Matt Bromley in Grand Rapids for highlighting the severity of this issue. It is unclear why this type of result is causing duplication -- as the phiidrefs are indeed different.
|
In other instances, I've dealt with this type of situation by concatenating the results, which to my knowledge can't be done in T-SQL. Therefore I'd suggest creating a separate query to return the results, concatenate the results, and then join them to the horizon table. It would also seem advisable to generate an error message. |
I don't know that it was ever appropriate to assume phorizon and phsample were 1:1 -- it just works most of the time.
As I suggested above, we could define some sort of calculation In principle, there needs to be a defined way of supporting records that have actual subsamples or other reasons why repeated measures were made. Perhaps this can be achieved by allowing for weighted averaging overlapping portions as some sort of a pre-filter on the horizon data. A parallel problem is the "disaggregated glossic horizons" @phytoclast brought up in #122 -- which would benefit from a defined way of resolving overlap where the overlapping layers are unique and have one or more attributes filled out that defines the weighting. This probably would involve concatenation or dominant condition aggregation for categoricals. |
This issue is now resolved since the default for There may still be room for a |
Looks like probably both of these things need to be implemented, especially in light of the current DSP lab sampling guidance. I think there may be value in having some concept of a 1:1 relationship with phorizon--in current labsampnum column for backward compatibility--but then also a list column, or a comma-separated character column with all the labsampnum for a particular horizon. |
Related test added to aqp RE: overlapping horizons: ncss-tech/aqp@37d6e81 Full sample SPC c/o @andypaolucci |
The LEFT OUTER JOIN to phsample used in the main NASIS horizon data query has the potential to duplicate horizon records in the event of a many:one relationship.
There are some data in NASIS that have duplicate entries in phsample -- same labsampnum, sample bag number, depth intervals etc. These are not true subsamples but are rather are the result of duplicate lab pedon entries both being in the database, or a data entry error.
In the rare event of true subsamples (i.e. multiple labsampnum per phiid), the same duplication will occur.
Currently the LEFT OUTER JOIN works for 99% of data and is extremely convenient for getting labsampnum into
@horizons
for fetchNASIS result. That said, it can cause spurious data quality issues -- and exclusion of pedons with valid data with default argumentrmHzErrors=TRUE
.Doing all of the joins in SQL severely limits options for programmatically dealing with the child table phsample. I think this code should be re-vamped to utilize fewer assumptions (see related issue #52 RE:
MIN(texcl)
) and other SQL acrobatics.Preference should be to return the unique labsampnum for subsample spanning the whole depth interval of horizon in question -- this would deal cleanly with straight duplications of records in phsample.
Subsequent preference could be given to the largest proportional subsample interval. An additional field could contain a comma-separated string of non-dominant labsampnums.
The text was updated successfully, but these errors were encountered: