check dataset element description #329
Comments
Good catch. I think the word 'dataset' is being used in two different ways in this description. First, to describe the scientific concept of a dataset and, second, to describe what an EML dataset is. I think sentence two holds if you use the second definition but not the first. I actually think that might have been the original intent of the wording. Did you and the others think up any alternatives, or would you like to have a try at it if you think it still needs tweaking?
Hi folks,
Bryce-- thanks for the thoughts. When the documentation says:
A dataset is defined as all of the information describing a data
collection event.
...I think rather than a "dataset" per se, this is referring to the
eml-dataset module's *dataset* element. This could be grokked from
context, but as worded, it
struck us as a bit confusing.
I think a clearer, more accurate description for the eml-dataset
"dataset" field would be as follows:
*The EML dataset element is the top-level "container" organizing the
information describing aspects of the collection event that produces
the dataset.*
Perhaps this could be wordsmithed a bit more, but I think it conveys
what we are trying to describe in the context of the EML
documentation...
cheers,
Mark
On Tue, Jan 8, 2019 at 12:28 PM Bryce Mecum ***@***.***> wrote:
Good catch. I think the word 'dataset' is being used in two different ways
in this description. First, to describe the scientific concept of a
dataset and, second, to describe what an EML dataset is. I think sentence
two holds if you use the second definition but not the first. I actually
think that might have been the original intent of the wording.
Did you and the others think up any alternatives, or would you like to
have a try at it if you think it still needs tweaking?
Thanks @mpsaloha, that looks pretty good. I put together a version with minimal modification to increase clarity:
What do you think? If you like your version better, I'd be fine with that. I'll send this over to the #eml channel in case anyone else has thoughts.
Hi @amoeba - Interesting discussion. I am surely overthinking this, but I find 'event' a bit misleading and, maybe, constraining, as it conveys the sense that a dataset results only from going into the field. I wonder if the language could be a bit more encompassing to make it read less "fieldy" and reflect that a dataset could in fact describe the output of extensive research. I played around a bit, focusing on research effort as a substitute.
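Thanks for chiming in, @srearl! I take your point about the constrained scope of the current wording. We certainly do use EML dataset to document resources where concepts like temporal/spatial coverage do not apply (e.g., the derived output of running a physical simulation model).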
I'm fine with Stevan's suggestion, but still actually prefer event over
effort. Here is why--
While the term "event" is indeed vague, for me it simply connotes any
identified process/es *that occurred in some place/time,* but not
necessarily *"about" some place and time* (though I'm not sure this
distinction is readily clarified in EML). So I don't see it as too
"fieldy"-- something has to have happened for data to be collected or
created, and that something could commonly be called an "event". I might
suggest that "efforts" result in "events", that is, "efforts" connote a
more project-oriented view, but again usages are varied. Is a *model
execution* that generates simulation data, an effort or an event? For me
it is more naturally described by the latter. But I can't think of any
situations where replacing "event" with "effort" is going to be too
misleading...
I am of the old school, however, that prefers constraining the use of
"dataset" more tightly, to pertain to a distinct data object (e.g. a table or
image), rather than broadening it. I am a bit sad to see "dataset" become
whatever arbitrary circumscription someone wants to apply to a set of
digital objects-- where I vastly prefer the less well-established use of
the term "data package" as in DataONE (though terminological usages also
vary somewhat there).
In our EML documentation we suggested that a dataset could be used to
describe multiple "tables", but our example was referring to highly
inter-related if not co-dependent objects (e.g. a set of relational tables
with integrity constraints in a RDBMS).
Just my thoughts...
cheers,
Mark
On Wed, Jan 9, 2019 at 2:53 PM Bryce Mecum ***@***.***> wrote:
Thanks for chiming in, @srearl <https://github.com/srearl>! I take your
point about the constrained scope of the current wording. We certainly do
use EML dataset to document resources where concepts like
temporal/spatial coverage do not apply (e.g., the derived output of running
a physical simulation model).
I'll just point out that the eml-dataset module has this sentence that (sort of) defines a dataset: "A dataset can be (and often is) composed of a series of data entities (tables) that are linked together by particular integrity constraints." I don't see that text appearing in any of the field descriptions. Perhaps it's worth adding to the "dataset" field description. Original context: https://github.com/NCEAS/eml/blob/BRANCH_EML_2_2/docs/eml-modules-resources.md#the-eml-dataset-module---dataset-specific-information
Hi Steven,
Yes- that quote is what I was alluding to in the penultimate sentence of my
prior email. It might be good to reiterate here, as you suggest.
Note that this is encouraging a narrower usage of the term "dataset" than
saying that a dataset can be whatever someone or some project wants to
(somehow) group together. I think this latter usage has the inherent
danger of encouraging the lumping together as a "dataset" only loosely
related digital objects, although it does make it easier to provide
collective metadata at a less detailed level (which I consider problematic).
But again, these terms are ambiguous and subject to semantic drift. So I'm
always happy to nudge towards what may be the more traditional use.
Although I've been teased several times for bringing up this reference, it
is still the case that Wikipedia says:
A *data set* (or *dataset*) is a collection of data
<https://en.wikipedia.org/wiki/Data>. Most commonly a data set corresponds
to the contents of a single database table
<https://en.wikipedia.org/wiki/Table_(database)>, or a single statistical data
matrix <https://en.wikipedia.org/wiki/Data_matrix_(multivariate_statistics)>,
where every column <https://en.wikipedia.org/wiki/Column_(database)> of the
table represents a particular variable, and each row
<https://en.wikipedia.org/wiki/Row_(database)> corresponds to a given
member of the data set in question.
I concur with Wikipedia about this, based on my experience with its typical
usage among scientists, at least in the social science, ecology and
biodiversity domains. Wikipedia also goes on to say other stuff, but still
mentions these items would be "closely related" through some particular
experiment or event.
cheers,
Mark
On Thu, Jan 10, 2019 at 10:23 AM Steven Chong ***@***.***> wrote:
I'll just point out that the eml-dataset module has this sentence that
(sort of) defines a dataset:
"*A dataset can be (and often is) composed of a series of data entities
(tables) that are linked together by particular integrity constraints.*"
I don't see that text appearing in any of the field descriptions. Perhaps
it's worth adding to the "dataset" field description.
Original context:
https://github.com/NCEAS/eml/blob/BRANCH_EML_2_2/docs/eml-modules-resources.md#the-eml-dataset-module---dataset-specific-information
Following a conversation with @mpsaloha and @gothub, we wanted to get clarification on the definition of a "dataset" that appears in the `dataset` element description:
The second sentence caught our attention and sounds more relevant to the metadata about a dataset, rather than to a dataset itself.
If this description gets edited, note that it also appears in the `DatasetType` description.