Vectorizing some classes and reorganizing inheritance hierarchy #197
thanks @zachary-foster - it's complex so I'll take a bit to make sure I understand all of this. For my immediate concern about looking at the work on the |
Yea, it's a big change conceptually, but I think the amount of code change needed will not be too big. The summary is: every class would hold multiple items and all complex classes (
The changes would build on the
Perhaps, it depends what parts of taxa you would be using most. Do you want to do a conference call to discuss the needs of taxize and so I can explain the changes with more context? |
Yes, a chat would be good. i'll email you to schedule that |
Dear people, sorry for my absence: I'm just back from Kenya. I think synonyms have been the mid-way solution to harmonize nomenclatures in vegetation-plot databases, and maybe also in taxonomy. Thus, using a taxon view will not really require applying synonyms, but rather establishing relations between different taxon concepts that may be unrelated, partially related, or equivalent. Anyway, this is how I understand nomenclature and taxonomy following these references: Now my questions:
I was seriously thinking of attempting a sort of |
Thanks for the thoughts @kamapu!
Cool! Sounds like an interesting place to do field work in.
I think I understand. Synonyms are easy enough to deal with, but homonyms and taxa that overlap in what they include are difficult to encode. After thinking about this a bit, I am leaning toward thinking of a "taxon" in the general sense as an unknown and unknowable idealized grouping. In order to encode how conflicting taxon concepts compare, we would need some kind of unique ID for the actual individual organisms we are trying to organize, such as a herbarium voucher. Such information is not usually available and would change over time due to evolution of organisms and changes in our understanding of biology. Therefore, rather than trying to define exactly what the taxon is and how it relates to other taxa, I am leaning towards a flexible data model that will give the user the freedom to apply it as they like.
Thanks! I read both of these and now I have a better idea of what you mean by taxon view. It's a useful concept.
I am after reading those papers. I don't know Scott's opinion on it. I don't think most users will need that level of detail, but it would be invaluable to a select few and might facilitate some interesting research and tools.
It kind of functions as a taxon view, in the sense that it is an authority to reference. However, it's pretty specific to online databases at the moment, and since these are not the ultimate source of taxonomic authority, the concept is different from a taxon view.
hmm, I am not sure what you mean by that. Like, taxon views that reference and adapt other taxon views moving forward in time can be thought of as a tree of relationships? A taxon view family tree? Similar to the "Operational trees" in Zhong et al. (1996)?
Haha! Yea, it's a tough problem and not really interesting to most people, even though it has big implications for interpreting historical information digitally. I wrote a bunch of confusing stuff below, mostly for my own notes, so don't feel obliged to read it or respond to it unless you want to. I would of course welcome feedback if you are so inclined : ) Here is the best model I have so far if we really want to try to encode all this confusion:
Below is how such a model might be implemented. However, it is excessively complicated for 95% of what people typically need to do with taxonomic data, so I am not proposing it for
A set of interlinked |
I did not like to insert it here, but there are some brief discussions here. |
That's fine @kamapu, I will take a look. Thanks |
Hi @sckott, want to take a look at the |
@zachary-foster yep, i'll take a look .. |
sorry for the delay! just back from vacation. Trying to install vectorize branch and getting errors related to vctrs - same problem with the dev and cran version of vctrs. Any thoughts? |
Hi Scott, no problem, I hope the vacation went well! It should be working now. There was a function renamed in vctrs since I last installed. I fixed it and now I can install using the github version of |
Looks good for the most part. Is this intended behavior?
x <- taxon(name = c('Homo sapiens', 'Bacillus', 'Ascomycota', 'Ericaceae'),
           rank = c('species', 'genus', 'phylum', 'family'),
           id = taxon_id(c('9606', '1386', '4890', '4345'), db = 'ncbi'),
           auth = c('Linnaeus, 1758', 'Cohn 1872', NA, 'Juss., 1789'),
           info = list(list(n = 1), list(n = 3), list(n = 2), list(n = 9)))
x[taxon_rank(x) > 'family']
#> <taxon[2]>
#> [1] 9606|Homo sapiens Linnaeus, 1758|species 1386|Bacillus Cohn 1872|genus
#> Rank levels: phylum < family < genus < species
#> Databases: ncbi(id)
#> Info keys: n(2)
Looks like the example returns ranks less than family, not greater than. Or am I misinterpreting what the example does? |
Nice! It's intended, but we could do it the other way if that is more intuitive. The way I set it up, high numbers indicate more fine-scale ranks, so species is greater than genus. I think I did that so that it looks like a factor when printing while displaying the ranks phylum -> species vs species -> phylum:
vs
But if you think the opposite ordering convention is more intuitive, we could print it like:
Then specifying the ranks manually would look like:
taxon_rank(c('A', 'B', 'C', 'D', 'A', 'D', 'D'),
           levels = c(D = NA, A = 30, B = 20, C = 10))
instead of (how it is currently):
taxon_rank(c('A', 'B', 'C', 'D', 'A', 'D', 'D'),
           levels = c(D = NA, A = 10, B = 20, C = 30))
Here are the numbers corresponding to the ranks currently (if not specified by the user):
> taxa:::rank_ref
           domain     superkingdom          kingdom       subkingdom      superphylum     infrakingdom
               10               20               30               40               50               50
           phylum         division        subphylum      subdivision    infradivision       superclass
               60               60               70               70               80               90
            class         subclass       infraclass       megacohort      supercohort           cohort
              100              110              120              130              140              150
        subcohort      infracohort       superorder            order         suborder       infraorder
              160              170              180              190              200              210
        parvorder      superfamily           family        subfamily            tribe         subtribe
              220              230              240              250              260              270
            genus         subgenus          section       subsection    species group species subgroup
              280              290              300              310              320              330
          species     infraspecies       subspecies          variety         varietas       subvariety
              340              350              360              370              370              380
             race            stirp             form            forma            morph       aberration
              380              390              400              400              400              410
          subform      unspecified          no rank            clade
              420               NA               NA               NA
We could make the numbers decrease going from phylum to species if that is more intuitive |
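To make the ordering convention discussed above concrete, here is a minimal base-R sketch. It is not the taxa implementation: `rank_levels` is a hypothetical lookup that copies four values from the table above, and `rank_gt` is an illustrative helper name. Under the current convention larger numbers mean finer ranks, so 'species' compares as greater than 'family'.

```r
# Hypothetical subset of the rank levels listed above (larger = finer rank).
rank_levels <- c(phylum = 60, family = 240, genus = 280, species = 340)

# Illustrative helper: TRUE where `ranks` is finer than `threshold`.
rank_gt <- function(ranks, threshold, levels = rank_levels) {
  unname(levels[ranks] > levels[[threshold]])
}

ranks <- c("species", "genus", "phylum", "family")

# Current convention: species and genus are "greater than" family.
current <- rank_gt(ranks, "family")

# Opposite convention: negating the levels flips the comparison.
reversed <- rank_gt(ranks, "family", levels = -rank_levels)
```

With the current convention only species and genus are selected, which matches the printed example; with the negated levels only phylum is selected.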
Ah okay, right we also have the same ordering in the rank_ref dataset in taxize. That makes sense then |
Ok, cool. That list is mostly derived from |
What do you think of adding support for synonyms to the Also should the authorities be associated with the It would be nice to modify 'Bacillus' %in% x would return |
makes sense to add synonyms. isn't it possible that a user would creat a I think each synonym should be able to have a different authority for sure. - they're just like any other name i guess, with their own authority, id, etc. I can't quite visualize how this will work since |
Yea, I guess i was thinking more about how they were stored in the object. So this:
name = c('A', 'B', 'C')
synonym = list(c('D', 'E'), character(0), character(0))
vs
name = list(c('A', 'D', 'E'), 'B', 'C')
After thinking about it, either would work. When I said "accepted" I was thinking the one that would appear in the print out and the output of
Hmm, that's a good point. I had thought each should have their own authority, but yea, they could also have different ranks, which would make things more complicated. Do synonyms ever have different IDs in databases? If so, then synonym would have to be defined using
I was thinking of just having multiple names by adding a list of It seems like there are two different things that we are trying to encode here:
What do you think about this set of changes to support these two concepts:
|
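The two storage layouts compared above can be sketched in base R as follows. This is only an illustration under assumptions: `has_name` is a hypothetical helper, not a function from taxa, and it mimics the idea of matching a query against any name a taxon holds.

```r
# Layout 1: a representative name plus a separate list of synonyms.
name <- c("A", "B", "C")
synonym <- list(c("D", "E"), character(0), character(0))

# Layout 2: every taxon holds all of its names; the first is the representative.
all_names <- list(c("A", "D", "E"), "B", "C")

# Converting layout 1 to layout 2 is an element-wise concatenation.
merged <- unname(Map(c, name, synonym))

# Hypothetical helper: which taxa contain `query` among any of their names?
has_name <- function(query, taxa_names) {
  vapply(taxa_names, function(n) query %in% n, logical(1))
}

hits <- has_name("E", all_names)
```

Because one layout converts to the other without loss, the choice mostly affects how the print method and accessors pick the representative name.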
Perhaps I'm not that into technical properties of In my opinion, there are two solutions to handle taxonomic lists, enabling their connection to observations and/or specimens:
Therefore:
|
Thanks for the input @kamapu!
I think it makes sense that a synonym can have a different rank, although I have not thought about it much before. You see it often below/at the species rank:
taxize::synonyms('homo sapiens', db = 'col')[[1]][1:10, 1:3]
#> ══ 1 queries ═══════════════
#>
#> Retrieving data for taxon 'homo sapiens'
#> ✔ Found: homo sapiens
#> ══ Results ═════════════════
#>
#> ● Total: 1
#> ● Found: 1
#> ● Not Found: 0
#> id name
#> 1 27ef25a84b47c43bdf06e027d05e1e06 Homo aethiopicus
#> 2 475b397e5b049d4d6510e289036b7dcb Homo americanus
#> 3 c209900e7e04b5a67b8e3246e5215c66 Homo arabicus
#> 4 5d326b11b8bb53e1394c8c40cef4aa97 Homo australasicus
#> 5 a1723fb1a1b2d6d3e09f1ec6bae7fe9b Homo cafer
#> 6 1d92d3b9a7c4c85333d330a5d404e401 Homo capensis
#> 7 895d6bd6e6ef0e5bf2269f8df8cf05cb Homo columbicus
#> 8 2cb9d2a4281f3d88733d0e79940f01c3 Homo drennani
#> 9 00d994548f2f2e12980d4843d55d74e0 Homo fossilis proto-aethiopicus
#> 10 f85224c601bb5d16719b1c909d198531 Homo fossilis protoaethiopicus
#> rank
#> 1 species
#> 2 species
#> 3 species
#> 4 species
#> 5 species
#> 6 species
#> 7 species
#> 8 species
#> 9 infraspecies
#> 10 infraspecies
Created on 2019-07-22 by the reprex package (v0.3.0)
I don't think there is a problem with synonyms having ranks, as long as all the taxon ranks in a taxon concept are more fine-scale than all the ranks in the parent taxon concept.
Since much of the information that will be used with
This is what I am going for, but there will be a kind of "first among equals" behavior when functions need a single taxon name (e.g. labeling a taxonomic tree). I was thinking the first synonym is the "representative" taxon name, but does not have more weight than the others in most operations.
The way I was thinking of it, there could still be multiple synonyms in a single taxon concept. Basically, if they have the same child taxa, they are synonyms. In the way I would encode it, F. ovina L. in Figure 1 would all be different taxon concepts (homonyms) since they have different child taxa. The two F. ovina agg. would also be different taxon concepts, but they would be the same concept (synonyms) if the placement of F. guestfalica did not differ between them.
Is this the same as saying "if they have the same child taxa, they are synonyms"?
Yea, I agree, it's just an extra piece of information.
I am not sure I understand this. "Festuca ovina L. sec. Jäger & Werner (2005)" = "Festuca ovina ssp. vulgaris var. vulgaris sec. Ascherson (1864)"?
It will get more complex, since taxon concepts can then have multiple parents. I think it's doable. The hard part will be to make it easy to understand for the user when there are no synonyms, since most users will not need to consider synonyms. Basically, I want new users to be unaware synonyms are even supported until they need that functionality. |
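The rule suggested above, "if they have the same child taxa, they are synonyms", can be sketched in base R. The data here is entirely hypothetical (placeholder names `X sec. source 1` and children `a`, `b`), and `concept_key` is an illustrative name; the sketch only shows the grouping logic, not how taxa would store it.

```r
# Hypothetical data: each name as used by some source, with the child taxa it includes.
children <- list(
  "X sec. source 1" = c("a", "b"),
  "X sec. source 2" = c("b", "a"),
  "X sec. source 3" = c("a")
)

# Build an order-independent key from each set of children.
concept_key <- vapply(
  children,
  function(x) paste(sort(unique(x)), collapse = "|"),
  character(1)
)

# Names sharing a key are treated as one taxon concept (synonyms);
# names with different keys are different concepts, even if spelled the same.
concepts <- split(names(children), concept_key)
```

Here the first two usages fall into one concept because their children match, while the third is a separate concept despite sharing the name.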
@zachary-foster looking at this in taxize: get_ fxns I'm not sure the I might play around with creating a class on top of
Same concern about classification applies here wrt overriding taxize fxn names if taxa is loaded after taxize.
Hi @sckott,
The
x <- vctrs::new_vctr(1:10, my_attr = 'result', class = 'test_class')
y <- vctrs::new_vctr(1:10, my_attr = 'different result', class = 'test_class')
attributes(x)
#> $my_attr
#> [1] "result"
#>
#> $class
#> [1] "test_class" "vctrs_vctr"
attributes(x[1])
#> $my_attr
#> [1] "result"
#>
#> $class
#> [1] "test_class" "vctrs_vctr"
attributes(c(x, x))
#> $my_attr
#> [1] "result"
#>
#> $class
#> [1] "test_class" "vctrs_vctr"
attributes(c(x, y))
#> No common type for `..1` <test_class> and `..2` <test_class>.
Created on 2019-10-24 by the reprex package (v0.3.0)
Note how this does not work if the attributes have different values. For example, I had to overwrite If we can come up with a general rule for combining attributes with different values, I can probably make it work. I can add a
I was talking about the per-value attributes (e.g. some info on each taxon). I had not considered per-vector/object attributes before. I am fine with adding either, I was just saying that per-value attributes might be better handled by columns in a table containing a
That was the strategy I was going to suggest when a specific attribute needs a specific way of being combined. Either way, making the
Good point, I should have noticed that. Perhaps we should change its name back to
The instances are the index of a taxon in the taxonomy. In the
We would use the algorithm from
library(taxa)
#>
#> Attaching package: 'taxa'
#> The following object is masked from 'package:base':
#>
#> %in%
x <- taxize::classification(c(129313, 129310), db = 'itis')
x
#> $`129313`
#> name rank id
#> 1 Animalia kingdom 202423
#> 2 Bilateria subkingdom 914154
#> 3 Protostomia infrakingdom 914155
#> 4 Ecdysozoa superphylum 914158
#> 5 Arthropoda phylum 82696
#> 6 Hexapoda subphylum 563886
#> 7 Insecta class 99208
#> 8 Pterygota subclass 100500
#> 9 Neoptera infraclass 563890
#> 10 Holometabola superorder 914213
#> 11 Diptera order 118831
#> 12 Nematocera suborder 118832
#> 13 Culicomorpha infraorder 125808
#> 14 Chironomidae family 127917
#> 15 Chironominae subfamily 129228
#> 16 Chironomini tribe 129229
#> 17 Chironomus genus 129254
#> 18 Chironomus riparius species 129313
#>
#> $`129310`
#> name rank id
#> 1 Animalia kingdom 202423
#> 2 Bilateria subkingdom 914154
#> 3 Protostomia infrakingdom 914155
#> 4 Ecdysozoa superphylum 914158
#> 5 Arthropoda phylum 82696
#> 6 Hexapoda subphylum 563886
#> 7 Insecta class 99208
#> 8 Pterygota subclass 100500
#> 9 Neoptera infraclass 563890
#> 10 Holometabola superorder 914213
#> 11 Diptera order 118831
#> 12 Nematocera suborder 118832
#> 13 Culicomorpha infraorder 125808
#> 14 Chironomidae family 127917
#> 15 Chironominae subfamily 129228
#> 16 Chironomini tribe 129229
#> 17 Chironomus genus 129254
#> 18 Chironomus prior species 129310
#>
#> attr(,"class")
#> [1] "classification"
#> attr(,"db")
#> [1] "itis"
y <- lapply(x, function(y) taxon_name(name = y$name, rank = y$rank, id = y$id))
y
#> $`129313`
#> <taxon_name[18]>
#> [1] 202423|Animalia|kingdom 914154|Bilateria|subkingdom
#> [3] 914155|Protostomia|infrakingdom 914158|Ecdysozoa|superphylum
#> [5] 82696|Arthropoda|phylum 563886|Hexapoda|subphylum
#> [7] 99208|Insecta|class 100500|Pterygota|subclass
#> [9] 563890|Neoptera|infraclass 914213|Holometabola|superorder
#> [11] 118831|Diptera|order 118832|Nematocera|suborder
#> [13] 125808|Culicomorpha|infraorder 127917|Chironomidae|family
#> [15] 129228|Chironominae|subfamily 129229|Chironomini|tribe
#> [17] 129254|Chironomus|genus 129313|Chironomus riparius|species
#> Rank levels: kingdom < subkingdom < infrakingdom = superphylum < phylum < subphylum < class < subclass < infraclass < superorder < order < suborder < infraorder < family < subfamily < tribe < genus < species
#>
#> $`129310`
#> <taxon_name[18]>
#> [1] 202423|Animalia|kingdom 914154|Bilateria|subkingdom
#> [3] 914155|Protostomia|infrakingdom 914158|Ecdysozoa|superphylum
#> [5] 82696|Arthropoda|phylum 563886|Hexapoda|subphylum
#> [7] 99208|Insecta|class 100500|Pterygota|subclass
#> [9] 563890|Neoptera|infraclass 914213|Holometabola|superorder
#> [11] 118831|Diptera|order 118832|Nematocera|suborder
#> [13] 125808|Culicomorpha|infraorder 127917|Chironomidae|family
#> [15] 129228|Chironominae|subfamily 129229|Chironomini|tribe
#> [17] 129254|Chironomus|genus 129310|Chironomus prior|species
#> Rank levels: kingdom < subkingdom < infrakingdom = superphylum < phylum < subphylum < class < subclass < infraclass < superorder < order < suborder < infraorder < family < subfamily < tribe < genus < species
my_classification <- alternate_constructor(y)
#> Error in alternate_constructor(y): could not find function "alternate_constructor"
Created on 2019-10-24 by the reprex package (v0.3.0)
Yea, will have to do something about that. Either change the prefix for all the |
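On the question raised earlier in this comment about a general rule for combining per-vector attributes with different values, one possible rule can be sketched in base R. This is an assumption for discussion, not what vctrs or taxa does: `combine_attr` is a hypothetical helper that keeps an attribute when both inputs agree and otherwise requires an explicit resolver function.

```r
# Hypothetical rule: keep an attribute when both inputs agree; otherwise fail
# unless a resolver function is supplied for that attribute.
combine_attr <- function(a, b, resolve = NULL) {
  if (identical(a, b)) {
    return(a)
  }
  if (is.null(resolve)) {
    stop("attribute values differ and no resolver was given")
  }
  resolve(a, b)
}

x <- structure(1:3, my_attr = "result")
y <- structure(4:6, my_attr = "different result")

# Same value on both sides: kept as-is.
same <- combine_attr(attr(x, "my_attr"), attr(x, "my_attr"))

# Different values: resolved here by keeping both (base c() drops custom
# attributes, so the combined attribute is re-attached explicitly).
combined <- structure(
  c(x, y),
  my_attr = combine_attr(attr(x, "my_attr"), attr(y, "my_attr"), resolve = union)
)
```

A per-attribute resolver keeps the default strict, as in the error shown in the reprex, while letting a subclass opt in to merging where that makes sense.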
Just looked at the
library(taxa)
#>
#> Attaching package: 'taxa'
#> The following object is masked from 'package:base':
#>
#> %in%
library(taxize)
#>
#> Attaching package: 'taxize'
#> The following objects are masked from 'package:taxa':
#>
#> classification, tax_name, tax_rank
library(tibble)
x = get_tsn(c("Chironomus riparius","Quercus douglasii"))
#> ══ 2 queries ═══════════════
#>
#> Retrieving data for taxon 'Chironomus riparius'
#> ✔ Found: Chironomus riparius
#>
#> Retrieving data for taxon 'Quercus douglasii'
#> ✔ Found: Quercus douglasii
#> ══ Results ═════════════════
#>
#> ● Total: 2
#> ● Found: 2
#> ● Not Found: 0
tibble(id = taxon_id(as.character(x), db = 'itis'), !!! attributes(x))
#> # A tibble: 2 x 6
#> id class match multiple_matches pattern_match uri
#> <tax_id> <chr> <chr> <lgl> <lgl> <chr>
#> 1 129313 (itis) tsn found FALSE FALSE https://www.iti…
#> 2 19322 (itis) tsn found FALSE FALSE https://www.iti…
Created on 2019-10-24 by the reprex package (v0.3.0)
library(taxa)
#>
#> Attaching package: 'taxa'
#> The following object is masked from 'package:base':
#>
#> %in%
library(taxize)
#>
#> Attaching package: 'taxize'
#> The following objects are masked from 'package:taxa':
#>
#> classification, tax_name, tax_rank
library(tibble)
library(vctrs)
# Constructors
# Low-level constructor: assumes inputs are already the right type and length
new_my_class <- function(id = character(), db = taxon_db(), match = character(), .names = NULL) {
  vctrs::new_rcrd(list(.names = .names, id = id, db = db, match = match),
                  .names_set = FALSE, # needed for taxon_id methods to work
                  class = c("taxize_my_class", 'taxa_taxon_id'))
}
# User-facing constructor: casts and recycles inputs before building the object
my_class <- function(id = character(), db = NA, match = NA, .names = NULL) {
  if (is.null(.names)) {
    .names <- NA_character_
  }
  .names <- vctrs::vec_cast(.names, character())
  id <- vctrs::vec_cast(id, character())
  db <- vctrs::vec_cast(db, taxon_db())
  match <- vctrs::vec_cast(match, character())
  # %<-% is the multi-assignment operator from the zeallot package
  c(id, db, match, .names) %<-% vctrs::vec_recycle_common(id, db, match, .names)
  new_my_class(.names = .names, id = id, db = db, match = match)
}
# S3 coercion functions
vec_ptype2.taxize_my_class <- function(x, y, ...) UseMethod("vec_ptype2.taxize_my_class", y)
vec_ptype2.taxize_my_class.default <- function(x, y, ..., x_arg = "", y_arg = "") {
  vctrs::stop_incompatible_type(x, y, x_arg = x_arg, y_arg = y_arg)
}
vec_ptype2.taxize_my_class.vctrs_unspecified <- function(x, y, ...) x
vec_ptype2.taxize_my_class.taxize_my_class <- function(x, y, ...) my_class()
x = get_tsn(c("Chironomus riparius","Quercus douglasii"))
#> ══ 2 queries ═══════════════
#>
#> Retrieving data for taxon 'Chironomus riparius'
#> ✔ Found: Chironomus riparius
#>
#> Retrieving data for taxon 'Quercus douglasii'
#> ✔ Found: Quercus douglasii
#> ══ Results ═════════════════
#>
#> ● Total: 2
#> ● Found: 2
#> ● Not Found: 0
x
#> [1] "129313" "19322"
#> attr(,"class")
#> [1] "tsn"
#> attr(,"match")
#> [1] "found" "found"
#> attr(,"multiple_matches")
#> [1] FALSE FALSE
#> attr(,"pattern_match")
#> [1] FALSE FALSE
#> attr(,"uri")
#> [1] "https://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=129313"
#> [2] "https://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=19322"
y = my_class(as.character(x), db = 'itis', match = attr(x, 'match'))
y
#> <taxon_id[2]>
#> [1] 129313 (itis) 19322 (itis)
class(y)
#> [1] "taxize_my_class" "taxa_taxon_id" "vctrs_rcrd" "vctrs_vctr"
is_taxon_id(y)
#> [1] TRUE
vctrs::fields(y)
#> [1] ".names" "id" "db" "match"
vctrs::field(y, 'match')
#> [1] "found" "found"
vctrs::field(y[1], 'match')
#> [1] "found"
vctrs::field(c(y, y), 'match')
#> [1] "found" "found" "found" "found"
Created on 2019-10-24 by the reprex package (v0.3.0)
|
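For comparison with the tibble-based reprex above, the same idea of spreading per-value attributes into table columns can be sketched with base R only. The object below is a mock built with `structure()` from the values printed above, not a real `get_tsn()` result, and `tab` is just an illustrative name.

```r
# Mock of the kind of object returned by a get_* function: an ID vector with
# per-value attributes (values copied from the printed output above).
x <- structure(
  c("129313", "19322"),
  class = "tsn",
  match = c("found", "found"),
  multiple_matches = c(FALSE, FALSE),
  pattern_match = c(FALSE, FALSE)
)

# Spread the per-value attributes into columns next to the IDs.
per_value <- attributes(x)[c("match", "multiple_matches", "pattern_match")]
tab <- data.frame(id = as.character(unclass(x)), per_value, stringsAsFactors = FALSE)

# Subsetting rows keeps each ID aligned with its metadata.
one <- tab[tab$id == "19322", ]
```

The trade-off discussed in this thread still applies: a table keeps metadata aligned under subsetting for free, whereas an attributed vector needs custom subsetting and combining methods.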
Sorry about the long delay on this @zachary-foster - had a bunch of new versions of other pkgs to get out. Thanks for the good brainstorming. I'll play around with these and get back to you. My gut reaction is that I don't want to return tables from get_* fxns. It just doesn't seem like the right data structure, but maybe after trying it i'll change my mind. |
No problem, I have been busy with traveling anyway. I played around with this some more, and I think option 2 would probably be the most similar to how |
sounds good - i'll have a look at option 2 |
Sorry about another long delay on this @zachary-foster - Are you still planning to move to using |
Sorry for the delayed response. I have been on vacation after defending my PhD and was not checking my email much. I am back to work now. Yea, I am still trying to get |
Hi @sckott, I have made some good progress on the
All the classes besides |
Sounds good - did you want to do a call? |
Yea, that would be good. Does zoom work for you? |
Yeah, what day/time? |
@zachary-foster Tried using |
Yea, I think you could extend
# Low-level constructor: checks types and builds the record
new_taxize_taxon <- function(.names = NULL, name = character(), rank = taxon_rank(), id = taxon_id(), auth = taxon_authority(), new_col = character(), ...) {
  # Set names to NA if not set
  if (is.null(.names) || all(is.na(.names))) {
    .names_set <- FALSE
    .names <- vctrs::vec_recycle(NA_character_, length(name))
  } else {
    .names_set <- TRUE
    vctrs::vec_assert(.names, ptype = character())
  }
  # Check that values are the correct type
  vctrs::vec_assert(name, ptype = character())
  # vctrs::vec_assert(rank, ptype = taxon_rank())
  vctrs::vec_assert(id, ptype = taxon_id())
  vctrs::vec_assert(auth, ptype = taxon_authority())
  vctrs::vec_assert(new_col, ptype = character())
  # Create new object
  vctrs::new_rcrd(list(.names = .names, name = name, rank = rank, id = id, auth = auth, new_col = new_col),
                  .names_set = .names_set,
                  ...,
                  class = c("taxize_test", "taxa_taxon"))
}
# User-facing constructor: casts and recycles inputs before building the object
taxize_taxon <- function(name = character(0), rank = NA, id = NA, auth = NA, .names = NA, new_col = NA, ...) {
  # Cast inputs to correct values
  name <- vctrs::vec_cast(name, character())
  rank <- vctrs::vec_cast(rank, taxon_rank())
  id <- vctrs::vec_cast(id, taxon_id())
  auth <- vctrs::vec_cast(auth, taxon_authority())
  new_col <- vctrs::vec_cast(new_col, character())
  .names <- vctrs::vec_cast(.names, character())
  # Recycle all fields to a common length
  recycled <- vctrs::vec_recycle_common(name, rank, id, auth, new_col, .names)
  name <- recycled[[1]]
  rank <- recycled[[2]]
  id <- recycled[[3]]
  auth <- recycled[[4]]
  new_col <- recycled[[5]]
  .names <- recycled[[6]]
  # Create taxon object
  new_taxize_taxon(.names = .names, name = name, rank = rank, id = id, auth = auth, new_col = new_col, ...)
}
# S3 casting methods
vec_cast.taxize_test <- function(x, to, ..., x_arg, to_arg) UseMethod("vec_cast.taxize_test")
vec_cast.taxize_test.default <- function(x, to, ..., x_arg, to_arg) vctrs::vec_default_cast(x, to, x_arg, to_arg)
vec_cast.taxize_test.taxize_test <- function(x, to, ..., x_arg, to_arg) x
vec_cast.taxize_test.character <- function(x, to, ..., x_arg, to_arg) taxon(x)
vec_cast.character.taxize_test <- function(x, to, ..., x_arg, to_arg) as.character(vctrs::field(x, "name"))
# Example: the constructor defined above is taxize_taxon(); "taxize_test" is the class name
x <- taxize_taxon(name = c('Homo sapiens', 'Bacillus', 'Ascomycota', 'Ericaceae'),
                  rank = c('species', 'genus', 'phylum', 'family'),
                  id = taxon_id(c('9606', '1386', '4890', '4345'), db = 'ncbi'),
                  auth = c('Linnaeus, 1758', 'Cohn 1872', NA, 'Juss., 1789'),
                  new_col = c('A', 'B', 'C', 'D'))
names(x) <- c('a', 'b', 'c', 'd')
vctrs::field(x[2:3], 'new_col')
|
thanks @zachary-foster - i;ll try that |
I've used your example code above and built on it. work on this branch https://github.com/ropensci/taxize/compare/taxa-work Some thoughts:
|
Good find. I fixed it so now
That should be fixed now
I am not sure I understand. Even if a taxon ID is not found is |
thanks for the fixes. for the names applied via There seems to be an error in the print method when there is an NA, e.g., (or maybe that is correct, expected behavior)
x <- taxon(c('A', 'B', 'C'), .names = c('d', 'e', NA_character_))
x
#> <taxon[3]>
#> Error: If any elements are named, all elements must be named.
rlang::last_error()
<error/rlang_error>
If any elements are named, all elements must be named.
Backtrace:
1. (function (x, ...) ...
2. vctrs:::print.vctrs_vctr(x)
3. vctrs::obj_print(x, ...)
5. taxa:::obj_print_data.taxa_taxon(x, ...)
6. taxa:::printed_taxon(x, color = TRUE)
7. taxa:::font_tax_name(x)
12. taxa:::tax_rank.taxa_taxon(text)
13. taxa:::named_field(x, "rank")
15. vctrs:::`names<-.vctrs_vctr`(`*tmp*`, value = names(x))
Run `rlang::last_trace()` to see the full context. |
How about naming the output by the input, assuming the input is one-to-one with the output and is a character vector? That error seems to be due to I fixed the issue for |
Thanks for the fix. Could name by the inputs, might try that |
@zachary-foster do you need taxize anymore in taxa? It's in Imports, but I don't see any use of taxize in the taxa vectorize branch. I need to have taxa in Imports in taxize now - and we can't have each other's pkgs both in Imports. Thoughts? |
weird, even after removing taxize from Imports in taxa, i keep getting a circular dependency error when i run check on taxize (with taxa in Imports) - i can't find taxize in taxa anywhere, not sure what's going on. it's clearly taxa since if I remove taxa from Imports, then check runs fine |
Yea, we should remove the taxize dependency. I will try to remove the dependency and see if I get the same problem. |
thanks for having a look - relevant taxize branch i'm working on is |
I got the same problem. It looks like R CMD check is looking up the dependencies from CRAN oddly enough. I tried the check with my internet turned off and got this error:
So I guess it's only checking the version on CRAN? I saw you added:
So I'm not sure. Could it be a bug in devtools or R CMD check? |
Weird, good catch. I am using R CMD CHECK on the command line so there's no devtools or remotes pkgs involved AFAIK. Tried it with internet off too and now it doesn't throw that circular dependency error - internet back on and it does. |
|
Ok, so it must be relying on the CRAN version to check dependencies. That's annoying, but I suppose the problem will go away once taxa on CRAN is updated. Another option would be to rename taxa to taxa2 or something. I was wondering whether that would be a good idea to do anyway, so that after taxa is updated, published code that uses taxa (mostly papers using metacoder) will still work. About 100 papers have cited metacoder; I am not sure how many of those have published code though. |
The changes in taxa are quite large, so it might warrant a new name, or at least a major version bump if the same name. I'll leave it up to you. I guess folks using the current taxa or older versions would just need to avoid updating to this new version if you keep the same name. |
I have been thinking about how to make all of the taxa classes most useful and I think I have come up with a way to reorganize things that will make things more natural and R-like, while increasing flexibility.
Vectorizing the TaxonName, TaxonRank, TaxonId, and TaxonDatabase classes
Right now these store individual values and an associated TaxonDatabase object. However, if they were like other R objects, they would allow multiple values. Currently, their primary use is to store information in Taxon objects, but if they could hold multiple values, then they could be used on their own more usefully. They would also be useful as columns in tibbles if this PR gets accepted.
Question: do we allow multiple TaxonDatabase objects in a single TaxonName, TaxonRank, or TaxonId object with multiple values, or do all values have to come from the same database?
Question: What happens when we combine objects from different databases with c.TaxonName? We could allow it to hold multiple values and behave like a factor.
Reorganizing inheritance
Until recent changes in the eval branch, the class inheritance looked like this:
I think it would be much more elegant code-wise to do this:
Taxonomy has most of the important/complicated methods and Taxa has most of the important getters/setters. In this setup, Hierarchies would be the same as Taxonomy internally, but have different getters/setters and print method to make it appear 1-dimensional. This would be easier for users to understand, but also preserve the performance improvements of Taxonomy. Also, things like filter_taxa(taxon_ranks != "family") would work on Hierarchies objects without modification to the code for filter_taxa. Hierarchies would be particularly useful as a column in a tibble if this PR gets accepted.
Removing/hiding non-vectorized classes
We could remove/hide the Hierarchy and Taxon classes in favor of the Taxa and Hierarchies classes, since the plural forms can do everything the singular forms can. We can also rename the plural forms to the singular to be more in R style (character is not characters), so the inheritance hierarchy would become:
Multiple names/ranks/ids per taxon
@kamapu pointed out that taxa does not support synonyms. The Taxmap is flexible enough that it can encode synonyms, but the other classes cannot. If we allow multiple TaxonName, TaxonRank, and TaxonId objects in Taxon objects (or each item in a Taxa object), then that would allow for disagreements among multiple sources of taxonomic information (same taxon from multiple databases). We would then need to define methods like ==.Taxon / ==.Taxa and as.character.Taxon to account for this, but that is doable.
Whew, that was a lot of stuff. I put it all in one issue since many of the changes are related and dependent on each other, but if we decide to go this route, then I will split it up into multiple issues and make a milestone.
I would appreciate feedback from anyone who sees this, especially @sckott.
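One way to picture the "behave like a factor" idea for combining values from different databases is the base-R sketch below. Everything in it is hypothetical: the class name `sketch_taxon_id`, the constructor `new_ids`, and the methods are illustrations under the assumption that each value carries its own database label, not the design taxa adopted.

```r
# Hypothetical constructor: IDs plus a database label per value, stored as a factor.
new_ids <- function(id = character(), db = character()) {
  stopifnot(length(db) == 1 || length(db) == length(id))
  structure(
    list(id = as.character(id), db = factor(rep_len(db, length(id)))),
    class = "sketch_taxon_id"
  )
}

# Combining keeps every value's own database and merges the factor levels.
c.sketch_taxon_id <- function(...) {
  parts <- list(...)
  ids <- unlist(lapply(parts, function(p) p$id))
  dbs <- unlist(lapply(parts, function(p) as.character(p$db)))
  new_ids(ids, dbs)
}

# Show each ID with its database, similar in spirit to "9606 (ncbi)".
format.sketch_taxon_id <- function(x, ...) paste0(x$id, " (", x$db, ")")

a <- new_ids(c("9606", "1386"), db = "ncbi")
b <- new_ids("129313", db = "itis")
ab <- c(a, b)
formatted <- format(ab)
db_levels <- levels(ab$db)
```

Storing the database per value sidesteps the "same database or error" question when combining, at the cost of a slightly larger object.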