First, I want to say thank you for this package. I'm working on some metagenomic data with lots of 'unusual' taxa, and finding a good (accessible) database that gives a quick summary of their characteristics has been surprisingly difficult.
This package saved me a lot of headaches from trying to 'manually' parse the API search results myself.
I have neither a bug nor feature request, rather just some info which might be useful for others.
You can use a sequence of tidyverse tools to convert the results from the BacDiveR::retrieve_data() function to a clean(ish) table format using the following code:
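## load the tidyverse pieces used below (%>%, bind_rows, distinct, gather, separate)
library(dplyr)
library(tidyr)
## get some search results
data_bacdive_raw <- BacDiveR::retrieve_data("Fusobacterium", searchType = "taxon")
## convert list of lists to tibble:
## flatten to a named vector, make it a one-row table, reshape to long format,
## then split the dot-separated names into one column per nesting level
data_bacdive_tib <- data_bacdive_raw %>%
  unlist() %>%
  bind_rows() %>%
  gather(grouped_category, value, 1:ncol(.)) %>%
  separate(grouped_category, sep = "\\.", into = c("bacdive_id", "section", "subsection", "field", "key")) %>%
  distinct()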
#>Warning message:
#>Expected 4 pieces. Missing pieces filled with `NA` in 144 rows [72, 73, 74, 75, 76, 77, 156, 157, 158, 159, 160, 161, 250, 251, 252, 253, 254, 255, 461, 462, ...].
## print final table
data_bacdive_tib
#> # A tibble: 18,555 x 6
#>    bacdive_id section       subsection      field              key   value
#>    <chr>      <chr>         <chr>           <chr>              <chr> <chr>
#>  1 2654       taxonomy_name strains_tax_PNU species_epithet    NA    mortiferum
#>  2 2654       taxonomy_name strains_tax_PNU subspecies_epithet NA    NA
#>  3 2654       taxonomy_name strains_tax_PNU is_type_strain     NA    FALSE
#>  4 2654       taxonomy_name strains_tax_PNU domain             NA    Bacteria
#>  5 2654       taxonomy_name strains_tax_PNU phylum             NA    Fusobacteria
#>  6 2654       taxonomy_name strains_tax_PNU class              NA    Fusobacteriia
#>  7 2654       taxonomy_name strains_tax_PNU ordo               NA    NA
#>  8 2654       taxonomy_name strains_tax_PNU family             NA    Fusobacteriaceae
#>  9 2654       taxonomy_name strains_tax_PNU status_fam         NA    NA
#> 10 2654       taxonomy_name strains_tax_PNU genus              NA    Fusobacterium
As far as I can see from the table for the search above, the only issue is that the references field is not correctly formatted: its entries end up in the subsection column rather than the field column (hence the 'NA' warning), because in the original results references is a data frame rather than a list.
This worked for me using BacDiveR_0.7.0
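One possible way to tidy up the misplaced references rows is sketched below. It runs on a small made-up nested list (`toy_raw` is hypothetical and only mimics the shape described above, it is not real BacDive output), and it uses `tibble::enframe()` in place of the `bind_rows()` + `gather()` steps, which is a swapped-in shortcut rather than the code I actually ran. Passing `fill = "right"` to `separate()` silences the warning, and the rows that came up one piece short are then shifted so their last piece sits in the field column. This is a sketch under those assumptions and has not been checked against real results.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy input: one strain with a nested list section and a
# data-frame 'references' section (assumed shape, not real BacDive data)
toy_raw <- list(
  `2654` = list(
    taxonomy_name = list(
      strains_tax_PNU = list(genus = "Fusobacterium",
                             species_epithet = "mortiferum")
    ),
    references = data.frame(ID_reference = "20215", stringsAsFactors = FALSE)
  )
)

toy_tib <- toy_raw %>%
  unlist() %>%                                  # named character vector
  enframe(name = "path", value = "value") %>%   # one row per leaf value
  separate(path,
           into = c("bacdive_id", "section", "subsection", "field"),
           sep = "\\.",
           fill = "right") %>%                  # pad short paths with NA, no warning
  mutate(
    shifted    = is.na(field),                  # rows with only three pieces
    field      = if_else(shifted, subsection, field),
    subsection = if_else(shifted, NA_character_, subsection)
  ) %>%
  select(-shifted)

toy_tib
```

Note that splitting on "." assumes none of the field names themselves contain a dot, and that data-frame sections with more than one row get numeric suffixes appended to their names by unlist(), so real results may need a little more cleanup.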