Converting retrieve_data() results to a data frame (tibble) #100

@jfy133

First, I want to say thank you for this package. I'm working on some metagenomic data with lots of 'unusual' taxa, and finding a good (accessible) database that gives a quick summary of their characteristics has been surprisingly difficult.

This package saved me a lot of headaches from trying to 'manually' parse the API search results myself.

I have neither a bug report nor a feature request, just some info that might be useful for others.

You can use a sequence of tidyverse tools to convert the results of the BacDiveR::retrieve_data() function into a clean(ish) table using the following code:

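## load the tidyverse packages that provide %>%, bind_rows(), gather(), separate() and distinct()
library(dplyr)
library(tidyr)

## get some search results
data_bacdive_raw <- BacDiveR::retrieve_data("Fusobacterium", searchType = "taxon")

## convert list of lists to tibble
data_bacdive_tib <- data_bacdive_raw %>%
  unlist() %>%      # flatten to a named vector; names are the dot-joined list paths
  bind_rows() %>%   # one-row tibble with one column per path
  gather(grouped_category, value, 1:ncol(.)) %>%   # reshape to long format
  separate(grouped_category, sep = "\\.", into = c("bacdive_id", "section", "subsection", "field", "key")) %>%
  distinct()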

#>Warning message:
#>Expected 4 pieces. Missing pieces filled with `NA` in 144 rows [72, 73, 74, 75, 76, 77, 156, 157, 158, 159, 160, 161, 250, 251, 252, 253, 254, 255, 461, 462, ...]. 

## print final table
data_bacdive_tib

#># A tibble: 18,555 x 6
#>   bacdive_id section       subsection      field              key   value           
#>   <chr>      <chr>         <chr>           <chr>              <chr> <chr>           
#> 1 2654       taxonomy_name strains_tax_PNU species_epithet    NA    mortiferum      
#> 2 2654       taxonomy_name strains_tax_PNU subspecies_epithet NA    NA              
#> 3 2654       taxonomy_name strains_tax_PNU is_type_strain     NA    FALSE           
#> 4 2654       taxonomy_name strains_tax_PNU domain             NA    Bacteria        
#> 5 2654       taxonomy_name strains_tax_PNU phylum             NA    Fusobacteria    
#> 6 2654       taxonomy_name strains_tax_PNU class              NA    Fusobacteriia   
#> 7 2654       taxonomy_name strains_tax_PNU ordo               NA    NA              
#> 8 2654       taxonomy_name strains_tax_PNU family             NA    Fusobacteriaceae
#> 9 2654       taxonomy_name strains_tax_PNU status_fam         NA    NA              
#>10 2654       taxonomy_name strains_tax_PNU genus              NA    Fusobacterium   

As far as I can see from the table for the search above, the only issue is that the references field is not correctly formatted: its entries end up in the subsection column rather than the field column (hence the 'NA' warning), because in the original results references is a data frame rather than a nested list.
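
To illustrate why the references entries come out short, here is a minimal sketch on a toy nested list. The structure and values in `toy` are made up for illustration and are not actual BacDive output; the point is only that a data frame sitting one level higher than the other fields yields names with fewer dot-separated pieces after unlist().

```r
library(tidyr)

## Hypothetical toy structure (NOT real BacDive output): one strain entry
## with a nested list section and a data frame section
toy <- list(
  "1" = list(
    taxonomy_name = list(
      strains = list(genus = "Fusobacterium", species_epithet = "mortiferum")
    ),
    references = data.frame(ID_reference = 20215, stringsAsFactors = FALSE)
  )
)

## unlist() flattens everything and joins the list names with "."
flat <- unlist(toy)

## count the dot-separated pieces in each name; the references entry has one fewer
n_pieces <- lengths(strsplit(names(flat), ".", fixed = TRUE))

## build the long table and split the names into columns;
## fill = "right" pads short names with NA on the right without a warning
long <- data.frame(grouped_category = names(flat),
                   value = unname(flat),
                   stringsAsFactors = FALSE)
tidy <- separate(long, grouped_category, sep = "\\.",
                 into = c("bacdive_id", "section", "subsection", "field"),
                 fill = "right")
tidy
```

In this sketch the references row has its column name in subsection and NA in field, which mirrors the behaviour described above. Passing fill = "right" to separate() only silences the warning; it does not move the value into the field column.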

This worked for me using BacDiveR_0.7.0.
