Drop MySQL dependency (so no centralized db server) #13

johrstrom · 2021-11-08T16:20:18Z

[Taken from the UCR repo]

We could switch to sqlite3. We could also stop using genes.tar.db and store file contents in gdbm keyed by path, as long as each path is also recorded in a files table associated with its Species. The schema already has a files table, but it is not used yet.
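A minimal sketch of the sqlite3 + gdbm layout described above. The table and column names (`species`, `files`, `species_id`, `path`), the helpers `store_file` and `files_for_species`, and the file names are hypothetical, not the project's actual schema. Python's standard `dbm` module stands in for gdbm here: `dbm.open` uses whichever backend the Python build provides (`dbm.gnu` is the gdbm binding if you want to require it).

```python
import dbm
import os
import sqlite3
import tempfile

# Hypothetical schema: the project's real Species/files tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS species (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS files (
    id         INTEGER PRIMARY KEY,
    species_id INTEGER NOT NULL REFERENCES species(id),
    path       TEXT NOT NULL UNIQUE
);
"""


def store_file(conn, store, species_name, path, content):
    """Record `path` for a species in SQLite and put the bytes in the dbm store keyed by path."""
    conn.execute("INSERT OR IGNORE INTO species(name) VALUES (?)", (species_name,))
    species_id = conn.execute(
        "SELECT id FROM species WHERE name = ?", (species_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR REPLACE INTO files(species_id, path) VALUES (?, ?)",
        (species_id, path),
    )
    conn.commit()
    store[path.encode()] = content  # the file content lives in the key-value store


def files_for_species(conn, store, species_name):
    """Look up a species' paths in SQLite, then fetch each file's content by path."""
    rows = conn.execute(
        "SELECT f.path FROM files f JOIN species s ON s.id = f.species_id "
        "WHERE s.name = ? ORDER BY f.path",
        (species_name,),
    ).fetchall()
    return {path: store[path.encode()] for (path,) in rows}


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    conn = sqlite3.connect(os.path.join(workdir, "genes.sqlite3"))
    conn.executescript(SCHEMA)
    with dbm.open(os.path.join(workdir, "genes.dbm"), "c") as store:
        store_file(conn, store, "Quercus alba", "quercus_alba/occurrences.csv", b"lat,lon\n")
        print(files_for_species(conn, store, "Quercus alba"))
    conn.close()
```

Keeping the path index in SQLite keeps the species-to-files mapping queryable with SQL, while the file bodies stay in a single embedded key-value file instead of a tarball.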

We could also explore using GeoJSON, one file per species, which would give us 87,000 files instead of 200,000+. However, this shifts complexity to extracting the strings before alignment and adding them back afterward. I'm also not aware of an embedded database that works with GeoJSON, short of using sqlite3 with a table that has a GeoJSON column. Finally, even though the search is meant to find all matching species, we still have to generate an Occurrence file containing only the occurrences included in the search: if the search box were drawn around Australia, occurrences in North America would be omitted.
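The Occurrence-file step can be sketched as a bounding-box filter over a per-species GeoJSON file. This assumes (the issue does not say) that each species file is a GeoJSON `FeatureCollection` of `Point` features with occurrence attributes in `properties`; `occurrences_in_bbox` is a hypothetical helper. Coordinates follow GeoJSON's `[longitude, latitude]` order, and boxes crossing the antimeridian are not handled.

```python
import json


def occurrences_in_bbox(collection, bbox):
    """Return a new FeatureCollection with only the Point features inside bbox.

    bbox is (min_lon, min_lat, max_lon, max_lat), the same order GeoJSON uses.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    kept = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # this sketch only handles point occurrences
        lon, lat = geometry["coordinates"][:2]  # GeoJSON positions are [lon, lat]
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            kept.append(feature)
    return {"type": "FeatureCollection", "features": kept}


if __name__ == "__main__":
    species_file = {
        "type": "FeatureCollection",
        "features": [
            {"type": "Feature", "properties": {"id": 1},
             "geometry": {"type": "Point", "coordinates": [151.2, -33.9]}},  # Sydney
            {"type": "Feature", "properties": {"id": 2},
             "geometry": {"type": "Point", "coordinates": [-83.0, 40.0]}},   # Ohio
        ],
    }
    australia = (112.0, -44.0, 154.0, -10.0)  # rough box, for illustration only
    print(json.dumps(occurrences_in_bbox(species_file, australia)))
```

The filtered collection is what would be written out as the Occurrence file, so a species can match the search while its out-of-box occurrences are still dropped.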

That said, MongoDB does support GeoJSON. You could use one solution for the pipeline and another for querying. There is a lot of freedom here.

Again, though, the simplest change is switching to sqlite3.