Drop MySQL dependency (so no centralized db server) #13

Open
johrstrom opened this issue Nov 8, 2021 · 0 comments

Comments

@johrstrom
Contributor

[Taken from the UCR repo]

We can switch to sqlite3. We could also stop using genes.tar.db and store file content in gdbm, keyed by path, as long as each path is also recorded in a files table associated with its Species. The schema already has a files table, but it is not yet in use.
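A minimal sketch of that layout, using only the Python standard library. The table and column names (`species`, `files`, `path`) and the helper functions are hypothetical, not the project's actual schema, and the generic `dbm` module stands in for gdbm (it picks gdbm where available and falls back to another backend otherwise).

```python
import dbm
import sqlite3


def open_catalog(db_path):
    """Open (or create) the sqlite3 catalog that maps species to file paths."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA foreign_keys = ON")
    # Hypothetical schema: the real project schema may differ.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS species (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS files (
            id         INTEGER PRIMARY KEY,
            species_id INTEGER NOT NULL REFERENCES species(id),
            path       TEXT NOT NULL UNIQUE
        );
        """
    )
    return conn


def store_file(conn, store, species_name, path, content):
    """Record the path under its species and put the bytes in the key-value store."""
    conn.execute("INSERT OR IGNORE INTO species (name) VALUES (?)", (species_name,))
    (species_id,) = conn.execute(
        "SELECT id FROM species WHERE name = ?", (species_name,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO files (species_id, path) VALUES (?, ?)",
        (species_id, path),
    )
    conn.commit()
    # The path is the key, so no tar archive is needed to locate the content.
    store[path.encode("utf-8")] = content


def files_for_species(conn, store, species_name):
    """Return {path: content} for every file recorded for one species."""
    rows = conn.execute(
        "SELECT f.path FROM files f JOIN species s ON s.id = f.species_id "
        "WHERE s.name = ? ORDER BY f.path",
        (species_name,),
    )
    return {path: store[path.encode("utf-8")] for (path,) in rows}
```

Here `store` would be opened with something like `dbm.open("content", "c")`. Both the catalog and the content store are plain local files, so no database server is involved.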

We could also explore using GeoJSON, one file per species, which would result in 87,000 files instead of 200,000+. This does shift complexity, though: strings would have to be extracted before aligning and then added back afterwards. Also, I'm not aware of an embedded database that lets you work with GeoJSON, unless you used sqlite3 and added a table with a column holding the GeoJSON data. Finally, even though we want a search to find all the species, we still have to generate an Occurrence file containing only the occurrences included in the search. For example, if the search box were around Australia, we would omit occurrences in North America.
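A rough sketch of the sqlite3-plus-GeoJSON-column idea, including the search-box filtering. The table name `species_geojson`, the helper names, and the assumption that every occurrence is a GeoJSON Point are all hypothetical. Filtering is done in Python rather than in SQL, and the box is assumed not to cross the antimeridian.

```python
import json
import sqlite3


def create_store(conn):
    """One row per species, with the whole FeatureCollection in a TEXT column."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS species_geojson ("
        "name TEXT PRIMARY KEY, geojson TEXT NOT NULL)"
    )


def put_species(conn, name, points):
    """Store a species' occurrences; points is a list of (lon, lat) pairs."""
    collection = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {"species": name},
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
            }
            for lon, lat in points
        ],
    }
    conn.execute(
        "INSERT OR REPLACE INTO species_geojson (name, geojson) VALUES (?, ?)",
        (name, json.dumps(collection)),
    )


def occurrences_in_box(conn, west, south, east, north):
    """Build a FeatureCollection with only the occurrences inside the box."""
    hits = []
    rows = conn.execute("SELECT geojson FROM species_geojson ORDER BY name")
    for (text,) in rows:
        for feature in json.loads(text)["features"]:
            lon, lat = feature["geometry"]["coordinates"][:2]
            if west <= lon <= east and south <= lat <= north:
                hits.append(feature)
    return {"type": "FeatureCollection", "features": hits}
```

Note that this scans every species row for each search, so it illustrates the data flow rather than an efficient query. A spatial index, or a database with native GeoJSON support, would be needed to avoid the full scan.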

That said, MongoDB does support GeoJSON. You could use one solution for the pipeline and another for the query; there is a lot of freedom here.

But again, the simplest change is switching to sqlite3.
