Question: Using sqlalchemy-file (or similar) ? #2691
Comments
https://github.com/jowilf/sqlalchemy-file (probably more relevant project) |
/kind question |
Is the idea here to implement a more generic file storage adapter to be able to eventually migrate documents from S3 to another object storage service?
Is there any evidence to support this? I don't know if referring to documents through SQL would drastically improve the efficiency of document storage and retrieval vs. the implementation time and effort it could require. |
Not exactly; at least, that's not the main benefit from my POV. Rather, from my understanding, the idea behind sqlalchemy-media is basically:
class SomeModel(Base):
    document = FileField()  # accessed like a FileObject
    date = DateField
    doc_type = EnumSomething
The ORM plugin handles the machinery (= storing a reference to the object storage
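To make the "file field" idea more concrete, here is a minimal sketch using only the Python standard library. It does not use sqlalchemy-media or sqlalchemy-file, and it is not how Thoth storages work today: the `document` table, the `save_document`/`load_document` helpers, and the in-memory `object_store` dict (standing in for S3) are all hypothetical names chosen for illustration. The point is only that the SQL row holds the metadata plus a reference, while the payload lives in the object store.

```python
import sqlite3

# Hypothetical stand-in for an object store such as S3: key -> bytes.
object_store = {}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document ("
    "id INTEGER PRIMARY KEY, doc_type TEXT, date TEXT, storage_key TEXT)"
)


def save_document(doc_type, date, content):
    """Store the payload in the object store; keep only a reference in SQL."""
    cur = conn.execute(
        "INSERT INTO document (doc_type, date) VALUES (?, ?)", (doc_type, date)
    )
    doc_id = cur.lastrowid
    key = f"{doc_type}/{doc_id}"  # illustrative key layout, an assumption
    object_store[key] = content
    conn.execute(
        "UPDATE document SET storage_key = ? WHERE id = ?", (key, doc_id)
    )
    return doc_id


def load_document(doc_id):
    """Resolve the SQL reference and fetch the payload from the object store."""
    row = conn.execute(
        "SELECT storage_key FROM document WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no document with id {doc_id}")
    return object_store[row[0]]


# Usage: the caller never builds the storage key itself.
example_id = save_document("example", "2022-06-01", b'{"result": "ok"}')
payload = load_document(example_id)
```

An ORM plugin would hide `save_document` and `load_document` behind the column type, so model code would only ever see the field.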
Yeah, I agree. It was more meant as an illustration of the idea.
I was mainly thinking about how we handle metadata. Another random thing: Do we have an average size for the documents (I'm currently searching through |
Thanks for the example, this is more clear to me now.
Ok, I see how that could be useful. However I think the current implementation of models might be clear enough and I am not sure if mixing up file storage handling and tables implementation is a good idea readability-wise. @harshad16 wdyt? |
I am not sure if mixing up file storage handling and tables
implementation is a good idea readability-wise.
Should we implement this, storages should not do any file storage
handling at all; the plugin should handle it.
If it (= handling file storage) bleeds into storages code, that would
just be the same as now (but done differently ^), with more dependencies, so
yeah, that would be useless.
|
It would be useful (and not too time-consuming I guess) to have an estimation for that, but from memory, I know that some documents take up a lot of memory and it would be inconvenient to store them directly in the database via JSONB. In this case, I don't think we should start storing any document directly in the database. |
It would be useful (and not too time-consuming I guess) to have an
estimation for that
I'll open an issue in metrics-exporter (unless maybe it's already
there) to see if we can come up with something (probably).
|
This is a question/discussion/maybe proposal on the Thoth storage model.
I think refuting it could help better understand the current storage model, and
hence document it (#2661).
Why do we not use something like sqlalchemy-file
-> i.e., an extension to the SQLAlchemy ORM to store files in "file storage"
(including S3) and reference them in the SQL database?
The question arose from working on thoth-station/document-sync-job#37
and #2674. From what I can tell, we use an ad-hoc structure for the S3
documents, by prefix + by date, and we have a significant amount of metadata (->
thoth/storages/result_schema.py).
That looks like data which would be more efficiently handled in SQL (the date
stuff, particularly).
Unifying (aka a single source of truth) also looks like it would make some
features simpler (such as #2657, with judicious cascading deletion).
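As a rough illustration of the two points above (date handling in SQL, and deletion that keeps both sides in sync), here is a sketch using only the standard library. Everything in it is an assumption made for the example: the `result` table, the `blobs` dict standing in for the S3 bucket, and the helper names do not exist in thoth-storages.

```python
import sqlite3

# Hypothetical stand-in for the S3 bucket: key -> bytes.
blobs = {}

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE result ("
    "id INTEGER PRIMARY KEY, "
    "created TEXT NOT NULL, "            # ISO-8601 date, e.g. 2022-06-01
    "storage_key TEXT NOT NULL UNIQUE)"
)


def add_result(created, key, payload):
    """Write the payload to the object store and record its reference in SQL."""
    blobs[key] = payload
    db.execute(
        "INSERT INTO result (created, storage_key) VALUES (?, ?)", (created, key)
    )


def keys_between(start, end):
    """Return keys with start <= created < end.

    ISO-8601 dates sort lexicographically, so a plain comparison is enough.
    """
    rows = db.execute(
        "SELECT storage_key FROM result "
        "WHERE created >= ? AND created < ? ORDER BY created",
        (start, end),
    )
    return [row[0] for row in rows]


def purge_before(cutoff):
    """Delete rows older than cutoff together with the objects they reference."""
    keys = [
        row[0]
        for row in db.execute(
            "SELECT storage_key FROM result WHERE created < ?", (cutoff,)
        )
    ]
    for key in keys:
        blobs.pop(key, None)
    db.execute("DELETE FROM result WHERE created < ?", (cutoff,))
    return len(keys)


add_result("2022-01-15", "old-doc", b"old")
add_result("2022-06-01", "new-doc", b"new")
```

With the date stored as a column, a range query replaces listing objects by date prefix, and the SQL rows decide which objects get removed.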
I still don't fully grok the current storage model, so it's entirely possible
that I'm completely off-base here. I think articulating the reasons why would
really help #2661.
I didn't find the reasoning while browsing through past issues and PRs. If there
is one, please point me to it 👍 !
Thanks
/sig stack-guidance
EDIT: changed the prospective "file field" project. Although it's really new, it
reuses Apache libcloud and seems better suited to the purpose.