This follows a conversation with @tgourdel. As of today, when you run a pipeline that needs a library not bundled with Python, such as duckdb, Amphi downloads it automatically.
From my perspective, this leads to serious issues:
1/ It requires an internet connection at runtime, which is not always possible (imagine an isolated, locked-down server).
2/ Execution speed is affected: we lose a few seconds each time.
3/ Version control: we don't control which library version gets installed, and that can break the workflow.
(However, we could still keep the ability to download on the fly for custom components, tests, etc.)
Best regards,
Simon
Being able to run Amphi in an environment without internet access should be possible and relatively easy.
A few notes to continue the conversation:
It is possible to set up Amphi so that it runs without internet access. Today this is a manual process, but not an extremely complex one. There are multiple ways to achieve it, but basically: create a Python virtual environment with the amphi-etl package plus the packages needed to run all the components, such as DuckDB and a few others like SQLAlchemy. Creating a Docker image is also a solution.
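To make the offline setup above easier to verify, here is a minimal sketch that checks whether a prepared environment contains the expected packages before the machine is cut off from the network. It is not part of Amphi: the function name `check_environment` and the pin dictionary are hypothetical, and the package list has to be supplied by whoever builds the environment.

```python
from importlib import metadata


def check_environment(pins, get_version=metadata.version):
    """Compare installed distributions against expected versions.

    pins maps a distribution name to an exact version string, or to None
    when any installed version is acceptable. Returns a dict of problems;
    an empty dict means the environment matches.
    """
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if wanted is not None and found != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    # Hypothetical list: only checks presence, not a specific version.
    print(check_environment({"amphi-etl": None, "duckdb": None, "sqlalchemy": None}))
```

Running it inside the virtual environment or Docker image prints the packages that are missing or at an unexpected version, which also gives some control over the versioning concern raised in the issue.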
What's missing today is perhaps an automated way to get the list of packages that are downloaded on the fly. This needs to be documented.
On execution speed: the download happens only once, the first time you execute the component, and only if the package is not already installed. After that, no download is triggered.
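For readers unfamiliar with the pattern, the "install only if missing" behavior can be illustrated roughly as follows. This is a generic sketch, not Amphi's actual implementation: `ensure_package` and `pip_install` are hypothetical names, and the installer is injectable so the logic can be exercised without network access.

```python
import importlib
import importlib.util
import subprocess
import sys


def pip_install(requirement):
    """Install a requirement with pip in the current interpreter's environment."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", requirement])


def ensure_package(module_name, requirement=None, installer=pip_install):
    """Install a package only when its top-level module cannot be found.

    Returns True if an install was triggered, False if the module was
    already available (in which case nothing is downloaded).
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    installer(requirement or module_name)
    # Make the freshly installed package visible to the import system.
    importlib.invalidate_caches()
    return True
```

On the first call for a missing module the installer runs; later calls return immediately because the module is found. Passing an exact pin as `requirement` (for example `"duckdb==<version>"`) would be one way to keep the installed version under control.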
We have the option to add "mandatory" packages when installing amphi-etl. This of course increases the time to install Amphi and the footprint of the install on the machine. We should probably find a balance between the core libraries that are needed and the ones that are rarely needed and not essential. DuckDB could be added to the list. Today I decided to include only pandas (essential for the core components to work), as configured here: