
make training happen on digital ocean #41

Open
josh-chamberlain opened this issue Feb 27, 2024 · 4 comments

Comments

@josh-chamberlain
Contributor

We have a training dataset and related artifacts on Hugging Face. We should use cloud computing resources for the training.

@bonjarlow bonjarlow assigned bonjarlow and unassigned bonjarlow Feb 27, 2024
@mbodeantor
Contributor

We have a droplet already provisioned for the data-sources-mirror, and I'm happy to share the key for experimentation for this purpose. I'm not sure it will be sufficient; we should discuss alternatives if not.

@maxachis maxachis mentioned this issue Mar 5, 2024
7 tasks
@maxachis
Collaborator

@josh-chamberlain @mbodeantor Would we want to manually trigger this training, set up a cron job to have it occur at regular intervals, or both?

Additionally, which components of the pipeline would we want to use in training? All parts of the pipeline, or only some?
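If we go the scheduled route, the trigger could be as simple as a crontab entry on the droplet. This is a minimal sketch; the schedule, script path, and wrapper script name are assumptions, not an agreed design:

```shell
# Run the training pipeline every Sunday at 02:00 (illustrative schedule).
# /opt/training/run_training.sh is a hypothetical wrapper script that would
# pull the latest dataset from Hugging Face and launch training.
0 2 * * 0 /opt/training/run_training.sh >> /var/log/training.log 2>&1
```

A manual trigger could simply invoke the same wrapper script directly, so supporting both modes would not require separate code paths.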

@maxachis
Collaborator

maxachis commented Mar 21, 2024

> We have a droplet already provisioned for the data-sources-mirror, and I'm happy to share the key for experimentation for this purpose. I'm not sure it will be sufficient; we should discuss alternatives if not.

@mbodeantor @josh-chamberlain Looking at the graphs for the droplet, I see that its CPU sits mostly idle, with brief bursts of near-100% activity. From the standpoint of CPU alone, that looks promising.

However, I would note that the droplet has the following limitations:

  1. 512 MB memory (with average usage hovering around 65%)
  2. 10 GB Disk (with average usage hovering around 42%)

Those two constraints, memory especially, are probably not enough for training. At best, training will take quite a long time; at worst, the code may simply fail. And even in the best case, we'd have to think about how to design the new workload so it doesn't interfere with the existing functionality of data-sources-mirror.

It'd probably be easier and more viable to have training occur on a droplet specifically provisioned for training.
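The memory and disk concern could be made concrete with a small preflight check before launching training. This is a sketch only; the threshold values are illustrative assumptions, not measured requirements for our pipeline:

```python
def has_headroom(mem_available_mb: float, disk_free_gb: float,
                 min_mem_mb: float = 2048, min_disk_gb: float = 5) -> bool:
    """Return True if the host has enough free memory and disk to attempt training.

    Thresholds are illustrative; real requirements depend on the model and dataset.
    """
    return mem_available_mb >= min_mem_mb and disk_free_gb >= min_disk_gb

# The existing 512 MB droplet at ~65% memory use leaves roughly 180 MB free,
# far below a 2 GB working threshold, so the check would refuse to start:
print(has_headroom(mem_available_mb=180, disk_free_gb=5.8))  # False
```

A guard like this would let a scheduled job bail out cleanly instead of being OOM-killed mid-run on an undersized droplet.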

@josh-chamberlain
Contributor Author

@maxachis OK, let's provision a droplet. We should start with an entry-level one, since droplets appear to be easily resizable, and scale up if we need to.
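The create-small-then-resize approach can be sketched with DigitalOcean's `doctl` CLI. The droplet name, region, image, and size slugs below are placeholder assumptions:

```shell
# Create an entry-level droplet dedicated to training
# (name, region, image, and size are illustrative choices).
doctl compute droplet create training-runner \
  --region nyc3 \
  --size s-1vcpu-1gb \
  --image ubuntu-22-04-x64

# Later, if training needs more headroom, resize in place.
# <droplet-id> comes from `doctl compute droplet list`; note that
# --resize-disk makes the disk change permanent (it cannot be shrunk back).
doctl compute droplet-action resize <droplet-id> \
  --size s-2vcpu-4gb \
  --resize-disk
```

Starting CPU/RAM-only resizes (without `--resize-disk`) are reversible, which fits the "scale up if we need to" plan.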

Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

4 participants