Choose object storage provider for haikuports #141
Comments
My preference:
|
I've reached out to Wasabi to try and get "actual" bandwidth utilization numbers. They don't publish them in our portal (but I sure as hell know they look at them, since they have cut us off before due to egress). |
Why do you have Backblaze as "20-25 a month"? If we factor in the CDN with free egress then shouldn't it be storage costs only, and thus be equivalent to Wasabi + CDN? |
"Free egress up to 3x their average monthly storage amount. Egress over average stored is $0.01/GiB."
EDIT: I did that math wrong. Let's re-run the cost numbers, assuming 6TiB egress and 400GiB storage.
Backblaze:
Storj:
Wasabi:
Telnyx:
Backblaze + bunny.net CDN seems like the best deal tbh, with controlled risk. The Bunny.net CDN could cut that 6TiB way down to "a few TiB or less" on all providers, but it's unknown how efficient their caching is in our use case. |
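For anyone who wants to re-run the egress arithmetic, here is a rough sketch in Python. It assumes only the terms quoted above (free egress up to 3x stored data, $0.01/GiB beyond that) and the 400GiB / 6TiB figures from this thread. The function name and the smaller egress scenarios are illustrative, storage charges are left out because no per-GiB storage price is quoted here, and any CDN partner arrangement is ignored.

```python
GIB_PER_TIB = 1024


def egress_overage_cost(stored_gib, egress_gib, free_multiple=3.0, overage_per_gib=0.01):
    """Cost of egress beyond the free allowance.

    Assumption: the free allowance is free_multiple x the stored amount,
    and everything above it is billed at overage_per_gib (USD per GiB).
    """
    free_gib = stored_gib * free_multiple
    billable_gib = max(0.0, egress_gib - free_gib)
    return billable_gib * overage_per_gib


if __name__ == "__main__":
    stored = 400  # GiB stored, from the thread
    # 6 TiB is the figure from the thread; 3 and 1 TiB are hypothetical
    # "what if the CDN absorbs part of the traffic" scenarios.
    for egress_tib in (6, 3, 1):
        cost = egress_overage_cost(stored, egress_tib * GIB_PER_TIB)
        print(f"{egress_tib} TiB egress -> ${cost:.2f} egress overage")
```

Under these assumptions the overage is driven entirely by how much traffic bypasses the cache, which is why the CDN hit rate matters more than the storage price.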
EDIT - Actual worst-case egress bandwidth numbers:
|
For me, while I think reliability is important, it is not the end of the world if we get cut off and need to relocate. However, how do we keep control of our packages? I.e. is there going to be a backup or a primary source for them? |
The nice thing about s3 is it actually gets easier to back things up. Today we have the automatic "compress all the artifacts, encrypt them, and upload to an s3 bucket" backup system. That doesn't work for huge things though, since I really don't want to work with 300GiB tar deltas 😅

In the model where some object storage provider is the source of truth, we really just need to rclone the bucket "somewhere" else. Historically I've just rcloned to a dedicated bit of local storage at my house as a cold backup (you could do the same). rclone works off of deltas like rsync, so it's friendly on bandwidth consumption after the initial clone. rclone also lets you sync between storage providers... and it supports a TON of them.

We actually have an rclone container ready to go today that will do that to storj. We can make some fixes to make it more generic, though.

I also have rclonefs, which will (theoretically) let us mount s3 buckets as fuse storage mounts on each k8s node so we can (theoretically) offer s3 buckets over rsync to mirrors from pods running on any k8s node. (fuse in k8s is weird though, and we need an elevated security context.)
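A minimal sketch of what the "rclone the bucket somewhere else" step could look like when driven from Python. The remote name `b2:haikuports-repo` and the local path are hypothetical placeholders, not our actual configuration; `rclone sync`, `--dry-run` and `--progress` are standard rclone options. This is not the existing rclone container, just an illustration of the idea.

```python
import shlex
import subprocess


def build_rclone_sync(source, dest, dry_run=True):
    """Build an `rclone sync` command that makes dest match source.

    dry_run defaults to True so nothing is changed until the output
    has been reviewed.
    """
    cmd = ["rclone", "sync", source, dest, "--progress"]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run_sync(cmd):
    """Run the command; requires rclone installed and the remote configured."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical remote and destination, for illustration only.
    cmd = build_rclone_sync("b2:haikuports-repo", "/srv/backup/haikuports-repo")
    print(shlex.join(cmd))
    # run_sync(cmd)  # enable once the remote exists in rclone.conf
```

Note that `sync` deletes files at the destination that no longer exist at the source, so a cold backup that should keep old packages may prefer `rclone copy` instead.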
Agree. Definitely the biggest pain point of object storage. I really like the pricing of Telnyx, but the whole "per million API hits" thing makes me nervous on something complex and large like haikuports.
Agree. Let's strike DO off the list. They had some appealing things to them, but needing to groom a whole gaggle of buckets to get reasonable pricing is too much lift. I'm tired of forming infrastructure "around" providers' weird limitations. |
I updated #141 (comment) with the pricing based on the actual worst-case bandwidth numbers I saw on Digital Ocean. |
Oh, and I just looked at the Wasabi bill: it does list "908.40 API requests" for the month. I'm guessing that's in thousands though, given the decimal point, so 908,400 makes more sense. |
Looks like the preferred is backblaze + bunny then? |
Agree. I think Backblaze + Bunny is going to be the cheapest combo. Bunny will cut the xfer down by 50%, so that $30 / month should be the "worst case". |
Ryan went ahead and entered our billing info. I went ahead and deployed a temporary VM at Digital Ocean to shovel artifacts over to Backblaze. I'm going to start with the Haiku repos themselves, since they're an easy (smaller) test of data before moving on to haikuports. |
That's not great and definitely false advertising... |
I went ahead and put the Haiku repo over onto Backblaze. We already blew past the "free tier" of Class C API calls during the last sync. 😮‍💨 I'm about to head out of town and will be back Sunday, so here are the important facts:
If the 💩 hits the fan, you can take the following actions to undo the migration to backblaze: |
Other possible alternatives (not mutually exclusive):
|
I'm not super happy with backblaze. The pricing is ok, but we ran into a few weird gotchas which are annoying enough to make me want to swerve:
Given the above, it's kinda soured my opinions on B2. On the bright side, since it's just S3 we can move easily. With Backblaze crossed off, I looked to https://www.s3compare.io/ to find which provider is cheapest.
Of course, we could go colo / dedicated and run our own S3 server if we're looking to be cheap. Since we want to offer rsync mirroring this might be a good compromise. I just want to keep our business logic away from our storage servers. |
I can check into this. My only fear is we use quite a bit of space since we're a "Whole OS". Digital Ocean only offers ~50GiB of "free" Object Storage, and places a high valuation on it when you scale up towards 1TiB.
Yeah, I'm honestly not against this at this point. We could run our own S3 and rsync server on a dedicated / colo box. One of my goals has been to keep "business logic" and "data" separate. Previous infrastructure designs have mixed mission-critical services with storage, which means when things go wrong they go WRONG. It's nice to have the s3 API level of separation between things if we do have a dedicated server / colo system. |
With haikuporter's support of s3, we need to choose an object storage provider.
For context, this will be replacing our Digital Ocean volume block attachment, which is $25 / month for 250GiB.
Assuming ~400GiB stored... 2TiB of egress a month (which gives us a lot of head room)
Assuming ~35 million API ops a month (17M Class A, 17M Class B)
$16-35 / month likely as we grow. Risk of pulling too much egress and getting shut off.
Likely $11-20 / month. The per-API-operation charge is a big risk: haikuporter and hpkgbouncer all hit the APIs.
Likely $12-20 / month
$16-24 / month
Digital Ocean Spaces - We don't like having to have multiple buckets to get reasonable pricing. ~$23 per month for 400GiB + 2TiB xfer
Notes: We don't have to go all-in on a single S3 provider. Haiku can remain at Wasabi, haikuports can be "wherever". We can run one deployment of hpkgbouncer per repo.
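To put a number on the "per million API operations" risk, a quick sketch of the arithmetic follows. The operation counts are the 17M Class A / 17M Class B assumption above; the per-million prices are placeholders made up for illustration, not any provider's published rates, so substitute real ones before drawing conclusions.

```python
def api_ops_cost(ops, price_per_million, free_ops=0):
    """Monthly cost of `ops` operations billed per million, after a free allowance."""
    billable = max(0, ops - free_ops)
    return billable / 1_000_000 * price_per_million


# Operation counts from the assumptions in this issue.
monthly_ops = {"class_a": 17_000_000, "class_b": 17_000_000}

# HYPOTHETICAL prices (USD per million operations), for illustration only.
hypothetical_prices = {"class_a": 0.50, "class_b": 0.05}

if __name__ == "__main__":
    total = 0.0
    for op_class, count in monthly_ops.items():
        cost = api_ops_cost(count, hypothetical_prices[op_class])
        total += cost
        print(f"{op_class}: {count:,} ops -> ${cost:.2f}")
    print(f"total API cost: ${total:.2f} / month")
```

The useful part is less the total than the sensitivity: doubling the operation count (for example, a sync tool that lists the whole bucket on every run) doubles this line item regardless of how little data actually moves.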