AWS S3 or OCI objectstore backends? #302

Open
hrstoyanov opened this issue Mar 5, 2025 · 6 comments

Comments

@hrstoyanov

hrstoyanov commented Mar 5, 2025

Can I store emails in AWS S3 or Oracle OCI object store? Not just backup, but use as primary storage rather than files.

@mjl-
Owner

mjl- commented Mar 5, 2025

Hi @hrstoyanov, thanks for creating the issue.

It isn't currently possible to store emails externally. It's only in the local file system at the moment.
I suppose you would be interested in this so you can scale to large mailboxes with many/large messages?

Something to keep in mind: Mox also keeps a message index database, where it stores metadata about each message that is not in the message file itself, including the mailbox it is stored in, and message flags (like junk/notjunk). These index databases (one per account) are essential to normal operation. The message index databases aren't too large, so you can still keep them local.

With remote file storage, I think we would also need a local cache of recent message files. Otherwise many operations (like webmail showing the messages in a mailbox) may become slow: reading a remote file has much higher latency than reading a local one.
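A minimal sketch of the local-cache idea, assuming a hypothetical `BlobStore` interface for fetching message files by ID. None of these names come from mox, and a real cache would more likely live on disk than in memory. It keeps the most recently used messages and only goes to the remote store on a miss:

```go
package main

import (
	"container/list"
	"errors"
	"fmt"
)

// BlobStore is a hypothetical interface for fetching immutable message files.
type BlobStore interface {
	Get(id string) ([]byte, error)
}

// mapStore stands in for a slow remote object store and counts fetches.
type mapStore struct {
	blobs   map[string][]byte
	fetches int
}

func (s *mapStore) Get(id string) ([]byte, error) {
	s.fetches++
	b, ok := s.blobs[id]
	if !ok {
		return nil, errors.New("message not found: " + id)
	}
	return b, nil
}

type entry struct {
	id   string
	data []byte
}

// CachedStore keeps up to limit recently used messages in memory (LRU).
// It is not safe for concurrent use; limit must be at least 1.
type CachedStore struct {
	remote BlobStore
	limit  int
	order  *list.List // front = most recently used
	items  map[string]*list.Element
}

func NewCachedStore(remote BlobStore, limit int) *CachedStore {
	return &CachedStore{remote: remote, limit: limit, order: list.New(), items: map[string]*list.Element{}}
}

func (c *CachedStore) Get(id string) ([]byte, error) {
	if el, ok := c.items[id]; ok {
		c.order.MoveToFront(el) // cache hit: no remote round trip
		return el.Value.(*entry).data, nil
	}
	data, err := c.remote.Get(id)
	if err != nil {
		return nil, err
	}
	c.items[id] = c.order.PushFront(&entry{id: id, data: data})
	if c.order.Len() > c.limit {
		oldest := c.order.Back() // evict the least recently used message
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).id)
	}
	return data, nil
}

func main() {
	remote := &mapStore{blobs: map[string][]byte{"msg1": []byte("hello")}}
	cache := NewCachedStore(remote, 100)
	for i := 0; i < 3; i++ {
		if _, err := cache.Get("msg1"); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println("remote fetches:", remote.fetches) // 1: the other two reads were cache hits
}
```

Because message files are immutable, a cache like this never needs invalidation on update, only eviction and removal when a message is deleted.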

This probably won't be on my shortlist any time soon, but it seems like a useful feature to get in the future. If anyone is interested in working on this, I can give pointers.

@hrstoyanov
Author

Thanks @mjl- .
I agree, there might be some need for storing indexes/caching to speed up operations, but the massive advantages of storing directly in S3 (or a similar data store like Ceph) are:

  • no need for backups
  • no need for running redundant email servers, S3 is highly available already with redundant copies.
  • import/export to S3 solves the migration problem.
  • other apps can be developed for processing the accumulated emails directly off S3 (training AI/LLM?)

For inspiration, take a look at EclipseStore - an ultra fast database engine that can store data in several objectstore providers (S3, Oracle, Google, Azure) as well as files.

@mjl-
Owner

mjl- commented Mar 6, 2025

The redundancy and scalability of storage is indeed great, but it feels like it isn't a full solution to backups and redundancy.

Metadata, like which mailbox a message is in, and the message flags, are still important, and still need to be backed up, and would still be good to have redundancy for. The metadata is relatively small, so easier to backup than all the message files. But the message files themselves are immutable, so already relatively easy to back up. Syncing mutable data to s3 or object storage isn't what those stores are built for, and that's more like regular backups. Also, online remote storage (with files that can be removed) isn't a complete alternative for backups.

I would personally only want to store files in remote s3/cloud object storage when encrypted. Migrations and other applications could still access those files if they have keys, but it won't necessarily become all that much easier. Of course admins could choose not to encrypt files...
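A sketch of what encrypting a message file before upload could look like, using AES-256-GCM from the Go standard library. The function names `sealMessage` and `openMessage` are hypothetical, and key management (where keys live, how they are rotated) is out of scope here:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD; a 32-byte key selects AES-256.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealMessage returns nonce || ciphertext || tag, ready to upload.
func sealMessage(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends the ciphertext and tag to its first argument (the nonce).
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// openMessage reverses sealMessage and fails if the data was modified.
func openMessage(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("sealed message too short")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	key := make([]byte, 32) // all-zero key, for demonstration only
	msg := "Subject: hi\r\n\r\nbody"
	sealed, err := sealMessage(key, []byte(msg))
	if err != nil {
		fmt.Println("seal:", err)
		return
	}
	plain, err := openMessage(key, sealed)
	fmt.Println("round trip ok:", err == nil && string(plain) == msg)
}
```

One trade-off of sealing the whole file as a single unit: the file has to be downloaded and decrypted in full before any part of it can be read, so reading a single attachment by byte range would need a chunked scheme instead.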

I think the question/feature request will come up more. Hopefully someone will come along who's interested in picking this up!

@teamsnelgrove

I stumbled across this project while looking for an email server with a static binary and a sqlite storage backend. I was scrolling through the issues to see if someone had discussed it in the past.

For me the ideal scenario would be for both emails and metadata to be stored in the same sqlite database and replicated for disaster recovery using Litestream to any blob storage provider or SFTP. Sqlite can certainly handle storing email content just fine. The major downside being you probably cut out a larger ecosystem of tools that work directly off email files. I haven't had a chance to read through the persistence layer to know how relevant that might be in this context.

Gonna poke around more to learn more about the project. So far looks really promising!

@mjl-
Owner

mjl- commented Mar 20, 2025

The initial mox code used sqlite. I switched to boltdb/bstore because I didn't want cgo in mox. Nowadays, there is a pure Go transpilation of sqlite, but I don't know how well it works in practice. I also quite like the bstore interface, which provides some type checking and doesn't need any (untyped) SQL strings in the code base. Of course, sqlite is more powerful.

I have thought about having messages in the database. I am feeling a bit uneasy about that because of random reads within messages. I don't know, but suspect you can't do random reads from a field of a row in an sqlite database. The common case is likely to read messages in one go, but it is currently possible over IMAP and the webmail to read just one attachment somewhere in the middle/end of a large email file without having to read through the whole message (because we keep the parsed MIME structure of the (immutable) email file in the database). If you know that it is possible to do random reads of a field (email message file blob) in a database, that would be good to know!
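To illustrate the kind of random read described here, the following sketch reads one part of a message by byte range with `io.SectionReader`. The `partRange` type is a hypothetical stand-in for the offsets kept with the parsed MIME structure, not mox's actual data model. On the sqlite question: the SQLite C API does provide incremental blob I/O (`sqlite3_blob_open`, `sqlite3_blob_read`); whether a particular Go driver exposes it would need checking.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// partRange is a hypothetical index record: where one MIME part's body
// starts in the message file and how long it is.
type partRange struct {
	Offset, Length int64
}

// readPart reads only the bytes of one part. Any io.ReaderAt works,
// including *os.File, so nothing before Offset is read.
func readPart(r io.ReaderAt, p partRange) ([]byte, error) {
	return io.ReadAll(io.NewSectionReader(r, p.Offset, p.Length))
}

// demo builds a small message in memory and fetches only the "attachment".
func demo() (string, error) {
	const attachment = "ATTACHMENT-BYTES"
	msg := "Subject: report\r\n\r\nintro text\r\n" + attachment + "\r\n"
	p := partRange{
		Offset: int64(strings.Index(msg, attachment)),
		Length: int64(len(attachment)),
	}
	b, err := readPart(strings.NewReader(msg), p)
	return string(b), err
}

func main() {
	part, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(part)
}
```

The same `io.ReaderAt` shape maps onto remote object stores that support ranged reads, which is one reason to keep stored offsets independent of where the bytes live.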

I have been thinking about replication of the database and the email files. The boltdb databases are very simple. The operations they need are essentially: 1. Write a block; 2. Sync data to disk. If we can hook into those operations, and keep them in a local log and stream them to a backup server, we have live replication. The message files would have to be synced as well. Should be a matter of wrapping all file operations in mox, and ensuring all operations are sent to the remote server too. It could be done with sftp (to any backup server), or with a (hopefully simple) protocol that a remote mox speaks. The latest data could be restored from sftp in case of disaster. With a custom protocol the backup could be a mox server that we can failover to (and it would be good to have such a server, also for an additional outgoing IP for deliveries in case of reputation issues with the first). Anyway, this is quite a bit of work, and there are many more higher prio things to work on, so don't expect this anytime soon. Thoughts/discussions welcome of course.
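The hook-and-stream idea could be sketched roughly as follows. This is an illustration under assumptions, not how mox or boltdb are structured: `Storage`, `Replicated`, `Op` and the `ship` callback are all hypothetical names, and shipping happens synchronously on `Sync` for simplicity, where a real design might stream the log in the background.

```go
package main

import "fmt"

// Op is one logged storage operation.
type Op struct {
	Kind   string // "write" or "sync"
	Offset int64
	Data   []byte
}

// Storage is the minimal surface named above: write a block, sync to disk.
type Storage interface {
	WriteAt(p []byte, off int64) (int, error)
	Sync() error
}

// memStorage is an in-memory stand-in for a database file.
type memStorage struct{ buf []byte }

func (m *memStorage) WriteAt(p []byte, off int64) (int, error) {
	end := int(off) + len(p)
	if end > len(m.buf) {
		m.buf = append(m.buf, make([]byte, end-len(m.buf))...)
	}
	copy(m.buf[off:], p)
	return len(p), nil
}

func (m *memStorage) Sync() error { return nil }

// Replicated wraps a Storage, logs every operation, and ships the log on Sync.
type Replicated struct {
	local   Storage
	pending []Op             // local log of operations not yet shipped
	ship    func([]Op) error // hypothetical transport (sftp, custom protocol, ...)
}

func (r *Replicated) WriteAt(p []byte, off int64) (int, error) {
	n, err := r.local.WriteAt(p, off)
	if err != nil {
		return n, err
	}
	// Copy the data: the caller may reuse p after WriteAt returns.
	r.pending = append(r.pending, Op{Kind: "write", Offset: off, Data: append([]byte(nil), p[:n]...)})
	return n, nil
}

func (r *Replicated) Sync() error {
	if err := r.local.Sync(); err != nil {
		return err
	}
	r.pending = append(r.pending, Op{Kind: "sync"})
	if err := r.ship(r.pending); err != nil {
		return err // pending ops are kept so a later Sync can retry
	}
	r.pending = nil
	return nil
}

// Replay applies shipped operations to a replica in order.
func Replay(dst Storage, ops []Op) error {
	for _, op := range ops {
		var err error
		if op.Kind == "write" {
			_, err = dst.WriteAt(op.Data, op.Offset)
		} else {
			err = dst.Sync()
		}
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	primary, replica := &memStorage{}, &memStorage{}
	r := &Replicated{local: primary, ship: func(ops []Op) error { return Replay(replica, ops) }}
	r.WriteAt([]byte("page1"), 0)
	r.WriteAt([]byte("page2"), 5)
	if err := r.Sync(); err != nil {
		fmt.Println("sync:", err)
		return
	}
	fmt.Println("replica matches primary:", string(replica.buf) == string(primary.buf))
}
```

Shipping only at `Sync` boundaries means the replica never sees writes that the primary has not yet committed to disk, at the cost of added latency on every sync.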

About not being able to use a large ecosystem of tools on email messages: Mox also doesn't give users access to mbox or maildirs. It is much harder to keep track of mailbox state if external applications make changes. You'll have to go through IMAP. Or you can export/import messages as mbox/maildir.

@hrstoyanov
Author

fyi, these days Limbo is the preferred SQLite store with replication. This would allow MOX to offer new capabilities, like text-similarity/vector search and more, perhaps via the web console. It would also allow 3rd party tools to ingest the emails for various AI purposes.

But I agree that using SQLite and/or cloud storage providers is a project of its own, and given MOX is a one-man effort, it is too much to ask for.

Also, using a fault-tolerant distributed file system (many cloud vendors offer Ceph) should be easy even now.
