
Attachment extraction too slow #448

Open
baiomys opened this issue Feb 22, 2025 · 18 comments

Comments

@baiomys

baiomys commented Feb 22, 2025

Hi.

Please consider a user-selectable option for attachment compression, and additional endpoint(s) to get attachment filenames/sizes by index.
Thanks.

@axllent
Owner

axllent commented Feb 22, 2025

Hi.

I don't understand the question(s). The message summary returns all Attachment & Inline filenames & sizes. This also includes the PartID, which is used as the index for attachment retrieval. How is this too slow?
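
For illustration, a minimal Go sketch of that flow against a local Mailpit instance. The port (8025), the summary field names (Attachments, PartID, FileName, Size) and the message ID are assumptions for the example; check the API docs for the exact schema.

```go
// Sketch: list a message's attachments from the summary endpoint, then fetch
// each part by its PartID. Field names and the message ID are placeholders.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

type summary struct {
	Attachments []struct {
		PartID   string `json:"PartID"`
		FileName string `json:"FileName"`
		Size     int64  `json:"Size"`
	} `json:"Attachments"`
}

func main() {
	base := "http://localhost:8025/api/v1"
	msgID := "MESSAGE_ID" // hypothetical message ID

	resp, err := http.Get(base + "/message/" + msgID)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var s summary
	if err := json.NewDecoder(resp.Body).Decode(&s); err != nil {
		panic(err)
	}

	for _, a := range s.Attachments {
		fmt.Printf("%s (%d bytes) -> part %s\n", a.FileName, a.Size, a.PartID)
		part, err := http.Get(base + "/message/" + msgID + "/part/" + a.PartID)
		if err != nil {
			panic(err)
		}
		data, _ := io.ReadAll(part.Body)
		part.Body.Close()
		fmt.Printf("  fetched %d bytes\n", len(data))
	}
}
```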

@baiomys
Author

baiomys commented Feb 22, 2025

A call to extract ANY message part via

/api/v1/message/{id}/part/{part}

takes over 7 seconds if large attachment(s) (15 megabytes) exist in the message.

dual core cpu
vendor_id       : AuthenticAMD
cpu family      : 20
model           : 2
model name      : AMD G-T40N Processor
stepping        : 0
microcode       : 0x500010d
cpu MHz         : 801.352
cache size      : 512 KB

On Ryzen things are slightly better, but also not brilliant.

The message summary does return filenames and sizes, but it is not always convenient to call and parse that endpoint from JS on the page.

Sorry, I mixed an issue and a feature request together; my fault.

@axllent
Owner

axllent commented Feb 22, 2025

This is really a hardware limitation, as I can return a 27MB attachment in 0.64s. A dual-core G-T40N is an extremely weak processor from 2011, and you're never going to get good performance out of it these days.

In order for Mailpit to get any attachment, it has to extract the entire raw message from the database, decompress that message (ZSTD), parse the message, and finally return the attachment via HTTP. In the next release of Mailpit there is a change that uses a faster ZSTD compression setting (less compression), but it's not going to make any noticeable difference on your laptop, unfortunately. A 15MB attachment means a message of about 25-30MB in size (due to email encoding), so it's a lot of work to process a message like that.
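
As a rough illustration of those steps (a sketch only, not Mailpit's actual code), assuming the stored blob is zstd-compressed and the message is parsed with Go's standard library; using the klauspost/compress package for the zstd side is an assumption about the binding:

```go
// Illustrative per-request pipeline: decompress the stored blob, parse the
// MIME structure, walk the parts, and decode the matching attachment.
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"net/mail"

	"github.com/klauspost/compress/zstd"
)

// extractPart finds the part with the given filename and returns its decoded body.
func extractPart(compressed []byte, filename string) ([]byte, error) {
	dec, err := zstd.NewReader(nil)
	if err != nil {
		return nil, err
	}
	defer dec.Close()

	raw, err := dec.DecodeAll(compressed, nil) // step 1: decompress the stored message
	if err != nil {
		return nil, err
	}

	msg, err := mail.ReadMessage(bytes.NewReader(raw)) // step 2: parse the headers
	if err != nil {
		return nil, err
	}
	_, params, err := mime.ParseMediaType(msg.Header.Get("Content-Type"))
	if err != nil {
		return nil, err
	}

	mr := multipart.NewReader(msg.Body, params["boundary"]) // step 3: walk the MIME parts
	for {
		p, err := mr.NextPart()
		if err == io.EOF {
			return nil, fmt.Errorf("part %q not found", filename)
		}
		if err != nil {
			return nil, err
		}
		if p.FileName() == filename {
			// step 4: decode the transfer encoding (base64 assumed here)
			return io.ReadAll(base64.NewDecoder(base64.StdEncoding, p))
		}
	}
}

func main() {
	// Build a tiny multipart message and compress it, standing in for the stored blob.
	raw := []byte("Content-Type: multipart/mixed; boundary=b1\r\n\r\n" +
		"--b1\r\nContent-Type: text/plain\r\n\r\nhello\r\n" +
		"--b1\r\nContent-Disposition: attachment; filename=\"file.bin\"\r\n" +
		"Content-Transfer-Encoding: base64\r\n\r\nYWJj\r\n--b1--\r\n")
	enc, _ := zstd.NewWriter(nil)
	compressed := enc.EncodeAll(raw, nil)
	enc.Close()

	data, err := extractPart(compressed, "file.bin")
	fmt.Println(string(data), err) // "abc" <nil>
}
```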

I really don't know what to say about the "not always convenient" comment either. Email parts are stored dynamically in messages, so there is no guaranteed format or order of message parts. The API conveniently extracts all the information you need in the summary, and the Mailpit web UI uses the exact same API calls to get all the message attachments, paths, sizes, names etc. If you want to know what's in the message then you have to use the summary - it tells you exactly what you need to know to get the parts.

@baiomys
Author

baiomys commented Feb 22, 2025

Is it possible to add an option to COMPLETELY disable compression?

A RAW email message can easily be stored in the database without any extra CPU-consuming operations, so IMHO the user should be able to choose between space and speed.

BTW, most modern Linux systems use ZFS or BTRFS, which support native compression, so compressing TWICE is massive overkill.

@axllent
Owner

axllent commented Feb 22, 2025

Yes, I will consider this option, although I do not know how much difference it will make in your situation. No compression means more data in/out of the database (disk i/o), which is slower. The message still needs to be parsed and the attachment "extracted", so it may not make much difference in the end. Is your laptop using a 5400RPM spinning laptop hard drive, or an SSD?

I need to do a lot more testing and give this some more thought.

Also yes, some modern filesystems have optional compression, but I don't think they work well (or even at all) on files like SQLite databases which are continuously changing.

@baiomys
Author

baiomys commented Feb 22, 2025

I ran a gzip compression/decompression test on the RAW email which was so slow to process.
A bash script made 20 (!) compression/decompression cycles of the entire message in 4 seconds on the same CPU.
So it looks like your concept of storing and retrieving attachments is somewhat suboptimal.

@axllent
Owner

axllent commented Feb 22, 2025

I said zstd, not gzip - those are very different compression techniques, and zstd is much faster. I also don't understand the second part about the bash script with 20 cycles in 4 seconds - 20 cycles doing what?

I will say it again though, just to make it perfectly clear: even if I add an option to remove database compression altogether, you are not going to get great performance on your old hardware working with a 15MB attachment in an email. This is because of your hardware. Speaking of your hardware, you did not answer my question about what type of hard drive you are using.

@baiomys
Author

baiomys commented Feb 22, 2025

what type of hard drive you are using

Empty database on ramfs

20 cycles doing what?

20 cycles of compressing and decompressing the RAW email message.
GZIP is slower than ZSTD, but it took only 200 milliseconds to process the file in both directions.
So I am curious what took approximately 6800 ms (7000 - 200) to handle the API request on the same CPU.
=)

@axllent
Owner

axllent commented Feb 22, 2025

Are you saying that it took 200ms to compress that raw email (with the 15MB attachment) on your machine using gzip? That is not possible unless your machine is caching the input and output, or your bash script is not working as you expected.

Ramfs will give you "disk" performance because it's all in RAM. How much RAM does that machine have, and how much usable RAM is actually available for Mailpit when you are running it?

@baiomys
Author

baiomys commented Feb 22, 2025

Well, you were right, the script results for GZIP were wrong, as I had used a truncated file.
Sorry for the confusion.

Now I downloaded ZSTD and took the right file; the results are:

1.5 seconds for level 1
3 seconds for level 3

raw file size 20.5M

So if we completely remove compression and all intermediate conversion procedures, the results will POSSIBLY be less weird.

How much RAM does that machine have, and how much usable RAM is actually available for Mailpit when you are running it?

It's a home OpenWrt router equipped with 4 GB of RAM, and most of it is free.

@axllent
Owner

axllent commented Feb 22, 2025

The current edge docker build has zstd compression set to level 1 (it was 3 before). Could you try that and see if there is at least some improvement (compared to the 7 seconds you mentioned before)? You will need to store the message in the database again (so it is also compressed with level 1). This is just a test, as I'm curious. I think most of the CPU is used for compression, not decompression, so I'm not expecting any miracles, but it will be interesting anyway.
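
For what it's worth, the level difference roughly corresponds to this kind of change when using the klauspost/compress zstd package (an assumption about the binding; SpeedFastest maps approximately to level 1 and SpeedDefault to level 3, and message.eml is a placeholder file):

```go
// Sketch comparing the two zstd encoder levels mentioned above.
// This is not Mailpit's actual code; message.eml is a placeholder input.
package main

import (
	"fmt"
	"os"
	"time"

	"github.com/klauspost/compress/zstd"
)

func main() {
	raw, err := os.ReadFile("message.eml") // hypothetical raw message file
	if err != nil {
		panic(err)
	}

	// SpeedFastest ~ zstd level 1, SpeedDefault ~ zstd level 3.
	for _, lvl := range []zstd.EncoderLevel{zstd.SpeedFastest, zstd.SpeedDefault} {
		enc, _ := zstd.NewWriter(nil, zstd.WithEncoderLevel(lvl))
		start := time.Now()
		out := enc.EncodeAll(raw, nil)
		enc.Close()
		fmt.Printf("%-10v %9d -> %9d bytes in %v\n", lvl, len(raw), len(out), time.Since(start))
	}
}
```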

The option I will look at is disabling the zstd compression altogether, but note that Mailpit will still need to process the email to get the attachment parts - there is no way around this. The only difference would be that it does not need to decompress it first.

I will also look at a separate option to possibly disable HTTP compression, which may be another thing that is slowing you down. This is easy to test though, using curl on the API attachment URL, which by default does not request HTTP compression (unlike your browser, which asks for it).
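
The same comparison can also be made from Go rather than curl; this sketch toggles the transport's transparent gzip support, with the URL, port and part ID as placeholders:

```go
// Sketch: time the same attachment request with and without asking the
// server for gzip. Go's default transport requests gzip and transparently
// decompresses it; DisableCompression switches that off.
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func fetch(client *http.Client, url string) (int64, time.Duration, error) {
	start := time.Now()
	resp, err := client.Get(url)
	if err != nil {
		return 0, 0, err
	}
	defer resp.Body.Close()
	n, err := io.Copy(io.Discard, resp.Body)
	return n, time.Since(start), err
}

func main() {
	url := "http://localhost:8025/api/v1/message/MESSAGE_ID/part/2" // placeholder

	clients := map[string]*http.Client{
		"with gzip": {}, // default transport requests gzip
		"no gzip":   {Transport: &http.Transport{DisableCompression: true}},
	}
	for name, c := range clients {
		n, d, err := fetch(c, url)
		fmt.Printf("%-10s %d bytes in %v (err=%v)\n", name, n, d, err)
	}
}
```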

@baiomys
Author

baiomys commented Feb 22, 2025

separate option to possibly disable HTTP compression

Nice, but the 7-second result was just for displaying a 200K image in the HTML part of an email containing a 15 MB attachment.
So even without a direct download of the attachment, the results are frustrating. And it is definitely caused by message preprocessing.

current edge docker build has zstd compression set to level 1

Will try it in the morning. Thanks for your time!

@axllent
Owner

axllent commented Feb 22, 2025

To get the 200K image it still has to decompress and process the whole message to extract that attachment. The message processing requirement will not change - I'm just making this clear - so if it's that part which becomes the bottleneck on your end, then you're just going to have to either learn to live with it or use better hardware.

@baiomys
Author

baiomys commented Feb 23, 2025

We gained a second or even less with the new version.
I have an I/O timeout set to 5 seconds on API operations, and it still triggers.
Maybe you can do some profiling on the code performing extraction of MIME parts and make it a bit more efficient,
e.g. unroll recursion, limit concurrent goroutines, etc.

Maybe this can help:
https://github.com/inflex/ripMIME

@axllent
Owner

axllent commented Feb 23, 2025

I have been doing a lot of profiling today in relation to #447 - but that is RAM-specific, not CPU. Disabling message compression may help your problem once it's complete & merged, or it may not. As I said before, I will also add an option to disable HTTP compression (if the browser requests it), but that's about it. I can spend days profiling, looking for little improvements, but the fact is you're using a shit processor and expecting too much :-) What if you now want to attach a 40MB file, or 100MB.... where does it end?
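
For anyone who wants to repeat that kind of measurement on a specific code path, here is a generic CPU-profiling sketch using Go's standard runtime/pprof; extractAttachment below is just a placeholder for whatever is being profiled:

```go
// Generic CPU-profiling sketch with the standard runtime/pprof package.
// extractAttachment is a placeholder for the code path under test.
package main

import (
	"os"
	"runtime/pprof"
)

func extractAttachment() {
	// placeholder for the message parsing / extraction work being measured
	for i := 0; i < 1_000_000; i++ {
		_ = i * i
	}
}

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	extractAttachment()
	// Inspect afterwards with: go tool pprof cpu.prof
}
```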

@baiomys
Author

baiomys commented Feb 23, 2025

want to attach a 40MB file, or 100MB.... where does it end?

This limit is already set to a 25 MB entire message size on the edge SMTP gateway.

you're using a shit processor and expecting too much

It's a challenge. For most of my career (over 20 years) I have developed commercial software in C and C++ for embedded systems, where resources are VERY limited. So it is painful to see a Go application spend 7 seconds processing a TINY plain-text message.
=))))

I hope that removing compression will reduce the processing time to a reasonable value.

BTW, you mentioned a 27MB attachment in 0.64s, which makes me think that speed depends on thread/core count.
How many cores does the CPU performing so fast have, and how many threads are active during message processing?

@axllent
Owner

axllent commented Feb 23, 2025

They also landed on the moon with something like 256K of RAM ;-) Anyway, I can't change much about modern programming languages. I'm sure that if the entire application were written in something like C (from scratch) it would be better optimized, but it would take someone years to achieve a similar result, and in many ways it would also be inferior. This is an entire webserver, SMTP server, database engine, web UI, plus all the other features, packed into a single static binary that runs on multiple OSes and architectures. If you try any similar application (to Mailpit), even one written in NodeJS or even Rust, I'm sure you will experience the same or even worse performance issues, and chances are it won't have many of the features. Mailpit is actually pretty well optimized, and MUCH, M-U-C-H faster than the software it was built to replace (MailHog)....

I think that removing the compression at both the database level (zstd decompression) and, probably more importantly, the HTTP level (gzip) will help on your end. It's actually the gzip HTTP compression that I believe is making your end much slower - it's gzipping every HTTP response, including attachments.

I think the machine I was testing on has 12 cores - the laptop I'm on now is older (Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz - 4 cores), and the same 27MB attachment takes 2.6 seconds on this machine. Just keep in mind that it also takes time to read that amount of data from any database, so it's not just parsing / extracting. This processor is much weaker than my desktop CPU, though. I'm sure the thread count matters too, but I think the big thing is the CPU.

Edit: sorry, I missed the question about how many threads are active at the time - I'm not actually sure, to be honest. I'd think one for the email parsing etc., however there are multiple background services and cron jobs running within Mailpit too, including HTTPD and SMTPD.

@baiomys
Author

baiomys commented Feb 23, 2025

Have you thought about introducing a sort of "MIME offset marker" during the message-receiving stage, which could later greatly increase MIME parsing speed? Since an email message is immutable, IMHO this can be done relatively easily.
Of course, it depends on how the SMTP server is organized in your code.
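
Purely to illustrate the suggestion (this is not something Mailpit does today): boundary byte offsets could be recorded once at receive time and stored alongside the message, so a later lookup can seek straight to a part instead of re-parsing everything. A minimal sketch with hypothetical names:

```go
// Illustration of the "MIME offset marker" idea: scan the raw message once
// and remember where each boundary line starts. All names are hypothetical.
package main

import (
	"bytes"
	"fmt"
)

type partMarker struct {
	Index  int   // sequential part number
	Offset int64 // byte offset of the boundary line in the raw message
}

// indexBoundaries records the byte offset of every boundary line in raw.
func indexBoundaries(raw []byte, boundary string) []partMarker {
	marker := []byte("--" + boundary)
	var markers []partMarker
	offset := 0
	for offset < len(raw) {
		if bytes.HasPrefix(raw[offset:], marker) {
			markers = append(markers, partMarker{Index: len(markers), Offset: int64(offset)})
		}
		// advance to the start of the next line
		nl := bytes.IndexByte(raw[offset:], '\n')
		if nl < 0 {
			break
		}
		offset += nl + 1
	}
	return markers
}

func main() {
	raw := []byte("--b1\npart one\n--b1\npart two\n--b1--\n")
	for _, m := range indexBoundaries(raw, "b1") {
		fmt.Printf("part %d boundary at byte %d\n", m.Index, m.Offset)
	}
}
```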

baiomys changed the title from "Attachment extraction too slow and no API endpoints to get attachment filename/size by index" to "Attachment extraction too slow" on Feb 24, 2025