Attachment extraction too slow #448
Hi. I don't understand the question(s). The message summary returns all the attachment information you need.
A call to extract ANY message part via /api/v1/message/{id}/part/{part} takes over 7 seconds if a large attachment (15 megabytes) exists in the message.
On Ryzen things are slightly better, but also not brilliant. The message summary does return filenames and sizes, but it is not always convenient to call and parse this method from JS on the page. Sorry, I mixed an issue and a feature request together, my fault.
This is really a hardware limitation, as I return a 27MB attachment in 0.64s. A dual-core G-T40N is an extremely weak processor from 2011, and you're never going to get much performance out of it these days.

In order for Mailpit to return any attachment, it has to extract the entire raw message from the database, decompress that message (zstd), parse the message, and finally return the attachment via HTTP. In the next release of Mailpit there is a change that uses a faster zstd compression setting (less compressed), but it's not going to make any noticeable difference on your laptop unfortunately. A 15MB attachment means a message of about 25-30MB in size (due to email encoding), so it's a lot of work to process a message like that.

I really don't know what to say about the "not always convenient" comment either. Email parts are stored dynamically in messages, so there is no guaranteed format or order of message parts. The API conveniently extracts all the information you need in the summary, and the Mailpit web UI uses the exact same API calls to get all the message attachments, paths, sizes, names etc. If you want to know what's in a message then you have to use the summary - it tells you exactly what you need to know to get the parts.
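For readers following along, the extraction steps described above can be sketched with the Go standard library. This is only a minimal illustration, not Mailpit's actual code (the SQLite read and zstd decompression steps are omitted); the sample message and the `findAttachment` helper are invented for the example.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"net/mail"
	"strings"
)

// A tiny multipart message standing in for a real stored email. In
// Mailpit the raw source would first be read from the database and
// zstd-decompressed; those steps are omitted here.
const raw = "From: a@example.com\r\n" +
	"Content-Type: multipart/mixed; boundary=XYZ\r\n" +
	"\r\n" +
	"--XYZ\r\n" +
	"Content-Type: text/plain\r\n" +
	"\r\n" +
	"hello\r\n" +
	"--XYZ\r\n" +
	"Content-Type: application/octet-stream\r\n" +
	"Content-Disposition: attachment; filename=\"data.bin\"\r\n" +
	"\r\n" +
	"ATTACHMENT-BYTES\r\n" +
	"--XYZ--\r\n"

// findAttachment walks every MIME part until it hits one with a
// filename. There is no index into the raw message, so the cost of
// this walk grows with the total message size, not the part size.
func findAttachment(src string) (string, int) {
	msg, err := mail.ReadMessage(strings.NewReader(src))
	if err != nil {
		return "", 0
	}
	_, params, err := mime.ParseMediaType(msg.Header.Get("Content-Type"))
	if err != nil {
		return "", 0
	}
	mr := multipart.NewReader(msg.Body, params["boundary"])
	for {
		p, err := mr.NextPart()
		if err != nil {
			return "", 0
		}
		if p.FileName() != "" {
			b, _ := io.ReadAll(p)
			return p.FileName(), len(b)
		}
	}
}

func main() {
	name, n := findAttachment(raw)
	fmt.Printf("%s: %d bytes\n", name, n)
}
```

The point of the sketch: even to return one small part, the parser has to stream through everything that precedes it in the raw message.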
Is it possible to add an option to COMPLETELY disable compression? The RAW email message can easily be stored in the database without any extra CPU-consuming operations, so IMHO the user should be able to choose between space and speed. BTW, many modern Linux systems use ZFS or Btrfs, which support native compression, so compressing TWICE is massive overkill.
Yes, I will consider this option, although I do not know how much difference it will make in your situation. No compression means more data in/out of the database (disk I/O), which is slower. The message still needs to be parsed and the attachment "extracted", so it may not make much difference in the end. Is your laptop using a 5400RPM spinning laptop hard drive, or an SSD? I need to do a lot more testing and give this some more thought.

Also yes, some modern filesystems have optional compression, but I don't think they work well (or even at all) on files like SQLite databases which are continuously changing.
I ran a gzip compression/decompression test on the RAW email which was so slow to process.
I said zstd, not gzip - those are very different compression techniques, and zstd is much faster. I also don't understand the second part about the bash script with 20 cycles in 4 seconds - 20 cycles doing what? I will say it again though, just to make it perfectly clear: even if I add an option to remove database compression altogether, you are not going to get great performance on your old hardware working with a 15MB attachment in an email. This is because of your hardware. Speaking of your hardware, you did not answer my question about what type of hard drive you are using?
Empty database on ramfs
20 cycles of compressing and decompressing the RAW email message.
Are you saying that it took 200ms to compress that raw email (with the 15MB attachment) on your machine using gzip? That is not possible unless your machine is caching the input and output, or your bash script is not working as you expected. Ramfs will give you RAM speed rather than real "disk" performance, because it's all in RAM. How much RAM does that machine have, and how much usable RAM is actually available for Mailpit when you are running it?
Well, you were right - the script results for GZIP were wrong, as I had taken a truncated file. Now I downloaded ZSTD and used the right file; the result is 1.5 seconds at level 1 for a raw file size of 20.5M. So if we completely remove compression and all the intermediate conversion procedures, POSSIBLY the results will be less weird.
It's a home OpenWrt router equipped with 4GB of RAM, and most of it is free.
The current edge Docker build has zstd compression set to 1 (it was 3 before). Could you try that and see if there is at least some improvement (compared to the 7 seconds you mentioned before)? You will need to store the message in the database again (so it is also compressed with level 1). This is just a test, as I'm curious. I think most CPU is used for compression, not decompression, so I'm not expecting any miracles, but it will be interesting anyway.

The option I will look at is disabling the zstd compression altogether, but note that Mailpit will still need to process the email to get the attachment parts - there is no way around this. The only difference would be that it does not need to decompress it. I will also look at a separate option to possibly disable HTTP compression, which may be another thing that is slowing you down. This is easy to test though using …
Nice, but the 7-second result was just for displaying a 200K image in the HTML part of an email containing a 15MB attachment.
Will try it in the morning. Thanks for your time!
To get the 200K image it still has to decompress and process the whole message to extract that attachment. The message processing requirement will not change - I'm just making this clear - so if it's that part which becomes the bottleneck on your end, then you're just going to have to either learn to live with it or use better hardware. |
We gained a second, or maybe a bit less, in the new version. Maybe this can help.
I have been doing a lot of profiling today in relation to #447 - but that is RAM-specific, not CPU. Disabling message compression may help your problem once it's complete & merged, or it may not. As I said before, I will also add an option to disable HTTP compression (if the browser requests it), but that's about it. I can spend days profiling, looking for little improvements, but the fact is you're using a shit processor and expecting too much :-) What if you now want to attach a 40MB file, or 100MB... where does it end?
That limit is already set to 25MB for the entire message size on the edge SMTP gateway.
It's a challenge. For most of my career (over 20 years) I have developed commercial software in C and C++ for embedded systems, where resources are VERY limited, so it is painful to see a Go application spend 7 seconds processing a TINY plain-text message. I hope that removing compression will reduce the processing time to a reasonable value. BTW, you mentioned a 27MB attachment in 0.64s, which makes me think that speed depends on thread/core count.
They also landed on the moon with something like 256K of RAM ;-) Anyway, I can't change much about modern programming languages. I'm sure that if the entire application was written in something like C (from scratch) it would be better optimized, but that would take someone years to achieve a similar result, and in many ways it would also be inferior. This is an entire webserver, SMTP server, database engine, web UI, plus all the other features, packed into a single static binary that runs on multiple OSes and architectures. If you try any similar application (to Mailpit), even one written in NodeJS or even Rust, I'm sure you will experience the same or even worse performance issues, and the chances are it won't have many of the features. Mailpit is actually pretty well optimized - MUCH, M-U-C-H faster than the software it was built to replace (MailHog).

I think that removing the compression at both the database level (zstd decompression) and, probably more importantly, the HTTP (gzip) level will help on your end. It's actually the gzip HTTP compression that I believe is making your end much slower - it's gzipping every HTTP response, including attachments.

I think the machine I was testing on has 12 cores. The laptop I'm on now is older (Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz - 4 cores), and the same 27MB attachment takes 2.6 seconds on this machine. Just keep in mind that it also takes time to read that amount of data from any database, so it's not just parsing / extracting. This processor is much weaker than my desktop CPU. I'm sure the thread count matters too, but the big factor is the CPU, I think.

Edit: sorry, I missed the question about how many threads are active at the time - I'm not actually sure, to be honest. I'd think one for the email parsing etc., however there are multiple background services and cron jobs running within Mailpit too, including HTTPD and SMTPD.
Did you think about introducing some sort of "MIME offset markers" during the message receiving stage, which could later greatly increase MIME parsing speed? Since an email message is immutable, IMHO this can be done relatively easily.
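For illustration, the offset-marker idea suggested above could look something like this naive sketch: record each part's body byte range once at ingest, then serve a part later by slicing the raw message instead of re-parsing it. Everything here (`partIndex`, `indexParts`, the boundary handling) is hypothetical and far simpler than real MIME parsing, which has to cope with nested multiparts, transfer encodings, and varying line endings.

```go
package main

import (
	"fmt"
	"strings"
)

// partIndex records where one MIME part's body starts and ends in
// the raw message, so a later request can slice it out directly.
type partIndex struct{ start, end int }

// indexParts is a deliberately naive scanner: it finds each boundary
// and treats the bytes between the blank line after the part headers
// and the next boundary as the part body. A real implementation
// would hook into the MIME parser at ingest time instead.
func indexParts(raw, boundary string) []partIndex {
	var idx []partIndex
	marker := "--" + boundary
	pos := 0
	for {
		b := strings.Index(raw[pos:], marker)
		if b < 0 {
			break
		}
		b += pos + len(marker)
		if strings.HasPrefix(raw[b:], "--") { // closing boundary
			break
		}
		hdrEnd := strings.Index(raw[b:], "\r\n\r\n")
		if hdrEnd < 0 {
			break
		}
		start := b + hdrEnd + 4
		next := strings.Index(raw[start:], "\r\n"+marker)
		if next < 0 {
			break
		}
		idx = append(idx, partIndex{start, start + next})
		pos = start + next
	}
	return idx
}

func main() {
	raw := "--B\r\nContent-Type: text/plain\r\n\r\nhello\r\n" +
		"--B\r\nContent-Type: application/octet-stream\r\n\r\nBYTES\r\n--B--\r\n"
	for i, p := range indexParts(raw, "B") {
		fmt.Printf("part %d: %q\n", i, raw[p.start:p.end])
	}
}
```

With such an index stored alongside the message, serving one part becomes a single range read of the (still compressed or uncompressed) raw source rather than a full parse.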
Hi.
Please consider a user-selectable option for attachment compression, and additional endpoint(s).
Thanks.