[BUG] Error parsing and getting mails #23

Open
informaticaeloy opened this issue Jul 29, 2022 · 11 comments
Labels
bug Something isn't working

Comments

@informaticaeloy

Describe the bug

When I click "List Mails", I get an error. It appears to be an issue in list_emails.py, on line 184. I have tried with both plain-text and HTML emails.

Work environment

| Question | Answer |
|---|---|
| OS version (server) | Ubuntu Desktop 22.04 |
| OS version (client) | Ubuntu, ... |
| Python version | 3.10.4 |
| Type of email address used | Office 365 |
| Browser type & version | Chrome |
| Virtualized Env. | True |
| Dedicated RAM | 8 GB |
| vCPU | 4 |
| ThePhish version | |
| TheHive version | 4.1.9-1 |
| Cortex version | 3.1.1-1 |
| MISP version | 2.4.148 |
| Installed using Docker and Docker Compose | True |
| Docker version | 20.10.12 |
| Docker Compose version | 1.29.2 |

Log

thephish | AttributeError: 'NoneType' object has no attribute 'contents'
thephish |
thehive | [info] o.t.s.AccessLogFilter [00000004|] 172.19.0.1 GET /api/status took 3ms and returned 200 752 bytes
thehive | [info] o.t.s.AccessLogFilter [00000005|] 172.19.0.1 GET /api/status took 2ms and returned 200 752 bytes
thehive | [info] o.t.s.AccessLogFilter [00000006|] 192.168.46.213 GET /api/status took 2ms and returned 200 752 bytes
thephish | [INFO][list_emails]: Connected to myemail@[email protected]:993/inbox
thephish | [INFO][list_emails]: 3 unread messages to process
thephish | [INFO][list_emails]: Message from: b' [email protected]' with subject: hola
thephish | [INFO][list_emails]: Message from: b' [email protected]' with subject: prueba 4
thephish | [ERROR]_[list_emails]: Error while trying to retrieve the emails: Traceback (most recent call last):
thephish | File "/root/thephish/list_emails.py", line 250, in main
thephish | emails_info = retrieve_emails(connection)
thephish | File "/root/thephish/list_emails.py", line 184, in retrieve_emails
thephish | body = soup.body.div.p.span.contents[0]
thephish | AttributeError: 'NoneType' object has no attribute 'contents'
thephish |
thehive | [info] o.t.s.AccessLogFilter [00000007|] 172.19.0.1 GET /api/status took 2ms and returned 200 752 bytes
thehive | [info] o.t.s.AccessLogFilter [00000008|] 172.19.0.1 GET /api/status took 1ms and returned 200 752 bytes
thehive | [info] o.t.s.AccessLogFilter [00000009|] 192.168.46.213 GET /api/status took 1ms and returned 200 752 bytes

@informaticaeloy informaticaeloy added the bug Something isn't working label Jul 29, 2022
@majo053

majo053 commented Sep 23, 2022

Hello, this is a problem with the encoding of the email's From header.

You can fix it as follows.

Change list_emails.py to this:

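            # Decode RFC 2047 encoded-words in the From header
            # (requires: import email, email.header)
            msg = email.message_from_bytes(message)
            decode = email.header.decode_header(msg['From'])
            from_field = ""
            for decode_item in decode:
                if decode_item[1] is not None:
                    # Encoded-word: bytes plus the charset it declares
                    from_field += decode_item[0].decode(decode_item[1])
                else:
                    if isinstance(decode_item[0], bytes):
                        # Unencoded text next to encoded-words comes back as bytes
                        from_field += decode_item[0].decode()
                    else:
                        # A header with no encoded-words comes back as str
                        from_field += str(decode_item[0])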
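For comparison, the standard library can perform the same decoding in a single call: `email.header.make_header` rebuilds a header from the `(value, charset)` pairs returned by `decode_header`, and it accepts both the `bytes` and the `str` forms that the loop above branches on. This is a minimal sketch, not a tested replacement for ThePhish's code; it assumes the raw `From` value is available as a string, and `decode_from_header` is a hypothetical helper name:

```python
import email.header


def decode_from_header(raw_value):
    """Decode RFC 2047 encoded-words in a header value into a plain str.

    Hypothetical helper for illustration (not part of ThePhish).
    Returns "" when the header is missing.
    """
    if raw_value is None:
        return ""
    # decode_header() yields (bytes, charset) pairs when encoded-words are
    # present, or a single (str, None) pair when there are none.
    # make_header() accepts both forms, and str() joins the chunks with the
    # spacing required by RFC 2047.
    return str(email.header.make_header(email.header.decode_header(raw_value)))


print(decode_from_header("=?utf-8?q?Jos=C3=A9?= <jose@example.com>"))  # José <jose@example.com>
print(decode_from_header("Alice <alice@example.com>"))  # Alice <alice@example.com>
```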

Change case_from_email.py to this:

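            # Decode the From header, then keep only the bare address
            # (requires: import email, email.header, email.utils)
            msg = email.message_from_bytes(message)
            decode = email.header.decode_header(msg['From'])
            external_from_field = ""
            for decode_item in decode:
                if decode_item[1] is not None:
                    # Encoded-word: bytes plus the charset it declares
                    external_from_field += decode_item[0].decode(decode_item[1])
                else:
                    if isinstance(decode_item[0], bytes):
                        external_from_field += decode_item[0].decode()
                    else:
                        external_from_field += str(decode_item[0])
            parsed_from_field = email.utils.parseaddr(external_from_field)
            # parseaddr() always returns a 2-tuple and gives ('', '') when
            # parsing fails, so check the address itself, not the tuple length
            if parsed_from_field[1]:
                external_from_field = parsed_from_field[1]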
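A different option, distinct from the one proposed above, is to parse the message with `email.policy.default` rather than the legacy `compat32` policy that `message_from_bytes` uses by default. Header values are then returned already decoded, and address headers expose parsed `Address` objects, so the manual decode loop and the `parseaddr` call become unnecessary. This is a minimal sketch assuming `message` holds the raw message bytes, as in the snippets above; `sender_address` is a hypothetical helper name, and returning `""` when no address is found is an assumed fallback:

```python
import email
import email.policy


def sender_address(message):
    """Return the bare sender address from raw message bytes.

    Hypothetical helper (not part of ThePhish). With email.policy.default the
    library decodes encoded-words in the From header itself.
    Returns "" when there is no usable From address (assumed fallback).
    """
    msg = email.message_from_bytes(message, policy=email.policy.default)
    from_header = msg["From"]
    if from_header is None:
        return ""
    addresses = from_header.addresses  # tuple of email.headerregistry.Address
    if not addresses:
        return ""
    return addresses[0].addr_spec


raw = (
    b"From: =?utf-8?q?Jos=C3=A9?= <jose@example.com>\r\n"
    b"Subject: prueba 4\r\n"
    b"\r\n"
    b"hola\r\n"
)
print(sender_address(raw))  # jose@example.com
```

Each `Address` object also carries a decoded `display_name` (here `José`), in case the sender's name is needed in addition to the address.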

@emalderson Can you please update these files?

@emalderson
Owner

Hello, sorry for the late reply, but I've been very busy lately.
Anyway, thank you for providing the code to fix the bug you encountered. The problem with that part of the code is that when you "fix" one thing, you can easily break 100 other things. In other words, if I blindly added your fix to the code, I might break the parsing logic for many other emails whose From field has different properties. I need to test the change on all the emails I have, and then I'll consider adding your code and crediting you for the contribution.
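The kind of check described above, testing a parsing change against every available sample, can be automated with a small harness that runs a parser over a folder of `.eml` files and records which files raise an exception. This is a minimal sketch assuming the samples are stored as individual `.eml` files in a single directory and that the parser under test is a callable taking raw bytes; `check_corpus` and `parse` are hypothetical names, not ThePhish functions:

```python
import pathlib


def check_corpus(directory, parse):
    """Run `parse` on every .eml file in `directory`.

    Hypothetical test harness (not part of ThePhish). Returns a dict that maps
    each file name to "ExceptionType: message" for every file on which `parse`
    raised; an empty dict means every sample was parsed without errors.
    """
    failures = {}
    for path in sorted(pathlib.Path(directory).glob("*.eml")):
        raw = path.read_bytes()
        try:
            parse(raw)
        except Exception as exc:  # deliberately broad: record every failure
            failures[path.name] = f"{type(exc).__name__}: {exc}"
    return failures
```

Running it once with the current parsing code and once with a proposed change, then comparing the two results, shows whether the change fixes some samples while breaking others.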

@tiagotsi

Does anyone have a working image of ThePhish in OVF format? I am not able to install it with Docker and Docker Compose.

@LoriSchochWIT

Hello, is there an update on this issue?
Or can anybody explain how to apply the code suggested by @majo053? I don't understand exactly which lines to replace.
Thanks in advance!

@LoriSchochWIT

Hi, is there still no working solution that can be implemented? @emalderson

@emalderson
Owner

Hello. Unfortunately, errors like these require a thorough testing process. They are in fact related to the absurdly large number of ways in which an email can be encoded in the MIME multipart format. I managed to cover the most widespread cases, but I cannot predict how every email client encodes its emails. This means that, for some emails, the fields ThePhish needs to extract are not located in any of the places I search programmatically, so the code breaks. There may also be some issues with Chinese or Japanese characters.
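To see which of these many layouts a particular message uses, it can help to print its MIME tree. This is a minimal sketch built only on the standard library `email` package; `mime_outline` is a hypothetical diagnostic helper, not ThePhish code, and reporting `7bit` when no `Content-Transfer-Encoding` header is present follows the MIME default:

```python
import email
import email.policy


def mime_outline(raw):
    """Return one line per MIME part: content type, charset, transfer encoding.

    Hypothetical diagnostic helper (not part of ThePhish); the indentation of
    each line shows how deeply the part is nested.
    """
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    lines = []

    def visit(part, depth):
        # 7bit is the MIME default when the header is absent
        cte = str(part.get("Content-Transfer-Encoding", "7bit")).strip().lower()
        lines.append(
            "{}{} charset={} cte={}".format(
                "  " * depth,
                part.get_content_type(),
                part.get_content_charset(),
                cte,
            )
        )
        if part.is_multipart():
            # iter_parts() yields the direct children of a multipart part
            for child in part.iter_parts():
                visit(child, depth + 1)

    visit(msg, 0)
    return lines
```

For example, `print("\n".join(mime_outline(raw)))` on the raw bytes of an .eml file shows whether the HTML body sits directly in the message, inside `multipart/alternative`, or deeper in a `multipart/mixed` tree, and which charset and transfer encoding each part declares.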

This same error is also mentioned in issue #40.

Moreover, the code provided by majo053 does not fix the problem, since the problem highlighted here is related to the encoding and decoding of the HTML part of the email.
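Line 184 in the traceback (`body = soup.body.div.p.span.contents[0]`) assumes the HTML body is always nested exactly as `body > div > p > span`. As soon as an email client produces different markup, one step of that chain returns `None`, and the `AttributeError` follows. A more tolerant approach is to let the library decode the HTML part and then collect all of its visible text instead of following a fixed path. This is a minimal sketch using the standard library `html.parser` rather than BeautifulSoup (which the `soup` variable suggests ThePhish uses). `html_part_text` and `_TextCollector` are hypothetical names, and taking the first `text/html` part is an assumption:

```python
import email
import email.policy
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect non-empty text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive decoded
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def html_part_text(raw):
    """Return the visible text of the first text/html part, or "" if none.

    Hypothetical helper (not part of ThePhish).
    """
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes the transfer encoding (e.g. quoted-printable)
            # and decodes the bytes using the part's declared charset
            html = part.get_content()
            collector = _TextCollector()
            collector.feed(html)
            collector.close()
            return " ".join(collector.chunks)
    return ""
```

Whether ThePhish needs the whole visible text or one specific element of the body is not something this sketch can determine; the point is only that a missing tag no longer raises.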

@skyluke70

Does anybody have news about this error?

@emalderson
Owner

@skyluke70 if you have an email that generates such an error, would you mind sending it to me or posting the EML file content here (after anonymizing the information in the EML file, of course), so that I can reproduce the problem and try to fix it?

@skyluke70

skyluke70 commented Sep 9, 2024 via email

@skyluke70

skyluke70 commented Sep 9, 2024 via email

@emalderson
Owner

@skyluke70

I mean, I actually need the content (header and body) of the .eml file you are attaching when sending the email for analysis to ThePhish.


I need the content of the .EML file, complete with the header and body of the email to be analyzed. That way I can try to analyze it myself and understand the problem.

Development

No branches or pull requests

6 participants