Better MBOX parser #19

Open
terhechte opened this issue Dec 19, 2021 · 0 comments

Postsack currently uses mbox-reader for MBOX parsing, but mbox-reader doesn't properly implement the standard. It only checks whether a line begins with the string "From", so any email whose body contains a line starting with "From" is split into two separate emails. According to RFC 4155, the correct way to detect a new email in MBOX is:

Each message in the mbox database MUST be immediately preceded
by a single separator line, which MUST conform to the following
syntax:

  • The exact character sequence of "From";

  • a single Space character (0x20);

  • the email address of the message sender (as obtained from the message envelope or other authoritative source), conformant with the "addr-spec" syntax from RFC 2822;

  • a single Space character;

  • a timestamp indicating the UTC date and time when the message was originally received, conformant with the syntax of the traditional UNIX 'ctime' output sans timezone (note that the use of UTC precludes the need for a timezone indicator);

  • an end-of-line marker.
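
The rules above could be checked with something like the following. This is only a rough sketch, not the mbox-reader or Postsack implementation: the function names (`is_separator_line`, `looks_like_addr_spec`, `is_ctime_timestamp`) are hypothetical, the address check is a deliberately simplified stand-in for the full RFC 2822 "addr-spec" grammar, and the timestamp check only validates the shape of a 24-character ctime string (for example `Thu Nov 24 18:22:48 1986`), not whether the date itself is valid. The caller is assumed to have already stripped the end-of-line marker.

```rust
/// Returns true if `line` (without its end-of-line marker) has the shape of
/// an RFC 4155 separator line: "From " + addr-spec + " " + ctime timestamp.
pub fn is_separator_line(line: &str) -> bool {
    // The exact character sequence "From" followed by a single space.
    let rest = match line.strip_prefix("From ") {
        Some(rest) => rest,
        None => return false,
    };
    // The sender address runs up to the next single space.
    let (addr, timestamp) = match rest.split_once(' ') {
        Some(parts) => parts,
        None => return false,
    };
    looks_like_addr_spec(addr) && is_ctime_timestamp(timestamp)
}

/// Simplified stand-in for RFC 2822 "addr-spec" (assumption): exactly one '@'
/// with a non-empty local part and domain, and no whitespace anywhere.
fn looks_like_addr_spec(addr: &str) -> bool {
    match addr.split_once('@') {
        Some((local, domain)) => {
            !local.is_empty()
                && !domain.is_empty()
                && !domain.contains('@')
                && !addr.chars().any(char::is_whitespace)
        }
        None => false,
    }
}

/// Checks the fixed-width ctime layout "Www Mmm dd hh:mm:ss yyyy",
/// where the day of month is space-padded to two characters.
fn is_ctime_timestamp(ts: &str) -> bool {
    const DAYS: [&str; 7] = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
    const MONTHS: [&str; 12] = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    ];
    let b = ts.as_bytes();
    // ASCII-only input makes the byte-offset string slices below safe.
    if b.len() != 24 || !ts.is_ascii() {
        return false;
    }
    let digits = |range: std::ops::Range<usize>| b[range].iter().all(u8::is_ascii_digit);
    DAYS.contains(&&ts[0..3])
        && b[3] == b' '
        && MONTHS.contains(&&ts[4..7])
        && b[7] == b' '
        && (b[8] == b' ' || b[8].is_ascii_digit())
        && b[9].is_ascii_digit()
        && b[10] == b' '
        && digits(11..13)
        && b[13] == b':'
        && digits(14..16)
        && b[16] == b':'
        && digits(17..19)
        && b[19] == b' '
        && digits(20..24)
}
```

With a check like this, a body line such as "From the beginning, this was a bad idea." is no longer treated as the start of a new email, because it has no address and no timestamp after "From". A strict check may also reject separator lines in archives that deviate from RFC 4155, so a real parser might need a more lenient fallback.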
