Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export letters.ParseHeaders() so that a message can be parsed into letters.Headers only #85

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mnako
Copy link
Owner

@mnako mnako commented Jul 14, 2024

In #72 @rorycl makes a good point for parsing email headers only: the email bodies might be malformed or the user might be interested in headers only.

I do not think, however, that a string arg letters.ParseEmail(r, "HeadersOnly") is the best way to implement this feature. I believe that this should be at least a config struct for future maintainability, but for the time being there is a simpler solution: export the ParseHeaders() identifier so that users can use it directly.

This PR renames letters.parseHeaders() to letters.ParseHeaders(), adds a test, and updates README.md with an example.

One point that might merit discussion, is whether the now public ParseHeaders() should not accept the same parameters as ParseEmail so that the user is not responsible for reading the message using mail.ReadMessage() and passing msg.Header to ParseHeaders().

…ail headers only.

Add test.
Update README.md.
Copy link
Collaborator

@ValleyCrisps ValleyCrisps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simple fix!

I agree that it would be a (small) usability improvent if ParseHeaders() accepted the same parameters as ParseEmail(), thus hiding the ReadMessage() logic from the user.
Curious if there are any pros to delegating the ReadMessage() task to the client. Do we foresee a scenario where the intermediate object msg is required elsewhere?

@mnako
Copy link
Owner Author

mnako commented Jul 21, 2024

Thank you for the review, @ValleyCrisps 🙇

@rorycl raised good points in the discussion here #64, where they mentioned that

[skipping body parsing] dropped my processing time to 1/10th of the time in comparison to full email parsing for about 1700 messages

and in one more place that I cannot find now (not sure if it was a PR comment that was deleted or if I lost track of one thread), where @rorycl said that

sometimes someone might want to check the header of the email with ParseHeaders and then dive into it with ParseEmail, in which case it might make sense to store the header info in the Email struct and avoid parsing the headers again.

I think it is a very good point.

This might be a minor point, but I still have some objections against a headersOnly option, regardles of whether we implement that as a string arg, a config struct, or—I really like that idea as referenced by @rorycl—a self-referential function options.

Generally, I think that options of the form xOnly, skipY, overrideZ point out design flaws and highlight places that are not flexible enough. I think that we should offer the option to parse headers only, but in a more coherent way, rather than framing it as an exception to the default flow.

My current thinking is that we should refactor ParseHeaders and ParseBody as methods on the Email struct.

The default behaviour stays the same: the simplest use case is to parse an email completely by creating an Email instance from a reader:

email, err := letters.ParseEmail(r)
if err != nil {
    log.Fatal(err)
}

but if somebody has a more advanced use case, we can enable them to create an empty Email and parse headers and bodies separately:

var e Email

err := e.ParseHeaders(r)
if err != nil {
	// handle headers parser error
}

err = e.ParseBody(r)
if err != nil {
	// handle body parsing error
}

That way,

  • e.ParseBody() can be skipped, if needed,
  • we keep a mostly consistent ParseX(r io.Reader) developer-facing friendly API (at the cost of some additional complexities on our side, like perhaps handling or at least documenting the case of trying to read from the same stream twice);
  • we hide the use of mail.ReadMessage();
  • we allow more use cases: the developer can read headers and bodies from wherever they want under whatever conditions they want.

This is a pretty nitty discussion, but we have the luxury of taking the time to build a nice library, so let's try to get this nice and elegant.

Thoughts?

@ValleyCrisps
Copy link
Collaborator

Thank you for the discussion, I wouldn't have thought of this approach on my own, and I'm learning a lot just by reviewing and reading your comments.

I like the consistency in the method parameters and the logic encapsulation in the proposed solution.
Since we are exploring alternatives, I'd like to hear your opinion on the following points:

  1. Do we need to change the Email struct in order to store partially parsed emails in it? I'm not familiar with optional types in Go, but it feels like we may be weakening our data structure.
  2. Without looking at implementation, it isn't intuitive which fields are being parsed by ParseBody. For example, are attachments included? ContentTypeHeader is another grey area as it must be parsed even for email bodies. To be clear, my point here is about usability more than duplication. Good documentation helps a lot, but self-explanatory function names are even better.
  3. (related to 2.) Have you considered further splitting ParseBody into text (Text, EnrichedText, HTML) and attachments (InlineFiles, AttachedFiles)? In a use-case where performance is a concern, I think skipping attachments would lead to the biggest improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants