Export `letters.ParseHeaders()` so that a message can be parsed into `letters.Headers` only #85

mnako · 2024-07-14T06:11:05Z

In #72 @rorycl makes a good point for parsing email headers only: the email bodies might be malformed or the user might be interested in headers only.

I do not think, however, that a string arg letters.ParseEmail(r, "HeadersOnly") is the best way to implement this feature. I believe that this should be at least a config struct for future maintainability, but for the time being there is a simpler solution: export the ParseHeaders() identifier so that users can use it directly.

This PR renames letters.parseHeaders() to letters.ParseHeaders(), adds a test, and updates README.md with an example.

One point that might merit discussion, is whether the now public ParseHeaders() should not accept the same parameters as ParseEmail so that the user is not responsible for reading the message using mail.ReadMessage() and passing msg.Header to ParseHeaders().

…ail headers only. Add test. Update README.md.

ValleyCrisps

Nice simple fix!

I agree that it would be a (small) usability improvent if ParseHeaders() accepted the same parameters as ParseEmail(), thus hiding the ReadMessage() logic from the user.
Curious if there are any pros to delegating the ReadMessage() task to the client. Do we foresee a scenario where the intermediate object msg is required elsewhere?

mnako · 2024-07-21T08:01:46Z

Thank you for the review, @ValleyCrisps 🙇

@rorycl raised good points in the discussion here #64, where they mentioned that

[skipping body parsing] dropped my processing time to 1/10th of the time in comparison to full email parsing for about 1700 messages

and in one more place that I cannot find now (not sure if it was a PR comment that was deleted or if I lost track of one thread), where @rorycl said that

sometimes someone might want to check the header of the email with ParseHeaders and then dive into it with ParseEmail, in which case it might make sense to store the header info in the Email struct and avoid parsing the headers again.

I think it is a very good point.

This might be a minor point, but I still have some objections against a headersOnly option, regardles of whether we implement that as a string arg, a config struct, or—I really like that idea as referenced by @rorycl—a self-referential function options.

Generally, I think that options of the form xOnly, skipY, overrideZ point out design flaws and highlight places that are not flexible enough. I think that we should offer the option to parse headers only, but in a more coherent way, rather than framing it as an exception to the default flow.

My current thinking is that we should refactor ParseHeaders and ParseBody as methods on the Email struct.

The default behaviour stays the same: the simplest use case is to parse an email completely by creating an Email instance from a reader:

email, err := letters.ParseEmail(r)
if err != nil {
    log.Fatal(err)
}

but if somebody has a more advanced use case, we can enable them to create an empty Email and parse headers and bodies separately:

var e Email

err := e.ParseHeaders(r)
if err != nil {
	// handle headers parser error
}

err = e.ParseBody(r)
if err != nil {
	// handle body parsing error
}

That way,

e.ParseBody() can be skipped, if needed,
we keep a mostly consistent ParseX(r io.Reader) developer-facing friendly API (at the cost of some additional complexities on our side, like perhaps handling or at least documenting the case of trying to read from the same stream twice);
we hide the use of mail.ReadMessage();
we allow more use cases: the developer can read headers and bodies from wherever they want under whatever conditions they want.

This is a pretty nitty discussion, but we have the luxury of taking the time to build a nice library, so let's try to get this nice and elegant.

Thoughts?

ValleyCrisps · 2024-07-26T10:37:27Z

Thank you for the discussion, I wouldn't have thought of this approach on my own, and I'm learning a lot just by reviewing and reading your comments.

I like the consistency in the method parameters and the logic encapsulation in the proposed solution.
Since we are exploring alternatives, I'd like to hear your opinion on the following points:

Do we need to change the Email struct in order to store partially parsed emails in it? I'm not familiar with optional types in Go, but it feels like we may be weakening our data structure.
Without looking at implementation, it isn't intuitive which fields are being parsed by ParseBody. For example, are attachments included? ContentTypeHeader is another grey area as it must be parsed even for email bodies. To be clear, my point here is about usability more than duplication. Good documentation helps a lot, but self-explanatory function names are even better.
(related to 2.) Have you considered further splitting ParseBody into text (Text, EnrichedText, HTML) and attachments (InlineFiles, AttachedFiles)? In a use-case where performance is a concern, I think skipping attachments would lead to the biggest improvement.

Export letters.ParseHeaders() so that a message can be parsed into em…

5b13aef

…ail headers only. Add test. Update README.md.

mnako self-assigned this Jul 14, 2024

mnako requested review from NikitaKurpas, adomanskagrzyb and ValleyCrisps July 14, 2024 06:11

rorycl mentioned this pull request Jul 15, 2024

Discussion: options and config/state struct #86

Closed

ValleyCrisps approved these changes Jul 17, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Export `letters.ParseHeaders()` so that a message can be parsed into `letters.Headers` only #85

Export `letters.ParseHeaders()` so that a message can be parsed into `letters.Headers` only #85

mnako commented Jul 14, 2024

ValleyCrisps left a comment

mnako commented Jul 21, 2024

ValleyCrisps commented Jul 26, 2024

Export letters.ParseHeaders() so that a message can be parsed into letters.Headers only #85

Are you sure you want to change the base?

Export letters.ParseHeaders() so that a message can be parsed into letters.Headers only #85

Conversation

mnako commented Jul 14, 2024

ValleyCrisps left a comment

Choose a reason for hiding this comment

mnako commented Jul 21, 2024

ValleyCrisps commented Jul 26, 2024

Export `letters.ParseHeaders()` so that a message can be parsed into `letters.Headers` only #85

Export `letters.ParseHeaders()` so that a message can be parsed into `letters.Headers` only #85