Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLS + SMTP Error when using OTP/26 #785 #328

Open
fekle opened this issue Jul 3, 2023 · 20 comments
Open

TLS + SMTP Error when using OTP/26 #785 #328

fekle opened this issue Jul 3, 2023 · 20 comments

Comments

@fekle
Copy link

fekle commented Jul 3, 2023

Describe the bug

Hi there - I already opened this issue in swoosh/swoosh#785 but it looks like it is related to gen_smtp, so I'll open it here as well.

I recently updated my Projects to Elixir 1.15 and OTP/26, and everything but sending emails via swoosh and gen_smtp works fine.
I am sending E-Mails via SMTP, using a local business mail provider as a relay.

Swoosh tries sending the email multiple times via gen_smtp and finally fails with the following error:

{:error,
 {:retries_exceeded, {:temporary_failure, ~c"XXX.XXX.XXX.XXX", :tls_failed}}}

Here is my configuration:

  config :redacted, REDACTED.Mailer,
    adapter: Swoosh.Adapters.SMTP,
    relay: System.get_env("SMTP_HOST") || raise("missing SMTP_HOST"),
    username: System.get_env("SMTP_USERNAME") || raise("missing SMTP_USERNAME"),
    password: System.get_env("SMTP_PASSWORD") || raise("missing SMTP_PASSWORD"),
    tls: :always,
    ssl: false,
    auth: :always,
    port: 587,
    retries: 15

The provider only supports STARTTLS via port 587, so that's why ssl is set to false.

With Elixir 1.15 on OTP/25 it worked fine, with OTP/26 the issue appears.

I checked the Erlang documentation, but there don't seem to be any breaking changes related to TLS apart from SSL certificate validation being on by default, but that should be fine? I know this is not much to go on, but unfortunately swoosh does not give me any more information apart from :tls_failed which makes this hard to debug.

Steps to Reproduce the Bug or Issue

Send an email via Swoosh's SMTP adapter and the aforementioned config with OTP/26.

Expected behavior

The email sends without issues.

Your reproducible example

No response

Screenshots or Videos

No response

Platform

  • OTP: 26.0.2
  • Elixir: 1.15.2

Additional context

No response

@seriyps
Copy link
Collaborator

seriyps commented Jul 3, 2023

Well, the TLS certificate validation could easily be the root cause. Wihout looking into details, may you try to add following options to gen_smtp client?

[{tls_options, [{verify, verify_none}]}]

@fekle
Copy link
Author

fekle commented Jul 3, 2023

@seriyps Thanks, that worked - now the emails send again. Maybe gen_smtp could report verify issues explicitly?

Also do you have an idea why that happens, given that the SMTP server's certificate is valid? "Normal" email clients like Outlook or Apple Mail have no issues, and openssl also says the certificate is valid.

> openssl s_client -connect REDACTED:587 -starttls smtp                                                                        16:45:52
CONNECTED(00000005)
[...]
---
Server certificate
-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
subject=CN = REDACTED ( same as in the config )
[...]
---
No client certificate CA names sent
Peer signing digest: SHA512
Peer signature type: RSA
Server Temp Key: ECDH, prime256v1, 256 bits
---
SSL handshake has read 5939 bytes and written 473 bytes
Verification: OK
---
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 4096 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
[...]

Or am I misunderstanding something here?

@seriyps
Copy link
Collaborator

seriyps commented Jul 4, 2023

I think this document describes this topic quite well https://erlef.github.io/security-wg/secure_coding_and_deployment_hardening/ssl

In short, even when SMTP server has a certificate that is signed by a trusted issuer, you should tell Erlang's ssl which set of root certificates it should consider as trusted. In OTP-25 public_key:cacerts_get() function was added which detects the set of trusted root certificates from OS, which is a good default choice, but you still have to provide it explicitly.

@fekle
Copy link
Author

fekle commented Jul 4, 2023

Thanks for the link, I checked it out - however trying public_key:cacerts_get(), :ceritify and CaStore modules all take me right back to the original error - :tls_failed. Is there a way to find out what exactly about tls is failing here?

For more detail:

Adding the cacerts or cacertfile (in the case of using :certifi) option causes it to log the following notice:

[notice] TLS :client: In state :certify at ssl_handshake.erl:2140 generated CLIENT ALERT: Fatal - Handshake Failure
 - {:bad_cert, :hostname_check_failed}

Now that's something I am familiar with, and in the past i used the :ssl_verify_fun module to help with this by adding verify_fun: &:ssl_verify_hostname.verify_fun/3 to the options. This removes the :hostname_check_failed error, taking me right back to just :tls_failed with no other information. I'm feeling a bit lost here, so any help in finding the underlying issue would be greatly appreciated :)

PS: this issue occurs on both macOS 13 and Debian 11. I also tried sending via gmail (smtp.gmail.com) and get the same issues - so I doubt it has anything to do with the smtp server I'm using.

@seriyps
Copy link
Collaborator

seriyps commented Jul 4, 2023

Sorry, I don't have a good tips for you. gen_smtp tends to swallow some of the error details, because it tries to connect to each SMTP server returned by MX DNS lookup and only returns an error when there is no more servers left.

You can try to enable debug logging for OTP ssl app, but I don't remember all the options you should tweak to do that... I think smth like

logger:set_primary_config(level, debug).
ssl:connect("google.com", 443, [{log_level, debug}]).

@fekle
Copy link
Author

fekle commented Jul 4, 2023

No worries, thanks anyways for your quick replies! I tried the setting the log level or using ssl:connect, but the log level doesn't change anything ( I already have the global log level set to debug) and ssl:connect doesnt show the same errors.

I concede that for now STARTTLS doesn't seem to work with OTP/26 unless you set verify: :verify_none in the tls_options (granted, that was the OTP/25 default). Of course validating the server would be a good security best practice, so maybe someone more experienced with OTP and its TLS implementation can find a solution here :-)

@seriyps
Copy link
Collaborator

seriyps commented Jul 4, 2023

granted, that was the OTP/25 default

nope, it was just non-fatal error prior to OTP-26

@fekle
Copy link
Author

fekle commented Jul 4, 2023

I did some more digging in the documentation and created a quick reproduction script.
It turns out that when enabling SNI with the same domain as the relay, it works! For example:

Mix.install([
  {:gen_smtp, "~> 1.2"},
  {:certifi, "~> 2.11"},
  {:eiconv, "~> 1.0"}
])

require Logger

:application.ensure_all_started(:ssl)

Logger.configure(level: :debug)
:logger.set_application_level(:ssl, :debug)

Logger.debug("Starting SMTP TLS test")

{:ok, pid} =
  :gen_smtp_client.send(
    {"[email protected]", ["[email protected]"],
     "Subject: testing\r\nFrom: Test <[email protected]>\r\nTo: Test <[email protected]>\r\n\r\nThis is the email body"},
    relay: "smtp.gmail.com",
    retries: 0,
    tls: :always,
    ssl: false,
    auth: :always,
    port: 587,
    username: "[email protected]",
    password: "test",
    tls_options: [
      verify: :verify_peer,
      cacerts: :certifi.cacerts(), # <--- this is required!
      server_name_indication: ~c"smtp.gmail.com", # <--- this is required!
      depth: 99,
      log_level: :debug
    ]
  )

Process.monitor(pid)

receive do
  _ ->
    :ok
end

Now returns ** (EXIT from #PID<0.98.0>) {:error, :no_more_hosts, {:permanent_failure, ~c"142.250.147.109", :auth_failed}} as expected - using correct credentials now works.

The important parts are:

      cacerts: :certifi.cacerts(), # <--- this is required!
      server_name_indication: ~c"smtp.gmail.com", # <--- this is required!

I'm using certifi here to be independent from the system ca certs. I know my example code is elixir, but the same should apply for erlang.

@seriyps what do you think - maybe we should use those options per default, or otherwise maybe mention this in the documentation? I'd be happy to open a PR for this!

@seriyps
Copy link
Collaborator

seriyps commented Jul 5, 2023

I think mentioning it in the docs should be the solution - pre-selecting the list of trusted root certificates might be not very secure plus only works without dependencies on OTP-25+

@mworrell WDYT?

@mworrell
Copy link
Collaborator

mworrell commented Jul 5, 2023

I am actually ok with a default using the list from tls_certificate_check:options/1 if there are no other ssl options overruling the ssl options.

A lot of projects started to include this app, and it is quite good maintained. If we have a loose dependency, like:

{tls_certificate_check, "~> 1.18"},

Then the project using gen_smtp can always use a newer version.

Example of the usage:

(zotonic@PoToi)17> tls_certificate_check:options("smtp.gmail.com").
[{verify,verify_peer},
 {depth,100},
 {cacerts,[<<48,130,3,117,48,130,2,93,160,3,2,1,2,2,11,4,0,
             0,0,0,1,21,75,...>>,
           <<48,130,4,42,48,130,3,18,160,3,2,1,2,2,4,56,99,222,248,
             48,13,6,...>>,
           <<48,130,3,119,48,130,2,95,160,3,2,1,2,2,4,2,0,0,185,48,
             13,...>>,
           <<48,130,4,145,48,130,3,121,160,3,2,1,2,2,4,69,107,80,84,
             48,...>>,
           <<48,130,4,50,48,130,3,26,160,3,2,1,2,2,1,1,48,13,6,...>>,
           <<48,130,5,183,48,130,3,159,160,3,2,1,2,2,2,5,9,48,...>>,
           <<48,130,6,157,48,130,4,133,160,3,2,1,2,2,2,5,198,...>>,
           <<48,130,3,90,48,130,2,66,160,3,2,1,2,2,1,0,...>>,
           <<48,130,4,48,48,130,3,24,160,3,2,1,2,2,16,...>>,
           <<48,130,4,0,48,130,2,232,160,3,2,1,2,2,...>>,
           <<48,130,4,15,48,130,2,247,160,3,2,1,2,...>>,
           <<48,130,3,183,48,130,2,159,160,3,2,1,...>>,
           <<48,130,3,175,48,130,2,151,160,3,2,...>>,
           <<48,130,3,197,48,130,2,173,160,3,...>>,
           <<48,130,5,186,48,130,3,162,160,...>>,
           <<48,130,5,189,48,130,3,165,...>>,
           <<48,130,3,184,48,130,2,...>>,
           <<48,130,3,188,48,130,...>>,
           <<48,130,4,29,48,...>>,
           <<48,130,2,137,...>>,
           <<48,130,3,...>>,
           <<48,130,...>>,
           <<"0"...>>,<<...>>|...]},
 {verify_fun,{fun ssl_verify_hostname:verify_fun/3,
              [{check_hostname,"smtp.gmail.com"}]}},
 {partial_chain,fun tls_certificate_check_shared_state:find_trusted_authority/1},
 {customize_hostname_check,[{match_fun,#Fun<public_key.6.103305546>}]},
 {server_name_indication,"smtp.gmail.com"}]

@fekle
Copy link
Author

fekle commented Jul 7, 2023

I agree with both points - including this by default would make it easier for most people to use, however I also understand that, as gen_smtp delegates TLS/SSL to the ssl module, it is not really the duty of gen_smtp to choose which CA Certificates should be used. (If you ask me, erlang's ssl should use the system CA certificates by default unless otherwise specified, as that is how most software does it and I suspect this would be the behaviour that most developers expect - but that is a different discussion unrelated to gen_smtp :-) ).

I do however think that gen_smtp should set the SNI option to the domain specified in the relay option, as otherwise the server may not know which certificate to choose (as gen_smtp looks up the IP via DNS and passes the IP to ssl as i understand) - this would make it so that certificate and host validation succeeds once the user specifies which certificates to use. Or at least document that developers need to set the server_name_indication in addition to the relay option for TLS/SSL to work with certificate validation.

@gmile
Copy link

gmile commented Jul 10, 2023

I did some more digging in the documentation and created a quick reproduction script.

@fekle thanks for posting a workaround with explicitly specified options!

It didn't work for me initially, as I decided not to specify the depth (since your comment didn't mention is as critical).

Before specifying the depth parameter, I get a few of the notices below, then a failure as a result:

15:54:17.877 [notice] TLS :client: In state :certify at ssl_handshake.erl:2140 generated CLIENT ALERT: Fatal - Handshake Failure
 - {:bad_cert, :max_path_length_reached}

# ...

{:error, :retries_exceeded, {:temporary_failure, '52.57.139.126', :tls_failed}}

After specifying the depth parameter, all worked as expected, and a "provider queue" string was returned:

"Ok: queued as mgmql-u1Q8aRoiwOkOYDmA\r\n"

My final code also doesn't depend on :certifi (to/from emails are changed):

Mix.install([:gen_smtp])

:gen_smtp_client.send_blocking(
  {"[email protected]", ["[email protected]"],
   "Subject: Emails are working\r\nFrom: Eugene's App <[email protected]>\r\nTo: Eugene <[email protected]>\r\n\r\nLooks like emails are working"},
  auth: :always,
  relay: "smtp.sendgrid.net",
  port: 587,
  tls: :always,
  username: "apikey",
  password: "SG.YDst...",
  tls_options: [
    verify: :verify_peer,
    cacerts: :public_key.cacerts_get(),
    server_name_indication: 'smtp.sendgrid.net',
    depth: 99
  ]
)

@tosie
Copy link

tosie commented Jul 15, 2023

Having spent some time with this I stumbled upon an interesting problem. We are using gen_smtp to send emails via Microsoft365's SMTP server. According to their documentation the server is available at smtp.office365.com. Using this value as relay works, but host verification still fails even when server_name_indication is set.

The server's certificate specifies *.office365.com in its alternative names field, but this seems to not be accepted by the host name verification function ({:bad_cert, : hostname_check_failed ).

Setting SNI (server_name_indication) to "office365.com" (another one of the alternative name entries) solves the problem.

Here is the full example for Microsoft365:

Mix.install([:gen_smtp])

:gen_smtp_client.send_blocking(
  {"[email protected]", ["[email protected]"],
   "Subject: Emails are working\r\nFrom: Eugene's App <[email protected]>\r\nTo: Eugene <[email protected]>\r\n\r\nLooks like emails are working"},
  auth: :always,
  relay: "smtp.office365.com",
  port: 587,
  auth: :always,
  ssl: false,
  tls: :always,
  username: "...",
  password: "...",
  tls_options: [
    verify: :verify_peer,
    versions: ["tlsv1.3"],
    cacerts: public_key:cacerts_get(),
    server_name_indication: "office365.com",
    depth: 99
  ]
)

@fekle
Copy link
Author

fekle commented Jul 17, 2023

@gmile you're welcome! :-) I didn't mark depth, as it wasn't required for the server I used.

I ended up using what @mworrell showed, that works great with all checks:

[...]
 tls_options: :tls_certificate_check.options(System.get_env("SMTP_HOST"))
[...]

Before we close this issue - should we now change the defaults in gen_smtp or just mention it in the documentation, and leave it up to upstream projects like swoosh to handle the issue with thinks like :tls_certificate_check.options ?

@tosie
Copy link

tosie commented Jul 17, 2023

Very nice, did not know about :tls_certificate_check 👍👍

Since OTP changed its defaults maybe it would be a good idea to change the defaults of gen_smtp to something sensible that works and it safe to use, too 🤷‍♂️

@neilberkman
Copy link

In case anyone needs to get this working for AWS SES, this seems to be the minimal config...

    smtp_relay = System.get_env("AWS_SMTP_RELAY") # e.g. "email-smtp.us-east-1.amazonaws.com"
    config = [
      relay: smtp_relay,
      port: 587,
      username: System.get_env("AWS_SMTP_USERNAME"),
      password: System.get_env("AWS_SMTP_PASSWORD"),
      tls: :always,
      tls_options: :tls_certificate_check.options(smtp_relay)
    ]

@aeros281
Copy link

aeros281 commented Jan 30, 2024

I'm using bamboo_smtp (with patched logic to forward tls_options to gen-smtp) right now to send email via AWS SES. Even with tls_options set to :tls_certificate_check.options(smtp_relay) the issue is not fixed. Debug logs were not useful at all. I resolved it with:

tls_options: [
    ...
    versions: ["tlsv1.2"],
    server_name_indication: smtp_server |> to_charlist(),
    ...
 ]

The SES endpoint only support TLS v1.2, I think.

@linges
Copy link

linges commented Feb 15, 2024

AWS SES worked for me like this in elixir:

...
  {:tls_options, [
           versions: [:"tlsv1.2"]
         ] ++ :tls_certificate_check.options(server)},
...

@gmile
Copy link

gmile commented Mar 7, 2024

Had to come back to this once more while upgrading an older app. My previous answer didn't immediately work anymore.

ℹ️ It did worked after changing exactly two things:

  1. changed value of SMTP relay from an IP to using a hostname instead,
  2. added customize_hostname_check to tls_options

The final code looks like this:

Mix.install([:gen_smtp])

:gen_smtp_client.send_blocking(
  {"[email protected]", ["[email protected]"],
   "Subject: Emails are working\r\nFrom: Eugene's App <[email protected]>\r\nTo: Eugene <[email protected]>\r\n\r\nLooks like emails are working"},
  auth: :always,
  relay: System.get_env("SMTP_RELAY"),
  port: 587,
  tls: :always,
  username: System.get_env("SMTP_USERNAME"),
  password: System.get_env("SMTP_PASSWORD"),
  tls_options: [
    verify: :verify_peer,
    cacerts: :public_key.cacerts_get(),
    depth: 99,
    server_name_indication: ~c"#{System.get_env("SMTP_RELAY")}",
    customize_hostname_check: [
      match_fun: :public_key.pkix_verify_hostname_match_fun(:https)
    ]
  ]
)

As a side note, this code doesn't rely on tls_certificate_check. In the app code am configuring the :gen_smtp_client using config/runtime.exs, and calling :tls_certificate_check.options/1 there was failing for me with some error coming from the depths of :tls_certificate_check sources. I think it could be due to tls_certificate_check app not starting when config/runtime.exs is evaluated, though I am not sure.

On another separate note, this older app I was upgrading is using a regional, lesser known SMTP provider, so until I realized I needed to change from using IP to using a hostname, I was getting the :retries_exceeded error from OP's message and started to doubt if the provider even supports TLS cert verification. This tool proved helpful to dispel the doubts I had about the server's support of TLS: https://www.checktls.com/TestReceiver

oestrich added a commit to nerves-hub/nerves_hub_web that referenced this issue Mar 25, 2024
OTP 26 changed the error level of what we're seeing when trying to send
email to SES via TLS. This additional configuration should let us
communicate via TLS without issues.

See gen-smtp/gen_smtp#328
oestrich added a commit to nerves-hub/nerves_hub_web that referenced this issue Mar 25, 2024
OTP 26 changed the error level of what we're seeing when trying to send
email to SES via TLS. This additional configuration should let us
communicate via TLS without issues.

See gen-smtp/gen_smtp#328
@OlegGorod
Copy link

Well, the TLS certificate validation could easily be the root cause. Wihout looking into details, may you try to add following options to gen_smtp client?

[{tls_options, [{verify, verify_none}]}]

Thanks, it helped me too!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants