Exim closes connection when negotiating SSL handshake
Mike Clarke
jmc-freebsd2 at milibyte.co.uk
Sat Sep 5 16:49:43 UTC 2020
A couple of months ago I had a problem with exim after it was upgraded exim-
sa-exim from 4.93 to 4.94. When attempting to send mail exim would close the
connection while trying to establish an SSL connection.
I've now managed get 4.94 working but I would appreciate any advice on
whether what I've done is a valid modification to my configuration or if it's
just a lucky workaround which works for now but might come back and bite me
after some future upgrade.
The original problem was that exim would immediately close the connection on
starting to negotiate an SSL handshake. It would then periodically retry to
send the email and eventually succeed but only after an unpredictable number
of retries, for example one message succeeded on the second attempt after 30
minutes but a different one needed seven retries over three and a half hours.
I posted the problem in an earlier thread here and received a number of
useful comments which helped me to track down where things were going wrong
but didn't enable me to determine why it was happening or to resolve the
problem so as a temporary workaround I reverted to 4.93. The full thread is
at https://lists.freebsd.org/pipermail/freebsd-questions/2020-June/
290344.html
Things have now moved on and a recent upgrade to perl has led to a dependency
issue which I temporarily worked around by adding an entry for libperl.so.
5.30 in /etc/libmap.conf but this has prompted me to try to fix the problem
properly before any future dependency issues create a more serious problem.
The following extract from truss output for exim 4.94 while attempting to
send an email covers the SSL negotiation
3216: read(9,"##\n## ca-root-nss.crt -- Bundl"...,131072) = 131072 (0x20000)
After about 40 lines reading /etc/ssl/cert.pem it continues with:
3216: read(9,0x8011adec0,131072) = 0 (0x0)
3216: close(9) = 0 (0x0)
3216: issetugid() = 0 (0x0)
3216: setitimer(0,{ 0.000000, 300.000000 },{ 0.000000, 0.000000 }) = 0 (0x0)
3216: mmap(0x0,24576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) =
34385039360 (0x801821000)
3216: mmap(0x0,24576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) =
34385063936 (0x801827000)
3216: getpid() = 3216
(0xc90)
3216: getpid() = 3216
(0xc90)
3216: getpid() = 3216
(0xc90)
3216: getpid() = 3216
(0xc90)
3216: getpid() = 3216
(0xc90)
3216: getpid() = 3216
(0xc90)
3216: write(7,"\^V\^C\^A\^A%\^A\0\^A!\^C\^Ck"...,298) ERR#57 'Socket is not
connected'
A similar section for 4.93 follows the same pattern but ends with
88001: write(7,"\^V\^C\^A\^A%\^A\0\^A!\^C\^C\M-`"...,298) = 298 (0x12a)
... and goes on to successfully send the email.
When I look in the debug output from exim 4.94 for details of the SSL
negotiation I see the following
12:14:25 48240 Connecting to electron.ecohosting.co.uk [185.53.59.181]:465
... 185.53.59.181 in hosts_try_fastopen? yes (matched "*")
12:14:25 48240 TFO mode connection attempt to 185.53.59.181, 0 data
12:14:25 48240 connected
12:14:25 48240 ╭considering: $primary_hostname
12:14:25 48240 ├──expanding: $primary_hostname
12:14:25 48240 ╰─────result: curlew.milibyte.co.uk
12:14:25 48240 185.53.59.181 in hosts_avoid_esmtp? no (option unset)
12:14:25 48240 185.53.59.181 in hosts_require_ocsp? no (option unset)
12:14:25 48240 185.53.59.181 in hosts_request_ocsp? yes (matched "*")
12:14:25 48240 setting SSL CTX options: 0x42004000
12:14:25 48240 Diffie-Hellman initialized from default with 2048-bit prime
12:14:25 48240 Initialized TLS
12:14:25 48240 185.53.59.181 in tls_verify_hosts? no (option unset)
12:14:25 48240 185.53.59.181 in tls_try_verify_hosts? yes (matched "*")
12:14:25 48240 tls_verify_certificates: system
12:14:25 48240 185.53.59.181 in tls_verify_cert_hostnames? yes (matched "*")
12:14:25 48240 Cert hostname to check: "electron.ecohosting.co.uk"
12:14:25 48240 Calling SSL_connect
12:14:25 48240 SSL_connect: before SSL initialization
12:14:25 48240 SSL_connect: error in SSLv3/TLS write client hello
12:14:25 48240 TLS error '(SSL_connect): error:
00000000:lib(0):func(0):reason(0)'
12:14:25 48240 TLS session fail: (SSL_connect): error:
00000000:lib(0):func(0):reason(0)
12:14:25 48240 SMTP(close)>>
12:14:25 48240 set_process_info: 48240 delivering 1k7GcL-000CY0-IH: just
tried electron.ecohosting.co.uk [185.53.59.181]:465 for #REDACTED#@gmail.com:
result DEFER
12:14:25 48240 added retry item for T:electron.ecohosting.co.uk:
185.53.59.181:465: errno=-37 more_errno=0,A flags=2
It appears that exim is failing to make a TCP Fast Open connection to the
remote server. In fact it looks like it doesn't even try. Wireshark shows
that it closes the connection immediately after receiving the first response
from the remote server.
No. Time Source Destination Protocol
Length Info
1 0.000000000 192.168.1.13 185.53.59.181 TCP
78 54794 → 465 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1
TSval=4153193768 TSecr=0 TFO=R
No. Time Source Destination Protocol
Length Info
2 0.022693018 185.53.59.181 192.168.1.13 TCP
74 465 → 54794 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1452
SACK_PERM=1 TSval=2019435234 TSecr=4153193768 WS=128
No. Time Source Destination Protocol
Length Info
3 0.022718023 192.168.1.13 185.53.59.181 TCP
66 54794 → 465 [FIN, ACK] Seq=1 Ack=1 Win=66752 Len=0 TSval=4153193791
TSecr=2019435234
This problem is not related to a specific remote server, I have tried routing
through mail servers run by four different service providers and see exactly
the same behaviour.
But looking at a similar section from debug output for 4.93 it appears that
4.93 does not attempt to use TCP Fast Open even though hosts_try_fastopen is
set to *.
12:00:36 44920 Connecting to electron.ecohosting.co.uk [185.53.59.181]:465
... 185.53.59.181 in hosts_try_fastopen? yes (matched "*")
12:00:36 44920 non-TFO mode connection attempt to 185.53.59.181, 0 data
12:00:36 44920 connected
12:00:36 44920 ╭considering: $primary_hostname
12:00:36 44920 ├──expanding: $primary_hostname
12:00:36 44920 ╰─────result: curlew.milibyte.co.uk
12:00:36 44920 185.53.59.181 in hosts_avoid_esmtp? no (option unset)
12:00:36 44920 185.53.59.181 in hosts_require_ocsp? no (option unset)
12:00:36 44920 185.53.59.181 in hosts_request_ocsp? yes (matched "*")
12:00:36 44920 setting SSL CTX options: 0x42004000
12:00:36 44920 Diffie-Hellman initialized from default with 2048-bit prime
12:00:36 44920 Initialized TLS
12:00:36 44920 185.53.59.181 in tls_verify_hosts? no (option unset)
12:00:36 44920 185.53.59.181 in tls_try_verify_hosts? yes (matched "*")
12:00:36 44920 tls_verify_certificates: system
12:00:36 44920 185.53.59.181 in tls_verify_cert_hostnames? yes (matched "*")
12:00:36 44920 Cert hostname to check: "electron.ecohosting.co.uk"
12:00:36 44920 Calling SSL_connect
12:00:36 44920 SSL_connect: before SSL initialization
12:00:36 44920 SSL_connect: SSLv3/TLS write client hello
12:00:36 44920 SSL_connect: SSLv3/TLS write client hello
12:00:36 44920 SSL_connect: SSLv3/TLS read server hello
12:00:36 44920 SSL verify ok: depth=2 SN=/O=Digital Signature Trust Co./
CN=DST Root CA X3
12:00:36 44920 SSL verify ok: depth=1 SN=/C=US/O=Let's Encrypt/CN=Let's
Encrypt Authority X3
12:00:36 44920 SSL authenticated verify ok: depth=0 SN=/
CN=electron.ecohosting.co.uk
12:00:36 44920 SSL_connect: SSLv3/TLS read server certificate
12:00:36 44920 SSL_connect: SSLv3/TLS read server key exchange
12:00:36 44920 Received TLS status callback (OCSP stapling):
12:00:36 44920 null
12:00:36 44920 SSL_connect: SSLv3/TLS read server done
So as an experiment I removed the default value of * for hosts_try_fastopen
for 4.94 by reseting it in the transport section of the configuration file f:
remote_auth_smtp:
driver = smtp
protocol = smtps
hosts_require_auth = $host_address
hosts_require_tls = $host_address
hosts_try_fastopen =
This appears to have 'fixed' the problem with exim 4.94 but leaves various
questions in my mind:
1) exim 4.93 appears to not use TCP Fast Open despite the default value of
hosts_try_fastopen implying that it should use it. Was this a bug or a
feature?
2) exim 4.94 attempts to respect the default value of hosts_try_fastopen and
tries to use TFO but fails. Should it not fall back to not using TFO after a
failure instead of aborting and is this a bug?
3) Is TFO supported by FreeBSD 12? The exim documentation states "On (at
least some) current Linux distributions the facility must be enabled in the
kernel by the sysadmin before the support is usable." - https://www.exim.org/
exim-html-current/doc/html/spec_html/ch-the_smtp_transport.html .
Would this also apply for FreeBSD though I assume it is supported because
sysctl shows net.inet.tcp.fastopen.client_enable: 1
3) Is there some error in my configuration which was not sufficient to cause
a problem with 4.93 but causes problems with 4.94? Since web searches haven't
produced details of anyone else suffering from this problem then it could be
quite likely that I've done something wrong.
--
Mike Clarke
More information about the freebsd-questions
mailing list