Exim closes connection when negotiating SSL handshake

Sat Sep 5 16:49:43 UTC 2020

A couple of months ago I had a problem with exim after it was upgraded exim-
sa-exim from 4.93 to 4.94. When attempting to send mail exim would close the 
connection while trying to establish an SSL connection.

I've now managed get 4.94 working but I would appreciate any advice on 
whether what I've done is a valid modification to my configuration or if it's 
just a lucky workaround which works for now but might come back and bite me 
after some future upgrade.

The original problem was that exim would immediately close the connection on 
starting to negotiate an SSL handshake. It would then periodically retry to 
send the email and eventually succeed but only after an unpredictable number 
of retries, for example one message succeeded on the second attempt after 30 
minutes but a different one needed seven retries over three and a half hours.

I posted the problem in an earlier thread here and received a number of 
useful comments which helped me to track down where things were going wrong 
but didn't enable me to determine why it was happening or to resolve the 
problem so as a temporary workaround I reverted to 4.93. The full thread is 
at  https://lists.freebsd.org/pipermail/freebsd-questions/2020-June/
290344.html 

Things have now moved on and a recent upgrade to perl has led to a dependency 
issue which I temporarily worked around by adding an entry for libperl.so.
5.30 in /etc/libmap.conf but this has prompted me to try to fix the problem 
properly before any future dependency issues create a more serious problem.

The following extract from truss output for exim 4.94 while attempting to 
send an email covers the SSL negotiation

3216: read(9,"##\n##  ca-root-nss.crt -- Bundl"...,131072) = 131072 (0x20000)

After about 40 lines reading /etc/ssl/cert.pem it continues with:

 3216: read(9,0x8011adec0,131072)                = 0 (0x0)
 3216: close(9)                                  = 0 (0x0)
 3216: issetugid()                               = 0 (0x0)
 3216: setitimer(0,{ 0.000000, 300.000000 },{ 0.000000, 0.000000 }) = 0 (0x0)
 3216: mmap(0x0,24576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 
34385039360 (0x801821000)
 3216: mmap(0x0,24576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 
34385063936 (0x801827000)
 3216: getpid()                                  = 3216 
(0xc90)
 3216: getpid()                                  = 3216 
(0xc90)
 3216: getpid()                                  = 3216 
(0xc90)
 3216: getpid()                                  = 3216 
(0xc90)
 3216: getpid()                                  = 3216 
(0xc90)
 3216: getpid()                                  = 3216 
(0xc90)
 3216: write(7,"\^V\^C\^A\^A%\^A\0\^A!\^C\^Ck"...,298) ERR#57 'Socket is not 
connected'

A similar section for 4.93 follows the same pattern but ends with 
 88001: write(7,"\^V\^C\^A\^A%\^A\0\^A!\^C\^C\M-`"...,298) = 298 (0x12a)
... and goes on to successfully send the email.
When I look in the debug output from exim 4.94 for details of the SSL 
negotiation I see the following 

12:14:25 48240 Connecting to electron.ecohosting.co.uk [185.53.59.181]:465 
... 185.53.59.181 in hosts_try_fastopen? yes (matched "*")
12:14:25 48240  TFO mode connection attempt to 185.53.59.181, 0 data
12:14:25 48240  connected
12:14:25 48240  ╭considering: $primary_hostname
12:14:25 48240  ├──expanding: $primary_hostname
12:14:25 48240  ╰─────result: curlew.milibyte.co.uk
12:14:25 48240 185.53.59.181 in hosts_avoid_esmtp? no (option unset)
12:14:25 48240 185.53.59.181 in hosts_require_ocsp? no (option unset)
12:14:25 48240 185.53.59.181 in hosts_request_ocsp? yes (matched "*")
12:14:25 48240 setting SSL CTX options: 0x42004000
12:14:25 48240 Diffie-Hellman initialized from default with 2048-bit prime
12:14:25 48240 Initialized TLS
12:14:25 48240 185.53.59.181 in tls_verify_hosts? no (option unset)
12:14:25 48240 185.53.59.181 in tls_try_verify_hosts? yes (matched "*")
12:14:25 48240 tls_verify_certificates: system
12:14:25 48240 185.53.59.181 in tls_verify_cert_hostnames? yes (matched "*")
12:14:25 48240 Cert hostname to check: "electron.ecohosting.co.uk"
12:14:25 48240 Calling SSL_connect
12:14:25 48240 SSL_connect: before SSL initialization
12:14:25 48240 SSL_connect: error in SSLv3/TLS write client hello
12:14:25 48240 TLS error '(SSL_connect): error:
00000000:lib(0):func(0):reason(0)'
12:14:25 48240 TLS session fail: (SSL_connect): error:
00000000:lib(0):func(0):reason(0)
12:14:25 48240   SMTP(close)>>
12:14:25 48240 set_process_info: 48240 delivering 1k7GcL-000CY0-IH: just 
tried electron.ecohosting.co.uk [185.53.59.181]:465 for #REDACTED#@gmail.com: 
result DEFER
12:14:25 48240 added retry item for T:electron.ecohosting.co.uk:
185.53.59.181:465: errno=-37 more_errno=0,A flags=2

It appears that exim is failing to make a TCP Fast Open connection to the 
remote server. In fact it looks like it doesn't even try. Wireshark shows 
that it closes the connection immediately after receiving the first response 
from the remote server.

No.     Time           Source                Destination           Protocol 
Length Info
      1 0.000000000    192.168.1.13          185.53.59.181         TCP      
78     54794 → 465 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 
TSval=4153193768 TSecr=0 TFO=R

No.     Time           Source                Destination           Protocol 
Length Info
      2 0.022693018    185.53.59.181         192.168.1.13          TCP      
74     465 → 54794 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1452 
SACK_PERM=1 TSval=2019435234 TSecr=4153193768 WS=128

No.     Time           Source                Destination           Protocol 
Length Info
      3 0.022718023    192.168.1.13          185.53.59.181         TCP      
66     54794 → 465 [FIN, ACK] Seq=1 Ack=1 Win=66752 Len=0 TSval=4153193791 
TSecr=2019435234

This problem is not related to a specific remote server, I have tried routing 
through mail servers run by four different service providers and see exactly 
the same behaviour.

But looking at a similar section from debug output for 4.93 it appears that 
4.93 does not attempt to use TCP Fast Open even though hosts_try_fastopen is 
set to *.

12:00:36 44920 Connecting to electron.ecohosting.co.uk [185.53.59.181]:465 
... 185.53.59.181 in hosts_try_fastopen? yes (matched "*")
12:00:36 44920  non-TFO mode connection attempt to 185.53.59.181, 0 data
12:00:36 44920  connected
12:00:36 44920  ╭considering: $primary_hostname
12:00:36 44920  ├──expanding: $primary_hostname
12:00:36 44920  ╰─────result: curlew.milibyte.co.uk
12:00:36 44920 185.53.59.181 in hosts_avoid_esmtp? no (option unset)
12:00:36 44920 185.53.59.181 in hosts_require_ocsp? no (option unset)
12:00:36 44920 185.53.59.181 in hosts_request_ocsp? yes (matched "*")
12:00:36 44920 setting SSL CTX options: 0x42004000
12:00:36 44920 Diffie-Hellman initialized from default with 2048-bit prime
12:00:36 44920 Initialized TLS
12:00:36 44920 185.53.59.181 in tls_verify_hosts? no (option unset)
12:00:36 44920 185.53.59.181 in tls_try_verify_hosts? yes (matched "*")
12:00:36 44920 tls_verify_certificates: system
12:00:36 44920 185.53.59.181 in tls_verify_cert_hostnames? yes (matched "*")
12:00:36 44920 Cert hostname to check: "electron.ecohosting.co.uk"
12:00:36 44920 Calling SSL_connect
12:00:36 44920 SSL_connect: before SSL initialization
12:00:36 44920 SSL_connect: SSLv3/TLS write client hello
12:00:36 44920 SSL_connect: SSLv3/TLS write client hello
12:00:36 44920 SSL_connect: SSLv3/TLS read server hello
12:00:36 44920 SSL verify ok: depth=2 SN=/O=Digital Signature Trust Co./
CN=DST Root CA X3
12:00:36 44920 SSL verify ok: depth=1 SN=/C=US/O=Let's Encrypt/CN=Let's 
Encrypt Authority X3
12:00:36 44920 SSL authenticated verify ok: depth=0 SN=/
CN=electron.ecohosting.co.uk
12:00:36 44920 SSL_connect: SSLv3/TLS read server certificate
12:00:36 44920 SSL_connect: SSLv3/TLS read server key exchange
12:00:36 44920 Received TLS status callback (OCSP stapling):
12:00:36 44920  null
12:00:36 44920 SSL_connect: SSLv3/TLS read server done

So as an experiment I removed the default value of * for hosts_try_fastopen 
for 4.94 by reseting it in the transport section of the configuration file f:

remote_auth_smtp: 
 driver = smtp 
 protocol = smtps 
 hosts_require_auth = $host_address 
 hosts_require_tls = $host_address 
 hosts_try_fastopen =

This appears to have 'fixed' the problem with exim 4.94 but leaves various 
questions in my mind:

1) exim 4.93 appears to not use TCP Fast Open despite the default value of 
hosts_try_fastopen implying that it should use it. Was this a bug or a 
feature?

2) exim 4.94 attempts to respect the default value of hosts_try_fastopen and 
tries to use TFO but fails. Should it not fall back to not using TFO after a 
failure instead of aborting and is this a bug?

3) Is TFO supported by FreeBSD 12? The exim documentation states "On (at 
least some) current Linux distributions the facility must be enabled in the 
kernel by the sysadmin before the support is usable." - https://www.exim.org/
exim-html-current/doc/html/spec_html/ch-the_smtp_transport.html .
Would this also apply for FreeBSD though I assume it is supported because 
sysctl shows net.inet.tcp.fastopen.client_enable: 1

3) Is there some error in my configuration which was not sufficient to cause 
a problem with 4.93 but causes problems with 4.94? Since web searches haven't 
produced details of anyone else suffering from this problem then it could be 
quite likely that I've done something wrong.

-- 
Mike Clarke