EPERM from write()

Steve Watt steve at Watt.COM
Tue May 3 01:58:13 PDT 2005


I'm down in the bowels of chasing a weird bug that is being presented
by sendmail when doing TLS.  For *some* clients it works, but for the
ones that fail, they fail hard.

There are various weird things in my analyses.

The proximate symptom is that the call to write() to put out the
ServerHello message returns -1, errno 1 (EPERM), but I can see from
the tcpdump trace that most (all?) of it got out.  Looking backwards
through the ktrace reveals that the fd that is being written to (fd 4)
is the fd that the "Ready to start TLS" message went out on, and that
(previous) write completed successfully.

Here's the ktrace from that successful write to the one that
blew up, sans numerous calls to gettimeofday and getpid:

 35388 sendmail GIO   fd 4 wrote 30 bytes
       "220 2.0.0 Ready to start TLS\r
       "
 35388 sendmail RET   write 30/0x1e
 35388 sendmail CALL  fcntl(0x8,0x3,0)
 35388 sendmail RET   fcntl 2
(get flags on fd 8)
 35388 sendmail CALL  fcntl(0x8,0x4,0x6)
 35388 sendmail RET   fcntl 0
(set O_NONBLOCK on fd 8)
 35388 sendmail CALL  break(0x8191000)
 35388 sendmail RET   break 0
 35388 sendmail CALL  break(0x8196000)
 35388 sendmail RET   break 0
 35388 sendmail CALL  read(0x8,0x8188000,0xb)
 35388 sendmail RET   read -1 errno 35 Resource temporarily unavailable
 35388 sendmail CALL  select(0x9,0xbfbfbe00,0,0xbfbfb850,0xbfbfb648)
 35388 sendmail RET   select 1
 35388 sendmail CALL  read(0x8,0x8188000,0xb)
(Here's the ClientHello)
 35388 sendmail GIO   fd 8 read 11 bytes
       0x0000 807c 0103 0100 6300 0000 10                        |.|....c....|

 35388 sendmail RET   read 11/0xb
 35388 sendmail CALL  read(0x8,0x818800b,0x73)
(and the rest of the ClientHello, with all the cipher suites)
 35388 sendmail GIO   fd 8 read 115 bytes
       0x0000 0000 3900 0038 0000 3500 0016 0000 1300 000a 0700  |..9..8..5...........|
       0x0014 c000 0033 0000 3200 002f 0300 8000 0066 0000 0500  |...3..2../.....f....|
       0x0028 0004 0100 8008 0080 0000 6300 0062 0000 6100 0015  |..........c..b..a...|
       0x003c 0000 1200 0009 0600 4000 0065 0000 6400 0060 0000  |........ at ..e..d..`..|
       0x0050 1400 0011 0000 0800 0006 0400 8000 0003 0200 804d  |...................M|
       0x0064 57e4 2a11 c967 1804 e745 7cfc a991 93              |W.*..g...E|....|

 35388 sendmail RET   read 115/0x73
 35388 sendmail CALL  break(0x8197000)
 35388 sendmail RET   break 0
 35388 sendmail CALL  break(0x8198000)
 35388 sendmail RET   break 0
(here's the write that fails, but I see all 0xa59 bytes on the wire!)
 35388 sendmail CALL  write(0x4,0x8174000,0xa59)
 35388 sendmail RET   write -1 errno 1 Operation not permitted
(And just for fun the syslog that comes out, too.)
 35388 sendmail CALL  sendto(0x3,0xbfbfa4d0,0x6d,0,0,0)
 35388 sendmail GIO   fd 3 wrote 109 bytes
       "<20>May  2 23:16:25 sm-mta[35388]: STARTTLS=server, error: accept failed=-1, SSL\
        _error=5, timedout=0, errno=1"
 35388 sendmail RET   sendto 109/0x6d

I'm really kind of puzzled at this point.  Why would I get -1/EPERM
on an apparently completed write?

The box is 5.4-STABLE, fresh cvsup as of 2 May 05 17:47 GMT.  I
forced a rebuild of the SASL stuff (cyrus-sasl-2.1.20_1 and
cyrus-sasl-saslauthd-2.1.20_1) , and then rebuilt sendmail again.

I've got the following in my make.conf re: sendmail building:
SENDMAIL_CFLAGS+=-I/usr/local/include -DSASL=2
SENDMAIL_CFLAGS+=-D_FFR_SMTP_SSL
SENDMAIL_LDFLAGS+=-L/usr/local/lib
SENDMAIL_LDADD+=-lsasl2

The client machine appears to be running 8.13.1.

What I see in the tcpdump disassembly (thanks to Ethereal!) is
(paraphrased)
{client} {mybox}  TCP S
{mybox}  {client} TCP SA
{client} {mybox}  TCP A
{mybox}  {client} TCP A seq=1 ack=1 len=89 "220 {mybox} ESMTP Sendmail 8.13.3/8.13.3" (+date)
{client} {mybox}  TCP A seq=1 ack=90 len=0
{client} {mybox}  TCP A seq=1 ack=90 len=24 "EHLO {client}\r\n"
{mybox}  {client} TCP A seq=90 ack=25 len=226 ESMTP options menu, includes STARTTLS
{client} {mybox}  TCP A seq=25 ack=316 len=0
{client} {mybox}  TCP A seq=25 ack=316 len=10 "STARTTLS\r\n"
{mybox}  {client} TCP A seq=316 ack=35 len=30 "220 Ready to start TLS\r\n"
{client} {mybox}  TCP A seq=35 ack=346 len=0
{client} {mybox}  TCP A seq=35 ack=346 len=126
                    SSLv2 Record len 124
                      Handshake: ClientHello, TLS 1.0, gaggle of cipher
                         suites, no resume session, 16 byte challenge
{mybox}  {client} TCP A seq=346 ack=161 len=1448
                    TLS Record Handshake (22) TLS 1.0 (0x0301) len 74
                      Handshake: ServerHello (2), random, cipher suite
                         TLS_DHE_RSA_WITH_AES256_CBC_SHA, no compression
                    TLS Record Handshake (22) TLS 1.0 len 2184
                      Handshake: Certificate pile (first segment)
{client} {mybox}  TCP A seq=161 ack=1794 len=0
{mybox}  {client} TCP FPA seq=1796 ack=161 len=1199
                      Continuation of certificate pile
                    TLS Record Handshake (22) TLS 1.0 (0x0301) len 205
                      Handshake: ServerKeyExchange
                    TLS Record Handshake (22) TLS 1.0 (0x0301) len 166
                      Handshake: CertificateRequest
{client} {mybox}  TCP A seq=161 ack=1794 len=0
{mybox}  {client} TCP FPA seq=1794 ack=161 len=1201
                      Continuation of certificate pile
                    TLS Record Handshake (22) TLS 1.0 (0x0301) len 205
                      Handshake: ServerKeyExchange
                    TLS Record Handshake (22) TLS 1.0 (0x0301) len 166
                      Handshake: CertificateRequest
{client} {mybox}  TCP A seq=161 ack=2996 len=146
                    TLS Record Handshake (22) TLS 1.0 (0x0301) len 7
                      Handshake: Certificate (0 certificates)
                    TLS Record Handshake (22) TLS 1.0 (0x0301) len 70
                      Handshake: ClientKeyExchange (len 66)
                    TLS Record ChangeCipherSpec TLS 1.0 (0x0301) len 1
                    TLS Record Handshake (22) TLS 1.0 (0x0301) len 48
                      Handshake: Encrypted
{mybox}  {client}  TCP RST seq=2996 ack={0} len=0

The offset sequence number in the first FIN segment from my
FreeBSD box (sequence 1796, when 1794 is the next expected)
is reproduceable.

Another odd thing is that the client responded to the SSL record
from my server, which I find mildly unexpected since there's no
ServerHelloDone message.  I suppose that could be because the
data got pushed up the SSL stack because of the FIN, and it didn't
notice.

Extra intriguingly, the ISC server that's running the DHCP
mailing list can successfully transfer mail, and it uses TLS.
As can any number of other servers...

This appears to have broken sometime between late February and
mid April, but enough sites work that I didn't notice it until
recently.

I'm pretty stumped, and looking for ideas...  Solving the TLS
handshake problem is first, but that 2 byte skip in sequence
numbers is _weird_.  I suppose it could be related somehow.

-- 
Steve Watt KD6GGD  PP-ASEL-IA          ICBM: 121W 56' 57.8" / 37N 20' 14.9"
 Internet: steve @ Watt.COM                         Whois: SW32
   Free time?  There's no such thing.  It just comes in varying prices...


More information about the freebsd-hackers mailing list