EPERM from write()
Steve Watt
steve at Watt.COM
Tue May 3 01:58:13 PDT 2005
I'm down in the bowels of chasing a weird bug that is being presented
by sendmail when doing TLS. For *some* clients it works, but for the
ones that fail, they fail hard.
There are various weird things in my analyses.
The proximate symptom is that the call to write() to put out the
ServerHello message returns -1, errno 1 (EPERM), but I can see from
the tcpdump trace that most (all?) of it got out. Looking backwards
through the ktrace reveals that the fd that is being written to (fd 4)
is the fd that the "Ready to start TLS" message went out on, and that
(previous) write completed successfully.
Here's the ktrace from that successful write to the one that
blew up, sans numerous calls to gettimeofday and getpid:
35388 sendmail GIO fd 4 wrote 30 bytes
"220 2.0.0 Ready to start TLS\r
"
35388 sendmail RET write 30/0x1e
35388 sendmail CALL fcntl(0x8,0x3,0)
35388 sendmail RET fcntl 2
(get flags on fd 8)
35388 sendmail CALL fcntl(0x8,0x4,0x6)
35388 sendmail RET fcntl 0
(set O_NONBLOCK on fd 8)
35388 sendmail CALL break(0x8191000)
35388 sendmail RET break 0
35388 sendmail CALL break(0x8196000)
35388 sendmail RET break 0
35388 sendmail CALL read(0x8,0x8188000,0xb)
35388 sendmail RET read -1 errno 35 Resource temporarily unavailable
35388 sendmail CALL select(0x9,0xbfbfbe00,0,0xbfbfb850,0xbfbfb648)
35388 sendmail RET select 1
35388 sendmail CALL read(0x8,0x8188000,0xb)
(Here's the ClientHello)
35388 sendmail GIO fd 8 read 11 bytes
0x0000 807c 0103 0100 6300 0000 10 |.|....c....|
35388 sendmail RET read 11/0xb
35388 sendmail CALL read(0x8,0x818800b,0x73)
(and the rest of the ClientHello, with all the cipher suites)
35388 sendmail GIO fd 8 read 115 bytes
0x0000 0000 3900 0038 0000 3500 0016 0000 1300 000a 0700 |..9..8..5...........|
0x0014 c000 0033 0000 3200 002f 0300 8000 0066 0000 0500 |...3..2../.....f....|
0x0028 0004 0100 8008 0080 0000 6300 0062 0000 6100 0015 |..........c..b..a...|
0x003c 0000 1200 0009 0600 4000 0065 0000 6400 0060 0000 |........ at ..e..d..`..|
0x0050 1400 0011 0000 0800 0006 0400 8000 0003 0200 804d |...................M|
0x0064 57e4 2a11 c967 1804 e745 7cfc a991 93 |W.*..g...E|....|
35388 sendmail RET read 115/0x73
35388 sendmail CALL break(0x8197000)
35388 sendmail RET break 0
35388 sendmail CALL break(0x8198000)
35388 sendmail RET break 0
(here's the write that fails, but I see all 0xa59 bytes on the wire!)
35388 sendmail CALL write(0x4,0x8174000,0xa59)
35388 sendmail RET write -1 errno 1 Operation not permitted
(And just for fun the syslog that comes out, too.)
35388 sendmail CALL sendto(0x3,0xbfbfa4d0,0x6d,0,0,0)
35388 sendmail GIO fd 3 wrote 109 bytes
"<20>May 2 23:16:25 sm-mta[35388]: STARTTLS=server, error: accept failed=-1, SSL\
_error=5, timedout=0, errno=1"
35388 sendmail RET sendto 109/0x6d
I'm really kind of puzzled at this point. Why would I get -1/EPERM
on an apparently completed write?
The box is 5.4-STABLE, fresh cvsup as of 2 May 05 17:47 GMT. I
forced a rebuild of the SASL stuff (cyrus-sasl-2.1.20_1 and
cyrus-sasl-saslauthd-2.1.20_1) , and then rebuilt sendmail again.
I've got the following in my make.conf re: sendmail building:
SENDMAIL_CFLAGS+=-I/usr/local/include -DSASL=2
SENDMAIL_CFLAGS+=-D_FFR_SMTP_SSL
SENDMAIL_LDFLAGS+=-L/usr/local/lib
SENDMAIL_LDADD+=-lsasl2
The client machine appears to be running 8.13.1.
What I see in the tcpdump disassembly (thanks to Ethereal!) is
(paraphrased)
{client} {mybox} TCP S
{mybox} {client} TCP SA
{client} {mybox} TCP A
{mybox} {client} TCP A seq=1 ack=1 len=89 "220 {mybox} ESMTP Sendmail 8.13.3/8.13.3" (+date)
{client} {mybox} TCP A seq=1 ack=90 len=0
{client} {mybox} TCP A seq=1 ack=90 len=24 "EHLO {client}\r\n"
{mybox} {client} TCP A seq=90 ack=25 len=226 ESMTP options menu, includes STARTTLS
{client} {mybox} TCP A seq=25 ack=316 len=0
{client} {mybox} TCP A seq=25 ack=316 len=10 "STARTTLS\r\n"
{mybox} {client} TCP A seq=316 ack=35 len=30 "220 Ready to start TLS\r\n"
{client} {mybox} TCP A seq=35 ack=346 len=0
{client} {mybox} TCP A seq=35 ack=346 len=126
SSLv2 Record len 124
Handshake: ClientHello, TLS 1.0, gaggle of cipher
suites, no resume session, 16 byte challenge
{mybox} {client} TCP A seq=346 ack=161 len=1448
TLS Record Handshake (22) TLS 1.0 (0x0301) len 74
Handshake: ServerHello (2), random, cipher suite
TLS_DHE_RSA_WITH_AES256_CBC_SHA, no compression
TLS Record Handshake (22) TLS 1.0 len 2184
Handshake: Certificate pile (first segment)
{client} {mybox} TCP A seq=161 ack=1794 len=0
{mybox} {client} TCP FPA seq=1796 ack=161 len=1199
Continuation of certificate pile
TLS Record Handshake (22) TLS 1.0 (0x0301) len 205
Handshake: ServerKeyExchange
TLS Record Handshake (22) TLS 1.0 (0x0301) len 166
Handshake: CertificateRequest
{client} {mybox} TCP A seq=161 ack=1794 len=0
{mybox} {client} TCP FPA seq=1794 ack=161 len=1201
Continuation of certificate pile
TLS Record Handshake (22) TLS 1.0 (0x0301) len 205
Handshake: ServerKeyExchange
TLS Record Handshake (22) TLS 1.0 (0x0301) len 166
Handshake: CertificateRequest
{client} {mybox} TCP A seq=161 ack=2996 len=146
TLS Record Handshake (22) TLS 1.0 (0x0301) len 7
Handshake: Certificate (0 certificates)
TLS Record Handshake (22) TLS 1.0 (0x0301) len 70
Handshake: ClientKeyExchange (len 66)
TLS Record ChangeCipherSpec TLS 1.0 (0x0301) len 1
TLS Record Handshake (22) TLS 1.0 (0x0301) len 48
Handshake: Encrypted
{mybox} {client} TCP RST seq=2996 ack={0} len=0
The offset sequence number in the first FIN segment from my
FreeBSD box (sequence 1796, when 1794 is the next expected)
is reproduceable.
Another odd thing is that the client responded to the SSL record
from my server, which I find mildly unexpected since there's no
ServerHelloDone message. I suppose that could be because the
data got pushed up the SSL stack because of the FIN, and it didn't
notice.
Extra intriguingly, the ISC server that's running the DHCP
mailing list can successfully transfer mail, and it uses TLS.
As can any number of other servers...
This appears to have broken sometime between late February and
mid April, but enough sites work that I didn't notice it until
recently.
I'm pretty stumped, and looking for ideas... Solving the TLS
handshake problem is first, but that 2 byte skip in sequence
numbers is _weird_. I suppose it could be related somehow.
--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.8" / 37N 20' 14.9"
Internet: steve @ Watt.COM Whois: SW32
Free time? There's no such thing. It just comes in varying prices...
More information about the freebsd-hackers
mailing list