fetch hangs when trying to http-download from http://ftp5.de.FreeBSD.org/

Josh Carroll josh.carroll at gmail.com
Fri Nov 5 23:24:29 UTC 2010


> Here's the last 30 lines of the output from kdump after it has hung
> (the trace file no longer gets written to once the fetch process
> hangs):

*snip*

>  38016 fetch    RET   read 53/0x35
>  38016 fetch    CALL  read(0x3,0x81006835,0x3cb)

I believe this read corresponds to this part of fetch.c (with line
numbers for reference from stable/8 svn):

 625         if ((size = fread(buf, 1, size, f)) == 0) {
 626             if (ferror(f) && errno == EINTR && !sigint)
 627                 clearerr(f);
 628             else
 629                 break;
 630         }

This fread() never returns the second time through the loop in the bad
case. Since I'm not very good with gdb, I just added some printf() calls
throughout this section of the code and pulled the fread() out of the
if() so I could check its return value explicitly. Comparing a fetch
from that HTTP server with one from my own local HTTP server (serving a
copy of the file in question) shows the following.

The first time through the loop, the first 4096 bytes are properly read in:

before stat_start()
ecore-txt-0.9.9.042.tbz                         0% of 6594  B    0 Bps
after stat_start()
reset sigalrm, siginfo and sigint to 0
setup SIGINFO handler
while we don't get a sigint
size set to B_size: 4096
Before calling: size = fread(buf, 1, 4096, f) (fileno(f) is: -1)
after fread(), fread returned setting size = 4096
After check for size ?= 0 and fread()
After stat_update()
while we don't get a sigint
size = 6594 - 4096 = 2498
size after: 2498
Before calling: size = fread(buf, 1, 2498, f) (fileno(f) is: -1)

But this is where it hangs for that particular server/file
combination. If I fetch the same file from my local Apache server, I
see it properly read the remaining 2498 bytes and finish up:

after fread(), fread returned setting size = 2498
After check for size ?= 0 and fread()
After stat_update()
while we don't get a sigint
size = 6594 - 6594 = 0
size after: 0
Before calling: size = fread(buf, 1, 0, f) (fileno(f) is: -1)
after fread(), fread returned setting size = 0
fread() returned 0
We weren't interrupted, break out of while()
AFTER large while(!sigint) loop
!sigalarm
Set SIGINFO back to SIG_DFL
Before stat_end()
ecore-txt-0.9.9.042.tbz                       100% of 6594  B   59 MBps
after stat_end()
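
The instrumentation described above (fread() pulled out of the if() and
wrapped in trace output) can be sketched roughly as follows. This is a
standalone illustration, not the actual patch to fetch.c:
read_with_trace() and B_SIZE are hypothetical names, the chunk-size
arithmetic is inferred from the trace lines ("size = 6594 - 4096 =
2498"), and the !sigint check from the original loop is left out for
brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define B_SIZE 4096	/* chunk size seen in the trace output (assumed) */

/*
 * Read `total` bytes from f in chunks of at most B_SIZE, logging before
 * and after every fread() call. Returns the number of bytes read.
 * Hypothetical helper for illustration; not the real fetch.c code.
 */
static size_t
read_with_trace(FILE *f, size_t total, FILE *log)
{
	char buf[B_SIZE];
	size_t count = 0;

	assert(f != NULL && log != NULL);
	for (;;) {
		/* Ask for a full chunk, or only for what is left. */
		size_t want = total - count < B_SIZE ? total - count : B_SIZE;

		fprintf(log, "before fread(buf, 1, %zu, f)\n", want);
		fflush(log);	/* make the line visible even if fread() blocks */
		size_t got = fread(buf, 1, want, f);
		fprintf(log, "fread returned %zu\n", got);

		if (got == 0) {
			if (ferror(f) && errno == EINTR) {
				clearerr(f);	/* interrupted: retry */
				continue;
			}
			break;	/* EOF, a real error, or nothing left */
		}
		count += got;
	}
	return count;
}
```

With a 6594-byte input this requests 4096, then 2498, then 0 bytes,
which matches the sequence in the good trace. Flushing the log before
each fread() matters here: without it, the last "before" line could
stay buffered when the call blocks.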


So for some reason it's hanging during that second fread() for that
particular file on that particular server. Perhaps the pcap Taras
provided will shed some light on why this fread() is hanging.

I was able to fetch a different tarball (zsh-4.3.10_4.tbz) from that
server without any problem, so there is something in particular about
the combination of that file/server that is causing the problem.
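
The kdump output shows a read() call that never returns, which suggests
the process is blocked waiting for bytes that do not arrive. One way to
check that from a test program is to poll the descriptor with a timeout
before reading. The sketch below assumes that a plain blocking
descriptor is available (the traces print fileno(f) as -1, so this would
have to be done on the underlying socket, not on the FILE pointer);
wait_readable() is a hypothetical helper, not part of fetch or
libfetch.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if data (or EOF/hangup) is pending, 0 on timeout,
 * -1 if poll() itself failed.
 * Hypothetical helper for illustration only.
 */
static int
wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int r = poll(&pfd, 1, timeout_ms);

	if (r < 0)
		return -1;
	return r > 0;
}
```

A return value of 0 at the point where the second fread() is issued
would indicate that the server has simply stopped sending, which is
something the pcap should be able to confirm or rule out.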

Thanks,
Josh


More information about the freebsd-stable mailing list