cam / ata timeout limited to 2147 due to overflow bug?

Steven Hartland killing at multiplay.co.uk
Thu Aug 4 23:12:59 UTC 2011


I'm working on adding security methods to camcontrol and have
come up against a strange issue. It seems that the timeout
value for cam, at least on ata (ahci), is limited to less than
2148 seconds.

This can be seen by running:-
camcontrol identify ada0 -t 2148 -v
(pass0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(pass0:ahcich0:0:0:0): CAM status: Command timeout

Also seen in /var/log/messages at this time is:-
Aug  4 23:29:51 cfdev kernel: ahcich0: Timeout on slot 24
Aug  4 23:29:51 cfdev kernel: ahcich0: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd d0 serr 00000000

Dropping the timeout down to 2147 and the command runs fine.

I've done some digging and it seems like this is implemented via:-
sys/dev/ahci/ahci.c
ahci_execute_transaction(struct ahci_slot *slot)
{
...
    /* Start command execution timeout */
    callout_reset(&slot->timeout, (int)ccb->ccb_h.timeout * hz / 2000,
        (timeout_t*)ahci_timeout, slot);

Now its documented that:-
"Non-positive values of ticks are silently converted to the value 1"

So I suspect that this is what's happening resulting in an extremely
small timeout instead of a large one. Now I know that passed in value
to the timeout is seconds * 1000 so we should be seeing 2148000
for ccb->ccb_h.timeout now multiply that by 1000 (hz) and your over
the int wrap point 2147483647.

So instead of the wrap point being 2147483 seconds (24 days), I suspect
because of the way this is structured its actually 2147 seconds (26mins).

If this is the case the fix is likely to be something like:-
 callout_reset(&slot->timeout, (int)(ccb->ccb_h.timeout * (hz / 2000)),

Does this sound reasonable? What I don't understand is why the /2000?

For reference the reason for wanting a large timeout is that a
secure erase of large media could take many hours so I'm using
the erase time reported by the drive for this, in my case here is
400 minutes.

Currently this instantly fails with a Command timeout which is
clearly not right.

    Regards
    Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster at multiplay.co.uk.



More information about the freebsd-hackers mailing list