cam / ata timeout limited to 2147 due to overflow bug?

Kostik Belousov kostikbel at gmail.com
Fri Aug 5 08:00:01 UTC 2011


On Fri, Aug 05, 2011 at 12:02:19AM +0100, Steven Hartland wrote:
> I'm working on adding security methods to camcontrol and have
> come up against a strange issue. It seems that the timeout
> value for cam, at least on ata (ahci), is limited to less than
> 2148 seconds.
> 
> This can be seen by running:-
> camcontrol identify ada0 -t 2148 -v
> (pass0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> (pass0:ahcich0:0:0:0): CAM status: Command timeout
> 
> Also seen in /var/log/messages at this time is:-
> Aug  4 23:29:51 cfdev kernel: ahcich0: Timeout on slot 24
> Aug  4 23:29:51 cfdev kernel: ahcich0: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd d0 serr 00000000
> 
> Dropping the timeout down to 2147 and the command runs fine.
> 
> I've done some digging and it seems like this is implemented via:-
> sys/dev/ahci/ahci.c
> ahci_execute_transaction(struct ahci_slot *slot)
> {
> ...
>    /* Start command execution timeout */
>    callout_reset(&slot->timeout, (int)ccb->ccb_h.timeout * hz / 2000,
>        (timeout_t*)ahci_timeout, slot);
> 
> Now it's documented that:-
> "Non-positive values of ticks are silently converted to the value 1"
> 
> So I suspect that this is what's happening, resulting in an extremely
> small timeout instead of a large one. The value passed in for the
> timeout is seconds * 1000, so we should be seeing 2148000 for
> ccb->ccb_h.timeout. Multiply that by 1000 (hz) and you're over
> the int wrap point of 2147483647.
> 
> So instead of the wrap point being 2147483 seconds (about 24 days), I
> suspect that because of the way this is structured it's actually 2147
> seconds (about 36 minutes).
> 
> If this is the case the fix is likely to be something like:-
> callout_reset(&slot->timeout, (int)(ccb->ccb_h.timeout * (hz / 2000)),
For hz == 1000, hz / 2000 == 0 under C integer division, so the
result will always be 0.
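
To put numbers on both points, here is a minimal sketch in plain
userland C, not the ahci(4) code. The helper names (product_ms_hz,
overflows_int, ticks_proposed, ticks_wide) are made up for
illustration, and the 64-bit intermediate with a clamp is only one
possible way around the wrap, not a committed fix.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Exact value of timeout_ms * hz, computed in 64 bits so that this
 * demonstration does not itself overflow.
 */
static int64_t
product_ms_hz(uint32_t timeout_ms, int hz)
{
	return ((int64_t)timeout_ms * hz);
}

/*
 * Nonzero if timeout_ms * hz does not fit in an int, i.e. the
 * multiplication in "(int)timeout * hz / 2000" would overflow.
 */
static int
overflows_int(uint32_t timeout_ms, int hz)
{
	return (product_ms_hz(timeout_ms, hz) > INT_MAX);
}

/*
 * The rewrite proposed in the quoted mail: hz / 2000 is evaluated
 * first, as an integer division, so it is 0 whenever hz < 2000.
 */
static int
ticks_proposed(uint32_t timeout_ms, int hz)
{
	return ((int)(timeout_ms * (hz / 2000)));
}

/*
 * Hypothetical alternative (an assumption, not the driver's code):
 * keep the same /2000 scaling, but multiply in 64 bits and clamp the
 * result to INT_MAX before handing it on as a tick count.
 */
static int
ticks_wide(uint32_t timeout_ms, int hz)
{
	int64_t ticks;

	assert(hz > 0);
	ticks = product_ms_hz(timeout_ms, hz) / 2000;
	return (ticks > INT_MAX ? INT_MAX : (int)ticks);
}
```

With hz == 1000, overflows_int(2147000, 1000) is false and
overflows_int(2148000, 1000) is true, which matches the 2147 / 2148
second boundary reported above. ticks_proposed(2148000, 1000) is 0,
while ticks_wide(2148000, 1000) stays positive.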

> 
> Does this sound reasonable? What I don't understand is why the /2000?
> 
> For reference the reason for wanting a large timeout is that a
> secure erase of large media could take many hours, so I'm using
> the erase time reported by the drive for this, which in my case
> is 400 minutes.
> 
> Currently this instantly fails with a Command timeout which is
> clearly not right.
> 
>    Regards
>    Steve
> 