cam / ata timeout limited to 2147 due to overflow bug?
Kostik Belousov
kostikbel at gmail.com
Fri Aug 5 08:00:01 UTC 2011
On Fri, Aug 05, 2011 at 12:02:19AM +0100, Steven Hartland wrote:
> I'm working on adding security methods to camcontrol and have
> come up against a strange issue. It seems that the timeout
> value for cam, at least on ata (ahci), is limited to less than
> 2148 seconds.
>
> This can be seen by running:-
> camcontrol identify ada0 -t 2148 -v
> (pass0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> (pass0:ahcich0:0:0:0): CAM status: Command timeout
>
> Also seen in /var/log/messages at this time is:-
> Aug 4 23:29:51 cfdev kernel: ahcich0: Timeout on slot 24
> Aug 4 23:29:51 cfdev kernel: ahcich0: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd d0 serr 00000000
>
> With the timeout dropped to 2147, the command runs fine.
>
> I've done some digging and it seems like this is implemented via:-
> sys/dev/ahci/ahci.c
> ahci_execute_transaction(struct ahci_slot *slot)
> {
> ...
> /* Start command execution timeout */
> callout_reset(&slot->timeout, (int)ccb->ccb_h.timeout * hz / 2000,
> (timeout_t*)ahci_timeout, slot);
>
> Now it's documented that:-
> "Non-positive values of ticks are silently converted to the value 1"
>
> So I suspect that this is what's happening, resulting in an extremely
> small timeout instead of a large one. The value passed in as the
> timeout is seconds * 1000, so we should be seeing 2148000 for
> ccb->ccb_h.timeout. Multiply that by hz (1000) and you're over the
> int wrap point of 2147483647.
>
> So instead of the wrap point being 2147483 seconds (about 24 days), I
> suspect that because of the way this is structured it's actually 2147
> seconds (about 35 minutes).
>
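The arithmetic above can be checked with a small user-space sketch. It
is not driver code: the helper names are hypothetical, the timeout is
assumed to be a 32-bit millisecond count, and the product is computed in
64 bits and then wrapped by hand, because signed overflow in C is
undefined behaviour and the wrap shown is only what two's-complement
hardware typically produces.

```c
#include <stdint.h>

/* Hypothetical helper: exact value of timeout_ms * hz, computed in 64 bits. */
static int64_t
exact_product(uint32_t timeout_ms, int hz)
{
	return ((int64_t)timeout_ms * hz);
}

/* Returns 1 if timeout_ms * hz does not fit in a 32-bit signed int. */
static int
overflows_int32(uint32_t timeout_ms, int hz)
{
	return (exact_product(timeout_ms, hz) > INT32_MAX);
}

/*
 * Models the tick count a 32-bit two's-complement multiply would yield
 * for timeout_ms * hz / 2000. The wrap is done explicitly so that this
 * sketch itself stays well defined.
 */
static int64_t
wrapped_ticks(uint32_t timeout_ms, int hz)
{
	int64_t low32 = (int64_t)((uint64_t)exact_product(timeout_ms, hz) &
	    0xffffffffULL);

	if (low32 > INT32_MAX)
		low32 -= 4294967296LL;	/* reinterpret as negative int32 */
	return (low32 / 2000);
}
```

With hz == 1000, wrapped_ticks(2147000, 1000) is a large positive tick
count, while wrapped_ticks(2148000, 1000) is negative, which is the
non-positive case that the documentation quoted above says is converted
to 1.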
> If this is the case the fix is likely to be something like:-
> callout_reset(&slot->timeout, (int)(ccb->ccb_h.timeout * (hz / 2000)),
For hz == 1000, hz / 2000 == 0 under C integer division, so the
result will always be 0.
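
To illustrate, here is a user-space sketch comparing the proposed
expression with one possible alternative that widens to 64 bits before
multiplying. The alternative is only an assumption for illustration and
is not something proposed in this thread or taken from the driver; the
function names are hypothetical, and the timeout is assumed to be a
32-bit millisecond count.

```c
#include <limits.h>
#include <stdint.h>

/* The proposed form: hz / 2000 is evaluated first and truncates. */
static int
ticks_proposed(uint32_t timeout_ms, int hz)
{
	return ((int)(timeout_ms * (hz / 2000)));
}

/*
 * Hypothetical alternative: multiply in 64 bits, divide, then clamp to
 * INT_MAX so the value still fits the int ticks argument.
 */
static int
ticks_widened(uint32_t timeout_ms, int hz)
{
	int64_t ticks = (int64_t)timeout_ms * hz / 2000;

	if (ticks > INT_MAX)
		ticks = INT_MAX;
	return ((int)ticks);
}
```

For hz == 1000, ticks_proposed(2148000, 1000) is 0, whereas
ticks_widened(2148000, 1000) is 1074000, and a 400 minute timeout
(24000000 ms) comes out as 12000000 ticks.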
>
> Does this sound reasonable? What I don't understand is why the /2000?
>
> For reference, the reason for wanting a large timeout is that a
> secure erase of large media could take many hours, so I'm using
> the erase time reported by the drive for this, which in my case is
> 400 minutes.
>
> Currently this instantly fails with a Command timeout which is
> clearly not right.
>
> Regards
> Steve