kern/160943: iSCSI initiator ignores block offset causing silent data corruption

Fri Sep 23 16:40:10 UTC 2011

>Number:         160943
>Category:       kern
>Synopsis:       iSCSI initiator ignores block offset causing silent data corruption
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Sep 23 16:40:09 UTC 2011
>Closed-Date:
>Last-Modified:
>Originator:     Craig Boston
>Release:        8.2 Stable
>Organization:
>Environment:
FreeBSD XXX 8.2-STABLE FreeBSD 8.2-STABLE #0: Wed Sep 21 14:58:49 CDT 2011     root@XXX:/compile/obj/compile/src/sys/GENERIC  amd64
>Description:
This report is the result of troubleshooting silent data corruption while setting up an EqualLogic DS4000 iSCSI unit. The corruption first showed up during testing as ZFS checksum errors soon after the pool was created, but I was also able to reproduce it with UFS. It appears to be identical to the issue reported here:

http://lists.freebsd.org/pipermail/freebsd-scsi/2010-June/004403.html

The match goes down to the very same inode number being corrupt after a fresh newfs / fsck cycle.

After examining ktrace output from newfs and fsck to determine which block read back differently from how it was written, I cross-referenced that block with a network capture and identified the following exchange for a 64k write (simplified):

Initiator:  SCSI Write 128 blocks (i.e. length = 0x10000)
Target:     Ready to Transfer, desired data length = 0x0c000
Initiator:  Data out, length = 0x0c000 [correct data]
Target:     Ready to Transfer, buffer offset = 0x0c000, desired data length = 0x04000
Initiator:  Data out, length = 0x04000 [WRONG DATA!]

In the second data transfer, the initiator ignores the buffer offset and sends the first 0x04000 bytes of the buffer again, instead of the 0x04000 bytes starting at offset 0x0c000. As a result, incorrect data is written to the disk.
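To make the failure concrete, here is a small user-space simulation of the 64k write above. It is a sketch, not the driver code: `struct r2t_req`, `data_out_buggy()`, `data_out_fixed()` and `replay()` are hypothetical names, and the target is modeled as a plain buffer that stores each Data-Out at the offset its R2T requested. The buggy path always copies from the start of the initiator's buffer, as described above; the fixed path advances the source pointer by the buffer offset, clamped with the same `MIN(bo, edtl - ddtl)` expression the patch uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Hypothetical view of one R2T: buffer offset and desired data transfer length. */
struct r2t_req {
    uint32_t bo;    /* buffer offset requested by the target */
    uint32_t ddtl;  /* desired data transfer length */
};

/* Buggy behavior: always sends from the start of the initiator's buffer. */
static void data_out_buggy(const uint8_t *buf, uint32_t edtl,
                           struct r2t_req r, uint8_t *target)
{
    (void)edtl;
    memcpy(target + r.bo, buf, r.ddtl);
}

/* Fixed behavior, mirroring the patch: advance by the clamped offset. */
static void data_out_fixed(const uint8_t *buf, uint32_t edtl,
                           struct r2t_req r, uint8_t *target)
{
    assert(r.ddtl <= edtl);
    const uint8_t *bp = buf + MIN(r.bo, edtl - r.ddtl);
    memcpy(target + r.bo, bp, r.ddtl);
}

/*
 * Replays the traced 64k write: R2T(0, 0xc000) then R2T(0xc000, 0x4000).
 * Returns 1 if the target ends up with the bytes the initiator meant to write.
 */
static int replay(int fixed)
{
    enum { EDTL = 0x10000 };
    static uint8_t src[EDTL], dst[EDTL];
    const struct r2t_req r2ts[] = { { 0x0000, 0xc000 }, { 0xc000, 0x4000 } };

    for (uint32_t i = 0; i < EDTL; i++)
        src[i] = (uint8_t)(i >> 8);  /* different value per 256-byte run */
    memset(dst, 0, sizeof dst);

    for (size_t i = 0; i < sizeof r2ts / sizeof r2ts[0]; i++) {
        if (fixed)
            data_out_fixed(src, EDTL, r2ts[i], dst);
        else
            data_out_buggy(src, EDTL, r2ts[i], dst);
    }
    return memcmp(src, dst, EDTL) == 0;
}
```

`replay(1)` returns 1, while `replay(0)` returns 0 because bytes 0xc000 through 0xffff on the target end up holding a second copy of bytes 0x0000 through 0x3fff, which is exactly the corruption seen in the capture.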

I'm not sure exactly why the EqualLogic unit sometimes sends R2Ts with a length of 0x10000 and sometimes with a length of 0x0c000 (perhaps it is related to its internal striping scheme). Such behavior is unusual, but perfectly valid according to the iSCSI RFC (RFC 3720). Whatever the reason, our initiator does not correctly follow the iSCSI spec here, and that is a bug.

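The attached patch corrects this by advancing the data pointer by the R2T's buffer offset, clamped so that the data sent stays within the command's buffer.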
>How-To-Repeat:
Get an EqualLogic DS4000 or DS6500, connect to it over iSCSI, and watch your data be silently corrupted as it's written.

Alternatively, modify a software iSCSI target to break transfers into smaller pieces by sending R2Ts with different offsets / lengths.
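For the software-target route, the target only needs to answer a write with several R2Ts instead of one. A minimal sketch of the splitting logic follows, assuming a fixed chunk size chosen by the tester; `struct r2t_plan` and `plan_r2ts()` are hypothetical and not part of any existing target. A real target must also keep each R2T's desired data transfer length within the negotiated MaxBurstLength.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One R2T a test target would send: (buffer offset, desired data transfer length). */
struct r2t_plan {
    uint32_t bo;
    uint32_t ddtl;
};

/*
 * Hypothetical helper: split a write of edtl bytes into R2Ts of at most
 * chunk bytes each, in increasing offset order. Returns the number of R2Ts
 * stored in out, or 0 if chunk is 0 or out has room for fewer than needed.
 */
static size_t plan_r2ts(uint32_t edtl, uint32_t chunk,
                        struct r2t_plan *out, size_t max_out)
{
    size_t n = 0;
    uint32_t off = 0;

    assert(out != NULL);
    if (chunk == 0)
        return 0;
    while (off < edtl) {
        /* The last R2T covers whatever is left, which may be less than chunk. */
        uint32_t len = (edtl - off < chunk) ? edtl - off : chunk;

        if (n == max_out)
            return 0;
        out[n].bo = off;
        out[n].ddtl = len;
        n++;
        off += len;  /* never exceeds edtl, so no overflow */
    }
    return n;
}
```

With `edtl = 0x10000` and `chunk = 0xc000` this produces the same two R2Ts seen in the EqualLogic trace: (0x0, 0xc000) followed by (0xc000, 0x4000).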
>Fix:

--- sys/dev/iscsi/initiator/iscsi_subr.c.orig   2011-09-23 10:38:12.000000000 -0500
+++ sys/dev/iscsi/initiator/iscsi_subr.c        2011-09-23 11:17:03.000000000 -0500
@@ -84,6 +84,7 @@
               caddr_t          bp = csio->data_ptr;

               bo = ntohl(r2t->bo);
+              bp += MIN(bo, edtl - ddtl);
               bleft = ddtl;

               if(sp->opt.maxXmitDataSegmentLength > 0) // danny's RFC

>Release-Note:
>Audit-Trail:
>Unformatted: