CAM Target over FC and UNMAP problem

Emil Muratov gpm at hotplug.ru
Fri Mar 13 17:08:01 UTC 2015


Hi Alexander,
thanks for your comments, I'm still working on this.

On 06.03.2015 13:28, Alexander Motin wrote:
> On 06.03.2015 11:49, Emil Muratov wrote:
>> On 05.03.2015 22:16, Alexander Motin wrote:
>>> As for the large amount of reads during UNMAP, I have two guesses:
>>> 1) it may be reads of metadata absent from ARC. Though I doubt that
>>> there is so much metadata that it takes several minutes to read it.
>> Just to be sure I set up an SSD card, made an L2ARC cache on it and
>> set the vol property 'secondarycache=metadata'. Then ran the tests
>> again - according to gstat the SSD is almost idle both for reads and
>> writes, but the HDDs are still heavily loaded with reads.
> L2ARC is empty on boot and filled at a limited rate. You may need to
> read the file several times before deleting it to make the metadata get
> into L2ARC.
Done more tests with L2ARC. Warming up L2ARC gives a small improvement
(if any), but the problem with I/O blocking timeouts is still there.
Watching gstat during the I/O blocks I can see that the HDDs are lazily
reading something at about 100-200 IOPS for several seconds (disk queue
bumps to 10, 100% busy), then for an instant comes a burst of L2ARC
reads at 3000-4000 IOPS, and then again many seconds of lazy HDD reads.
Maybe I should dive deeper into L2ARC hits and misses, but I think there
is more going on here than a queue of metadata reads, see below. Not
sure whether I should disable L2ARC and do clean tests or keep working
on the caching.
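In case it helps, this is roughly what I did for the L2ARC test and how
I plan to watch the hit/miss counters during the next run (the pool,
zvol and SSD device names below are just placeholders for my setup):

    # add the SSD as an L2ARC device and cache only metadata for the zvol
    zpool add tank cache ada4
    zfs set secondarycache=metadata tank/vol0

    # watch L2ARC hits/misses while reproducing the stall
    sysctl kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.l2_misses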
 
>>> 2) if UNMAP ranges were not aligned to the ZVOL block, I guess ZFS
>>> could try to read blocks that need a partial "unmap". I've made an
>>> experiment unmapping 512 bytes of an 8K ZVOL block, and it indeed
>>> zeroed the specified 512 bytes, while from a SCSI perspective it
>>> would be fine to just ignore the request.
>> Maybe I should take a closer look into this. Although I've done my
>> best to align the upper-layer fs to the zvol blocks: I've put GPT over
>> the LUN, win2012 should align it to 1M boundaries, then formatted an
>> NTFS partition with an 8K cluster. As far as I can see there are no
>> reads from the zvol during heavy writes, but I will do some more tests
>> investigating this point.
> You should check for reads not only during writes, but also during
> REwrites. If the initiator actively practices UNMAP, then even a
> misaligned initial write may not cause a read-modify-write cycle, since
> there is just nothing to read.
A simple test overwriting large files shows interesting results: the
system writes a chunk of several gigabytes to the disks for a minute or
two, gstat shows constant high-speed disk writes with a low disk queue
and almost no reads - so there shouldn't be any misalignment problems.
Then, for a very short period, the blocking behavior appears: writes
stop, the queue bumps to 10 and guest I/O blocks for several seconds.
These blocks are clearly visible but short, and they don't produce guest
OS timeouts. It looks like the issues arise when CoW releases unused
blocks from the zvol. I should repeat this test with UNMAP disabled; I'm
not sure whether this is ZFS CoW or UNMAP behavior.
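For that re-run I'll probably just turn off delete notifications on the
win2012 guest, so NTFS stops issuing UNMAP at all, rather than touching
the target side - something like this on the guest:

    rem stop NTFS from sending UNMAP/TRIM to the LUN
    fsutil behavior set DisableDeleteNotify 1

    rem verify, then re-enable after the test
    fsutil behavior query DisableDeleteNotify
    fsutil behavior set DisableDeleteNotify 0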


>
>> Besides this, why should there be so many reads in the first place?
>> Isn't it enough to just update metadata to mark the unmapped blocks as
>> free?
> As far as I can see in the ZFS code, if an UNMAP is not aligned to
> zvol blocks, then the first and last blocks are not unmapped; instead
> the affected parts are written with zeroes. Those partial writes may
> trigger a read-modify-write cycle if the data are not already in cache.
> The SCSI spec allows a device to skip such zero writes, and I am
> thinking about implementing such filtering at the CTL level.
>
>> And what is most annoying is that all I/O blocks for a time. I'm not
>> an expert in this area, but isn't there any way to reorder or delay
>> those UNMAP ops, or even drop them, if there are a lot of other
>> pending I/Os?
> That was not easy to do, but CTL should be clever about this now. It
> should now block only access to blocks that are affected by the
> specific UNMAP command. On the other hand, after fixing this issue at
> the CTL level I've noticed that in ZFS, UNMAP also significantly
> affects the performance of other commands to the same zvol.
>
> To check the possible CTL role in this blocking you may try to add
> `option reordering unrestricted` to your LUN configuration. It makes
> CTL not track any potential request collisions. If UNMAP still blocks
> other I/Os after that, then all questions go to ZFS.

I've tried 'option reordering unrestricted' - indeed not much help for
a single zvol. But working with two zvols simultaneously gives different
results. Reading/writing/unmapping data on the same zvol quickly blocks
everything. Reading/writing on one zvol while unmapping files on another
blocks only the zvol where the UNMAP is in progress. I/O to the other
zvol is still processed, only with a performance penalty and more bursty
in nature, but at least there are no timeouts on the guest and no flood
of CTL errors in the log on the target.
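In case it matters, I set the option per LUN when creating it, roughly
like this with ctladm (the zvol paths below are placeholders for mine):

    # block-backend LUNs over the zvols, with unrestricted reordering
    ctladm create -b block -o file=/dev/zvol/tank/vol1 -o reordering=unrestricted
    ctladm create -b block -o file=/dev/zvol/tank/vol2 -o reordering=unrestricted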

I've made another test: I attached a 3rd zvol to the guest and
initiated a large EXTENDED_COPY from the guest, copying data from the
2nd zvol to the 3rd. Monitoring gstat, I saw fast disk reads and writes
in progress; then I started unmapping lots of large files from the first
zvol. At the beginning, while the 1st zvol was completely blocked, both
operations (EXTENDED_COPY and UNMAP) worked in parallel: the disk queue
bumped to 10 and read/write speed decreased (but still stayed at the
fast-copy level). Pushing further with more large unmaps, the disk queue
bumped to 20-30 and then it all went into the state where all 3 zvols
were blocked, fast disk reads and writes stopped, and disk I/O fell into
the previously mentioned pattern of long lazy reads and short L2ARC read
bursts. It lasted for a minute or two until all the unmaps finished and
the EXTENDED_COPY continued.
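In case anyone wants to reproduce the pattern, I'm basically just
watching the pool and the providers while the copy and the unmaps run
(the pool name is a placeholder for mine):

    # per-vdev view of the stall while EXTENDED_COPY and UNMAP run
    zpool iostat -v tank 1

    # per-provider queue depth and %busy, updated every second
    gstat -I 1s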

More and more confusion with all of this.



