Unexpected SU+J inconsistency AGAIN -- please, don't shift topic to ZFS!

Lev Serebryakov lev at FreeBSD.org
Thu Feb 28 14:57:02 UTC 2013

Hello, Ivan.
You wrote on 28 February 2013, 18:19:38:

>>    Maybe, it is subtile interference between raid5 implementation and
>>   SU+J, but in such case I want to understand what does raid5 do
>>   wrong.
IV> You guessed correctly, I was going to blame geom_raid5 :)
  It is not the first time :( But every such discussion ends without
 any practical results.

  Once, Kirk said that delayed writes are OK for SU as long as the
 bottom layer doesn't lie about operation completeness. geom_raid5
 may delay writes (in the hope that subsequent writes will combine
 nicely and make the read-calculate-write cycle unnecessary), but it
 never marks a BIO complete until it really is completed (the layers
 below geom_raid5 return completion). So every BIO in the wait queue
 is "in flight" from the GEOM/VFS point of view. Maybe that is fatal
 for the journal :(

  And what I really want to see is a "SYNC" flag for BIOs, with all
 journal-related writes marked with it. All writes originated by
 fsync() MUST be marked the same way too, really. Alexander Motin
 (the ahci driver author) assured me that he'll add support for such
 a flag in the driver, to flush the drive cache too, if it is
 introduced.

  IMHO, the lack of this (or a similar) flag is bad even without
 geom_raid5 and its optimistic behavior.

  There was commit r246876, but I don't understand exactly what it
 means, as no real FS or driver code was touched.

  But I'm writing about this idea for the 3rd or 4th time without any
 results :( And I don't mean that it should be implemented ASAP by
 someone; I mean I didn't see any support from the FS guys (Kirk and
 somebody else, I don't remember exactly who participated in those
 old threads, but it was not you) like "go ahead and send your
 patch". All those threads were very defensive on the FS guru side:
 "we don't need it, fix your hardware, disable caches".

IV> Is this a production setup you have? Can you afford to destroy it and
IV> re-create it for the purpose of testing, this time with geom_raid3
IV> (which should be synchronous with respect to writes)?
  Unfortunately, it is a production setup and I don't have any spare
 hardware for a second one :(

  I've posted the panic stack trace (and it is FFS-related too) and
 am now preparing a setup with only one HDD and the same high load to
 try to reproduce the problem without geom_raid5. But I don't have
 enough hardware (3 spare HDDs at least!) to reproduce it with
 geom_raid3 or another copy of

// Black Lion AKA Lev Serebryakov <lev at FreeBSD.org>
