The geom_raid(8) is not load-balancing reads across all available subdisks

Tue Apr 18 22:17:06 UTC 2017

Hi, I got curious as to why running a build on my machine on top of a
RAID1 volume seems to send most of the reads to one of the drives.
Digging into the code I found this:

                prio += (G_RAID_SUBDISK_S_ACTIVE - sd->sd_state) << 16;
                /* If disk head is precisely in position - highly prefer it. */
                if (G_RAID_SUBDISK_POS(sd) == bp->bio_offset)
                        prio -= 2 * G_RAID_SUBDISK_LOAD_SCALE;
                else
                /* If disk head is close to position - prefer it. */
                if (ABS(G_RAID_SUBDISK_POS(sd) - bp->bio_offset) <
                    G_RAID_SUBDISK_TRACK_SIZE)
                        prio -= 1 * G_RAID_SUBDISK_LOAD_SCALE;
                if (prio < bestprio) {
                        best = sd;
                        bestprio = prio;
                }
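
To see why this could pin reads to one drive, here is a minimal userland
sketch of the same selection logic. It is only a model under stated
assumptions: struct model_disk, model_select() and model_sequential() are
hypothetical names, the two constants are assumed values rather than
copied from the headers, the subdisk state term is dropped, and the load
term is held at zero (roughly a synchronous reader at queue depth 1).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed values; the real constants live in the geom_raid headers. */
#define LOAD_SCALE	256
#define TRACK_SIZE	(1024 * 1024)

/* Hypothetical stand-in for a subdisk: its load and last head position. */
struct model_disk {
	int	load;	/* load term, already scaled; kept at 0 here */
	int64_t	pos;	/* offset just past the last request served */
};

/* Mirror of the quoted priority logic, minus the state term. */
static int
model_select(const struct model_disk *d, int ndisks, int64_t offset)
{
	int best = -1, bestprio = INT_MAX;

	assert(ndisks > 0);
	for (int i = 0; i < ndisks; i++) {
		int prio = d[i].load;
		int64_t dist = llabs(d[i].pos - offset);

		if (dist == 0)
			prio -= 2 * LOAD_SCALE;	/* exact position */
		else if (dist < TRACK_SIZE)
			prio -= 1 * LOAD_SCALE;	/* nearby position */
		if (prio < bestprio) {
			best = i;
			bestprio = prio;
		}
	}
	return (best);
}

/*
 * Issue nreq sequential reads of len bytes, one at a time, and count
 * how many land on each disk in hits[].
 */
static void
model_sequential(struct model_disk *d, int ndisks, int nreq, int64_t len,
    int *hits)
{
	int64_t offset = 0;

	for (int i = 0; i < nreq; i++) {
		int pick = model_select(d, ndisks, offset);

		hits[pick]++;
		offset += len;
		d[pick].pos = offset;	/* head now sits at the next offset */
	}
}
```

In this model, once a disk has served one request of a sequential stream
it always gets the "precisely in position" bonus for the next one, so the
other disk never wins unless the load term grows enough to outweigh it.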

Both drives in my RAID are SSDs, so I am wondering if this might be the
cause. On one hand, SSDs can still have some internal buffer that caches
nearby data blocks; on the other hand, it's really difficult to define how
far that buffer might extend now and a few years from now. On top of that,
a single SATA link is likely to be the bottleneck in today's systems
(especially with Intel XPoint) for getting the data into RAM, so perhaps
ripping out this optimization for good and just round-robining requests
between all available subdisks would be a better strategy going forward?
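
For illustration, the round-robin alternative could look roughly like the
sketch below. This is a userland model, not a patch: struct rr_volume and
rr_select() are hypothetical names, the 8-disk limit is arbitrary, and a
kernel version would need locking or atomics around the rotation index.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the volume: which subdisks are usable. */
struct rr_volume {
	int	ndisks;
	bool	active[8];
	int	next;	/* index to try first on the next read */
};

/* Return the next active subdisk index in rotation, or -1 if none. */
static int
rr_select(struct rr_volume *v)
{
	assert(v->ndisks > 0 && v->ndisks <= 8);
	for (int i = 0; i < v->ndisks; i++) {
		int idx = (v->next + i) % v->ndisks;

		if (v->active[idx]) {
			/* Start the next search just past this disk. */
			v->next = (idx + 1) % v->ndisks;
			return (idx);
		}
	}
	return (-1);
}
```

Inactive subdisks are simply skipped, so reads alternate evenly between
whatever disks are currently usable, regardless of the request offset.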

-Max