[Bug 261855] /etc/periodic/daily/800.scrub-zfs is using `zpool history`

From: <bugzilla-noreply_at_freebsd.org>
Date: Thu, 10 Feb 2022 08:33:14 UTC

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261855

            Bug ID: 261855
           Summary: /etc/periodic/daily/800.scrub-zfs is using `zpool
                    history`
           Product: Base System
           Version: 13.0-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: bugs@FreeBSD.org
          Reporter: lapo@lapo.it

The daily scrub periodic task currently uses `zpool history` to find the last
time a `zpool scrub` was started (note: only started, not necessarily
completed).

My system has a lot of filesystems and hourly snapshot/send/recv/destroy jobs
that replicate it to/from another system, so `zpool history` only covers about
48h of data. As a result the periodic scrub (set to the default threshold of
35 days) runs every 3 days instead, which is why I looked into it in the first
place.

Right now the logic is roughly:
1. find the last scrub start in `zpool history`
2. compute the number of days since then; if < 35, stop
3. check the "scan:" line from `zpool status` and act according to its state
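
For reference, steps 1 and 2 can be sketched roughly as below. This is a
simplified illustration, not the actual 800.scrub-zfs code: the function names
are hypothetical, the history is read on stdin, and it assumes `zpool history`
lines of the form `2022-01-23.19:09:50 zpool scrub tank` (with `tank` a
made-up pool name). It shows the weak point: once the last `zpool scrub` entry
has rotated out of the history, there is nothing left to find.

```sh
#!/bin/sh
# Simplified sketch of steps 1-2 (hypothetical helpers, not the real script).

# Print the timestamp of the last plain "zpool scrub <pool>" entry in
# `zpool history` output read from stdin; print nothing if there is none.
last_scrub_start() {
	awk -v pool="$1" '
	    NF == 4 && $2 == "zpool" && $3 == "scrub" && $4 == pool { ts = $1 }
	    END { if (ts != "") print ts }'
}

# Convert a "YYYY-MM-DD.HH:MM:SS" history timestamp to epoch seconds.
history_ts_to_epoch() {
	if date --version >/dev/null 2>&1; then
		# GNU date(1): only so the sketch can be tried off FreeBSD.
		date -d "$(printf '%s\n' "$1" | tr . ' ')" +%s
	else
		# BSD date(1), as available on FreeBSD.
		date -j -f '%Y-%m-%d.%H:%M:%S' "$1" +%s
	fi
}

# Step 2: whole days elapsed between two epoch timestamps ($1 <= $2).
days_between() {
	echo $(( ($2 - $1) / 86400 ))
}
```

For example, `zpool history tank | last_scrub_start tank` prints the start
time of the last scrub, or nothing at all if that entry is no longer in the
history.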

I suggest removing the use of `zpool history` and instead parsing the "scan:"
line of `zpool status`, whose timestamp can be converted with `LANG=C date -j
-f '%a %b %d %T %Y'`.

% zpool status | fgrep 'scan:'
  scan: scrub canceled on Thu Feb 10 08:40:04 2022
  scan: scrub repaired 0B in 00:00:26 with 0 errors on Sun Jan 23 19:09:50 2022
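
The timestamp at the end of such "scan:" lines could be extracted and
converted roughly as follows. This is a minimal sketch with hypothetical
function names; it only handles lines ending in `on <date>`, as both examples
do, and it uses `LC_ALL=C` rather than `LANG=C` because `LC_ALL` takes
precedence over `LANG` if the caller has it set.

```sh
#!/bin/sh
# Hypothetical helpers; a sketch, not code from 800.scrub-zfs.

# Print whatever follows the last " on " in a "scan:" line, e.g.
# "Thu Feb 10 08:40:04 2022"; print nothing for other lines.
scan_line_date() {
	printf '%s\n' "$1" | sed -n 's/^.*scan:.* on //p'
}

# Convert such a date to epoch seconds.
scan_date_to_epoch() {
	if date --version >/dev/null 2>&1; then
		# GNU date(1): only so the sketch can be tried off FreeBSD.
		LC_ALL=C date -d "$1" +%s
	else
		# BSD date(1) with the format proposed in the report.
		LC_ALL=C date -j -f '%a %b %d %T %Y' "$1" +%s
	fi
}
```

Lines without a trailing date, such as "scan: none requested", yield an empty
string, which the caller has to treat separately.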

This can easily be integrated into the existing `case` by adding a new match
such as `*"scrub repaired"*|*"scrub canceled"*)` and computing the end time of
the last scrub from that line.
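
A rough sketch of that `case` match follows. It is not the real script: the
function names and the "scrub"/"skip" output are hypothetical, the
"scrub in progress" branch is an assumption about how an ongoing scrub might
be handled, and every other state (e.g. "none requested") simply falls
through to scrubbing here.

```sh
#!/bin/sh
# Hypothetical sketch of the proposed check; not the real 800.scrub-zfs.

# Epoch seconds for a "Thu Feb 10 08:40:04 2022" style date.
scan_date_to_epoch() {
	if date --version >/dev/null 2>&1; then
		LC_ALL=C date -d "$1" +%s	# GNU date(1), for testing only
	else
		LC_ALL=C date -j -f '%a %b %d %T %Y' "$1" +%s	# BSD date(1)
	fi
}

# $1: a `zpool status` "scan:" line, $2: current epoch seconds,
# $3: threshold in days (35 by default). Prints "scrub" or "skip".
scrub_decision() {
	_line=$1 _now=$2 _threshold=$3
	case "$_line" in
	*"scrub repaired"*|*"scrub canceled"*)
		# Age is measured from when the last scrub ended or was canceled.
		_when=$(printf '%s\n' "$_line" | sed -n 's/^.* on //p')
		_end=$(scan_date_to_epoch "$_when")
		if [ $(( (_now - _end) / 86400 )) -lt "$_threshold" ]; then
			echo skip
		else
			echo scrub
		fi
		;;
	*"scrub in progress"*)
		echo skip
		;;
	*)
		echo scrub
		;;
	esac
}
```

Unlike the `zpool history` lookup, this keeps working no matter how many other
commands have been logged since the last scrub.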

Caveat: the scan status in `zpool status` gets reset each time a (manual)
scrub, pause, or cancel is done. But the current method has caveats too (it
uses the start time instead of the end time, and it relies on a history
command that may not be trustworthy in some situations), so I think this would
be a better solution overall.

What do you think?

I can prepare a diff for this.

-- 
You are receiving this mail because:
You are the assignee for the bug.