Re: TRIM question and zfs
- In reply to: mike tancsa : "TRIM question and zfs"
Date: Wed, 18 Dec 2024 21:49:17 UTC
On Wed, Dec 18, 2024 at 1:26 PM mike tancsa <mike@sentex.net> wrote:

> TL;DR: does zpool trim <poolname> actually work as well as one expects /
> needs?

I'd expect it to trim the unused part of the drive(s), but that may or may
not help in high-wear situations.

> I had a very old server that was running RELENG_12 for many years on
> some SSDs which were now getting to EOL with 6 yrs of work on them --
> wear level showed it getting low for sure. I had migrated everything live
> off the box, but for some reason, trying to do a zfs send on a volume
> was REALLY slow. I am talking KB/s slow. It took a long time, but it
> eventually got done. As there was nothing on this server in production,
> I thought it a good exercise to try and upgrade it in the field. So
> buildworld to 13 and then 14. I deleted some of the old unneeded files
> and got down to just the zfs volume that was left on the pool, so just
> under 200G. I then did a zpool trim tank1, but didn't see any improved
> performance at all. Still crazy slow. So I then did
>
> gpart backup <disk> > /tmp/disk-part.txt
> zpool offline tank1 <disk>p1
> trim -f /dev/<disk>
> cat /tmp/disk-part.txt | gpart restore <disk>
> zpool online tank1 <disk>p1
> zpool replace tank1 <disk>p1 <disk>p1
>
> for all 3 <disk>s in the pool, one by one.
>
> The first resilver took 13 hrs, the second 8 or so, and the last 13 min.
> After the final resilver was done, I could do a zfs send of the volume
> pretty well at full speed, with zpool iostat 1 showing close to a GB/s of
> reads.
>
> I know that zfs autotrim and trim just kind of keep track of what can and
> can't be deleted. But I would have thought the zpool trim would have
> had some impact?

It all depends on the drive history. Mostly, TRIMming a drive is useful for
reducing write amplification. Done frequently, it gives the drive's
firmware more options when it needs to do the housekeeping that keeps
blocks available to write. The increased choice lets it make better
decisions and reduces the extra writes it has to do to keep the data fresh
and provide free blocks for future writes. Sometimes you may get lucky and
the trimming will kick off the right sort of housekeeping and result in
lots of free blocks being added to the pool it keeps internally, so writes
get faster.

But you are seeing really poor read performance. That's usually caused by
data that's nearing the end of its useful life in the blocks it currently
occupies, which triggers transparent data recovery -- but at such a high
level that it can no longer run at the "line speed" of the device and
instead happens at software speeds, which can be quite a bit slower. By
basically wiping and rewriting the drives with the resilvers, you've
refreshed all the data, so now none of it takes a long time to read. I'm
not sure what 'old data' is on the drives, but that would also explain the
faster resilver times.

> Questions:
>
> Does this mean that prior to deploying SSDs for use in a zfs pool, you
> should do a full trim -f of the disk?

Yes.

> Apart from offlining and doing a trim, resilver, etc., is there a better
> way to get back performance? Or with a once-a-week trim prior to
> scrub, will it be "good enough"?

Weekly should suffice. However, if the problem is due to 'old data'
decaying and the drive's reliability software not moving it aggressively
enough to preserve performance, all the trims in the world won't help.
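If you do want to automate the weekly trim ahead of the scrub, something
like the following root crontab sketch would do. This is only an example:
the pool name tank1 comes from your message, the schedule is arbitrary,
and `zpool set autotrim=on tank1` is an alternative if you prefer
continuous trimming.

    # m  h  dom mon dow  command            (example schedule only)
      0  1   *   *   6   /sbin/zpool trim tank1     # Saturday 01:00: TRIM
      0  3   *   *   0   /sbin/zpool scrub tank1    # Sunday 03:00: scrub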
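And if a pool member does get back into this state, the
offline/trim/restore/replace cycle quoted above could be scripted per
disk, roughly like the sketch below. The disk names are placeholders, and
I'm assuming zpool wait is available now that you're on 14; it is
destructive to the targeted disk, so double-check device names before
running anything like it.

    #!/bin/sh
    # Refresh one pool member at a time: save the partition table, do a
    # whole-device TRIM, restore the table, then resilver and wait before
    # moving on to the next disk.
    POOL=tank1
    DISKS="ada0 ada1 ada2"     # placeholders -- substitute the real members

    for d in $DISKS; do
        gpart backup "$d" > "/tmp/${d}-part.txt"
        zpool offline "$POOL" "${d}p1"
        trim -f "/dev/${d}"
        gpart restore "$d" < "/tmp/${d}-part.txt"
        zpool online "$POOL" "${d}p1"
        zpool replace "$POOL" "${d}p1" "${d}p1"
        zpool wait -t resilver "$POOL"
    done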
It could also be that the drives are too busy (though the aggregate numbers
from SMART aren't suggestive of that). The current temperature is good,
but if the drives baked for a while for some reason, that could explain
the degraded performance.

> Is there a way to tell if a disk REALLY needs to be fully trimmed, other
> than approximating from slowing performance?

You might be able to look at the current wear vs the promised lifetime of
the drive. Six years is out of warranty for sure, so it may just be that
they are too worn for anything needing any level of performance. But
usually it's the performance itself that tips you off. And even then,
there's no silver bullet.

> I know these disks were super old, so maybe current SSDs don't have this
> issue? Last few years I have switched to Samsung EVOs and they don't
> seem to have these problems, at least not yet in any obvious way. Not
> sure why this particularly showed up in the zfs volume set, while other
> normal datasets performed ok.

Yeah, from the SMART info it looks like you've worn them out about 1/3.
You've written about 112 TB to the drive based on 80 TB of write traffic
(if I'm doing the math right). This is a fairly good number. There aren't
any real link errors to speak of (which is another way you can be slow),
nor have you been thermal throttling. So you've done about 80 drive writes
over 6 years (or 0.03 DWPD). This is well below the rating for most TLC
drives of 0.37 DWPD in the datasheet (but that's only for 3 years). The
drive should be good for about 400 drive writes total, and you are at 1/5
of that. But the wear indicators are closer to 1/3 (approximately 2x what
the absolute wear values would indicate). The write amplification is
relatively low, though, suggesting that trimming wouldn't help all that
much. It's at about 1.4, which would bump total bytes written into the 1/3
lifetime range. Plus TLC writes tend to be quite a bit harder on the drive
than SLC writes, so that makes the 1/3 wear numbers kind of make sense.
None of these raw numbers suggests a good root cause for the slowness,
which is in line with 'bad data from the NAND taking a while to recover'.
It all has to do with the drive's write / power-on / temperature / etc.
history, and many key details of that are simply unobtainium, though some
hints at them are in the SMART data. Not enough for me to be sure, though,
why your drives degraded. (A rough version of this wear math is sketched
below the quoted SMART output.)

Warner

> ---Mike
>
> disk
>
> smartctl 7.4 2023-08-01 r5530 [FreeBSD 14.2-STABLE amd64] (local build)
> Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Model Family:     WD Blue / Red / Green SSDs
> Device Model:     WDC WDS100T2B0A-00SM50
> Serial Number:    191011A00A72
> LU WWN Device Id: 5 001b44 8b89825ed
> Firmware Version: 401000WD
> User Capacity:    1,000,204,886,016 bytes [1.00 TB]
> Sector Size:      512 bytes logical/physical
> Rotation Rate:    Solid State Device
> Form Factor:      2.5 inches
> TRIM Command:     Available, deterministic, zeroed
> Device is:        In smartctl database 7.3/5528
> ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
> SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Wed Dec 18 15:23:10 2024 EST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> AAM feature is:   Unavailable
> APM level is:     128 (minimum power consumption without standby)
> Rd look-ahead is: Enabled
> Write cache is:   Enabled
> DSN feature is:   Unavailable
> ATA Security is:  Disabled, NOT FROZEN [SEC1]
> Wt Cache Reorder: Unavailable
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x00) Offline data collection activity
>                                         was never started.
>                                         Auto Offline Data Collection: Disabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                 (   0) seconds.
> Offline data collection
> capabilities:                    (0x11) SMART execute Offline immediate.
>                                         No Auto Offline data collection support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         No Offline surface scan supported.
>                                         Self-test supported.
>                                         No Conveyance Self-test supported.
>                                         No Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   2) minutes.
> Extended self-test routine
> recommended polling time:        (  10) minutes.
>
> SMART Attributes Data Structure revision number: 4
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAGS  VALUE WORST THRESH FAIL RAW_VALUE
>   5 Reallocated_Sector_Ct   -O--CK 100   100   ---    -    0
>   9 Power_On_Hours          -O--CK 100   100   ---    -    47271
>  12 Power_Cycle_Count       -O--CK 100   100   ---    -    33
> 165 Block_Erase_Count       -O--CK 100   100   ---    -    906509291245
> 166 Minimum_PE_Cycles_TLC   -O--CK 100   100   ---    -    1
> 167 Max_Bad_Blocks_per_Die  -O--CK 100   100   ---    -    34
> 168 Maximum_PE_Cycles_TLC   -O--CK 100   100   ---    -    33
> 169 Total_Bad_Blocks        -O--CK 100   100   ---    -    534
> 170 Grown_Bad_Blocks        -O--CK 100   100   ---    -    0
> 171 Program_Fail_Count      -O--CK 100   100   ---    -    0
> 172 Erase_Fail_Count        -O--CK 100   100   ---    -    0
> 173 Average_PE_Cycles_TLC   -O--CK 100   100   ---    -    12
> 174 Unexpected_Power_Loss   -O--CK 100   100   ---    -    19
> 184 End-to-End_Error        -O--CK 100   100   ---    -    0
> 187 Reported_Uncorrect      -O--CK 100   100   ---    -    0
> 188 Command_Timeout         -O--CK 100   100   ---    -    0
> 194 Temperature_Celsius     -O---K 075   044   ---    -    25 (Min/Max 22/44)
> 199 UDMA_CRC_Error_Count    -O--CK 100   100   ---    -    0
> 230 Media_Wearout_Indicator -O--CK 007   007   ---    -    0x074001140740
> 232 Available_Reservd_Space PO--CK 100   100   004    -    100
> 233 NAND_GB_Written_TLC     -O--CK 100   100   ---    -    12346
> 234 NAND_GB_Written_SLC     -O--CK 100   100   ---    -    90919
> 241 Host_Writes_GiB         ----CK 253   253   ---    -    80762
> 242 Host_Reads_GiB          ----CK 253   253   ---    -    19908
> 244 Temp_Throttle_Status    -O--CK 000   100   ---    -    0
>                             ||||||_ K auto-keep
>                             |||||__ C event count
>                             ||||___ R error rate
>                             |||____ S speed/performance
>                             ||_____ O updated online
>                             |______ P prefailure warning
>
> General Purpose Log Directory Version 1
> SMART           Log Directory Version 1 [multi-sector log support]
> Address    Access  R/W   Size  Description
> 0x00       GPL,SL  R/O      1  Log Directory
> 0x01       SL      R/O      1  Summary SMART error log
> 0x02       SL      R/O      2  Comprehensive SMART error log
> 0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
> 0x04       GPL,SL  R/O      8  Device Statistics log
> 0x06       SL      R/O      1  SMART self-test log
> 0x07       GPL     R/O      1  Extended self-test log
> 0x10       GPL     R/O      1  NCQ Command Error log
> 0x11       GPL     R/O      1  SATA Phy Event Counters log
> 0x24       GPL     R/O   2261  Current Device Internal Status Data log
> 0x25       GPL     R/O   2261  Saved Device Internal Status Data log
> 0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
> 0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
> 0xde       GPL     VS       8  Device vendor specific log
>
> SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
> No Errors Logged
>
> SMART Extended Self-test Log Version: 1 (1 sectors)
> No self-tests have been logged.  [To run self-tests, use: smartctl -t]
>
> Selective Self-tests/Logging not supported
>
> SCT Commands not supported
>
> Device Statistics (GP Log 0x04)
> Page  Offset Size         Value Flags Description
> 0x01  =====  =                =  ===  == General Statistics (rev 1) ==
> 0x01  0x008  4               33  ---  Lifetime Power-On Resets
> 0x01  0x010  4            47271  ---  Power-on Hours
> 0x01  0x018  6     169371253578  ---  Logical Sectors Written
> 0x01  0x020  6       2639812949  ---  Number of Write Commands
> 0x01  0x028  6      41752136282  ---  Logical Sectors Read
> 0x01  0x030  6         89429189  ---  Number of Read Commands
> 0x07  =====  =                =  ===  == Solid State Device Statistics (rev 1) ==
> 0x07  0x008  1                1  N--  Percentage Used Endurance Indicator
>                                  |||_ C monitored condition met
>                                  ||__ D supports DSN
>                                  |___ N normalized value
>
> Pending Defects log (GP Log 0x0c) not supported
>
> SATA Phy Event Counters (GP Log 0x11)
> ID      Size     Value  Description
> 0x0001  4            0  Command failed due to ICRC error
> 0x0002  4            0  R_ERR response for data FIS
> 0x0005  4            0  R_ERR response for non-data FIS
> 0x000a  4            7  Device-to-host register FISes sent due to a COMRESET
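For reference, here is a rough version of the wear math, worked from the
SMART attributes quoted above. It is only a sketch: the device name is a
placeholder, the capacity is approximate, and the vendor's NAND-write
counters may not map exactly onto a true write-amplification figure.

    #!/bin/sh
    # Pull the wear-related SMART attributes and estimate total drive
    # writes, DWPD, and an approximate write-amplification ratio.
    DEV=/dev/ada0            # placeholder device
    CAP_GIB=931              # ~1 TB drive expressed in GiB

    smartctl -A "$DEV" | awk -v cap="$CAP_GIB" '
        /Host_Writes_GiB/     { host  = $NF }
        /NAND_GB_Written_TLC/ { nand += $NF }
        /NAND_GB_Written_SLC/ { nand += $NF }
        /Power_On_Hours/      { hours = $NF }
        END {
            days = hours / 24
            printf "host writes : %d GiB (~%.0f drive writes)\n", host, host / cap
            printf "DWPD        : %.3f over %.1f years\n", host / cap / days, days / 365
            printf "approx WA   : %.2f (NAND GB written / host GiB written)\n", nand / host
        }'

Plugging in the values quoted above (80762 GiB host writes, 47271 power-on
hours) gives on the order of 80 drive writes and a few hundredths of a
DWPD, consistent with the figures in the reply.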