Using SSDs as swap

Konstantin Belousov kostikbel at gmail.com
Sat Aug 8 11:41:14 UTC 2015


On Sat, Aug 08, 2015 at 01:23:03PM +0200, Willem Jan Withagen wrote:
> On 8-8-2015 12:38, Konstantin Belousov wrote:
> > On Sat, Aug 08, 2015 at 01:29:00PM +0300, Konstantin Belousov wrote:
> >> On Sat, Aug 08, 2015 at 12:06:06PM +0200, Willem Jan Withagen wrote:
> >>> One of the following commits just passed with this in the log, and it
> >>> again triggered a question I've had for some time already.
> >>>
> >>> ----
> >>> Log:
> >>>   Enable BIO_DELETE passthru in GELI, so TRIM/UNMAP can work as expected
> >>>   when GELI is used on a SSD or inside virtual machine, so that guest can
> >>>   tell host that it is no longer using some of the storage.
> >>> -----
> >>>
> >>> In ZFS I slice my SSDs into logs and caches, but on a server with
> >>> little memory (which can't be grown) I use a partition on each SSD as swap
> >>> as well, so swapping does not have to seek and has faster loading
> >>> times. Allocating a few GB on an SSD to swap is not really all that
> >>> painful, given current sizes, but the speed difference compared with
> >>> regular spindles is impressive.
> >>>
> >>> But the questions are:
> >>> 1) Does the swap driver understand that backing-store needs a TRIM?
> >> No.
> >>
> >>> 1a) if not would it be useful, and what would it take to implement?
> >> One good thing is that it is simply a question of coding: the VM
> >> already has a place where it informs the swap pager that the page copy
> >> in swap is no longer needed. This is the vm_pager_page_unswapped() call
> >> and the swap pager method swap_pager_unswapped(). swp_pager_meta_ctl()
> >> would need to issue BIO_DELETE to the backing storage.
> >>
> >> On the other hand, note that this would increase the amount of work
> >> performed, even for swap volumes located on rotating media,
> >> which is the more typical and reasonable setup.
> >>
> >> I think an implementation and a knob to turn it off, or configure per
> >> swap partition, would be reasonable.
> > 
> > One additional thing: while BIO_DELETE is in progress, the swap block
> > cannot be marked free, since otherwise we could write another page there
> > and get it obliterated by the TRIM. This can be done async, but the
> > consequence is that swap space would be released and become usable some
> > time after the page-in.  This will affect loads which are close to OOM.
> 
> Sort of makes sense to me...
> 
> I take it that BIO_DELETE fires and returns before the TRIM is completed?
> But then the SSD accepts writes to a TRIMmed block and mixes them
> up, possibly discarding a write to a to-be-trimmed block? This
> strikes me as odd, but then I do not know the full intricate details of
> TRIM on SSDs.
> 
> Would it be possible to be notified that a TRIM has completed, only then
> to actually free the swap sectors?
This is exactly what I wrote above.

> And then perhaps the swap bookkeeping does not yet accommodate a
> possible extra state?
It does not need to.  The in-flight BIO_DELETE remembers the intermediate
state; the swap block should be freed only after the storage has reported
the BIO_DELETE as finished.  This is exactly how UFS handles trimming
of free blocks: the bitmap of used/free blocks is only updated
after the BIO_DELETE is finished, not when the inode drops its reference to
the block.
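
The ordering described above can be illustrated with a small userland
sketch in C.  This is a toy model only, not FreeBSD kernel code: the names
toy_alloc(), toy_unswapped(), toy_delete_done() and struct toy_delete are
hypothetical, and a plain array stands in for the real swap metadata.  The
point it shows is that the "used" mark is cleared in the completion step,
not when the delete is issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NBLOCKS = 8 };

/* Toy allocation map: true means "in use or still being trimmed". */
static bool used[NBLOCKS];

/* Hypothetical stand-in for an in-flight BIO_DELETE request. */
struct toy_delete {
	size_t blk;		/* block being trimmed */
	bool in_flight;		/* true until the storage reports completion */
};

/* Allocate the lowest free block, or return -1 if none is free. */
int
toy_alloc(void)
{
	for (size_t i = 0; i < NBLOCKS; i++) {
		if (!used[i]) {
			used[i] = true;
			return ((int)i);
		}
	}
	return (-1);
}

/*
 * The page copy in swap is no longer needed: start the delete, but
 * leave the block marked used so nothing can be written to it while
 * the TRIM is pending.
 */
void
toy_unswapped(struct toy_delete *req, size_t blk)
{
	assert(blk < NBLOCKS && used[blk]);
	req->blk = blk;
	req->in_flight = true;
}

/* Completion step: only now does the block become allocatable again. */
void
toy_delete_done(struct toy_delete *req)
{
	assert(req->in_flight);
	req->in_flight = false;
	used[req->blk] = false;
}
```

In this model, a toy_alloc() call made between toy_unswapped() and
toy_delete_done() skips the block being trimmed, which is also why the
space becomes usable only some time after the page-in.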

> 
> Speaking of blocks... Does swap take into account that disks could
> have a sector size other than 512 bytes? I would guess so, since we
> could have a 4K disk as swap disk, and doing read-modify-write for swap
> is sure going to kill performance.
Swap performs I/O in chunks of at least page size, which is at least 4k on
all supported platforms (even on ARM, where we do not support smaller
pages AFAIK).
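
As a rough illustration of why page-sized I/O avoids read-modify-write,
here is a minimal C sketch.  The helper name io_is_sector_aligned() is
hypothetical, and the sketch assumes that offsets are measured from the
start of the device and that the swap partition itself starts on a sector
boundary; neither assumption is stated above.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if an I/O of `length` bytes at byte `offset` covers only
 * whole sectors, so the device never has to read-modify-write a sector.
 */
bool
io_is_sector_aligned(size_t offset, size_t length, size_t sector_size)
{
	if (sector_size == 0)
		return (false);
	return (offset % sector_size == 0 && length % sector_size == 0);
}
```

With 4096-byte pages, a page-aligned, page-sized transfer passes this
check for both 512-byte and 4096-byte sectors, while a 512-byte transfer
on a 4K-sector disk does not.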


More information about the freebsd-fs mailing list