Fwd: The Morning Paper: NOVA - A log-structured file system for hybrid volatile/non-volatile main memories
George Neville-Neil
gnn at neville-neil.com
Sat May 7 14:55:34 UTC 2016
It's time for the project to start thinking about these issues IMHO.
Best,
George
Forwarded message:
> From: The Morning Paper <the-morning-paper at onelanday.co.uk>
> To: gnn at neville-neil.com
> Subject: The Morning Paper: NOVA - A log-structured file system for
> hybrid volatile/non-volatile main memories
> Date: Fri, 6 May 2016 05:14:40 +0000
>
> The implications of combined DRAM and NVMM memory for file system
> design.
> This paper write-up is also available online at The Morning Paper.
> (http://blog.acolyer.org/2016/05/06/nova-a-log-structured-file-system-for-hybrid-volatilenon-volatile-main-memories)
>
>
> ** the morning paper
> ------------------------------------------------------------
>
>
> ** NOVA: A log-structured file system for hybrid volatile/non-volatile
> main memories
> ------------------------------------------------------------
>
> NOVA: A Log-structured file system for hybrid volatile/non-volatile
> main memories
> (http://cseweb.ucsd.edu/~swanson/papers/FAST2016NOVA.pdf) - Xu &
> Swanson 2016
>
> Another paper looking at the design implications of mixed DRAM and
> NVMM systems (it’s the future!), this time in the context of file
> systems. (NVMM = Non-volatile Main Memory).
>
> Hybrid DRAM/NVMM storage systems present a host of opportunities and
> challenges for system designers. These systems need to minimize
> software overhead if they are to fully exploit NVMM’s high
> performance and efficiently support more flexible access patterns, and
> at the same time they must provide the strong consistency guarantees
> that applications require and respect the limitations of emerging
> memories (e.g. limited program cycles).
>
> Why can’t we just take an existing file system and run it on top of
> a hybrid memory system? These file systems were built for the
> performance characteristics of disks (spinning or SSDs) - whereas NVMM
> and DRAM provide vastly improved performance. They were also built to
> rely on the consistency guarantees of disks (e.g. atomic sector
> updates), but memory provides different consistency guarantees from
> disks. One of the central issues here is the under-the-covers
> reordering of memory stores, and the need to explicitly flush data
> from CPU caches to compensate
> (https://blog.acolyer.org/2016/01/21/blurred-persistence/) . This can
> easily destroy any performance gains from NVMM if you’re not
> careful.
>
> To overcome all these limitations, we present the NOn-Volatile memory
> Accelerated (NOVA) log-structured file system. NOVA adapts
> conventional log-structured file system techniques to exploit the fast
> random access provided by hybrid memory systems. This allows NOVA to
> support massive concurrency, reduce log size, and minimize garbage
> collection costs while providing strong consistency guarantees for
> conventional file operations and mmap-based load/store accesses.
>
> All of this hard work pays off: “We find that NOVA is significantly
> faster than existing file systems in a wide range of applications and
> outperforms file systems that provide the same data consistency
> guarantees by between 3.1x and 13.5x in write-intensive workloads.”
>
> There is a lot of detailed information about NOVA’s implementation
> in the paper. Here I want to focus on the authors’ excellent
> discussion of what’s different about hybrid memory systems, and how
> they approached the high-level design of NOVA as a consequence.
>
>
> ** Challenges in designing for hybrid memory systems
> ------------------------------------------------------------
>
> Xu & Swanson outline three fundamental challenges when designing for
> hybrid memory systems:
> 1. Realising the performance potential of the hardware
> 2. Write reordering and its impact on consistency
> 3. Providing atomicity for operations
>
>
> ** Performance
> ------------------------------------------------------------
>
> The low latencies of NVMMs alter the trade-offs between hardware and
> software latency. In conventional storage systems, the latency of slow
> storage devices (e.g., disks) dominates access latency, so software
> efficiency is not critical. Previous work has shown that with fast
> NVMM, software costs can quickly dominate memory latency, squandering
> the performance that NVMMs could provide…
>
> It is possible to bypass the DRAM page cache and access NVMM directly
> using a technique called Direct Access (DAX), or eXecute In Place
> (XIP), avoiding extra copies between NVMM and DRAM in the storage
> stack.
>
> NOVA is a DAX file system, and we expect that all NVMM file systems
> will provide for these (or similar) features.
>
>
> ** Write re-ordering
> ------------------------------------------------------------
>
> Modern processors and their caching hierarchies may reorder store
> operations to improve performance. The CPU’s memory consistency
> protocol makes guarantees about the ordering of memory updates, but
> existing models (with the exception of research proposals [20, 46]) do
> not provide guarantees on when updates will reach NVMMs. As a result,
> a power failure may leave the data in an inconsistent state.
>
> It’s possible to explicitly flush caches and issue memory barriers
> to enforce write ordering. However, while an mfence will enforce order
> on memory operations before and after the barrier, it only guarantees
> all CPUs have the same view of the memory. It does not impose any
> constraints on the order of data writebacks to the NVMM.
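
The ordering problem above can be illustrated with a toy model. This is only a sketch, not real persistence code: `ToyMemory`, `unsafe_append` and `ordered_append` are hypothetical names, the `flush` method stands in for a cache-line flush plus fence, and the "unsafe" case models the worst case in which the tail's cache line happens to reach NVMM before the data does.

```python
class ToyMemory:
    """Toy model: stores land in a volatile cache; only flushed lines survive a crash."""

    def __init__(self):
        self.cache = {}   # volatile: address -> value, not yet written back
        self.nvmm = {}    # persistent: address -> value

    def store(self, addr, value):
        self.cache[addr] = value

    def flush(self, addr):
        # Stand-in for a cache-line flush followed by a fence (assumption).
        if addr in self.cache:
            self.nvmm[addr] = self.cache.pop(addr)

    def crash(self):
        # Power failure: everything still in the cache is lost.
        self.cache.clear()
        return dict(self.nvmm)


def unsafe_append(mem):
    mem.store("data", "hello")
    mem.store("tail", 1)
    mem.flush("tail")    # the tail reaches NVMM, the data never does
    return mem.crash()


def ordered_append(mem):
    mem.store("data", "hello")
    mem.flush("data")    # make the data durable first
    mem.store("tail", 1)
    mem.flush("tail")    # only then publish the tail
    return mem.crash()


unsafe = unsafe_append(ToyMemory())
ordered = ordered_append(ToyMemory())
print(unsafe)    # {'tail': 1}
print(ordered)   # {'data': 'hello', 'tail': 1}
```

In the unsafe case the surviving tail points at data that was never persisted, which is the inconsistent state the paper warns about.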
>
> Intel has proposed new instructions to fix these problems, which
> include clflushopt, clwb and pcommit. “NOVA is built with these
> instructions in mind…”
>
>
> ** Atomicity
> ------------------------------------------------------------
>
> Existing file systems use a variety of techniques like journaling,
> shadow paging, or log-structuring to provide atomicity guarantees.
>
> A journaling (WAL) system records all updates to a journal before
> applying them, and in the case of a power failure replays the journal
> to restore the system to a consistent state. Shadow paging is a
> copy-on-write mechanism in which a new copy of affected pages is
> written to storage on a write, before swapping out any references to
> the old pages for the new ones. Log-structured file systems (LFS)
> buffer random writes in memory and then convert them into larger
> sequential writes to the disk. This requires a steady supply of
> contiguous free regions of disk, which in turn entails frequent
> cleaning and compacting of the log to reclaim space.
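
As a rough illustration of the journaling approach described above, here is a minimal write-ahead sketch. It is a toy under stated assumptions: `apply_with_journal`, `recover` and the `crash_after` parameter are made up for the example, a Python list stands in for the on-disk journal, and a dict stands in for the disk.

```python
def apply_with_journal(disk, journal, updates, crash_after=None):
    """Write-ahead: log every update plus a commit marker, then apply in place.

    crash_after limits how many in-place writes happen before a simulated crash.
    """
    for key, value in updates.items():
        journal.append(("update", key, value))
    journal.append(("commit",))
    for i, (key, value) in enumerate(updates.items()):
        if crash_after is not None and i >= crash_after:
            return False          # simulated power failure mid-apply
        disk[key] = value
    journal.clear()               # checkpoint: the journal is no longer needed
    return True


def recover(disk, journal):
    """Replay the journal only if it ends with a commit marker."""
    if journal and journal[-1] == ("commit",):
        for record in journal:
            if record[0] == "update":
                _, key, value = record
                disk[key] = value
    journal.clear()


disk = {"a": 0, "b": 0}
journal = []
apply_with_journal(disk, journal, {"a": 1, "b": 2}, crash_after=1)
torn = dict(disk)       # half-updated state left by the crash
recover(disk, journal)
print(torn, disk)       # {'a': 1, 'b': 0} {'a': 1, 'b': 2}
```

Note that every update is written twice, once to the journal and once in place. That is the duplicate-write overhead the NOVA authors later point to.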
>
> RAMCloud (https://blog.acolyer.org/2016/01/18/ramcloud/) is an example
> of a DRAM-based storage system that keeps all its data in DRAM to
> service reads, and keeps a persistent version on disk. It uses log
> structure for both DRAM and disk.
>
>
> ** NOVA design principles
> ------------------------------------------------------------
>
> NOVA is a log-structured, POSIX file system that builds on the
> strengths of LFS and adapts them to take advantage of hybrid memory
> systems. Because it targets a different storage technology, NOVA looks
> very different from conventional log-structured file systems that are
> built to maximize disk bandwidth.
>
> Three observations influenced the design:
> 1. Logs that support atomic updates are easy to implement in NVMM, but
> are not efficient for search operations (e.g. directory lookup and
> file random access). Data structures that support fast search (e.g.
> trees) are more difficult to implement correctly and efficiently in
> NVMM.
> 2. The complexity of log cleaning in LFS comes from the need for
> contiguous free regions of storage. In NVMM however, random access is
> cheap and therefore we don’t need to write in contiguous regions and
> hence don’t need such complex cleaning protocols.
> 3. NVMMs support fast, highly concurrent random accesses, and
> therefore using multiple logs does not negatively impact performance.
>
> Based on this, NOVA:
> * Keeps logs in NVMM, and indexes (radix trees) in DRAM.
> * Gives each inode its own log, which allows concurrent updates across
> files without synchronization. During recovery, NOVA can replay
> multiple logs simultaneously.
> * Uses logging and lightweight journaling for complex atomic updates.
> NOVA’s log-structure provides cheaper atomic updates than journaling
> or shadow paging. “To atomically write to a log, NOVA first appends
> data to the log, and then atomically updates the log tail to commit
> the updates, thus avoiding both the duplicate writes overhead of
> journaling file systems and the cascading update costs of shadow
> paging systems.”
> * Implements the log as a singly linked list! The locality benefits of
> sequential logs are less important in NVMM, so NOVA uses a linked list
> of 4KB NVMM pages.
>
> Allowing for non-sequential log storage provides three advantages.
> First, allocating log space is easy since NOVA does not need to
> allocate large, contiguous regions for the log. Second, NOVA can
> perform log cleaning at fine-grained page-size granularity. Third,
> reclaiming log pages that contain only stale entries requires just a
> few pointer assignments.
>
> * Finally, NOVA does not log file data. NOVA uses copy-on-write for
> modified pages, and appends metadata about the write to the log.
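
The design points in the list above can be put together in a small sketch. Everything here is an assumption made for illustration, not NOVA's actual code or layout: `ToyNova`, `Inode`, `LogPage` and `PAGE_ENTRIES` are hypothetical names, a Python dict stands in for the DRAM radix tree, and a single tuple assignment stands in for the 64-bit atomic tail update.

```python
PAGE_ENTRIES = 2   # tiny log pages so the example spans several (NOVA uses 4KB pages)


class LogPage:
    def __init__(self):
        self.entries = []
        self.next = None             # singly linked list of log pages


class Inode:
    def __init__(self):
        self.head = LogPage()
        self.tail = (self.head, 0)   # committed tail: (page, entries used in that page)
        self.index = {}              # DRAM index: file page offset -> data page id


class ToyNova:
    def __init__(self):
        self.data_pages = {}         # stand-in for NVMM data pages, keyed by id
        self.next_id = 0

    def write(self, inode, pgoff, payload, commit=True):
        # 1. Copy-on-write: new data always goes to a fresh page.
        page_id = self.next_id
        self.next_id += 1
        self.data_pages[page_id] = payload
        # 2. Append a metadata entry just past the committed tail.
        page, count = inode.tail
        if count == PAGE_ENTRIES:
            page.next = LogPage()    # log grows by linking in another page
            page, count = page.next, 0
        del page.entries[count:]     # overwrite any uncommitted leftovers
        page.entries.append((pgoff, page_id))
        if not commit:
            return                   # simulated crash before the tail update
        # 3. Commit with one tail assignment, then update the DRAM index.
        inode.tail = (page, count + 1)
        old = inode.index.get(pgoff)
        inode.index[pgoff] = page_id
        if old is not None:
            del self.data_pages[old]   # the stale data page can be recycled

    def rebuild_index(self, inode):
        # Recovery: walk the linked list, but only up to the committed tail.
        index = {}
        page = inode.head
        tail_page, tail_count = inode.tail
        while page is not None:
            limit = tail_count if page is tail_page else len(page.entries)
            for pgoff, page_id in page.entries[:limit]:
                index[pgoff] = page_id
            if page is tail_page:
                break
            page = page.next
        return index


fs = ToyNova()
f = Inode()
fs.write(f, 0, b"v1")
fs.write(f, 1, b"x")
fs.write(f, 0, b"v2")                   # overwrite; spills into a second log page
fs.write(f, 2, b"lost", commit=False)   # entry appended, tail never updated
recovered = fs.rebuild_index(f)
print(recovered == f.index, fs.data_pages[recovered[0]])   # True b'v2'
```

The uncommitted write leaves an entry in the log, but because the tail never moved, replay ignores it and the rebuilt index matches the one that was in DRAM before the crash.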
>
> The high-level layout of the NOVA data structures looks like this:
> [figure not included in this plain-text copy]
>
> NOVA’s atomicity comes from a combination of:
> * 64-bit atomic updates - NOVA exploits processor support for 64-bit
> atomic writes to memory to directly modify metadata for some
> operations (e.g. a file’s atime for reads), and to commit updates to
> the log by updating the inode’s log tail pointer.
> * Logging in the inode’s log to record operations that modify a
> single inode.
> * Lightweight journaling for directory operations that require changes
> to multiple inodes.
> * Enforced write ordering by: (1) committing data and log entries to
> NVMM before updating the log tail; (2) committing journal data to NVMM
> before propagating updates; and (3) committing new versions of data
> pages to NVMM before recycling stale ones. If NOVA is running on a
> system that supports the new clflushopt, clwb, and pcommit
> instructions it will use these to enforce the write ordering,
> otherwise it uses movntq, “a non-temporal move instruction that
> bypasses the CPU cache hierarchy to perform direct writes to NVMM,”
> and a combination of clflush and mfence.
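
To make the lightweight journaling item above more concrete, here is a sketch of a rename that touches two directory inodes. The summary does not spell out the journal format, so this assumes an undo-style journal that records the old log tails and rolls them back on recovery. `ToyInode`, `rename`, `recover`, `listing` and the `crash_at` parameter are all hypothetical.

```python
class ToyInode:
    def __init__(self):
        self.log = []    # appended entries (some may lie past the tail)
        self.tail = 0    # number of committed entries


def rename(journal, src_dir, dst_dir, fname, crash_at=None):
    """Move fname between two directories; crash_at stops after that step."""
    # Step 1: append entries to both logs; they are invisible until the tails move.
    del src_dir.log[src_dir.tail:]
    del dst_dir.log[dst_dir.tail:]
    src_dir.log.append(("remove", fname))
    dst_dir.log.append(("add", fname))
    if crash_at == 1:
        return
    # Step 2: journal the old tails so the update can be undone.
    journal.extend([(src_dir, src_dir.tail), (dst_dir, dst_dir.tail)])
    # Step 3: propagate the new tails one at a time.
    src_dir.tail = len(src_dir.log)
    if crash_at == 3:
        return                   # only one of the two tails was updated
    dst_dir.tail = len(dst_dir.log)
    # Step 4: commit by emptying the journal.
    journal.clear()


def recover(journal):
    # A non-empty journal means an unfinished transaction: roll the tails back.
    for inode, old_tail in journal:
        inode.tail = old_tail
    journal.clear()


def listing(directory):
    names = set()
    for op, fname in directory.log[:directory.tail]:
        if op == "add":
            names.add(fname)
        else:
            names.discard(fname)
    return names


a, b, journal = ToyInode(), ToyInode(), []
a.log.append(("add", "f"))
a.tail = 1
rename(journal, a, b, "f", crash_at=3)
torn = (listing(a), listing(b))             # file visible in neither directory
recover(journal)
after_recovery = (listing(a), listing(b))   # back in the source directory
rename(journal, a, b, "f")
final = (listing(a), listing(b))
print(torn, after_recovery, final)
```

The journal here only ever holds two small tail values rather than the directory entries themselves, which is one plausible reading of why the authors call it lightweight.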
>
>
> ** Evaluation
> ------------------------------------------------------------
>
> Figure 6 shows how NOVA file system operation latency compares to
> other file systems across different NVMM configurations.
>
> Note that NOVA is more sensitive to NVMM performance than the other
> file systems because NOVA’s software overheads are lower, and so
> overall performance more directly reflects the underlying memory
> performance.
>
> Figure 7 shows how NOVA compares to other file systems across four
> Filebench workloads: a file server, web proxy, web server, and varmail
> (emulates an email server).
>
> Overall, NOVA achieves the best performance in almost all cases, and
> provides data consistency guarantees that are as strong or stronger
> than the other file systems. The performance advantages of NOVA are
> largest on write-intensive workloads with large numbers of files.
>
> This email was brought to you by #themorningpaper
> (http://blog.acolyer.org) : an interesting/influential/important paper
> from the world of CS every weekday morning, as selected by Adrian
> Colyer
>
> ============================================================
>
> Copyright © 2016 One L and a Y Ltd, All rights reserved.
More information about the freebsd-fs
mailing list