The 2.6.30 kernel is chock full of next-gen file systems. One such example is NILFS, a new log-structured file system that dramatically improves write performance.
It’s difficult to write storage articles at this time and not focus on the upcoming 2.6.30 kernel. Why? This kernel is loaded with new file systems, some of which we’ve already covered, like ext4 and btrfs. Another hot new file system in 2.6.30 is NILFS, and it is definitely one that you should be testing.
NILFS2 (New Implementation of a Log-Structured File System Version 2) is a very promising new log-structured file system that provides continuous snapshots and versioning of the entire file system. This means that you can recover files that were deleted or unintentionally modified, and you can perform backups at any time from a snapshot without the performance penalty normally associated with creating snapshots. In addition, there is evidence that NILFS performs extremely well on SSDs.
Log-Structured File System?
Log-structured file systems are a bit different from other file systems, with both good points and bad points. Rather than writing to a tree structure such as a b-tree or an h-tree, either with or without a journal, a log-structured file system writes all data and metadata sequentially in a continuous stream that is called a log (actually, it is a circular log).
The concept was developed by John Ousterhout of Tcl fame and Fred Douglis. The motivation behind log-structured file systems is that typical file systems lay out data based on spatial locality for rotating media (hard drives). But rotating media tend to have slow seek times, which limits write performance. In addition, it was presumed that most IO would become write dominated (this observation is supported by a study that was summarized in a recent article). So a log-structured file system takes a different approach: it treats the file system as a circular log and writes sequentially to the “head” of the log (the point where new data is appended), never overwriting existing entries. This means that seeks are kept to a minimum because everything is sequential, improving write performance.
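To make the "always append, never overwrite" idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not how NILFS is implemented; the class name `AppendOnlyLog` and its methods are invented for illustration.

```python
class AppendOnlyLog:
    """Toy log-structured store: every write is appended, nothing is overwritten."""

    def __init__(self):
        self.log = []    # sequential records; the list index stands in for a disk position
        self.index = {}  # key -> position of the newest record for that key

    def write(self, key, value):
        # New data always goes to the head (end) of the log.
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1
        return self.index[key]

    def read(self, key):
        # Reads follow the index to the most recent version.
        return self.log[self.index[key]][1]


store = AppendOnlyLog()
store.write("a.txt", "v1")
store.write("b.txt", "hello")
store.write("a.txt", "v2")   # an update appends a new record; the old one stays in the log

print(store.read("a.txt"))   # v2
print(len(store.log))        # 3
```

Note that the old version of `a.txt` is still sitting in the log after the update. That leftover history is what makes snapshots cheap, and it is also why a cleaner is eventually needed.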
A log-structured file system, because of its design, makes it very easy to create snapshots (in NILFS they are called checkpoints) of both the data and metadata. NILFS can then mount these checkpoints alongside the primary NILFS file system. From a checkpoint, you can recover erased files (if the checkpoint was taken before the file was erased), or you can use it for backups or even disaster recovery images.
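Why checkpoints are nearly free in such a design can be sketched in a few lines of Python. In this toy model (all function names are hypothetical, and this is not the NILFS on-disk format), a checkpoint is nothing more than a remembered position in the log, and "mounting" it means replaying the log up to that position.

```python
log = []          # append-only list of (path, content); content None marks a deletion
checkpoints = []  # each checkpoint is simply the log length at that moment


def write(path, content):
    log.append((path, content))


def delete(path):
    log.append((path, None))


def checkpoint():
    checkpoints.append(len(log))
    return len(checkpoints) - 1


def view(cp_number):
    """Rebuild the file table as it looked at the given checkpoint."""
    files = {}
    for path, content in log[:checkpoints[cp_number]]:
        if content is None:
            files.pop(path, None)
        else:
            files[path] = content
    return files


write("report.txt", "draft 1")
cp0 = checkpoint()
delete("report.txt")        # oops
cp1 = checkpoint()

print(view(cp1))  # {}
print(view(cp0))  # {'report.txt': 'draft 1'}
```

The deleted file is recoverable from the older checkpoint because the deletion was appended as a new record rather than erasing the original data.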
Another benefit of log-structured file systems is that recovering from a crash is easier than with the more typical tree-based file systems (e.g. ext2, ext3, etc.). When a log-structured file system is remounted after a crash, it can reconstruct its state from the last consistent point in the log. It starts at the head of the circular log and backs up until the file system is consistent. This point should be very close to the head, so little if any data or metadata will be lost. This process is extremely fast regardless of the size of the file system.
This bears repeating: a log-structured file system recovers from a crash extremely fast, and the amount of time is independent of the size of the file system. In contrast, other file systems have to replay their journal and possibly even walk their data structures to make sure the file system is consistent (i.e. run “fsck”). Everyone who has run fsck on a very large file system knows how much time it can take.
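The "back up from the head until things are consistent" step can be sketched as follows. This is a simplified illustration under stated assumptions: records carry a trailing CRC32 computed with Python's `zlib.crc32`, checkpoints are ordinary records of kind `"checkpoint"`, and the record layout is invented for the example rather than taken from NILFS.

```python
import json
import zlib


def make_record(kind, payload):
    """Serialize a record and append a 4-byte CRC32 of its body."""
    body = json.dumps({"kind": kind, "payload": payload}).encode()
    return body + zlib.crc32(body).to_bytes(4, "big")


def is_valid(record):
    """A record is valid if its stored CRC32 matches its body."""
    if len(record) < 4:
        return False
    body, stored = record[:-4], record[-4:]
    return zlib.crc32(body).to_bytes(4, "big") == stored


def recover(log):
    """Walk back from the head to the newest valid checkpoint record.

    Returns its position, or -1 if none exists. The work depends on how
    much was written since the last checkpoint, not on the log size.
    """
    for pos in range(len(log) - 1, -1, -1):
        rec = log[pos]
        if is_valid(rec) and json.loads(rec[:-4])["kind"] == "checkpoint":
            return pos
    return -1


# Simulate a crash: the last record was only partly written (one byte damaged).
torn = bytearray(make_record("data", "c"))
torn[0] ^= 0xFF

log = [
    make_record("data", "a"),
    make_record("checkpoint", 1),
    make_record("data", "b"),
    make_record("checkpoint", 2),
    bytes(torn),
]

print(recover(log))  # 3
```

Everything after position 3 is discarded on remount, which is the small amount of recent data the article mentions may be lost.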
One problematic aspect of log-structured file systems is that they need a fairly sophisticated “garbage collection” capability to reclaim free space. Free space needs to be reclaimed from the tail of the log, primarily from the old checkpoints, so that the file system doesn’t become full when the head of the log wraps around to the tail. There are many techniques for reclaiming space; one is covered in the Wikipedia article about log-structured file systems. Without this garbage collection of old checkpoints (snapshots), the file system would fill far too quickly.
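One common cleaning technique is to take the oldest segment at the tail, copy whatever is still live to the head, and free the segment. The sketch below shows that idea in Python. It is an illustration only: the class `SegmentedLog`, the segment size, and the liveness rule are assumptions made for the example, and it ignores checkpoints that a real cleaner would have to preserve.

```python
from collections import deque

SEGMENT_SIZE = 4  # records per segment (tiny, for illustration)


class SegmentedLog:
    def __init__(self):
        self.segments = deque([[]])  # left = tail (oldest), right = head (newest)
        self.latest = {}             # key -> sequence number of its newest record
        self.next_seq = 0

    def write(self, key, value):
        if len(self.segments[-1]) == SEGMENT_SIZE:
            self.segments.append([])  # head segment is full, start a new one
        self.segments[-1].append((self.next_seq, key, value))
        self.latest[key] = self.next_seq
        self.next_seq += 1

    def clean_tail(self):
        """Reclaim the oldest segment; return how many dead records were dropped."""
        if len(self.segments) < 2:
            return 0
        tail = self.segments.popleft()
        # A record is live only if it is still the newest version of its key.
        live = [(k, v) for seq, k, v in tail if self.latest[k] == seq]
        for k, v in live:
            self.write(k, v)  # live data is re-appended at the head
        return len(tail) - len(live)


fs = SegmentedLog()
fs.write("a", "v1")
fs.write("b", "v1")
fs.write("a", "v2")
fs.write("a", "v3")   # first segment is now full
fs.write("c", "v1")   # goes into a second segment

reclaimed = fs.clean_tail()
print(reclaimed)          # 2
print(len(fs.segments))   # 1
```

The two superseded versions of `a` are dropped, while `b` and the newest `a` are carried forward. The cost of this copying is the price a log-structured design pays for its fast sequential writes.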
A Log-Structured File System for Linux – NILFS
The Nippon Telegraph and Telephone (NTT) CyberSpace Laboratories has been developing NILFS (also referred to as NILFS2 since it is version 2 of the file system) for Linux. It is released under the GPL 2.0 license and is included in the 2.6.30 kernel. It spent a great deal of time in the -mm kernels and underwent much testing since its initial announcement.
One of the most noticeable features of NILFS is that it can “continuously and automatically save instantaneous states of the file system without interrupting service”. NILFS refers to these as checkpoints. In contrast, other file systems that offer snapshots, such as ZFS, typically create them only when a snapshot is explicitly requested. NILFS doesn’t need a separate snapshot operation because the snapshots (checkpoints) are part of the file system design itself.
One of the really cool features of NILFS is that these checkpoints can actually be mounted alongside the primary file system. This has many uses, one of which is to mount a checkpoint to recover files that were unintentionally erased.
In addition to being able to recover recently erased files and extremely fast crash recovery times, there are a number of other features of NILFS that are very attractive:
- The file size and inode numbers are stored as 64-bit fields
- File sizes of up to 8 EiB (exbibytes; 1 EiB is 2^60 bytes, slightly more than an exabyte)
- Block sizes that are smaller than the page size (e.g. 1KB or 2KB). This can potentially make NILFS much faster for small files than other file systems.
- File and inode blocks are managed with a B-tree (the use of B-trees in a log-structured file system stems from the implementation, which uses something called segments)
- NILFS uses 32-bit checksums (CRC32) on data and metadata for integrity assurance
- Correctly ordered data and metadata writes
- Redundant superblock
- Read-ahead for metadata files as well as data files (helps read performance)
- Continuous checkpointing, which can be used for snapshots. These can be used for backups or even for recovering files.
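Two of the numbers in the list above are easy to check with a little arithmetic. The short Python sketch below assumes that file sizes are treated as signed 64-bit values (which is consistent with the 8 EiB limit) and assumes a 4096-byte page size for comparison; both are assumptions made for illustration.

```python
EIB = 2 ** 60   # one exbibyte in bytes
EB = 10 ** 18   # one exabyte in bytes

# Assumption: a signed 64-bit size leaves 63 bits for the magnitude.
max_size = 2 ** 63
print(max_size // EIB)           # 8
print(round(max_size / EB, 2))   # 9.22


def allocated(size, block_size):
    """Bytes occupied on disk when a file is rounded up to whole blocks."""
    return -(-size // block_size) * block_size


# A 300-byte file with a 1 KB block versus an assumed 4 KB page-sized block.
print(allocated(300, 1024))  # 1024
print(allocated(300, 4096))  # 4096
```

The second half shows why sub-page block sizes can help with small files: less space is written and wasted per file.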