The last article covered the anatomy of SSDs and the origins of some of their characteristics. In this article, we look at tuning the storage stack and file systems for SSDs, with an eye toward improving performance and overcoming some of the platform's limitations.
In our last article, we did a deep dive into the anatomy of SSDs, starting with the basics of NAND Flash cells, which are floating-gate transistors. As discussed, floating-gate transistors have a few notable characteristics and limitations:
- Very fast read performance
- Asymmetric read/write performance (reads are 2-3 orders of magnitude faster than writes)
- Data retention is limited, both by charge leakage and by wear from exercising the cells (i.e., using the erase/program cycles)
- Shrinking the dies to increase density increases the probability of data corruption from erase/program operations disturbing neighboring cells
- NAND Flash cells have a limited number of erase/program cycles before they can no longer retain data
- NAND Flash can be read a byte at a time and read or written a page at a time, but erasing even a single cell requires erasing the entire block that contains it
- Seek times for NAND Flash chips are much lower than for hard drives
Floating-gate transistors are grouped into pages, pages are combined into blocks, blocks into planes, planes into chips, and chips are used to build SSD drives. The benefits of SSDs themselves have been discussed fairly widely:
- The seek times are extremely small because there are no mechanical parts. This gives SSD drives amazing IOPS performance.
- The performance is asymmetric in reads and writes: reads are amazingly fast, and writes, while slower, still deliver very good performance.
- While not covered in that article, the lack of moving parts also means there is no danger of a drive head striking the platters and causing loss of data.
The article focused a bit more on the limitations of SSDs, which stem both from the floating-gate transistors and from the design of the NAND Flash arrays. These limitations are:
- Read/write performance is asymmetric: reads are amazingly fast, and writes, while slower, still deliver very good performance (this is both a feature and a limitation of SSDs).
- Floating-gate transistors, and consequently SSDs, have a limited number of erase/program cycles, after which they can no longer reliably store data. SLC cells have a limit of about 100,000 cycles, while MLC cells are limited to roughly 5,000-10,000 cycles.
- Because of how the NAND Flash chips are constructed, data can be written in page units (4 KB) but erased only in block units (512 KB).
As pointed out, some of these limitations give rise to problems in SSDs, though SSD manufacturers are moving to address these problems, as discussed in the article.
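To put a rough number on that last limitation, here is a minimal back-of-the-envelope sketch in Python. It assumes the 512 KB block and 4 KB page sizes quoted above (real drives vary) and a drive that naively rewrites a whole block in place whenever a single page changes; real controllers use remapping to hide much of this cost, but the underlying asymmetry is the same.

```python
# Rough model of the erase-block/page-write asymmetry described above.
# The 512 KB block and 4 KB page sizes are the figures quoted in the text;
# real drives differ, so treat this as an illustration, not a spec.

BLOCK_SIZE = 512 * 1024                      # erase unit
PAGE_SIZE = 4 * 1024                         # program (write) unit
PAGES_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE    # 128 pages per block

def naive_update_cost(pages_changed):
    """Cost of updating a few pages in place, with no remapping tricks.

    The whole block has to be read out, erased, and reprogrammed, even if
    only a single 4 KB page actually changed.
    """
    return {
        "data_changed_kb": pages_changed * PAGE_SIZE // 1024,
        "data_rewritten_kb": BLOCK_SIZE // 1024,
        "erase_cycles_consumed": 1,
        "write_amplification": PAGES_PER_BLOCK / pages_changed,
    }

if __name__ == "__main__":
    # Updating one 4 KB page rewrites 512 KB and consumes one of the
    # block's limited erase/program cycles.
    print(naive_update_cost(pages_changed=1))
```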
So we, the consumers and users of SSDs, have to live with both the benefits and the limitations of these drives. However, as you likely realize, getting data from the application to the actual storage medium requires traversing several layers of software, and possibly hardware. This article looks at options for configuring or tuning those software layers to improve the performance of SSDs.
IO Schedulers
IO schedulers in the kernel have been discussed here in a previous article. They have been developed over the years to better handle varied IO loads from multiple applications and multiple users, as well as to preserve interactivity. Other considerations go into the design of IO schedulers, too; for example, reads can be especially important because an application will stall (also called “waiting” or “blocking”) until a read is satisfied, since it may need the data to proceed. In addition, some of the available IO schedulers have a concept called an elevator. An elevator simply orders requests by their physical location on the disk so that seeks proceed in one direction as much as possible, reducing the impact of disk seeks on performance.
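To make the elevator idea concrete, here is a minimal Python sketch; it is not kernel code, just an illustration of a one-directional (C-SCAN-style) sweep: pending requests are served in order of their position on the disk, starting from the current head position, so the head rarely has to reverse direction.

```python
# Illustrative elevator: serve pending requests in sector order, sweeping
# upward from the current head position and then wrapping around, so that
# seeks stay in one direction as much as possible. Not kernel code.

def elevator_order(pending_sectors, head_position):
    """Return the service order for a one-directional sweep."""
    ahead = sorted(s for s in pending_sectors if s >= head_position)
    behind = sorted(s for s in pending_sectors if s < head_position)
    return ahead + behind   # finish the sweep, then start over from the bottom

if __name__ == "__main__":
    requests = [9500, 120, 4800, 70000, 3100]
    print(elevator_order(requests, head_position=4000))
    # -> [4800, 9500, 70000, 120, 3100]
```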
If you recall, there are currently four IO schedulers in the Linux kernel.
- NOOP
- Anticipatory
- Deadline
- Completely Fair Queuing (CFQ)
CFQ is the default scheduler in the kernel at this time. It is a fairly complicated scheduler, but it achieves very reasonable performance for many workloads. It creates an IO queue for each process performing IO, but also adds the concept of deadlines so IO requests won’t be delayed too long. If you recall, it is possible for some IO operations, particularly reads that are far out on the disk, to get stranded through a combination of the IO scheduler and the elevator; they may not be fulfilled for a long time or, in a pathological case, not at all. The CFQ scheduler also builds on the elevator concept, with an “elevator linus” to handle IO requests that might otherwise be stranded.
Recall that SSDs don’t really have any seek time because there are no mechanical parts; the “seek” is very, very short because it is all done electrically. So with SSDs it may be possible to remove or change the aspects of a scheduler that attempt to minimize seek time. Consequently, using an IO scheduler that doesn’t try to minimize, control, or even consider seek time could help IO performance with SSDs. Perhaps switching to the Deadline or NOOP scheduler could produce better performance on SSDs. Going further, removing the elevator from the scheduler could help as well. However, we know intuitively that which IO scheduler is best also depends upon the workload(s).
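On a running system, the active scheduler can be inspected and switched per device through sysfs. The short Python sketch below shows the idea; the device name sda is only an example, and writing to sysfs requires root.

```python
# Inspect and switch the IO scheduler for a block device via sysfs.
# "sda" is only an example device name; writing requires root privileges.

def current_schedulers(device="sda"):
    """Return (available schedulers, currently active scheduler)."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        entries = f.read().split()              # e.g. "noop deadline [cfq]"
    active = next(e.strip("[]") for e in entries if e.startswith("["))
    return [e.strip("[]") for e in entries], active

def set_scheduler(device="sda", scheduler="noop"):
    """Activate a scheduler for the device (takes effect immediately)."""
    with open(f"/sys/block/{device}/queue/scheduler", "w") as f:
        f.write(scheduler)

if __name__ == "__main__":
    available, active = current_schedulers("sda")
    print(f"available: {available}, active: {active}")
    # set_scheduler("sda", "noop")   # uncomment to switch (needs root)
```

Note that the change applies only to that device and does not survive a reboot; a system-wide default can be set at boot time with the elevator= kernel parameter.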
There are some simple examples of changing schedulers for SSDs mentioned on blogs and other web sites (an easy Google search will turn up lots of questions and some suggestions). Perhaps the best information is a pair of blog posts from one author who tried different IO schedulers on an SSD in an effort to improve performance. The following are the details of the hardware and software used in the testing:
- Lenovo x301 laptop
- Lenovo SSD, PN 41W0518 (an OEM version of a Samsung SSD): 1.8 inch, MLC, SATA interface (3 Gbps), with approximate read performance of 90 MB/s and write performance of 70 MB/s
- 2.6.28.2 kernel
The test involved a simple job of uncompressing the kernel source and building it. The first blog post compared the time this took using the CFQ scheduler versus the NOOP scheduler. Even though we know from good benchmarking practice that the test should have been run several times to get an estimate of the variation, it was run only once. Nevertheless, the author found that the NOOP scheduler gave a 13% performance improvement over the CFQ scheduler.
In a second blog post, the same author changed the test to use "make -j 3" in an effort to keep both cores of his CPU busy. This time the CFQ scheduler was better than the NOOP scheduler by about 23%.
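Reproducing this kind of comparison on your own hardware is straightforward. The sketch below is one way to do it; the device name and the kernel-build command are placeholders for whatever workload you care about, and, unlike the single-run tests above, it repeats each case a few times so the run-to-run variation is visible.

```python
# A minimal timing harness for comparing IO schedulers on one workload.
# DEVICE and WORKLOAD are placeholders; adjust for your system. Run as
# root (switching schedulers and dropping caches both require it), and
# reset the workload's starting state (e.g., "make clean") between runs.
import subprocess
import time

DEVICE = "sda"                          # example device name
WORKLOAD = "make -j 3 -C ~/linux"       # placeholder workload

def run_once(scheduler):
    """Activate a scheduler, drop the page cache, and time one run."""
    with open(f"/sys/block/{DEVICE}/queue/scheduler", "w") as f:
        f.write(scheduler)
    subprocess.run("sync", shell=True, check=True)
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")                  # drop page cache, dentries, inodes
    start = time.time()
    subprocess.run(WORKLOAD, shell=True, check=True)
    return time.time() - start

if __name__ == "__main__":
    for sched in ("noop", "deadline", "cfq"):
        # Repeat each case so the variation between runs is visible.
        times = [run_once(sched) for _ in range(3)]
        print(sched, [round(t, 1) for t in times])
```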
Another study that is a bit more complete (i.e., the tests were run several times to capture the variation in results) comes from researchers at Texas A&M. They found that the choice of IO scheduler can have an impact on SSD performance that is roughly the opposite of what happens with traditional hard drives. In particular, using Postmark as the benchmark, they found that for a hard drive the NOOP scheduler gave the worst performance and the CFQ scheduler the best, whereas for SSDs the opposite was true: NOOP had the best performance and CFQ the worst.
Additionally, the paper did some extra testing of the three SSDs and found that sequential and random read performance is basically the same for SSDs. However, random IO performance drops off drastically as the record size is decreased. More testing showed that performance was limited by the block size of the SSD (recall that the chips are constructed from pages, blocks, and planes).
The paper goes on to develop two modified IO schedulers, based on the Anticipatory and CFQ schedulers, that take into account the block size of the drive and whether neighboring writes fall within the same block. Further testing reveals that different workloads work best with different IO schedulers, including the two modified ones.
Changing the IO scheduler is a fairly simple task. The previous examples point out that some workloads can benefit from a change of IO scheduler on SSDs, but they also show that changing the IO scheduler may provide no benefit, and may even hurt performance, depending upon your workload. So be sure to test with your workload before drawing any conclusions.
File System Tuning