Author Archive

Bcache Testing: Throughput

Get your wetsuit on, we’re going data diving. Throughput benchmarks using IOzone on a common SATA disk, an Intel X25-E SSD, and Bcache, using the SSD to cache a single drive.
Hard Drive Caching with SSDs

Caching is a concept used throughout computing. CPUs have several levels of cache; disk drives have cache; and the list goes on. Adding a small amount of high-speed data storage relative to a large amount of slower-speed storage can make huge improvements to performance. Enter two new kernel patches, bcache and flashcache, that leverage the power of SSDs.
Cool User File Systems: GlusterFS

One of the coolest file systems in user space has got to be GlusterFS. It has a unique architecture that allows it to be configured for specific storage requirements and scenarios. It can be used as a high-performance parallel file system, a cloud-based file system, or even a simple NFS server. All of this in user space. Could GlusterFS represent the future of file system development for Linux?
Cool User File Systems: ArchiveMount

Have you ever wanted to look inside a tar.gz file but without expanding it? Have you ever wanted to just dump files in a .tar.gz file without having to organize it and periodically tar and gzip this data? This article presents another REALLY useful user-space file system, archivemount. It allows you to mount archives such as .tar.gz files as a file system and interact with it using normal file/directory tools.
Cool User File Systems, Part 1: SSHFS

Userspace file systems are one of the coolest storage options in Linux. They allow really creative file systems to be developed without having to go through the kernel gauntlet. This article presents one of them, SSHFS, which allows you to remotely mount a file system using ssh (sftp).
Storage Management with an LVM GUI

Have you been looking for open-source storage management tools that are easy to use and provide a graphical representation of your storage? Alas, there are no comprehensive tools, but there are graphical tools that you can pair with command-line wizardry, particularly LVM.
OCFS2: Unappreciated Linux File System

It’s common knowledge that Linux has a fair number of file systems. Some of these are unappreciated and can be very useful outside their “comfort zone”. OCFS2 is a clustered file system initially contributed by Oracle and can be a great back-end file system for general, shared storage needs.
User Space File Systems

Having file systems in the kernel has its pros and cons. Being able to write file systems in user-space also has some pros and cons, but FUSE (File System in Userspace) allows you to create some pretty amazing results. This article takes a very brief look at user-space file systems and FUSE.
Creating a NAS Box Using OpenFiler

In a recent walkthrough we outlined the steps for taking an existing server and converting it into a NAS box. That article assumed that you had already installed Linux on the server and that you would maintain that installation (i.e. updates, security, etc.). This article examines an alternative: a dedicated NAS distribution called OpenFiler that allows you to very simply create a stand-alone NAS box that can be administered over the web.
2.6.34 is Out; Let’s Review

If you blinked you might have missed the announcement of the new 2.6.34 kernel. Things have been happening very quickly around file systems and storage in the recent kernels so it’s probably a good idea to review the kernels from 2.6.30 to 2.6.34 and see what developments have transpired.
Creating a NAS Box with an Existing System

Standalone Network Attached Storage (NAS) servers provide file level storage to heterogeneous clients, enabling shared storage. This article presents the basics of NAS units (NFS servers) and how you can create one from an existing system.
Saving Your Data Bacon with Write Barriers and Journal Check Summing

Mmmm…. bacon. This article examines two mechanisms to prevent data loss — write barriers and check summing. Both can be particularly important for drives with larger and larger caches. Pay attention: This can save your data bacon.
Smartmontools: Ya Mon!

In the last article we introduced the SMART capabilities of hard drives (who knew your drives were SMART?). This article examines smartmontools, an application for examining SMART attributes and triggering self-tests.
Introduction to SMART

Did you know your drive was SMART? Actually: Self-Monitoring, Analysis, and Reporting Technology. It can be used to gather information about your hard drives and offers some additional information about the status of your storage devices. It can also be used with other tools to help predict drive failure.
Storage Technology for the Home User

Sometimes you just have to get excited about what you can buy, hold in your hand, and use in your home machines. Let’s look at some cool storage technology that the average desktop user can tackle.
Ceph: The Distributed File System Creature from the Object Lagoon

Did you ever see one of those terrible Sci-Fi movies involving a killer Octopus? Ceph, while named after just such an animal, is not a creature about to eat an unlucky Spring Breaker, but a new parallel distributed file system. The client portion of Ceph just went into the 2.6.34 kernel so let’s learn a bit more about it.
Harping on Metadata Performance: New Benchmarks

Metadata performance is perhaps the most neglected facet of storage performance. In previous articles we’ve looked into how best to improve metadata performance without too much luck. Could that be a function of the benchmark? Hmmm…
IO Profiling of Applications: strace_analyzer

In the last couple of articles we have talked about using strace to help examine the IO profile of applications (including MPI applications; think HPC). But strace output can contain hundreds of thousands of lines. In this article we talk about using a tool called strace_analyzer to help sift through the strace output.
Intro to IO Profiling of Applications

One of the sorely missing aspects of storage is analyzing and understanding the IO patterns of applications. This article will examine some techniques for performing IO profiling of an application to illustrate what information you can gain.
2.6.33 is Out! Say Good Bye to the Anticipatory Scheduler

It’s been a few days, but the latest kernel, 2.6.33, is out. There are some changes that affect the storage world that you probably need to check out.
POSIX IO Must Die!

POSIX IO is becoming a serious impediment to IO performance and scaling. POSIX is one of the standards that enabled portable programs and POSIX IO is the portion of the standard surrounding IO. But as the world of storage evolves with greatly increasing capacities and greatly increasing performance, it is time for POSIX IO to evolve or die.
Geeking Out on SSD Hardware Developments

When you’re hot, you’re hot. And SSDs are hot right now. Let’s review recent developments in SSD hardware and see where the technology is headed. Prepare to drool over new hardware!
Size Can Matter: Throughput Performance with a Disk-Based Journal - Part 4

Turning from metadata performance to throughput performance, we examine the impact of journal size on ext4 when the journal is disk-based. Dig into the numbers and see what you can do to improve throughput performance.
Size Can Matter: Would You Prefer the Hard Drive or the Ramdisk this Evening? Part 3

The past couple of weeks we ran the numbers on metadata performance for ramdisks and hard drive-based journals for ext4. Now let’s compare/contrast the two journal devices and see what trends emerge.
Size Can Matter: Ramdisk Journal Metadata Performance - Part 2

Previously, we examined the impact of journal size using a separate disk on metadata performance as measured by fdtree. In this follow-up we repeat the same test but use a ramdisk for the journal, thereby boosting the best performance. Or does it?
Size Can Matter: Improving Metadata Performance with Ext4 Journal Sizing - Part I

Recently we saw that the journal device location, unfortunately, didn’t make much of a difference to ext4 metadata performance. But can the size of the journal have an impact on metadata performance? This is the first in a series of articles examining journal size and performance.
And the Sign of the Beast is 6 (Gbps that is)

In the quest for more performance, there are two new standards for SATA and SAS focused on doubling current throughput to 6 Gbps. While the standards may sound like a nice potential boost, don’t expect individual hard drives to increase in performance.
Improving MetaData Performance of the Ext4 Journaling Device

In the never-ending quest for more performance, we examine three different journaling device options for ext4 with an eye toward improving metadata performance. Who doesn’t like speed?
Storage Highlights of 2009

It’s the end of the year and that means it’s time to either make predictions for the coming year or review the highlights from the past year. This article takes a look at the cool things that happened around storage in the past year and perhaps hints at some things in the coming year.
2.6.32 is Out! But a Word of Caution Around CFQ

Everyone loves a shiny new kernel. The latest one, 2.6.32, was released on Dec. 3 and there are some nice updates/fixes for file systems and IO in general. But there is a very important change for the CFQ IO scheduler that you need to understand.
Two Storage Trends From SuperComputing 2009

The SuperComputing Conference/Exhibition is always a great conference for learning about storage trends in the HPC world. This year the alert attendee could spot two emerging trends: smaller companies developing innovative storage solutions and the rise of flash storage units.
Cloud Storage Concepts and Challenges

Cloud Storage, while perhaps not the best label ever invented, holds promise for the massive future storage requirements looming on the horizon. And it does so at a very good price/performance ratio. This article takes a quick look at the concepts and the challenges of Cloud Storage.
Introduction to iSCSI

iSCSI is one of the hottest topics in Storage because it allows you to create centralized SANs using TCP networks rather than Fibre Channel (FC) networks. Get a handle on the main iSCSI concepts and terminology.
Helping Out SSDs

The last article talked about the anatomy of SSDs and the origins of some of their characteristics. In this article, we break down tuning storage and file systems for SSDs with an eye toward improving performance and helping overcome some of the platform’s limitations.
Anatomy of SSDs

SSDs (Solid-State Drives) are a hot topic right now for a number of reasons, not the least of which is their power-to-performance ratio. But to better understand SSDs you should first get a grip on how they are constructed and the features/limitations of these drives.
Pick Your Pleasure: RAID-0 mdadm Striping or LVM Striping?

A fairly common Linux storage question: Which is better for data striping, RAID-0 or LVM? Let’s take a look at these two tools and see how they perform data striping tasks.
Tuning CFQ - What Station is That?

The last article was a quick overview of the 4 schedulers in the Linux kernel. This article takes a closer look at the Completely Fair Queuing (CFQ) scheduler and how you can tune it.
I Have a Schedule to Keep - IO Schedulers

The Linux kernel has several different IO schedulers. This article provides an introduction to the concept of schedulers and what options exist for Linux.
IOzone Performance Exploration, Part 2: The Rest of the Crowd (Almost)

We finish off our IOzone performance exploration of the major Linux file systems. This time adding ext2, jfs, xfs, btrfs, and reiserfs. Let’s take a look at the numbers.
Deduping Storage Deduplication

One of the hottest topics in the enterprise storage world is deduplication. We take a look at the technology behind the concept and discuss where it is best applicable in your storage strategy.
I Feel the Need for Speed: Linux File System Throughput Performance, Part 1

While metadata performance is important, another critical metric for measuring file systems is throughput. We put three Linux file systems through their paces with IOzone.
Metadata Performance Exploration Part 2: XFS, JFS, ReiserFS, ext2, and Reiser4

More performance: We add five file systems to our previous benchmark results to create an “uber” article on metadata file system performance. We follow the “good” benchmarking guidelines presented in a previous article and examine the good, the bad and the interesting.
Metadata Performance of Four Linux File Systems

Using the principles of good benchmarking, we explore the metadata performance of four Linux file systems using a simple benchmark, fdtree.
On-line Backups: Flexible Enough for Home & the Office

Backups are a technology or process that everyone — everyone! — needs to consider. This article looks at some on-line backup options for Linux that can apply to the spectrum of home to enterprise-class users.
Linux Software RAID - A Belt and a Pair of Suspenders

Linux comes with software-based RAID that can be used either to provide a performance boost or to add a degree of data protection. This article gives a quick introduction to Linux software RAID and walks through how to create a simple RAID-1 array.
Lies, Damn Lies and File System Benchmarks

Benchmarking has become synonymous with marketeering to the point it is almost useless. This article takes a look at a very important paper that can demonstrate how bad it has become and makes recommendations on how to improve the situation.
Storage Pools and Snapshots with Logical Volume Management

Logical Volume Management (LVM) on Linux: A great tool for creating pools of storage hardware that can be divided, resized, or used for snapshots.
#!*A5%&j9 - How to Encrypt Your File System

Protecting your data has become more important than ever. Let’s look at some options for encrypting Linux file systems.
I Like My File Systems Chunky: UnionFS and ChunkFS

Diving deeper into UnionFS: walking through how to create and manage large file systems using the principles of ChunkFS and UnionFS.
File System Evangelist and Thought Leader: An Interview with Valerie Aurora

Jeff Layton talks to Valerie Aurora, file system developer and open source evangelist, about a wide range of subjects including her background in file systems, ChunkFS, the Union file system and how the developer ecosystem can chip in.
Read/Write Compression: Combining UnionFS and SquashFS

Need to have write capability on your SquashFS compressed filesystem? UnionFS to the rescue!
From Russia with Love: POHMELFS - A New Distributed Storage Solution

There is a new distributed file system in the staging area of the 2.6.30 kernel called POHMELFS. Sporting better performance than classic NFS, it’s definitely worth a look.
Ramdisks - Now We Are Talking Hyperspace!

Ramdisks can offer a level of performance that is simply amazing. More than just a tool for benchmarking, there are new devices that utilize ramdisks for a bit of ultra-performance.
FS-Cache & CacheFS: Caching for Network File Systems

FS-Cache along with CacheFS is now in the 2.6.30 kernel and can be used for local caching of AFS and NFS.
SquashFS: Not Just for Embedded Systems

Who knew that compression could be so useful in file systems? SquashFS, typically used for embedded systems, can be a great fit for laptops, desktops and, yes, even servers.
NILFS: A File System to Make SSDs Scream

The 2.6.30 kernel is chock full of next-gen file systems. One such example is NILFS, a new log-structured file system that dramatically improves write performance.
FS_scan: Getting Detailed with Your Data

Need details on your file system’s data? FS_scan allows you to dig deep into your storage, giving you the ability to perform trend analysis on the results.
How Old is that Data on the Hard Drive?

The vast amount of data being stored in this day and age naturally leads to files sitting unused for longer and longer periods of time. A new app, agedu, can quickly tell you what data on your filesystem is lying fallow.
Churning Butter(FS): An Interview with Chris Mason

The founder of btrfs talks about features, terabyte RAID arrays and comparisons with ZFS.
Linux Don’t Need No Stinkin’ ZFS: BTRFS Intro & Benchmarks

ZFS may be locked into the Solaris operating system but “Butter FS” is on the horizon and it’s boasting more features and better performance.
From ext3 to ext4: An Interview with Theodore Ts’o

Jeff Layton talks with Theodore Ts’o about getting the best performance out of your file system, painless migration and the work still to do.
ext4 File System: Introduction and Benchmarks

Destined to become the default file system for the more popular Linux distributions, ext4 is out of experimental mode and gearing up for production environments. Here’s what you need to know.
Caos NSA and Perceus: All-in-one Cluster Software Stack

Silence the struggle around cluster software stack configuration. Caos NSA is a distribution that focuses on making things simple, easy to install and upgrade, and easy to manage.
NFS with Native Infiniband

NFS frees you from proprietary file systems and, coupled with Infiniband, is the only standard file system that can be used for high-performance distributed processing.
strace: The Friend You Never Knew You Had

While strace is often used for troubleshooting and debugging, you can also use strace to get started on examining the I/O pattern of your serial codes.
Parallel Platters: File Systems for HPC Clusters Part Three

In the last installment of our Parallel Platters series, Jeff Layton looks at the next generation of parallel file systems: Object Based File Systems.
Life, The Universe, and Your Cluster

Getting the most out of your cluster is always important. But how exactly is that done? Do you really need to dissect your code and analyze every instruction to get optimal performance? Do you need to build custom kernels? Not necessarily. By testing some basic assumptions, you may be able to eke ten-node performance out of an eight-node cluster. Here’s how.