Worldwide, data is growing at a tremendous rate. However, one recent study has pointed out that file sizes are not necessarily growing at the same rate, which means the number of files is growing rapidly. How do we manage all of this data and all of these files? While the answer to that question is complex, one place we can start is with Extended File Attributes.
Introduction
I think it’s a given that the amount of data is increasing at a fairly fast rate. We now have lots of multimedia on our desktops and lots of files on our servers at work, and we’re starting to put lots of data into the cloud (e.g. Facebook). One question that affects storage design and performance is whether these files are large or small, and how many of them there are.
At this year’s FAST (USENIX Conference on File and Storage Technologies), the best paper award went to “A Study of Practical Deduplication” by William Bolosky from Microsoft Research and Dutch Meyer from the University of British Columbia. The paper covered Windows rather than Linux, and it focused on desktops and on deduplication, but it presented some very enlightening insights on file systems from 2000 to 2010. Some of the highlights from the paper are:
- The median file size isn’t changing
- The average file size is larger
- The average file system capacity has tripled from 2000 to 2010
To fully understand the difference between the first two points you need to remember some basic statistics. The average file size is computed by summing the size of every file and dividing by the number of files. The median file size is found by ordering the files from smallest to largest and taking the size of the one in the middle of the list. With these working definitions, the three observations indicate that desktops probably have a few really large files that drive up the average file size, while a large number of small files keeps the median file size about the same despite the growth in both the number of files and the number of large files.
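To make the difference concrete, here is a minimal shell sketch that computes both statistics for a made-up list of five file sizes. The sizes are hypothetical and are not taken from the paper. One large file pulls the average far above the median:

```shell
# Hypothetical file sizes in bytes: four small files and one large one.
sizes=$(printf '%s\n' 1000 2000 3000 4000 1000000)

# Average: sum of all sizes divided by the number of files.
mean=$(echo "$sizes" | awk '{ sum += $1 } END { print sum / NR }')

# Median: middle value of the sorted list
# (average of the two middle values when the count is even).
median=$(echo "$sizes" | sort -n | awk '
    { v[NR] = $1 }
    END {
        if (NR % 2) print v[(NR + 1) / 2]
        else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }')

echo "mean=$mean median=$median"
# prints: mean=202000 median=3000
```

Adding more small files to this list would barely move the median, while a single additional large file would shift the average again.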
Taken together, these observations mean that we have many more files on our desktops, and that we are adding some really large files alongside a large number of small files.
Yes, it’s Windows. Yes, it’s desktops. But these observations are another good data point that tells us something about our data: the number of files is growing, driven by a few very large files and a great many small ones. What does this mean for us? One thing that it means to me is that we need to pay much more attention to managing our data.
Data Management – Who’s on First?
One of the keys to data management is being able to monitor the state of your data which usually means monitoring the metadata. Fortunately, POSIX gives us some standard metadata for our files such as the following:
- File ownership (User ID and Group ID)
- File permissions (world, group, user)
- File times (atime, ctime, mtime)
- File size
- File name
- File type (regular file or directory)
There are several others (e.g. links) which I didn’t mention here.
With this information we can monitor the basic state of our data. We can compute how quickly our data is changing (how many files have been modified, created, deleted in a certain period of time). We can also determine how our data is “aging” – that is how old is the average file, the median file, and we can do this for the entire file system tree or certain parts of it. In essence we can get a good statistical overview of the “state of our data”.
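As a rough illustration of this kind of monitoring, the sketch below prints the standard metadata for one file and counts how many files in a directory were modified recently. It assumes the GNU versions of stat, touch, and find. The scratch directory, the file names, and the 7-day cutoff are arbitrary choices made for the example.

```shell
# Hypothetical scratch directory with one fresh file and one backdated file.
dir=$(mktemp -d)
echo "new data" > "$dir/new.txt"
echo "old data" > "$dir/old.txt"
touch -d '30 days ago' "$dir/old.txt"   # GNU touch: set mtime 30 days back

# Standard metadata for one file: owner, group, permissions, size, mtime, name.
stat -c '%U %G %A %s %y %n' "$dir/new.txt"

# How many regular files were modified within the last 7 days?
recent=$(find "$dir" -type f -mtime -7 | wc -l)
# How many have gone unmodified for more than 7 days?
stale=$(find "$dir" -type f -mtime +7 | wc -l)
echo "recent=$recent stale=$stale"

rm -r "$dir"
```

Pointing the two find commands at a real directory tree gives a quick first look at how that data is aging.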
All of this capability is just great and goes far beyond anything that is available today. However, with file system capacity increasing so rapidly and the median file size staying about the same, we have a lot more files to monitor. Plus we keep data around for longer than we ever have. Over time it is easy to forget what a cryptic file name means or what the file contains. Since POSIX is good enough to give us some basic metadata, wouldn’t it be nice to have the ability to add our own metadata? Something that we control that would allow us to add information about the data?
Extended File Attributes
What many people don’t realize is that there actually is a mechanism for adding your own metadata to files that is supported by most Linux file systems. This is called Extended File Attributes. In Linux, many file systems support it such as the following: ext2, ext3, ext4, jfs, xfs, reiserfs, btrfs, ocfs2 (2.1 and greater), and squashfs (kernel 2.6.35 and greater or a backport to an older kernel). Some of the file systems have restrictions on extended file attributes, such as the amount of data that can be added, but they do allow for the addition of user controlled metadata.
Any regular file stored on one of the previously mentioned file systems may have a list of extended file attributes. Each attribute has a name and some associated data (the actual attribute). The name starts with what is called a namespace identifier (more on that later), followed by a dot “.”, and then followed by a null-terminated string. You can add as many names separated by dots as you like to create “classes” of attributes.
Currently on Linux there are four namespaces for extended file attributes:
- user
- trusted
- security
- system
This article will focus on the “user” namespace since it has no restrictions with regard to naming or contents. However, the “system” namespace could be used for adding metadata controlled by root.
The system namespace is used primarily by the kernel for access control lists (ACLs). For example, it uses names such as “system.posix_acl_access” and “system.posix_acl_default” for extended file attributes. The general wisdom is that unless you are using ACLs to store additional metadata, which you can do, you should not use the system namespace. However, I believe that the system namespace is a place for metadata controlled by root or metadata that is immutable with respect to the users. Note that the trusted namespace is intended for this kind of use: trusted attributes are visible and accessible only to processes with the CAP_SYS_ADMIN capability, which normally means root.
The security namespace is used by SELinux. An example of a name in this namespace would be something such as “security.selinux”.
The user attributes are meant to be used by the user and any application run by the user. The user namespace attributes are protected by the normal Unix user permission settings on the file. If you have write permission on the file then you can set an extended attribute. To give you an idea of what you can do for “names” for the extended file attributes for this namespace, here are some examples:
- user.checksum.md5
- user.checksum.sha1
- user.checksum.sha256
- user.original_author
- user.application
- user.project
- user.comment
The first three example names are used for storing checksums of the file using three different checksum methods. The fourth example lists the originating author, which can be useful in case multiple people have write access to the file or the original author leaves and the file is assigned to another user. The fifth example can list the application that was used to generate the data, such as output from an application. The sixth example lists the project with which the data is associated. And the seventh example is the all-purpose general comment. From these few examples, you can see that you can create some very useful metadata.
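The naming rule described above (a namespace, a dot, then the rest of the name) can be checked with a few lines of shell. This is only a sketch: the function name valid_xattr_name is made up for illustration, and it checks nothing beyond the four namespace prefixes and a non-empty remainder.

```shell
# Hypothetical helper: succeed if the name starts with one of the four
# Linux namespaces, followed by a dot and a non-empty remainder.
valid_xattr_name() {
    case "${1%%.*}" in                      # text before the first dot
        user|trusted|security|system)
            # the remainder differs from the whole name only if a dot was found
            [ "${1#*.}" != "$1" ] && [ -n "${1#*.}" ]
            ;;
        *)
            return 1
            ;;
    esac
}

for name in user.checksum.md5 user.original_author user.comment project.name; do
    if valid_xattr_name "$name"; then
        echo "$name: namespace=${name%%.*} attribute=${name#*.}"
    else
        echo "$name: not a valid extended attribute name"
    fi
done
```

The first three names pass, while “project.name” is rejected because “project” is not one of the four namespaces.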
Tools for Extended File Attributes
There are several very useful tools for manipulating (setting, getting) extended attributes. These are usually included in the attr package that comes with most distributions. So be sure that this package is installed on the system.
The second thing you should check is that the kernel has attribute support. This should be turned on for almost every distribution that you might use, although there may be some very specialized ones that might not have it turned on. But if you build your own kernels (as yours truly does), be sure it is turned on. You can just grep the kernel’s “.config” file for any “ATTR” attributes.
The third thing is to make sure that the libattr package is installed. If you installed the attr package then this package should have been installed as well. But I like to be thorough and check that it was installed.
Then finally, you need to make sure the file system you are going to use with extended attributes is mounted with the user_xattr option.
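The checks above can be gathered into one short script. Treat it as a sketch under assumptions: the kernel config path /boot/config-$(uname -r) is a common location but varies by distribution, the helper name has_tool is made up for this example, and findmnt comes from the util-linux package.

```shell
# Hypothetical helper: succeed if a command is available on the PATH.
has_tool() {
    command -v "$1" >/dev/null 2>&1
}

# 1. Are the userspace tools from the attr package installed?
for tool in setfattr getfattr; do
    if has_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing (install the attr package)"
    fi
done

# 2. Does the kernel config mention attribute support?
#    The config location is an assumption and differs between distributions.
config="/boot/config-$(uname -r)"
if [ -r "$config" ]; then
    grep 'ATTR' "$config" || echo "no ATTR options found in $config"
else
    echo "no readable kernel config at $config"
fi

# 3. Which options is the current directory's file system mounted with?
if has_tool findmnt; then
    findmnt -n -o OPTIONS -T . || echo "could not determine mount options"
fi
```

If user_xattr does not appear in the mount options, that alone does not prove user attributes are unavailable, since some file systems enable them by default. Trying setfattr on a scratch file is the definitive test.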
Assuming that you have satisfied all of these criteria (they aren’t too hard), you can now use extended attributes! Let’s do some testing to show the tools and what we can do with them. Let’s begin by creating a simple file that has some dummy data in it.
$ echo "The quick brown fox" > ./test.txt
$ more test.txt
The quick brown fox
Now let’s add some extended attributes to this file.
$ setfattr -n user.comment -v "this is a comment" test.txt
This command sets an extended file attribute on the file. The “-n” option gives the name of the attribute, here “user.comment”, and the “-v” option gives its value. The final argument to the command is the name of the file.
You can determine the extended attributes on a file with a simple command, getfattr as in the following example,
$ getfattr test.txt
# file: test.txt
user.comment
Notice that this only lists which extended attributes are defined for a particular file, not the values of the attributes. Also notice that it only listed the “user” attributes, because getfattr shows only the user namespace by default. To include attributes from other namespaces, such as system or security, pass a match pattern with the “-m” option (for example “-m -” to match everything); some of those attributes are visible only to root.
To see the values of the attributes you have to use the following command:
$ getfattr -n user.comment test.txt
# file: test.txt
user.comment="this is a comment"
With the “-n” option it will list the value of the extended attribute name that you specify.
If you want to remove an extended attribute you use the setfattr command but use the “-x” option such as the following:
$ setfattr -x user.comment test.txt
$ getfattr -n user.comment test.txt
test.txt: user.comment: No such attribute
You can tell that the extended attribute no longer exists because of the error returned by the getfattr command.
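Putting these commands together, the following sketch stores a file’s SHA-256 checksum in a “user.checksum.sha256” attribute and reads it back. It assumes sha256sum from coreutils is available, and it falls back to a message if the attribute cannot be set (for example because the tools are missing or the file system does not allow user attributes). The “-d” option of getfattr dumps all user attributes with their values, and “--only-values” prints just the raw value.

```shell
file=test.txt
echo "The quick brown fox" > "$file"

# Compute the SHA-256 checksum (the first field of the sha256sum output).
sum=$(sha256sum "$file" | cut -d ' ' -f 1)

# Store the checksum as a user attribute, then read it back.
if setfattr -n user.checksum.sha256 -v "$sum" "$file" 2>/dev/null; then
    # -d dumps every user attribute together with its value.
    getfattr -d "$file"
    # --only-values prints just the stored value, handy for comparisons.
    stored=$(getfattr -n user.checksum.sha256 --only-values "$file")
    [ "$stored" = "$sum" ] && echo "checksum matches"
else
    echo "could not set user.checksum.sha256 on $file"
fi
```

Rerunning sha256sum later and comparing the result against the stored value is a simple way to detect that a file has changed since the attribute was written.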
Summary
Without belaboring the point, the amount of data is growing at a very rapid rate even on our desktops. A recent study also pointed out that the number of files is also growing rapidly and that we are adding some very large files but also a large number of small files so that the average file size is growing while the median file size is pretty much staying the same. All of this data will result in a huge data management nightmare that we need to be ready to address.
One way to help address the deluge of data is to enable a rich set of metadata that we can use in our data management plan (whatever that is). An easy way to do this is to use extended file attributes. Most of the popular Linux file systems allow you to add metadata to files, and in the case of xfs, you can pretty much add as much metadata as you want to the file.
There are four “namespaces” of extended file attributes that we can access. The one we are interested in as users is the user namespace because if you have normal write permissions on the file, you can add attributes. If you have read permission on the file you can also read the attributes. But we could use the system namespace as administrators (just be careful) for attributes that we want to assign as root (i.e. users can’t change or query the attributes).
The tools to set and get extended file attributes come with virtually every Linux distribution. You just need to be sure they are installed with your distribution. Then you can set, retrieve, or erase as many extended file attributes as you wish.
Extended file attributes can be used to great effect to add metadata to files. It is really up to the user to do this since they understand the data and have the ability to add/change attributes. Extended attributes give a huge amount of flexibility to the user and creating simple scripts to query or search the metadata is fairly easy (an exercise left to the user). We can even create extended attributes as root so that the user can’t change or see them. This allows administrators to add really meaningful attributes for monitoring the state of the data on the file system. Extended file attributes rock!