I’ve been away from my development laptop recently, and it’s actually given me the opportunity to come up with several new ideas. One of them is the netchannels design (which is the core of the userspace network stack), which I’ll detail on the Whitix wiki and in some blog posts to come. The other is how filesystem metadata should be exposed in the virtual filesystem. Let me explain.
First of all, metadata is “data about data”. For a filesystem, it mainly relates to data about files. We’ve had standard attributes about files for over 40 years now: information like the file’s size, its permissions, and which user and group own it. We’ve got a set of standard system calls that cover all these different attributes, and they are well established. However, thinking about them more abstractly, they turn out to be (special kinds of) key-value pairs. So why not extend the concept, and allow custom key-value pairs (say, the director for a movie file) to be stored along with the file itself?
Well, it’s possible! Filesystems like ext3 support “extended attributes”. You can store a key and a value (currently the size limit is the block size). Usually, you have to prefix the name of your attribute with a namespace, like “user” or “system”. Great! That surely means we can store a lot more information outside the file, where it isn’t locked into one of the many tens of thousands of file formats out there?
Well, the problem is, nobody really uses them much.
Why? First of all, they’re relatively new to most people. In a lot of operating systems and filesystems, support was only recently introduced or made public. This means a lot of userspace programs can’t rely on them, since many users either don’t have extended attributes enabled or have them available but hardly ever use them (NTFS on Windows is one example).
The second reason is that they’re not exposed that well. In many operating systems, there’s a separate set of system calls to deal with them. Also, they’re not easy to manipulate or list from the shell, since there’s another custom set of commands to list, create, delete and update them. Not good.
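To make that concrete, here is roughly what the separate interface looks like on Linux today, going through Python’s os module (os.setxattr, os.getxattr and os.listxattr wrap the dedicated system calls). Treat it as a sketch only: the helper name and the “user.director” attribute are my own, and since these calls fail on filesystems without user attribute support, the helper simply returns None in that case.

```python
import os
import tempfile


def roundtrip_xattr(path, name, value):
    """Set an extended attribute on `path` and read it back.

    Returns the stored bytes, or None when extended attributes are not
    available (non-Linux platform, or a filesystem without support).
    """
    if not hasattr(os, "setxattr"):  # these functions only exist on Linux
        return None
    try:
        # A dedicated call is needed; an ordinary write() can't do this.
        os.setxattr(path, name, value)
        return os.getxattr(path, name)
    except OSError:
        # e.g. the filesystem does not support user attributes
        return None


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as movie:
        print(roundtrip_xattr(movie.name, "user.director", b"Steven Spielberg"))
```

Note that none of the usual file tools (cat, echo, ls) come into play here, which is exactly the problem.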
So, inspired by Reiser4, I believe Whitix should expose metadata openly via the filesystem. Imagine, to use some command line examples, creating a new attribute using ‘echo “Steven Spielberg” > /file/name@user.director’, or listing all of them. Now, before I expand upon the possibilities, there are several important differences between my proposal and the one for Reiser4. First of all, files will not become directories, at least not without the trailing @. In that way, attributes still operate in a separate namespace; you won’t be able to ‘cd’ into a file, so to speak.
Another thing is, you won’t be able to have directories in the extended attributes. There won’t be a metadata file named /file/name@a/b. It doesn’t make sense, as there won’t be that many attributes for each file; certainly not enough to warrant putting them into separate directories called ‘user’, ‘system’ and the like. These features (or lack thereof!) should hopefully avoid the pitfalls and problems that were raised with the Reiser4 proposal.
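Here is a minimal sketch of how a path lookup could tell a file apart from one of its attributes under these rules. Nothing here is Whitix code: the function name is hypothetical, and I’m assuming that only an ‘@’ in the final path component counts, so /file/name@a/b is just an ordinary path running through a directory that happens to be called ‘name@a’, and that an empty attribute name (a bare trailing @) means “list the attributes”.

```python
def split_attribute_path(name):
    """Split 'path@attribute' into (path, attribute).

    attribute is None for a plain path, and '' for a bare trailing '@'
    (assumed here to mean "list all attributes of the file").
    """
    # Only the last path component may carry an attribute, so attribute
    # names can never contain a directory separator.
    directory, slash, last = name.rpartition("/")
    base, at, attribute = last.partition("@")
    if not at:
        return name, None
    return directory + slash + base, attribute


if __name__ == "__main__":
    for example in ("/file/name@user.director", "/file/name@",
                    "/file/name", "/file/name@a/b"):
        print(example, "->", split_attribute_path(example))
```

Keeping the attribute part flat like this means the lookup never has to walk a second directory tree hidden inside a file.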
Now onto the cool stuff, and how it might work in a modern desktop environment. First of all, this really makes a lot of searching and finding much easier, especially on the command line. Searches like “find @user.director=’Steven Spielberg’ @user.file_type=’movie’”, or the natural language equivalent, “Find all films with director Steven Spielberg”, become much more feasible, because movies in different formats share the same metadata.
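As a rough illustration of that kind of search, the sketch below parses the ‘@key=value’ predicates and matches them against an in-memory table of attributes. The table, the file names and both function names are made up for the example; a real implementation would ask the filesystem rather than a dictionary, and I’m assuming all predicates have to match.

```python
import shlex

# Hypothetical metadata, standing in for what the filesystem would store.
CATALOG = {
    "/movies/jaws.avi": {"user.director": "Steven Spielberg",
                         "user.file_type": "movie"},
    "/movies/et.mkv": {"user.director": "Steven Spielberg",
                       "user.file_type": "movie"},
    "/movies/alien.mp4": {"user.director": "Ridley Scott",
                          "user.file_type": "movie"},
    "/docs/jaws_review.txt": {"user.director": "Steven Spielberg",
                              "user.file_type": "text"},
}


def parse_query(query):
    """Turn "@key='value' @key2='value2'" into a dict of predicates."""
    predicates = {}
    for token in shlex.split(query):  # shlex handles the quoting
        key, equals, value = token.partition("=")
        if not key.startswith("@") or not equals:
            raise ValueError("expected @key=value, got %r" % token)
        predicates[key[1:]] = value
    return predicates


def find(catalog, query):
    """Return the paths whose attributes satisfy every predicate."""
    wanted = parse_query(query)
    return sorted(path for path, attrs in catalog.items()
                  if all(attrs.get(k) == v for k, v in wanted.items()))


if __name__ == "__main__":
    print(find(CATALOG,
               "@user.director='Steven Spielberg' @user.file_type='movie'"))
```

The point is that the .avi and the .mkv match the same query, even though their container formats have nothing in common.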
Now, the user won’t need to label all his files with these types of extended attributes manually; that would be quite a chore, and really inefficient. Some of these attributes will be general (size, permissions and so on, simulated by the VFS as read-only or read-write attributes), some will be filesystem-specific (a list of blocks under “system.block_map” for each file or directory, perhaps?), and some will be created by software: a web browser setting the mime-type after downloading a file, or any application that creates a file. To simplify things, there could also be a core service that regularly scans the users’ files and updates the metadata.
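To give a feel for what “automatic” could mean, here is a small sketch of the attributes such a service (or the VFS itself) might derive for a file without the user lifting a finger. The attribute names “system.size”, “system.mode” and “user.mime_type” are placeholders of mine, not a decided naming scheme, and guessing the mime-type from the file extension is just the simplest stand-in for what a browser would actually know.

```python
import mimetypes
import os
import stat
import tempfile


def derived_attributes(path):
    """Build key-value metadata for `path` from information we already have."""
    st = os.stat(path)
    attrs = {
        # The classic attributes, re-expressed as plain key-value pairs.
        "system.size": str(st.st_size),
        "system.mode": oct(stat.S_IMODE(st.st_mode)),
    }
    # Guess the mime-type from the file name; skip it when unknown.
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type is not None:
        attrs["user.mime_type"] = mime_type
    return attrs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "notes.txt")
        with open(path, "wb") as handle:
            handle.write(b"hello")
        print(derived_attributes(path))
```

Values are kept as text on purpose, so they could be read and written with the same tools as any other file.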
At any rate, automatic metadata and attributes will be at the core of the Whitix desktop, and I’ll be talking soon about how metadata will fit into this vision of mine.