Filesystem internals from user space

Aha! This is a topic that excites almost everybody in the computer field. The reason is obvious: the filesystem is the base on which a great many things in a computer system depend. In this article I am going to show you some well-known and widely used tools for getting at the internals of a filesystem, specifically ext2/3/4.

When we first create a device, meaning partitions, those partitions are raw. To keep data on one you need a filesystem on it, right? Partitions can be created with several utilities: fdisk, sfdisk and cfdisk come with the util-linux-ng package, and gparted is a common graphical alternative. Once the partition exists, the first thing we do is create a filesystem on it.

A few words about journaling:

Journaling file systems use a journal to buffer changes to the file system (which is also used in crash recovery) but can use different strategies for when and what is journaled. Three of the most common strategies are writeback, ordered, and data.

In writeback mode, only the metadata is journaled, and the data blocks are written directly to their location on the disk. This preserves the file system structure, but data corruption can still occur (for example, if the system crashes after the metadata is journaled but before the data block is written). To solve this problem, you can use ordered mode.

Ordered mode also journals only the metadata, but it writes the data blocks before journaling the metadata. In this way, data and file system are guaranteed to be consistent after a recovery.

In data mode, both metadata and data are journaled. This mode offers the greatest protection against file system corruption and data loss but can suffer from performance degradation, as all data is written twice (first to the journal, then to the disk).
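
To make the difference between the three modes concrete, here is a small Python sketch that lists the order of writes each mode implies for a single file update. It is only a toy model of the descriptions above, not how the kernel implements journaling; the function name write_sequence and the step labels are made up for illustration.

```python
def write_sequence(mode):
    """Return the ordered steps for one file update under a journaling mode.

    Toy model only: the step names are illustrative labels, not kernel APIs.
    """
    if mode == "writeback":
        # Only metadata is journaled; nothing forces the data to reach the
        # disk before the metadata is committed to the journal.
        return ["journal metadata",
                "write data in place (no ordering enforced)",
                "write metadata in place"]
    if mode == "ordered":
        # Data blocks reach their final location before metadata is journaled.
        return ["write data in place",
                "journal metadata",
                "write metadata in place"]
    if mode == "data":
        # Data and metadata both go through the journal, so the data is
        # written twice: once to the journal, once to its final location.
        return ["journal data",
                "journal metadata",
                "write data in place",
                "write metadata in place"]
    raise ValueError("unknown journaling mode: %r" % mode)


if __name__ == "__main__":
    for mode in ("writeback", "ordered", "data"):
        print(mode, "->", ", ".join(write_sequence(mode)))
```

The only thing the sketch is meant to show is where the data write sits relative to the journal write in each mode.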

The journal commit policy can also differ in the various approaches. For example, is the journal committed when it nears full, or through a timeout?

I am not going to show you the device creation here, as I assume you have already created the device and made a filesystem on it (by running mkfs). Check out the mkfs man page for its different options and variants. For example, to create an ext3 filesystem you might use mkfs.ext3.

bhaskar@bhaskar-laptop_12:01:44_Mon Aug 30:~> whereis mkfs
mkfs: /sbin/mkfs.ext4 /sbin/mkfs.minix /sbin/mkfs.ext3 /sbin/mkfs.ext4dev /sbin/mkfs.cramfs /sbin/mkfs.bfs /sbin/mkfs /sbin/mkfs.ext2

See how many variants it has. OK, let's get the information about the ext3 filesystem on a mounted device. Here we go:

We need a package called e2fsprogs installed on our system, although it is normally installed by default. This package holds all the tools needed to check a filesystem and print out information about it.

Superblock:

What is it? It is nothing but the structure holding the metadata about the filesystem on a partition, and it resides at the very start of every partition (for ext2/3/4, 1024 bytes from the beginning). The superblock of a filesystem such as ext2 or ext3 contains information like:
* File system type
* Size
* Status
* Information about other metadata
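
As a rough illustration of what lives in the superblock, the following Python sketch decodes a few of its leading fields from raw bytes. It assumes the classic ext2/3/4 on-disk layout (superblock 1024 bytes from the start of the device, little-endian fields, magic number 0xEF53 at byte offset 56). The names parse_superblock and read_superblock and the dictionary keys are my own choices for this example.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # bytes from the start of the partition
EXT_MAGIC = 0xEF53         # magic number shared by ext2, ext3 and ext4


def parse_superblock(raw):
    """Decode a few leading fields of an ext2/3/4 superblock.

    `raw` must hold at least the first 58 bytes of the superblock.
    """
    (inodes, blocks, reserved, free_blocks, free_inodes,
     first_block, log_block_size) = struct.unpack_from("<7I", raw, 0)
    blocks_per_group, = struct.unpack_from("<I", raw, 32)
    inodes_per_group, = struct.unpack_from("<I", raw, 40)
    magic, = struct.unpack_from("<H", raw, 56)
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/3/4 superblock (magic 0x%04X)" % magic)
    return {
        "inode_count": inodes,
        "block_count": blocks,            # low 32 bits only
        "reserved_block_count": reserved,
        "free_blocks": free_blocks,
        "free_inodes": free_inodes,
        "first_block": first_block,
        "block_size": 1024 << log_block_size,
        "blocks_per_group": blocks_per_group,
        "inodes_per_group": inodes_per_group,
    }


def read_superblock(device):
    """Read and decode the primary superblock of `device` (usually needs root)."""
    with open(device, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        return parse_superblock(dev.read(1024))
```

Something like read_superblock("/dev/sda3"), run as root, should return the same counts that the e2fsprogs tools report. On an ext4 filesystem with 64-bit block numbers the counts here are only the low 32 bits, so treat this as a sketch and not a replacement for those tools.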

So you can understand how important it is. Every electronic device is prone to failure, and so is the filesystem; the superblock can get corrupted too. When that happens, you may see symptoms like these:

- You are not able to mount the filesystem; it refuses to mount
- The filesystem hangs
- Sometimes you are able to mount the filesystem, but strange behavior occurs

So here comes the basic question: if you can't mount the partition, how can you work on it? Fortunately, copies of the first superblock are kept in several other locations on the disk. How do you find those blocks? Here is the way to find the alternative superblocks:

bhaskar@bhaskar-laptop_12:05:31_Mon Aug 30:~> sudo dumpe2fs /dev/sda3 | grep superblock
[sudo] password for bhaskar:
dumpe2fs 1.41.3 (12-Oct-2008)
Primary superblock at 0, Group descriptors at 1-1
Backup superblock at 32768, Group descriptors at 32769-32769
Backup superblock at 98304, Group descriptors at 98305-98305
Backup superblock at 163840, Group descriptors at 163841-163841
Backup superblock at 229376, Group descriptors at 229377-229377
Backup superblock at 294912, Group descriptors at 294913-294913
Backup superblock at 819200, Group descriptors at 819201-819201
Backup superblock at 884736, Group descriptors at 884737-884737
Backup superblock at 1605632, Group descriptors at 1605633-1605633
Backup superblock at 2654208, Group descriptors at 2654209-2654209
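
The backup locations in that listing are not random. With the sparse_super feature (listed among this filesystem's features in the tune2fs output further down), copies are kept only in block groups 0 and 1 and in groups whose number is a power of 3, 5 or 7. The Python sketch below reproduces the list under that assumption. The figures 3417828 (block count), 32768 (blocks per group) and 0 (first block) are the ones tune2fs reports for this same filesystem; the function names are mine.

```python
def is_backup_group(group):
    """True if a sparse_super filesystem keeps a superblock copy in `group`."""
    if group in (0, 1):
        return True
    for base in (3, 5, 7):
        power = base
        while power < group:
            power *= base
        if power == group:
            return True
    return False


def backup_superblocks(block_count, blocks_per_group, first_block=0):
    """Return the block numbers of the backup superblocks (group 0 excluded)."""
    # Number of block groups, rounding up for a partial last group.
    groups = -(-(block_count - first_block) // blocks_per_group)
    return [first_block + g * blocks_per_group
            for g in range(1, groups) if is_backup_group(g)]


if __name__ == "__main__":
    for block in backup_superblocks(3417828, 32768):
        print("Backup superblock at", block)
```

This is handy when the primary superblock is too damaged for dumpe2fs to read: if you know the block size and the blocks per group, you can still work out where a backup should be.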

Yes, we use a tool called dumpe2fs, which comes with the package I mentioned earlier. So the superblock is kept in several places. In the event of superblock corruption, you can have e2fsck work from one of the backup copies instead of the primary one. Let me show you how you can do that:

bhaskar@bhaskar-laptop_12:26:58_Mon Aug 30:~> sudo e2fsck -f -b 32768 /dev/sda3

Now a bit of explanation. The -b option tells e2fsck which alternative superblock to use in place of the corrupted one; when the check completes, the primary superblock is updated from it. The device named is the one holding the corrupted superblock, in this case /dev/sda3. The filesystem should not be mounted while you do this. Clear?

OK, let's move on. I want to know the filesystem metadata of a particular device; how do I do that? Here is the way to do it:

bhaskar@bhaskar-laptop_12:31:20_Mon Aug 30:~> sudo tune2fs -l /dev/sda3
tune2fs 1.41.3 (12-Oct-2008)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          3cccbf0e-0354-43b4-b89a-ceee1fcadb31
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1710240
Block count:              3417828
Reserved block count:     170891
Free blocks:              2352985
Free inodes:              1562077
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      834
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16288
Inode blocks per group:   509
Filesystem created:       Thu Feb  4 15:19:47 2010
Last mount time:          Mon Aug 30 16:35:08 2010
Last write time:          Mon Aug 30 16:35:08 2010
Mount count:              14
Maximum mount count:      26
Last checked:             Sun Jul 25 22:57:56 2010
Check interval:           15552000 (6 months)
Next check after:         Fri Jan 21 22:57:56 2011
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
First orphan inode:       1010177
Default directory hash:   tea
Directory Hash Seed:      76073fa9-2f5a-4246-926a-b384bae24c6a
Journal backup:           inode blocks
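
A few of those numbers are easier to appreciate once converted. The Python sketch below parses "Key: value" lines like the ones above and derives the filesystem size and the share of blocks reserved for root. The sample text is copied from the output above; the helper names parse_tune2fs and summarize are made up for this example.

```python
SAMPLE = """\
Inode count:              1710240
Block count:              3417828
Reserved block count:     170891
Free blocks:              2352985
Block size:               4096
"""


def parse_tune2fs(text):
    """Turn 'Key: value' lines from `tune2fs -l` into a dictionary of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(fields):
    """Derive total size, free space and the reserved percentage."""
    block_size = int(fields["Block size"])
    blocks = int(fields["Block count"])
    return {
        "size_bytes": blocks * block_size,
        "free_bytes": int(fields["Free blocks"]) * block_size,
        "reserved_percent": 100.0 * int(fields["Reserved block count"]) / blocks,
    }


if __name__ == "__main__":
    info = summarize(parse_tune2fs(SAMPLE))
    print("size: %.2f GiB" % (info["size_bytes"] / 2.0 ** 30))
    print("free: %.2f GiB" % (info["free_bytes"] / 2.0 ** 30))
    print("reserved for root: %.1f%%" % info["reserved_percent"])
```

The same parser works on the full tune2fs -l output, since lines without a colon (such as the version banner) are skipped.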

As you can see in the above output, many internals are revealed! Kindly go through it so you get a grasp of what each field is for.

I want to add ACL (Access Control List) support to the filesystem, as it doesn't have it at the moment (refer to the above output). ACLs are a beast: they add finer-grained permissions to the filesystem. Please do not confuse them with SELinux permissions. They are an added layer of protection, or precision, that your filesystem can get. To those who sit on RHEL/CentOS, ACLs will come naturally, as they come bundled with those distributions.

Now go through the tune2fs manual, find the -o option, and pass it the device that holds the filesystem to add acl. So here we go:

bhaskar@bhaskar-laptop_12:40:53_Mon Aug 30:~> sudo tune2fs -o acl /dev/sda3
tune2fs 1.41.3 (12-Oct-2008)
bhaskar@bhaskar-laptop_12:41:50_Mon Aug 30:~> sudo tune2fs -l /dev/sda3 | grep acl
Default mount options:    acl

You can see I have added the acl option to the filesystem on that device. The best way to confirm that it works is simply to remount the filesystem, then create a file and check. And yes, I did reboot.

Now I am going to create a file under my home directory and check the ACL permissions for that file. Right, here we go:

bhaskar@bhaskar-laptop_12:57:30_Mon Aug 30:~> touch aclcheck
bhaskar@bhaskar-laptop_13:12:39_Mon Aug 30:~> ls -al aclcheck
-rw-r--r-- 1 bhaskar bhaskar 0 2010-08-30 13:12 aclcheck
bhaskar@bhaskar-laptop_13:12:47_Mon Aug 30:~> getfacl aclcheck
# file: aclcheck
# owner: bhaskar
# group: bhaskar
user::rw-
group::r--
other::r--

That is the output of getfacl. On Debian you need to install the acl package through aptitude; only then do you get these userspace tools.

This kind of permission system greatly helps when sharing system resources with the outside world. You can set ACL permissions with a binary called setfacl.

bhaskar@bhaskar-laptop_13:21:54_Mon Aug 30:~> setfacl -m u:bhaskar:rw aclcheck

bhaskar@bhaskar-laptop_13:22:16_Mon Aug 30:~> getfacl aclcheck
# file: aclcheck
# owner: bhaskar
# group: bhaskar
user::rw-
user:bhaskar:rw-
group::r--
mask::rw-
other::r--
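
Notice the mask entry that appeared in this output. It limits what named users, the owning group and named groups can actually do: their effective rights are the entry's permissions ANDed with the mask. Here is a minimal Python sketch of that rule applied to getfacl-style text. The sample uses a narrower mask than the one above, to make the effect visible, and the function names are invented for illustration.

```python
SAMPLE = """\
# file: aclcheck
user::rw-
user:bhaskar:rw-
group::r--
mask::r--
other::r--
"""


def parse_acl(text):
    """Parse getfacl output into (tag, qualifier, perms) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        tag, qualifier, perms = line.split(":")
        entries.append((tag, qualifier, perms))
    return entries


def effective_rights(entries):
    """Apply the mask to named users, the owning group and named groups."""
    mask = next((p for t, _, p in entries if t == "mask"), None)
    result = {}
    for tag, qualifier, perms in entries:
        if tag == "mask":
            continue
        # The owner (user::) and other:: entries are never masked.
        masked = tag == "group" or (tag == "user" and qualifier)
        if mask is not None and masked:
            perms = "".join(p if m != "-" else "-" for p, m in zip(perms, mask))
        result["%s:%s" % (tag, qualifier)] = perms
    return result


if __name__ == "__main__":
    for entry, perms in effective_rights(parse_acl(SAMPLE)).items():
        print(entry, perms)
```

With the sample's mask of r--, the named user bhaskar ends up with read access only, even though the entry itself says rw-. The owner entry (user::) is untouched by the mask.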

So what did I do? I used the setfacl command to set a permission on the file. Let me put across a few examples straight out of the manual:

Granting an additional user read access
setfacl -m u:lisa:r file

Revoking write access from all groups and all named users (using the effective rights mask)
setfacl -m m::rx file

Removing a named group entry from a file’s ACL
setfacl -x g:staff file

Copying the ACL of one file to another
getfacl file1 | setfacl --set-file=- file2

Copying the access ACL into the Default ACL
getfacl --access dir | setfacl -d -M- dir

Cool, right?

Now, you can find out which filesystems the running kernel supports through an entry in the /proc virtual filesystem, like this:

bhaskar@bhaskar-laptop_13:36:23_Mon Aug 30:~> sudo cat /proc/filesystems
nodev   sysfs
nodev   rootfs
nodev   bdev
nodev   proc
nodev   cgroup
nodev   cpuset
nodev   debugfs
nodev   securityfs
nodev   sockfs
nodev   pipefs
nodev   anon_inodefs
nodev   tmpfs
nodev   inotifyfs
nodev   devpts
nodev   ramfs
nodev   hugetlbfs
nodev   mqueue
nodev   usbfs
ext3
nodev   rpc_pipefs
nodev   nfsd

If you want to add more filesystems, you need to add support for them to the kernel, either by loading the corresponding kernel module or by rebuilding the kernel with that filesystem enabled. The new entry should then be listed in /proc/filesystems.

You can get the list of actually mounted filesystems from a file called /etc/mtab or /proc/mounts:
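
If you would rather check this from a script, /proc/filesystems is easy to parse: each line is an optional nodev flag followed by the filesystem name. Here is a minimal Python sketch, with a made-up function name, that splits the list into filesystems that need a block device and virtual ones. The sample lines are a few entries from the output above.

```python
SAMPLE = "nodev\tsysfs\nnodev\tproc\nnodev\ttmpfs\n\text3\nnodev\tnfsd\n"


def parse_filesystems(text):
    """Split /proc/filesystems content into (block_backed, nodev) name lists."""
    block_backed, nodev = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "nodev" and len(parts) > 1:
            nodev.append(parts[1])
        else:
            block_backed.append(parts[-1])
    return block_backed, nodev


if __name__ == "__main__":
    on_disk, virtual = parse_filesystems(SAMPLE)
    print("need a block device:", ", ".join(on_disk))
    print("virtual (nodev):", ", ".join(virtual))
```

On a real system you would feed it the file itself, for example parse_filesystems(open("/proc/filesystems").read()).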

bhaskar@bhaskar-laptop_13:36:32_Mon Aug 30:~> sudo cat /etc/mtab
/dev/sda3 / ext3 rw,errors=remount-ro 0 0
tmpfs /lib/init/rw tmpfs rw,nosuid,mode=0755 0 0
proc /proc proc rw,noexec,nosuid,nodev 0 0
sysfs /sys sysfs rw,noexec,nosuid,nodev 0 0
procbususb /proc/bus/usb usbfs rw 0 0
udev /dev tmpfs rw,mode=0755 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
devpts /dev/pts devpts rw,noexec,nosuid,gid=5,mode=620 0 0
/dev/mapper/bhaskarlaptop-data /lvm ext3 rw 0 0
nfsd /proc/fs/nfsd nfsd rw 0 0

OR

bhaskar@bhaskar-laptop_13:39:44_Mon Aug 30:~> sudo cat /proc/mounts
rootfs / rootfs rw 0 0
none /sys sysfs rw,nosuid,nodev,noexec 0 0
none /proc proc rw,nosuid,nodev,noexec 0 0
udev /dev tmpfs rw,size=10240k,mode=755 0 0
/dev/sda3 / ext3 rw,errors=remount-ro,acl,data=ordered 0 0
tmpfs /lib/init/rw tmpfs rw,nosuid,mode=755 0 0
usbfs /proc/bus/usb usbfs rw,nosuid,nodev,noexec 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
devpts /dev/pts devpts rw,nosuid,noexec,gid=5,mode=620 0 0
/dev/mapper/bhaskarlaptop-data /lvm ext3 rw,errors=continue,data=ordered 0 0
nfsd /proc/fs/nfsd nfsd rw 0 0
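
Both files use the same six-field layout as fstab (device, mount point, type, options, dump, pass), so pulling out, say, the journaling mode and the ACL flag of each ext3 mount takes only a few lines. Here is a Python sketch under that assumption; the function name parse_mounts is hypothetical, and the sample lines are taken from the /proc/mounts output above.

```python
SAMPLE = """\
/dev/sda3 / ext3 rw,errors=remount-ro,acl,data=ordered 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
/dev/mapper/bhaskarlaptop-data /lvm ext3 rw,errors=continue,data=ordered 0 0
"""


def parse_mounts(text):
    """Parse /proc/mounts-style text into a list of dictionaries."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        options = {}
        for opt in fields[3].split(","):
            name, _, value = opt.partition("=")
            options[name] = value or True   # flag options map to True
        mounts.append({"device": fields[0], "mount_point": fields[1],
                       "type": fields[2], "options": options})
    return mounts


if __name__ == "__main__":
    for m in parse_mounts(SAMPLE):
        if m["type"] == "ext3":
            print(m["mount_point"],
                  "journal mode:", m["options"].get("data"),
                  "acl:", "acl" in m["options"])
```

Note how the root filesystem shows both acl and data=ordered, which ties back to the tune2fs change and the journaling modes discussed earlier. Mount points containing spaces are escaped as \040 in these files, which this sketch does not decode.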

For more enthusiastic readers, I would recommend looking into the debugfs manual page, as it provides a plethora of options for examining the filesystem. I believe I have covered LVM in my other post.

Hope this will help.

Cheers!

Bhaskar

About unixbhaskar
GNU/Linux Consultant
