
10. On the inode and the inode tables

An inode is a main resource in the ext2 filesystem. It is used for various purposes, but the main two are:

Each file, for example, will allocate one inode from the filesystem resources.

An ext2 filesystem has a total number of available inodes which is determined while creating the filesystem. When all the inodes are used, for example, you will not be able to create an additional file even though there will still be free blocks on the filesystem.

Each inode takes up 128 bytes in the filesystem. By default, mke2fs reserves an inode for each 4096 bytes of the filesystem space.
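The two figures above give a quick estimate of how many inodes a filesystem will have and how much space the inode tables take. The following is only a sketch of that arithmetic: the function names are hypothetical, and a real mke2fs also rounds the count to fit whole block groups, which is ignored here.

```c
#include <stdint.h>

/* Figures quoted in the text; treat them as assumptions of this sketch. */
#define INODE_SIZE       128u   /* bytes taken by one on-disk inode */
#define BYTES_PER_INODE  4096u  /* default ratio: one inode per 4096 bytes */

/* Approximate number of inodes reserved for a filesystem of fs_bytes bytes. */
uint64_t estimated_inode_count(uint64_t fs_bytes)
{
    return fs_bytes / BYTES_PER_INODE;
}

/* Approximate space taken by all the inode tables together, in bytes. */
uint64_t inode_table_bytes(uint64_t fs_bytes)
{
    return estimated_inode_count(fs_bytes) * INODE_SIZE;
}
```

With these defaults the inode tables take 128/4096, that is 1/32, of the filesystem space.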

The inodes are stored in several tables, one per block group, each of which contains the same number of inodes. The goal is to keep inodes and their related files in the same block group, for reasons of locality.

The number of inodes in a block group is available in the superblock variable s_inodes_per_group. For example, if there are 2000 inodes per group, group 0 will contain the inodes 1-2000, group 1 will contain the inodes 2001-4000, and so on.
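The mapping from an inode number to its group can be sketched as follows. Inode numbers start at 1, so the sketch subtracts 1 before dividing. The names locate_inode and struct inode_location are hypothetical; only s_inodes_per_group comes from the superblock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: which group holds the inode, and where. */
struct inode_location {
    uint32_t group;  /* block group number, starting at 0 */
    uint32_t index;  /* position inside that group's inode table, starting at 0 */
};

/* inodes_per_group is the value of s_inodes_per_group from the superblock. */
struct inode_location locate_inode(uint32_t ino, uint32_t inodes_per_group)
{
    struct inode_location loc;

    assert(ino >= 1 && inodes_per_group > 0);  /* inode numbers start at 1 */
    loc.group = (ino - 1) / inodes_per_group;
    loc.index = (ino - 1) % inodes_per_group;
    return loc;
}
```

With 2000 inodes per group, inode 2000 is the last entry of group 0 and inode 2001 is the first entry of group 1.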

Each inode table is accessed from the group descriptor of the block group which contains the table.
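Once the group descriptor has supplied the first block of the inode table, finding a particular inode inside the table is simple arithmetic. The sketch below assumes 128-byte inodes as stated earlier. The parameter table_start_block stands for the table address read from the group descriptor, and all names here are hypothetical.

```c
#include <stdint.h>

#define INODE_SIZE 128u  /* bytes per on-disk inode, as stated in the text */

/* Hypothetical result type: where on the device the inode is stored. */
struct inode_position {
    uint32_t block;   /* filesystem block which holds the inode */
    uint32_t offset;  /* byte offset of the inode inside that block */
};

/*
 * table_start_block: first block of the group's inode table (assumed to
 *                    come from the group descriptor).
 * index:             position of the inode inside the table, starting at 0.
 * block_size:        filesystem block size in bytes, for example 1024.
 */
struct inode_position find_in_table(uint32_t table_start_block,
                                    uint32_t index, uint32_t block_size)
{
    uint32_t inodes_per_block = block_size / INODE_SIZE;
    struct inode_position pos;

    pos.block = table_start_block + index / inodes_per_block;
    pos.offset = (index % inodes_per_block) * INODE_SIZE;
    return pos;
}
```

With 1024-byte blocks there are 8 inodes per block, so entry 9 of a table which starts at block 5 is found in block 6 at offset 128.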

The structure of an inode in Ext2fs follows:


struct ext2_inode {
        __u16   i_mode;         /* File mode */
        __u16   i_uid;          /* Owner Uid */
        __u32   i_size;         /* Size in bytes */
        __u32   i_atime;        /* Access time */
        __u32   i_ctime;        /* Creation time */
        __u32   i_mtime;        /* Modification time */
        __u32   i_dtime;        /* Deletion Time */
        __u16   i_gid;          /* Group Id */
        __u16   i_links_count;  /* Links count */
        __u32   i_blocks;       /* Blocks count */
        __u32   i_flags;        /* File flags */
        union {
                struct {
                        __u32  l_i_reserved1;
                } linux1;
                struct {
                        __u32  h_i_translator;
                } hurd1;
                struct {
                        __u32  m_i_reserved1;
                } masix1;
        } osd1;                         /* OS dependent 1 */
        __u32   i_block[EXT2_N_BLOCKS];/* Pointers to blocks */
        __u32   i_version;      /* File version (for NFS) */
        __u32   i_file_acl;     /* File ACL */
        __u32   i_dir_acl;      /* Directory ACL */
        __u32   i_faddr;        /* Fragment address */
        union {
                struct {
                        __u8    l_i_frag;       /* Fragment number */
                        __u8    l_i_fsize;      /* Fragment size */
                        __u16   i_pad1;
                        __u32   l_i_reserved2[2];
                } linux2;
                struct {
                        __u8    h_i_frag;       /* Fragment number */
                        __u8    h_i_fsize;      /* Fragment size */
                        __u16   h_i_mode_high;
                        __u16   h_i_uid_high;
                        __u16   h_i_gid_high;
                        __u32   h_i_author;
                } hurd2;
                struct {
                        __u8    m_i_frag;       /* Fragment number */
                        __u8    m_i_fsize;      /* Fragment size */
                        __u16   m_pad1;
                        __u32   m_i_reserved2[2];
                } masix2;
        } osd2;                         /* OS dependent 2 */
};

10.1 The allocated blocks

The basic function of an inode is to group together a series of allocated blocks. There is no restriction on which blocks may be used: any block can be allocated to any inode. Nevertheless, block allocation will usually be done in series to take advantage of the locality principle.

The inode is not always used in that way. I will now explain the allocation of blocks, assuming that the current inode type indeed refers to a list of allocated blocks.

It was found experimentally that many of the files in the filesystem are actually quite small. To take advantage of this, the kernel stores up to 12 block numbers in the inode itself. Those blocks are called direct blocks. The advantage is that once the kernel has the inode, it can access the file's blocks directly, without an additional disk access. Those 12 blocks are specified in the variables i_block[0] to i_block[11].

i_block[12] is the indirect block: the block pointed to by i_block[12] is not a data block. Rather, it contains a list of the block numbers of data blocks. For example, if the block size is 1024 bytes, since each block number is 4 bytes long, there is room for 256 block numbers. That is, blocks 13 through 268 of the file are accessed by the indirect block method. The penalty in this case, compared to the direct blocks case, is that an additional access to the device is needed: two accesses are required to reach the data block.

In much the same way, i_block[13] is the double indirect block and i_block[14] is the triple indirect block.

i_block[13] points to a block which contains pointers to indirect blocks. Each one of them is handled in the way described above.

In much the same way, the triple indirect block is just an additional level of indirection: it points to a list of double indirect blocks.
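The scheme above can be summarized by a small function which, given the position of a block inside the file, tells which level of indirection is needed to reach it. This is only a sketch under the assumptions stated in the text (12 direct entries and 4-byte block numbers). The position n is counted from 0 here, and the enum and function names are hypothetical.

```c
#include <stdint.h>

#define NDIR_BLOCKS 12u  /* direct block numbers held in the inode */

enum block_level {
    LEVEL_DIRECT,        /* reached through i_block[0] .. i_block[11] */
    LEVEL_INDIRECT,      /* reached through i_block[12] */
    LEVEL_DOUBLE,        /* reached through i_block[13] */
    LEVEL_TRIPLE,        /* reached through i_block[14] */
    LEVEL_OUT_OF_RANGE   /* beyond what the scheme can address */
};

/* n is the zero-based position of the block inside the file. */
enum block_level classify_block(uint64_t n, uint32_t block_size)
{
    uint64_t per_block = block_size / 4;  /* block numbers per block */

    if (n < NDIR_BLOCKS)
        return LEVEL_DIRECT;
    n -= NDIR_BLOCKS;
    if (n < per_block)
        return LEVEL_INDIRECT;
    n -= per_block;
    if (n < per_block * per_block)
        return LEVEL_DOUBLE;
    n -= per_block * per_block;
    if (n < per_block * per_block * per_block)
        return LEVEL_TRIPLE;
    return LEVEL_OUT_OF_RANGE;
}
```

With 1024-byte blocks, positions 0 to 11 are direct, 12 to 267 go through the indirect block, and the double indirect block starts at position 268. Counting from 1 instead, as the text does, these are blocks 13 through 268 for the indirect case.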

10.2 The i_mode variable

The i_mode variable is used to determine the inode type and the associated permissions. It is best described by representing it as an octal number. Since it is a 16 bit variable, there will be 6 octal digits. Those are divided into two parts: the rightmost 4 digits and the leftmost 2 digits.
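Splitting i_mode into its two parts takes one shift and one mask, since the rightmost 4 octal digits are exactly the low 12 bits. The function names below are hypothetical; this is a sketch of the arithmetic only.

```c
#include <stdint.h>

/* The leftmost two octal digits: the upper 4 bits of the 16 bit value. */
unsigned mode_type_digits(uint16_t i_mode)
{
    return i_mode >> 12;
}

/* The rightmost four octal digits: the lower 12 bits. */
unsigned mode_option_bits(uint16_t i_mode)
{
    return i_mode & 07777;
}
```

For example, an i_mode of 0100644 splits into 010 for the leftmost two digits and 0644 for the rightmost four.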

The rightmost 4 octal digits

The rightmost 4 digits are bit options: each bit has its own purpose.

The last 3 digits (octal digits 0, 1 and 2) are just the usual permissions, in the known form rwxrwxrwx. Digit 2 refers to the user, digit 1 to the group and digit 0 to everyone else. They are used by the kernel to grant or deny access to the object presented by this inode.

A finer-grained permissions control, the ACL (Access Control Lists), is one of the enhancements planned for Linux 1.3. Actually, from browsing the kernel source, some of the ACL handling is already in place.

Bit number 9 signals that the file (I'll refer to the object presented by the inode as file even though it can be a special device, for example) is set VTX. This is the bit commonly known as the "sticky" bit.

Bit number 10 signals that the file is set group id, which means that the file will run with the effective group id of the group which owns the file.

Bit number 11 signals that the file is set user id, which means that the file will run with the effective user id of the file's owner (root, if the file is owned by root).
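As an illustration of the bit layout just described, the following sketch turns the 9 permission bits into the familiar rwxrwxrwx text and tests bits 9, 10 and 11. The macro and function names are hypothetical and are not taken from the kernel source.

```c
#include <stdint.h>

/* Hypothetical names for the three option bits described above. */
#define MODE_VTX    01000u  /* bit 9  */
#define MODE_SETGID 02000u  /* bit 10 */
#define MODE_SETUID 04000u  /* bit 11 */

/* Writes text such as "rwxr-xr--" into out, which must hold 10 bytes. */
void format_permissions(uint16_t i_mode, char out[10])
{
    static const char letters[] = "rwxrwxrwx";
    int i;

    for (i = 0; i < 9; i++) {
        /* Bit 8 is the user's read bit; bit 0 is execute for everyone else. */
        out[i] = (i_mode & (1u << (8 - i))) ? letters[i] : '-';
    }
    out[9] = '\0';
}

/* Returns 1 if any bit of mask is set in i_mode, 0 otherwise. */
int has_mode_bit(uint16_t i_mode, unsigned mask)
{
    return (i_mode & mask) != 0;
}
```

For an i_mode of 0100754 the text produced is rwxr-xr--: full access for the user, read and execute for the group, and read only for everyone else.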

The leftmost two octal digits

Note that the leftmost octal digit can only be 0 or 1, since the total number of bits is 16.

Those digits, as opposed to the rightmost 4 digits, are not bit mapped options. They determine the type of the "file" to which the inode belongs:

10.3 Time and date

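Linux records the last time at which various operations occurred on the file. The time and date are saved in the standard C library format: the number of seconds which have passed since 00:00:00 GMT, January 1, 1970. The following times are recorded, as listed in the structure above: the access time (i_atime), the creation time (i_ctime), the modification time (i_mtime) and the deletion time (i_dtime).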
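Since the time fields of the inode structure (i_atime, i_ctime, i_mtime and i_dtime) are plain second counts, the standard C library can turn them into readable dates. The following is a minimal sketch; the function name format_inode_time is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/*
 * Formats an inode time stamp as "YYYY-MM-DD HH:MM:SS" in GMT.
 * Returns the number of characters written, or 0 on failure.
 */
size_t format_inode_time(uint32_t stamp, char *out, size_t out_size)
{
    time_t t = (time_t)stamp;
    struct tm *tm = gmtime(&t);  /* break the count down into calendar fields */

    if (tm == NULL)
        return 0;
    return strftime(out, out_size, "%Y-%m-%d %H:%M:%S", tm);
}
```

A stamp of 0 corresponds to the start of the epoch, 1970-01-01 00:00:00.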

10.4 i_size

i_size contains information about the size of the object presented by the inode. If the inode corresponds to a regular file, this is just the size of the file in bytes. In other cases, the interpretation of the variable is different.

10.5 User and group id

The user and group id of the file are just saved in the variables i_uid and i_gid.

10.6 Hard links

Later, when we discuss the implementation of directories, it will be explained that each directory entry points to an inode. It is quite possible that a single inode will be pointed to from several directories. In that case, we say that there exist hard links to the file: the file can be accessed from each of the directories.

The kernel keeps track of the number of hard links in the variable i_links_count. The variable is set to "1" when first allocating the inode, and is incremented with each additional link. Deletion of a file will delete the current directory entry and will decrement the number of links. Only when this number reaches zero will the inode actually be deallocated.
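The life cycle of the link count can be modelled in a few lines. This is a toy model and not kernel code: struct toy_inode and the three functions are hypothetical, and the directory entries themselves are left out.

```c
#include <stdint.h>

/* Toy stand-in for an inode: only the link count and an "in use" marker. */
struct toy_inode {
    uint16_t i_links_count;
    int allocated;
};

/* Allocating the inode for a new file: one directory entry points to it. */
void toy_create(struct toy_inode *ino)
{
    ino->i_links_count = 1;
    ino->allocated = 1;
}

/* Creating an additional hard link. */
void toy_link(struct toy_inode *ino)
{
    ino->i_links_count++;
}

/* Deleting one directory entry; the inode is freed only at zero links. */
void toy_unlink(struct toy_inode *ino)
{
    if (ino->i_links_count > 0 && --ino->i_links_count == 0)
        ino->allocated = 0;
}
```

A file with two names survives the deletion of one of them, and is deallocated only when the second name is deleted too.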

The name hard link is used to distinguish the alias method described above from another alias method called symbolic linking, which will be described later.

10.7 The Ext2fs extended flags

The ext2 filesystem associates additional flags with an inode. The extended attributes are stored in the variable i_flags. i_flags is a 32 bit variable. Only the 7 rightmost bits are defined. Of them, only 5 bits are used in version 0.5a of the filesystem. Specifically, the undelete and the compress features are not implemented, and are to be introduced in Linux 1.3 development.
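Testing such flags is ordinary bit masking. In the sketch below, the flag names and values are written as I recall them from the kernel's ext2_fs.h header; they do not appear in this text, so treat them as assumptions and check them against your kernel source. The mask DEFINED_FLAGS_MASK and the two functions are hypothetical helpers.

```c
#include <stdint.h>

/* Assumed names and values, believed to match the kernel's ext2_fs.h. */
#define EXT2_SECRM_FL      0x00000001u  /* secure deletion */
#define EXT2_UNRM_FL       0x00000002u  /* undelete (not implemented) */
#define EXT2_COMPR_FL      0x00000004u  /* compress (not implemented) */
#define EXT2_SYNC_FL       0x00000008u  /* synchronous updates */
#define EXT2_IMMUTABLE_FL  0x00000010u  /* immutable file */
#define EXT2_APPEND_FL     0x00000020u  /* append only */
#define EXT2_NODUMP_FL     0x00000040u  /* do not dump the file */

/* Hypothetical helper: the 7 rightmost bits, the only ones defined. */
#define DEFINED_FLAGS_MASK 0x0000007fu

/* Returns 1 if the given flag is set in i_flags, 0 otherwise. */
int flag_is_set(uint32_t i_flags, uint32_t flag)
{
    return (i_flags & flag) != 0;
}

/* Returns 1 if i_flags has bits set outside the 7 defined ones. */
int has_undefined_flags(uint32_t i_flags)
{
    return (i_flags & ~DEFINED_FLAGS_MASK) != 0;
}
```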

The currently available flags are:

10.8 Symbolic links

The hard links presented above are just additional pointers to the same inode. The important aspect is that the inode number is fixed when the link is created. This means that the implementation details of the filesystem are visible to the user, whereas in a purely abstract usage of the filesystem the user should not care about inodes.

The above causes several limitations:

A symbolic link, on the other hand, is analyzed at run time. A symbolic link is just a pathname which is accessible from an inode. As such, it "speaks" in the language of the abstract filesystem. When the kernel reaches a symbolic link, it follows it at run time using its normal way of reaching directories.

As such, a symbolic link can be made across different filesystems, and replacing a file with a new version automatically takes effect on all its symbolic links.

The disadvantage is one of space: a hard link consumes nothing except a small directory entry. A symbolic link, on the other hand, consumes at least an inode, and can also consume one block.

When the inode is identified as a symbolic link, the kernel needs to find the path to which it points.

Fast symbolic links

When the pathname fits in the space of the i_block array, it can be saved directly in the inode, in the variables i_block[0] to i_block[14], since those are not needed for block numbers in that case. The array has EXT2_N_BLOCKS entries, which is 15, of 4 bytes each, so the space available is 60 bytes. This is called a fast symbolic link. It is fast because the pathname resolution can be done using the inode itself, without accessing additional blocks. It is also economical, since it allocates only an inode. The length of the pathname is stored in the i_size variable.

Slow symbolic links

When the pathname does not fit in the inode, an additional block is allocated (by the use of i_block[0]) and the pathname is stored in it. It is called slow because the kernel needs to read an additional block to resolve the pathname. The length is again saved in i_size.
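The choice between the two kinds of symbolic link comes down to comparing the pathname length with the space of the i_block array. The sketch below assumes that EXT2_N_BLOCKS is 15, which gives 60 bytes, and that one byte is kept for a terminating NUL. The exact boundary is an assumption on my side, and the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define EXT2_N_BLOCKS 15u  /* assumed: 12 direct + 3 indirect entries */
#define FAST_SYMLINK_SPACE (EXT2_N_BLOCKS * 4u)  /* 60 bytes in the inode */

/*
 * Returns 1 if the pathname can be stored in the inode itself,
 * 0 if a separate block would be needed.
 */
int is_fast_symlink(const char *target)
{
    /* Strictly less than: one byte is kept for the terminating NUL. */
    return strlen(target) < FAST_SYMLINK_SPACE;
}
```

A short target such as "/usr/bin" is stored in the inode, whereas a target of 60 characters or more falls back to the slow form.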

10.9 i_version

i_version is used with regard to Network File System. I don't know its exact use.

10.10 Reserved variables

As far as I know, the variables which are connected to ACL and fragments are not currently used. They will be supported in future versions.

Ext2fs is being ported to other operating systems. As far as I know, at least in Linux, the OS dependent variables are also not used.

10.11 Special reserved inodes

The first ten inodes on the filesystem are special inodes:

