An inode is a main resource in the ext2 filesystem. It is used for various purposes, but the main two are the support of files and the support of directories.
Each file, for example, will allocate one inode from the filesystem resources.
An ext2 filesystem has a total number of available inodes, which is determined when the filesystem is created. When all the inodes are in use, you will not be able to create an additional file, even though there may still be free blocks on the filesystem.
Each inode takes up 128 bytes in the filesystem. By default, mke2fs
reserves an inode for each 4096 bytes of the filesystem space.
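As a rough illustration of these figures, the sketch below computes how many inodes mke2fs would reserve for a filesystem of a given size, and how much space their tables take. It assumes the defaults stated above (128-byte inodes, one inode per 4096 bytes). The function names are hypothetical, and the real mke2fs also rounds the count so that it divides evenly among the blocks groups.

```c
#include <stdint.h>

/* Figures from the text: 128-byte inodes, one inode per 4096 bytes
   of filesystem space (the default ratio used by mke2fs). */
#define INODE_SIZE       128u
#define BYTES_PER_INODE  4096u

/* Approximate number of inodes reserved for a filesystem of fs_bytes. */
uint64_t default_inode_count(uint64_t fs_bytes)
{
    return fs_bytes / BYTES_PER_INODE;
}

/* Space taken by the inode tables holding inode_count inodes. */
uint64_t inode_table_bytes(uint64_t inode_count)
{
    return inode_count * INODE_SIZE;
}
```

For a 104857600-byte (100 MB) filesystem, this gives 25600 inodes. Their tables occupy 3276800 bytes, which is 1/32 of the space (128 / 4096).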
The inodes are placed in several tables. Each table contains the same number of inodes and is placed in a different blocks group. The goal is to place inodes and their related files in the same blocks group, for reasons of locality.
The number of inodes in a blocks group is available in the superblock variable
s_inodes_per_group. For example, if there are 2000 inodes per group,
group 0 will contain the inodes 1-2000, group 1 will contain the inodes
2001-4000, and so on.
Each inode table is accessed from the group descriptor of the specific blocks group which contains the table.
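The numbering above gives a simple way to find an inode on disk. The sketch below makes these assumptions: inodes are numbered from 1 and are 128 bytes each, and the caller has already read the first block of each group's inode table from the group descriptors. The kernel headers call that descriptor field bg_inode_table; here it is simply passed in as an array. The struct and function names are hypothetical.

```c
#include <stdint.h>

#define EXT2_INODE_SIZE 128u   /* bytes per on-disk inode */

struct inode_location {
    uint32_t group;   /* blocks group holding the inode */
    uint32_t index;   /* entry number inside that group's inode table */
    uint32_t block;   /* filesystem block containing the inode */
    uint32_t offset;  /* byte offset of the inode inside that block */
};

/* inode_table_start[g] is the first block of group g's inode table,
   taken from group g's descriptor (assumed to be loaded already). */
struct inode_location locate_inode(uint32_t ino,
                                   uint32_t inodes_per_group,
                                   uint32_t block_size,
                                   const uint32_t *inode_table_start)
{
    struct inode_location loc;
    uint32_t byte_in_table;

    /* Inode numbers start at 1, so inode 1 is entry 0 of group 0. */
    loc.group = (ino - 1) / inodes_per_group;
    loc.index = (ino - 1) % inodes_per_group;

    /* Inodes are packed back to back inside the table. */
    byte_in_table = loc.index * EXT2_INODE_SIZE;
    loc.block  = inode_table_start[loc.group] + byte_in_table / block_size;
    loc.offset = byte_in_table % block_size;
    return loc;
}
```

Take 2000 inodes per group and 1024-byte blocks. Inode 2000 is then the last entry (index 1999) of group 0's table, and inode 2001 is entry 0 of group 1's table.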
The structure of an inode in Ext2fs is as follows:
```c
#include <stdint.h>

/* The kernel takes __u8/__u16/__u32 from <linux/types.h>; these
   typedefs stand in for them so the structure compiles on its own.
   On disk, multi-byte fields are stored in little-endian order. */
typedef uint8_t  __u8;
typedef uint16_t __u16;
typedef uint32_t __u32;

#define EXT2_N_BLOCKS 15   /* 12 direct + indirect + double + triple */

struct ext2_inode {
    __u16 i_mode;          /* File mode */
    __u16 i_uid;           /* Owner Uid */
    __u32 i_size;          /* Size in bytes */
    __u32 i_atime;         /* Access time */
    __u32 i_ctime;         /* Creation time */
    __u32 i_mtime;         /* Modification time */
    __u32 i_dtime;         /* Deletion Time */
    __u16 i_gid;           /* Group Id */
    __u16 i_links_count;   /* Links count */
    __u32 i_blocks;        /* Blocks count */
    __u32 i_flags;         /* File flags */
    union {
        struct {
            __u32 l_i_reserved1;
        } linux1;
        struct {
            __u32 h_i_translator;
        } hurd1;
        struct {
            __u32 m_i_reserved1;
        } masix1;
    } osd1;                          /* OS dependent 1 */
    __u32 i_block[EXT2_N_BLOCKS];    /* Pointers to blocks */
    __u32 i_version;       /* File version (for NFS) */
    __u32 i_file_acl;      /* File ACL */
    __u32 i_dir_acl;       /* Directory ACL */
    __u32 i_faddr;         /* Fragment address */
    union {
        struct {
            __u8  l_i_frag;    /* Fragment number */
            __u8  l_i_fsize;   /* Fragment size */
            __u16 i_pad1;
            __u32 l_i_reserved2[2];
        } linux2;
        struct {
            __u8  h_i_frag;    /* Fragment number */
            __u8  h_i_fsize;   /* Fragment size */
            __u16 h_i_mode_high;
            __u16 h_i_uid_high;
            __u16 h_i_gid_high;
            __u32 h_i_author;
        } hurd2;
        struct {
            __u8  m_i_frag;    /* Fragment number */
            __u8  m_i_fsize;   /* Fragment size */
            __u16 m_pad1;
            __u32 m_i_reserved2[2];
        } masix2;
    } osd2;                          /* OS dependent 2 */
};

/* Each inode takes up 128 bytes on disk (C11 compile-time check). */
_Static_assert(sizeof(struct ext2_inode) == 128, "ext2 inode must be 128 bytes");
```
The basic function of an inode is to group together a series of allocated blocks. There is no restriction on which blocks may be used: any block can be allocated to any inode. Nevertheless, blocks will usually be allocated contiguously, to take advantage of the locality principle.
The inode is not always used in that way. I will now explain the allocation of blocks, assuming that the current inode type indeed refers to a list of allocated blocks.
It was found experimentally that many of the files in the filesystem are
actually quite small. To take advantage of this effect, the kernel provides
storage of up to 12 block numbers in the inode itself. Those blocks are
called direct blocks. The advantage is that once the kernel has the
inode, it can directly access the file's blocks, without an additional disk
access. Those 12 blocks are directly specified in the variables
i_block[0] to i_block[11].
i_block[12] is the indirect block. The block pointed to by
i_block[12] is not a data block. Rather, it just contains a
list of the numbers of data blocks. For example, if the block size is 1024 bytes,
there is room for 256 block numbers, since each block number is 4 bytes long.
That is, blocks 13 through 268 of the file are accessed by the
indirect block method. The penalty, compared to the
direct blocks case, is one additional access to the device:
two accesses are needed to reach the required data block.
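The arithmetic above works for any block size. The sketch below counts how many file blocks each part of i_block can address, given that a block number is 4 bytes long. The names are hypothetical. The counts describe only what the pointer scheme can reach; other limits, such as the 32-bit i_size, may cap the file size earlier.

```c
#include <stdint.h>

struct block_ranges {
    uint64_t direct;     /* i_block[0] - i_block[11] */
    uint64_t indirect;   /* reached through i_block[12] */
    uint64_t dindirect;  /* reached through i_block[13] */
    uint64_t tindirect;  /* reached through i_block[14] */
};

struct block_ranges ext2_block_ranges(uint32_t block_size)
{
    uint64_t n = block_size / 4;   /* 4-byte block numbers per block */
    struct block_ranges r;

    r.direct    = 12;
    r.indirect  = n;           /* one block full of pointers */
    r.dindirect = n * n;       /* n indirect blocks */
    r.tindirect = n * n * n;   /* n double indirect blocks */
    return r;
}
```

With 1024-byte blocks this yields 12, 256, 65536 and 16777216 blocks. The indirect range therefore ends at file block 12 + 256 = 268.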
Similarly, i_block[13] is the double indirect block
and i_block[14] is the triple indirect block.
i_block[13] points to a block which contains pointers to indirect
blocks. Each of these is handled in the way described above.
The triple indirect block just adds one more level of indirection: it points to a list of double indirect blocks.
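To make the levels of indirection concrete, the sketch below takes a logical block number within a file. It decides which i_block entry the lookup starts from, and which index is used inside each pointer block along the way. Blocks are counted from 0 here, so the text's "block 13" is block 12. The names are hypothetical, and no disk reads are performed.

```c
#include <stdint.h>

#define EXT2_NDIR_BLOCKS 12   /* direct pointers i_block[0..11] */

/* level: 0 direct, 1 indirect, 2 double, 3 triple, -1 out of range.
   slot:  the i_block[] entry the lookup starts from.
   idx:   index used inside each successive pointer block. */
struct block_path {
    int level;
    int slot;
    uint32_t idx[3];
};

struct block_path ext2_block_path(uint64_t lblock, uint32_t block_size)
{
    uint64_t n = block_size / 4;   /* pointers per block */
    struct block_path p = { -1, -1, {0, 0, 0} };

    if (lblock < EXT2_NDIR_BLOCKS) {           /* direct block */
        p.level = 0;
        p.slot = (int)lblock;
        return p;
    }
    lblock -= EXT2_NDIR_BLOCKS;
    if (lblock < n) {                          /* indirect */
        p.level = 1;
        p.slot = 12;
        p.idx[0] = (uint32_t)lblock;
        return p;
    }
    lblock -= n;
    if (lblock < n * n) {                      /* double indirect */
        p.level = 2;
        p.slot = 13;
        p.idx[0] = (uint32_t)(lblock / n);
        p.idx[1] = (uint32_t)(lblock % n);
        return p;
    }
    lblock -= n * n;
    if (lblock < n * n * n) {                  /* triple indirect */
        p.level = 3;
        p.slot = 14;
        p.idx[0] = (uint32_t)(lblock / (n * n));
        p.idx[1] = (uint32_t)((lblock / n) % n);
        p.idx[2] = (uint32_t)(lblock % n);
    }
    return p;   /* level stays -1 if beyond the triple indirect range */
}
```

Each level of indirection adds one more device access: a block with level 2 needs two pointer blocks to be read before the data block itself.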
The i_mode variable is used to determine the inode type and the
associated permissions. It is best described by representing it as an
octal number. Since it is a 16 bit variable, there will be 6 octal digits.
Those are divided into two parts - The rightmost 4 digits and the leftmost 2
digits.
The rightmost 4 digits are bit options - Each bit has its own
purpose.
The last 3 digits (octal digits 0, 1 and 2) are just the usual permissions,
in the familiar form rwxrwxrwx. Digit 2 refers to the user, digit 1 to
the group and digit 0 to everyone else. They are used by the kernel to grant
or deny access to the object represented by this inode.
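As a small illustration, the sketch below turns the low nine bits of i_mode into the familiar rwxrwxrwx string. The function name is hypothetical. It reads the bits as described above: digit 2 (user) first and digit 0 (others) last.

```c
#include <stdint.h>

/* Fill out[] with "rwxrwxrwx"-style text for the low nine mode bits. */
void perm_string(uint16_t mode, char out[10])
{
    static const char letters[] = "rwx";
    int bit;

    for (bit = 0; bit < 9; bit++) {
        /* position 0 tests bit 8 (user read), position 8 tests
           bit 0 (others execute) */
        out[bit] = (mode & (1u << (8 - bit))) ? letters[bit % 3] : '-';
    }
    out[9] = '\0';
}
```

For example, mode 0755 gives "rwxr-xr-x" and mode 0640 gives "rw-r-----".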
A finer-grained permission control, the ACL (Access Control Lists), is one of the enhancements planned for
Linux 1.3. In fact, browsing the kernel source shows that some of the ACL handling is already done.
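Bit number 9 signals that the file (I'll refer to the object represented by
the inode as a file even though it can be, for example, a special device) is
set VTX. This is the "sticky" bit (S_ISVTX, where VTX stands for "save text"). On a directory,
it allows a file to be removed or renamed only by the owner of that file, the owner of the directory, or root.
Bit number 10 signals that the file is set group id. When such a file is executed,
the process runs with the effective group id of the file's group.
Bit number 11 signals that the file is set user id. When such a file is executed,
the process runs with the effective user id of the file's owner, which is root only when root owns the file.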
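The effect of the set user id and set group id bits on a process's identity fits in a few lines. This is a simplified sketch with hypothetical names. It ignores other rules that can override the outcome, such as mount options, and shows only which ids a process executing the file would receive.

```c
#include <stdint.h>

/* Bit options in octal digit 3 of i_mode (hypothetical names). */
#define MODE_ISVTX 01000   /* bit 9:  sticky ("VTX") */
#define MODE_ISGID 02000   /* bit 10: set group id */
#define MODE_ISUID 04000   /* bit 11: set user id */

/* Effective user id after executing the file: the owner's id if the
   set user id bit is on, otherwise the caller keeps its own id. */
uint32_t exec_euid(uint16_t mode, uint32_t file_uid, uint32_t caller_uid)
{
    return (mode & MODE_ISUID) ? file_uid : caller_uid;
}

/* Same rule for the group id and the set group id bit. */
uint32_t exec_egid(uint16_t mode, uint32_t file_gid, uint32_t caller_gid)
{
    return (mode & MODE_ISGID) ? file_gid : caller_gid;
}
```

A set user id program owned by user 500 therefore runs as user 500, not as root.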
Note that the leftmost octal digit can only be 0 or 1, since the total number of bits is 16.
Those digits, as opposed to the rightmost 4 digits, are not bit mapped options. They determine the type of the "file" to which the inode belongs:
01 - The file is a FIFO.
02 - The file is a character device.
04 - The file is a directory.
06 - The file is a block device.
10 - The file is a regular file.
12 - The file is a symbolic link.
14 - The file is a socket.
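The type digits can be extracted by masking the upper four bits of i_mode (octal 0170000). The sketch below uses a hypothetical function name. It maps each type listed above to the one-letter code that ls -l prints in front of the permissions.

```c
#include <stdint.h>

/* Type letter for the two leftmost octal digits of i_mode. */
char mode_type_char(uint16_t mode)
{
    switch (mode & 0170000) {   /* keep only the type digits */
    case 0010000: return 'p';   /* FIFO */
    case 0020000: return 'c';   /* character device */
    case 0040000: return 'd';   /* directory */
    case 0060000: return 'b';   /* block device */
    case 0100000: return '-';   /* regular file */
    case 0120000: return 'l';   /* symbolic link */
    case 0140000: return 's';   /* socket */
    default:      return '?';   /* not a defined type */
    }
}
```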
Linux records the last time at which various operations occurred on the file. The time and date are saved in the standard C library format: the number of seconds that have passed since 00:00:00 GMT, January 1, 1970. The following times are recorded:
i_ctime - The time at which the inode was last changed. It is set when the inode is allocated (hence the "Creation time" comment in the structure). It is updated whenever the inode itself is modified, for example by a change of permissions or of the link count.
i_mtime - The time at which the file was last modified.
i_atime - The time at which the file was last accessed.
i_dtime - The time at which the inode was deallocated. In other words, the time at which the file was deleted.
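Since the times are plain counts of seconds since the epoch, the standard C library can format them directly. The minimal sketch below uses gmtime and strftime; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Format an inode timestamp as "YYYY-MM-DD HH:MM:SS" in UTC.
   Returns the number of characters written, or 0 on failure. */
size_t format_inode_time(uint32_t t, char *buf, size_t len)
{
    time_t tt = (time_t)t;
    struct tm *tm = gmtime(&tt);   /* static buffer: not thread-safe */

    if (tm == NULL)
        return 0;
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", tm);
}
```

For example, a stored value of 86400 (one day after the epoch) formats as "1970-01-02 00:00:00".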
i_size contains information about the size of the object represented by
the inode. If the inode corresponds to a regular file, this is just the size
of the file in bytes. For other inode types the interpretation of the variable
differs. For a symbolic link, for example, it holds the length of the pathname.
The user and group id of the file are just saved in the variables
i_uid and i_gid.
Later, when we discuss the implementation of directories, it will be
explained that each directory entry points to an inode. It is quite
possible that a single inode will be pointed to from several
directory entries. In that case, we say that there exist hard links to the
file: the file can be accessed through each of those entries.
The kernel keeps track of the number of hard links in the variable
i_links_count. The variable is set to "1" when first allocating the
inode, and is incremented with each additional link. Deletion of a file will
delete the current directory entry and will decrement the number of links.
Only when this number reaches zero is the inode actually deallocated.
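The link counting rule can be modeled in a few lines. This is a toy in-memory sketch with hypothetical names. It follows the rule described above for a newly created file. The real kernel also keeps an inode alive while a process still has the file open, which this model ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the link-count part of an inode. */
struct toy_inode {
    uint16_t links_count;
    bool     allocated;
};

void toy_create(struct toy_inode *ino)
{
    ino->allocated = true;
    ino->links_count = 1;   /* the first directory entry */
}

void toy_link(struct toy_inode *ino)
{
    ino->links_count++;     /* one more directory entry points here */
}

/* Remove one directory entry; returns true if the inode was freed. */
bool toy_unlink(struct toy_inode *ino)
{
    if (ino->links_count > 0)
        ino->links_count--;
    if (ino->links_count == 0) {
        ino->allocated = false;   /* only now is the inode released */
        return true;
    }
    return false;
}
```

Creating a file, adding a hard link, and then deleting the original name leaves the inode allocated with one link. Deleting the second name frees it.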
The name hard link is used to distinguish the alias method
described above from another alias method, called symbolic linking,
which will be described later.
The ext2 filesystem associates additional flags with an inode. These extended
attributes are stored in the variable i_flags, which is a 32
bit variable. Only the 7 rightmost bits are defined. Of them, only 5 bits
are used in version 0.5a of the filesystem. Specifically, the
undelete and the compress features are not yet implemented. They
are to be introduced during Linux 1.3 development.
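The currently defined flags are:
EXT2_SECRM_FL (0x01) - Secure deletion.
EXT2_UNRM_FL (0x02) - Undelete. This flag is meant to enable the undelete feature in future Ext2fs developments.
EXT2_COMPR_FL (0x04) - Compress file. Also not implemented yet.
EXT2_SYNC_FL (0x08) - Synchronous updates.
EXT2_IMMUTABLE_FL (0x10) - Immutable file.
EXT2_APPEND_FL (0x20) - Writes to the file may only append.
EXT2_NODUMP_FL (0x40) - Do not dump the file. The dump program checks this flag to decide whether the file should be skipped.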
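To show how these bits are tested, the sketch below defines the flag values as they appear in the Linux ext2 header. It then checks whether a write would be allowed by the immutable and append-only flags. The check is a simplified, hypothetical policy function. The kernel also enforces these flags elsewhere, for example when a file is opened for writing or is about to be deleted.

```c
#include <stdbool.h>
#include <stdint.h>

/* i_flags bits (values as in the Linux ext2 header). */
#define EXT2_SECRM_FL     0x00000001 /* Secure deletion */
#define EXT2_UNRM_FL      0x00000002 /* Undelete */
#define EXT2_COMPR_FL     0x00000004 /* Compress file */
#define EXT2_SYNC_FL      0x00000008 /* Synchronous updates */
#define EXT2_IMMUTABLE_FL 0x00000010 /* Immutable file */
#define EXT2_APPEND_FL    0x00000020 /* Writes may only append */
#define EXT2_NODUMP_FL    0x00000040 /* Do not dump file */

/* Would a write at offset pos into a file of size bytes be allowed,
   judging by i_flags alone? (Permissions are checked separately.) */
bool flags_allow_write(uint32_t i_flags, uint64_t pos, uint64_t size)
{
    if (i_flags & EXT2_IMMUTABLE_FL)
        return false;               /* no modification at all */
    if ((i_flags & EXT2_APPEND_FL) && pos != size)
        return false;               /* only writes at the end */
    return true;
}
```

All seven flags together form the mask 0x7F, the 7 rightmost bits mentioned above.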
The hard links presented above are just additional pointers to the same
inode. The important aspect is that the inode number is fixed when
the link is created. This means that the implementation details of the
filesystem are visible to the user, whereas in a purely abstract use of the
filesystem, the user should not have to care about inodes.
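This causes several limitations. First, a hard link cannot be made across different filesystems, since an inode number is meaningful only within its own filesystem. Second, replacing a file with a new version (that is, a new inode) is not reflected in its existing hard links, which keep pointing to the old inode.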
A symbolic link, on the other hand, is resolved at run time. A
symbolic link is just a pathname which is accessible from an inode.
As such, it "speaks" in the language of the abstract filesystem. When the
kernel reaches a symbolic link, it follows it at run time, using
its normal way of looking up paths.
As a result, symbolic links can be made across different filesystems, and
replacing a file with a new version automatically takes effect on all
of its symbolic links.
The disadvantage of symbolic links is their cost in space. A hard link consumes nothing beyond a small directory entry, while a symbolic link consumes at least an inode and can also consume one block.
When the inode is identified as a symbolic link, the kernel needs to find the path to which it points.
When the pathname is short enough, it can be saved directly in the
inode, in the i_block[0] - i_block[14] variables, since those are not
needed in that case. These 15 block numbers of 4 bytes each provide 60 bytes,
so a pathname of up to 59 bytes fits there together with its terminating
null byte. This is called a fast symbolic link. It is fast
because the pathname resolution can be done using the inode itself, without
accessing additional blocks. It is also economical, since it allocates only
an inode. The length of the pathname is stored in the i_size
variable.
For longer pathnames, an additional block is allocated (through
i_block[0]) and the pathname is stored in it. This is called a slow symbolic link,
because the kernel needs to read an additional block to resolve the pathname.
The length is again saved in i_size.
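The choice between a fast and a slow symbolic link comes down to one question: does the pathname, with its terminating null byte, fit in the 60 bytes of i_block? The sketch below assumes a cut-down, hypothetical inode structure that holds only i_size and i_block. It handles only the fast case and reports when a separate block would be needed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EXT2_N_BLOCKS 15

/* Hypothetical cut-down inode: only the fields a symlink needs. */
struct symlink_inode {
    uint32_t i_size;
    uint32_t i_block[EXT2_N_BLOCKS];   /* 15 * 4 = 60 bytes */
};

/* Store target as a fast symlink if it fits in i_block[].
   Returns false when a slow symlink (extra block) would be needed. */
bool store_fast_symlink(struct symlink_inode *ino, const char *target)
{
    size_t len = strlen(target);

    if (len + 1 > sizeof ino->i_block)
        return false;                       /* too long: slow symlink */
    memset(ino->i_block, 0, sizeof ino->i_block);
    memcpy(ino->i_block, target, len + 1);  /* path lives in the inode */
    ino->i_size = (uint32_t)len;            /* length kept in i_size */
    return true;
}
```

Under these assumptions, a 59-character pathname is stored in the inode itself, while a 60-character one needs a data block.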
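i_version is used with regard to the Network File System (NFS). It holds a file
version (generation) number, which NFS can use to detect that an inode number has been reused for a different file.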
As far as I know, the variables which are connected to ACL and fragments are not currently used. They will be supported in future versions.
Ext2fs is being ported to other operating systems. As far as I know, at least in Linux, the OS-dependent variables are also not used.
The first ten inodes on the filesystem are special inodes:
1 - Bad blocks inode. Its data blocks are the bad blocks of the filesystem. Since they are allocated to this inode, they will not be allocated to any file.
2 - Root inode. The inode of the root directory. It is the starting point for reaching a known path in the filesystem.
3 - ACL index inode. Access control lists are currently not supported by the ext2 filesystem, so I believe this inode is not used.
4 - ACL data inode. Of course, the above applies here too.
5 - Boot loader inode. I don't know its usage.
6 - Undelete directory inode. It is also a foundation for future enhancements, and is currently not used.
7-10 - Reserved and currently not used.
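The special inode numbers are fixed, so code that walks the filesystem can test for them directly. The constant names below follow the old Linux ext2 headers; the helper function is hypothetical. Ordinary files and directories receive inodes starting from number 11.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reserved inode numbers (names as in the old Linux ext2 headers). */
#define EXT2_BAD_INO          1  /* bad blocks inode */
#define EXT2_ROOT_INO         2  /* root directory */
#define EXT2_ACL_IDX_INO      3  /* ACL index */
#define EXT2_ACL_DATA_INO     4  /* ACL data */
#define EXT2_BOOT_LOADER_INO  5  /* boot loader */
#define EXT2_UNDEL_DIR_INO    6  /* undelete directory */
#define EXT2_FIRST_INO       11  /* first inode available for files */

/* True for the ten special inodes (1 to 10); 0 is not a valid inode. */
bool is_special_inode(uint32_t ino)
{
    return ino >= 1 && ino < EXT2_FIRST_INO;
}
```

A pathname lookup for an absolute path would start from EXT2_ROOT_INO, which is inode 2.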