Ext4magic: Inode - Directory - Journal - Install - Time_Options - Histogram - Scenarios - Tips&Tricks - Manpage - Expert-Mode |
Put simply, you can think of the file system journal as the task book of the file system. It records what needs to be changed on the file system. Immediately before a change is made to the file system itself, the change is written to the journal on disk. Once the change has actually been carried out in the file system, the action is marked as completed in the task book.
If the machine crashes during a write operation, or some other problem prevents the operating system from completing a write request to the drive, file system data may be left partly written, so that the file data blocks no longer match the metadata. The result: the file system is damaged and needs repair. For ext2, which has no journal, this file system check takes a long time, and the larger the file system, the longer the repair takes. After a crash, several large file systems may have to be checked before they can be mounted and used, and the boot process can slow down to minutes or even hours.
In a journaling file system, the repair function finds the unexecuted orders in the journal. These orders are easy to carry out: the data stored in the journal blocks are copied to their correct locations in the file system. This restores the file system to a consistent state, the metadata match again, and the file system can be mounted and used. The check and repair therefore take only seconds, and the computer is quickly ready for operation after a crash.
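The replay step can be pictured with a toy model. The following Python is a minimal sketch, not the kernel's or ext4magic's code: the file system and the journal are plain in-memory objects, and the `committed` flag is a hypothetical stand-in for however the real journal marks a transaction as complete.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    tid: int                  # transaction serial number
    copies: dict[int, bytes]  # file system block number -> saved block content
    committed: bool           # hypothetical stand-in for a real commit marker


def replay(fs: dict[int, bytes], journal: list[Transaction]) -> dict[int, bytes]:
    """Copy every block of every committed transaction to its home location."""
    repaired = dict(fs)
    for trans in sorted(journal, key=lambda t: t.tid):  # oldest first
        if not trans.committed:
            continue  # incomplete transaction: skip it, never apply half a change
        for fs_block, data in trans.copies.items():
            repaired[fs_block] = data
    return repaired


fs = {10: b"old inode", 11: b"old bitmap"}
journal = [
    Transaction(1, {10: b"new inode", 11: b"new bitmap"}, committed=True),
    Transaction(2, {10: b"half-written"}, committed=False),
]
print(replay(fs, journal))
```

Applying transactions in serial-number order means a later copy of the same block wins, which is what makes the replayed result consistent.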
The data contained in the journal are copies of file system metadata, such as inode blocks, the tables (bitmaps) of used inodes and blocks, directory data blocks, and some more. A journal can be external, on a different block device, for example on a different disk. (ext4magic supports this, but it is still untested.) Mostly, however, the journal is an invisible file at inode number 8 with a fixed size. A typical and quite common size is 128 MByte for a journal of a medium or large file system with a block size of 4096 bytes.
This journal has a small administrative header (1 or 2 blocks), and the remaining blocks are used as a ring buffer for the journal data. A journal of 128 MByte with a block size of 4 KByte thus provides 32767 blocks for journal data.
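The block count follows from simple arithmetic. This small Python helper is only an illustration; it assumes 1 MByte = 2^20 bytes and a single header block (the text says 1 or 2), and `journal_data_blocks` is a hypothetical name.

```python
def journal_data_blocks(journal_bytes: int, block_size: int, header_blocks: int = 1) -> int:
    """Blocks left for the ring buffer after the administrative header (assumed size)."""
    return journal_bytes // block_size - header_blocks


print(journal_data_blocks(128 * 1024 * 1024, 4096))  # 32768 blocks total, minus 1 header
```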
Depending on how many changes have to be written, the changed file system blocks are packed together into a kind of virtual package and written sequentially to the journal. Such a package is called a transaction and has a serial number in the journal. The first block of a transaction is a table stating which journal block in this transaction is a copy of which file system block. A transaction may contain a single block copy or many block copies, depending on how much has just changed in the file system.
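To make the first-block table concrete, here is a minimal Python sketch of how one transaction could be laid out. The tuple representation and the name `build_transaction` are hypothetical; the real on-disk format (jbd2 descriptor and commit blocks, checksums) is more involved, and wrap-around at the end of the ring buffer is ignored. The example assumes the table sits in journal block 1 and reuses the block numbers of transaction 160836 from the listing that follows.

```python
def build_transaction(tid: int, first_jblock: int, fs_blocks: list[int]) -> list[tuple]:
    """Lay out one transaction as a run of consecutive journal blocks.

    The first block is the table mapping each following journal block to the
    file system block it is a copy of; the block copies follow in order.
    Simplification: no wrap-around at the end of the ring buffer.
    """
    table = {first_jblock + 1 + i: fs for i, fs in enumerate(fs_blocks)}
    blocks = [("table", tid, table)]
    blocks += [("copy", tid, fs) for fs in fs_blocks]
    return blocks


for block in build_transaction(160836, 1, [528253, 526364, 528252]):
    print(block)
```

Reading the table back is what lets a tool say "journal block 2 holds a copy of file system block 528253".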
ext4magic has a function to display the contents of the journal and the transaction numbers.
ROBI@LINUX:~ # ext4magic /dev/sda3 -T -x | head -15
Filesystem in use: /dev/sda3
Using internal Journal at Inode 8
Found 24318 copy of Filesystemblock in Journal
FS-Block Journal Transact Time in sec Time of Transaction
528253 2 160836 1274608139 Sun May 23 11:48:59 2010
526364 3 160836 1274608142 Sun May 23 11:49:02 2010
528252 4 160836 1274608144 Sun May 23 11:49:04 2010
530214 5 160836 1274608139 Sun May 23 11:48:59 2010
528644 6 160836 1274608143 Sun May 23 11:49:03 2010
529252 7 160836 1274608139 Sun May 23 11:48:59 2010
528109 8 160836 1274608139 Sun May 23 11:48:59 2010
527224 9 160836 1274608139 Sun May 23 11:48:59 2010
531243 10 160836 1274608139 Sun May 23 11:48:59 2010
The output was cut off here; with over 32000 blocks in the journal, the full list is long.
Column 1: the file system block number of which this block is a copy
Column 2: the journal block number of this block copy
Column 3: the transaction number of this copy
Column 4: (inode blocks only) the largest timestamp of any inode in this block
Column 5: the time from column 4 in human-readable form
The time columns (4 and 5) are only shown with the "-x" option.
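If you want to post-process this listing, each data row can be split into the columns described above. The following Python parser is a hypothetical helper, not part of ext4magic; it assumes whitespace-separated columns exactly as in the sample output and treats every line whose first three fields are not numbers as a header or status line.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JournalCopy:
    fs_block: int                     # column 1: file system block number
    journal_block: int                # column 2: journal block holding the copy
    transaction: int                  # column 3: transaction number
    inode_time: Optional[int] = None  # column 4: seconds since the epoch (-x only)
    time_text: Optional[str] = None   # column 5: readable time (-x only)


def parse_row(line: str) -> Optional[JournalCopy]:
    """Parse one data row of the listing; return None for header/status lines."""
    parts = line.split(maxsplit=4)
    if len(parts) < 3 or not all(p.isdigit() for p in parts[:3]):
        return None
    fs_block, journal_block, transaction = (int(p) for p in parts[:3])
    if len(parts) >= 4 and parts[3].isdigit():
        text = parts[4] if len(parts) == 5 else None
        return JournalCopy(fs_block, journal_block, transaction, int(parts[3]), text)
    return JournalCopy(fs_block, journal_block, transaction)


print(parse_row("528253 2 160836 1274608139 Sun May 23 11:48:59 2010"))
```

Column 5 is printed in the machine's local time, so converting column 4 yourself may give a different clock time depending on the time zone.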
ext4magic can also search selectively for specific block copies, for example all copies of a specific inode, or of the inode of a specific file:
ROBI@LINUX:~ # ext4magic /dev/sda3 -T -x -f etc/passwd
Filesystem in use: /dev/sda3
Using internal Journal at Inode 8
Inode found "etc/passwd" 565
Inode 565 is at group 0, block 1092, offset 1024
Transactions of Filesystemblock 1092 in Journal
FS-Block Journal Transact Time in sec Time of Transaction
1092 534 160841 1274608166 Sun May 23 11:49:26 2010
1092 611 160842 1274608168 Sun May 23 11:49:28 2010
1092 9282 160664 1274559675 Sat May 22 22:21:15 2010
1092 9468 160673 1274559723 Sat May 22 22:22:03 2010
1092 9922 160716 1274560203 Sat May 22 22:30:03 2010
1092 21004 140650 1272749403 Sat May 1 23:30:03 2010
1092 21621 140659 1272749466 Sat May 1 23:31:06 2010
1092 21697 140663 1272749467 Sat May 1 23:31:07 2010
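The line "Inode 565 is at group 0, block 1092, offset 1024" follows from the usual ext2/3/4 inode-table arithmetic. The Python below is a minimal sketch of that calculation under stated assumptions: 256-byte inodes, 4096-byte blocks, 8192 inodes per group (hypothetical, irrelevant for group 0), and an inode table for group 0 starting at block 1057. That start block is not read from the real file system; it is chosen only so the numbers match the output above. Real values come from the superblock and group descriptors, as shown by `dumpe2fs`.

```python
def locate_inode(ino: int, inodes_per_group: int, inode_size: int,
                 block_size: int, inode_tables: dict[int, int]) -> tuple[int, int, int]:
    """Return (group, block, byte offset in block) of an inode.

    inode_tables maps a block group to the first block of its inode table
    (hypothetical values in the example below).
    """
    index = ino - 1                   # inode numbers start at 1
    group = index // inodes_per_group
    local = index % inodes_per_group  # position inside the group's inode table
    byte_pos = local * inode_size
    return group, inode_tables[group] + byte_pos // block_size, byte_pos % block_size


print(locate_inode(565, 8192, 256, 4096, {0: 1057}))
```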
In theory, assuming a 4 KByte block size, a 128 MByte journal and an inode size of 256 bytes, the journal could hold up to about 500000 inode copies. A 24 GByte ext3/4 file system created with the default values already has three times as many inodes. However, the inodes an ext3/4 file system has and the inodes actually in use are two different things: the number of files in use determines how many inodes are really used. Moreover, the journal stores not only inode blocks; copies of other administrative blocks of the file system are kept there too. And particular files often have many inode copies, for example if the file has been changed many times.
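The two numbers in this estimate can be checked with a few lines of Python. This is an illustration only: it assumes the 32767 journal data blocks computed earlier, and it assumes a mke2fs bytes-per-inode ratio of 16384 as the "default value" (the actual default depends on the mke2fs configuration and file system size).

```python
def max_inode_copies(journal_blocks: int, block_size: int, inode_size: int) -> int:
    """Upper bound if every journal data block were a full inode block."""
    return journal_blocks * (block_size // inode_size)


def inode_count(fs_bytes: int, bytes_per_inode: int = 16384) -> int:
    """Inodes created by mkfs; bytes_per_inode=16384 is an assumed default."""
    return fs_bytes // bytes_per_inode


copies = max_inode_copies(32767, 4096, 256)  # 16 inodes per 4 KByte block
inodes = inode_count(24 * 1024**3)
print(copies, inodes, round(inodes / copies, 1))
```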
The journal is used as a ring buffer: when no more space is available in the journal, the oldest data are overwritten, and so on.
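A ring buffer in its simplest form can be sketched in a few lines of Python. The class name `JournalRing` is hypothetical and the model ignores transactions and the journal header; it only shows how, once the buffer is full, each new write lands on the oldest entry.

```python
class JournalRing:
    """Minimal ring buffer: once full, new blocks overwrite the oldest ones."""

    def __init__(self, size: int):
        self.blocks = [None] * size
        self.head = 0  # next position to write

    def write(self, item) -> None:
        self.blocks[self.head] = item  # overwrites the oldest entry once full
        self.head = (self.head + 1) % len(self.blocks)


ring = JournalRing(4)
for i in range(6):
    ring.write(i)
print(ring.blocks)  # entries 0 and 1 have been overwritten by 4 and 5
```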
Any change to a file causes a copy of the inode block that contains the file's inode to be written to the journal. Such changes include creating or deleting files, modifying or adding content, renaming or moving, and changing the owner or permissions. But that's not all.
If the file system is used with the default options, reading a file can change its atime timestamp. This changes the inode data, and so again a copy of the inode block is written to the journal. And that adds up: every file and every directory that is used must be read. Many programs use a lot of configuration files or need and use a variety of other programs and files, and each touched file then gets a new atime. Thus many inode block copies are written to the journal without the user intentionally causing this. For example, "find" recursively reads all directory inodes, a backup reads all files of the file system and thus produces a current copy of every used inode in the journal, "grep" reads all files in a directory, or a background process automatically indexes all multimedia files in a user's home directory. When an inode changes, the entire inode block is always written (a block typically holds 16 inodes). Thus the journal always contains many inode copies from the visited areas of the file system.
Tested with the data of a backup of a newly installed openSUSE system (over 100000 files in 6000 directories, 3.5 GByte installed size) on a new 130 GByte ext4 file system with a 4 KByte block size and a 128 MByte journal.
So far we have always assumed that the journal operates as a ring buffer and overwrites only the oldest data. That is true, but only partly: it holds only as long as the file system stays mounted.
On a new mount, the journal is not written further from the last position; instead, writing always starts again at the beginning of the journal. This overwrites journal data even though older data are still present in the middle or at the end of the journal. If a system is mounted or rebooted at short intervals, and only a little data is written to the journal during each mount, then only the blocks at the beginning of the journal are ever used, and each time they overwrite the data from the previous mount. The end of the journal then holds very old journal data, while for a larger range of time no journal data are available at all. This creates holes in time and fragments the usable journal data, so the journal's capacity for useful, related information is low. ext4magic then cannot find all files and directories from the last few days, but can recover very old inode copies (often with wrong file content, though).
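The effect of short mount sessions can be simulated. This Python sketch is a simplified model based only on the behaviour described above: every mount starts writing at position 0 of the journal data area, and a session wraps around only if it fills the whole ring. The function name is hypothetical, and transactions and the journal header are ignored.

```python
def simulate_mounts(journal_size: int, blocks_per_mount: list[int]) -> list:
    """Return which mount session last wrote each journal block.

    Assumption from the text: each mount restarts writing at block 0.
    Blocks never written stay None.
    """
    owner = [None] * journal_size
    for session, count in enumerate(blocks_per_mount):
        pos = 0  # each new mount starts again at the beginning
        for _ in range(count):
            owner[pos] = session
            pos = (pos + 1) % journal_size
    return owner


# Session 0 fills the whole journal, then three short sessions follow.
print(simulate_mounts(10, [10, 3, 2, 3]))
```

Sessions 1 and 2 vanish completely, while the oldest session 0 still occupies most of the journal: exactly the hole in time described above.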
This is made worse by the fact that once the first block of an old transaction is overwritten, all blocks of that transaction become invalid, because it is no longer possible to determine which file system blocks they are copies of. For large overwritten transactions, many old journal blocks become useless.
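The loss caused by an overwritten first block can be counted with a small Python sketch. It is a hypothetical model, not ext4magic's logic: each transaction is given as (first journal block, block count including the table block), wrap-around is ignored, and a transaction counts as usable only while its table block is intact.

```python
def classify(transactions: dict[int, tuple[int, int]], overwritten: set[int]) -> tuple[list[int], int]:
    """Return (usable transaction numbers, surviving but unusable block copies).

    transactions maps a transaction number to (first journal block, block
    count including the table block). Simplification: no wrap-around.
    """
    usable, lost = [], 0
    for tid, (first, count) in sorted(transactions.items()):
        if first in overwritten:
            # table gone: surviving copies can no longer be assigned to FS blocks
            lost += sum(1 for b in range(first + 1, first + count) if b not in overwritten)
        else:
            usable.append(tid)
    return usable, lost


# A new mount overwrote journal blocks 0..5, including the table of transaction 101.
print(classify({100: (0, 5), 101: (5, 20), 102: (25, 3)}, set(range(6))))
```

Overwriting a single table block here makes 19 otherwise intact block copies useless.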