Ext4magic-Journal



Ext4magic: Inode - Directory - Journal - Install - Time_Options - Histogram - Scenarios - Tips&Tricks - Manpage - Expert-Mode






The journal of the ext3/ext4 file system

Put simply, you can imagine the file system journal as the task book of the file system. It records what needs to be changed on the file system. Immediately before a change is made to the file system itself, the change is written to this journal on disk. Once the change has actually been carried out in the file system, the action is marked as completed in the task book.



The purpose and role of the journal

If the machine crashes during a write operation, or some other problem prevents the operating system from completing a write request to the drive, the file system data may be left partially written, and the file data blocks may no longer match the metadata. The result: the file system is damaged and needs repair. For ext2, which has no journal, this file system check takes a long time, and the larger the file system, the longer the repair takes. After a crash, several large file systems may have to be checked before they can be mounted and used, and the boot process slows down to minutes or even hours.

In a journaling file system, the repair function finds the unexecuted orders in the journal. These orders are easy to carry out: the data stored in the journal blocks is copied to the correct location in the file system. This restores the file system to a consistent state, its metadata is consistent again, and the file system can be mounted and used. Checking and repairing the file system therefore takes only seconds, and the computer is quickly ready for operation after a crash.
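The task-book idea can be sketched as a toy model. This is a minimal illustration under simplifying assumptions, not the real on-disk journal format: the names `Order`, `ToyFS`, `write` and `recover` are hypothetical, the "disk" is just a dict from block number to content, and completed orders are simply flagged. A change is first recorded in the journal, then written in place, then marked as done; after a simulated crash in the middle of the in-place write, `recover` copies the journal data of every unexecuted order to its target blocks.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    """One entry in the journal ("task book"); hypothetical, for illustration."""
    tid: int                 # transaction number
    blocks: dict             # FS block number -> new block content
    done: bool = False       # marked as completed once written in place


@dataclass
class ToyFS:
    disk: dict = field(default_factory=dict)     # FS block number -> content
    journal: list = field(default_factory=list)  # the task book

    def write(self, tid, blocks, crash_after=None):
        order = Order(tid, dict(blocks))
        self.journal.append(order)               # 1. record the change in the journal first
        for i, (block, data) in enumerate(order.blocks.items()):
            if i == crash_after:
                return                           # simulated crash: in-place write incomplete
            self.disk[block] = data              # 2. perform the change in the file system
        order.done = True                        # 3. mark the order as completed

    def recover(self):
        """Replay unexecuted orders in transaction order; return how many ran."""
        replayed = 0
        for order in sorted(self.journal, key=lambda o: o.tid):
            if not order.done:
                self.disk.update(order.blocks)   # copy journal data to its real location
                order.done = True
                replayed += 1
        return replayed


fs = ToyFS()
fs.write(1, {100: b"inode v1"})
fs.write(2, {100: b"inode v2", 200: b"bitmap"}, crash_after=1)
print(fs.disk)       # {100: b'inode v2'}  (block 200 was never written)
print(fs.recover())  # 1
print(fs.disk)       # {100: b'inode v2', 200: b'bitmap'}
```

Replaying is safe even when some blocks of an order had already reached the disk, because copying the same content to the same place again changes nothing.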


The data held in the journal consists of copies of file system metadata, such as inode blocks, the tables (bitmaps) of used inodes and blocks, directory data blocks and some more. A journal can also be external, on a different block device, for example on a different disk (ext4magic supports this, but it is still untested). Mostly, however, the journal is an invisible file at inode number 8 with a fixed file size. A typical and frequently encountered size is 128 MByte for the journal of a medium or large file system with a block size of 4096 bytes.

This journal has a small administrative header (1 or 2 blocks), and the remaining blocks are used as a ring buffer for the journal data. In a 128 MByte journal with a block size of 4 KByte, 32767 blocks are available for journal data.
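The block count follows from simple arithmetic. A sketch, assuming a header of exactly one block (the text allows 1 or 2); `next_journal_block` is a hypothetical helper showing how a write position wraps around in the ring buffer:

```python
JOURNAL_SIZE = 128 * 1024 * 1024    # 128 MByte journal file (inode 8)
BLOCK_SIZE = 4096                   # 4 KByte file system block size
HEADER_BLOCKS = 1                   # assumption: a one-block administrative header

total_blocks = JOURNAL_SIZE // BLOCK_SIZE      # blocks in the journal file
data_blocks = total_blocks - HEADER_BLOCKS     # blocks left for the ring buffer
FIRST, LAST = HEADER_BLOCKS, total_blocks - 1  # journal block numbers of the ring


def next_journal_block(pos):
    """Hypothetical helper: the block after `pos`, wrapping from LAST back to FIRST."""
    return FIRST if pos == LAST else pos + 1


print(total_blocks, data_blocks)   # 32768 32767
print(next_journal_block(10))      # 11
print(next_journal_block(LAST))    # 1
```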



What is a transaction

Depending on how many changes are pending, the changed file system blocks are packed together into a kind of virtual package and written sequentially to the journal. Such a package is called a transaction and has a sequence number in the journal. The first block of a transaction is a table stating which journal block in this transaction holds a copy of which file system block. A transaction can contain just one block copy or many, depending on how much has just changed in the file system.
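The table at the start of a transaction can be modelled as a list of file system block numbers whose copies follow the table block in order. A simplified sketch; the class and field names are hypothetical, and details of the real on-disk format (per-entry flags, a closing commit block, tables spread over several blocks for large transactions, wrap-around at the end of the ring) are ignored:

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """Simplified transaction: a table block followed by the block copies it lists."""
    tid: int          # transaction (sequence) number
    table_block: int  # journal block holding the table
    fs_blocks: list   # table entries, in the order the copies follow the table

    def rows(self):
        """Yield (FS-Block, Journal, Transact) tuples, one per block copy."""
        for i, fs_block in enumerate(self.fs_blocks, start=1):
            yield fs_block, self.table_block + i, self.tid


# Values chosen to resemble the first rows of the ext4magic listing below.
t = Transaction(tid=160836, table_block=1, fs_blocks=[528253, 526364, 528252])
for row in t.rows():
    print(row)
# (528253, 2, 160836)
# (526364, 3, 160836)
# (528252, 4, 160836)
```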

ext4magic can display the contents of the journal together with the transaction numbers:

 ROBI@LINUX:~ # ext4magic /dev/sda3 -T -x  | head -15
Filesystem in use: /dev/sda3

Using internal Journal at Inode 8

Found 24318 copy of Filesystemblock in Journal
FS-Block         Journal        Transact        Time in sec     Time of Transaction
      528253           2          160836        1274608139      Sun May 23 11:48:59 2010
      526364           3          160836        1274608142      Sun May 23 11:49:02 2010
      528252           4          160836        1274608144      Sun May 23 11:49:04 2010
      530214           5          160836        1274608139      Sun May 23 11:48:59 2010
      528644           6          160836        1274608143      Sun May 23 11:49:03 2010
      529252           7          160836        1274608139      Sun May 23 11:48:59 2010
      528109           8          160836        1274608139      Sun May 23 11:48:59 2010
      527224           9          160836        1274608139      Sun May 23 11:48:59 2010
      531243          10          160836        1274608139      Sun May 23 11:48:59 2010

The output is cut off here. The journal holds over 32000 blocks (this run found 24318 block copies), so the full list is long.

The meaning of the columns
  1. Column: the file system block number of which this is a copy

  2. Column: the journal block number of this block copy

  3. Column: the transaction number for this copy

  4. Column: (only for inode blocks) the largest timestamp of any inode in this block

  5. Column: the time from column 4 in human-readable form

The time columns are only shown when the optional "-x" option is used.
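To post-process such a listing, the data rows can be parsed and grouped by transaction. A minimal sketch, assuming the whitespace-separated column layout shown above with "-x" (rows without the time column are skipped); `parse_listing` and `by_transaction` are hypothetical helpers, not part of ext4magic. The conversion prints UTC; the 2-hour difference from the listing's 11:49:04 presumably reflects the local time zone of the machine that produced it.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import NamedTuple


class Copy(NamedTuple):
    fs_block: int       # column 1
    journal_block: int  # column 2
    transaction: int    # column 3
    time_sec: int       # column 4 (only present with -x)


def parse_listing(text):
    """Collect the data rows of an `ext4magic -T -x` listing; other lines are skipped."""
    copies = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and all(f.isdigit() for f in fields[:4]):
            copies.append(Copy(*(int(f) for f in fields[:4])))
    return copies


def by_transaction(copies):
    """Group block copies by transaction number, lowest (oldest) first."""
    groups = defaultdict(list)
    for c in copies:
        groups[c.transaction].append(c)
    return dict(sorted(groups.items()))


SAMPLE = """\
FS-Block         Journal        Transact        Time in sec     Time of Transaction
      528253           2          160836        1274608139      Sun May 23 11:48:59 2010
      526364           3          160836        1274608142      Sun May 23 11:49:02 2010
      528252           4          160836        1274608144      Sun May 23 11:49:04 2010
"""

for tid, group in by_transaction(parse_listing(SAMPLE)).items():
    newest = max(c.time_sec for c in group)
    print(tid, len(group), datetime.fromtimestamp(newest, tz=timezone.utc))
# 160836 3 2010-05-23 09:49:04+00:00
```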



ext4magic can also search selectively for specific block copies, for example all copies of a specific inode or of the inode of a specific file:

ROBI@LINUX:~ # ext4magic /dev/sda3 -T -x  -f etc/passwd
Filesystem in use: /dev/sda3

Using internal Journal at Inode 8
Inode found "etc/passwd"   565
Inode 565 is at group 0, block 1092, offset 1024

Transactions of Filesystemblock 1092 in Journal
FS-Block         Journal        Transact        Time in sec     Time of Transaction
        1092         534          160841        1274608166      Sun May 23 11:49:26 2010
        1092         611          160842        1274608168      Sun May 23 11:49:28 2010
        1092        9282          160664        1274559675      Sat May 22 22:21:15 2010
        1092        9468          160673        1274559723      Sat May 22 22:22:03 2010
        1092        9922          160716        1274560203      Sat May 22 22:30:03 2010
        1092       21004          140650        1272749403      Sat May  1 23:30:03 2010
        1092       21621          140659        1272749466      Sat May  1 23:31:06 2010
        1092       21697          140663        1272749467      Sat May  1 23:31:07 2010
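The line "Inode 565 is at group 0, block 1092, offset 1024" follows from the usual ext2/3/4 inode table layout. A small sketch of that calculation; the function is a hypothetical helper, 256-byte inodes and 8192 inodes per group are assumptions (any group size above 564 gives the same answer here), and the inode table start at block 1057 is not from the output but back-calculated so the result matches it:

```python
def inode_location(ino, inodes_per_group, inode_size, block_size, inode_table_start):
    """Return (group, block, byte offset) of inode `ino` in its group's inode table.

    `inode_table_start` must be the first block of that group's inode table
    (read from the group descriptor on a real file system).
    """
    index = ino - 1                                   # inode numbers start at 1
    group = index // inodes_per_group
    byte_pos = (index % inodes_per_group) * inode_size
    return group, inode_table_start + byte_pos // block_size, byte_pos % block_size


# Assumed values: 8192 inodes per group, 256-byte inodes, 4096-byte blocks,
# inode table of group 0 starting at block 1057 (back-calculated, see above).
print(inode_location(565, 8192, 256, 4096, 1057))   # (0, 1092, 1024)
```

Every row in the listing above is a copy of block 1092 and therefore holds some older or newer state of inode 565. The row with the highest transaction number (160842) is the newest, even though journal block 21697 lies further into the ring: the journal block number says nothing about age.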





How much data is in such a journal

In theory, assuming a 4 KByte block size, a 128 MByte journal and an inode size of 256 bytes, the journal could hold up to about 500000 inode copies. A 24 GByte ext3/4 file system created with the default values already has three times that many inodes. However, the inodes that exist in an ext3/4 file system and the inodes actually in use are two different things: the number of files determines how many inodes are really used. Also, the journal does not hold only inode blocks; other administrative blocks of the file system have copies there too. And often many copies of the inode of one particular file exist, for example if the file has been changed many times.
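The estimate can be checked with a few lines of arithmetic. A sketch, assuming the mke2fs default inode ratio of one inode per 16384 bytes (this value is not stated in the text); the upper bound ignores the journal header, the transaction tables and all non-inode blocks, so the real number of inode copies is always lower:

```python
JOURNAL_SIZE = 128 * 1024**2   # 128 MByte journal
BLOCK_SIZE = 4096              # 4 KByte blocks
INODE_SIZE = 256               # bytes per on-disk inode
FS_SIZE = 24 * 1024**3         # 24 GByte file system
INODE_RATIO = 16384            # assumption: mke2fs default, one inode per 16 KByte

inodes_per_block = BLOCK_SIZE // INODE_SIZE       # inodes in one inode table block
max_inode_copies = JOURNAL_SIZE // INODE_SIZE     # upper bound, "up to 500000"
fs_inodes = FS_SIZE // INODE_RATIO                # inodes created by mke2fs

print(inodes_per_block, max_inode_copies, fs_inodes, fs_inodes // max_inode_copies)
# 16 524288 1572864 3
```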

The journal is used as a ring buffer: when no more space is available in the journal, the oldest data is overwritten, and so on.
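The ring-buffer behaviour can be shown with a toy journal of only four slots. A minimal sketch; `write_ring` is a hypothetical helper and each slot simply stores the number of the transaction whose block copy it holds:

```python
def write_ring(slots, pos, tids):
    """Store one block copy per transaction id, wrapping to slot 0 at the end."""
    for tid in tids:
        slots[pos] = tid               # overwrites whatever was there (the oldest data)
        pos = (pos + 1) % len(slots)
    return pos                         # next write position; once full, the oldest copy


slots = [None] * 4                     # a toy journal with room for four block copies
pos = write_ring(slots, 0, [1, 2, 3, 4, 5, 6])
print(slots, pos)                      # [5, 6, 3, 4] 2
```

The copies from transactions 1 and 2 are gone; the oldest surviving copy (transaction 3) sits exactly where the next write will land.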



What triggers the creation of inode copies

Any change to a file causes a copy of the inode block containing that file's inode to be written to the journal. Such changes include, for example: creating or deleting files, modifying and editing content, renaming or moving, and changing the owner or permissions. But that's not all.
If the file system is used with the default options, then every read of a file can change the atime timestamp of that file. This changes the inode data, so a copy of the inode block is written to the journal as well. And that adds up to a lot: every file and every directory that is used must be read. Many programs use a lot of configuration files, or need and use a variety of other programs and files, and each file touched changes its atime. Thus many inode block copies are written to the journal without the user intentionally causing it. For example, the command "find" recursively reads all directory inodes; a backup reads all files of the file system and thus produces a current copy of every used inode in the journal; "grep" reads all files in a directory; or a background process automatically indexes all the multimedia files in a user's home directory, and so on. When an inode changes, the entire inode block is always written (a block typically holds 16 inodes). Thus the journal always contains many inode copies from the areas of the file system that were visited.
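Because the whole block is copied, a single atime update puts the current state of all inodes in that block into the journal. A hypothetical helper that lists these neighbours, under the assumptions named in its docstring:

```python
def inodes_sharing_block(ino, inodes_per_block=16):
    """Inode numbers stored in the same inode table block as `ino`.

    Assumes 16 inodes per block (4096-byte blocks, 256-byte inodes) and an
    inodes-per-group value that is a multiple of 16, so no block straddles groups.
    """
    first = (ino - 1) // inodes_per_block * inodes_per_block + 1
    return list(range(first, first + inodes_per_block))


block = inodes_sharing_block(565)         # inode 565 is "etc/passwd" in the example above
print(block[0], block[-1], len(block))    # 561 576 16
```

So an atime update of etc/passwd would also put copies of inodes 561 to 576 into the journal, whether or not those files were touched.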



Small example

Test setup: the data of a backup of a newly installed openSUSE system (over 100000 files in 6000 directories, installation size 3.5 GByte), written to a new 130 GByte ext4 file system with 4 KByte block size and 128 MByte journal size.



Holes in the journal

So far we have always assumed that the journal operates as a ring buffer and overwrites only the oldest data. That is only partly true: it holds only as long as the file system stays mounted.
After a new mount, writing does not continue at the last position in the journal; instead, the journal is always written from the beginning. This overwrites journal data even though older data is still present in the middle or at the end of the journal. If a system is mounted or rebooted at short intervals and only a little data is written to the journal during each mount, then only the blocks at the beginning of the journal are ever used, and each mount overwrites the data from the previous one. The end of the journal then holds very old journal data, while for a longer span of time no journal data is available at all. This creates holes in time and fragments the usable journal data, so the journal's capacity for useful, related information is low. ext4magic then cannot find all files and directories from the last few days, but can recover very old inode copies (often with wrong file content, though).
This is made worse by the fact that if the first block of an old transaction is overwritten, all blocks of that transaction become invalid: it is no longer possible to determine which file system blocks they are copies of. When large transactions are partly overwritten, many old journal blocks become useless.
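Both effects can be reproduced in a toy simulation. A sketch under the behaviour described above (every mount starts writing at the first journal block); the helpers are hypothetical, the journal has only 10 slots, and each transaction is one table block followed by its block copies:

```python
def mount_session(slots, transactions):
    """Write transactions as one table block plus n copies each.

    As described above, every new mount starts writing at slot 0.
    `transactions` is a list of (transaction id, number of block copies).
    """
    pos = 0
    for tid, n_copies in transactions:
        slots[pos % len(slots)] = (tid, "table")
        for _ in range(n_copies):
            pos += 1
            slots[pos % len(slots)] = (tid, "copy")
        pos += 1


def usable_copies(slots):
    """A copy is usable only while the table block of its transaction survives."""
    with_table = {s[0] for s in slots if s and s[1] == "table"}
    return [s[0] for s in slots if s and s[1] == "copy" and s[0] in with_table]


slots = [None] * 10                      # toy journal with 10 blocks
mount_session(slots, [(1, 3), (2, 4)])   # long session: slots 0-8
mount_session(slots, [(3, 1)])           # reboot: overwrites slots 0-1
mount_session(slots, [(4, 1)])           # reboot again: overwrites slots 0-1
print(usable_copies(slots))              # [4, 2, 2, 2, 2]
```

Transaction 3 is gone entirely (a hole in time), transaction 1 still has two copies in slots 2 and 3 but they are useless because its table was overwritten, while the older transaction 2 survives intact.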





Nice to know about the journal



What kind of journal data can be processed by ext4magic




