In the post that explains the HDFS architecture, we saw that the HDFS namespace is stored and maintained by the NameNode.
What is HDFS Namespace?
The namespace is the hierarchy of directories, files and blocks in HDFS. It supports file system operations such as creation, modification, deletion and listing of files and directories.
One of the important features of HDFS is its high availability, its fault tolerance and its ability to return to a working state quickly after a failure. But how does that happen? How does HDFS make itself resilient to failures and bounce back? Let us look at what happens internally that makes it so good at handling failures.
The NameNode maintains two important files that come in handy in the event of corruption or failure.
- EditLog – A transaction log maintained by the NameNode that records every change made to the file system metadata. For example, if a new directory is created, an entry for it is added to the EditLog. If a file is created, renamed or deleted, an entry for that change is added as well. Changing the replication factor of a file also causes a new record to be inserted. In short, the EditLog contains an entry for every change made to the file system namespace. The NameNode stores the EditLog as a file in its local host OS file system.
- FsImage – The entire file system namespace, including the mapping of blocks to files and the file system properties, is stored in a file called the FsImage. The FsImage is also stored as a file in the NameNode’s local file system.
The NameNode keeps an image of the entire file system namespace and file Blockmap in memory. When the NameNode starts up, it reads the FsImage and EditLog from disk, applies all the transactions from the EditLog to the in-memory representation of the FsImage, and flushes out this new version into a new FsImage on disk. It can then truncate the old EditLog because its transactions have been applied to the persistent FsImage. This process is called a checkpoint.
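The checkpoint steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how the NameNode is implemented: the namespace is a flat dictionary of paths, the FsImage and EditLog are JSON files, and the names `apply_edit` and `checkpoint` as well as the operation names are made up for this example.

```python
import json
import os


def apply_edit(namespace, edit):
    """Apply one logged transaction to the in-memory namespace (a flat dict)."""
    op = edit["op"]
    if op == "mkdir":
        namespace[edit["path"]] = {"type": "dir"}
    elif op == "create":
        namespace[edit["path"]] = {"type": "file", "replication": edit["replication"]}
    elif op == "rename":
        # Simplification: only the single entry is moved, not a whole subtree.
        namespace[edit["dst"]] = namespace.pop(edit["src"])
    elif op == "delete":
        del namespace[edit["path"]]
    elif op == "set_replication":
        namespace[edit["path"]]["replication"] = edit["replication"]
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage_path, editlog_path):
    """Merge the EditLog into the FsImage and return the resulting namespace."""
    # 1. Load the last persisted image (an empty namespace if none exists yet).
    namespace = {}
    if os.path.exists(fsimage_path):
        with open(fsimage_path, encoding="utf-8") as f:
            namespace = json.load(f)

    # 2. Replay every logged transaction in the order it was recorded.
    if os.path.exists(editlog_path):
        with open(editlog_path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    apply_edit(namespace, json.loads(line))

    # 3. Flush the merged image to a temporary file, then swap it into place
    #    so a crash never leaves a half-written image behind.
    tmp_path = fsimage_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(namespace, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, fsimage_path)

    # 4. The logged transactions are now part of the image: truncate the log.
    open(editlog_path, "w").close()
    return namespace
```

Writing the new image before truncating the log matters: if the process dies between steps 3 and 4, the old log is still there and can be handled on the next start, whereas truncating first could lose transactions.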
Apart from these, there is a report called the Blockreport, which the DataNode generates and sends to the NameNode. When a DataNode starts up, it scans its local file system, generates a list of all the HDFS data blocks that correspond to its local files and sends this list to the NameNode. This list is the Blockreport.
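As a rough sketch of the scan described above, the function below walks a DataNode storage directory and collects one entry per block file. It assumes block files are named `blk_<id>` with companion `blk_<id>_<generation stamp>.meta` checksum files; the function name `build_block_report` and the shape of the report entries are hypothetical, and a real Blockreport carries more information than this.

```python
import os
import re

# Matches block data files such as "blk_1073741825" but not the
# companion "blk_1073741825_1001.meta" checksum files.
BLOCK_FILE = re.compile(r"^blk_(-?\d+)$")


def build_block_report(data_dir):
    """Scan data_dir recursively and list every block file found there."""
    report = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            match = BLOCK_FILE.match(name)
            if match:
                path = os.path.join(root, name)
                report.append({
                    "block_id": int(match.group(1)),
                    "length": os.path.getsize(path),
                })
    # Sort so the report is stable regardless of directory walk order.
    return sorted(report, key=lambda block: block["block_id"])
```

The NameNode can then combine the reports from all DataNodes to work out where the replicas of each block live.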
The EditLog and FsImage are very important files, and they are what brings the system back when a NameNode fails. As mentioned earlier, these files are stored on the NameNode, so corruption of these files or failure of the NameNode can bring the entire system to a halt. For this reason, the NameNode can be configured to maintain multiple copies of both files. Copies are also stored on the backup NameNode, to be used if the NameNode fails. Any update to either the FsImage or the EditLog causes each of the copies to be updated synchronously. When the NameNode restarts after a failure, it uses the latest consistent FsImage and EditLog.
Along with these, recent releases of HDFS support HDFS Snapshots.
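To make "updated synchronously" concrete, here is a minimal sketch that appends one record to every copy of an edit log and returns only after each copy has been forced to disk. The function `log_edit` and the one-record-per-line format are assumptions for illustration. In a real cluster the redundant locations are typically listed in the `dfs.namenode.name.dir` property, which accepts a comma-delimited list of directories.

```python
import os


def log_edit(editlog_paths, record):
    """Append record to every EditLog copy before reporting success."""
    line = record.rstrip("\n") + "\n"
    for path in editlog_paths:
        with open(path, "a", encoding="utf-8") as f:
            f.write(line)
            f.flush()
            # Force the bytes to stable storage before moving to the next copy.
            os.fsync(f.fileno())
```

The trade-off is write latency: each metadata change waits for the slowest copy, which is the price paid for every copy being usable after a failure.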
- HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system.
- Some common use cases of snapshots are data backup, protection against user errors and disaster recovery.
- Snapshot creation is instantaneous: the cost is O(1) excluding the inode lookup time.
- Additional memory is used only when modifications are made relative to a snapshot: memory usage is O(M), where M is the number of modified files/directories.
- Blocks in DataNodes are not copied: the snapshot files record the block list and the file size. There is no data copying.
- Snapshots do not adversely affect regular HDFS operations: modifications are recorded in reverse chronological order so that the current data can be accessed directly. The snapshot data is computed by subtracting the modifications from the current data.
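The "subtract the modifications from the current data" idea can be illustrated with a toy model. This is a sketch of the general technique under simplifying assumptions, not the HDFS implementation: the class `SnapshottableDir` and its methods are hypothetical, a directory is a flat dictionary, and each snapshot only remembers the old value of entries changed after it was taken.

```python
_ABSENT = object()  # marks "this entry did not exist at snapshot time"


class SnapshottableDir:
    def __init__(self):
        self.current = {}   # live data, always read directly
        self._diffs = []    # (snapshot name, {key: old value}), oldest first

    def create_snapshot(self, name):
        # O(1): nothing is copied, only an empty diff is started.
        self._diffs.append((name, {}))

    def _remember(self, key):
        # Record the pre-change value once per snapshot, so memory grows
        # with the number of modified entries, not with the directory size.
        if self._diffs:
            diff = self._diffs[-1][1]
            if key not in diff:
                diff[key] = self.current.get(key, _ABSENT)

    def write(self, key, value):
        self._remember(key)
        self.current[key] = value

    def delete(self, key):
        self._remember(key)
        self.current.pop(key, None)

    def read_snapshot(self, name):
        # Start from the current data and undo modifications, newest first,
        # until the requested snapshot has been reached.
        view = dict(self.current)
        for snap_name, diff in reversed(self._diffs):
            for key, old in diff.items():
                if old is _ABSENT:
                    view.pop(key, None)
                else:
                    view[key] = old
            if snap_name == name:
                return view
        raise KeyError(name)
```

Reads of the live directory never touch the diffs, which is why regular operations are not slowed down; only reading a snapshot pays the cost of undoing the recorded changes.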
For more details on Snapshots, I suggest going through the HDFS Snapshots page on the Apache website.