OUTLINE
- Basic Features: HDFS
- Architecture
- Data Organisation
- Highly fault-tolerant
- High throughput
- Suitable for applications with large data sets
- Streaming access to file system data
- Can be built out of commodity hardware
- An HDFS instance may consist of thousands of server machines, each storing part of the file system’s data.
- Since an instance has a huge number of components, and each component has a non-trivial probability of failure, some component is practically always non-functional
- Detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS
- Streaming data access
- Applications need streaming access to data
- Batch processing rather than interactive user access
- Large data sets and files: gigabytes to terabytes in size
- High aggregate data bandwidth
- Scale to hundreds of nodes in a cluster
- Tens of millions of files in a single instance
Namenode and Datanodes
- Master/slave architecture
- HDFS cluster consists of a single Namenode, a master server that manages the file system namespace and regulates access to files by clients
- There are a number of DataNodes, usually one per node in the cluster
- The DataNodes manage storage attached to the nodes that they run on
- HDFS exposes a file system namespace and allows user data to be stored in files
- A file is split into one or more blocks, and these blocks are stored in a set of DataNodes
- DataNodes serve read and write requests, and perform block creation, deletion, and replication upon instruction from the Namenode
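As a minimal sketch of how a client interacts with both roles, assuming Hadoop’s Java FileSystem API and a hypothetical Namenode address hdfs://namenode:9000: the open() call consults the Namenode for block locations, and the bytes themselves are streamed from the Datanodes.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical Namenode address; substitute your cluster's.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

        // open() asks the Namenode which Datanodes hold the file's blocks;
        // the data itself is then read directly from those Datanodes.
        try (FSDataInputStream in = fs.open(new Path("/data/input.txt"))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                System.out.write(buf, 0, n);
            }
        }
    }
}
```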
- Hierarchical file system with directories and files
- Create, remove, move, rename, etc.
- The Namenode maintains the file system namespace
- Any change to the file system metadata is recorded by the Namenode.
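These namespace operations are exposed directly in the Java FileSystem API; a small sketch (the paths are hypothetical), where each call is a metadata transaction handled by the Namenode:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceOpsSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Pure metadata operations: each is recorded by the Namenode;
        // no Datanode is contacted until file data is written or read.
        fs.mkdirs(new Path("/user/demo/logs"));           // create a directory
        fs.rename(new Path("/user/demo/logs"),            // move / rename
                  new Path("/user/demo/archive"));
        fs.delete(new Path("/user/demo/archive"), true);  // remove, recursively
    }
}
```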
- An application can specify the number of replicas of a file it needs: the replication factor of the file. This information is stored by the Namenode (see the sketch below).
- The placement of the replicas is critical to HDFS reliability and performance.
- Optimizing replica placement distinguishes HDFS from other distributed file systems.
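A short sketch of how an application sets the replication factor, using the standard dfs.replication setting and FileSystem.setReplication() (the file path is hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 3);  // default for files this client creates
        FileSystem fs = FileSystem.get(conf);

        // The factor can also be changed per file after creation; the Namenode
        // records the new value and schedules re-replication as needed.
        fs.setReplication(new Path("/data/important.dat"), (short) 5);
    }
}
```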
- Rack-aware replica placement:
- Goal: improve reliability, availability and network bandwidth utilization
- Research topic
- Many racks; communication between racks goes through switches
- Network bandwidth between machines on the same rack is greater than between machines on different racks.
- Namenode determines the rack id for each DataNode.
- A simple policy places replicas on unique racks
- Simple, but non-optimal
- Writes are expensive: each block must be transferred to multiple racks
- For the common case, the replication factor is 3
- Another research topic?
- Replicas are placed: one on a node in the local rack, one on a different node in the local rack, and one on a node in a different rack (a toy placement sketch follows below).
- One third of replicas are then on one node, two thirds of replicas on one rack, and the remaining third distributed evenly across the remaining racks.
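A toy sketch of the 3-replica placement just described, with invented Node/rack types standing in for cluster state (the real Namenode also weighs load, free space, and failures):

```java
import java.util.ArrayList;
import java.util.List;

public class PlacementSketch {
    // Invented stand-in for a Datanode plus the rack id the Namenode assigned it.
    record Node(String name, String rackId) {}

    static List<Node> chooseTargets(Node writer, List<Node> cluster) {
        List<Node> targets = new ArrayList<>();
        targets.add(writer);      // replica 1: the local node
        for (Node n : cluster) {  // replica 2: a different node, same (local) rack
            if (n.rackId().equals(writer.rackId()) && !n.equals(writer)) {
                targets.add(n);
                break;
            }
        }
        for (Node n : cluster) {  // replica 3: a node on a different rack
            if (!n.rackId().equals(writer.rackId())) {
                targets.add(n);
                break;
            }
        }
        return targets;
    }
}
```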
- The HDFS namespace is stored by the Namenode.
- The Namenode uses a transaction log called the EditLog to record every change that occurs to the file system metadata.
- For example, creating a new file.
- Change replication factor of a file
- EditLog is stored in the Namenode’s local filesystem
- The entire file system namespace, including the mapping of blocks to files and file system properties, is stored in a file called FsImage, also kept in the Namenode’s local file system
- The Namenode keeps an image of the entire file system namespace and file Blockmap in memory.
- 4 GB of local RAM is sufficient to support these data structures, even when they represent a huge number of files and directories.
- When the Namenode starts up, it reads the FsImage and EditLog from its local file system, applies the EditLog transactions to the in-memory FsImage, and then flushes the updated FsImage back to disk as a checkpoint.
- Checkpointing is done periodically, so that the system can recover to the last checkpointed state in case of a crash (a toy replay sketch follows below).
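The recovery idea can be illustrated with a toy replay loop; the record formats here are invented (the real FsImage and EditLog are binary and much richer), but the structure of checkpoint = image + replayed log is the same:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CheckpointSketch {
    // Invented edit record, e.g. ("create", "/a/b", "3") or ("delete", "/a/b", "").
    record Edit(String op, String path, String arg) {}

    // Namespace modeled as path -> replication factor, standing in for full metadata.
    static Map<String, Integer> recover(Map<String, Integer> fsImage, List<Edit> editLog) {
        Map<String, Integer> ns = new HashMap<>(fsImage);  // start from the checkpoint
        for (Edit e : editLog) {                            // replay logged transactions
            switch (e.op()) {
                case "create", "setReplication" -> ns.put(e.path(), Integer.parseInt(e.arg()));
                case "delete" -> ns.remove(e.path());
            }
        }
        return ns;  // flushed to disk as the new FsImage; the EditLog can then be truncated
    }
}
```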
- A Datanode stores data in files in its local file system.
- A Datanode has no knowledge of HDFS files
- It stores each block of HDFS data in a separate file.
- The Datanode does not create all files in the same directory
- Instead, it uses heuristics to determine the optimal number of files per directory and creates subdirectories appropriately:
- Research issue?
- When a Datanode starts up, it scans its local file system, generates a list of all the HDFS data blocks it holds, and sends this report to the Namenode: the Blockreport.
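A toy sketch of building such a Blockreport by walking the Datanode’s local directories; the blk_ file-name prefix matches how HDFS names block files, but the layout and storage path here are simplified/hypothetical:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class BlockReportSketch {
    static void scan(File dir, List<String> blocks) {
        File[] entries = dir.listFiles();
        if (entries == null) return;
        for (File f : entries) {
            if (f.isDirectory()) {
                scan(f, blocks);                  // descend into subdirectories
            } else if (f.getName().startsWith("blk_")) {
                blocks.add(f.getName());          // one local file per HDFS block
            }
        }
    }

    public static void main(String[] args) {
        List<String> report = new ArrayList<>();
        scan(new File("/data/dfs/dn"), report);   // hypothetical local storage dir
        System.out.println("Blockreport: " + report.size() + " blocks");
    }
}
```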
- HDFS supports write-once-read-many semantics, with reads at streaming speeds.
- A typical block size is 64 MB (or even 128 MB).
- A file is chopped into 64 MB chunks and the chunks are stored as blocks (see the example below).
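The block size can be chosen per file at creation time; a sketch using the FileSystem.create() overload that takes a block size (the path and sizes are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        long blockSize = 64L * 1024 * 1024;  // 64 MB, as above
        short replication = 3;
        int bufferSize = 4096;

        // A 200 MB file written this way is chopped into ceil(200/64) = 4 blocks.
        try (FSDataOutputStream out = fs.create(
                new Path("/data/big.dat"), true, bufferSize, replication, blockSize)) {
            out.writeBytes("hello hdfs\n");
        }
    }
}
```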
- A client request to create a file does not reach the Namenode immediately.
- The HDFS client caches the file data in a temporary local file. When the accumulated data reaches an HDFS block size, the client contacts the Namenode.
- The Namenode inserts the file name into its hierarchy and allocates a data block for it.
- The Namenode responds to the client with the identity of the Datanodes that will hold the replicas and the destination data block.
- The client then flushes the block from its local temporary file.
- It flushes the block in small pieces (4 KB) to the first replica, which in turn copies each piece to the next replica, and so on.
- Thus data is pipelined from one Datanode to the next (a toy pipeline sketch follows below).
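A toy model of that pipeline; the Replica interface and Datanode class are invented for illustration (real Datanodes stream block data over TCP using Hadoop’s internal transfer protocol), but the piece-by-piece forwarding is the idea described above:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

public class PipelineSketch {
    interface Replica {
        void receive(byte[] piece, int len, List<Replica> downstream);
    }

    static class Datanode implements Replica {
        public void receive(byte[] piece, int len, List<Replica> downstream) {
            store(piece, len);                          // persist the piece locally
            if (!downstream.isEmpty()) {                // forward along the chain
                downstream.get(0)
                          .receive(piece, len, downstream.subList(1, downstream.size()));
            }
        }
        void store(byte[] piece, int len) { /* append to the local block file */ }
    }

    static void write(InputStream data, List<Replica> pipeline) throws IOException {
        byte[] piece = new byte[4096];                  // 4 KB pieces, as above
        int n;
        while ((n = data.read(piece)) > 0) {
            pipeline.get(0).receive(piece, n, pipeline.subList(1, pipeline.size()));
        }
    }

    public static void main(String[] args) throws IOException {
        List<Replica> pipeline = List.of(new Datanode(), new Datanode(), new Datanode());
        write(new ByteArrayInputStream(new byte[10_000]), pipeline);
    }
}
```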
References
The Hadoop Distributed File System: Architecture and Design, The Apache Software Foundation.