Lecture Administration and visualization: Chapter 2.2 - Hadoop distributed file system (HDFS)

The lecture "Administration and visualization: Chapter 2.2 - Hadoop distributed file system (HDFS)" covers an overview of HDFS, the main design principles of HDFS, the HDFS architecture, and the functions of a Namenode.

Chapter 2: Hadoop distributed file system (HDFS)

Overview of HDFS
Provides inexpensive and reliable storage for massive amounts of data.
Designed for big files: file sizes from 100 MB to several TB.
Write once, read many times; files can only be appended to.
Runs on commodity hardware.
Hierarchical, UNIX-style file system (e.g., /hust/soict).
UNIX-style file ownership and permissions.

HDFS main design principles
I/O pattern: append only → reduces synchronization.
Data distribution: each file is split into big chunks (64 MB) → reduces metadata size and network communication.
Data replication: each chunk is usually replicated on 3 different nodes.
Fault tolerance: chunks held by a failed Datanode are re-replicated; the Namenode is backed by a Secondary Namenode, or by Standby and Active Namenodes.

HDFS Architecture
Master/slave architecture.
HDFS master (Namenode): manages the namespace and metadata, and monitors the Datanodes.
HDFS slaves (Datanodes): handle reads and writes of the actual data chunks; chunks are stored as local files in the local file systems.

Functions of a Namenode
Manages the file system namespace: maps a file name to a set of blocks, and maps each block to the Datanodes where it resides.
Cluster configuration management.
Replication engine for blocks.

Namenode metadata
Metadata in memory: the entire metadata is kept in main memory; there is no demand paging of metadata.
Types of metadata: the list of files, the list of blocks for each file, the list of Datanodes for each block, and file attributes (e.g., creation time, replication factor).
A transaction log: records file creations, file deletions, etc.

Datanode
A block server: stores data in the local file system (e.g., ext3), stores the metadata of each block (e.g., CRC), and serves data and metadata to clients.
Block report: periodically sends a report of all existing blocks to the Namenode.
Facilitates pipelining of data: forwards data to other specified Datanodes.

Heartbeat
Datanodes send a heartbeat to the Namenode once every 3 seconds.
The Namenode uses heartbeats to detect Datanode failure.

Data replication
Chunk placement, current strategy:
One replica on the local node.
Second replica on a remote rack.
Third ...
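
The Namenode functions and metadata listed above can be illustrated with a small in-memory model. This is a minimal sketch and not Hadoop's implementation. The class and method names (NamenodeMetadata, create_file, apply_block_report, locate) are hypothetical, block IDs are plain integers, and the transaction log is just a Python list. The model keeps the mappings from the slides (file name to attributes, file name to blocks, block to Datanodes) and computes how many 64 MB chunks a file needs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size, as in the lecture


def num_blocks(size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of chunks for a file; the last chunk may be only partially full."""
    return (size_bytes + block_size - 1) // block_size


@dataclass
class FileAttributes:
    creation_time: float
    replication: int = 3  # replication factor


@dataclass
class NamenodeMetadata:
    """Hypothetical in-memory model of Namenode metadata (not Hadoop's real classes)."""
    files: Dict[str, FileAttributes] = field(default_factory=dict)
    file_blocks: Dict[str, List[int]] = field(default_factory=dict)
    block_locations: Dict[int, Set[str]] = field(default_factory=dict)
    edit_log: List[Tuple[str, str]] = field(default_factory=list)
    next_block_id: int = 0

    def create_file(self, name: str, size_bytes: int, creation_time: float,
                    replication: int = 3) -> List[int]:
        if name in self.files:
            raise FileExistsError(name)
        count = num_blocks(size_bytes)
        block_ids = list(range(self.next_block_id, self.next_block_id + count))
        self.next_block_id += count
        self.files[name] = FileAttributes(creation_time, replication)
        self.file_blocks[name] = block_ids
        for block_id in block_ids:
            self.block_locations[block_id] = set()  # filled in by block reports
        self.edit_log.append(("create", name))  # transaction log entry
        return block_ids

    def delete_file(self, name: str) -> None:
        for block_id in self.file_blocks.pop(name):
            self.block_locations.pop(block_id, None)
        del self.files[name]
        self.edit_log.append(("delete", name))  # transaction log entry

    def apply_block_report(self, datanode: str, reported: List[int]) -> None:
        # A block report lists every block the Datanode holds, so it replaces
        # whatever was previously recorded for that Datanode.
        for holders in self.block_locations.values():
            holders.discard(datanode)
        for block_id in reported:
            if block_id in self.block_locations:  # unknown IDs ignored in this sketch
                self.block_locations[block_id].add(datanode)

    def locate(self, name: str) -> List[Tuple[int, List[str]]]:
        """File name -> its blocks -> the Datanodes holding each block."""
        return [(b, sorted(self.block_locations[b])) for b in self.file_blocks[name]]
```

For example, a 200 MB file is split into four chunks (three full 64 MB chunks plus one partial chunk), so create_file returns four block IDs. Their locations stay empty until Datanodes send block reports. Keeping everything in plain dictionaries mirrors the slide's point that the entire metadata lives in main memory.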
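
The Datanode's roles (block server, per-block CRC metadata, block reports, and pipelining) can be sketched in the same way. This is a hypothetical, simplified model. BlockStore and its methods are illustrative names, each block gets a single zlib.crc32 checksum, and pipelining forwards a whole block synchronously. Real Datanodes may differ in checksum granularity and stream data incrementally.

```python
import zlib
from typing import Dict, List, Tuple


class BlockStore:
    """Hypothetical Datanode block store: block bytes plus one CRC per block."""

    def __init__(self) -> None:
        self._blocks: Dict[int, Tuple[bytes, int]] = {}

    def write_block(self, block_id: int, data: bytes) -> None:
        # Store the data together with its checksum (the block's metadata).
        self._blocks[block_id] = (data, zlib.crc32(data))

    def read_block(self, block_id: int) -> bytes:
        # Verify the checksum before serving the data to a client.
        data, crc = self._blocks[block_id]
        if zlib.crc32(data) != crc:
            raise IOError(f"checksum mismatch in block {block_id}")
        return data

    def block_report(self) -> List[int]:
        # IDs of all blocks this Datanode currently holds.
        return sorted(self._blocks)

    def receive(self, block_id: int, data: bytes, downstream: List["BlockStore"]) -> None:
        # Pipelining: store locally, then forward to the next Datanode in the chain.
        self.write_block(block_id, data)
        if downstream:
            downstream[0].receive(block_id, data, downstream[1:])
```

With a pipeline of three stores, a client hands the block only to the first Datanode. Each Datanode then forwards it to the next, so the client does not have to send every replica itself.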
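
Heartbeat-based failure detection can be sketched together with the replication engine's job of finding chunks that need re-replication. The following assumptions apply. HeartbeatMonitor and under_replicated are hypothetical names. Time is passed in explicitly as seconds. The dead-node timeout of 10 missed heartbeats (30 seconds) is an illustrative value, not HDFS's configured default.

```python
from typing import Dict, Iterable, List, Set

HEARTBEAT_INTERVAL = 3.0  # seconds, as stated in the lecture


class HeartbeatMonitor:
    """Hypothetical Namenode-side failure detector based on heartbeats."""

    def __init__(self, timeout: float = 10 * HEARTBEAT_INTERVAL) -> None:
        # Assumed threshold: a Datanode silent for longer than this is dead.
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, datanode: str, now: float) -> None:
        # Record the time of the latest heartbeat from this Datanode.
        self.last_seen[datanode] = now

    def dead_datanodes(self, now: float) -> Set[str]:
        return {dn for dn, t in self.last_seen.items() if now - t > self.timeout}

    def live_datanodes(self, now: float) -> Set[str]:
        return set(self.last_seen) - self.dead_datanodes(now)


def under_replicated(block_locations: Dict[int, Set[str]], live: Iterable[str],
                     target: int = 3) -> List[int]:
    """Blocks with fewer than `target` replicas on live Datanodes (re-replication candidates)."""
    live = set(live)
    return sorted(b for b, holders in block_locations.items()
                  if len(holders & live) < target)
```

Passing `now` explicitly instead of reading the system clock keeps the logic deterministic and easy to test. A block returned by under_replicated would then be copied from one of its surviving replicas to another live Datanode.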
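
The first two rules of the chunk placement strategy can be sketched as a function over a map from each Datanode to its rack. This sketch assumes that the writing client runs on a Datanode and that the remote node is picked at random. place_first_two_replicas is a hypothetical name. The slide is cut off before the rule for the third replica, so that rule is deliberately not modeled.

```python
import random
from typing import Dict, List, Optional


def place_first_two_replicas(writer: str, rack_of: Dict[str, str],
                             rng: Optional[random.Random] = None) -> List[str]:
    """Pick Datanodes for the first two replicas of a new chunk.

    Replica 1 goes on the writer's local node; replica 2 goes on a node in a
    different (remote) rack. The third-replica rule is not modeled.
    """
    rng = rng or random.Random()
    if writer not in rack_of:
        raise ValueError(f"writer {writer!r} is not a known Datanode")
    local_rack = rack_of[writer]
    # Candidates for replica 2: every Datanode outside the writer's rack.
    remote = sorted(dn for dn, rack in rack_of.items() if rack != local_rack)
    if not remote:
        raise ValueError("no Datanode available on a remote rack")
    return [writer, rng.choice(remote)]
```

Placing the first replica on the local node avoids network traffic for the initial write. Placing the second replica on another rack lets the chunk survive the loss of an entire rack.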
