YADFS is a distributed file system designed to store, manage, and retrieve large files efficiently across multiple machines in a network.
The architecture of YADFS uses Name Nodes to manage data and Data Nodes to store it. The key components are:
- Name Node
  - Manages metadata about files and directories.
  - Maintains a namespace hierarchy and a file-to-block mapping.
  - Monitors the health and availability of Data Nodes.
  - Periodically pings each Data Node.
  - Data Nodes acknowledge each Name Node ping.
- Data Node
  - Handles the reading and writing of data.
  - Provides an API for read and write operations on data blocks.
  - Blocks are replicated across Data Nodes with a replication factor of 3.
- File system structure
  - Data is stored as files within folders.
  - At least one root folder exists.
  - Users can view a virtual tree of all folders and files.
- Data blocks
  - Data blocks are used to store and manage large files efficiently.
  - Each file is divided into fixed-size blocks that are distributed across Data Nodes.
  - Metadata tracks the location of each block.
- Fault tolerance
  - The system implements mechanisms to handle Data Node failures.
  - It maintains multiple replicas of each data block.
  - It detects failed nodes and redistributes their data blocks to healthy nodes.
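
The ping/acknowledge cycle and the redistribution of blocks after a failure could look roughly like the following. This is a minimal in-memory sketch, not the YADFS implementation: the class `NameNodeMonitor`, its method names, and the 10-second `HEARTBEAT_TIMEOUT` are assumptions made for illustration. Only the replication factor of 3 comes from the description above.

```python
import time

REPLICATION_FACTOR = 3    # from the design above
HEARTBEAT_TIMEOUT = 10.0  # seconds without an acknowledgement; assumed value


class NameNodeMonitor:
    """Hypothetical helper: tracks Data Node liveness and block placement."""

    def __init__(self):
        self.last_ack = {}         # node_id -> time of the last acknowledged ping
        self.block_locations = {}  # block_id -> set of node_ids holding a replica

    def record_ack(self, node_id, now=None):
        """Call this when a Data Node acknowledges a ping."""
        self.last_ack[node_id] = time.monotonic() if now is None else now

    def healthy_nodes(self, now=None):
        """Nodes whose last acknowledgement is recent enough."""
        now = time.monotonic() if now is None else now
        return {n for n, t in self.last_ack.items() if now - t <= HEARTBEAT_TIMEOUT}

    def re_replicate(self, now=None):
        """Forget replicas on failed nodes and plan copies to healthy nodes.

        Returns a list of (block_id, source_node, target_node) copy tasks.
        """
        healthy = self.healthy_nodes(now)
        tasks = []
        for block_id, holders in self.block_locations.items():
            holders &= healthy  # drop failed nodes from this block's replica set
            if not holders:
                continue  # no surviving replica, nothing to copy from
            missing = max(REPLICATION_FACTOR - len(holders), 0)
            source = min(holders)
            for target in sorted(healthy - holders)[:missing]:
                tasks.append((block_id, source, target))
                holders.add(target)
        return tasks


# Four nodes acknowledge at t=0; later only dn2, dn3, and dn4 keep answering.
monitor = NameNodeMonitor()
for node in ("dn1", "dn2", "dn3", "dn4"):
    monitor.record_ack(node, now=0.0)
monitor.block_locations["blk-1"] = {"dn1", "dn2", "dn3"}
for node in ("dn2", "dn3", "dn4"):
    monitor.record_ack(node, now=20.0)
tasks = monitor.re_replicate(now=25.0)
print(tasks)  # [('blk-1', 'dn2', 'dn4')]
```

In this sketch the Name Node only plans the copies. Actually moving the bytes between Data Nodes is left out.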
Develop a command line interface or web interface for interacting with YADFS, supporting the following actions:
- Create, delete, move, and copy directories and files.
- List files and directories within a directory.
- Traverse directories.
- Upload and download files from YADFS.
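
One possible shape for the command line interface is a single `yadfs` program with one subcommand per action. The sketch below uses Python's standard `argparse` module. The program name and the subcommand names (`mkdir`, `rm`, `mv`, `cp`, `ls`, `cd`, `put`, `get`) are assumptions chosen for illustration, and the parser only reads arguments without contacting any node.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per supported action (names are hypothetical)."""
    parser = argparse.ArgumentParser(prog="yadfs", description="YADFS client (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    # Single-path commands: create, delete, list, traverse.
    sub.add_parser("mkdir", help="create a directory").add_argument("path")
    sub.add_parser("rm", help="delete a file or directory").add_argument("path")
    sub.add_parser("ls", help="list a directory").add_argument("path", nargs="?", default="/")
    sub.add_parser("cd", help="change the current directory").add_argument("path")

    # Two-path commands: move and copy.
    for name, verb in (("mv", "move"), ("cp", "copy")):
        cmd = sub.add_parser(name, help=f"{verb} a file or directory")
        cmd.add_argument("src")
        cmd.add_argument("dst")

    # Transfer commands: upload and download.
    put = sub.add_parser("put", help="upload a local file to YADFS")
    put.add_argument("local_path")
    put.add_argument("remote_path")
    get = sub.add_parser("get", help="download a file from YADFS")
    get.add_argument("remote_path")
    get.add_argument("local_path")
    return parser


args = build_parser().parse_args(["put", "report.csv", "/data/report.csv"])
print(args.command, args.local_path, args.remote_path)  # put report.csv /data/report.csv
```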
Uploading a file involves the following steps:
- File Splitting: The client divides the file into fixed-size blocks.
- Block Creation: The client assigns a unique identifier to each block.
- Uploading Blocks: The client sends the data blocks to Data Nodes.
- Replication: The system creates replicas of each block for fault tolerance.
- Metadata Update: The client updates the metadata with the file information.
- Namespace Resolution: The client and the file system determine where each block is stored.
- Client Acknowledgment: The client receives acknowledgements from the Data Nodes.
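
The upload steps can be sketched in a few lines. Everything here is illustrative: the function names, the 4-byte `BLOCK_SIZE`, the round-robin placement policy, and the use of plain dictionaries in place of real Data Nodes and Name Node metadata are all assumptions, not details taken from YADFS.

```python
import uuid

BLOCK_SIZE = 4          # bytes; deliberately tiny so the example is easy to follow
REPLICATION_FACTOR = 3  # from the design above


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut the payload into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def upload(path, data, data_nodes, metadata):
    """Store `data` under `path`.

    data_nodes: dict node_id -> {block_id: bytes}, a stand-in for real Data Nodes
    metadata:   dict path -> ordered list of (block_id, [node_ids]), a stand-in
                for the Name Node's file-to-block mapping
    """
    node_ids = sorted(data_nodes)
    if len(node_ids) < REPLICATION_FACTOR:
        raise RuntimeError("not enough Data Nodes for the replication factor")
    entries = []
    for index, block in enumerate(split_into_blocks(data)):
        block_id = uuid.uuid4().hex  # unique identifier for this block
        # Round-robin placement: an assumed policy, chosen only for simplicity.
        targets = [node_ids[(index + r) % len(node_ids)] for r in range(REPLICATION_FACTOR)]
        for node_id in targets:
            # A completed write stands in for the Data Node's acknowledgement.
            data_nodes[node_id][block_id] = block
        entries.append((block_id, targets))
    metadata[path] = entries  # metadata update
    return entries


nodes = {"dn1": {}, "dn2": {}, "dn3": {}, "dn4": {}}
meta = {}
upload("/docs/hello.txt", b"hello world", nodes, meta)
print(len(meta["/docs/hello.txt"]))  # 3
```

The 11-byte payload yields three blocks, and each block is written to three distinct nodes.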
Downloading a file involves the following steps:
- Client Request: The client requests a file download from YADFS.
- Metadata Retrieval: The client retrieves the file information from the Name Node.
- Block Location Retrieval: The client learns the locations of the data blocks.
- Data Block Retrieval: The client requests the data blocks from the Data Nodes.
- Data Transfer: The Data Nodes transfer the blocks to the client.
- Reassembly: The client reassembles the blocks into the original file.
- File Completion Check: The client checks that all data blocks were retrieved successfully.
- Cleanup: The client may delete temporary data and close connections.
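
The download steps can be illustrated with a small sketch. The `download` function, the dictionary layouts, and the block identifiers below are hypothetical stand-ins for the Name Node metadata and the Data Nodes. The completion check is reduced to raising an error when a block cannot be found on any replica.

```python
def download(path, data_nodes, metadata):
    """Fetch every block of `path` in order and reassemble the file.

    data_nodes: dict node_id -> {block_id: bytes}, a stand-in for real Data Nodes
    metadata:   dict path -> ordered list of (block_id, [node_ids]), a stand-in
                for the Name Node's file-to-block mapping
    """
    if path not in metadata:
        raise FileNotFoundError(path)
    blocks = []
    for block_id, replicas in metadata[path]:  # metadata and block locations
        for node_id in replicas:               # try each replica in turn
            block = data_nodes.get(node_id, {}).get(block_id)
            if block is not None:
                blocks.append(block)
                break
        else:
            # Completion check: no replica could supply this block.
            raise IOError(f"block {block_id} unavailable on all replicas")
    return b"".join(blocks)  # reassembly


# Block "b1" is missing from dn1 (simulating a lost replica), so dn2 serves it.
nodes = {
    "dn1": {"b0": b"hell"},
    "dn2": {"b0": b"hell", "b1": b"o"},
    "dn3": {"b0": b"hell", "b1": b"o"},
}
meta = {"/docs/hello.txt": [("b0", ["dn1", "dn2", "dn3"]), ("b1", ["dn1", "dn2", "dn3"])]}
data = download("/docs/hello.txt", nodes, meta)
print(data)  # b'hello'
```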
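Metadata operations, such as creating, deleting, moving, or copying files and directories, involve the following steps:
- Client Request: The client sends an operation request to the Name Node.
- Name Node Verification: The Name Node verifies that the operation is valid.
- Name Node Operation: The Name Node performs the operation and updates the metadata.
- Client Response: The client receives a response from the Name Node with the status of the operation.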
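
The request, verification, operation, and response cycle might be sketched as follows. The `NameNode` class, the dictionary-based request and response format, and the restriction to two operations (`mkdir` and `rmdir`) are assumptions made to keep the example short. A real Name Node would also cover files, moves, and copies.

```python
import posixpath


class NameNode:
    """Hypothetical Name Node that keeps the namespace as a set of directory paths."""

    def __init__(self):
        self.dirs = {"/"}  # at least one root folder exists

    def handle(self, request):
        """Verify the request, apply it to the metadata, and return a status response."""
        op, path = request.get("op"), request.get("path", "")
        error = self._verify(op, path)
        if error:
            return {"ok": False, "message": error}
        if op == "mkdir":
            self.dirs.add(path)
        elif op == "rmdir":
            self.dirs.remove(path)
        return {"ok": True, "message": f"{op} {path} done"}

    def _verify(self, op, path):
        """Return an error message, or None when the operation is valid."""
        if op not in ("mkdir", "rmdir"):
            return f"unsupported operation: {op}"
        if not path.startswith("/") or path != posixpath.normpath(path):
            return f"invalid path: {path}"
        if op == "mkdir":
            if path in self.dirs:
                return f"already exists: {path}"
            if posixpath.dirname(path) not in self.dirs:
                return f"parent directory missing: {path}"
        if op == "rmdir":
            if path == "/":
                return "cannot remove the root directory"
            if path not in self.dirs:
                return f"no such directory: {path}"
            if any(d.startswith(path + "/") for d in self.dirs):
                return f"directory not empty: {path}"
        return None


name_node = NameNode()
first = name_node.handle({"op": "mkdir", "path": "/docs"})
second = name_node.handle({"op": "mkdir", "path": "/docs"})
orphan = name_node.handle({"op": "mkdir", "path": "/a/b"})
print(first["ok"], second["message"], orphan["message"])
```

Keeping verification separate from the operation means a rejected request never touches the metadata.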