Which issues can I expect to pop up with the Hadoop Distributed File System (HDFS) and the NameNode in a Hadoop architecture?
Advanced Hadoop users, along with vendors that offer alternatives to HDFS, point to a number of issues. The one cited most often is that the NameNode is a single point of failure. When it goes offline, the cluster shuts down, and whatever process was running at the time of the failure has to be restarted from the beginning. The Apache Hadoop community is working to address this problem: Version 2.0 of Hadoop includes manual failover to a standby NameNode, with no need to restart the cluster, and a later release is expected to add automated NameNode failover. Vendors are also coming to market with fixes, such as a NameNode failover mode in HDFS, as well as file-system alternatives that do not use a NameNode at all, which means there is no NameNode to fail.
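To make the difference concrete, here is a minimal toy model of the two setups described above: a cluster with a single NameNode, and one with a standby that an operator promotes by hand. It is only a sketch and does not use any Hadoop API. The names `ToyCluster`, `NameNode`, `ClusterUnavailable`, `lookup` and `manual_failover`, along with the node names `nn1` and `nn2`, are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


class ClusterUnavailable(Exception):
    """Raised when no healthy NameNode can serve metadata requests."""


@dataclass
class NameNode:
    # Hypothetical stand-in for a NameNode; only tracks a name and health flag.
    name: str
    healthy: bool = True


class ToyCluster:
    """Toy model of a cluster's metadata service (not a Hadoop API)."""

    def __init__(self, active: NameNode, standby: Optional[NameNode] = None):
        self.active = active
        self.standby = standby

    def lookup(self, path: str) -> str:
        # Every file operation depends on the active NameNode being up.
        if not self.active.healthy:
            raise ClusterUnavailable(f"{self.active.name} is offline")
        return f"{path} served by {self.active.name}"

    def manual_failover(self) -> None:
        # An operator promotes the standby; nothing happens automatically.
        if self.standby is None or not self.standby.healthy:
            raise ClusterUnavailable("no healthy standby NameNode to promote")
        self.active, self.standby = self.standby, self.active


# Single NameNode: once it fails, every request fails with it.
single = ToyCluster(NameNode("nn1"))
single.active.healthy = False
try:
    single.lookup("/data/part-0000")
except ClusterUnavailable as err:
    print("single NameNode:", err)

# Active plus standby: requests resume after a manual failover.
ha = ToyCluster(NameNode("nn1"), NameNode("nn2"))
ha.active.healthy = False
ha.manual_failover()
print("with standby:", ha.lookup("/data/part-0000"))
```

In the single-NameNode case there is nothing to promote, so the only recovery path is to bring the failed node back. With a standby, the outage lasts only until someone triggers the failover, which is the gap that automated failover is meant to close.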
This was first published in November 2012