I'm trying to remove two data nodes, out of a three-node BI 2.0 cluster.
I've updated hadoop-conf/hdfs-site.xml, setting dfs.replication.min=1 and dfs.replication=1.
However, when I try to remove a node with "removenode.sh hadoop host51", for instance, it complains that the number of remaining slaves is less than the number of expected HDFS replicas (3).
I restarted the cluster after updating hdfs-site.xml several times, but the same error persists.
I also ran "hadoop dfs -setrep -w 1 /", to make sure all files are set to 1 replica, with no luck.
Is there any other file to be updated, or any other step to be followed?
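A quick way to narrow this down is to compare what the edited file says with what the running NameNode reports. The sketch below is only an illustration: the function names `conf_value` and `check_live_replication` are made up for this example, the path `hadoop-conf/hdfs-site.xml` is the one mentioned in the post, and the parser assumes each `<name>` element comes before its `<value>` element. Nothing here is specific to BigInsights or to `removenode.sh`.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop-style XML config file.
# Assumption: <name> appears before <value> (same line or a later line).
conf_value() {
    # $1 = path to the config file, $2 = property name
    awk -v want="<name>$2</name>" '
        index($0, want) { found = 1 }
        found && /<value>/ {
            # Strip everything except the text between <value> and </value>.
            gsub(/^.*<value>|<\/value>.*$/, "")
            print
            exit
        }
    ' "$1"
}

# Hypothetical helper: show the replication-related lines of the fsck
# summary. Needs a running cluster and the hadoop command on the PATH.
check_live_replication() {
    hadoop fsck / | grep -i 'replication'
}

# Example calls (not executed here):
#   conf_value hadoop-conf/hdfs-site.xml dfs.replication
#   conf_value hadoop-conf/hdfs-site.xml dfs.replication.min
#   check_live_replication
```

If the file shows 1 but the fsck summary still reports a default replication factor of 3, the edit probably has not reached the configuration the running daemons actually read. It may also be worth re-checking the `-setrep` run: in the Hadoop 1.x shell the `-R` flag is what requests a recursive change for an entire tree, and `hadoop fsck / -files` lists the replication of each file.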
3 replies Latest Post - 2013-02-15T19:08:54Z by jlerm
Pinned topic Unable to remove data nodes
SystemAdmin 110000D4XK603 Posts ACCEPTED ANSWER
Re: Unable to remove data nodes
2013-02-14T06:21:57Z in response to jlerm
Hi Julius,
When you modify Hadoop configuration files within a BigInsights installation, please consult this page for more details and the proper instructions.
Going back to your original question, here is your checklist:
What roles or components are running on "host51"? Is it only Hadoop (datanode and tasktracker), or are there more?
After following the exact procedure I have given above to modify and propagate your Hadoop configuration changes, are you still having issues removing your datanode?
Outside of these questions, it is not normal or typical to reduce the number of datanodes below the minimum HDFS replica count. Doing so increases the risk of data loss.
Hope this helps.
Re: Unable to remove data nodes
2013-02-15T19:08:54Z in response to SystemAdmin
I forgot to address your point regarding the minimal replica count.
I was trying this out on my own laptop, with a set of VMs.
I had installed BigInsights on 3 VMs, and I wanted to remove the two data nodes so I don't have to bring them all up for the next things I want to work on. Basically, I no longer need a distributed configuration; a single node is enough.
I understand that this should never be done in a real-world environment.
Thanks a lot,