Install Hadoop 2.x or newer on Windows

I tried to install Hadoop 2.5.0 on Windows by following the article below, and ran into far more problems than when I tried it on Mac OS X.

Article on the Apache website: http://wiki.apache.org/hadoop/Hadoop2OnWindows


These are some errors I got while giving it a try:


Solution :
The binary distribution of the Apache Hadoop 2.2.0 release does not contain some Windows native components (like winutils.exe, hadoop.dll, etc.). These are required (not optional) to run Hadoop on Windows. The author of the linked post built hadoop-common-project separately to generate all the required native components. Assuming you are using the default Apache Hadoop 2.2.0 binary distribution, just download all the files from that hadoop-common-2.2.0/bin folder and copy them into the bin folder of your Hadoop installation. And enjoy!!!
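Before starting any daemon, it can help to confirm that the copy actually worked. The following is a minimal sketch, not part of the original fix: it only checks for the two files named above (the "etc." means a real installation may need more), and the fallback path c:\deploy is an assumption borrowed from the commands later in this post.

```python
import os
from pathlib import Path

# Assumption: only the two files named in the text are checked here;
# a real installation may need additional native files.
REQUIRED_NATIVE_FILES = ("winutils.exe", "hadoop.dll")


def missing_native_files(bin_dir, required=REQUIRED_NATIVE_FILES):
    """Return the required file names that are not present in bin_dir."""
    bin_path = Path(bin_dir)
    return [name for name in required if not (bin_path / name).is_file()]


if __name__ == "__main__":
    # HADOOP_PREFIX is the variable used later in this post;
    # the fallback value is a hypothetical install location.
    prefix = os.environ.get("HADOOP_PREFIX", r"c:\deploy")
    missing = missing_native_files(Path(prefix) / "bin")
    print("missing:", missing if missing else "none")
```

An empty result only means the files are present; it does not prove they were built for your Hadoop version or CPU architecture.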



Error : Connection Refused

Solution : 
You get a ConnectionRefused exception when there is a machine at the address specified, but no program is listening on the specific TCP port the client is using, and there is no firewall in the way silently dropping TCP connection requests. If you do not know what a TCP connection request is, please consult the specification.

Unless there is a configuration error at either end, a common cause for this is the Hadoop service isn't running.

This stack trace is very common when the cluster is being shut down, because at that point Hadoop services are being torn down across the cluster, which is visible to those services and applications that haven't been shut down themselves. Seeing this error message during cluster shutdown is nothing to worry about.

If the application or cluster is not working, and this message appears in the log, then it is more serious:
Check that the hostname the client is using is correct.
Check that the IP address the client is trying to talk to for that hostname is correct.
Make sure the destination address in the exception isn't 0.0.0.0. That means you haven't actually configured the client with the real address for that service, and instead it is picking up the server-side property telling it to listen on every network interface for connections.
Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this).
Check that the port the client is trying to talk to matches the one the server is offering the service on.
On the server, try a telnet localhost <port> to see if the port is open there.
On the client, try a telnet <server> <port> to see if the port is accessible remotely.
Try connecting to the server/port from a different machine, to see if it is just the single client misbehaving.
If you are using a Hadoop-based product from a third party, including those from Cloudera, Hortonworks, Intel, EMC and others, please use the support channels provided by the vendor.

Please do not file bug reports related to your problem, as they will be closed as Invalid.
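If telnet is not installed (it is often disabled by default on Windows), the port and hosts-file checks above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions: the function names are made up for illustration, and the host and port you pass in must come from your own configuration.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


def loopback_hostnames(hosts_text):
    """Return names mapped to 127.0.0.1 or 127.0.1.1, skipping localhost aliases."""
    names = []
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] in ("127.0.0.1", "127.0.1.1"):
            names.extend(n for n in fields[1:] if not n.startswith("localhost"))
    return names
```

Calling port_is_open("localhost", <port>) on the server and port_is_open("<server>", <port>) on the client mirrors the two telnet checks. For the second function, pass in the contents of your hosts file; if your machine's own hostname shows up in the result, that is the mapping the checklist warns about.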




Solution :
The problem could be that the NameNode (NN) was formatted after the cluster was set up and the DataNodes (DN) were not, so the slaves still refer to the old NameNode.

We have to delete and recreate the DataNode data folder on the local file system (/home/hadoop/dfs/data in this case).
Check your hdfs-site.xml file to see where dfs.data.dir points (in Hadoop 2.x the property is named dfs.datanode.data.dir; dfs.data.dir is the older, deprecated name),
delete that folder,
and then restart the DataNode daemon on that machine.

The steps above should recreate the folder and resolve the problem.
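To avoid deleting the wrong folder, you can read the configured location straight out of hdfs-site.xml instead of guessing. A small sketch, assuming the file follows the usual configuration/property/name/value layout; the function name is made up for illustration, and it only reports the directories, it does not delete anything.

```python
import xml.etree.ElementTree as ET

# dfs.datanode.data.dir is the Hadoop 2.x property name;
# dfs.data.dir is the older name used in the text above.
DATA_DIR_KEYS = ("dfs.datanode.data.dir", "dfs.data.dir")


def datanode_data_dirs(hdfs_site_xml):
    """Return the DataNode storage directories found in hdfs-site.xml text.

    Uses the first matching property in the file; the value may be a
    comma-separated list of directories.
    """
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in DATA_DIR_KEYS:
            value = prop.findtext("value") or ""
            return [d.strip() for d in value.split(",") if d.strip()]
    return []
```

An empty list means neither property is set in that file, in which case Hadoop falls back to its built-in default location rather than the folder mentioned above.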


Assuming the installation finished without errors and the node is configured properly, the steps for running a job are as follows:

Step 1: Run the environment setup command

c:\deploy\etc\hadoop\hadoop-env.cmd

Step 2: Start the HDFS daemons

%HADOOP_PREFIX%\sbin\start-dfs.cmd

Step 3: Start the YARN daemons

%HADOOP_PREFIX%\sbin\start-yarn.cmd

Step 4: Run the test job (Finally!!!)

%HADOOP_PREFIX%\bin\yarn jar %HADOOP_PREFIX%\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.5.0.jar wordcount /myfile.txt /out
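The job command in Step 4 is long and easy to mistype (the jar name has to match your Hadoop version exactly), so here is a small sketch that assembles it from its parts. The function name is made up for illustration, and it assumes the launcher is bin\yarn.cmd, as in the standard Windows layout; it only builds the command and does not run it.

```python
import ntpath  # Windows path rules, regardless of where this script runs


def wordcount_command(hadoop_prefix, version, input_path, output_path):
    """Build the argument list for the example wordcount job."""
    yarn = ntpath.join(hadoop_prefix, "bin", "yarn.cmd")
    jar = ntpath.join(
        hadoop_prefix, "share", "hadoop", "mapreduce",
        f"hadoop-mapreduce-examples-{version}.jar",
    )
    return [yarn, "jar", jar, "wordcount", input_path, output_path]


if __name__ == "__main__":
    # c:\deploy is the install location used in Step 1 of this post.
    print(" ".join(wordcount_command(r"c:\deploy", "2.5.0", "/myfile.txt", "/out")))
```

The printed line can be pasted into the same command prompt where Steps 1 to 3 were run.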

It is running the task....

Check the result when it has finished...