Hadoop Installation in Linux

Hadoop is an open-source framework for storing and processing big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. This article walks through installing Hadoop on Linux.

Installing Java

First, change to the directory where Java will be installed, then download and unpack the JDK and register it with the alternatives system.

[root@localhost ~]# cd /opt/
[root@localhost opt]# wget http://download.oracle.com/otn-pub/java/jdk/8u91-b14/jdk-8u91-linux-x64.tar.gz
[root@localhost opt]# tar xzf jdk-8u91-linux-x64.tar.gz
[root@localhost opt]# cd jdk1.8.0_91/
[root@localhost jdk1.8.0_91]# alternatives --install /usr/bin/java java /opt/jdk1.8.0_91/bin/java 2
[root@localhost jdk1.8.0_91]# alternatives --config java

There are 4 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
   1      /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.101-2.6.6.1.el7_2.x86_64/jre/bin/java
*  2      /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.91-0.b14.el7_2.x86_64/jre/bin/java
 + 3      /opt/jdk1.7.0_79/bin/java
   4      /opt/jdk1.8.0_91/bin/java

Enter to keep the current selection[+], or type selection number: 4
[root@localhost jdk1.8.0_91]# java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
[root@localhost jdk1.8.0_91]# alternatives --install /usr/bin/jar jar /opt/jdk1.8.0_91/bin/jar 2
[root@localhost jdk1.8.0_91]# alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_91/bin/javac 2
[root@localhost jdk1.8.0_91]# alternatives --set jar /opt/jdk1.8.0_91/bin/jar
[root@localhost jdk1.8.0_91]# alternatives --set javac /opt/jdk1.8.0_91/bin/javac
[root@localhost jdk1.8.0_91]# export JAVA_HOME=/opt/jdk1.8.0_91
[root@localhost jdk1.8.0_91]# export JRE_HOME=/opt/jdk1.8.0_91/jre
[root@localhost jdk1.8.0_91]# export PATH=$PATH:/opt/jdk1.8.0_91/bin:/opt/jdk1.8.0_91/jre/bin
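
The export commands above only affect the current shell session. One possible way to make them persistent is to write them to a profile script; the sketch below assumes a file under /etc/profile.d/ is read by login shells on your distribution, and the helper name write_java_profile and the file name java.sh are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the Java environment exports to a profile script
# so that new login shells pick them up.
# Arguments: <jdk directory> <target file>
write_java_profile() {
    local jdk_dir="$1" target="$2"
    # \$PATH is escaped so that it is expanded at login time, not now.
    cat > "$target" <<EOF
export JAVA_HOME=$jdk_dir
export JRE_HOME=$jdk_dir/jre
export PATH=\$PATH:$jdk_dir/bin:$jdk_dir/jre/bin
EOF
}

# Example (run as root; the file name java.sh is an assumption):
# write_java_profile /opt/jdk1.8.0_91 /etc/profile.d/java.sh
```

After logging in again, echo $JAVA_HOME should print the JDK directory.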

Apache Hadoop Installation

Before installing Hadoop, create a dedicated user named hadoop. Use the useradd command to add the user and passwd to set its password:

useradd hadoop
passwd hadoop

Next, configure SSH keys for the hadoop user so that it can log in to localhost without a password:

su - hadoop
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
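
SSH key logins tend to fail silently when the key files have loose permissions. As a rough sketch, the check below compares the modes of the .ssh directory and authorized_keys against 700 and 600; the function name check_ssh_perms is hypothetical, and stat -c assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether an .ssh directory and its
# authorized_keys file have the modes 700 and 600.
# Argument: <path to .ssh directory>
check_ssh_perms() {
    local ssh_dir="$1"
    local dir_mode file_mode
    dir_mode=$(stat -c '%a' "$ssh_dir") || return 2
    file_mode=$(stat -c '%a' "$ssh_dir/authorized_keys") || return 2
    if [ "$dir_mode" = "700" ] && [ "$file_mode" = "600" ]; then
        echo "ok"
    else
        echo "unexpected modes: dir=$dir_mode authorized_keys=$file_mode"
        return 1
    fi
}

# Example (as the hadoop user):
# check_ssh_perms ~/.ssh
```

If the check passes, ssh localhost should log in without asking for a password.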
 

Next, download a Hadoop release from an Apache mirror (version 2.7.2 in this example), unpack it, and rename the directory to hadoop:

cd ~
wget http://apache.claz.org/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
tar xzf hadoop-2.7.2.tar.gz
mv hadoop-2.7.2 hadoop
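
Since the tarball comes from a mirror, it can be worth comparing its checksum with the one published by Apache before unpacking it. This is only a sketch: the function name verify_sha256 is hypothetical, and the expected digest has to be copied by hand from the Apache download page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Arguments: <file> <expected hex digest (upper or lower case)>
verify_sha256() {
    local file="$1" expected actual
    # Normalise the expected digest to lower case, as printed by sha256sum.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "checksum ok"
    else
        echo "checksum mismatch" >&2
        return 1
    fi
}

# Example (replace the placeholder with the digest published by Apache):
# verify_sha256 hadoop-2.7.2.tar.gz "<published sha256 digest>"
```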

The next step is to set the environment variables used by Hadoop.
Edit the ~/.bashrc file and add the following lines at the end of the file.

export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Then apply the changes to the current shell session:

source ~/.bashrc
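
A quick way to confirm that the variables were picked up is to loop over their names and report any that are missing from the environment. This is a minimal sketch; the function name check_hadoop_env is hypothetical, and the variable list simply mirrors the exports added to ~/.bashrc.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every expected Hadoop variable that is not
# exported (or is empty) and return non-zero if any is missing.
check_hadoop_env() {
    local name missing=0
    for name in HADOOP_HOME HADOOP_INSTALL HADOOP_MAPRED_HOME \
                HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME \
                HADOOP_COMMON_LIB_NATIVE_DIR; do
        # printenv prints the value only if the variable is exported.
        if [ -z "$(printenv "$name")" ]; then
            echo "not set: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
# check_hadoop_env && echo "Hadoop environment looks complete"
```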

Edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME environment variable:

export JAVA_HOME=/opt/jdk1.8.0_91/

Now configure a basic single-node Hadoop cluster.
First, change to the Hadoop configuration directory, then edit the configuration files as described below.

 cd $HADOOP_HOME/etc/hadoop

Let’s start by editing core-site.xml (fs.default.name is the older, deprecated name of fs.defaultFS; Hadoop 2.x still accepts it):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Then edit hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>
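
The two storage paths in hdfs-site.xml point at directories under /home/hadoop/hadoopdata. Hadoop will usually create them itself, but creating them up front as the hadoop user is one way to rule out ownership problems. A small sketch, with the hypothetical helper name make_hdfs_dirs:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the NameNode and DataNode storage directories
# below a base directory, matching the layout used in hdfs-site.xml.
# Argument: <base directory>
make_hdfs_dirs() {
    local base="$1"
    mkdir -p "$base/hdfs/namenode" "$base/hdfs/datanode"
}

# Example (as the hadoop user):
# make_hdfs_dirs /home/hadoop/hadoopdata
```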

Now edit mapred-site.xml:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
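
Note that Hadoop 2.x releases typically ship only mapred-site.xml.template in the configuration directory, so mapred-site.xml may not exist yet. A minimal sketch for creating it from the template without overwriting an existing file; the function name ensure_mapred_site is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create mapred-site.xml from the shipped template
# unless the file already exists.
# Argument: <Hadoop configuration directory>
ensure_mapred_site() {
    local conf_dir="$1"
    if [ ! -f "$conf_dir/mapred-site.xml" ]; then
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
    fi
}

# Example:
# ensure_mapred_site "$HADOOP_HOME/etc/hadoop"
```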

Finally, edit yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Now format the NameNode using the following command:

hdfs namenode -format
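
Formatting erases existing HDFS metadata, so it should only be done once on a fresh installation. As a hedged sketch, the check below treats the presence of current/VERSION inside the NameNode directory as a sign that it has already been formatted; the function name namenode_is_formatted is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given NameNode directory already
# contains formatted metadata (a current/VERSION file).
# Argument: <NameNode directory>
namenode_is_formatted() {
    local name_dir="$1"
    [ -f "$name_dir/current/VERSION" ]
}

# Example:
# if namenode_is_formatted /home/hadoop/hadoopdata/hdfs/namenode; then
#     echo "NameNode already formatted, skipping"
# else
#     hdfs namenode -format
# fi
```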

To start all Hadoop services, use the following commands:

cd $HADOOP_HOME/sbin/
start-dfs.sh
start-yarn.sh

To check that all services started correctly, use the jps command:

jps

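You should see output listing the running Hadoop daemons: NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager, plus the Jps process itself.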
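
To make this check scriptable, the jps output can be scanned for the daemons a single-node setup like this one normally runs. The sketch below reads jps output on standard input and prints the names that are missing; the function name missing_daemons is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read jps output on stdin and print each expected
# Hadoop daemon that does not appear in it.
missing_daemons() {
    local jps_output daemon
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -w matches whole words, so "NameNode" does not match "SecondaryNameNode".
        if ! printf '%s\n' "$jps_output" | grep -qw "$daemon"; then
            echo "$daemon"
        fi
    done
}

# Example:
# jps | missing_daemons
```

An empty result means all five daemons are running.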

You can now open the Hadoop web interface (the YARN ResourceManager page) in your browser at http://your-ip-address:8088/.

You can check the installed Hadoop version with the hadoop version command:

hadoop version

Now enjoy Hadoop on your machine.

This is Naga Ramesh Reddy from Bangalore (India). I have 4+ years of experience in the System and Network Administration field. I like to read and write about Linux, Cisco, Microsoft and DevOps technologies and the latest software releases. I am particularly interested in Linux flavors such as CentOS, RHEL, Ubuntu and Linux Mint.