In this article, we show you how to install Hadoop on a Debian 10 Linux VPS, step by step. Hadoop is an open-source software framework that enables distributed processing of large data sets on clusters of servers. The framework is written in Java and is designed to run distributed processing across thousands of machines with high fault tolerance. Instead of relying on expensive hardware, these clusters achieve fault tolerance through the software's ability to detect and handle failures at the application layer. Well-known users of Hadoop include Facebook and Yahoo.

Learn to Install Hadoop on Debian Linux

What is Hadoop?

Hadoop is used to split large files and distribute them across a cluster. It is licensed by Apache and written in Java. As the volume of data it exchanged grew, Google looked for a way to increase the speed and efficiency of its servers and developed its own distributed file system, GFS (Google File System). Following this success, the Apache Software Foundation set out to bring the technology to a wider audience and created Hadoop.

Prerequisites for Installing Hadoop on Debian

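You must first add a repository that provides OpenJDK 8. Note that a PPA is an Ubuntu package archive and that add-apt-repository comes from the software-properties-common package. The default Debian 10 repositories ship OpenJDK 11 rather than OpenJDK 8, so if this step fails on your system you will need another source for a Java 8 package: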

sudo add-apt-repository ppa:openjdk-r/ppa

Update system packages with the following command:

sudo apt update

Install OpenJDK 8 by entering the following command:

sudo apt install openjdk-8-jdk

In this step, you should install ssh by using the following command:

sudo apt install ssh

You can now install rsync:

sudo apt install rsync

Set up SSH without a passphrase using the following command (press Enter at the passphrase prompts to leave it empty):

ssh-keygen -t rsa

Next, add the new public key to the list of authorized keys:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
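Running the append command more than once adds the same key several times. As a minimal sketch, the helper below appends a key only if it is not already listed. The function name authorize_key is hypothetical (it is not part of OpenSSH), and both file paths are passed as arguments so you can try it on scratch files first.

```shell
# Append a public key to an authorized_keys file only if it is not already there.
# authorize_key is a hypothetical helper name, not an OpenSSH command.
authorize_key() {
    pub_file="$1"
    auth_file="$2"
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -q: quiet, -x: match whole lines, -F: fixed strings, -f: read patterns from file
    if ! grep -qxFf "$pub_file" "$auth_file"; then
        cat "$pub_file" >> "$auth_file"
    fi
}

# Typical use with the key generated above:
# authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```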

Note 1: If you see an error such as "connect to host localhost port 22: Connection refused", restart the SSH service by entering the following command:

sudo service ssh restart

Note 2: If the connection still fails, it may be a permissions issue, so try setting the permissions with the chmod command:

chmod -R 700 ~/.ssh
chmod 644 ~/.ssh/authorized_keys
chmod 644 ~/.ssh/known_hosts
chmod 644 ~/.ssh/config
chmod 600 ~/.ssh/id_rsa
chmod 644 ~/.ssh/id_rsa.pub
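Some of these files (for example config or known_hosts) may not exist yet, in which case chmod prints an error for them. The sketch below applies the same permissions but skips missing files. The function name fix_ssh_perms is hypothetical, and the directory is passed as an argument.

```shell
# Apply the permissions listed above, skipping files that do not exist.
# fix_ssh_perms is a hypothetical helper name used for illustration.
fix_ssh_perms() {
    ssh_dir="$1"
    chmod 700 "$ssh_dir"
    for name in authorized_keys known_hosts config id_rsa.pub; do
        if [ -f "$ssh_dir/$name" ]; then
            chmod 644 "$ssh_dir/$name"
        fi
    done
    # The private key must be readable by its owner only
    if [ -f "$ssh_dir/id_rsa" ]; then
        chmod 600 "$ssh_dir/id_rsa"
    fi
    return 0
}

# Typical use:
# fix_ssh_perms ~/.ssh
```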

Now try connecting again by entering the following command:

ssh localhost

How to Install Hadoop on Debian 10

First, download the Apache Hadoop 3.2.1 release archive (hadoop-3.2.1.tar.gz).

After downloading the file, extract it with the following command:

tar -xzf hadoop-3.2.1.tar.gz

Now you need to copy the Hadoop folder to your desired location and rename it.
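The move and rename can be done in one step. The sketch below assumes the archive was extracted to a directory named hadoop-3.2.1 and that the target is ~/hadoop, which matches the HADOOP_HOME used later in this article. The function name install_hadoop_dir is hypothetical, and it refuses to overwrite an existing target.

```shell
# Move an extracted Hadoop directory to its final location.
# install_hadoop_dir is a hypothetical helper name used for illustration.
install_hadoop_dir() {
    src="$1"
    dest="$2"
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "target already exists: $dest" >&2
        return 1
    fi
    mv "$src" "$dest"
}

# Typical use, run from the directory where the archive was extracted:
# install_hadoop_dir hadoop-3.2.1 "$HOME/hadoop"
```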

In this step, you need to edit the .bashrc file in your home directory (~) and insert the code below. Be sure to change the username in HADOOP_HOME to your own username.

#for hadoop

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 #JAVA_JDK directory

export HADOOP_HOME=/home/username/hadoop #location of your hadoop file directory

export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_USER_CLASSPATH_FIRST=true

alias hadoop=$HADOOP_HOME/bin/hadoop #for convenience
alias hdfs=$HADOOP_HOME/bin/hdfs #for convenience

#done

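To find the JDK path to use for JAVA_HOME, enter the following command. JAVA_HOME is the part of the output that comes before the trailing bin/java (or jre/bin/java):

readlink -f $(which java)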
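The path printed by readlink points at the java binary itself, not at the JDK directory. As a small sketch, the function below strips the trailing /bin/java and an optional /jre component to obtain a value suitable for JAVA_HOME. The function name java_home_from_bin is hypothetical.

```shell
# Derive a JAVA_HOME value from the resolved path of the java binary.
# java_home_from_bin is a hypothetical helper name used for illustration.
java_home_from_bin() {
    java_bin="$1"
    java_home="${java_bin%/bin/java}"   # drop the trailing /bin/java
    java_home="${java_home%/jre}"       # drop a trailing /jre if present
    printf '%s\n' "$java_home"
}

# Typical use:
# java_home_from_bin "$(readlink -f "$(which java)")"
# For /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java this prints
# /usr/lib/jvm/java-8-openjdk-amd64
```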

Now reload the .bashrc file by entering the following command to apply the changes:

source ~/.bashrc
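After reloading, it can help to confirm that the variables are actually set in the current shell. The sketch below prints the names of any unset or empty variables from a list you pass in. The function name report_unset_vars is hypothetical.

```shell
# Print the names of the given variables that are unset or empty.
# report_unset_vars is a hypothetical helper name used for illustration.
report_unset_vars() {
    missing=""
    for var_name in "$@"; do
        # Indirect lookup of the variable whose name is in var_name
        eval "value=\${$var_name:-}"
        if [ -z "$value" ]; then
            missing="$missing $var_name"
        fi
    done
    # Remove the leading space before printing
    printf '%s\n' "${missing# }"
}

# Typical use (prints an empty line when everything is set):
# report_unset_vars JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR
```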

At this point, start editing the configuration files in $HADOOP_HOME/etc/hadoop. Add the following to core-site.xml:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

You should also add the following to hdfs-site.xml. Do not forget to change the username to your own.

<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/username/pseudo/dfs/name</value> <!-- replace "username" with your own; run `whoami` in a terminal to find it -->
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/username/pseudo/dfs/data</value> <!-- replace "username" with your own; run `whoami` in a terminal to find it -->
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
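If you pasted the configuration with the literal placeholder "username", it can be replaced in one pass rather than by hand. The sketch below assumes GNU sed (the default on Debian) and a username that contains no | or & characters. The function name set_config_username is hypothetical.

```shell
# Replace the /home/username/ placeholder in a configuration file.
# set_config_username is a hypothetical helper name; assumes GNU sed (-i).
set_config_username() {
    config_file="$1"
    user_name="$2"
    sed -i "s|/home/username/|/home/$user_name/|g" "$config_file"
}

# Typical use:
# set_config_username "$HADOOP_HOME/etc/hadoop/hdfs-site.xml" "$(whoami)"
```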

Add the following to mapred-site.xml:

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
</configuration>

Finally, add the following line to hadoop-env.sh:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 #JAVA_JDK directory

If you are unsure of the JDK path, you can check it again with the following command:

readlink -f $(which java)

In this step, you must format the HDFS NameNode by running the following command:

hdfs namenode -format
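Formatting erases the file system metadata, so it should only be done once on a fresh installation. As a sketch, the check below looks for the current/VERSION file that a formatted NameNode keeps in its storage directory. The function name namenode_is_formatted is hypothetical, and the example path assumes the name directory configured in hdfs-site.xml above.

```shell
# Return success if the given NameNode storage directory has been formatted.
# namenode_is_formatted is a hypothetical helper name used for illustration.
namenode_is_formatted() {
    name_dir="$1"
    [ -f "$name_dir/current/VERSION" ]
}

# Typical use, before running the format command from the step above:
# if namenode_is_formatted "$HOME/pseudo/dfs/name"; then
#     echo "NameNode already formatted, skipping"
# fi
```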

How to Run Hadoop on Debian 10

To run Hadoop, just enter the following command:

$HADOOP_HOME/sbin/start-all.sh

Now if you go to http://localhost:9870 in your browser, you will see the Hadoop web interface.

In Hadoop 2.x this address was http://localhost:50070. It moved to http://localhost:9870 because Hadoop 3.0.0-alpha1 changed the port configuration.

The following command lists the running Java processes, so you can check that the Hadoop daemons have started:

jps
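On a single-node setup started with start-all.sh, the jps output should include NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The sketch below reads jps output on standard input and prints any of these that are missing. The function name missing_daemons is hypothetical.

```shell
# Read jps output on stdin and print the expected Hadoop daemons that are absent.
# missing_daemons is a hypothetical helper name used for illustration.
missing_daemons() {
    jps_output="$(cat)"
    missing=""
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare the whole second field so that
        # NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            missing="$missing $daemon"
        fi
    done
    printf '%s\n' "${missing# }"
}

# Typical use (prints an empty line when all daemons are running):
# jps | missing_daemons
```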

Use the following command to stop Hadoop:

$HADOOP_HOME/sbin/stop-all.sh

After restarting the machine, you can start Hadoop again by entering the following command:

$HADOOP_HOME/sbin/start-all.sh

The web interface that lists all applications running on the cluster is available on port 8088 by default:

http://localhost:8088/

Conclusion

Hadoop is open-source software used to split large files and distribute them across a cluster. This article introduced Hadoop and walked through installing and running it. If you were planning to install Hadoop but did not know how to do so on Debian 10, you can follow this step-by-step tutorial.