In this article, we show you how to install Hadoop on a Debian 10 Linux VPS step by step. Hadoop is an open-source software framework that enables distributed processing of large data sets on clusters of servers. The framework is written in Java and is designed to perform distributed processing on thousands of machines with high fault tolerance. Instead of relying on expensive hardware, these clusters achieve fault tolerance through the software's ability to detect and handle failures at the application layer. Leading users of Hadoop include Facebook and Yahoo.
What is Hadoop?
Hadoop is used to split large files and distribute them across the machines of a cluster. It is licensed by Apache and written in Java. As the volume of exchanged data grew, Google looked for a way to increase the speed and efficiency of its servers and invented its own distributed file system, called GFS (Google File System). Following this success, the Apache Software Foundation set out to bring the technology to a wider scale and created the Hadoop system.
Prerequisites for Installing Hadoop on Debian
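First, add the OpenJDK repository. Note that PPAs are built for Ubuntu, so this step may fail on Debian 10; in that case OpenJDK 8 has to be installed from another package source: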
sudo add-apt-repository ppa:openjdk-r/ppa
Update system packages with the following command:
sudo apt update
Install OpenJDK 8 by entering the following command:
sudo apt install openjdk-8-jdk
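Since this guide relies on OpenJDK 8, it can be worth confirming which Java version is active after the installation. The following is a minimal sketch; the helper name java_major is hypothetical, and the parsing assumes the usual version strings printed by java -version.

```shell
# Hypothetical helper: print the major version from a Java version string,
# e.g. "1.8.0_292" -> 8, "11.0.12" -> 11.
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}"; echo "${v%%.*}" ;;   # legacy "1.x" scheme used up to Java 8
    *)   echo "${v%%[.-]*}" ;;             # scheme used by Java 9 and later
  esac
}

# Usage on a real system (assumes java is on the PATH):
#   ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
#   [ "$(java_major "$ver")" = "8" ] || echo "Warning: Java 8 is not the active version"
```

If the warning appears, another Java installation is probably taking precedence on the PATH.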
In this step, you should install ssh by using the following command:
sudo apt install ssh
You can now install rsync:
sudo apt install rsync
Set up SSH without a passphrase using the following command. Press Enter at each prompt to accept the default file location and leave the passphrase empty:
ssh-keygen -t rsa
Next, add the new public key to the list of authorized keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Note 1: If you get the error "connect to host localhost port 22: Connection refused", restart the SSH service by entering the following command:
sudo service ssh restart
Note 2: If the connection still fails, it may be a permission issue, so try resetting the permissions with the chmod command:
chmod -R 700 ~/.ssh
chmod 644 ~/.ssh/authorized_keys
chmod 644 ~/.ssh/known_hosts
chmod 644 ~/.ssh/config
chmod 600 ~/.ssh/id_rsa
chmod 644 ~/.ssh/id_rsa.pub
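Some of these files, such as known_hosts or config, may not exist yet on a fresh system, in which case chmod reports an error. As a rough sketch, the permission reset could be wrapped in a small function that skips missing files; the name fix_ssh_perms is hypothetical, and the modes are the ones used in this guide.

```shell
# Hypothetical helper: reset permissions on an SSH directory,
# skipping files that do not exist so chmod does not report errors.
fix_ssh_perms() {
  dir="${1:-$HOME/.ssh}"
  [ -d "$dir" ] || return 1          # nothing to do without the directory
  chmod 700 "$dir"
  for f in authorized_keys known_hosts config id_rsa.pub; do
    [ -f "$dir/$f" ] && chmod 644 "$dir/$f"
  done
  [ -f "$dir/id_rsa" ] && chmod 600 "$dir/id_rsa"   # private key: owner only
  return 0
}

# Usage: fix_ssh_perms            (defaults to ~/.ssh)
```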
Now try connecting again by entering the following command:
ssh localhost
How to Install Hadoop on Debian 10
First, download the Apache Hadoop 3.2.1 release archive (hadoop-3.2.1.tar.gz).
After downloading the file, extract it with the following command:
tar -xzf hadoop-3.2.1.tar.gz
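The download and extraction steps can also be scripted. The sketch below only builds the download URL; the helper name hadoop_tarball_url is hypothetical, and the archive.apache.org path layout is an assumption that should be checked against the Apache Hadoop releases page.

```shell
# Hypothetical helper: build the download URL for a given Hadoop release.
# The archive.apache.org path layout is an assumption; verify it before use.
hadoop_tarball_url() {
  version="$1"
  echo "https://archive.apache.org/dist/hadoop/common/hadoop-${version}/hadoop-${version}.tar.gz"
}

# Usage (assumes wget is installed):
#   wget "$(hadoop_tarball_url 3.2.1)"
#   tar -xzf hadoop-3.2.1.tar.gz
```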
Now move the extracted hadoop-3.2.1 folder to your desired location and rename it. The configuration in this guide assumes /home/username/hadoop.
In this step, edit the .bashrc file in your home directory (~) and add the lines below. Change username in HADOOP_HOME to match your own username.
#for hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 #JAVA_JDK directory
export HADOOP_HOME=/home/username/hadoop #location of your hadoop file directory
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_USER_CLASSPATH_FIRST=true
alias hadoop=$HADOOP_HOME/bin/hadoop #for convenience
alias hdfs=$HADOOP_HOME/bin/hdfs #for convenience
#done
Enter the following command to find the JDK path. JAVA_HOME is the part of the output before /jre/bin/java or /bin/java:
readlink -f $(which java)
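The resolved path points at the java binary itself, while JAVA_HOME should be the JDK directory above it. A minimal sketch of that trimming step follows; the function name java_home_from_path is hypothetical, and it assumes the path ends in /bin/java or /jre/bin/java, as it does for the Debian OpenJDK packages.

```shell
# Hypothetical helper: derive a JAVA_HOME candidate from the resolved
# path of the java binary.
java_home_from_path() {
  p="${1%/bin/java}"   # drop the trailing /bin/java
  p="${p%/jre}"        # drop a trailing /jre if present (Java 8 layout)
  printf '%s\n' "$p"
}

# Usage:
#   java_home_from_path "$(readlink -f "$(which java)")"
```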
Now reload the .bashrc file by entering the following command to apply the changes:
source ~/.bashrc
At this point, start editing the configuration files in hadoop/etc/hadoop. Add the following configuration to core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
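Add the following configuration to hdfs-site.xml. Do not forget to replace username with your own. The property names dfs.name.dir and dfs.data.dir are older aliases that Hadoop 3 still accepts; the current names are dfs.namenode.name.dir and dfs.datanode.data.dir.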
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/username/pseudo/dfs/name</value>
    <!-- username: run `whoami` in a terminal to find your username on this machine -->
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/username/pseudo/dfs/data</value>
    <!-- username: run `whoami` in a terminal to find your username on this machine -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
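To avoid editing the username by hand, the file can also be generated from a script. This is only a sketch: the function name write_hdfs_site and its two arguments are hypothetical, and it writes the same three properties as the hdfs-site.xml configuration in this guide.

```shell
# Hypothetical helper: write hdfs-site.xml into a config directory, using
# the given absolute base directory for NameNode and DataNode storage.
write_hdfs_site() {
  conf_dir="$1"
  base_dir="$2"
  cat > "$conf_dir/hdfs-site.xml" <<EOF
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>file://${base_dir}/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file://${base_dir}/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}

# Usage (HOME already contains your username):
#   write_hdfs_site "$HADOOP_HOME/etc/hadoop" "$HOME/pseudo/dfs"
```

Note that this overwrites any existing hdfs-site.xml in the target directory.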
Add the following configuration to mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
Finally, add the following line to hadoop-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 #JAVA_JDK directory
If you are not sure of the JDK path, you can look it up again with the following command:
readlink -f $(which java)
In this step, format the HDFS NameNode by running the following command:
hdfs namenode -format
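Formatting erases the existing HDFS metadata, so it should normally be done only once. As a hedged sketch, a script could check first whether the NameNode directory already contains the current/VERSION file that formatting creates; the helper name namenode_formatted is hypothetical, and the directory in the usage example is the one configured in hdfs-site.xml.

```shell
# Hypothetical helper: succeed if the given NameNode directory has already
# been formatted, judged by the presence of current/VERSION.
namenode_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Usage:
#   if namenode_formatted "$HOME/pseudo/dfs/name"; then
#     echo "NameNode already formatted; skipping"
#   else
#     hdfs namenode -format
#   fi
```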
How to Run Hadoop on Debian 10
To run Hadoop, start the HDFS and YARN daemons with the start script that ships in the sbin directory, for example:
$HADOOP_HOME/sbin/start-all.sh
Now if you go to http://localhost:9870 in your browser, you will see the web interface of your running Hadoop installation.
Older guides refer to http://localhost:50070; this moved to http://localhost:9870 because Hadoop 3.0.0-alpha1 changed the port configuration.
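The following command, which ships with the JDK, lists the running Java processes so that you can check whether the Hadoop daemons are up. Their listening ports can be checked with a tool such as ss or netstat:
jps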
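The output of jps, the Java process lister included in the JDK, can also be checked by a script. The sketch below is an illustration only: the function name missing_daemons is hypothetical, and the list of five daemons is an assumption about what a single-node setup with both HDFS and YARN is expected to run.

```shell
# Hypothetical helper: given the text output of jps, print the names of
# expected Hadoop daemons that are not running (empty output = all found).
missing_daemons() {
  jps_output="$1"
  missing=""
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -w matches whole words, so "SecondaryNameNode" does not count as "NameNode"
    if ! printf '%s\n' "$jps_output" | grep -qw "$d"; then
      missing="$missing $d"
    fi
  done
  printf '%s\n' "${missing# }"
}

# Usage:
#   missing_daemons "$(jps)"
```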
Use the matching stop script to stop Hadoop, for example:
$HADOOP_HOME/sbin/stop-all.sh
After the machine has restarted, you can start Hadoop again with the start script, for example:
$HADOOP_HOME/sbin/start-all.sh
The web interface that shows all applications of the cluster is available on the default port 8088:
http://localhost:8088
Hadoop is open-source software used to split and distribute centralized files. This article introduced Hadoop and its main features. If you were planning to install Hadoop but did not know how to do so on Debian 10, you can now do it easily by following the steps in this tutorial.