How to Install Hadoop on Debian 10 [Complete]
In this article, we show you how to install Hadoop on Debian 10, step by step. Hadoop is an open-source software framework that enables distributed processing of large data sets across clusters of servers. The framework is written in Java and is designed to scale to thousands of machines with high fault tolerance. Instead of relying on expensive hardware, these clusters achieve fault tolerance through the software’s ability to detect and handle failures at the application layer. Well-known users of Hadoop include Facebook and Yahoo.
Prerequisites to Installing Hadoop on Debian
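Hadoop 3.2.1 needs a Java 8 runtime, so first add a repository that provides OpenJDK 8. The command below uses a PPA, which is built for Ubuntu; if it fails on Debian 10 or the package is unavailable, install a Java 8 JDK from another source before continuing: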
sudo add-apt-repository ppa:openjdk-r/ppa
Update system packages with the following command:
sudo apt update
Install OpenJDK 8 by entering the following command:
sudo apt install openjdk-8-jdk
Next, install SSH with the following command:
sudo apt install ssh
Then install rsync:
sudo apt install rsync
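Before moving on, it can help to confirm that the tools installed above are actually on the PATH. The following is a minimal sketch; missing_tools is a hypothetical helper name used only for this check, not part of any package.

```shell
# Print the name of every command in the argument list that is not on the PATH.
# missing_tools is a hypothetical helper name used only for this check.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}

# An empty result means java, ssh and rsync were all found.
missing_tools java ssh rsync
```

If the command prints a name, the corresponding install step above did not complete.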
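Generate an SSH key pair with the following command. Press Enter at the passphrase prompts to leave the passphrase empty, so that Hadoop can log in to localhost without one: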
ssh-keygen -t rsa
Next, append the public key to the list of authorized keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Note 1: If ssh localhost fails with "connect to host localhost port 22: Connection refused", restart the SSH service by entering the following command:
sudo service ssh restart
Note 2: If the connection still fails or you are still asked for a password, the cause may be file permissions, so try resetting them with the chmod command:
chmod -R 700 ~/.ssh
chmod 644 ~/.ssh/authorized_keys
chmod 644 ~/.ssh/known_hosts
chmod 644 ~/.ssh/config
chmod 600 ~/.ssh/id_rsa
chmod 644 ~/.ssh/id_rsa.pub
Now try connecting again by entering the following command:
ssh localhost
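If the login still fails, the permission values from Note 2 can be checked with a short script. This is only a sketch: check_ssh_perms is a hypothetical helper name, the expected modes are the ones used in the chmod commands above, and stat -c is the GNU coreutils form found on Debian.

```shell
# Report SSH files whose mode differs from the values set in Note 2.
# check_ssh_perms is a hypothetical helper; files that do not exist are skipped.
check_ssh_perms() {
    dir="$1"
    for spec in "authorized_keys:644" "id_rsa:600" "id_rsa.pub:644"; do
        name="${spec%%:*}"   # part before the colon
        want="${spec##*:}"   # part after the colon
        file="$dir/$name"
        [ -e "$file" ] || continue
        have="$(stat -c '%a' "$file")"
        [ "$have" = "$want" ] || echo "$name: mode $have, expected $want"
    done
    return 0
}

# No output means every file that exists has the expected mode.
check_ssh_perms "$HOME/.ssh"
```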
How to Install Hadoop on Debian 10
First, download the Apache Hadoop 3.2.1 binary archive (hadoop-3.2.1.tar.gz) from the Apache Hadoop website.
After downloading the file, extract it with the following command:
tar -xzf hadoop-3.2.1.tar.gz
Now copy the extracted Hadoop folder to your desired location and rename it. The rest of this guide assumes it ends up at /home/username/hadoop.
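The move and rename can be sketched as follows. The names are assumptions: hadoop-3.2.1 is what the archive is expected to extract to, ~/hadoop matches the HADOOP_HOME value used later in this guide, and place_hadoop is a hypothetical helper that refuses to overwrite an existing folder.

```shell
# Move an extracted Hadoop directory to its final location.
# place_hadoop is a hypothetical helper; it never overwrites the destination.
place_hadoop() {
    src="$1"
    dest="$2"
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "destination already exists: $dest" >&2
        return 1
    fi
    mv "$src" "$dest"
}

# Assumed names: hadoop-3.2.1 is the extracted folder, ~/hadoop the target.
if [ -d hadoop-3.2.1 ]; then
    place_hadoop hadoop-3.2.1 "$HOME/hadoop"
fi
```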
In this step, edit the .bashrc file in your home directory (~) and add the lines below to it. Change username in HADOOP_HOME to match your own username.
#for hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 #JAVA_JDK directory
export HADOOP_HOME=/home/username/hadoop #location of your hadoop file directory
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_USER_CLASSPATH_FIRST=true
alias hadoop=$HADOOP_HOME/bin/hadoop #for convenience
alias hdfs=$HADOOP_HOME/bin/hdfs #for convenience
#done
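To find the JDK path for JAVA_HOME, enter the following command. It prints the full path of the java binary; JAVA_HOME is the JDK directory above its bin (or jre/bin) folder:
readlink -f $(which java)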
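The readlink output is the path of the java binary, not the JDK directory itself. As a rough sketch, the trailing part can be trimmed to get a value for JAVA_HOME; java_home_from_binary is a hypothetical helper name, and the trimming assumes the usual bin/java or jre/bin/java layout.

```shell
# Turn the resolved path of the java binary into a JAVA_HOME candidate.
# java_home_from_binary is a hypothetical helper, not a JDK tool.
java_home_from_binary() {
    path="${1%/bin/java}"   # drop the trailing /bin/java
    path="${path%/jre}"     # JDK 8 layouts keep the binary under jre/
    echo "$path"
}

# Only run the lookup when a java binary is actually installed.
if command -v java >/dev/null 2>&1; then
    java_home_from_binary "$(readlink -f "$(command -v java)")"
fi
```

For example, /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java is reduced to /usr/lib/jvm/java-8-openjdk-amd64, the value used in the .bashrc lines above.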
Now reload the .bashrc file by entering the following command to apply the changes:
source ~/.bashrc
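After reloading, a quick check that the two main variables point to real directories can catch a mistyped username early. This is a sketch only; check_dir_var is a hypothetical helper that takes a label and a value.

```shell
# Check that a variable is set and points to an existing directory.
# $1 is the variable name (used in the message), $2 is its value.
# check_dir_var is a hypothetical helper name.
check_dir_var() {
    if [ -z "$2" ]; then
        echo "$1 is not set"
        return 1
    fi
    if [ ! -d "$2" ]; then
        echo "$1 points to a missing directory: $2"
        return 1
    fi
    echo "$1 ok: $2"
}

check_dir_var JAVA_HOME "${JAVA_HOME:-}" || true
check_dir_var HADOOP_HOME "${HADOOP_HOME:-}" || true
```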
At this point, start editing the configuration files in hadoop/etc/hadoop. Add the code below to core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Then add the code below to hdfs-site.xml. Do not forget to replace username with your own username.
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/username/pseudo/dfs/name</value> <!-- replace username with the output of the `whoami` command -->
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/username/pseudo/dfs/data</value> <!-- replace username with the output of the `whoami` command -->
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
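Hadoop normally creates the NameNode and DataNode directories itself, but creating them up front makes permission problems visible early. A minimal sketch, assuming the pseudo/dfs layout from the configuration above; make_hdfs_dirs is a hypothetical helper name.

```shell
# Create the name and data directories under the given base folder.
# make_hdfs_dirs is a hypothetical helper; the dfs/name and dfs/data
# layout mirrors the paths in hdfs-site.xml.
make_hdfs_dirs() {
    base="$1"
    mkdir -p "$base/dfs/name" "$base/dfs/data"
}
```

For the paths used in this guide, the call would be make_hdfs_dirs "$HOME/pseudo".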
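Add the code below to mapred-site.xml. The mapred.job.tracker property dates from Hadoop 1; if you want MapReduce jobs to run on YARN, the property usually set instead is mapreduce.framework.name with the value yarn: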
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
</configuration>
Finally, add the line below to hadoop-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 #JAVA_JDK directory
If you are unsure of the JDK path, you can look it up again with the following command:
readlink -f $(which java)
In this step, format the HDFS NameNode by running the following command. Do this only once, because formatting erases any existing HDFS metadata:
hdfs namenode -format
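Because formatting wipes the NameNode metadata, it can be worth checking whether the directory was already formatted before running the command again. The sketch below relies on a formatted NameNode directory containing a current/VERSION file; namenode_formatted is a hypothetical helper, and the path is the one assumed in hdfs-site.xml above.

```shell
# Return success if the given NameNode directory already holds a formatted
# file system (a formatted directory contains current/VERSION).
# namenode_formatted is a hypothetical helper name.
namenode_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Assumed path: the NameNode directory from hdfs-site.xml.
name_dir="$HOME/pseudo/dfs/name"
if namenode_formatted "$name_dir"; then
    echo "NameNode already formatted, skipping"
else
    echo "NameNode not formatted yet; run the format command once"
fi
```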
How to Run Hadoop on Debian 10
To run Hadoop, just enter the following command:
$HADOOP_HOME/sbin/start-all.sh
Now open http://localhost:9870 in your browser to see the Hadoop NameNode web interface.
Older guides point to http://localhost:50070; the port moved to 9870 when Hadoop 3.0.0-alpha1 changed the default port configuration.
The following command lists the running Java processes, so you can check that the Hadoop daemons have started:
jps
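On a single-node setup started with start-all.sh, the jps list is expected to include NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The following sketch reads jps output and prints whichever of these is absent; missing_daemons is a hypothetical helper name.

```shell
# Read jps output on standard input ("<pid> <name>" per line) and print
# each expected Hadoop daemon that is not listed.
# missing_daemons is a hypothetical helper name.
missing_daemons() {
    running="$(awk '{print $2}')"
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        printf '%s\n' "$running" | grep -qxF "$daemon" || echo "$daemon"
    done
    return 0
}

# No output means all five daemons are running.
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```

A daemon that shows up as missing usually has an error in its log file under the logs folder of the Hadoop directory.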
Use the following command to stop Hadoop:
$HADOOP_HOME/sbin/stop-all.sh
After rebooting the machine, you can start Hadoop again by entering the following command:
$HADOOP_HOME/sbin/start-all.sh
The YARN ResourceManager web interface, which lists all applications in the cluster, is available on port 8088 by default:
http://localhost:8088/
Conclusion
Hadoop is an open-source framework for storing and processing large data sets across a cluster of machines. In this article, we walked through installing Hadoop on Debian 10 in a single-node setup, from the prerequisites to starting and stopping the services. If you are planning to install Hadoop but are unsure how to do it on Debian 10, you can follow this step-by-step tutorial.