Hadoop has become a foundational tool in the big data ecosystem, widely used for distributed storage and processing of large datasets. This article walks through the installation and configuration of Hadoop 3.4.1 and HDFS on a Kali Linux system. Designed for students and professionals alike, the steps below outline a hands-on, local setup using a non-root user—ideal for experimentation, learning, and lightweight development.
# Understanding the Basics
Hadoop is an open-source framework that allows distributed processing of large data sets across clusters of computers using simple programming models. Its core modules include:
- HDFS (Hadoop Distributed File System): For scalable, fault-tolerant storage
- MapReduce: A computation model for parallel processing
- YARN: Yet Another Resource Negotiator, used for cluster resource management
Before diving into setup, it's essential to understand the significance of each component and why configuration matters. Setting up Hadoop locally on Kali Linux (or any Unix-based OS) provides valuable insights into its core mechanisms and how real-world clusters are managed.
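To make the MapReduce model concrete before installing anything, here is a minimal pure-Python sketch (no Hadoop involved) of the three phases a job goes through: map emits key/value pairs, shuffle groups them by key, and reduce aggregates each group. Word count is the canonical example; the function names here are illustrative, not Hadoop API.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle: group all values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each group; here, sum the counts per word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

In a real cluster, the map and reduce phases run in parallel on many machines and the shuffle moves data over the network; the programming model, however, stays this simple.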
# Prerequisites
Before diving into Hadoop, ensure your system is ready:
# 1. Update System & Install Java
Hadoop requires Java; Hadoop 3.4.x supports Java 8 and 11. We'll use OpenJDK 11 here (if your Kali repositories don't carry it, install an available OpenJDK and adjust `JAVA_HOME` later to match).

```bash
sudo apt update
sudo apt install openjdk-11-jdk -y
java -version; javac -version
```

# 2. Install SSH
SSH is used by Hadoop daemons for communication.
```bash
sudo apt install openssh-server openssh-client -y
```

# 3. Create a Hadoop User
Avoid using root. Create a dedicated user:
```bash
sudo adduser hdoop
sudo adduser hdoop sudo
su - hdoop
```

# 4. Set Up Passwordless SSH
This lets the Hadoop daemons connect to localhost without a password prompt.
```bash
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost
```

# Download and Extract Hadoop
```bash
wget https://downloads.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
tar xzf hadoop-3.4.1.tar.gz
```

# Configure Environment Variables
# Edit .bashrc

As the `hdoop` user, no sudo is needed to edit your own `~/.bashrc`:

```bash
nano ~/.bashrc
```

Add these lines at the end of the file:
```bash
export HADOOP_HOME=/home/hdoop/hadoop-3.4.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
```
Then reload the file:
```bash
source ~/.bashrc
```

# Configure Hadoop Core Files
# Edit hadoop-env.sh
```bash
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
```

Add this line (the path below assumes the OpenJDK 11 package; confirm yours with `update-alternatives --list java`):

```bash
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
```

# Edit core-site.xml
```bash
nano $HADOOP_HOME/etc/hadoop/core-site.xml
```

```xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hdoop/tmpdata</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

Note that `fs.defaultFS` replaces the deprecated `fs.default.name` property.

# Edit hdfs-site.xml
```bash
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
```

```xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hdoop/dfsdata/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hdoop/dfsdata/datanode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

The property names above are the current forms of the deprecated `dfs.name.dir` and `dfs.data.dir`; a replication factor of 1 suits a single-node setup.

# Edit mapred-site.xml
In Hadoop 3.x, `mapred-site.xml` ships in `etc/hadoop` by default, so there is no `.template` file to copy:

```bash
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
```

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

# Edit yarn-site.xml
```bash
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
```

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>127.0.0.1</value>
  </property>
</configuration>
```

# Start Hadoop Services
# Format the NameNode
```bash
hdfs namenode -format
```

# Start HDFS
```bash
start-dfs.sh
```

# Start YARN
```bash
start-yarn.sh
```

# Check Running Processes
```bash
jps
```

You should see processes like NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (plus Jps itself).

By following the steps in this guide, you've successfully installed and configured Hadoop and HDFS in a standalone environment. This local setup is perfect for learning Hadoop's core components (MapReduce, HDFS, and YARN) and for running small test jobs.
Whether you're preparing for a data engineering role, experimenting with distributed computing, or just curious about big data infrastructure, this Kali Linux-based Hadoop setup offers a controlled, educational sandbox.
From here, you can:
- Write MapReduce programs in Java or Python
- Explore data ingestion tools like Apache Sqoop or Flume
- Connect Hadoop to Hive or Spark for advanced analytics
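As a starting point for the first bullet, here is a hypothetical Hadoop Streaming script (word count) in a single Python file. Streaming's contract is simple: the mapper reads raw lines from stdin and writes tab-separated `key\tvalue` lines; the reducer receives those lines sorted by key. The file name `wordcount.py` and the role switch via a command-line argument are choices of this sketch, not anything Hadoop mandates.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming: run with argument 'mapper' or 'reducer'."""
import sys

def mapper(lines):
    # Emit one "word\t1" line per word in the input.
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(lines):
    # Streaming delivers input sorted by key, so consecutive lines
    # share a word; sum counts per run of identical keys.
    current, total = None, 0
    for line in lines:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"

if __name__ == "__main__" and len(sys.argv) > 1:
    phase = mapper if sys.argv[1] == "mapper" else reducer
    for out in phase(line.rstrip("\n") for line in sys.stdin):
        print(out)
```

You can sanity-check it locally with a pipe (`sort` stands in for the shuffle): `echo "the fox the" | python3 wordcount.py mapper | sort | python3 wordcount.py reducer`. Once that works, submit it to your cluster with the streaming jar, which in Hadoop 3 lives under `$HADOOP_HOME/share/hadoop/tools/lib/`.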
This practical foundation prepares you to take on larger-scale deployments in the cloud or on actual clusters with confidence.