Configure a Hadoop Cluster Using Ansible

Dhiraj Kumar
Jan 9, 2021

In this post, I automate Hadoop cluster configuration with Ansible by writing a playbook that sets up the cluster.

Hadoop: Hadoop is open-source software used to manage big data. It runs as a master-slave cluster.

A Hadoop cluster has a NameNode and DataNodes. The DataNodes contribute their storage to the cluster, which the NameNode manages.

To learn more about Ansible, go to https://firsttalk26.medium.com/ansible-and-use-case-solved-using-ansible-1d90b18fafea

First, configure the Ansible controller node and the target nodes. For the steps, go to https://firsttalk26.medium.com/configure-docker-and-launch-container-using-ansible-93c8b708d20a
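The playbook in this post targets two host groups, namenode and datanode, so the controller's inventory needs to define both. A minimal sketch of such an inventory in YAML form follows. The file name, the IP addresses, and the root login are placeholders assumed for illustration, not values taken from the original setup.

```yaml
# inventory.yml (hypothetical file name)
# All addresses below are documentation placeholders; replace them with your nodes.
all:
  vars:
    ansible_user: root        # assumption: the playbook copies files into /root/
  children:
    namenode:                 # group name used by "hosts: namenode"
      hosts:
        192.0.2.10:
    datanode:                 # group name used by "hosts: datanode"
      hosts:
        192.0.2.11:
        192.0.2.12:
```

If the inventory is not configured as the default in ansible.cfg, it can be passed explicitly with ansible-playbook -i inventory.yml hadoop.yml.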

After configuring Ansible, create the playbook.

Ansible Playbook:

- hosts: namenode
  tasks:
    # Copy the JDK and Hadoop packages to the target node
    - copy:
        src: "jdk-8u171-linux-x64.rpm"
        dest: "/root/"
    - copy:
        src: "hadoop-1.2.1-1.x86_64.rpm"
        dest: "/root/"
    # Install the JDK only if version 1.8.0_171 is not already present
    - shell: java -version 2>&1 | grep version | awk '{print $3}' | sed 's/"//g'
      register: java_version
    - command: "rpm -ivh /root/jdk-8u171-linux-x64.rpm"
      when: java_version.stdout != "1.8.0_171"
    # Install Hadoop only if version 1.2.1 is not already present.
    # ignore_errors keeps the play running when the hadoop command is missing.
    - command: "hadoop version"
      register: hadoop_version
      ignore_errors: yes
    - command: "rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm"
      when: "'Hadoop 1.2.1' not in hadoop_version.stdout"
    # Render the NameNode configuration files
    - template:
        src: "name_node/core-site.xml.j2"
        dest: "/etc/hadoop/core-site.xml"
    - template:
        src: "name_node/hdfs-site.xml.j2"
        dest: "/etc/hadoop/hdfs-site.xml"
    # Create the NameNode directory if it does not exist
    - command: "ls -l /nn"
      register: folder_status
      ignore_errors: yes
    - file:
        path: "/nn"
        state: directory
      when: folder_status.rc != 0
    # Format the NameNode only if it has not been formatted yet
    - command: "ls -l /nn/current/VERSION"
      register: version_status
      ignore_errors: yes
    - shell: "echo Y | hadoop namenode -format"
      when: version_status.rc != 0
    # Start the NameNode daemon only if it is not already running
    - shell: "jps | grep NameNode"
      register: namenode_status
      ignore_errors: yes
    - command: "hadoop-daemon.sh start namenode"
      when: namenode_status.rc != 0

- hosts: datanode
  tasks:
    # Copy the JDK and Hadoop packages to the target node
    - copy:
        src: "jdk-8u171-linux-x64.rpm"
        dest: "/root/"
    - copy:
        src: "hadoop-1.2.1-1.x86_64.rpm"
        dest: "/root/"
    # Install the JDK only if version 1.8.0_171 is not already present
    - shell: java -version 2>&1 | grep version | awk '{print $3}' | sed 's/"//g'
      register: java_version
    - command: "rpm -ivh /root/jdk-8u171-linux-x64.rpm"
      when: java_version.stdout != "1.8.0_171"
    # Install Hadoop only if the hadoop command is missing or fails
    - command: "hadoop version"
      register: hadoop_version
      ignore_errors: yes
    - debug:
        var: hadoop_version
    - command: "rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force"
      when: hadoop_version.rc != 0
    # Render the DataNode configuration files
    - template:
        src: "data_node/core-site.xml.j2"
        dest: "/etc/hadoop/core-site.xml"
    - template:
        src: "data_node/hdfs-site.xml.j2"
        dest: "/etc/hadoop/hdfs-site.xml"
    # Create the DataNode directory if it does not exist
    - command: "ls -l /dn"
      register: folder_status
      ignore_errors: yes
    - file:
        path: "/dn"
        state: directory
      when: folder_status.rc != 0
    # Start the DataNode daemon only if it is not already running
    - shell: "jps | grep DataNode"
      register: datanode_status
      ignore_errors: yes
    - command: "hadoop-daemon.sh start datanode"
      when: datanode_status.rc != 0
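
The playbook checks for /nn and /nn/current/VERSION with ls before acting. As an alternative sketch (not the approach used in this post), the same idempotence can be had without the extra check tasks: the file module already does nothing when the directory exists, and the shell module's creates argument skips a command when the named file is present. The paths are the ones used above; the task names are only illustrative.

```yaml
- hosts: namenode
  tasks:
    - name: Ensure the NameNode directory exists
      file:
        path: /nn
        state: directory      # no change is reported if /nn is already there

    - name: Format the NameNode only once
      shell: "echo Y | hadoop namenode -format"
      args:
        creates: /nn/current/VERSION   # task is skipped when this file exists
```

This variant also avoids the failed-then-ignored ls tasks that show up in the run output on a fresh node.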

Configuring the NameNode and DataNode relies on a few template files (core-site.xml.j2 and hdfs-site.xml.j2). The full code for configuring Hadoop with Ansible, including these files, is in the GitHub repository below:

https://github.com/firsttalk26/task11.git
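
The templates themselves are not reproduced in this post. To give a rough idea of what a DataNode's core-site.xml has to express, here is a sketch that writes the file inline with the copy module instead of a template. The fs.default.name property is the Hadoop 1.x setting for the NameNode address; the variable names namenode_ip and namenode_port and their values are assumptions made for illustration and may differ from what the repository uses.

```yaml
- hosts: datanode
  vars:
    namenode_ip: "192.0.2.10"   # hypothetical placeholder address
    namenode_port: 9001         # hypothetical port; use the one your NameNode listens on
  tasks:
    - name: Point the DataNode at the NameNode
      copy:
        dest: /etc/hadoop/core-site.xml
        # Ansible substitutes the variables before writing the file
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.default.name</name>
              <value>hdfs://{{ namenode_ip }}:{{ namenode_port }}</value>
            </property>
          </configuration>
```

A .j2 template file would hold the same XML, with the placeholders filled in by the template module at run time.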

After creating the playbook, run the command below to apply it to the target nodes:

ansible-playbook hadoop.yml
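
Once the run finishes, one way to confirm that the DataNodes have joined the cluster is to ask the NameNode for a report. The following is a small sketch of a follow-up play, assuming the hadoop command is on the PATH of the NameNode host; it is not part of the original playbook.

```yaml
- hosts: namenode
  tasks:
    - name: Ask the NameNode for a cluster report
      command: "hadoop dfsadmin -report"
      register: dfs_report
      changed_when: false       # read-only check, never reports a change

    - name: Show the report
      debug:
        var: dfs_report.stdout_lines
```

The report lists the DataNodes the NameNode knows about along with their capacity, so a missing node is easy to spot.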

Dhiraj Kumar

Cloud computing expert who has helped many startups reduce cloud costs by up to 40% based on business needs. Focused on optimizing development processes.