We provide the best Hadoop training in Bangalore. Take Hadoop training online or in person at RelQSoft. Our instructors are industry experts with extensive experience in Hadoop and Big Data.

Course Curriculum

Hadoop Administration

Master Your Big Data With Hadoop

Course Outline

1.    Introduction to Big Data

  1. What is Big Data?
  2. Why Big Data?
  3. The Three V’s of Big Data

2.    Understanding Hadoop

  1. What is Hadoop?
  2. Structured Data vs. Unstructured Data
  3. Relational Databases vs. Hadoop

3.    The Hadoop Distributed File System (HDFS)

  1. What is HDFS?
  2. HDFS components
  3. Understanding Block Storage
  4. Reading and Writing Files in HDFS
  5. Hadoop Daemons (NameNode, DataNode, ResourceManager, NodeManager, etc.)
  6. HDFS Commands
  7. HDFS File Permissions
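The HDFS commands and file permissions above are easiest to picture as one short session. A sketch of common commands, assuming a running cluster with the Hadoop client on the PATH (the /user/demo paths and file names are illustrative):

```shell
# List the HDFS root (requires a running cluster)
hdfs dfs -ls /

# Create a directory and upload a local file (paths are illustrative)
hdfs dfs -mkdir -p /user/demo
hdfs dfs -put localfile.txt /user/demo/

# Read the file back and inspect its block placement
hdfs dfs -cat /user/demo/localfile.txt
hdfs fsck /user/demo/localfile.txt -files -blocks

# HDFS file permissions follow the familiar POSIX-style owner/group/other model
hdfs dfs -chmod 640 /user/demo/localfile.txt
hdfs dfs -chown demo:hadoop /user/demo/localfile.txt
```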

4.    The MapReduce Framework

  1. Overview of MapReduce
  2. Understanding MapReduce
  3. The Map Phase
  4. The Reduce Phase
  5. WordCount in MapReduce
  6. Running MapReduce Job
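Before running a real WordCount job, the Map and Reduce phases can be mimicked locally with a plain Unix pipeline, which makes a useful mental model: `tr` plays the mapper (emit one word per line), `sort` plays the shuffle (group identical keys together), and `uniq -c` plays the reducer (count each group).

```shell
# "Map": split each line into one word per line.
# "Shuffle": sort brings identical words together.
# "Reduce": uniq -c counts each group; sort -rn ranks by frequency.
printf 'big data hadoop\nbig data\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn
```

The real MapReduce job distributes exactly this flow across many machines, with HDFS blocks feeding the mappers and the framework performing the shuffle.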

5.    Planning Your Hadoop Cluster

  1. Single Node Cluster Configuration
  2. Multi-Node Cluster Configuration
  3. Setting Up a Cluster in High Availability Mode


6.   Cluster Maintenance

  1. Checking HDFS Status
  2. Copying Data Between Clusters
  3. Adding and Removing Cluster Nodes
  4. Rebalancing the cluster
  5. NameNode Metadata Backup
  6. Cluster Upgrade

7.    Installing and Managing Hadoop Ecosystem Projects

  1. Spark
  2. Tez
  3. Hive
  4. Presto
  5. Oozie
  6. Sqoop
  7. Flume

8.    Understanding and Installing Different Flavors of Hadoop

  1. Cloudera Distribution
  2. Hortonworks Distribution
  3. MapR Distribution

9.    Managing and Scheduling Jobs

  1. Managing Jobs
  2. The FIFO Scheduler
  3. The Fair Scheduler
  4. The Capacity Scheduler

10.   Cluster Monitoring, Troubleshooting and Optimization

  1. General system conditions to monitor
  2. Namenode and Resource Manager Web UIs
  3. View and manage Hadoop’s log files
  4. Ganglia Monitoring Tool
  5. Common cluster issues and their resolutions
  6. Benchmarking your cluster’s performance

11.   Populating HDFS from External Sources

  1. How to use Sqoop to import data from RDBMS to HDFS
  2. How to gather logs from multiple systems using Flume
  3. Features of Hive, Presto and Spark
  4. How to populate HDFS from external sources
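As a concrete sketch of the Sqoop import/export flow described above (the MySQL host, database, table, and user names are all hypothetical):

```shell
# Import a MySQL table into HDFS; -P prompts for the password
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username etl_user -P \
  --table customers \
  --target-dir /user/demo/customers \
  --num-mappers 4

# Export processed results from HDFS back into an RDBMS table
sqoop export \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username etl_user -P \
  --table customer_summary \
  --export-dir /user/demo/customer_summary
```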

Hadoop Developer & Admin with Cassandra & Impala

  1. Understanding Big Data and Hadoop : 4hrs

Learning Objectives – In this module, you will understand Big Data, the limitations of existing solutions to the Big Data problem, how Hadoop solves it, the common Hadoop ecosystem components, Hadoop Architecture, HDFS, the anatomy of file writes and reads, and how the MapReduce framework works.


Topics – Big Data, Limitations and Solutions of Existing Data Analytics Architecture, Hadoop, Hadoop Features, Hadoop Ecosystem, Hadoop 2.x Core Components, Hadoop Storage: HDFS, Hadoop Processing: MapReduce Framework, Different Hadoop Distributions.

  2. Hadoop Architecture and HDFS : 6hrs, Hands-On Cluster Setup

Learning Objectives – In this module, you will learn the Hadoop cluster architecture, the important configuration files in a Hadoop cluster, data loading techniques, and how to set up single-node and multi-node Hadoop clusters.


Topics – Hadoop 2.x Cluster Architecture – Federation and High Availability, A Typical Production Hadoop Cluster, Hadoop Cluster Modes, Common Hadoop Shell Commands, Hadoop 2.x Configuration Files, Single-Node Cluster Setup, Hadoop Administration.


  3. Hadoop MapReduce Framework : 6hrs Lab

Learning Objectives – In this module, you will understand Hadoop MapReduce framework and the working of MapReduce on data stored in HDFS. You will understand concepts like Input Splits in MapReduce, Combiner & Partitioner and Demos on MapReduce using different data sets. 


Topics – MapReduce Use Cases, Traditional way Vs MapReduce way, Why MapReduce, Hadoop 2.x MapReduce Architecture, Hadoop 2.x MapReduce Components, YARN MR Application Execution Flow, YARN Workflow, Anatomy of MapReduce Program, Demo on MapReduce. Input Splits, Relation between Input Splits and HDFS Blocks, MapReduce: Combiner & Partitioner.


  4. Pig : 8hrs (6hrs Lab)

Learning Objectives – In this module, you will learn Pig, the types of use cases Pig suits, the tight coupling between Pig and MapReduce, Pig Latin scripting, Pig running modes, Pig UDFs, Pig streaming, and testing Pig scripts, with a demo on a healthcare dataset.


Topics – About Pig, MapReduce vs. Pig, Pig Use Cases, Programming Structure in Pig, Pig Running Modes, Pig Components, Pig Execution, Pig Latin Programs, Data Models in Pig, Pig Data Types, Shell and Utility Commands, Pig Latin: Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators, Specialized Joins in Pig, Built-In Functions (Eval, Load and Store, Math, String, and Date Functions), Pig UDFs, Piggybank.
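A minimal Pig Latin sketch in the spirit of the healthcare demo above (the file path and schema are hypothetical):

```pig
-- Load a CSV of patient charges, group by state, and total the charges
raw = LOAD '/user/demo/patients.csv' USING PigStorage(',')
      AS (id:int, state:chararray, charges:double);
by_state = GROUP raw BY state;
totals = FOREACH by_state GENERATE group, SUM(raw.charges);
STORE totals INTO '/user/demo/charges_by_state';
```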


  5. Hive : 8hrs Lab

Learning Objectives – This module will help you understand Hive concepts, Hive data types, loading and querying data in Hive, running Hive scripts, and Hive UDFs.


Topics – Hive Background, Hive Use Cases, About Hive, Hive vs. Pig, Hive Architecture and Components, Metastore in Hive, Limitations of Hive, Comparison with Traditional Databases, Hive Data Types and Data Models, Partitions and Buckets, Hive Tables (Managed and External), Importing Data, Querying Data, Managing Outputs, Hive Scripts, Hive UDFs, a Retail Use Case in Hive, a Hive Demo on a Healthcare Dataset, and Advanced Hive Concepts such as UDFs and Dynamic Partitioning.
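A short illustrative HiveQL fragment covering managed tables, external partitioned tables, and loading and querying data (table names, columns, and paths are hypothetical):

```sql
-- Managed table: Hive owns both the metadata and the data
CREATE TABLE retail_sales (
  sale_id BIGINT,
  item    STRING,
  amount  DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA LOCAL INPATH '/tmp/sales.csv' INTO TABLE retail_sales;

-- External partitioned table: Hive manages only the metadata
CREATE EXTERNAL TABLE sales_by_day (
  sale_id BIGINT, item STRING, amount DOUBLE
)
PARTITIONED BY (sale_date STRING)
LOCATION '/user/demo/sales_by_day';

-- Querying is plain SQL translated to distributed jobs
SELECT item, SUM(amount) FROM retail_sales GROUP BY item;
```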



Apache Sqoop : 2hrs

  • Introduction to Sqoop
  • MySQL client and server installation
  • Sqoop installation
  • How to connect to a relational database using Sqoop
  • Sqoop commands and examples of import and export

Apache Flume : 2hrs


  • Introduction to Flume
  • Flume installation


  • Flume agent usage and example executions
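A minimal Flume agent configuration in the spirit of the items above, tailing a log file into HDFS through a memory channel (the agent name a1 and all paths are illustrative):

```properties
# Name the source, channel, and sink for agent "a1"
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: tail an application log
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# Channel: buffer events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: write events to date-partitioned HDFS directories
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d
a1.sinks.k1.channel = c1
```

The agent is then started with `flume-ng agent --conf conf --conf-file example.conf --name a1`.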


Apache Oozie : 1hr

  • Introduction to Oozie
  • Oozie installation
  • Executing Oozie workflow jobs
  • Monitoring Oozie workflow jobs


Apache ZooKeeper

  • Introduction to ZooKeeper
  • Configuring ZooKeeper
  • The role of ZooKeeper
  • A use case with ZooKeeper

Apache HBase

o   Introduction

o   Quick Start – Standalone HBase


Apache HBase Configuration

o   Configuration Files

o   Basic Prerequisites

o   HBase run modes: Standalone and Distributed

o   Running and Confirming Your Installation

o    Default Configuration

o   Example Configurations

o   The Important Configurations

o   Dynamic Configuration


Data Model

o    Conceptual View

o   Physical View

o   Namespace

o   Table

o   Row

o   Column Family

o   Cells

o   Data Model Operations

o   Versions

o   Sort Order

o   Column Metadata

o   Joins

o   ACID
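The data model above is quickest to absorb through an interactive `hbase shell` session (table and column-family names are hypothetical):

```text
hbase> create 'users', 'info'                  # table with one column family
hbase> put 'users', 'row1', 'info:name', 'Asha'
hbase> put 'users', 'row1', 'info:city', 'Bangalore'
hbase> get 'users', 'row1'                     # fetch all cells in one row
hbase> scan 'users', {VERSIONS => 3}           # scan, showing multiple versions
hbase> disable 'users'
hbase> drop 'users'
```

Rows are sorted by rowkey, each cell is addressed by (row, column family:qualifier, timestamp), and multiple timestamped versions of a cell can coexist.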


HBase and Schema Design


o    Schema Creation

o   Table Schema Rules Of Thumb

o   RegionServer Sizing Rules of Thumb

o    On the number of column families

o   Rowkey Design

o   Number of Versions

o    Supported Datatypes

o    Joins

o   Time To Live (TTL)

o    Keeping Deleted Cells

o   Secondary Indexes and Alternate Query Paths

o    Constraints

o   Schema Design Case Studies

o   Operational and Performance Configuration Options

o    Special Cases





HBase and MapReduce


o   HBase, MapReduce, and the CLASSPATH

o   MapReduce Scan Caching

o   Bundled HBase MapReduce Jobs

o    HBase as a MapReduce Job Data Source and Data Sink

o    Writing HFiles Directly During Bulk Import

o    RowCounter Example

o   Map-Task Splitting

o   HBase MapReduce Examples

o   Accessing Other HBase Tables in a MapReduce Job

o    Speculative Execution

o    Cascading


Securing Apache HBase


o    Using Secure HTTP (HTTPS) for the Web UI

o    Using SPNEGO for Kerberos authentication with Web UIs

o    Secure Client Access to Apache HBase

o   Simple User Access to Apache HBase

o   Securing Access to HDFS and ZooKeeper

o   Securing Access To Your Data

o   Security Configuration Example


Architecture

o    Overview

o    Catalog Tables

o   Client

o   Client Request Filters

o   Master

o    RegionServer

o   Regions

o    Bulk Loading

o   HDFS

o   Timeline-consistent High Available Reads

o   Storing Medium-sized Objects (MOB)

Apache HBase APIs

o   Examples

o    Bulk Load