Big Data

by Mahalakshmi

Module 1 Introduction to Big Data


The Why, How and What of Big Data


Module 2

Introduction to Hadoop Ecosystem


Sharding, Distribution and Replication factor (SDR)


MapReduce (MRv1) and YARN

Hadoop v1 and v2

Hadoop Data federation

Module 3 Prerequisite for Installation

Single-node, pseudo-distributed and multi-node clusters

Virtual machine using Linux (Ubuntu/CentOS)

Installation and configuration of Hadoop, HDFS daemons and YARN daemons

High Availability (Active and Standby)

Automatic and manual failover

Hadoop fs shell commands

Writing Data to HDFS

Reading Data from HDFS

Module 4

Rack awareness policy and Replica placement Strategy

Failure Handling



Block-Safe mode

Rebalancing and load optimization

Troubleshooting and error rectification

Hadoop fs shell commands

Module 5 Introduction to MapReduce

Architecture of MapReduce

Execution of MapReduce in YARN

Application Master, Resource Manager and Node Manager

Input format, input split and key-value pairs

Classes and methods of the MapReduce paradigm




Custom and default partitioning

Shuffle and Sort


Application Master/Manager

Container-Node manager

Module 6 MapReduce Hands-on

Word count program / log analytics

Hadoop streaming in R/Python

Data processing Transformations

Map-only jobs and uber jobs

Inverted index and searches

Module 7 Structured and Unstructured Data handling

Optimizing using a Combiner


Custom partition and default partition

Module 8 Introduction to Hive Data warehouse

Installation of Hive and the metastore database

Configuring the metastore with MySQL

Creation of Hive tables

Different ways of loading data into Hive

HiveQL commands

Data transformations: joins, filters and others

Module 9 Manipulation and analytical functions in Hive

Managed table and external tables

Partitioning and Bucketing

Complex data types and Unstructured data

Advanced HQL commands


Integration with HBase

Module 10 SerDe / Regular Expression

File formats

JSON, Avro file conversion

Parquet compressed file to uncompressed

Avro schema and data file

ORC file

Module 11 Ingest data from RDB

Introduction to Sqoop and installation

Import and export data from and to RDB

Bulk loading, incremental load, split-by, conditional query

Sqoop validation and Sqoop jobs

Data ingestion into Hive

Data ingestion into HBase

Different file formats

Module 12 Ingest streaming data

Flume Architecture

Agent, source, sink and channel

Ingesting log files

Collecting data from Twitter for sentiment analysis

Module 13 Spark core and Components

Spark Shell

Creating an RDD from HDFS/local files

Creating new RDDs: transformations on RDDs

Lineage Graph – DAG

Actions on RDD

Different resource managers

Spark shell Scala REPL


Monitoring jobs

Module 14 Scala/Spark Functional Programming

Using Function Literals

Anonymous Functions

Define a function which accepts another function

Spark Loading and Saving Your Data


CSV and TSV files

JSON Files


Spark jobs

Building a Scala program using SBT/Maven

spark-submit and Spark applications

Module 15 RDD Transformation Programming in Depth

Hands on and core concepts of map() transformation

Hands on and core concepts of filter() transformation

Hands on and core concepts of flatMap() transformation

Compare map and flatMap transformation


Apache Spark in Action

Hands on and core concepts of reduce() action

Hands on and core concepts of fold() action

Hands on and core concepts of aggregate() action

Basics of Accumulators

Hands on and core concepts of collect() action

Hands on and core concepts of take() action

Ordered access of RDD

Module 16 DataFrames & Datasets

Creating DataFrames

Interoperating with RDDs

JSON and Parquet File Formats

Loading Data through Different Source

RDD to DF and DF to RDD

DataFrame operations (Dataset)

Module 17 Need for Spark SQL

What is Spark SQL?

Spark SQL Architecture

SQL Context in Spark SQL

Module 18 Spark Streaming Overview

Streaming data collections from different sources

Other Streaming Operations

Sliding Window Operation

Developing Spark Streaming Applications

Kafka integration

Module 19 Introduction to NoSQL

ACID vs CAP theorem/BASE

Schema design

Introduction to HBase and installation

The HBase Data Model

The HBase Shell

HBase Architecture

Schema Design

Module 20 The HBase API

HBase Configuration and Tuning

Hive and HBase integration

Loading data using Sqoop

Time to live



Module 21 Hue web interface

Hive and Pig editors

Oozie scheduler



Configuration files and monitoring

Module 22 Kafka

Producers, consumers and topics

Flume with Kafka

Kafka topics with Spark Streaming

Module 23 Hadoop distributions

Cloudera components

Hortonworks components




Module 24 Zeppelin notebook


Cloudera manager


Module 25 AWS and Azure in Big Data

S3 or Azure Blob storage

Components and usage

Module 26 Talend Big Data edition

ETL Tool integration

Data analytics using Tableau

Connecting with Hadoop HiveServer

Interactive visualization

Module 27 Cloudera Spark and Hadoop Developer certification

Hortonworks certification

Guidance and mock

Module 28 Introduction to machine learning

Applying machine learning algorithms in Hadoop and Spark MLlib

Classification and clustering

Module 29 Case study 1: Sqoop, HBase, Hive, Spark, Tableau


Module 30 Case study 2: Kafka, Spark Streaming and HBase

