Module 1 | Introduction to Big Data
Characteristics of Big Data; the why, how and what of Big Data; existing OLTP, ETL, DWH and OLAP systems |
Module 2 | Introduction to Hadoop Ecosystem
Hadoop architecture; HDFS sharding, distribution and replication factor (SDR); daemons; MapReduce (MRv1) and YARN; Hadoop v1 and v2; HDFS federation |
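To make the block and replication-factor idea concrete, the short Python sketch below computes how a file is cut into HDFS blocks and how much raw disk its replicas consume. It assumes the common defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); `hdfs_footprint` is a hypothetical helper, and real clusters often override both settings.

```python
# Assumed defaults: dfs.blocksize = 128 MB, dfs.replication = 3.
BLOCK_SIZE_MB = 128
REPLICATION = 3


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (block sizes in MB, number of block replicas, raw MB stored)."""
    full_blocks = file_size_mb // block_size_mb
    remainder = file_size_mb % block_size_mb
    # The last block holds only the leftover data; HDFS does not pad it to full size.
    blocks = [block_size_mb] * full_blocks + ([remainder] if remainder else [])
    replicas = len(blocks) * replication
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


blocks, replicas, raw_mb = hdfs_footprint(500)
print(blocks)    # [128, 128, 128, 116]
print(replicas)  # 12
print(raw_mb)    # 1500
```

A 500 MB file therefore becomes four blocks (three full, one partial), stored as twelve block replicas spread across DataNodes.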
Module 3 | Prerequisites for Installation
Single-node, pseudo-distributed and multi-node clusters; virtual machines running Linux (Ubuntu/CentOS); installation and configuration of Hadoop, HDFS daemons and YARN daemons; High Availability (active and standby); automatic and manual failover; Hadoop fs shell commands; writing data to HDFS; reading data from HDFS |
Module 4 |
Rack awareness policy and replica placement strategy; failure handling for the NameNode, DataNodes and blocks; safe mode; rebalancing and load optimization; troubleshooting and error rectification; Hadoop fs shell commands |
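As a rough illustration of the replica placement strategy, the sketch below applies the default HDFS rule for a replication factor of 3: the first replica goes on the writer's own DataNode, the second on a node in a different rack, and the third on a different node in that same remote rack. The rack layout, node names and the `place_replicas` function are hypothetical. Real HDFS also weighs free space and load and falls back gracefully on small topologies; this simplified version does not, and it picks a random node when the writer is off-cluster.

```python
import random


def place_replicas(topology, writer_node=None, rng=None):
    """Choose 3 DataNodes using a simplified version of the default HDFS rule.

    topology: hypothetical mapping of rack name -> list of DataNode names.
    writer_node: the DataNode performing the write, or None for an
    off-cluster client. Assumes at least two racks with two or more nodes each.
    """
    rng = rng or random.Random()
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}

    # Replica 1: the writer's own node; an off-cluster writer gets a random node.
    first = writer_node if writer_node in rack_of else rng.choice(sorted(rack_of))

    # Replica 2: any node on a rack other than the first replica's rack.
    remote_racks = [r for r in sorted(topology) if r != rack_of[first]]
    second_rack = rng.choice(remote_racks)
    second = rng.choice(topology[second_rack])

    # Replica 3: a different node on the same remote rack as replica 2.
    third = rng.choice([n for n in topology[second_rack] if n != second])
    return [first, second, third]


topology = {"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4", "dn5", "dn6"]}
print(place_replicas(topology, writer_node="dn2", rng=random.Random(7)))
```

Keeping two replicas on one remote rack limits cross-rack write traffic, while still surviving the loss of an entire rack.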
Module 5 | Introduction to MapReduce
MapReduce architecture; executing MapReduce on YARN: ApplicationMaster, ResourceManager and NodeManager; InputFormat, input splits and key-value pairs; classes and methods of the MapReduce paradigm: Mapper, Reducer and Partitioner; custom and default partitioning; shuffle and sort; Combiner; Scheduler and ApplicationMaster; containers and the NodeManager |
Module 6 | MapReduce Hands-on
Word count program and log analytics; Hadoop Streaming in R/Python; data processing transformations; map-only jobs and uber jobs; inverted index and searches |
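A minimal Hadoop Streaming word count in Python could look like the following. Streaming feeds input lines to the mapper on stdin and expects tab-separated key/value lines on stdout; the reducer then receives the mapper output sorted by key. The single-file layout and the `map`/`reduce` command-line argument are assumptions made to keep the example compact.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every whitespace-separated, lower-cased word."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; expects input already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical CLI: "python wordcount.py map" or "python wordcount.py reduce".
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

It can be tried locally without a cluster, for example `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, where `wordcount.py` and `input.txt` are placeholder names; the local `sort` plays the role of Hadoop's shuffle and sort. On a cluster, the two commands would be passed to the streaming jar's `-mapper` and `-reducer` options.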
Module 7 | Structured and Unstructured Data handling
Optimizing with the Combiner and Partitioner; custom and default partitioning |
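The optimization idea behind the Combiner and Partitioner can be sketched in plain Python. A combiner pre-sums (word, 1) pairs on the map side so fewer records cross the network, and a partitioner decides which reducer receives each key. Hadoop's default HashPartitioner takes the key's hash modulo the number of reducers; in this sketch `zlib.crc32` stands in for Java's `hashCode()`, so the partition numbers will not match a real cluster. `first_letter_partition` is a hypothetical custom partitioner.

```python
import zlib
from collections import Counter


def combine(mapper_output):
    """Combiner: pre-aggregate (word, count) pairs locally before the shuffle."""
    counts = Counter()
    for word, n in mapper_output:
        counts[word] += n
    return list(counts.items())


def default_partition(key, num_reducers):
    """Hash partitioning in the spirit of HashPartitioner.

    Assumption: crc32 replaces Java's hashCode(), so results differ from Hadoop.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def first_letter_partition(key, num_reducers):
    """Hypothetical custom partitioner: route keys by their first letter."""
    return (ord(key[0].lower()) - ord("a")) % num_reducers if key else 0


pairs = [("big", 1), ("data", 1), ("big", 1)]
combined = combine(pairs)  # 2 records to shuffle instead of 3
for word, count in combined:
    print(word, count, "-> reducer", default_partition(word, 4))
```

Summing is safe to run in a combiner because addition is associative and commutative; an operation such as averaging raw values is not, and would give wrong results if applied partially on each mapper.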
Module 8 | Introduction to Hive Data warehouse
Installing Hive and the metastore database; configuring the metastore to use MySQL; creating Hive tables; different ways of loading data into Hive; HiveQL commands; data transformations: joins, filters and others |
Module 9 | Manipulation and Analytical Functions in Hive
Managed and external tables; partitioning and bucketing; complex data types and unstructured data; advanced HQL commands; UDFs and UDAFs; integration with HBase |
Module 10 | SerDe / Regular Expression
File formats: JSON, Avro file conversion, Parquet (compressed to uncompressed), Avro schema and data files, ORC files |
Module 11 | Ingesting Data from RDBs
Introduction to Sqoop and installation; importing and exporting data from and to RDBs; bulk loading, incremental loads, split-by and conditional queries; Sqoop validation and Sqoop jobs; data ingestion into Hive; data ingestion into HBase; different file formats |
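Sqoop's `--split-by` option parallelizes an import by reading the minimum and maximum of the split column and dividing that range among the mappers (set with `-m`/`--num-mappers`). The sketch below imitates that idea with integer arithmetic; `split_ranges` and `where_clauses` are hypothetical helpers, and Sqoop's real splitters handle more column types and edge cases than this.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive range [lo, hi] into num_mappers half-open ranges.

    Integer arithmetic keeps the boundaries exact; if the range is smaller
    than num_mappers, some ranges come out empty (a simplification).
    """
    bounds = [lo + (i * (hi - lo)) // num_mappers for i in range(num_mappers)]
    bounds.append(hi + 1)  # last range must include the maximum value itself
    return list(zip(bounds[:-1], bounds[1:]))


def where_clauses(column, lo, hi, num_mappers):
    """One WHERE condition per mapper, each covering a slice of the key range."""
    return [f"{column} >= {a} AND {column} < {b}"
            for a, b in split_ranges(lo, hi, num_mappers)]


for clause in where_clauses("id", 0, 1000, 4):
    print(clause)
```

Evenly sized ranges only yield evenly sized mapper workloads when the split column's values are roughly uniform, which is why the choice of split-by column matters.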
Module 12 | Ingesting Streaming Data
Flume architecture: agent, source, channel and sink; ingesting log files; collecting data from Twitter for sentiment analysis |
Module 13 | Spark Core and Components
Spark shell; creating RDDs from HDFS or local files; creating new RDDs through transformations; lineage graph (DAG); actions on RDDs; different resource managers; spark-shell (Scala REPL) and PySpark; monitoring jobs |
Module 14 | Scala/Spark Functional Programming
Using function literals; anonymous functions; defining a function that accepts another function; Spark: loading and saving your data (text files, CSV and TSV files, JSON files);
Spark jobs; building Scala programs with SBT/Maven; spark-submit and Spark applications |
Module 15 | RDD Transformation Programming in Depth
Hands-on and core concepts of the map(), filter() and flatMap() transformations; comparing map and flatMap
Apache Spark in action: hands-on and core concepts of the reduce(), fold() and aggregate() actions; basics of accumulators; hands-on and core concepts of the collect() and take() actions; ordered access of RDDs |
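The semantics of these transformations and actions can be previewed without a cluster. The plain-Python sketch below mimics them on a hypothetical RDD of two partitions; the comments name the PySpark counterparts (`rdd.map`, `rdd.filter`, `rdd.flatMap`, `rdd.reduce`, `rdd.fold`, `rdd.aggregate`), but the code itself does not call Spark.

```python
from functools import reduce

# Hypothetical 2-partition "RDD", modelled as a list of lists.
partitions = [[1, 2, 3], [4, 5]]
data = [x for part in partitions for x in part]
lines = ["big data", "spark rdd"]

# map(): exactly one output element per input element (rdd.map).
squares = [x * x for x in data]
# filter(): keep elements that satisfy a predicate (rdd.filter).
evens = [x for x in data if x % 2 == 0]
# map keeps one list per line; flatMap flattens the words (rdd.flatMap).
mapped = [line.split() for line in lines]
flat_mapped = [word for line in lines for word in line.split()]

# reduce(): combine elements with a commutative, associative function (rdd.reduce).
total = reduce(lambda a, b: a + b, data)


def fold(parts, zero, op):
    """Like rdd.fold(zeroValue, op): zero is used in every partition and in the merge."""
    return reduce(op, (reduce(op, p, zero) for p in parts), zero)


def aggregate(parts, zero, seq_op, comb_op):
    """Like rdd.aggregate(zeroValue, seqOp, combOp): result type may differ from elements."""
    return reduce(comb_op, (reduce(seq_op, p, zero) for p in parts), zero)


folded = fold(partitions, 0, lambda a, b: a + b)
# (sum, count) accumulator, e.g. for computing an average.
sum_count = aggregate(partitions, (0, 0),
                      lambda acc, x: (acc[0] + x, acc[1] + 1),
                      lambda a, b: (a[0] + b[0], a[1] + b[1]))
print(sum_count[0] / sum_count[1])  # 3.0
```

The zero value of fold() must be neutral for the operation: with addition and a zero of 10 instead of 0, this two-partition example returns 45 rather than 15, because the zero is added once per partition and once more when the partition results are merged.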
Module 16 | DataFrames & Datasets
Creating DataFrames; interoperating with RDDs; JSON and Parquet file formats; loading data from different sources; RDD to DF and DF.rdd; DataFrame operations (Dataset) |
Module 17 | Need for Spark SQL
What is Spark SQL? Spark SQL architecture; SQLContext in Spark SQL |
Module 18 | Spark Streaming Overview
Collecting streaming data from different sources; other streaming operations; sliding window operations; developing Spark Streaming applications; Kafka integration |
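A sliding window is defined by a window length and a slide interval, both of which must be multiples of the batch interval in Spark Streaming. The sketch below sums events per window over a list of per-batch counts, measuring both parameters in batches; `sliding_window_sums` is a hypothetical helper, not a Spark API.

```python
def sliding_window_sums(batch_counts, window_len, slide):
    """Sum per-batch event counts over windows of `window_len` batches,
    emitting one result every `slide` batches (both measured in batches).

    Early windows cover fewer batches until enough history exists.
    """
    results = []
    for end in range(slide, len(batch_counts) + 1, slide):
        start = max(0, end - window_len)
        results.append(sum(batch_counts[start:end]))
    return results


# Events per batch, window of 3 batches sliding every 2 batches.
print(sliding_window_sums([3, 1, 4, 1, 5, 9], window_len=3, slide=2))  # [4, 6, 15]
```

Because the window (3 batches) is longer than the slide (2 batches), consecutive windows overlap and some batches are counted in more than one result.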
Module 19 | Introduction to NoSQL
ACID vs. CAP theorem/BASE; introduction to HBase and installation; the HBase data model; the HBase shell; HBase architecture; schema design |
Module 20 | The HBase API
HBase configuration and tuning; Hive and HBase integration; loading data using Sqoop; time to live (TTL); compactions; tombstones |
Module 21 | Hue web interface
Hive and Pig editors; Oozie scheduler and coordinator; dashboard; configuration files and monitoring |
Module 22 | Kafka
Producers, consumers and topics; Flume with Kafka; Kafka topics with Spark Streaming |
Module 23 | Hadoop Distributions
Cloudera components; Hortonworks components; security; monitoring dashboard |
Module 24 | Zeppelin notebook
Ambari; Cloudera Manager |
Module 25 | AWS and Azure in Big Data
S3 or Azure Blob Storage: components and usage |
Module 26 | Talend Big Data Edition
ETL tool integration; data analytics using Tableau connected to Hadoop through HiveServer; interactive visualization |
Module 27 | Cloudera Spark and Hadoop Developer Certification
Hortonworks certification; guidance and mock exams |
Module 28 | Introduction to machine learning
Applying machine learning algorithms in Hadoop and Spark; MLlib classification and clustering |
Module 29 | Case Study 1: Sqoop, HBase, Hive, Spark, Tableau
|
Module 30 | Case Study 2: Kafka, Spark Streaming and HBase
|