Hadoop Syllabus

APTRON Noida

HADOOP COURSE CONTENT

Overview of Hadoop, Architecture Considerations, Infrastructure, Platforms and Automation

Use case walkthrough

ETL
Log Analytics
Real Time Analytics

HBase for Developers:

NoSQL Introduction

Traditional RDBMS approach
NoSQL introduction
Hadoop & HBase positioning

HBase Introduction

What it is, history and common use-cases
HBase Client – Shell, exercise

HBase Architecture

Building Components
Storage, B+ tree, Log Structured Merge Trees
Region Lifecycle
Read/Write Path

HBase Schema Design

Introduction to HBase schema
Column Family, Rows, Cells, Cell timestamp
Deletes
Exercise - schema, data load, query data

HBase Java API – Exercises

Connection
CRUD API
Scan API
Filters
Counters
HBase MapReduce
HBase Bulk load

HBase Operations, cluster management

Performance Tuning
Advanced Features
Exercise
Recap and Q&A

MapReduce for Developers

Introduction

Traditional Systems / Why Big Data / Why Hadoop
Hadoop Basic Concepts/Fundamentals

Hadoop in the Enterprise

Where Hadoop Fits in the Enterprise
Review Use Cases

Architecture

Hadoop Architecture & Building Blocks
HDFS and MapReduce

Hadoop CLI

Walkthrough
Exercise

MapReduce Programming

Fundamentals
Anatomy of MapReduce Job Run
Job Monitoring, Scheduling
Sample Code Walk Through
Hadoop API Walk Through
Exercise

MapReduce Formats

Input Formats, Exercise
Output Formats, Exercise

Hadoop File Formats

MapReduce Design Considerations

MapReduce Algorithms

Walkthrough of 2-3 Algorithms

MapReduce Features

Counters, Exercise
Map Side Join, Exercise
Reduce Side Join, Exercise
Sorting, Exercise

Use Case A (Long Exercise)

Input Formats, Exercise
Output Formats, Exercise

MapReduce Testing

Hadoop Ecosystem

Oozie
Flume
Sqoop
Exercise 1 (Sqoop)
Streaming API
Exercise 2 (Streaming API)
HCatalog
ZooKeeper

HBase Introduction

Introduction
HBase Architecture

View Types

Default Views
Overridden Views
Normal Views

MapReduce Performance Tuning

Development Best Practice and Debugging

Apache Hadoop for Administrators

Hadoop Fundamentals and Architecture

Why Hadoop, Hadoop Basics and Hadoop Architecture
HDFS and MapReduce

Hadoop Ecosystems Overview

Hive
HBase
ZooKeeper
Pig
Mahout
Flume
Sqoop
Oozie

Hardware and Software requirements

Hardware, Operating System and Other Software
Management Console

Deploy Hadoop ecosystem services

Hive
ZooKeeper
HBase
Administration
Pig
Mahout
MySQL
Setup Security

Enable Security – Configure Users, Groups, Secure HDFS, MapReduce, HBase and Hive

Configuring User and Groups
Configuring Secure HDFS
Configuring Secure MapReduce
Configuring Secure HBase and Hive

Manage and Monitor your cluster

Command Line Interface

Troubleshooting your cluster

Introduction to Big Data and Hadoop

Hadoop Overview

Why Hadoop
Hadoop Basic Concepts
Hadoop Ecosystem – MapReduce, Hadoop Streaming, Hive, Pig, Flume, Sqoop, HBase, Oozie, Mahout
Where Hadoop fits in the Enterprise
Review use cases

Apache Hive & Pig for Developers

Overview of Hadoop

Why Hadoop
Hadoop Basic Concepts
Hadoop Ecosystem – MapReduce, Hadoop Streaming, Hive, Pig, Flume, Sqoop, HBase, Oozie, Mahout
Where Hadoop fits in the Enterprise
Review use cases

Overview of Hadoop

Big Data and the Distributed File System
MapReduce

Hive Introduction

Why Hive?
Compare vs SQL
Use Cases

Hive Architecture – Building Blocks

Hive CLI and Language (Exercise)
HDFS Shell
Hive CLI
Data Types
Hive Cheat-Sheet
Data Definition Statements
Data Manipulation Statements
Select, Views, Group By, Sort By/Distribute By/Cluster By/Order By, Joins
Built-in Functions
Union, Sub Queries, Sampling, Explain

Hive Use Case Implementation (Exercise)

Use Case 1
Use Case 2
Best Practices

Advanced Features

Transform and Map-Reduce Scripts
Custom UDF
UDTF
SerDe
Recap and Q&A

Pig Introduction

Position Pig in Hadoop ecosystem
Why Pig and not MapReduce
Simple example (slides) comparing Pig and MapReduce
Who is using Pig now and what are the main use cases
Pig Architecture
Discuss high level components of Pig
Pig Grunt - How to Start and Use

Pig Latin Programming

Data Types
Cheat sheet
Schema
Expressions
Commands and Exercise
Load, Dump, Relational-Operations, Store, Foreach, Group, Order-By, Distinct, Filter, Join, Cogroup, Union, Cross, Limit, Sample, Parallel

Use Cases (working exercise)

Use Case 1
Use Case 2
Use Case 3 (compare Pig and Hive)

Advanced Features, UDFs

Best Practices and common pitfalls

Mahout & Machine Learning

Mahout Overview
Mahout Installation
Introduction to the Math Library
Vector implementation and Operations (Hands-on exercise)
Matrix Implementation and Operations (Hands-on exercise)
Anatomy of a Machine Learning Application

Classification

Introduction to Classification
Classification Workflow
Feature Extraction
Classification Techniques (Hands-on exercise)

Evaluation (Hands-on exercise)

Clustering

Use Cases
Clustering algorithms in Mahout
K-means clustering (Hands-on exercise)
Canopy clustering (Hands-on exercise)

Clustering

Mixture Models
Probabilistic Clustering – Dirichlet (Hands-on exercise)
Latent Dirichlet Allocation (Hands-on exercise)
Evaluating and Improving Clustering quality (Hands-on exercise)
Distance Measures (Hands-on exercise)

Recommendation Systems

Overview of Recommendation Systems
Use cases
Types of Recommendation Systems
Collaborative Filtering (Hands-on exercise)
Recommendation System Evaluation (Hands-on exercise)
Similarity Measures
Architecture of Recommendation Systems
Wrap Up

Contact us:

+91-7065273000
info@aptronnoida.in

B-10, SECTOR-2
NEAR SECTOR - 15 METRO STATION
NOIDA - 201301, U.P. (INDIA)