Hadoop Course Content - Hadoop Overview, Architecture Considerations, Infrastructure, Platforms and Automation
Use case walkthrough - ETL
- Log Analytics
- Real Time Analytics
HBase for Developers:
NoSQL Introduction - Traditional RDBMS approach
- NoSQL introduction
- Hadoop & HBase positioning
HBase Introduction - What it is, history and common use-cases
- HBase Client – Shell, exercise
HBase Architecture - Building Components
- Storage, B+ tree, Log Structured Merge Trees
- Region Lifecycle
- Read/Write Path
HBase Schema Design - Introduction to HBase schema
- Column Family, Rows, Cells, Cell timestamp
- Deletes
- Exercise - schema, data load, query data
HBase Java API – Exercises - Connection
- CRUD API
- Scan API
- Filters
- Counters
- HBase MapReduce
- HBase Bulk load
HBase Operations, cluster management - Performance Tuning
- Advanced Features
- Exercise
- Recap and Q&A
MapReduce for Developers
Introduction - Traditional Systems / Why Big Data / Why Hadoop
- Hadoop Basic Concepts/Fundamentals
Hadoop in the Enterprise - Where Hadoop Fits in the Enterprise
- Review Use Cases
Architecture - Hadoop Architecture & Building Blocks
- HDFS and MapReduce
Hadoop CLI - Walkthrough
- Exercise
MapReduce Programming - Fundamentals
- Anatomy of MapReduce Job Run
- Job Monitoring, Scheduling
- Sample Code Walk Through
- Hadoop API Walk Through
- Exercise
MapReduce Formats
- Input Formats, Exercise
- Output Formats, Exercise
Hadoop File Formats
MapReduce Design Considerations
MapReduce Algorithms
- Walkthrough of 2-3 Algorithms
MapReduce Features - Counters, Exercise
- Map Side Join, Exercise
- Reduce Side Join, Exercise
- Sorting, Exercise
Use Case A (Long Exercise) - Input Formats, Exercise
- Output Formats, Exercise
MapReduce Testing
Hadoop Ecosystem - Oozie
- Flume
- Sqoop
- Exercise 1 (Sqoop)
- Streaming API
- Exercise 2 (Streaming API)
- HCatalog
- ZooKeeper
HBase Introduction - Introduction
- HBase Architecture
VIEW Types
- Default Views
- Overridden Views
- Normal Views
MapReduce Performance Tuning
Development Best Practice and Debugging
Apache Hadoop for Administrators
Hadoop Fundamentals and Architecture
- Why Hadoop, Hadoop Basics and Hadoop Architecture
- HDFS and MapReduce
Hadoop Ecosystems Overview
- Hive
- HBase
- ZooKeeper
- Pig
- Mahout
- Flume
- Sqoop
- Oozie
Hardware and Software requirements - Hardware, Operating System and Other Software
- Management Console
Deploy Hadoop ecosystem services - Hive
- ZooKeeper
- HBase
- Administration
- Pig
- Mahout
- MySQL
- Setup Security
Enable Security – Configure Users, Groups, Secure HDFS, MapReduce, HBase and Hive
- Configuring User and Groups
- Configuring Secure HDFS
- Configuring Secure MapReduce
- Configuring Secure HBase and Hive
Manage and Monitor your cluster
Command Line Interface
Troubleshooting your cluster
Introduction to Big Data and Hadoop
Hadoop Overview - Why Hadoop
- Hadoop Basic Concepts
- Hadoop Ecosystem – MapReduce, Hadoop Streaming, Hive, Pig, Flume, Sqoop, HBase, Oozie, Mahout
- Where Hadoop fits in the Enterprise
- Review use cases
Apache Hive & Pig for Developers
Overview of Hadoop - Why Hadoop
- Hadoop Basic Concepts
- Hadoop Ecosystem – MapReduce, Hadoop Streaming, Hive, Pig, Flume, Sqoop, HBase, Oozie, Mahout
- Where Hadoop fits in the Enterprise
- Review use cases
Overview of Hadoop
- Big Data and the Distributed File System
- MapReduce
Hive Introduction - Why Hive?
- Compare vs SQL
- Use Cases
Hive Architecture – Building Blocks - Hive CLI and Language (Exercise)
- HDFS Shell
- Hive CLI
- Data Types
- Hive Cheat-Sheet
- Data Definition Statements
- Data Manipulation Statements
- Select, Views, Group By, Sort By/Distribute By/Cluster By/Order By, Joins
- Built-in Functions
- Union, Sub Queries, Sampling, Explain
Hive Use Case Implementation (Exercise) - Use Case 1
- Use Case 2
- Best Practices
Advanced Features - Transform and Map-Reduce Scripts
- Custom UDF
- UDTF
- SerDe
- Recap and Q&A
Pig Introduction - Position Pig in Hadoop ecosystem
- Why Pig and not MapReduce
- Simple example (slides) comparing Pig and MapReduce
- Who is using Pig now and what are the main use cases
- Pig Architecture
- Discuss high level components of Pig
- Pig Grunt - How to Start and Use
Pig Latin Programming - Data Types
- Cheat sheet
- Schema
- Expressions
- Commands and Exercise
- Load, Dump, Relational-Operations, Store, Foreach, Group, Order-By, Distinct, Filter, Join, Cogroup, Union, Cross, Limit, Sample, Parallel
Use Cases (working exercise) - Use Case 1
- Use Case 2
- Use Case 3 (compare Pig and Hive)
Advanced Features, UDFs Best Practices and common pitfalls
Mahout & Machine Learning - Mahout Overview
- Mahout Installation
- Introduction to the Math Library
- Vector implementation and Operations (Hands-on exercise)
- Matrix Implementation and Operations (Hands-on exercise)
- Anatomy of a Machine Learning Application
Classification - Introduction to Classification
- Classification Workflow
- Feature Extraction
- Classification Techniques (Hands-on exercise)
- Evaluation (Hands-on exercise)
Clustering
- Use Cases
- Clustering algorithms in Mahout
- K-means clustering (Hands-on exercise)
- Canopy clustering (Hands-on exercise)
Clustering - Mixture Models
- Probabilistic Clustering – Dirichlet (Hands-on exercise)
- Latent Dirichlet Allocation (Hands-on exercise)
- Evaluating and Improving Clustering quality (Hands-on exercise)
- Distance Measures (Hands-on exercise)
Recommendation Systems - Overview of Recommendation Systems
- Use cases
- Types of Recommendation Systems
- Collaborative Filtering (Hands-on exercise)
- Recommendation System Evaluation (Hands-on exercise)
- Similarity Measures
- Architecture of Recommendation Systems
- Wrap Up