HADOOP COURSE CONTENT

  1. Overview of Hadoop, Architecture Considerations, Infrastructure, Platforms and Automation
Use case walkthrough

  1. ETL
  2. Log Analytics
  3. Real Time Analytics
HBase for Developers:

NoSQL Introduction

  1. Traditional RDBMS approach
  2. NoSQL introduction
  3. Hadoop & HBase positioning
HBase Introduction

  1. What it is, history and common use-cases
  2. HBase Client – Shell, exercise
HBase Architecture

  1. Building Components
  2. Storage, B+ tree, Log Structured Merge Trees
  3. Region Lifecycle
  4. Read/Write Path
HBase Schema Design

  1. Introduction to HBase schema
  2. Column Family, Rows, Cells, Cell timestamp
  3. Deletes
  4. Exercise - schema, data load, query data
HBase Java API – Exercises

  1. Connection
  2. CRUD API
  3. Scan API
  4. Filters
  5. Counters
  6. HBase MapReduce
  7. HBase Bulk load
HBase Operations, cluster management

  1. Performance Tuning
  2. Advanced Features
  3. Exercise
  4. Recap and Q&A
MapReduce for Developers

Introduction

  1. Traditional Systems / Why Big Data / Why Hadoop
  2. Hadoop Basic Concepts/Fundamentals
Hadoop in the Enterprise

  1. Where Hadoop Fits in the Enterprise
  2. Review Use Cases
Architecture

  1. Hadoop Architecture & Building Blocks
  2. HDFS and MapReduce
Hadoop CLI

  1. Walkthrough
  2. Exercise
MapReduce Programming

  1. Fundamentals
  2. Anatomy of MapReduce Job Run
  3. Job Monitoring, Scheduling
  4. Sample Code Walkthrough
  5. Hadoop API Walkthrough
  6. Exercise
MapReduce Formats

  1. Input Formats, Exercise
  2. Output Formats, Exercise
Hadoop File Formats

MapReduce Design Considerations

MapReduce Algorithms

  1. Walkthrough of 2-3 Algorithms
MapReduce Features

  1. Counters, Exercise
  2. Map Side Join, Exercise
  3. Reduce Side Join, Exercise
  4. Sorting, Exercise
Use Case A (Long Exercise)

  1. Input Formats, Exercise
  2. Output Formats, Exercise
MapReduce Testing

Hadoop Ecosystem

  1. Oozie
  2. Flume
  3. Sqoop
  4. Exercise 1 (Sqoop)
  5. Streaming API
  6. Exercise 2 (Streaming API)
  7. HCatalog
  8. ZooKeeper
HBase Introduction

  1. Introduction
  2. HBase Architecture
VIEW Types

  1. Default Views
  2. Overridden Views
  3. Normal Views
MapReduce Performance Tuning

Development Best Practice and Debugging

Apache Hadoop for Administrators

Hadoop Fundamentals and Architecture

  1. Why Hadoop, Hadoop Basics and Hadoop Architecture
  2. HDFS and MapReduce
Hadoop Ecosystems Overview

  1. Hive
  2. HBase
  3. ZooKeeper
  4. Pig
  5. Mahout
  6. Flume
  7. Sqoop
  8. Oozie
Hardware and Software requirements

  1. Hardware, Operating System and Other Software
  2. Management Console
Deploy Hadoop ecosystem services

  1. Hive
  2. ZooKeeper
  3. HBase
  4. Administration
  5. Pig
  6. Mahout
  7. MySQL
  8. Setup Security
Enable Security – Configure Users, Groups, Secure HDFS, MapReduce, HBase and Hive

  1. Configuring User and Groups
  2. Configuring Secure HDFS
  3. Configuring Secure MapReduce
  4. Configuring Secure HBase and Hive
Manage and Monitor your cluster

Command Line Interface

Troubleshooting your cluster

Introduction to Big Data and Hadoop

Hadoop Overview

  1. Why Hadoop
  2. Hadoop Basic Concepts
  3. Hadoop Ecosystem – MapReduce, Hadoop Streaming, Hive, Pig, Flume, Sqoop, HBase, Oozie, Mahout
  4. Where Hadoop fits in the Enterprise
  5. Review use cases
Apache Hive & Pig for Developers

Overview of Hadoop

  1. Why Hadoop
  2. Hadoop Basic Concepts
  3. Hadoop Ecosystem – MapReduce, Hadoop Streaming, Hive, Pig, Flume, Sqoop, HBase, Oozie, Mahout
  4. Where Hadoop fits in the Enterprise
  5. Review use cases
Overview of Hadoop

  1. Big Data and the Distributed File System
  2. MapReduce
Hive Introduction

  1. Why Hive?
  2. Compare vs SQL
  3. Use Cases
Hive Architecture – Building Blocks

  1. Hive CLI and Language (Exercise)
  2. HDFS Shell
  3. Hive CLI
  4. Data Types
  5. Hive Cheat-Sheet
  6. Data Definition Statements
  7. Data Manipulation Statements
  8. Select, Views, Group By, Sort By/Distribute By/Cluster By/Order By, Joins
  9. Built-in Functions
  10. Union, Sub Queries, Sampling, Explain
Hive Use Case Implementation (Exercise)

  1. Use Case 1
  2. Use Case 2
  3. Best Practices
Advanced Features

  1. Transform and Map-Reduce Scripts
  2. Custom UDF
  3. UDTF
  4. SerDe
  5. Recap and Q&A
Pig Introduction

  1. Position of Pig in the Hadoop ecosystem
  2. Why Pig and not MapReduce
  3. Simple example (slides) comparing Pig and MapReduce
  4. Who is using Pig now and what are the main use cases
  5. Pig Architecture
  6. Discuss high level components of Pig
  7. Pig Grunt - How to Start and Use
Pig Latin Programming

  1. Data Types
  2. Cheat sheet
  3. Schema
  4. Expressions
  5. Commands and Exercise
  6. Load, Dump, Relational-Operations, Store, Foreach, Group, Order-By, Distinct, Filter, Join, Cogroup, Union, Cross, Limit, Sample, Parallel
Use Cases (working exercise)

  1. Use Case 1
  2. Use Case 2
  3. Use Case 3 (compare Pig and Hive)
Advanced Features, UDFs

Best Practices and common pitfalls

Mahout & Machine Learning

  1. Mahout Overview
  2. Mahout Installation
  3. Introduction to the Math Library
  4. Vector implementation and Operations (Hands-on exercise)
  5. Matrix Implementation and Operations (Hands-on exercise)
  6. Anatomy of a Machine Learning Application
Classification

  1. Introduction to Classification
  2. Classification Workflow
  3. Feature Extraction
  4. Classification Techniques (Hands-on exercise)
Evaluation (Hands-on exercise)

  1. Clustering
  2. Use Cases
  3. Clustering algorithms in Mahout
  4. K-means clustering (Hands-on exercise)
  5. Canopy clustering (Hands-on exercise)
Clustering

  1. Mixture Models
  2. Probabilistic Clustering – Dirichlet (Hands-on exercise)
  3. Latent Dirichlet Allocation (Hands-on exercise)
  4. Evaluating and Improving Clustering quality (Hands-on exercise)
  5. Distance Measures (Hands-on exercise)
Recommendation Systems

  1. Overview of Recommendation Systems
  2. Use cases
  3. Types of Recommendation Systems
  4. Collaborative Filtering (Hands-on exercise)
  5. Recommendation System Evaluation (Hands-on exercise)
  6. Similarity Measures
  7. Architecture of Recommendation Systems
  8. Wrap Up
Contact us:
+91-7065273000
info@aptronnoida.in
B-10, SECTOR-2

NEAR SECTOR - 15 METRO STATION

NOIDA - 201301, U.P. (INDIA)