Hadoop


Apache Hadoop, the open source data management software that helps organizations analyze huge volumes of structured and unstructured data, is a very hot topic across the tech industry. You can quickly learn to take advantage of the MapReduce framework through technical sessions and hands-on labs.
by Mithun Reddy
Free
0 Lessons

KEY FEATURES

Experienced and certified trainers with real-time project experience.

35-40 Interactive online training sessions.

In-house course content with a consistent course structure and design

Exclusive Student Portal

Weekly Assessments

Learn Anytime and Anywhere

Overview

The Hadoop course covers the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, the use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action. The course also examines related technologies such as Hive, Pig, and Apache Accumulo.

Pre-Requisites

Students should have an IT background and be familiar with Java and Linux concepts.

Modules

Hadoop Basic Concepts

  • An Overview of Hadoop
  • The Hadoop Distributed File System
  • Hands on Exercise
  • How MapReduce Works
  • Hands on Exercise
  • Anatomy of a Hadoop Cluster
  • Other Hadoop Ecosystem Components

Performing Several Hadoop Jobs

  • The configure and close Methods
  • Sequence Files
  • Record Reader
  • Record Writer
  • Role of Reporter
  • Output Collector
  • Processing video files and audio files
  • Processing image files
  • Processing XML files
  • Counters
  • Directly Accessing HDFS
  • ToolRunner
  • Using The Distributed Cache

Delving Deeper Into The Hadoop API

  • More About ToolRunner
  • Testing with MRUnit
  • Reducing Intermediate Data With Combiners
  • The configure and close methods for Map/Reduce Setup and Teardown
  • Writing Partitioners for Better Load Balancing
  • Hands-On Exercise
  • Directly Accessing HDFS
  • Using the Distributed Cache
  • Hands-On Exercise
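
As a rough illustration of the combiner and partitioner topics in this module, the sketch below models both ideas in plain Python rather than with the Hadoop API. The function names `combine` and `partition` are hypothetical, and the byte-sum hash is an assumption chosen only to keep the example deterministic; Hadoop's default HashPartitioner uses the key's `hashCode()` instead.

```python
from collections import Counter


def combine(pairs):
    """Combiner idea: pre-aggregate (key, count) pairs on the map side
    so that fewer records have to cross the network to the reducers."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return sorted(totals.items())


def partition(key, num_reducers):
    """Partitioner idea: map a key to a reducer index in [0, num_reducers).

    The byte-sum hash is an illustrative stand-in, not Hadoop's hash.
    """
    return sum(key.encode("utf-8")) % num_reducers
```

Every record with the same key lands on the same reducer index, which is the property a custom partitioner has to preserve while spreading keys evenly.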

Writing a MapReduce Program

  • Examining a Sample MapReduce Program, With Several Examples
  • Basic API Concepts
  • The Driver Code
  • The Mapper
  • The Reducer
  • Hadoop’s Streaming API
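
The mapper and reducer roles listed above can be sketched as a word count, written here in the style typically used with Hadoop's Streaming API. This is a minimal, single-process sketch: the names `map_lines`, `reduce_pairs`, and `run_local` are hypothetical, and the `sorted` call stands in for the shuffle-and-sort step that Hadoop performs between the map and reduce phases.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reduce_pairs(sorted_pairs):
    """Reducer: sum the counts for each word; input must be sorted by word."""
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in one process."""
    return list(reduce_pairs(sorted(map_lines(lines))))
```

In a real streaming job the mapper and reducer would be separate scripts that read lines from standard input and write tab-separated key/value pairs to standard output.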

Using Hive and Pig

  • Hive Basics
  • Pig Basics
  • Hands on Exercise

Practical Development Tips and Techniques

  • Debugging MapReduce Code
  • Using LocalJobRunner Mode for Easier Debugging
  • Retrieving Job Information with Counters
  • Logging
  • Splittable File Formats
  • Determining the Optimal Number of Reducers
  • Map-Only MapReduce Jobs
  • Hands on Exercise

Debugging MapReduce Programs

  • Testing with MRUnit
  • Logging
  • Classification/Machine Learning

Advanced MapReduce Programming

  • A Recap of the MapReduce Flow
  • The Secondary Sort
  • Customized InputFormats and OutputFormats
  • Pipelining Jobs With Oozie
  • Map-Side Joins
  • Reduce-Side Joins

Joining Data Sets in MapReduce

  • Map-Side Joins
  • The Secondary Sort
  • Reduce-Side Joins

Monitoring and Debugging on a Production Cluster

  • Counters
  • Skipping Bad Records
  • Rerunning failed tasks with IsolationRunner

Tuning for Performance in MapReduce

  • Reducing network traffic with a combiner
  • Partitioners
  • Reducing the amount of input data
  • Using Compression
  • Reusing the JVM
  • Running with speculative execution
  • Refactoring code and rewriting algorithms
  • Parameters affecting performance
  • Other Performance Aspects