Big Data Hadoop



New job opportunities are emerging for IT professionals who work with Hadoop, the open-source framework that organizations use to store and process very large volumes of data (for example, data gathered about their customers) and to analyze that data to drive decision making and increase profitability.

Prerequisites: This course is designed for developers with some programming experience (preferably in Java). Prior knowledge of Hadoop is not required.

This training teaches you:
  • How MapReduce and the Hadoop Distributed File System (HDFS) work
  • How to write MapReduce code in Java or other programming languages
  • What issues to consider when developing MapReduce jobs
  • How to implement common algorithms in Hadoop
  • Best practices for Hadoop development and debugging
  • Advanced Hadoop API topics required for real-world data analysis
Course highlights:
  • Hands-on Big Data / Hadoop ecosystem training
  • Learn from the experts
  • Post-training support and guidance
  • Hands-on, practical approach in our state-of-the-art lab in Silicon Valley
  • Direct work opportunities with Fortune 100 companies
  • Throughout the course, students write Hadoop code and perform other hands-on exercises to solidify their understanding of the concepts presented.
Detailed Course Outline
Week 1: An Overview of Hadoop
  • Hadoop History
  • The ecosystem and stack: HDFS, MapReduce, Hive, Pig
  • Cluster architecture overview
  • Hadoop distributions and basic commands
  • The Hadoop Distributed File System
  • How MapReduce Works
  • Hands-On Exercise
  • Anatomy of a Hadoop Cluster
  • Other Hadoop Ecosystem Components
  • HDFS Introduction
  • The HDFS command line and web interfaces
  • The HDFS Java API (lab)
Week 2: Writing a MapReduce Program
  • The MapReduce Flow
  • Examining a Sample MapReduce Program
  • Basic MapReduce API Concepts
  • The Driver Code
  • The Mapper
  • The Reducer
  • Hadoop's Streaming API
  • Using Eclipse for Rapid Development
  • Hands-On Exercise
  • The New MapReduce API
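
To make the driver/mapper/reducer flow concrete, here is a minimal plain-Java sketch of word count, the usual first MapReduce program. It simulates the map, shuffle/sort and reduce phases in memory and does not use the Hadoop API; the class and method names (`WordCountSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are hypothetical. In a real job, the mapper and reducer would extend Hadoop's `Mapper` and `Reducer` classes and a driver would configure and submit a `Job`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory simulation of the MapReduce word-count flow.
 * It mirrors the phases (map, shuffle/sort, reduce) but does NOT use Hadoop classes.
 */
public class WordCountSketch {

    // Map phase: emit (word, 1) for every whitespace-separated token in a line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) { // split() can yield an empty first token
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Shuffle/sort phase: group values by key; TreeMap keeps keys sorted,
    // similar to how each reducer receives its keys in sorted order.
    public static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all counts emitted for one key.
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // "Driver": run every input line through map, then shuffle, then reduce.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(intermediate).forEach((k, vs) -> result.put(k, reduce(k, vs)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("the quick fox", "the lazy dog")));
        // prints {dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```

Because the reducer only sees grouped values per key, the same `reduce` logic works no matter how many mappers produced the pairs.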

Week 3: Common MapReduce Algorithms
  • Sorting and Searching
  • Indexing
  • Machine Learning With Mahout
  • Term Frequency-Inverse Document Frequency (TF-IDF)
  • Word Co-Occurrence
  • Hands-On Exercise
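
As an illustration of the TF-IDF topic, the sketch below computes one common variant of the score in plain Java: term frequency as the term's count divided by the document length, and inverse document frequency as ln(N / df), where N is the number of documents and df the number of documents containing the term. Other weighting variants exist. It runs in memory on a tiny hypothetical corpus; in Hadoop, the per-document counts and document frequencies would be produced by MapReduce jobs rather than by these hypothetical helper methods.

```java
import java.util.List;

/** Hypothetical in-memory TF-IDF calculator (one common weighting variant). */
public class TfIdfSketch {

    // Term frequency: occurrences of the term divided by the document length.
    public static double tf(String term, List<String> doc) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        long count = doc.stream().filter(term::equals).count();
        return (double) count / doc.size();
    }

    // Inverse document frequency: ln(N / df); 0 if no document contains the term.
    public static double idf(String term, List<List<String>> corpus) {
        long df = corpus.stream().filter(d -> d.contains(term)).count();
        return df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
    }

    public static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        return tf(term, doc) * idf(term, corpus);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("hadoop", "stores", "data"),
                List.of("hadoop", "processes", "data"),
                List.of("hbase", "stores", "rows"));
        // "hadoop" occurs in 2 of 3 documents, "processes" in only 1,
        // so "processes" gets the higher weight in the second document.
        System.out.println("hadoop:    " + tfIdf("hadoop", corpus.get(1), corpus));
        System.out.println("processes: " + tfIdf("processes", corpus.get(1), corpus));
    }
}
```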
Week 4: Practical Development Tips and Techniques
  • Debugging MapReduce Code
  • Using LocalJobRunner Mode For Easier Debugging
  • Retrieving Job Information with Counters
  • Logging
  • Splittable File Formats
  • Determining the Optimal Number of Reducers
  • Map-Only MapReduce Jobs
  • Hands-On Exercise
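
Choosing the number of reducers also decides how keys are divided among them. Hadoop's default `HashPartitioner` sends a key to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the sketch below applies that formula to plain `String` keys to show how load spreads as the reducer count changes. Treat it as an approximation: Hadoop's `Text` keys have their own `hashCode`, so real assignments differ, and `PartitionSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of hash partitioning across reducers. */
public class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner. Masking with
    // Integer.MAX_VALUE clears the sign bit, so negative hash codes
    // still map to a partition in [0, numReduceTasks).
    public static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Count how many distinct keys each reducer would receive.
    public static int[] loadPerReducer(List<String> keys, int numReduceTasks) {
        int[] load = new int[numReduceTasks];
        for (String k : keys) {
            load[partitionFor(k, numReduceTasks)]++;
        }
        return load;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("apple", "banana", "cherry", "date", "elderberry", "fig");
        for (int reducers : new int[] {1, 2, 4}) {
            System.out.println(reducers + " reducer(s): "
                    + Arrays.toString(loadPerReducer(keys, reducers)));
        }
    }
}
```

Every occurrence of a given key lands on the same reducer, which is what lets a reducer see all values for that key; skewed key frequencies, not just the reducer count, determine how balanced the work actually is.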

Week 5: HBase
  • The components of an HBase cluster
  • When you should and should not use HBase
  • How to use the HBase shell to directly manipulate HBase tables
  • How to design optimal HBase schemas for efficient data storage and retrieval
  • How to connect to HBase using the Java API
  • How to configure an HBase cluster
  • How to administer an HBase cluster, identifying and resolving performance bottlenecks
  • Hands-On Exercise
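
HBase keeps rows sorted by row key, so row key layout is central to schema design. One commonly described pattern, the reverse timestamp, stores `Long.MAX_VALUE - timestamp` in the key so that the newest rows for an entity sort first. The sketch below assumes a hypothetical `userId#reversedTimestamp` layout and uses a Java `TreeMap` as a stand-in for HBase's sorted storage; it does not use the HBase client API, and `RowKeySketch` is a hypothetical name.

```java
import java.util.Locale;
import java.util.TreeMap;

/** Hypothetical row key builder illustrating the reverse-timestamp pattern. */
public class RowKeySketch {

    // Layout: "<userId>#<Long.MAX_VALUE - timestamp>", zero-padded to 19 digits
    // (the width of Long.MAX_VALUE) so lexicographic order equals numeric order.
    public static String rowKey(String userId, long timestampMillis) {
        long reversed = Long.MAX_VALUE - timestampMillis;
        return userId + "#" + String.format(Locale.ROOT, "%019d", reversed);
    }

    public static void main(String[] args) {
        // TreeMap stands in for HBase's sorted rows; for ASCII keys,
        // String ordering matches HBase's byte-wise ordering.
        TreeMap<String, String> table = new TreeMap<>();
        table.put(rowKey("user42", 1_000L), "oldest event");
        table.put(rowKey("user42", 3_000L), "newest event");
        table.put(rowKey("user42", 2_000L), "middle event");
        // Scanning the user's rows in key order yields the newest event first.
        table.forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```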
Weeks 6-8: Live Project and Interview Preparation