VirtualBox/VMware
- Basics & Installations
Linux
- Basics
Hadoop
- What is Hadoop?
- Why Hadoop, and the flow of Hadoop
- Scaling
- Distributed Framework
- Hadoop vs. RDBMS
- A brief history of Hadoop
Hadoop installation in pseudo-distributed mode
- Adding and removing nodes (without downtime)
- Decommissioning nodes
- Block size
- Hadoop processes (NN, SNN, JT, DN, TT)
- Common errors when running a Hadoop cluster, and their solutions
HDFS: Hadoop Distributed File System
- HDFS Design and Architecture
- HDFS Concepts
- Interacting with HDFS using the command line
- Dataflow
- Introduction to blocks
- Data Replication
- Admin Commands
- Hadoop archives
Hadoop Processes
- NameNode and its functionality
- Secondary NameNode and its functionality
- JobTracker and its functionality
- TaskTracker and its functionality
- DataNode and its functionality
- ResourceManager and its functionality
- NodeManager and its functionality
MapReduce
- Developing a MapReduce application
- Phases in the MapReduce framework
- MapReduce input and output formats
- Advanced Concepts
- Combiner
- HAR
- Partitioner, sorting, shuffling
- Different phases of MapReduce programs
- Data locality
- Different unstructured data processing examples
- Image processing by using MapReduce
Joining datasets in MapReduce jobs
- Map-side join
- Reduce-side join
Hadoop Programming Languages
Pig
- Introduction (Basics)
- Installation and Configuration
- Different datatypes
- Interacting with HDFS using Pig
- MapReduce programs through Pig
- Pig commands
- Execution mechanisms (Grunt, script…)
- Loading, filtering, grouping, joins…
- Sample programs in Pig with real-time examples
Hive
- Basics (Introduction)
- Installation and Configurations
- Datatypes and operators
- HQL Commands
- Interacting with HDFS using Hive
- MapReduce programs through Hive
- Joins, groups, filters…
- Sample programs in Hive with real-time examples
- Join vs. map join
Impala
- Basics
- Commands
Sqoop
- Introduction to Sqoop
- Installations & Configurations
- Sqoop commands
- Connecting to a relational database using Sqoop and importing lakhs of records into Hadoop (in a single minute)
Flume
- Basics (Introduction)
- Installation and Configurations
NoSQL Databases Concepts
HBase
- Basics & Installations
- Commands
- Interacting with HDFS from HBase
MongoDB
- Basics & Installations
- All queries for processing data
Apache Spark
- Introduction
- Installations and configurations
- RDD, SC…
- Scala Introduction
- Interacting with HDFS from Spark
- Programs in Spark using Scala
Specialities
- ETL tool (Data Warehousing BI Tools)
PDI (Pentaho Data Integration)
- Introduction
- Creating an RDBMS database
- Establishing a connection between PDI and an RDBMS database
- Creating data in Hadoop
- Establishing a connection between PDI and Hadoop data
- Moving data from Hadoop to RDBMS and vice versa
- Summarization