$200.00
Certification

An industry-recognized certification lets you add this credential to your resume upon completion of all courses

Instructor
Ganapathi Devappa, Instructor - Real Time Big Data Streaming on Apache Storm - Beginner to Advanced

Ganapathi Devappa

Ganapathi has been working in the IT industry for more than 20 years as a developer, technical manager, and Big Data consultant. He is a Certified Hadoop Developer, and for the past 5 years he has been teaching and delivering training in Big Data Hadoop, Storm, Kafka, and NoSQL databases like HBase and Cassandra. He also writes blogs on Big Data for several websites.

Develop distributed stream processing applications using Apache Storm

  • Learn how to develop Apache Storm programs and interface with tools like Kafka, Cassandra, and Twitter.
  • 26 demos and hands-on examples.
  • Instructor has more than 20 years of experience working in the high tech space. 

Duration: 5h 05m

Course Description

This course teaches you how to write Apache Storm programs that take streaming data from tools like Kafka and Twitter in real time, process it in Storm, and save it to tables in Cassandra or files in Hadoop HDFS. You will be able to develop distributed stream processing applications that process streaming data in parallel and handle failures, and to implement data transformations like maps and filters, stateful stream processing, and exactly-once processing. The course also covers administrative aspects such as setting up an Apache Storm cluster, scheduling, monitoring, and metrics reporting.

This is a hands-on course: you will develop many Apache Storm programs using the Eclipse IDE and Java. Theory is intermixed with practice so that you implement what you have learned as a developer, and you will write more than thirty programs during the course. The only way to learn a new tool quickly is to practice by writing programs, and this course provides the right mix of theory, practice, and real-life industry use of Apache Storm. By enrolling in this course, you will be on a journey to becoming a big data developer using Apache Storm.

What am I going to get from this course?

Implement Apache Storm programs that take real-time streaming data from tools like Kafka and Twitter, process it in Storm, and save it to tables in Cassandra or files in Hadoop HDFS. You will be able to develop distributed stream processing applications that process streaming data in parallel and handle failures, and to implement stateful stream processing, data transformations like maps and filters, and exactly-once processing.

Prerequisites and Target Audience

What will students need to know or do before starting this course?

  • Experience developing software projects
  • Some programming experience in Java is required
  • Familiarity with a Java IDE such as Eclipse or IntelliJ

Who should take this course? Who should not?

Real-time big data processing tools have become mainstream, and many organizations have started processing big data in real time. Apache Storm is one of the popular tools for processing big data in real time. If you are familiar with Java, you can easily learn Apache Storm programming to process streaming data in your organization. Through this course, I aim to provide you with a working knowledge of Apache Storm so that you can write distributed programs to process streaming data.
 

Curriculum

Module 1: Introduction

Lecture 1 Introduction
Lecture 2 Course Prerequisites
Lecture 3 Course Structure
Lecture 4 Data Sizes in Big Data
Lecture 5 Big Data Problem
Lecture 6 Traditional Solution
Lecture 7 Big Data Solution
Lecture 8 Demo and practice activity: Install Eclipse

Download, install, and start Eclipse

Lecture 9 Download the training programs

Download the training program zip file. Create a directory C:\storm in Windows and unzip the training program zip file into that directory. This creates three directories (input, output, and training) and copies the files into them; the output folder will initially be empty.

Lecture 10 Demo and practice activity: Create a maven project in Eclipse

Create a maven project and set build path

Lecture 11 Demo and practice activity : Add Apache storm programs to Eclipse project

Add the provided training programs to the Eclipse project you created

Lecture 12 Demo and practice activity: Compile the storm program in Eclipse

Correct the mistakes, adjust the build path, and create a run configuration to run the program

Lecture 13 Demo and practice activity: Run the Apache Storm program from Eclipse

Using the run configuration, run the Storm program in the local cluster and see the results.

Lecture 14 Summary

Module 2: Introduction to Apache Storm

Lecture 15 Agenda
Lecture 16 Storm Features
Lecture 17 Zookeeper
Lecture 18 Storm Architecture
Lecture 19 Storm Data Model
Lecture 20 Storm Topology
Lecture 21 Storm Topology Simple Example
Lecture 22 Demo and practice activity: Create a simple Apache Storm program

In this demo, practice creating a simple Storm program from the sample program provided, and run it in the local cluster to see the results.
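The classic first Storm program is a word count: a spout emits sentences, a split bolt tokenizes them, and a count bolt keeps a running tally. As a rough sketch of the per-bolt logic (plain Java with no Storm dependencies; all names here are illustrative, not from the course files):

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountSketch {
    // Split-bolt logic: break a sentence tuple into word tokens.
    static String[] split(String sentence) {
        return sentence.toLowerCase().split("\\s+");
    }

    // Count-bolt logic: keep a running tally per word.
    static void count(Map<String, Integer> counts, String word) {
        counts.merge(word, 1, Integer::sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        for (String sentence : new String[] { "the quick brown fox", "the lazy dog" }) {
            for (String word : split(sentence)) {
                count(counts, word);
            }
        }
        System.out.println(counts); // the running totals a count bolt would emit downstream
    }
}
```

In the real topology these two pieces run as separate bolts connected by the topology builder, so the split and count steps can be parallelized independently.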

Lecture 23 Storm Topology: Case Study1
Lecture 24 Demo and practice activity: Implement the case study as Apache Storm program

Implement the case study 1 program in Eclipse and run the program to see the results

Lecture 25 Storm Topology: Case Study2
Lecture 26 Demo and practice activity: Implement the case study 2 program

Implement the case study 2 program in Eclipse, run and see the results.

Lecture 27 Tick Tuples
Lecture 28 Demo and practice activity: Implement the periodic processing in Storm with tick tuples

Use tick tuples to implement an Apache Storm program for periodic processing. Run it and see the results.
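The tick-tuple pattern can be sketched without Storm: a bolt buffers incoming tuples and only emits when a periodic tick arrives. In real Storm the tick tuple is delivered on a system stream at an interval set via Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS; in this hypothetical plain-Java sketch a boolean flag stands in for "this tuple is a tick":

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a bolt that buffers counts and flushes them only on a
// periodic tick, mimicking Storm's tick-tuple pattern (illustrative names).
public class TickTupleSketch {
    private final Map<String, Integer> buffer = new HashMap<>();
    final List<Map<String, Integer>> flushed = new ArrayList<>();

    void execute(String word, boolean isTick) {
        if (isTick) {
            flushed.add(new HashMap<>(buffer)); // periodic emit, e.g. every N seconds
            buffer.clear();
        } else {
            buffer.merge(word, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        TickTupleSketch bolt = new TickTupleSketch();
        bolt.execute("storm", false);
        bolt.execute("storm", false);
        bolt.execute(null, true); // the tick: flush the buffered window
        System.out.println(bolt.flushed);
    }
}
```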

Lecture 29 Summary
Lecture 30 Practice activity: Write the storm programs for the five assignments and run them with the data provided

Five assignments are described in the document. Modify the programs from this section to complete the assignments and run them. Sample programs are provided to help with a few of the assignments; download them from the download section, but look at them only if you have trouble completing an assignment. Make sure to run your programs and check the results before you move on to the next section.

Module 3: Storm Installation & Configuration

Lecture 31 Agenda
Lecture 32 Storm Environment Setup
Lecture 33 Install Zookeeper
Lecture 34 Storm Download
Lecture 35 Starting Storm Servers
Lecture 36 Demo
Lecture 37 Demo and practice activity: Create a thin jar in Eclipse

Use this demo to create a thin jar that you can use to run your Apache Storm programs. The Maven build in Eclipse can be used to build the thin jar.

Lecture 38 Demo and practice activity: Create a fat jar in Eclipse

Create a fat jar for the storm program so that it includes the dependent libraries.

Lecture 39 Submitting a Job to Storm
Lecture 40 Demo
Lecture 41 Storm Topology
Lecture 42 Topology Demo
Lecture 43 Using Eclipse for Storm Programs
Lecture 44 Setup a Storm Cluster
Lecture 45 Summary
Lecture 46 Practice activity: Perform the five activities specified

Practice what you have learned in this section by completing the practice activities. A sample program is provided for one of the activities in the download section.

Module 4: Storm Classes & Groupings

Lecture 47 Agenda
Lecture 48 Bolt Parallelism
Lecture 49 Stream Grouping
Lecture 50 The Fields Class
Lecture 51 Storm Classes and Interfaces
Lecture 52 IRichSpout Interface
Lecture 53 NextTuple Method
Lecture 54 IRichBolt Interface
Lecture 55 Building a Topology
Lecture 56 Declarer Interfaces
Lecture 57 Demo and practice activity: Shuffle grouping with multiple tasks
Lecture 58 Demo and practice activity: Fields grouping with multiple tasks
Lecture 59 Normal Tuple Processing in Storm
Lecture 60 Demo and practice activity: Implement reliable processing in Storm
Lecture 61 Summary
Lecture 62 Practice activity: Write the programs for the nine activities listed and run to check the output

The nine activities listed provide good practice for this section. Sample programs are provided for some of the activities in the download section; look at them only after attempting the activities yourself. Always run your programs, correct any mistakes, and check the output.
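One mechanic worth internalizing from this module is fields grouping: tuples with equal values of the grouping fields always reach the same bolt task, which Storm achieves by hashing those values across the task count. A plain-Java sketch of that contract (Storm's internal hashing differs in detail; this only shows the guarantee):

```java
import java.util.Objects;

public class FieldsGroupingSketch {
    // Route a tuple to one of numTasks consumer tasks by hashing the
    // grouping-field value. Equal keys always map to the same task,
    // which is why fields grouping lets a bolt keep per-key state.
    static int chooseTask(Object fieldValue, int numTasks) {
        return Math.floorMod(Objects.hashCode(fieldValue), numTasks);
    }

    public static void main(String[] args) {
        int a = chooseTask("storm", 4);
        int b = chooseTask("storm", 4);
        System.out.println(a == b); // same key, same task
    }
}
```

Shuffle grouping, by contrast, distributes tuples across tasks without regard to their field values, so no per-key locality is guaranteed.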

Module 5: Trident

Lecture 63 Agenda
Lecture 64 Trident Layer
Lecture 65 Trident Operations
Lecture 66 Case Study: Trident Operations
Lecture 67 Demo and practice activity: Implement Trident stream transformations

The previous case study is illustrated with the actual program in Eclipse. You are encouraged to create this program in Eclipse using the training program files provided, then run it and check the results.
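Trident transformations such as per-tuple functions and filters behave much like a familiar map/filter pipeline. As a semantic analogy only (this uses java.util.stream, not the Trident API; the Trident class names in the comments are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

public class TridentStyleTransforms {
    public static void main(String[] args) {
        // Semantics of a Trident pipeline roughly like
        //   stream.each(new Fields("word"), new Uppercase(), new Fields("upper"))
        //         .filter(new LongWordFilter())
        // expressed with java.util.stream to show the behavior only.
        List<String> out = List.of("storm", "is", "fast").stream()
                .map(String::toUpperCase)     // like a Trident function applied per tuple
                .filter(w -> w.length() > 2)  // like a Trident Filter's isKeep() decision
                .collect(Collectors.toList());
        System.out.println(out);
    }
}
```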

Lecture 68 Windowing
Lecture 69 Partition Aggregate
Lecture 70 General Aggregator
Lecture 71 Repartitioning Operations
Lecture 72 Aggregate Operations
Lecture 73 Operations on Grouped Streams
Lecture 74 Trident State
Lecture 75 Trident Exactly Once Processing
Lecture 76 Case Study: Trident State Updates
Lecture 77 Demo and practice activity: Trident state implementation part 1 : Spout implementation

Trident state processing and exactly-once processing are quite complex to implement, so the implementation is illustrated step by step in multiple parts. I start by showing the spout that produces the batches of tuples.

Lecture 78 Demo and practice activity: Trident state implementation part 2: IBackingMap implementation

I continue here with the implementation of the IBackingMap interface in Trident. One part of this class, the multiGet method, is illustrated here.

Lecture 79 Demo and practice activity: Trident state implementation part 3: IBackingMap and StateFactory implementation

Here I cover the multiPut method of the IBackingMap implementation and continue with a simple implementation of StateFactory.
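The idea behind Trident's transactional state is worth a minimal sketch: each key stores the transaction id of the batch that last updated it alongside the value, and an update carrying a txid that was already applied is treated as a replay and skipped. A hypothetical in-memory version (a real implementation goes through IBackingMap against an external store such as Cassandra):

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of Trident's transactional-state idea: store
// (txid, value) per key; an update for an already-applied batch
// (same txid) is a replay and is ignored, giving exactly-once
// semantics for the state even when batches are re-emitted.
public class TxStateSketch {
    static class Entry {
        long txid, value;
        Entry(long txid, long value) { this.txid = txid; this.value = value; }
    }

    private final Map<String, Entry> store = new HashMap<>();

    void applyCount(String key, long batchTxid, long delta) {
        Entry e = store.get(key);
        if (e != null && e.txid == batchTxid) return; // replayed batch: skip
        long base = (e == null) ? 0 : e.value;
        store.put(key, new Entry(batchTxid, base + delta));
    }

    long get(String key) {
        Entry e = store.get(key);
        return e == null ? 0 : e.value;
    }

    public static void main(String[] args) {
        TxStateSketch state = new TxStateSketch();
        state.applyCount("storm", 1, 5);
        state.applyCount("storm", 1, 5); // replay of txid 1 is ignored
        state.applyCount("storm", 2, 3);
        System.out.println(state.get("storm"));
    }
}
```

This only works when batches are replayed with the same txid and contents, which is exactly the guarantee the transactional Trident spout provides.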

Lecture 80 Demo and practice activity: Trident state implementation part 4: The main method implementation

Now that all the pieces are in place, it is time to connect them in the main method by creating the Trident topology and adding the spout and state processing to it.

Lecture 81 Demo and practice activity: Trident state implementation part 5: Run the Trident state processing program

It is finally time to see the fruits of our labor. Here I run the created program in the local cluster and view the results. Make sure you also follow the demo and run the program on your machine to check the results.

Lecture 82 Summary
Lecture 83 Practice activity: Write the programs for the six activities listed and check the output

These six activities help you apply the Trident interface to stream processing in Apache Storm. Sample programs are provided for some of the activities; you can download them from the download section.

Module 6: Storm Scheduling

Lecture 84 Agenda
Lecture 85 Storm User Interface
Lecture 86 Storm Schedulers
Lecture 87 Isolation Scheduler
Lecture 88 Resource Aware Scheduler
Lecture 89 Resource Aware Scheduler: Example
Lecture 90 Default Configurations
Lecture 91 Metrics Reporting
Lecture 92 Configuration for Ganglia
Lecture 93 Summary
Lecture 94 Practice activity: Perform the two activities listed in this section

Perform the two activities listed in this section by modifying the existing programs. A sample modified file is provided. You can download the sample program from the download section.

Lecture 95 Demo: Monitor multiple topologies using Storm User Interface

Look at multiple topologies, including the reliable topology and the Trident topology, in the Storm UI.

Module 7: Storm Interfaces

Lecture 96 Agenda
Lecture 97 Apache Kafka
Lecture 98 Storm Kafka Spout Example
Lecture 99 Compiling for Kafka
Lecture 100 Demo and practice activity: Setup and start Zookeeper and Kafka servers

To illustrate the Storm interface to Kafka, let us first set up Zookeeper and Kafka and start the servers.

Lecture 101 Demo and practice activity: Create a new topic in Kafka

Create a topic in Kafka so that Storm can receive messages from this topic.

Lecture 102 Demo and practice activity: Start Kafka producer

Start the Kafka console producer process, which takes typed messages and publishes them to the topic for Storm to consume.

Lecture 103 Demo and practice activity: Storm Program for interfacing with Kafka

Here you can look at the Apache Storm program that uses a Kafka client spout to connect to a Kafka topic, receive the messages, and print them out.

Lecture 104 Demo and practice activity: Run the program and see flow of messages from Kafka to Storm

Here you will start the Storm program that interfaces with Kafka. Messages entered into the Kafka producer will appear in the Storm output.
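Conceptually, the Kafka spout just polls a topic and emits each message as a tuple for downstream bolts. The flow can be mimicked in plain Java with a queue standing in for the topic (illustrative names only; the real program uses Storm's Kafka client spout):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class KafkaFlowSketch {
    public static void main(String[] args) throws InterruptedException {
        // The queue stands in for a Kafka topic; the producer side is the
        // console producer typing messages, the consumer loop is the
        // spout's polling plus the bolt's execute().
        BlockingQueue<String> topic = new ArrayBlockingQueue<>(16);
        topic.put("hello");
        topic.put("storm");

        List<String> processed = new ArrayList<>();
        String msg;
        while ((msg = topic.poll()) != null) {
            processed.add(msg); // bolt side: print/process the tuple
        }
        System.out.println(processed);
    }
}
```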

Lecture 105 Cassandra
Lecture 106 Setting Properties for Cassandra
Lecture 107 Writing to Cassandra Table
Lecture 108 Real Time Data Analytics Platform
Lecture 109 Demo and practice activity: Setup and start Cassandra server

Install Cassandra and start the Cassandra server

Lecture 110 Demo and practice activity: Create a keyspace and table in Cassandra

Create a keyspace in Cassandra and a table to receive the data from Storm.

Lecture 111 Demo and practice activity: Look at the Storm program that takes messages from Kafka and stores to table in Cassandra

Here I illustrate the real-time data analytics platform with an Apache Storm program that takes messages from a Kafka topic and stores them as rows in a Cassandra table in real time.

Lecture 112 Demo and practice activity: Run the Kafka-Storm-Cassandra interface program to see the flow of data from Kafka to Cassandra table

Finally, the real-time data analytics platform is illustrated by running the Storm interface program. You can enter messages for the Kafka topic in one console window and see the data updated in the Cassandra table in another.

Lecture 113 Example Writing to HDFS
Lecture 114 Demo and practice activity: Create the program to store data into Hadoop HDFS from Kafka

The program illustrates reading data from a Kafka topic and inserting it into a directory in HDFS.

Lecture 115 Interfacing with Twitter
Lecture 116 Setting Authorization
Lecture 117 Demo and practice activity: Create the program for getting tweets from Twitter in Apache Storm

This program illustrates using Twitter4J to get data from Twitter and process the tweets in Storm. The link for creating a Twitter developer account and obtaining Twitter credentials is provided in the download section.

Lecture 118 Demo and practice activity: Run the twitter interface program and look at the live tweets

The program filters tweets in real time for certain keywords and displays them. You can run the program yourself by providing your Twitter credentials, which can be obtained from the link provided.
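The keyword filtering itself is a simple per-tweet check, which can be sketched in plain Java (names are illustrative, not the course's actual classes):

```java
import java.util.List;
import java.util.stream.Collectors;

public class TweetFilterSketch {
    // Keep only tweets mentioning one of the tracked keywords -- the
    // core logic of a filtering bolt in the Twitter demo.
    static List<String> filterTweets(List<String> tweets, List<String> keywords) {
        return tweets.stream()
                .filter(t -> keywords.stream()
                        .anyMatch(k -> t.toLowerCase().contains(k.toLowerCase())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> matched = filterTweets(
                List.of("Learning Apache Storm", "Nice weather today"),
                List.of("storm"));
        System.out.println(matched);
    }
}
```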

Lecture 119 Summary
Lecture 120 Practice activity: Write the programs for the seven activities listed in this section and check the results

The seven activities in this section let you practice the Storm interfaces. You can go through the demos multiple times to practice the Kafka and Cassandra commands as well. Sample programs are provided for some of the activities; you can download them from the download section.

Lecture 121 Course Conclusion

Course summary, next steps

Quiz 1 Big Data - Storm Intro - Installation
Quiz 2 Installation - Classes & Groupings
Quiz 3 Scheduling & Monitoring - Interfaces - Trident
Quiz 4