Data Science Training



Data Science Training Course

The DevOps Certification Training Program will provide you with in-depth knowledge of various DevOps tools including Git, Jenkins, Docker, Ansible, Puppet, Kubernetes and Nagios. This training is completely hands-on and designed in a way to help you become a certified practitioner through best practices in Continuous Development, Continuous Testing, Configuration Management and Continuous Integration, and finally, Continuous Monitoring of software throughout its development life cycle.

Why should you take Data Science Training?

80% of Global Fortune 500 organizations are expected to adopt DevOps by 2019 – CA

ADP, BBC News, eBay, GE, ING, Intuit, PayPal, Splunk, Uber and other MNCs worldwide use DevOps

Average salary given to a DevOps Engineer is around $123,354 per annum –





Data Science Training Course Curriculum

  • Introduction To Data Science
  • Life Cycle of Data Science
  • Skills required for Data Science
  • Career Paths in Data Science
  • Applications of Data Science
  • Relationship between Statistics and Data Science
  • Introduction to Data:
    • Data types
    • Data Collection Techniques
  • Descriptive Statistics:
    • Measures of Central Tendency
    • Measures of Dispersion
    • Measures of Skewness and Kurtosis
    • Visualization
  • Inferential Statistics:
    • Sampling variability and Central Limit Theorem
    • Confidence Interval for Mean
    • Hypothesis Testing, t-Test, F-Test, Chi-square Test
    • ANOVA
  • Random Sampling and Probability Distribution:
    • Probability and Limitations, Discrete Probability, Continuous Probability
    • Binomial Distribution, Poisson Distribution, Normal Distribution
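
As a small illustration of the descriptive statistics topics above (central tendency and dispersion), here is a minimal sketch using only Python's standard `statistics` module. The `signups` data is hypothetical and is not taken from the course material.

```python
import statistics

# Hypothetical sample: daily sign-ups over one week (one unusually busy day).
signups = [12, 15, 11, 15, 18, 40, 15]

# Measures of central tendency
mean = statistics.mean(signups)
median = statistics.median(signups)
mode = statistics.mode(signups)

# Measures of dispersion
data_range = max(signups) - min(signups)
stdev = statistics.stdev(signups)  # sample standard deviation (n - 1 denominator)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={data_range} stdev={stdev:.2f}")
```

The single large value pulls the mean above the median, which is the kind of asymmetry that the skewness topic describes.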

Python programming:

  • Environment Setup
  • Jupyter Notebook Overview
  • Data types: Numbers, Strings, Printing, Lists, Dictionaries, Booleans, Tuples, Sets
  • Comparison Operators
  • if, elif, else Statements
  • Loops: for Loops, while Loops
  • range()
  • list comprehension
  • functions
  • lambda expressions
  • map and filter
  • methods
  • Programming Exercises
  • Object Oriented Programming
  • Modules and packages
  • Errors and Exception Handling
  • Python Decorators
  • Python generators
  • Collections
  • Regular Expression
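
Several of the Python topics listed above (list comprehensions, lambda expressions, map and filter, generators) fit into one short sketch. The scores, the passing threshold, and the `grade_stream` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical data: exam scores for a small class.
scores = [72, 88, 95, 61, 79]

# List comprehension: keep passing scores (70 is an assumed threshold).
passing = [s for s in scores if s >= 70]

# lambda with map: apply a 5-point curve, capped at 100.
curved = list(map(lambda s: min(s + 5, 100), scores))

# filter: keep curved scores of 90 or more.
distinctions = list(filter(lambda s: s >= 90, curved))


def grade_stream(values):
    """Generator: lazily yield (score, letter) pairs."""
    for v in values:
        if v >= 90:
            letter = "A"
        elif v >= 80:
            letter = "B"
        else:
            letter = "C"
        yield v, letter


grades = list(grade_stream(curved))
print(passing, curved, distinctions, grades)
```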

Python for Exploratory Data Analysis:
  • NumPy:
    • Installing NumPy
    • Using NumPy
    • NumPy arrays
    • Creating NumPy arrays from Python lists
    • Creating arrays using built-in methods: arange(), zeros(), ones(), linspace(), eye(), rand(), etc.
    • Array attributes: shape, dtype
    • Array methods: reshape(), min(), max(), argmax(), argmin(), etc.
  • Pandas:
    • Introduction to Pandas
    • Series
    • DataFrames
    • Missing Data
    • GroupBy
    • Merging, Joining and Concatenating
    • Operations
    • Data Input and Output

Python for Data Visualization:

  • Matplotlib:
    • Installing Matplotlib, Basic Matplotlib commands
    • Creating Multiplots on the same canvas
    • Object-Oriented Method: figure(), plot(), add_axes(), subplots(), etc.
    • Matplotlib Exercise
  • Seaborn:
    • Categorical plot
    • Distribution plot
    • Regression plot
    • Seaborn Exercise
  • Pandas built-in visualization:
    • Scatter plot
    • Histograms
    • Box plot

  • Introduction to RDBMS
  • Retrieving
  • Updating
  • Inserting
  • Deleting
  • Sorting and Filtering
  • Summarizing and Grouping
  • Using Subqueries
  • Joining Tables
  • Views
  • Stored Procedures
  • Python Database Connection API
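
The database topics above (inserting, updating, deleting, grouping, and connecting from Python) can be sketched with the standard-library `sqlite3` module, which follows Python's DB-API conventions. SQLite, the `orders` table, and its rows are assumptions made for this example; the course does not say which database it uses.

```python
import sqlite3

# In-memory SQLite database, so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table used only for this illustration.
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Inserting: parameter placeholders (?) keep values out of the SQL string.
cur.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)],
)

# Updating and deleting
cur.execute("UPDATE orders SET amount = ? WHERE customer = ?", (15.0, "bob"))
cur.execute("DELETE FROM orders WHERE amount < ?", (1.0,))
conn.commit()

# Retrieving with summarizing, grouping and sorting
cur.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
)
totals = cur.fetchall()
conn.close()

print(totals)
```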

Introduction To Machine Learning:

  • Relationship between Data Science and Machine Learning
  • Supervised Learning
  • Unsupervised Learning

Supervised Learning (Regression and Classification Algorithms):

  • Linear Regression
  • Ridge Regression
  • Lasso Regression
  • Polynomial Regression
  • Support Vector Regression
  • Decision Tree Regression
  • Random Forest Regression
  • Logistic Regression
  • Support Vector Machines
  • Kernel SVM
  • Decision Trees and Random Forest
  • Ensemble Of Decision Trees
  • Model Evaluation and Improvement
  • Capstone Project for Supervised Learning

Unsupervised Learning:

  • Challenges in Unsupervised Learning
  • Preprocessing and Scaling
  • Dimensionality Reduction, Feature Extraction
  • Principal Component Analysis (PCA)
  • Clustering
  • Model evaluation and improvement
  • Cross validation, Grid search, Evaluation metrics and scoring
  • Working with text data
  • Corpus
  • Text preprocessing using Bag of words technique
  • TF (Term Frequency)
  • IDF (Inverse Document Frequency)
  • Normalization
  • Vectorization
  • NLP with Python
  • Introduction to Deep Learning: Deep Learning Applications, Artificial Neural Network, TensorFlow Demo, Deep Learning Frameworks
  • Up and Running with TensorFlow: Installation, Creating Your First Graph and Running It in a Session, Managing Graphs, Lifecycle of a Node Value, Linear Regression with TensorFlow, Implementing Gradient Descent, Feeding Data to the Training Algorithm, Saving and Restoring Models, Visualizing the Graph and Training Curves Using TensorBoard, Name Scopes, Modularity, Sharing Variables
  • Introduction to Artificial Neural Networks: From Biological to Artificial Neurons, Training an MLP with TensorFlow’s High-Level API, Training a DNN Using Plain TensorFlow, Fine-Tuning Neural Network Hyperparameters
  • Training Deep Neural Nets: Vanishing/Exploding Gradients Problems, Reusing Pretrained Layers, Faster Optimizers, Avoiding Overfitting Through Regularization, Practical Guidelines
  • Convolutional Neural Networks: The Architecture of the Visual Cortex, Convolutional Layer, Pooling Layer, CNN Architectures
  • Recurrent Neural Networks: Recurrent Neurons, Basic RNNs in TensorFlow, Training RNNs, Deep RNNs, LSTM Cell, GRU Cell, Natural Language Processing
  • Autoencoders: Efficient Data Representations, Performing PCA with an Undercomplete Linear Autoencoder, Stacked Autoencoders, Unsupervised Pretraining Using Stacked Autoencoders, Denoising Autoencoders, Sparse Autoencoders, Variational Autoencoders
  • Reinforcement Learning: Learning to Optimize Rewards, Policy Search, Introduction to OpenAI Gym, Neural Network Policies, Evaluating Actions: The Credit Assignment Problem, Policy Gradients, Markov Decision Processes, Temporal Difference Learning and Q-Learning, Learning to Play Ms. Pac-Man Using Deep Q-Learning
  • Quizzes, gamified assessments & Capstone project

Linux (Ubuntu/CentOS) – Tips and Tricks
Basic (Core) Java Programming Concepts – OOP
Introduction to Big Data and Hadoop

  • What is Big Data?
  • What are the challenges for processing big data?
  • What is Hadoop?
  • Why Hadoop?
  • History of Hadoop
  • Hadoop ecosystem
  • HDFS
  • MapReduce

Understanding the Cluster

  • Hadoop 2.x Architecture
  • Typical workflow
  • HDFS Commands
  • Writing files to HDFS
  • Reading files from HDFS
  • Rack awareness
  • Hadoop daemons
  • Before MapReduce
  • MapReduce overview
  • Word count problem
  • Word count flow and solution
  • MapReduce flow
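
The word count problem and the MapReduce flow listed above can be illustrated with a single-process simulation in plain Python. This is a sketch of the map, shuffle and reduce phases only. It does not use the Hadoop API, and the function names and input lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce: sum the grouped counts for one word."""
    return word, sum(counts)


# Hypothetical input split: two lines of text.
lines = ["big data big ideas", "data drives ideas"]

mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
# Sorting the keys mimics the sorted order in which reducers receive them.
word_counts = dict(reduce_phase(w, c) for w, c in sorted(grouped.items()))

print(word_counts)
```

On a real cluster the map and reduce calls run in parallel on different nodes, and the framework performs the shuffle and sort between them.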

Developing the MapReduce Application

  • Data Types
  • File Formats
  • Explain the Driver, Mapper and Reducer code
  • Configuring development environment – Eclipse
  • Writing unit test
  • Running locally
  • Running on cluster
  • Hands-on exercises

How MapReduce Works

  • Anatomy of a MapReduce job run
  • Job submission
  • Job initialization
  • Task assignment
  • Job completion
  • Job scheduling
  • Job failures
  • Shuffle and sort
  • Hands-on exercises

MapReduce Types and Formats

  • File Formats – Sequence Files
  • Compression Techniques
  • Input Formats – Input splits & records, text input, binary input
  • Output Formats – text output, binary output, lazy output
  • Hands-on exercises


  • Side data distribution
  • MapReduce combiner
  • MapReduce partitioner
  • MapReduce distributed cache
  • Hands-on exercises


  • Hive Architecture
  • Types of Metastore
  • Hive Data Types
  • HiveQL
  • File Formats – Parquet, ORC, Sequence and Avro Files Comparison
  • Partitioning & Bucketing
  • Hive JDBC Client
  • Hive UDFs
  • Hive SerDes
  • Hive on Tez
  • Hands-on exercises
  • Integration with Tableau


  • Pig Architecture
  • Pig Data Types
  • Load/Store Functions
  • PigLatin
  • Pig UDFs


  • HBase architecture and concepts
  • HBase Data Model
  • HBase Shell Interface
  • HBase Java API


  • Sqoop Architecture
  • Sqoop Import Command Arguments, Incremental Import
  • Sqoop Export
  • Sqoop Jobs
  • Hands-on exercises


  • Flume Agent Setup
  • Types of sources, channels and sinks
  • Multi-Agent Flow
  • Hands-on exercises


  • Oozie workflow creation
  • Oozie Job submission, monitoring, debugging
  • Concepts on Coordinators and Bundles
  • Hands-on exercises

Case Studies Discussions
Any one of the Four Projects

  • Log File Analysis covering Flume, HDFS, MR/Pig, Hive, Tableau
  • Crime Data Analysis covering Oozie, Sqoop, HDFS, Hive, HBase, RESTful Client
  • Hadoop Use Cases in Insurance Domain
  • Hadoop Use Cases in Retail Domain

Understand the difference between Apache Spark and Hadoop

Learn Scala and its programming implementation

  • Why Scala
  • Scala Installation
  • Get deep insights into the functioning of Scala
  • Execute Pattern Matching in Scala
  • Functional Programming in Scala – Closures, Currying, Expressions, Anonymous Functions
  • Know the concepts of classes in Scala
  • Object Orientation in Scala – Primary, Auxiliary Constructors, Singleton & Companion Objects
  • Traits and Abstract classes in Scala
  • Scala Simple Build Tool – SBT
  • Building with Maven
  • What is Apache Spark?
  • Spark Installation
  • Spark Configuration
  • Spark Context
  • Using Spark Shell
  • Resilient Distributed Datasets (RDDs) – Features, Partitions, Tuning Parallelism
  • Functional Programming with Spark

Working with RDDs

  • RDD Operations – Transformations and Actions
  • Types of RDDs
  • Key-Value Pair RDDs – Transformations and Actions
  • MapReduce and Pair RDD Operations
  • Serialization

Spark on a cluster

  • Overview
  • A Spark Standalone Cluster
  • The Spark Standalone Web UI
  • Executors & Cluster Manager
  • Spark on YARN Framework

Writing Spark Applications

  • Spark Applications vs. Spark Shell
  • Creating the SparkContext
  • Configuring Spark Properties
  • Building and Running a Spark Application
  • Logging
  • Spark Job Anatomy

Caching and Persistence

  • RDD Lineage
  • Caching Overview
  • Distributed Persistence

Improving Spark Performance

  • Shared Variables: Broadcast Variables
  • Shared Variables: Accumulators
  • Per Partition Processing
  • Common Performance Issues

Spark API for different File Formats & Compression Codecs

  • Text
  • CSV
  • Sequence
  • Parquet
  • ORC
  • Compression Techniques – Snappy, Zlib, Gzip
  • Spark SQL Overview
  • HiveContext
  • SQL Datatypes
  • DataFrames vs RDDs
  • Operations on DataFrames
  • Parquet Files with Spark SQL – Read, Write, Partitioning, Merging Schema
  • ORC Files
  • JSON Files
  • Inferring Schema programmatically
  • Custom Case Classes
  • Temp Tables vs Persistent Tables
  • Writing UDFs
  • Hive Support
  • JDBC Support – Examples
  • HBase Support – Examples

Spark Streaming

  • Spark Streaming Overview
  • Example: Streaming Word Count
  • Other Streaming Operations
  • Sliding Window Operations
  • Developing Spark Streaming Applications – Integration with Kafka and HBase

Kafka Ecosystem

  • Overview
  • Producer
  • Consumer
  • Broker
  • Topics
  • Partitions

Kafka Twitter Data Setup

  • Writing Producer in Scala
  • Writing Consumer in Scala & Java

Kafka Integration with Spark Streaming

  • Real use case – Integration of Kafka with Spark Streaming for processing streaming log files and storing results into HBase


Data Science Training Description

DevOps improves collaboration and productivity by automating infrastructure and workflows and by continuously measuring application performance. In this course you will learn about Version Control, Code Automation, Continuous Integration, Continuous Deployment, Configuration Management, and Monitoring of applications.

This course is suitable for the following professionals:
  • Software Tester
  • System Admin
  • Solution Architect
  • Security Engineer
  • Application Developers
  • Integration Specialist

After completing this DevOps Certification Training, you should be able to:

  • Manage and keep track of different versions of the source code using Git
  • Build and automate tests using Jenkins and Maven
  • Automate testing of web elements using the Selenium suite of tools
  • Build and deploy containers using Docker
  • Learn the different roles and command-line usage of Ansible
  • Manage clustering and scaling with Kubernetes
  • Perform Continuous Monitoring using Nagios
  • Gain experience of working on an industry-standard live project

Prerequisites:
  • Knowledge of any scripting language
  • Linux Fundamentals

To help you brush up on these skills, you will get the following self-paced courses free:
  • Python Scripting
  • Linux Fundamentals
