Data Science Training

Data Science Training Course

The DevOps Certification Training Program will provide you with in-depth knowledge of various DevOps tools, including Git, Jenkins, Docker, Ansible, Puppet, Kubernetes and Nagios. This training is completely hands-on and is designed to help you become a certified practitioner through best practices in Continuous Development, Continuous Testing, Configuration Management, Continuous Integration and, finally, Continuous Monitoring of software throughout its development life cycle.

Why should you take Data Science Training?

80% of Global Fortune 500 organizations are expected to adopt DevOps by 2019 – CA

ADP, BBC News, eBay, GE, ING, Intuit, PayPal, Splunk, Uber and other MNCs worldwide use DevOps

The average salary of a DevOps Engineer is around $123,354 per annum – smecjobs.com

Quick Contact

+91 9606046725


Data Science Training Course Curriculum

Module 1: Introduction to Data Science

Introduction To Data Science
Life Cycle of Data Science
Skills required for Data Science
Career Paths in Data Science
Applications of Data Science

Module 2: Statistics

Relationship between Statistics and Data Science
Introduction to Data:
• Data types
• Data Collection Techniques
Descriptive Statistics:
• Measures of Central Tendency
• Measures of Dispersion
• Measures of Skewness and Kurtosis
• Visualization

Inferential Statistics:
• Sampling variability and Central Limit Theorem
• Confidence Interval for Mean
• Hypothesis Testing: t-Test, F-Test, Chi-square Test
• ANOVA
Random Sampling and Probability Distribution:
• Probability and Limitations, Discrete Probability, Continuous Probability
• Binomial Distribution, Poisson Distribution, Normal Distribution
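
As a rough illustration of the descriptive and inferential topics listed in this module, the sketch below computes measures of central tendency and dispersion and an approximate 95% confidence interval for the mean, using only the Python standard library. The sample values are made up for illustration, and the interval uses a normal critical value; for a sample this small, a t critical value would usually be more appropriate.

```python
import math
import statistics

# Hypothetical sample data, invented purely for illustration.
sample = [12.0, 15.0, 11.0, 14.0, 18.0, 15.0, 13.0, 16.0]

# Measures of central tendency
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Measures of dispersion (sample versions, n - 1 in the denominator)
variance = statistics.variance(sample)
std_dev = statistics.stdev(sample)

# Approximate 95% confidence interval for the mean.
# Assumption: a normal critical value is used as a simplification.
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * std_dev / math.sqrt(len(sample))
confidence_interval = (mean - margin, mean + margin)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.3f}, std_dev={std_dev:.3f}")
print(f"95% CI for the mean: {confidence_interval}")
```

The interval narrows as the sample size grows, because the margin is divided by the square root of the number of observations.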

Module 3: Python for Data Science

Python programming:

Environment Setup
Jupyter Notebook Overview
Data types: Numbers, Strings, Printing, Lists, Dictionaries, Booleans, Tuples, Sets
Comparison Operators
if, elif, else Statements
Loops: for Loops, while Loops
range()
list comprehension
functions
lambda expressions
map and filter
methods
Programming Exercises
Object Oriented Programming
Modules and packages
Errors and Exception Handling
Python Decorators
Python generators
Collections
Regular Expressions
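
To give a feel for several of the Python topics in the list above, here is a minimal sketch that touches list comprehensions, lambda expressions, map and filter, a decorator, a generator and a regular expression. All function and variable names are hypothetical and chosen only for this example.

```python
import functools
import re


def count_calls(func):
    """Decorator that records how many times the wrapped function runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def square(x):
    return x * x


def evens(limit):
    """Generator that lazily yields the even numbers below limit."""
    for n in range(limit):
        if n % 2 == 0:
            yield n


squares = [square(n) for n in range(5)]                # list comprehension
doubled = list(map(lambda n: n * 2, range(5)))         # map with a lambda
odds = list(filter(lambda n: n % 2 == 1, range(10)))   # filter with a lambda
even_numbers = list(evens(10))                         # consume the generator
words = re.findall(r"[a-z]+", "map, filter and lambda")  # regular expression

print(squares, doubled, odds, even_numbers, words, square.calls)
```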

Python for Exploratory Data Analysis:
• NumPy:
• Installing NumPy
• Using NumPy
• NumPy arrays
• Creating NumPy arrays from a Python list
• Creating arrays using built-in functions: arange(), zeros(), ones(), linspace(), eye(), rand(), etc.
• Array attributes: shape, dtype
• Array methods: reshape(), min(), max(), argmax(), argmin(), etc.
• Pandas:
• Introduction to Pandas
• Series
• DataFrames
• Missing Data
• GroupBy
• Merging, Joining and Concatenating
• Operations
• Data Input and Output
Python for Data Visualization:

Matplotlib:
• Installing Matplotlib, Basic Matplotlib commands
• Creating multiple plots on the same canvas
• Object-Oriented Method: figure(), plot(), add_axes(), subplots(), etc.
• Matplotlib Exercise
Seaborn:
• Categorical plot
• Distribution plot
• Regression plot
• Seaborn Exercise

Pandas built-in visualization:
• Scatter plot
• Histograms
• Box plot
CAPSTONE PROJECT FOR DATA ANALYSIS

Module 4: SQL for Data Science

Introduction to RDBMS
Retrieving
Updating
Inserting
Deleting
Sorting and Filtering
Summarizing and Grouping
Using Subqueries
Joining Tables
Views
Stored Procedures
Python Database Connection API
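
The operations listed in this module (inserting, updating, deleting, filtering, grouping) can be tried from Python through the database API. The sketch below uses the standard-library `sqlite3` module with an in-memory database; the `students` table, its columns and its rows are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table, invented for this illustration.
cur.execute(
    "CREATE TABLE students ("
    "id INTEGER PRIMARY KEY, name TEXT, course TEXT, score REAL)"
)

# Inserting (parameterized to avoid SQL injection)
cur.executemany(
    "INSERT INTO students (name, course, score) VALUES (?, ?, ?)",
    [
        ("Asha", "Data Science", 91.0),
        ("Ravi", "Data Science", 78.0),
        ("Meera", "Hadoop", 84.0),
        ("Tom", "Hadoop", 60.0),
    ],
)

# Updating and deleting
cur.execute("UPDATE students SET score = ? WHERE name = ?", (88.0, "Ravi"))
cur.execute("DELETE FROM students WHERE name = ?", ("Tom",))
conn.commit()

# Summarizing and grouping, with sorting
cur.execute(
    "SELECT course, COUNT(*), AVG(score) FROM students "
    "GROUP BY course ORDER BY course"
)
summary = cur.fetchall()
conn.close()

for course, count, average in summary:
    print(course, count, average)
```

Other relational databases are typically reached through a driver that follows the same connect, cursor, execute and fetch pattern.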

Module 5: Deep dive into Machine Learning

Introduction To Machine Learning:

Relationship between Data Science and Machine Learning
Supervised Learning
Unsupervised Learning

Supervised Learning (Regression and Classification Algorithms):

Linear Regression
Ridge Regression
Lasso Regression
Polynomial Regression
Support vector regression
Decision Tree Regression
Random Forest Regression
Logistic Regression
Support Vector Machines
Kernel SVM
Decision Trees and Random Forest
Ensemble Of Decision Trees
Model Evaluation and Improvement
CAPSTONE PROJECT FOR SUPERVISED LEARNING

Unsupervised Learning:

Challenges in Unsupervised Learning
Preprocessing and Scaling
Dimensionality Reduction, Feature Extraction
Principal Component Analysis (PCA)
Clustering
K-Means
Model evaluation and improvement
Cross validation, Grid search, Evaluation metrics and scoring
Working with text data
CAPSTONE PROJECT FOR UNSUPERVISED LEARNING

Module 6: NLP & Recommender Systems

Corpus
Text preprocessing using Bag of words technique
TF (Term Frequency)
IDF (Inverse Document Frequency)
Normalization
Vectorization
NLP with Python
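
The bag-of-words, TF, IDF, normalization and vectorization steps listed above can be sketched in plain Python as follows. The three-sentence corpus is made up for illustration, and the IDF formula used here is the basic log(N / df) form; libraries commonly apply smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter

# Hypothetical corpus, invented for this illustration.
corpus = [
    "data science is fun",
    "machine learning is part of data science",
    "deep learning uses neural networks",
]

# Text preprocessing: lowercase and split on whitespace (bag of words).
tokenized = [doc.lower().split() for doc in corpus]
vocabulary = sorted({term for doc in tokenized for term in doc})


def term_frequency(tokens):
    """TF: count of each term divided by the document length."""
    counts = Counter(tokens)
    return {term: counts[term] / len(tokens) for term in counts}


def inverse_document_frequency(docs):
    """IDF: log(N / df), where df is the number of documents containing the term."""
    n_docs = len(docs)
    return {
        term: math.log(n_docs / sum(1 for doc in docs if term in doc))
        for term in vocabulary
    }


idf = inverse_document_frequency(tokenized)


def tfidf_vector(tokens):
    """Vectorization: one TF-IDF weight per vocabulary term, L2-normalized."""
    tf = term_frequency(tokens)
    vector = [tf.get(term, 0.0) * idf[term] for term in vocabulary]
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


vectors = [tfidf_vector(doc) for doc in tokenized]

for doc, vector in zip(corpus, vectors):
    print(doc, [round(v, 3) for v in vector])
```

Terms that appear in fewer documents receive a higher IDF, so they contribute more weight to the vectors of the documents that contain them.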

Module 7: Artificial Neural Network and Deep Learning

Introduction to Deep Learning: Deep Learning Applications, Artificial Neural Network, TensorFlow Demo, Deep Learning Frameworks
Up and Running with TensorFlow: Installation, Creating Your First Graph and Running It in a Session, Managing Graphs, Lifecycle of a Node Value, Linear Regression with TensorFlow, Implementing Gradient Descent, Feeding Data to the Training Algorithm, Saving and Restoring Models, Visualizing the Graph and Training Curves Using TensorBoard, Name Scopes, Modularity, Sharing Variables
Introduction to Artificial Neural Networks: From Biological to Artificial Neurons, Training an MLP with TensorFlow’s High-Level API, Training a DNN Using Plain TensorFlow, Fine-Tuning Neural Network Hyperparameters
Training Deep Neural Nets: Vanishing / Exploding Gradients Problems, Reusing Pretrained Layers, Faster Optimizers, Avoiding Overfitting Through Regularization, Practical Guidelines

Convolutional Neural Networks: The Architecture of the Visual Cortex, Convolutional Layer, Pooling Layer, CNN Architectures
Recurrent Neural Networks: Recurrent Neurons, Basic RNNs in TensorFlow, Training RNNs, Deep RNNs, LSTM Cell, GRU Cell, Natural Language Processing
Autoencoders: Efficient Data Representations, Performing PCA with an Undercomplete Linear Autoencoder, Stacked Autoencoders, Unsupervised Pretraining Using Stacked Autoencoders, Denoising Autoencoders, Sparse Autoencoders, Variational Autoencoders
Reinforcement Learning: Learning to Optimize Rewards, Policy Search, Introduction to OpenAI Gym, Neural Network Policies, Evaluating Actions: The Credit Assignment Problem, Policy Gradients, Markov Decision Processes, Temporal Difference Learning and Q-Learning, Learning to Play Ms. Pac-Man Using Deep Q-Learning
Quizzes, gamified assessments & Capstone project

Module 8: Data Visualization with Tableau

Hadoop Developer Course

Linux (Ubuntu/CentOS) – Tips and Tricks
Basic (Core) Java Programming Concepts – OOP
Introduction to Big Data and Hadoop

What is Big Data?
What are the challenges for processing big data?
What is Hadoop?
Why Hadoop?
History of Hadoop
Hadoop ecosystem
HDFS
MapReduce

Understanding the Cluster

Hadoop 2.x Architecture
Typical workflow
HDFS Commands
Writing files to HDFS
Reading files from HDFS
Rack awareness
Hadoop daemons

MapReduce

Before MapReduce
MapReduce overview
Word count problem
Word count flow and solution
MapReduce flow

Developing the MapReduce Application

Data Types
File Formats
Driver, Mapper and Reducer code explained
Configuring the development environment – Eclipse
Writing unit tests
Running locally
Running on a cluster
Hands-on exercises

How MapReduce Works

Anatomy of MapReduce job run
Job submission
Job initialization
Task assignment
Job completion
Job scheduling
Job failures
Shuffle and sort
Hands-on exercises

MapReduce Types and Formats

File Formats – Sequence Files
Compression Techniques
Input Formats – Input splits & records, text input, binary input
Output Formats – text output, binary output, lazy output
Hands-on exercises

MapReduce Features

Counters
Side data distribution
MapReduce combiner
MapReduce partitioner
MapReduce distributed cache
Hands-on exercises

Hive

Hive Architecture
Types of Metastore
Hive Data Types
HiveQL
File Formats – Parquet, ORC, Sequence and Avro Files Comparison
Partitioning & Bucketing
Hive JDBC Client
Hive UDFs
Hive SerDes
Hive on Tez
Hands-on exercises
Integration with Tableau

Pig

Pig Architecture
Pig Data Types
Load/Store Functions
PigLatin
Pig UDFs

HBase

HBase architecture and concepts
HBase Data Model
HBase Shell Interface
HBase Java API

Sqoop

Sqoop Architecture
Sqoop Import Command Arguments, Incremental Import
Sqoop Export
Sqoop Jobs
Hands-on exercises

Flume

Flume Agent Setup
Types of sources, channels, sinks
Multi-Agent Flow
Hands-on exercises

Oozie

Oozie workflow creation
Oozie Job submission, monitoring, debugging
Concepts on Coordinators and Bundles
Hands-on exercises

Case Studies Discussions
Any one of the Four Projects

Log File Analysis covering Flume, HDFS, MR/Pig, Hive, Tableau
Crime Data Analysis covering Oozie, Sqoop, HDFS, Hive, HBase, RESTful Client
Hadoop Use Cases in Insurance Domain
Hadoop Use Cases in Retail Domain

Scala, Spark & Kafka

Understand the difference between Apache Spark and Hadoop

Learn Scala and its programming implementation

Why Scala
Scala Installation
Get deep insights into the functioning of Scala
Execute Pattern Matching in Scala
Functional Programming in Scala – Closures, Currying, Expressions, Anonymous Functions
Know the concepts of classes in Scala
Object Orientation in Scala – Primary, Auxiliary Constructors, Singleton & Companion Objects
Traits and Abstract classes in Scala
Scala Simple Build Tool – SBT
Building with Maven

Spark Basics

What is Apache Spark?
Spark Installation
Spark Configuration
Spark Context
Using Spark Shell
Resilient Distributed Datasets (RDDs) – Features, Partitions, Tuning Parallelism
Functional Programming with Spark

Working with RDDs

RDD Operations – Transformations and Actions
Types of RDDs
Key-Value Pair RDDs – Transformations and Actions
MapReduce and Pair RDD Operations
Serialization

Spark on a cluster

Overview
A Spark Standalone Cluster
The Spark Standalone Web UI
Executors & Cluster Manager
Spark on YARN Framework

Writing Spark Applications

Spark Applications vs. Spark Shell
Creating the SparkContext
Configuring Spark Properties
Building and Running a Spark Application
Logging
Spark Job Anatomy

Caching and Persistence

RDD Lineage
Caching Overview
Distributed Persistence

Improving Spark Performance

Shared Variables: Broadcast Variables
Shared Variables: Accumulators
Per Partition Processing
Common Performance Issues

Spark API for different File Formats & Compression Codecs

Text
CSV
Sequence
Parquet
ORC
Compression Techniques – Snappy, Zlib, Gzip

Spark SQL

Spark SQL Overview
HiveContext
SQL Datatypes
DataFrames vs RDDs
Operations on DataFrames
Parquet Files with Spark SQL – Read, Write, Partitioning, Merging Schema
ORC Files
JSON Files
Inferring Schema programmatically
Custom Case Classes
Temp Tables vs Persistent Tables
Writing UDFs
Hive Support
JDBC Support – Examples
HBase Support – Examples

Spark Streaming

Spark Streaming Overview
Example: Streaming Word Count
Other Streaming Operations
Sliding Window Operations
Developing Spark Streaming Applications – Integration with Kafka and HBase

Kafka

Kafka Ecosystem

Overview
Producer
Consumer
Broker
Topics
Partitions

Kafka Twitter Data Setup

Writing Producer in Scala
Writing Consumer in Scala & Java

Kafka Integration with Spark Streaming

Real use case – Integration of Kafka with Spark Streaming for processing streaming log files and storing results in HBase

Data Science Training Description

What will you learn as a part of this course?

DevOps improves collaboration and productivity by automating infrastructure and workflows and by continuously measuring application performance. In this course you will learn about Version Control, Code Automation, Continuous Integration, Continuous Deployment, Configuration Management and Application Monitoring.

Who should go for this training?

The following professionals can go for this course:

Software Tester
System Admin
Solution Architect
Security Engineer
Application Developers
Integration Specialist

What are the skills that you will be learning with our DevOps course?

After completing this DevOps Certification Training, you should be able to:

Manage and keep track of different versions of the source code using Git
Build and automate tests using Jenkins and Maven
Automate testing of web elements using the Selenium suite of tools
Build and deploy containers using Docker
Learn different roles and Command Line usage of Ansible
Manage clustering and scaling with Kubernetes
Perform Continuous Monitoring using Nagios
Gain experience of working on an industry standard live Project

What are the pre-requisites for this Course?

Required Pre-requisites:

Any Scripting Language Knowledge
Linux Fundamentals

To help you brush up on these skills, you will get the following self-paced courses absolutely free:

Python Scripting
Linux Fundamentals

Data Science Training Features

Real-Time Training Experts

Experienced faculty teach the classes with one-to-one attention and are capable of handling learners from novice to professional level.

Real-life Case Studies

Live project based on any of the selected use cases, involving implementation of the various DevOps concepts.

Assignments

Each class is followed by practical assignments, both online and offline, with detailed solutions provided after the exam.

Lifetime Access

You get lifetime access to all SMEC Technologies branches to refresh the course and get new updates from instructors.

24 x 7 Expert Support

We have a 24×7 online support team to resolve all your technical queries, through a ticket-based tracking system or over the phone.

Certification

Successfully complete your final course project and SMEC Technologies will certify your completion of the Data Science Training.
