Data Science Training
Data Science Training Course
The Data Science Training Course will provide you with in-depth knowledge of the full data science stack, including statistics, Python, NumPy, pandas, Matplotlib, SQL, machine learning, deep learning with TensorFlow, and big data tools such as Hadoop, Spark and Kafka. This training is completely hands-on and designed to help you become a certified practitioner through best practices in data collection, exploratory analysis, visualization, model building and model evaluation, finishing with capstone projects that span the full analysis life cycle.
Why should you take Data Science Training?
Like the Data Science Training course? Enroll Now or Get the free career path.
Data Science Training Course Curriculum
- Introduction To Data Science
- Life Cycle of Data Science
- Skills required for Data Science
- Careers Path in Data Science
- Applications of Data Science
- Relationship between Statistics and Data Science
- Introduction to Data:
• Data types
• Data Collection Techniques
- Descriptive Statistics:
• Measures of Central Tendency
• Measures of Dispersion
• Measures of Skewness and Kurtosis
• Visualization
- Inferential Statistics:
• Sampling Variability and the Central Limit Theorem
• Confidence Interval for the Mean
• Hypothesis Testing: t-Test, F-Test, Chi-square Test
• ANOVA
- Random Sampling and Probability Distributions:
• Probability and Limitations, Discrete Probability, Continuous Probability
• Binomial, Poisson and Normal Distributions
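As a taste of the descriptive and inferential statistics topics above, here is a stdlib-only sketch on an illustrative sample (the data is made up; a z-based interval is used for simplicity, where a t-interval would be more appropriate for a sample this small):

```python
import math
import statistics

# Hypothetical sample of exam scores (illustrative data only).
scores = [62, 70, 71, 74, 78, 80, 83, 85, 90, 97]

mean = statistics.mean(scores)      # measure of central tendency
median = statistics.median(scores)
stdev = statistics.stdev(scores)    # sample standard deviation (dispersion)

# Approximate 95% confidence interval for the mean using z = 1.96
# (normal approximation, to stay standard-library-only).
z = 1.96
margin = z * stdev / math.sqrt(len(scores))
ci = (mean - margin, mean + margin)

print(mean, median, round(stdev, 2), tuple(round(x, 2) for x in ci))
```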
Python programming:
- Environment Setup
- Jupyter Notebook Overview
- Data types: Numbers, Strings, Printing, Lists, Dictionaries, Booleans, Tuples, Sets
- Comparison Operators
- if, elif, else Statements
- Loops: for Loops, while Loops
- range()
- list comprehension
- functions
- lambda expressions
- map and filter
- methods
- Programming Exercises
- Object Oriented Programming
- Modules and packages
- Errors and Exception Handling
- Python Decorators
- Python generators
- Collections
- Regular Expression
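A tiny runnable sketch tying together several of the constructs listed above (range(), list comprehension, lambda expressions, map and filter, and a small function):

```python
nums = list(range(1, 11))                          # range()

squares = [n * n for n in nums]                    # list comprehension
evens = list(filter(lambda n: n % 2 == 0, nums))   # lambda + filter
doubled = list(map(lambda n: n * 2, nums))         # lambda + map

def describe(values):
    """A small function combining the pieces above."""
    return {"count": len(values), "total": sum(values)}

print(squares[:3], evens, describe(doubled))
```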
Python for Exploratory Data Analysis:
• NumPy:
• Installing NumPy
• Using NumPy
• NumPy arrays
• Creating NumPy arrays from Python lists
• Creating arrays using built-in methods: arange(), zeros(), ones(), linspace(), eye(), rand(), etc.
• Array attributes: shape, dtype
• Array methods: reshape(), min(), max(), argmax(), argmin(), etc.
• Pandas:
• Introduction to Pandas
• Series
• DataFrames
• Missing Data
• GroupBy
• Merging, Joining and Concatenating
• Operations
• Data Input and Output
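A minimal sketch of the NumPy and pandas operations listed above (array creation and attributes, then missing data handling and a GroupBy aggregation; the DataFrame contents are illustrative):

```python
import numpy as np
import pandas as pd

# NumPy: creating an array with a built-in method, then inspecting it.
a = np.arange(12).reshape(3, 4)
print(a.shape, a.max(), a.argmax())

# pandas: a small DataFrame with missing data, then a GroupBy aggregation.
df = pd.DataFrame({
    "dept": ["A", "A", "B", "B"],
    "sales": [10.0, np.nan, 30.0, 50.0],
})
df["sales"] = df["sales"].fillna(df["sales"].mean())  # impute missing value
totals = df.groupby("dept")["sales"].sum()
print(totals)
```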
Python for Data Visualization:
- Matplotlib:
• Installing Matplotlib, basic Matplotlib commands
• Creating multiple plots on the same canvas
• Object-oriented method: figure(), plot(), add_axes(), subplots(), etc.
• Matplotlib Exercise
- Seaborn:
• Categorical plot
• Distribution plot
• Regression plot
• Seaborn Exercise
- Pandas built-in visualization:
• Scatter plot
• Histograms
• Box plot
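A minimal sketch of the object-oriented Matplotlib method listed above (figure(), add_axes(), plot()); the Agg backend and output filename are assumptions so the script runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: runs without a display
import matplotlib.pyplot as plt

# Object-oriented Matplotlib: build the figure and axes explicitly.
fig = plt.figure()
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
ax.plot([1, 2, 3, 4], [1, 4, 9, 16], label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("squares.png")  # hypothetical output filename
```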
CAPSTONE PROJECT FOR DATA ANALYSIS
- Introduction to RDBMS
- Retrieving
- Updating
- Inserting
- Deleting
- Sorting AND Filtering
- Summarizing AND Grouping
- Using Sub queries
- Joining Tables
- Views
- Stored Procedure
- Python Database Connection API
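The retrieving/updating/inserting/deleting operations above can be sketched through Python's DB-API, using the stdlib sqlite3 module as a stand-in for a full RDBMS; the table and data are hypothetical:

```python
import sqlite3

# In-memory SQLite database via the Python DB-API (PEP 249 style).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, score REAL)")
cur.executemany("INSERT INTO students (name, score) VALUES (?, ?)",
                [("Asha", 82.5), ("Ben", 74.0), ("Chen", 91.0)])          # inserting
cur.execute("UPDATE students SET score = 80.0 WHERE name = ?", ("Ben",))  # updating
cur.execute("DELETE FROM students WHERE score < 81")                      # deleting

# Retrieving with sorting, plus a summary aggregate.
rows = cur.execute("SELECT name, score FROM students ORDER BY score DESC").fetchall()
avg = cur.execute("SELECT AVG(score) FROM students").fetchone()[0]
print(rows, avg)
conn.close()
```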
Introduction To Machine Learning:
- Relationship between Data Science and Machine Learning
- Supervised Learning
- Unsupervised Learning
Supervised Learning (Regression AND Classification Algorithms):
- Linear Regression
- Ridge Regression
- Lasso Regression
- Polynomial Regression
- Support vector regression
- Decision Tree Regression
- Random Forest Regression
- Logistic Regression
- Support Vector Machines
- Kernel SVM
- Decision Trees and Random Forest
- Ensemble Of Decision Trees
- Model Evaluation and Improvement
- CAPSTONE PROJECT for supervised learning
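To make linear regression (the first algorithm above) concrete, here is a from-scratch least-squares fit on synthetic data; in the course itself you would typically use a library such as scikit-learn rather than hand-rolling this:

```python
# Simple linear regression via ordinary least squares, from scratch.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic data lying exactly on y = 2x + 1 (illustrative only).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```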
Unsupervised Learning:
- Challenges in Unsupervised Learning
- Preprocessing AND Scaling
- Dimensionality Reduction, Feature Extraction
- Principal Component Analysis (PCA)
- Clustering
- K MEANS
- Model evaluation and improvement
- Cross validation, Grid search, Evaluation metrics and scoring
- Working with text data
- CAPSTONE PROJECT FOR Unsupervised Learning ALGORITHM
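A bare-bones 1-D sketch of the K-Means assign/update loop covered above, on illustrative data; real projects would use scikit-learn's KMeans:

```python
# Minimal K-Means on 1-D points: alternate assignment and update steps.
def kmeans_1d(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to its cluster's mean.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]   # two obvious groups
centers = kmeans_1d(data, centroids=[0.0, 10.0])
print(centers)
```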
- Corpus
- Text preprocessing using the Bag-of-Words technique
- TF (Term Frequency)
- IDF (Inverse Document Frequency)
- Normalization
- Vectorization
- NLP with Python
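The bag-of-words, TF, IDF and vectorization terms above can be made concrete with a hand-rolled sketch on a toy corpus; this assumes raw-count TF and a plain log IDF (libraries such as scikit-learn use smoothed variants):

```python
import math

# Toy corpus (illustrative sentences only).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [d.split() for d in docs]           # trivial preprocessing
vocab = sorted({w for doc in tokenized for w in doc})

def tf(term, doc):
    return doc.count(term) / len(doc)           # term frequency

def idf(term):
    df = sum(term in doc for doc in tokenized)  # document frequency
    return math.log(len(tokenized) / df)        # inverse document frequency

# Vectorization: one TF-IDF weight per vocabulary term, for the first document.
vec = [tf(w, tokenized[0]) * idf(w) for w in vocab]
print(len(vocab), round(idf("cat"), 3))
```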
- Introduction to Deep Learning: Deep Learning Applications, Artificial Neural Networks, TensorFlow Demo, Deep Learning Frameworks
- Up and Running with TensorFlow: Installation, Creating Your First Graph and Running It in a Session, Managing Graphs, Lifecycle of a Node Value, Linear Regression with TensorFlow, Implementing Gradient Descent, Feeding Data to the Training Algorithm, Saving and Restoring Models, Visualizing the Graph and Training Curves Using TensorBoard, Name Scopes, Modularity, Sharing Variables
- Introduction to Artificial Neural Networks: From Biological to Artificial Neurons, Training an MLP with TensorFlow’s High-Level API, Training a DNN Using Plain TensorFlow, Fine-Tuning Neural Network Hyperparameters
- Training Deep Neural Nets: Vanishing/Exploding Gradients Problems, Reusing Pretrained Layers, Faster Optimizers, Avoiding Overfitting Through Regularization, Practical Guidelines
- Convolutional Neural Networks: The Architecture of the Visual Cortex, Convolutional Layer, Pooling Layer, CNN Architectures
- Recurrent Neural Networks: Recurrent Neurons, Basic RNNs in TensorFlow, Training RNNs, Deep RNNs, LSTM Cell, GRU Cell, Natural Language Processing
- Autoencoders: Efficient Data Representations, Performing PCA with an Undercomplete Linear Autoencoder, Stacked Autoencoders, Unsupervised Pretraining Using Stacked Autoencoders, Denoising Autoencoders, Sparse Autoencoders, Variational Autoencoders
- Reinforcement Learning: Learning to Optimize Rewards, Policy Search, Introduction to OpenAI Gym, Neural Network Policies, Evaluating Actions: The Credit Assignment Problem, Policy Gradients, Markov Decision Processes, Temporal Difference Learning and Q-Learning, Learning to Play Ms. Pac-Man Using Deep Q-Learning
- Quizzes, gamified assessments & Capstone project
Linux (Ubuntu/CentOS) – Tips and Tricks
Basic (Core) Java Programming Concepts – OOP
Introduction to Big Data and Hadoop
- What is Big Data?
- What are the challenges for processing big data?
- What is Hadoop?
- Why Hadoop?
- History of Hadoop
- Hadoop ecosystem
- HDFS
- MapReduce
Understanding the Cluster
- Hadoop 2.x Architecture
- Typical workflow
- HDFS Commands
- Writing files to HDFS
- Reading files from HDFS
- Rack awareness
- Hadoop daemons
- Before MapReduce
- MapReduce overview
- Word count problem
- Word count flow and solution
- MapReduce flow
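The word-count flow above (map → shuffle/sort → reduce) can be simulated in a few lines of plain Python to show the semantics; a real MapReduce job runs these same three phases distributed across a cluster:

```python
from collections import defaultdict

# Illustrative input lines (the classic word-count example).
lines = ["deer bear river", "car car river", "deer car bear"]

# Map phase: emit (word, 1) for every word.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle/sort phase: group the emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: sum each key's values.
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts)
```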
Developing the MapReduce Application
- Data Types
- File Formats
- Explain the Driver, Mapper and Reducer code
- Configuring development environment – Eclipse
- Writing unit test
- Running locally
- Running on cluster
- Hands on exercises
How MapReduce Works
- Anatomy of MapReduce job run
- Job submission
- Job initialization
- Task assignment
- Job completion
- Job scheduling
- Job failures
- Shuffle and sort
- Hands on exercises
MapReduce Types and Formats
- File Formats – Sequence Files
- Compression Techniques
- Input Formats – Input splits & records, text input, binary input
- Output Formats – text output, binary output, lazy output
- Hands on exercises
Counters
- Side data distribution
- MapReduce combiner
- MapReduce partitioner
- MapReduce distributed cache
- Hands-on exercises
Hive
- Hive Architecture
- Types of Metastore
- Hive Data Types
- HiveQL
- File Formats – Parquet, ORC, Sequence and Avro Files Comparison
- Partitioning & Bucketing
- Hive JDBC Client
- Hive UDFs
- Hive SerDes
- Hive on Tez
- Hands-on exercises
- Integration with Tableau
Pig
- Pig Architecture
- Pig Data Types
- Load/Store Functions
- Pig Latin
- Pig UDFs
HBase
- HBase architecture and concepts
- HBase Data Model
- HBase Shell Interface
- HBase Java API
Sqoop
- Sqoop Architecture
- Sqoop Import Command Arguments, Incremental Import
- Sqoop Export
- Sqoop Jobs
- Hands-on exercises
Flume
- Flume Agent Setup
- Types of sources, channels, sinks; Multi-Agent Flow
- Hands-on exercises
Oozie
- Oozie workflow creations
- Oozie Job submission, monitoring, debugging
- Concepts on Coordinators and Bundles
- Hands-on exercises
Case Studies Discussions
Any one of the Four Projects
- Log File Analysis covering Flume, HDFS, MR/Pig, Hive, Tableau
- Crime Data Analysis covering Oozie, Sqoop, HDFS, Hive, HBase, RESTful Client
- Hadoop Use Cases in Insurance Domain
- Hadoop Use Cases in Retail Domain
Understand the difference between Apache Spark and Hadoop
Learn Scala and its programming implementation
- Why Scala
- Scala Installation
- Get deep insights into the functioning of Scala
- Execute Pattern Matching in Scala
- Functional Programming in Scala – Closures, Currying, Expressions, Anonymous Functions
- Know the concepts of classes in Scala
- Object Orientation in Scala – Primary, Auxiliary Constructors, Singleton & Companion Objects
- Traits and Abstract classes in Scala
- Scala Simple Build Tool – SBT
- Building with Maven
- What is Apache Spark?
- Spark Installation
- Spark Configuration
- Spark Context
- Using Spark Shell
- Resilient Distributed Datasets (RDDs) – Features, Partitions, Tuning Parallelism
- Functional Programming with Spark
Working with RDDs
- RDD Operations – Transformations and Actions
- Types of RDDs
- Key-Value Pair RDDs – Transformations and Actions
- MapReduce and Pair RDD Operations
- Serialization
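The pair-RDD transformations and actions above can be imitated in plain Python to show their semantics; reduce_by_key here is an illustrative stand-in, not a Spark API — in Spark you would call map, filter and reduceByKey on a real RDD created with sc.parallelize:

```python
# Key-value pairs standing in for a pair RDD (illustrative data).
data = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

# map / filter are transformations; forcing a list plays the role of collect().
doubled = list(map(lambda kv: (kv[0], kv[1] * 2), data))
only_a = list(filter(lambda kv: kv[0] == "a", data))

# reduceByKey analogue: merge the values per key with an associative function.
def reduce_by_key(pairs, fn):
    out = {}
    for k, v in pairs:
        out[k] = fn(out[k], v) if k in out else v
    return out

totals = reduce_by_key(data, lambda x, y: x + y)
print(doubled, only_a, totals)
```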
Spark on a cluster
- Overview
- A Spark Standalone Cluster
- The Spark Standalone Web UI
- Executors & Cluster Manager
- Spark on YARN Framework
Writing Spark Applications
- Spark Applications vs. Spark Shell
- Creating the SparkContext
- Configuring Spark Properties
- Building and Running a Spark Application
- Logging
- Spark Job Anatomy
Caching and Persistence
- RDD Lineage
- Caching Overview
- Distributed Persistence
Improving Spark Performance
- Shared Variables: Broadcast Variables
- Shared Variables: Accumulators
- Per Partition Processing
- Common Performance Issues
Spark API for different File Formats & Compression Codecs
- Text
- CSV
- Sequence
- Parquet
- ORC
- Compression Techniques – Snappy, Zlib, Gzip
Spark SQL
- Spark SQL Overview
- HiveContext
- SQL Datatypes
- Dataframes vs RDDs
- Operations on DFs
- Parquet Files with Spark SQL – Read, Write, Partitioning, Merging Schema
- ORC Files
- JSON Files
- Inferring Schema programmatically
- Custom Case Classes
- Temp Tables vs Persistent Tables
- Writing UDFs
- Hive Support
- JDBC Support – Examples
- HBase Support – Examples
Spark Streaming
- Spark Streaming Overview
- Example: Streaming Word Count
- Other Streaming Operations
- Sliding Window Operations
- Developing Spark Streaming Applications – Integration with Kafka and HBase
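The sliding-window idea above can be sketched with a deque over simulated micro-batches; this mimics what Spark Streaming's window operations do over a DStream, but is plain Python, not the Spark API:

```python
from collections import Counter, deque

# Keep the last 3 micro-batches inside the window.
window = deque(maxlen=3)

# Simulated stream of micro-batches (illustrative words).
batches = [["spark"], ["spark", "kafka"], ["hbase"], ["kafka"]]
snapshots = []
for batch in batches:
    window.append(batch)
    # Recount every word currently inside the sliding window.
    counts = Counter(word for b in window for word in b)
    snapshots.append(dict(counts))

print(snapshots[-1])
```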
Kafka Ecosystem
- Overview
- Producer
- Consumer
- Broker
- Topics
- Partitions
Kafka Twitter Data Setup
- Writing Producer in Scala
- Writing Consumer in Scala & Java
Kafka Integration with Spark Streaming
- Real use case – Integration of Kafka with Spark Streaming for processing streaming log files and storing results into HBase
Data Science Training Description
Data science combines statistics, programming and domain expertise to extract insights from data. In this course you will learn statistics, Python programming, exploratory data analysis with NumPy and pandas, data visualization, SQL, machine learning, deep learning with TensorFlow, and big data processing with Hadoop, Spark and Kafka. The following professionals can take up this course:
- Software Tester
- System Admin
- Solution Architect
- Security Engineer
- Application Developers
- Integration Specialist
After completing this Data Science Certification Training, you should be able to:
- Apply descriptive and inferential statistics to real datasets
- Write Python programs and perform exploratory data analysis with NumPy and pandas
- Visualize data using Matplotlib, Seaborn and pandas built-in plotting
- Query and manage relational databases with SQL and the Python Database API
- Build, evaluate and improve supervised and unsupervised machine learning models
- Train deep neural networks using TensorFlow
- Process large datasets with the Hadoop ecosystem (HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie)
- Develop Spark applications in Scala and integrate Kafka with Spark Streaming
- Gain experience of working on an industry-standard live project
- Any Scripting Language Knowledge
- Linux Fundamentals
- Python Scripting
Data Science Training Features
Real-Time Training Experts
Classes are handled by experienced faculty who give one-to-one attention and are capable of taking learners from novice to professional.
Real-life Case Studies
Live project based on any of the selected use cases, involving implementation of the various Data Science concepts.
Assignments
Each class is followed by practical assignments, both online and offline, with detailed solutions provided after the exam.
Lifetime Access
You get lifetime access to all SMEC Technologies branches to refresh the course and get new updates from instructors.
24 x 7 Expert Support
We have a 24×7 online support team to resolve all your technical queries, through a ticket-based tracking system or over the phone.
Certification
Successfully complete your final course project and SMEC Technologies will certify you in DATA SCIENCE.
Be future ready. Start learning Data Science Training. Enroll Now or Get the free career path.