Module 1: Data Analysis using Numpy and Pandas
- Numpy
- Numpy Vector and Matrix
- Functions – arange(), zeros(), ones(), linspace(), eye(), reshape(), random(), max(), min(), argmax(), argmin(); shape and dtype attributes
- Indexing and Selection
- Numpy Operations – Array with Array, Array with Scalars, Universal Array Functions
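A minimal sketch of the NumPy topics above (array creation, attributes, indexing and vectorized operations); the array values are arbitrary examples.

```python
import numpy as np

# Array creation functions
a = np.arange(0, 10)          # integers 0..9
z = np.zeros((2, 3))          # 2x3 matrix of zeros
o = np.ones(5)                # vector of ones
l = np.linspace(0, 1, 5)      # 5 evenly spaced values in [0, 1]
i = np.eye(3)                 # 3x3 identity matrix
r = np.random.rand(3, 3)      # uniform random 3x3 matrix

# Reshaping and inspection attributes
m = a.reshape(2, 5)
print(m.shape, m.dtype)       # (2, 5) and an integer dtype (platform dependent)
print(a.max(), a.min(), a.argmax(), a.argmin())

# Indexing and selection
print(a[2:5])                 # slicing
print(m[1, 3])                # row 1, column 3
print(a[a > 5])               # boolean selection

# Array with array, array with scalar, and universal functions
print(a + a, a * 2, np.sqrt(a), np.exp(a))
```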
- Pandas
- Pandas Series
- Pandas DataFrame
- Missing Data (Imputation)
- Group by Operations
- Merging, Joining and Concatenating DataFrames
- Pandas Operations
- Data Input and Output across a wide variety of formats such as CSV, Excel, SQL databases and HTML
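A short sketch of the pandas topics above (Series, DataFrame, missing-data imputation, groupby, merging and I/O); the column names and file names are illustrative only.

```python
import numpy as np
import pandas as pd

# Series and DataFrame
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
df = pd.DataFrame({"dept": ["IT", "IT", "HR"],
                   "salary": [50000, np.nan, 45000]})

# Missing data: simple imputation with the column mean
df["salary"] = df["salary"].fillna(df["salary"].mean())

# GroupBy operations
print(df.groupby("dept")["salary"].mean())

# Merging, joining and concatenating DataFrames
lookup = pd.DataFrame({"dept": ["IT", "HR"], "location": ["Pune", "Mumbai"]})
merged = pd.merge(df, lookup, on="dept", how="left")
stacked = pd.concat([df, df], axis=0)

# Data input and output (file names are placeholders)
# df = pd.read_csv("employees.csv")
# merged.to_excel("report.xlsx", index=False)
```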
Module 2: Data Visualization using Matplotlib, Seaborn and Pandas Built-in Plotting
- Matplotlib
- plot() using Functional approach
- multi-plot using subplot()
- figure() using OO API Methods
- add_axes(), set_xlabel(), set_ylabel(), set_title() Methods
- Customization – figure size, improving DPI, plot appearance, markers, control over axis appearance and special plot types
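A brief sketch of the Matplotlib topics above, contrasting the functional approach with the object-oriented API; the data is synthetic.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

# Functional approach with multi-plot via subplot()
plt.subplot(1, 2, 1)
plt.plot(x, np.sin(x))
plt.subplot(1, 2, 2)
plt.plot(x, np.cos(x))
plt.show()

# Object-oriented API with customization
fig = plt.figure(figsize=(6, 4), dpi=100)
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
ax.plot(x, x ** 2, color="purple", lw=2, ls="--", marker="o", markersize=3)
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("OO API example")
ax.set_xlim(0, 10)          # control over axis appearance
plt.show()
```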
- Seaborn
- Distribution Plots using distplot(), jointplot(), pairplot(), rugplot(), kdeplot()
- Categorical Plots using barplot(), countplot(), boxplot(), violinplot(), stripplot(), swarmplot(), factorplot()
- Matrix Plots using heatmap(), clustermap()
- Grid Plots using PairGrid(), FacetGrid()
- Regression Plots using lmplot()
- Styles and Colors customization
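A compact sketch of the seaborn plot families above, using the bundled tips dataset; current function names are used where older ones (distplot(), factorplot()) have since been renamed.

```python
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")       # small demo dataset shipped with seaborn

sns.set_style("whitegrid")            # styles and colors customization

sns.histplot(tips["total_bill"], kde=True)        # distribution plot (distplot() in older seaborn)
sns.jointplot(x="total_bill", y="tip", data=tips)
sns.boxplot(x="day", y="total_bill", data=tips)   # categorical plot
sns.heatmap(tips.select_dtypes("number").corr(), annot=True)  # matrix plot
sns.lmplot(x="total_bill", y="tip", data=tips)    # regression plot

g = sns.FacetGrid(tips, col="time", row="smoker") # grid plot
g.map(plt.hist, "total_bill")
plt.show()
```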
- Pandas Built-in
- Histogram, Area Plot, Bar Plot, Scatter Plot, Box Plot, Hex Plot, KDE Plot, Density Plot
- Choropleth Maps – Interactive World Map and US Map using the Plotly and Cufflinks modules
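A minimal sketch of the pandas built-in plotting interface listed above on random data; the Plotly/Cufflinks choropleth example is left out because it needs extra setup.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(200, 3), columns=["A", "B", "C"])

df["A"].plot.hist(bins=30)                              # histogram
df.plot.area(stacked=False, alpha=0.4)                  # area plot
df.abs().iloc[:10].plot.bar()                           # bar plot
df.plot.scatter(x="A", y="B", c="C", cmap="coolwarm")   # scatter plot
df.plot.box()                                           # box plot
df.plot.hexbin(x="A", y="B", gridsize=20)               # hex plot
df["A"].plot.kde()                                      # KDE / density plot
plt.show()
```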
Module 3: Jupyter Notebook
- Introduction
- Basic Commands
- Keyboard Shortcuts and Magic Functions
Module 4: GIT
- Distributed Version Control System
- How Git internally manages version control using changesets
- Creating Repository
- Basic commands: git status, git add, git rm, git branch, git checkout, git log, git cat-file, git pull, git push, git commit
- Managing Configuration – System Level, User Level, Repository Level
Module 5: Machine Learning Introduction
- What is Machine Learning?
- Machine Learning Process Flow-Diagram
- Different Categories of Machine Learning –
- Supervised
- Unsupervised
- Reinforcement Learning
- Scikit-Learn Overview
- Scikit-Learn cheat-sheet
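A minimal end-to-end scikit-learn workflow (load data, split, fit, predict, score) to accompany the overview; the iris dataset and KNN estimator are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load a toy dataset and split it (supervised learning)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Every scikit-learn estimator follows the same fit / predict API
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
```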
Module 6: Regression
- Linear Regression
- Robust Regression (RANSAC Algorithm)
- Exploratory Data Analysis (EDA)
- Correlation Analysis and Feature Selection
- Performance Evaluation –
- Residual Analysis
- Mean Square Error (MSE)
- Coefficient of Determination (R²)
- Mean Absolute Error (MAE)
- Root Mean Square Error (RMSE)
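A short sketch covering linear regression and the evaluation metrics listed above (MSE, MAE, RMSE, R²) together with residual analysis; the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Synthetic data: y = 3x + 5 + noise
rng = np.random.RandomState(0)
X = rng.rand(100, 1) * 10
y = 3 * X.ravel() + 5 + rng.randn(100)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y, y_pred)
residuals = y - y_pred     # residual analysis: these should look like random noise

print(f"MSE={mse:.3f}  MAE={mae:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
```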
- Polynomial Regression
- Regularized Regression
- Ridge, Lasso
- Elastic Net Regression
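Polynomial and regularized regression in the same sketch style, combining PolynomialFeatures with Ridge, Lasso and ElasticNet; the alpha values are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.pipeline import make_pipeline

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

# Polynomial regression = polynomial feature expansion + a linear model
poly_ridge = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))
poly_ridge.fit(X, y)

# Regularized linear models
for model in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    print(type(model).__name__, "R^2:", round(model.score(X, y), 3))
```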
- Bias-Variance Trade-Off
- Cross Validation
- Hold-Out
- K-Fold Cross Validation
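Hold-out versus K-Fold cross-validation in a small sketch: one fixed split via train_test_split against an averaged score from cross_val_score.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score, KFold

X, y = load_iris(return_X_y=True)

# Hold-out: one fixed train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
holdout_score = LogisticRegression(max_iter=1000).fit(X_train, y_train).score(X_test, y_test)

# K-Fold cross-validation: average the score over k rotating splits
kf = KFold(n_splits=5, shuffle=True, random_state=1)
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)

print("Hold-out accuracy:", round(holdout_score, 3))
print("5-fold CV accuracy:", cv_scores.mean().round(3))
```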
- Data Pre-Processing –
- Standardization
- Min-Max
- Normalization and Binarization
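Standardization, min-max scaling, normalization and binarization with scikit-learn preprocessing transformers; the sample matrix is arbitrary.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, Normalizer, Binarizer

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

print(StandardScaler().fit_transform(X))          # zero mean, unit variance per column
print(MinMaxScaler().fit_transform(X))            # scaled to [0, 1] per column
print(Normalizer().fit_transform(X))              # each row scaled to unit norm
print(Binarizer(threshold=2.5).fit_transform(X))  # values above the threshold become 1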
Module 7: Classification – Logistic Regression
- Sigmoid Function
- Logistic Regression learning using Stochastic Gradient Descent (SGD)
- SGD Classifier
- Measuring accuracy using Cross-Validation and Stratified K-Fold
- Confusion Matrix – True Positive (TP), False Positive (FP), False Negative (FN), True Negative (TN)
- Precision, Recall, F1 Score, Precision/Recall Trade-Off, Receiver Operating Characteristic (ROC) Curve
- Gradient Descent
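One sketch tying the module together: an SGD-trained logistic regression classifier, stratified cross-validation, a confusion matrix and the related metrics; the breast cancer dataset is an arbitrary binary-classification example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, cross_val_predict
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, roc_auc_score)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# loss="log_loss" makes SGDClassifier learn logistic regression with SGD
# (older scikit-learn versions call this loss="log")
clf = make_pipeline(StandardScaler(), SGDClassifier(loss="log_loss", random_state=42))

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
print("Stratified 5-fold accuracy:", cross_val_score(clf, X, y, cv=skf).mean().round(3))

# Out-of-fold predictions for the confusion matrix and related metrics
y_pred = cross_val_predict(clf, X, y, cv=skf)
print(confusion_matrix(y, y_pred))            # rows: actual, columns: predicted (TN/FP/FN/TP)
print("Precision:", precision_score(y, y_pred).round(3))
print("Recall:   ", recall_score(y, y_pred).round(3))
print("F1 score: ", f1_score(y, y_pred).round(3))
print("ROC AUC:  ", roc_auc_score(y, y_pred).round(3))
```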
Module 8: Classification – Decision Trees
- CART (Classification and Regression Tree)
- Advantages, Disadvantages and Applications
- Decision Tree Learning algorithms – ID3, C4.5, C5.0 and CART.
- Gini Impurity, Entropy and Information Gain
- Decision Tree Regression
- Visualizing a Decision Tree using the graphviz module
- Regularization by tuning hyper-parameters using GridSearchCV
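A decision-tree sketch with Gini/entropy criteria, export for Graphviz visualization and hyper-parameter tuning via GridSearchCV; the parameter grid values are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# CART tree; criterion can be "gini" (Gini impurity) or "entropy" (information gain)
tree = DecisionTreeClassifier(criterion="gini", random_state=0)

# Regularization via hyper-parameter tuning with GridSearchCV
param_grid = {"max_depth": [2, 3, 4, 5], "min_samples_leaf": [1, 5, 10]}
grid = GridSearchCV(tree, param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))

# Export the best tree for visualization with the graphviz tool
export_graphviz(grid.best_estimator_, out_file="tree.dot", filled=True)
```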
Module 9: Classification – Ensemble Methods
- Bootstrap Aggregating or Bagging
- Random Forest algorithm
- Extremely Randomized (Extra-Trees) Ensemble
- Boosting – AdaBoost (Adaptive Boosting), Gradient Boosting Machine (GBM), XGBoost (Extreme Gradient Boosting)
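Bagging, Random Forest, Extra-Trees and boosting (AdaBoost, Gradient Boosting) in one sketch; XGBoost follows the same fit/predict pattern via the separate xgboost package, so it is omitted here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              ExtraTreesClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)

X, y = load_breast_cancer(return_X_y=True)

models = {
    "Bagging":           BaggingClassifier(n_estimators=100, random_state=0),
    "Random Forest":     RandomForestClassifier(n_estimators=100, random_state=0),
    "Extra-Trees":       ExtraTreesClassifier(n_estimators=100, random_state=0),
    "AdaBoost":          AdaBoostClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

# Compare the ensembles with 5-fold cross-validation
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```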
Module 10: Unsupervised Learning – Clustering
- Connectivity-based Clustering using Hierarchical Clustering
- Ward’s Agglomerative Hierarchical Clustering
- K-Means Clustering
- Elbow Method and Silhouette Analysis
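K-Means with the elbow method and silhouette analysis, plus Ward's agglomerative hierarchical clustering; the blob data is synthetic.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Elbow method (inertia) and silhouette analysis for choosing k
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))

# Connectivity-based clustering: Ward's agglomerative hierarchical clustering
ward = AgglomerativeClustering(n_clusters=4, linkage="ward")
labels = ward.fit_predict(X)
```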
Module 11: Unsupervised Learning – Dimensionality Reduction
- Linear dimensionality reduction using Principal Component Analysis (PCA)
- Kernel PCA
- Linear Discriminant Analysis (LDA) on Supervised Data
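PCA, Kernel PCA and LDA in one sketch on the iris data; the number of components is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

X_pca  = PCA(n_components=2).fit_transform(X_std)                      # linear, unsupervised
X_kpca = KernelPCA(n_components=2, kernel="rbf").fit_transform(X_std)  # non-linear, unsupervised
X_lda  = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_std, y)  # supervised (uses labels)

print(X_pca.shape, X_kpca.shape, X_lda.shape)
```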
Module 12: Advanced NLP with Deep Learning Overview
- Computational Linguistics
- History of NLP
- Why NLP?
- Uses of NLP
- NLP Components
- NLP Ambiguity
- Lexical Ambiguity
- Syntactic Ambiguity
- Referential Ambiguity
- Natural Language Understanding
- Natural Language Generation
- Text Planning
- Sentence Planning
- Text Realization
Module 13: Feature Engineering
- Filter Methods
- Wrapper Methods
- Embedded Methods
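One example per feature-selection family: a filter method (SelectKBest), a wrapper method (RFE) and an embedded method (L1-regularized model via SelectFromModel); the choice of k and of the estimators is illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif, RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter: rank features by a univariate statistic, independent of any model
X_filter = SelectKBest(score_func=f_classif, k=10).fit_transform(X, y)

# Wrapper: repeatedly fit a model and drop the weakest features
rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10)
X_wrapper = rfe.fit_transform(X, y)

# Embedded: selection happens inside the model via L1 regularization
lasso_like = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
X_embedded = SelectFromModel(lasso_like).fit_transform(X, y)

print(X_filter.shape, X_wrapper.shape, X_embedded.shape)
```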
Module 14: Neural Network
- A Simple Perceptron
- Neural Network overview and its use cases
- Overview of various Neural Network architectures
- Multilayer Network
- Loss Functions.
- The Learning Mechanism.
- Forward and Backward Propagation.
- Gradient Descent
- DL Algorithms – ANN, CNN, RNN
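A small multilayer network sketch using scikit-learn's MLPClassifier (forward pass, backpropagation and an SGD optimizer with a cross-entropy loss); CNN and RNN architectures would normally use a dedicated deep-learning framework instead.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer; training alternates forward and backward propagation,
# updating the weights with stochastic gradient descent
mlp = MLPClassifier(hidden_layer_sizes=(64,), activation="relu",
                    solver="sgd", learning_rate_init=0.1,
                    max_iter=300, random_state=0)
mlp.fit(X_train, y_train)
print("Test accuracy:", round(mlp.score(X_test, y_test), 3))
```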