Course Detail

DATA SCIENCE Course

DATA SCIENCE Course - Sagveek Technologies


Course Description

DATA SCIENCE

Module 1: Data Analysis using NumPy and Pandas

  1. NumPy
  • NumPy Vectors and Matrices
  • Functions – arange(), zeros(), ones(), linspace(), eye(), reshape(), random(), max(), min(), argmax(), argmin(), and the shape and dtype attributes
  • Indexing and Selection
  • NumPy Operations – Array with Array, Array with Scalars
  • Universal Array Functions
  2. Pandas
  • Pandas Series
  • Pandas DataFrame
  • Missing Data (Imputation)
  • Group By Operations
  • Merging, Joining and Concatenating DataFrames
  • Pandas Operations
  • Data Input and Output from a wide variety of formats such as CSV, Excel, databases and HTML

Module 2: Data Visualization using Matplotlib, Seaborn and Pandas Built-in Plotting

  1. Matplotlib
  • plot() using the Functional Approach
  • Multi-plots using subplot()
  • figure() using OO API Methods
  • add_axes(), set_xlabel(), set_ylabel(), set_title() Methods
  • Customization – Figure Size, Improving DPI, Plot Appearance, Markers, Control over Axis Appearance and Special Plot Types
  2. Seaborn
  • Distribution Plots using distplot(), jointplot(), pairplot(), rugplot(), kdeplot()
  • Categorical Plots using barplot(), countplot(), boxplot(), violinplot(), stripplot(), swarmplot(), factorplot()
  • Matrix Plots using heatmap(), clustermap()
  • Grid Plots using PairGrid(), FacetGrid()
  • Regression Plots using lmplot()
  • Style and Color Customization
  3. Pandas Built-in
  • Histogram, Area Plot, Bar Plot, Scatter Plot, Box Plot, Hexbin Plot, KDE Plot, Density Plot
  • Choropleth Maps
  • Interactive World Map and US Map using the Plotly and Cufflinks Modules

Module 3: Jupyter Notebook

  • Introduction
  • Basic Commands
  • Keyboard Shortcuts and Magic Functions

Module 4: Git

  • Distributed Version Control System
  • How Git Internally Manages Version Control of Changesets
  • Creating a Repository
  • Basic Commands such as git status, git add, git rm, git branch, git checkout, git log, git cat-file, git pull, git push, git commit
  • Managing Configuration – System Level, User Level, Repository Level

Module 5: Machine Learning Introduction

  • What is Machine Learning?
  • Machine Learning Process Flow-Diagram
  • Different Categories of Machine Learning –
    • Supervised Learning
    • Unsupervised Learning
    • Reinforcement Learning
  • Scikit-Learn Overview
  • Scikit-Learn Cheat Sheet

Module 6: Regression

  • Linear Regression
  • Robust Regression (RANSAC Algorithm)
  • Exploratory Data Analysis (EDA)
  • Correlation Analysis and Feature Selection
  • Performance Evaluation –
    • Residual Analysis
    • Mean Square Error (MSE)
    • Coefficient of Determination (R²)
    • Mean Absolute Error (MAE)
    • Root Mean Square Error (RMSE)
  • Polynomial Regression
  • Regularized Regression –
    • Ridge, Lasso
    • Elastic Net Regression
  • Bias-Variance Trade-Off
  • Cross Validation –
    • Hold Out
    • K-Fold Cross Validation
  • Data Pre-Processing –
    • Standardization
    • Min-Max Scaling
    • Normalization and Binarization
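
The performance-evaluation metrics listed above can be sketched in plain Python as follows. This is an illustrative sketch rather than course material: the function names and sample values are hypothetical, and in practice scikit-learn's `sklearn.metrics` module provides `mean_squared_error`, `mean_absolute_error` and `r2_score` for the same purpose.

```python
import math


def mse(y_true, y_pred):
    """Mean Square Error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: the square root of the MSE, in the units of y."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean Absolute Error: the average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation around the mean
    return 1 - ss_res / ss_tot


# Hypothetical true values and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 10.0]
residuals = [t - p for t, p in zip(y_true, y_pred)]  # the input to residual analysis
print("residuals:", residuals)
print("MSE:", mse(y_true, y_pred), "RMSE:", rmse(y_true, y_pred))
print("MAE:", mae(y_true, y_pred), "R^2:", r2(y_true, y_pred))
```

For these values the residuals are 0.5, 0, -0.5 and -1, so MSE = 1.5 / 4 = 0.375, MAE = 2 / 4 = 0.5 and R² = 1 - 1.5 / 20 = 0.925 (the total sum of squares around the mean of 6 is 20).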

Module 7: Classification – Logistic Regression

  • Sigmoid Function
  • Logistic Regression Learning using Stochastic Gradient Descent (SGD)
  • SGD Classifier
  • Measuring Accuracy using Cross-Validation, Stratified K-Fold
  • Confusion Matrix – True Positive (TP), False Positive (FP), False Negative (FN), True Negative (TN)
  • Precision, Recall, F1 Score, Precision/Recall Trade-Off, Receiver Operating Characteristic (ROC) Curve
  • Gradient Descent
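
A minimal sketch of how the topics in this module fit together: a sigmoid-based logistic regression trained by per-sample (stochastic) gradient descent on the log loss, followed by the confusion-matrix counts and the precision, recall and F1 scores derived from them. The learning rate, epoch count, toy data and function names are assumptions chosen for illustration; scikit-learn's `SGDClassifier` with a logistic loss is the library counterpart.

```python
import math
import random


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_sgd(X, y, lr=0.1, epochs=200, seed=0):
    """Fit weights w and bias b by stochastic gradient descent on the log loss.

    X is a list of feature lists and y a list of 0/1 labels; each update uses one sample.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order each epoch
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            error = p - y[i]  # derivative of the log loss w.r.t. the linear score
            w = [wj - lr * error * xj for wj, xj in zip(w, X[i])]
            b -= lr * error
    return w, b


def predict(X, w, b, threshold=0.5):
    """Label a sample 1 when its predicted probability reaches the threshold."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold else 0 for x in X]


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for 0/1 labels, treating 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    return pairs.count((1, 1)), pairs.count((0, 1)), pairs.count((1, 0)), pairs.count((0, 0))


def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical 1-D data centred on zero: the label is 1 for positive feature values.
X = [[-2.5], [-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0], [2.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic_sgd(X, y)
y_hat = predict(X, w, b)
tp, fp, fn, tn = confusion_counts(y, y_hat)
print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("precision, recall, F1:", precision_recall_f1(tp, fp, fn))
```

Raising the `threshold` argument of `predict` generally makes positive predictions rarer, trading recall for precision; sweeping it across (0, 1) and recording the true and false positive rates traces the ROC curve.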

Module 8: Classification – Decision Trees

  • CART (Classification and Regression Tree)
  • Advantages, Disadvantages and Applications
  • Decision Tree Learning algorithms – ID3, C4.5, C5.0 and CART.
  • Gini Impurity, Entropy and Information Gain
  • Decision Tree Regression
  • Visualizing a Decision Tree using the graphviz Module
  • Regularization by Tuning Hyperparameters using GridSearchCV
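
The split criteria named above can be illustrated with a short plain-Python sketch that computes Gini impurity, entropy and the information gain of one candidate split. The labels and the split are hypothetical; in scikit-learn the same choice is exposed through the `criterion` parameter (`"gini"` or `"entropy"`) of `DecisionTreeClassifier`.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: minus the sum of p * log2(p) over the classes."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def information_gain(parent, children, impurity=entropy):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical split of 10 labelled samples into two child nodes.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4
print("Gini(parent):", gini(parent), "Entropy(parent):", entropy(parent))
print("Information gain (entropy):", information_gain(parent, [left, right]))
print("Gini decrease:", information_gain(parent, [left, right], impurity=gini))
```

Here the split lowers entropy from 1.0 bit to about 0.722 bits (a gain of about 0.278) and Gini impurity from 0.5 to 0.32. A tree learner evaluates many candidate splits this way and keeps the one with the largest impurity decrease.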

Module 9: Classification – Ensemble Methods

  • Bootstrap Aggregating or Bagging
  • Random Forest algorithm
  • Extremely Randomized (Extra-Trees) Ensemble
  • Boosting – AdaBoost (Adaptive Boosting), Gradient Boosting Machine (GBM), XGBoost (Extreme Gradient Boosting)
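
Bootstrap aggregating, the first technique listed, can be sketched as follows: each base learner is trained on a sample drawn with replacement from the training set, and the learners' predictions are combined by majority vote. To stay short, the sketch assumes one numeric feature, 0/1 labels and hypothetical one-split decision stumps as base learners; scikit-learn's `BaggingClassifier` is the general-purpose version.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-feature decision stump that predicts 1 when x > threshold.

    Candidate thresholds are the midpoints between sorted distinct x values plus
    one below and one above all points; the candidate with the fewest errors wins.
    """
    values = sorted(set(xs))
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)

    def errors(threshold):
        return sum((1 if x > threshold else 0) != y for x, y in zip(xs, ys))

    return min(candidates, key=errors)


def fit_bagging(xs, ys, n_estimators=25, seed=0):
    """Bootstrap aggregating: fit each stump on n samples drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # one bootstrap sample
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds


def predict_bagging(thresholds, x):
    """Aggregate the stumps' 0/1 predictions by majority vote."""
    votes = Counter(1 if x > t else 0 for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical one-feature training data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_bagging(xs, ys)
print([predict_bagging(model, x) for x in xs])
```

An odd number of estimators avoids ties in the two-class vote. Random Forest builds on the same bootstrap-and-vote scheme but also samples a random subset of features at each split.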

Module 10: Unsupervised Learning – Clustering

  • Connectivity-based Clustering using Hierarchical Clustering
  • Ward’s Agglomerative Hierarchical Clustering
  • K-Means Clustering
  • Elbow Method and Silhouette Analysis
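
K-Means and the elbow method can be sketched with Lloyd's algorithm in plain Python: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points, then compare the within-cluster sum of squares (inertia) for several values of k. Seeding the centroids with the first k points and the sample data are assumptions made for a deterministic illustration; scikit-learn's `KMeans` (which exposes `inertia_`) and `silhouette_score` are the library tools.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm for k-means clustering of equal-length point tuples.

    For a deterministic illustration the first k points seed the centroids;
    practical implementations usually use k-means++ seeding and several restarts.
    """
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(n_iter):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return centroids, clusters


def inertia(centroids, clusters):
    """Within-cluster sum of squared distances, the value plotted in the elbow method."""
    return sum(math.dist(p, c) ** 2 for c, members in zip(centroids, clusters) for p in members)


# Hypothetical 2-D points forming two well-separated groups.
points = [(1, 1), (1.5, 2), (8, 8), (8.5, 9), (1, 0.5), (9, 8)]
for k in (1, 2, 3):
    centroids, clusters = kmeans(points, k)
    print(k, round(inertia(centroids, clusters), 3))
```

Inertia generally falls as k grows; the elbow method picks the k after which the decrease flattens out (here k = 2, matching the two visible groups). Silhouette analysis, not shown, instead compares each point's mean distance to its own cluster with its mean distance to the nearest other cluster.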

Module 11: Unsupervised Learning – Dimensionality Reduction

  • Linear Dimensionality Reduction using Principal Component Analysis (PCA)
  • Kernel PCA
  • Linear Discriminant Analysis (LDA) on Supervised Data
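
Linear PCA reduces data to the directions of greatest variance, which are the eigenvectors of the covariance matrix with the largest eigenvalues. The sketch below finds the first principal component with power iteration and projects 2-D points onto it; power iteration, the all-ones start vector and the sample data are illustrative assumptions (library implementations such as scikit-learn's `PCA` use a singular value decomposition instead).

```python
import math


def first_principal_component(data, n_iter=500):
    """Return (unit direction of maximum variance, column means) for a list of rows.

    The direction is the top eigenvector of the sample covariance matrix, found here
    by power iteration; the all-ones start vector is an assumption and must not be
    orthogonal to that eigenvector.
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]  # w = C v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("power iteration collapsed to zero; try another start vector")
        v = [x / norm for x in w]
    return v, means


def project(data, v, means):
    """Reduce each row to one coordinate: its centered dot product with v."""
    return [sum((row[j] - means[j]) * v[j] for j in range(len(v))) for row in data]


# Hypothetical 2-D points lying close to the line y = 2x.
data = [[1, 2.1], [2, 3.9], [3, 6.2], [4, 7.8], [5, 10.1]]
v, means = first_principal_component(data)
z = project(data, v, means)
print("component:", v)
print("1-D representation:", z)
```

Because the points lie near the line y = 2x, the component comes out close to (1, 2)/√5, and the single projected coordinate keeps most of the original variance.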

Module 12: Advanced NLP with Deep Learning Overview

  • Computational Linguistics
  • History of NLP
  • Why NLP
  • Uses of NLP
  • NLP Components
  • NLP Ambiguity –
    • Lexical Ambiguity
    • Syntactic Ambiguity
    • Referential Ambiguity
  • Natural Language Understanding
  • Natural Language Generation –
    • Text Planning
    • Sentence Planning
    • Text Realization

Module 13: Feature Engineering

  • Filter Methods
  • Wrapper Methods
  • Embedded Methods
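
Of the three families, filter methods are the simplest to illustrate: each feature is scored independently of any model and the best-scoring ones are kept. The sketch below assumes absolute Pearson correlation with the target as the score, which is only one of several common filter criteria; the feature names and values are hypothetical. scikit-learn's `SelectKBest` implements the same pattern with pluggable scoring functions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def filter_select(features, target, k):
    """Filter method: score each feature by |r| with the target and keep the top k.

    No model is trained, which is what separates filter methods from wrapper
    methods (which search feature subsets with a model) and embedded methods
    (which select features while the model is fitted, e.g. Lasso).
    """
    scores = {name: abs(pearson(column, target)) for name, column in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical feature columns and target.
features = {
    "size": [50, 60, 70, 80, 90],
    "rooms": [1, 2, 2, 3, 4],
    "noise": [3, 1, 4, 1, 5],
}
price = [100, 120, 140, 160, 180]
print(filter_select(features, price, k=2))
```

Here `size` (r = 1) and `rooms` (r ≈ 0.97) are kept and `noise` (r ≈ 0.35) is dropped.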

Module 14: Neural Network

  • A Simple Perceptron
  • Neural Network Overview and Use Cases
  • Overview of Various Neural Network Architectures
  • Multilayer Networks
  • Loss Functions
  • The Learning Mechanism
  • Forward and Backward Propagation
  • Gradient Descent
  • DL Algorithms – ANN, CNN, RNN
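
The simple perceptron listed first can be sketched with Rosenblatt's learning rule: predict with a step function and, after each mistake, move the weights by the learning rate times the error times the input. The function names, learning rate and the logical AND training data are illustrative assumptions; scikit-learn also ships a `Perceptron` class.

```python
def predict(w, b, x):
    """Step activation: output 1 when the weighted sum plus bias is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(X, y, lr=1.0, epochs=20):
    """Perceptron learning rule for 0/1 targets: w += lr * (target - output) * x."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in zip(X, y):
            update = lr * (target - predict(w, b, x))  # zero when the prediction is right
            if update:
                w = [wi + update * xi for wi, xi in zip(w, x)]
                b += update
                mistakes += 1
        if mistakes == 0:
            break  # a full pass with no mistakes: every sample is classified correctly
    return w, b


# Logical AND as a toy training set (a hypothetical example).
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print("weights:", w, "bias:", b)
print("predictions:", [predict(w, b, x) for x in X])
```

AND is linearly separable, so the rule converges; XOR is not, so no single perceptron can fit it, which is what motivates multilayer networks trained with backpropagation.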

Institute Overview

Pune, Maharashtra, India

About Sagveek Technologies: Sagveek Technologies is one of India’s leading solution providers in information technology development, training, staffing, and product and tools consulting services to both retail (i.e. stud...
