Module 1: Data Analysis using Numpy and Pandas
- Numpy
- Numpy Vector and Matrix
- Functions – arange(), zeros(), ones(), linspace(), eye(), reshape(), random(), max(), min(), argmax(), argmin(); shape and dtype attributes
- Indexing and Selection
- Numpy Operations – Array with Array, Array with Scalars, Universal Array Functions
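A minimal sketch of the NumPy topics above (array creation, attributes, indexing and vectorized operations); the array values are arbitrary examples.

```python
import numpy as np

# Array creation functions
a = np.arange(0, 10)          # integers 0..9
z = np.zeros((2, 3))          # 2x3 matrix of zeros
o = np.ones(5)                # vector of ones
l = np.linspace(0, 1, 5)      # 5 evenly spaced values in [0, 1]
i = np.eye(3)                 # 3x3 identity matrix
r = np.random.rand(3, 3)      # uniform random 3x3 matrix

# Reshaping and inspection attributes
m = a.reshape(2, 5)
print(m.shape, m.dtype)       # (2, 5) and an integer dtype (platform dependent)
print(a.max(), a.min(), a.argmax(), a.argmin())

# Indexing and selection
print(a[2:5])                 # slicing
print(m[1, 3])                # row 1, column 3
print(a[a > 5])               # boolean selection

# Array with array, array with scalar, and universal functions
print(a + a, a * 2, np.sqrt(a), np.exp(a))
```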
- Pandas
- Pandas Series
- Pandas DataFrame
- Missing Data (Imputation)
- Group by Operations
- Merging, Joining and Concatenating DataFrames
- Pandas Operations
- Data Input and Output across a wide variety of formats such as CSV, Excel, SQL databases and HTML
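A short sketch of the pandas topics above (Series, DataFrame, missing-data imputation, groupby, merging and I/O); the column names and file names are illustrative only.

```python
import numpy as np
import pandas as pd

# Series and DataFrame
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
df = pd.DataFrame({"dept": ["IT", "IT", "HR"],
                   "salary": [50000, np.nan, 45000]})

# Missing data: simple imputation with the column mean
df["salary"] = df["salary"].fillna(df["salary"].mean())

# GroupBy operations
print(df.groupby("dept")["salary"].mean())

# Merging, joining and concatenating DataFrames
lookup = pd.DataFrame({"dept": ["IT", "HR"], "location": ["Pune", "Mumbai"]})
merged = pd.merge(df, lookup, on="dept", how="left")
stacked = pd.concat([df, df], axis=0)

# Data input and output (file names are placeholders)
# df = pd.read_csv("employees.csv")
# merged.to_excel("report.xlsx", index=False)
```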
Module 2: Data Visualization using Matplotlib, Seaborn and Pandas Built-in Plotting
- Matplotlib
- plot() using Functional approach
- multi-plot using subplot()
- figure() using OO API Methods
- add_axes(), set_xlabel(), set_ylabel(), set_title() Methods
- Customization – figure size, improving DPI, plot appearance, markers, control over axis appearance and special plot types
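A brief sketch of the Matplotlib topics above, contrasting the functional approach with the object-oriented API; the data is synthetic.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

# Functional approach with multi-plot via subplot()
plt.subplot(1, 2, 1)
plt.plot(x, np.sin(x))
plt.subplot(1, 2, 2)
plt.plot(x, np.cos(x))
plt.show()

# Object-oriented API with customization
fig = plt.figure(figsize=(6, 4), dpi=100)
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
ax.plot(x, x ** 2, color="purple", lw=2, ls="--", marker="o", markersize=3)
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("OO API example")
ax.set_xlim(0, 10)          # control over axis appearance
plt.show()
```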
- Seaborn
- Distribution Plots using distplot(), jointplot(), pairplot(), rugplot(), kdeplot()
- Categorical Plots using barplot(), countplot(), boxplot(), violinplot(), stripplot(), swarmplot(), factorplot()
- Matrix Plots using heatmap(), clustermap()
- Grid Plots using PairGrid(), FacetGrid()
- Regression Plots using lmplot()
- Styles and Colors customization
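A compact sketch of the seaborn plot families above, using the bundled tips dataset; current function names are used where older ones (distplot(), factorplot()) have since been renamed.

```python
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")       # small demo dataset shipped with seaborn

sns.set_style("whitegrid")            # styles and colors customization

sns.histplot(tips["total_bill"], kde=True)        # distribution plot (distplot() in older seaborn)
sns.jointplot(x="total_bill", y="tip", data=tips)
sns.boxplot(x="day", y="total_bill", data=tips)   # categorical plot
sns.heatmap(tips.select_dtypes("number").corr(), annot=True)  # matrix plot
sns.lmplot(x="total_bill", y="tip", data=tips)    # regression plot

g = sns.FacetGrid(tips, col="time", row="smoker") # grid plot
g.map(plt.hist, "total_bill")
plt.show()
```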
- Pandas Built-in
- Histogram, Area Plot, Bar Plot, Scatter Plot, Box Plot, Hex Plot, KDE Plot, Density Plot
- Choropleth Maps – Interactive World Map and US Map using the Plotly and Cufflinks modules
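A minimal sketch of the pandas built-in plotting interface listed above on random data; the Plotly/Cufflinks choropleth example is left out because it needs extra setup.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(200, 3), columns=["A", "B", "C"])

df["A"].plot.hist(bins=30)                              # histogram
df.plot.area(stacked=False, alpha=0.4)                  # area plot
df.abs().iloc[:10].plot.bar()                           # bar plot
df.plot.scatter(x="A", y="B", c="C", cmap="coolwarm")   # scatter plot
df.plot.box()                                           # box plot
df.plot.hexbin(x="A", y="B", gridsize=20)               # hex plot
df["A"].plot.kde()                                      # KDE / density plot
plt.show()
```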
Module 3: Jupyter Notebook
- Introduction
- Basic Commands
- Keyboard Shortcuts and Magic Functions
Module 4: GIT
- Distributed Version Control System
- How Git internally manages version control using changesets
- Creating Repository
- Basic commands: git status, git add, git rm, git branch, git checkout, git log, git cat-file, git pull, git push, git commit
- Managing Configuration – System Level, User Level, Repository Level
Module 5: Machine Learning Introduction
- What is Machine Learning?
- Machine Learning Process Flow-Diagram
- Different Categories of Machine Learning –
- Supervised
- Unsupervised
- Reinforcement Learning
- Scikit-Learn Overview
- Scikit-Learn cheat-sheet
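A minimal end-to-end scikit-learn workflow (load data, split, fit, predict, score) to accompany the overview; the iris dataset and KNN estimator are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load a toy dataset and split it (supervised learning)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Every scikit-learn estimator follows the same fit / predict API
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
```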
Module 6: Regression
- Linear Regression
- Robust Regression (RANSAC Algorithm)
- Exploratory Data Analysis (EDA)
- Correlation Analysis and Feature Selection
- Performance Evaluation –
- Residual Analysis
- Mean Square Error (MSE)
- Coefficient of Determination (R²)
- Mean Absolute Error (MAE)
- Root Mean Square Error (RMSE)
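A short sketch covering linear regression and the evaluation metrics listed above (MSE, MAE, RMSE, R²) together with residual analysis; the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Synthetic data: y = 3x + 5 + noise
rng = np.random.RandomState(0)
X = rng.rand(100, 1) * 10
y = 3 * X.ravel() + 5 + rng.randn(100)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y, y_pred)
residuals = y - y_pred     # residual analysis: these should look like random noise

print(f"MSE={mse:.3f}  MAE={mae:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
```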
- Polynomial Regression
- Regularized Regression
- Ridge, Lasso
- Elastic Net Regression
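Polynomial and regularized regression in the same sketch style, combining PolynomialFeatures with Ridge, Lasso and ElasticNet; the alpha values are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.pipeline import make_pipeline

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

# Polynomial regression = polynomial feature expansion + a linear model
poly_ridge = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))
poly_ridge.fit(X, y)

# Regularized linear models
for model in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    print(type(model).__name__, "R^2:", round(model.score(X, y), 3))
```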
- Bias-Variance Trade-Off
- Cross Validation
- Hold-Out
- K-Fold Cross Validation
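Hold-out versus K-Fold cross-validation in a small sketch: one fixed split via train_test_split against an averaged score from cross_val_score.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score, KFold

X, y = load_iris(return_X_y=True)

# Hold-out: one fixed train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
holdout_score = LogisticRegression(max_iter=1000).fit(X_train, y_train).score(X_test, y_test)

# K-Fold cross-validation: average the score over k rotating splits
kf = KFold(n_splits=5, shuffle=True, random_state=1)
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)

print("Hold-out accuracy:", round(holdout_score, 3))
print("5-fold CV accuracy:", cv_scores.mean().round(3))
```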
- Data Pre-Processing –
- Standardization
- Min-Max
- Normalization and Binarization
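Standardization, min-max scaling, normalization and binarization with scikit-learn preprocessing transformers; the sample matrix is arbitrary.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, Normalizer, Binarizer

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

print(StandardScaler().fit_transform(X))          # zero mean, unit variance per column
print(MinMaxScaler().fit_transform(X))            # scaled to [0, 1] per column
print(Normalizer().fit_transform(X))              # each row scaled to unit norm
print(Binarizer(threshold=2.5).fit_transform(X))  # values above the threshold become 1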
Module 7: Classification – Logistic Regression
- Sigmoid Function
- Logistic Regression learning using Stochastic Gradient Descent (SGD)
- SGD Classifier
- Measuring accuracy using Cross-Validation and Stratified K-Fold
- Confusion Matrix – True Positive (TP), False Positive (FP), False Negative (FN), True Negative (TN)
- Precision, Recall, F1 Score, Precision/Recall Trade-Off, Receiver Operating Characteristic (ROC) Curve
- Gradient Descent
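One sketch tying the module together: an SGD-trained logistic regression classifier, stratified cross-validation, a confusion matrix and the related metrics; the breast cancer dataset is an arbitrary binary-classification example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, cross_val_predict
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, roc_auc_score)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# loss="log_loss" makes SGDClassifier learn logistic regression with SGD
# (older scikit-learn versions call this loss="log")
clf = make_pipeline(StandardScaler(), SGDClassifier(loss="log_loss", random_state=42))

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
print("Stratified 5-fold accuracy:", cross_val_score(clf, X, y, cv=skf).mean().round(3))

# Out-of-fold predictions for the confusion matrix and related metrics
y_pred = cross_val_predict(clf, X, y, cv=skf)
print(confusion_matrix(y, y_pred))            # rows: actual, columns: predicted (TN/FP/FN/TP)
print("Precision:", precision_score(y, y_pred).round(3))
print("Recall:   ", recall_score(y, y_pred).round(3))
print("F1 score: ", f1_score(y, y_pred).round(3))
print("ROC AUC:  ", roc_auc_score(y, y_pred).round(3))
```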
Module 8: Classification – Decision Trees
- CART (Classification and Regression Tree)
- Advantages, Disadvantages and Applications
- Decision Tree Learning algorithms – ID3, C4.5, C5.0 and CART.
- Gini Impurity, Entropy and Information Gain
- Decision Tree Regression
- Visualizing a Decision Tree using the graphviz module
- Regularization by tuning hyper-parameters using GridSearchCV
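A decision-tree sketch with Gini/entropy criteria, export for Graphviz visualization and hyper-parameter tuning via GridSearchCV; the parameter grid values are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# CART tree; criterion can be "gini" (Gini impurity) or "entropy" (information gain)
tree = DecisionTreeClassifier(criterion="gini", random_state=0)

# Regularization via hyper-parameter tuning with GridSearchCV
param_grid = {"max_depth": [2, 3, 4, 5], "min_samples_leaf": [1, 5, 10]}
grid = GridSearchCV(tree, param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))

# Export the best tree for visualization with the graphviz tool
export_graphviz(grid.best_estimator_, out_file="tree.dot", filled=True)
```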
Module 9: Classification – Ensemble Methods
- Bootstrap Aggregating or Bagging
- Random Forest algorithm
- Extremely Randomized (Extra-Trees) Ensemble
- Boosting – AdaBoost (Adaptive Boosting), Gradient Boosting Machine (GBM), XGBoost (Extreme Gradient Boosting)
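Bagging, Random Forest, Extra-Trees and boosting (AdaBoost, Gradient Boosting) in one sketch; XGBoost follows the same fit/predict pattern via the separate xgboost package, so it is omitted here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              ExtraTreesClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)

X, y = load_breast_cancer(return_X_y=True)

models = {
    "Bagging":           BaggingClassifier(n_estimators=100, random_state=0),
    "Random Forest":     RandomForestClassifier(n_estimators=100, random_state=0),
    "Extra-Trees":       ExtraTreesClassifier(n_estimators=100, random_state=0),
    "AdaBoost":          AdaBoostClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

# Compare the ensembles with 5-fold cross-validation
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```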
Module 10: Unsupervised Learning – Clustering
- Connectivity-based Clustering using Hierarchical Clustering
- Ward’s Agglomerative Hierarchical Clustering
- K-Means Clustering
- Elbow Method and Silhouette Analysis
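K-Means with the elbow method and silhouette analysis, plus Ward's agglomerative hierarchical clustering; the blob data is synthetic.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Elbow method (inertia) and silhouette analysis for choosing k
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))

# Connectivity-based clustering: Ward's agglomerative hierarchical clustering
ward = AgglomerativeClustering(n_clusters=4, linkage="ward")
labels = ward.fit_predict(X)
```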
Module 11: Unsupervised Learning – Dimensionality Reduction
- Linear dimensionality reduction using Principal Component Analysis (PCA)
- Kernel PCA
- Linear Discriminant Analysis (LDA) on Supervised Data
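PCA, Kernel PCA and LDA in one sketch on the iris data; the number of components is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

X_pca  = PCA(n_components=2).fit_transform(X_std)                      # linear, unsupervised
X_kpca = KernelPCA(n_components=2, kernel="rbf").fit_transform(X_std)  # non-linear, unsupervised
X_lda  = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_std, y)  # supervised (uses labels)

print(X_pca.shape, X_kpca.shape, X_lda.shape)
```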
Module 12: Advanced NLP with Deep Learning Overview
- Computational Linguistics
- History of NLP
- Why NLP?
- Uses of NLP
- NLP Components
- NLP Ambiguity
- Lexical Ambiguity
- Syntactic Ambiguity
- Referential Ambiguity
- Natural Language Understanding
- Natural Language Generation
- Text Planning
- Sentence Planning
- Text Realization
Module 13: Feature Engineering
- Filter Methods
- Wrapper Methods
- Embedded Methods
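One example per feature-selection family: a filter method (SelectKBest), a wrapper method (RFE) and an embedded method (L1-regularized model via SelectFromModel); the choice of k and of the estimators is illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif, RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter: rank features by a univariate statistic, independent of any model
X_filter = SelectKBest(score_func=f_classif, k=10).fit_transform(X, y)

# Wrapper: repeatedly fit a model and drop the weakest features
rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10)
X_wrapper = rfe.fit_transform(X, y)

# Embedded: selection happens inside the model via L1 regularization
lasso_like = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
X_embedded = SelectFromModel(lasso_like).fit_transform(X, y)

print(X_filter.shape, X_wrapper.shape, X_embedded.shape)
```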
Module 14: Neural Network
- A Simple Perceptron
- Neural Network overview and its use cases
- Overview of various Neural Network architectures
- Multilayer Network
- Loss Functions.
- The Learning Mechanism.
- Forward and Backward Propagation.
- Gradient Descent
- DL Algorithms – ANN, CNN, RNN
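A small multilayer network sketch using scikit-learn's MLPClassifier (forward pass, backpropagation and an SGD optimizer with a cross-entropy loss); CNN and RNN architectures would normally use a dedicated deep-learning framework instead.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer; training alternates forward and backward propagation,
# updating the weights with stochastic gradient descent
mlp = MLPClassifier(hidden_layer_sizes=(64,), activation="relu",
                    solver="sgd", learning_rate_init=0.1,
                    max_iter=300, random_state=0)
mlp.fit(X_train, y_train)
print("Test accuracy:", round(mlp.score(X_test, y_test), 3))
```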