Learn Data Science Online At Your Own Pace. Start Today and Become an Expert in Days

The MlMinds course enables you to master the concepts of data science.

Skills Covered in Data Science Course

  • Programming Skills (Python + R for Data Science)
  • Probability and Statistics
  • Visualization techniques
  • Machine Learning
  • Data Mining
  • Text Mining and Analytics
  • Web Mining
  • Data Collection
  • Deep Learning

Job Facts on ML or Data Science

“AI is the new electricity!” Electricity transformed countless industries; AI will now do the same. – Andrew Ng

Jobs in AI are not limited to one industry. Most industries already use it extensively.

  1. Banking and Financial Industries
  2. Healthcare
  3. E-commerce
  4. Gaming
  5. Manufacturing, and many more.

Advantages of Data Science

  • It’s In Demand
  • Lots of Job Opportunities
  • Highly Paid Career
  • Skilled / Versatile

Basics of Business Analytics

Business analytics is the practice of iterative, methodical exploration of an organization’s data, with an emphasis on statistical analysis. Business analytics is used by companies committed to data-driven decision-making. It is about using your data to derive information, insights, knowledge, and recommendations. Businesses use business analytics to improve the effectiveness and efficiency of their solutions.

In this module, I will talk about how analytics has progressed from simple descriptive analytics to predictive and prescriptive analytics. I will walk through multiple examples to make this concrete and discuss various industry use cases. I will also introduce multiple components of big data analysis, including data mining, machine learning, web mining, natural language processing, social network analysis, and visualization. Lastly, I will provide some tips to help learners of data science learn it and apply it successfully in their projects.

Types of analytics
  • Descriptive analytics, predictive analytics, prescriptive analytics
Introduction to components of big data analytics
  • Brief Introduction about Components of Big Data Analysis
  • Introduction to Hadoop and Big Data
  • Infrastructure
  • Introduction to Data Mining
  • Introduction to Machine Learning
  • Introduction to Natural Language Processing
  • Introduction to Information Retrieval
  • Introduction to Web Mining
  • Introduction to Social Network Analytics
  • Introduction to IoT
  • Introduction to Visualization
Practical Data Science

  1. Lots of open positions are available in the industry
  2. High-paying jobs
  3. Data science makes products better
  4. Reduces a lot of human work and can have an impact on society

Python for data science

Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably through significant whitespace. Python is open source, has strong community support, and is easy to learn. It works well for quick scripting, for code that goes into actual deployments, and for web development.

In this module, I will start with the basics of the Python language. We will mix theory with hands-on exercises, and I will use Jupyter notebooks for the hands-on parts. I will also discuss in detail topics like control flow, input/output, data structures, functions, regular expressions, and object orientation in Python. Closer to data science, I will discuss popular Python libraries like NumPy, Pandas, SciPy, Matplotlib, Scikit-Learn, and NLTK.
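
As a small taste of the language topics listed above, here is a minimal sketch that touches regular expressions, functions, generators, comprehensions, and JSON I/O. The function names and the sample sentence are made up for illustration and are not taken from the course material.

```python
import json
import re
from collections import Counter


def word_counts(text):
    """Count lower-cased words in a string using a regular expression."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def squares(limit):
    """Generator: lazily yield the squares of 0, 1, ..., limit - 1."""
    for n in range(limit):
        yield n * n


counts = word_counts("Data science is fun, and data is everywhere.")

# Dict comprehension: keep only the words that occur more than once.
repeated = {word: n for word, n in counts.items() if n > 1}

# JSON round trip: serialize a dictionary to a string and read it back.
restored = json.loads(json.dumps(repeated))

# List comprehension over a generator: keep only the even squares.
even_squares = [s for s in squares(6) if s % 2 == 0]

print(repeated)
print(even_squares)
```

Only the standard library is used here, so the sketch should run in a plain Python 3 interpreter or a Jupyter notebook cell.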

Basics of the Python language
  • Why Python
  • Python Installation
  • Python 2.7 Vs 3.x
  • Introduction to Essential Python Libraries
  • Introduction to IPython and Jupyter Notebooks
  • Python Language Basics- Indentation, Comments, Function Calls, Variables and Argument Passing
  • Python Language Basics-Types, Duck-Typing, Import
  • Python Language Basics-Binary operators, Comparisons, Mutable
  • Python Language Basics-Standard Data types in Python
  • Python Language Basics-Command Line Arguments
Python Language Basics-Control Flow
  • Loops: for, while
  • Conditional Execution
Input/Output in Python
  • Input, output, Eval, Print
  • repr, str, zfill
  • File IO
  • JSON I/O with Python Dictionary
  • JSON I/O with Generic objects
  • JSON I/O Serialization and Deserialization
  • JSON I/O File
  • Introduction to Pickle
  • cPickle
  • Pickle and Multi-Processing
Python Data Structures and Sequences
  • Tuples
  • List
  • Sorting, Searching, Slicing
  • Built-In Functions-Enumerate, Sort, Zip, Reversed
  • Dictionary
  • Sets
  • Lists, Sets and Dict Comprehensions
Functions in Python
  • Introduction to Functions and Variable Length Argument
  • Namespace, Scope, Local Functions, Local vs Global Variables
  • Returning multiple values, Pass by Reference
  • Functions are objects
  • Recursive functions, Anonymous(Lambda) Functions
  • Currying, Generators
  • Itertools Module
  • Errors and Exception Handling
Object Orientation in Python
  • Python Modules and Packages
  • Object-Oriented Nature of Python
  • Class Inheritance, overriding, overloading, Data Hiding
Regular expressions in Python
  • Searching for patterns, matching groups
  • Regular expression flags
  • split, findall, finditer
  • Repetition syntax
  • Character sets, Exclusion, Character Ranges, Escape Codes
  • Substitution
  • Greedy vs non-greedy matching
  • Backreferences and anchors
  • Capturing parts of pattern match
  • split and zero-width assertions
  • Look-arounds
NumPy
  • Introduction to Numpy and ndarrays
  • Datatypes of ndarrays
  • Arithmetic operations, Indexing, Slicing
  • Boolean and fancy indexing
  • Basic ndarray operations
  • Array-oriented programming with arrays
  • Conditional, Statistical and Boolean operation
  • Sorting and set operation
  • File IO with NumPy
  • Linear Algebra for Numpy
  • Reshaping, Concatenating and Splitting Arrays
  • Broadcasting
Pandas
  • Series Data Structures
  • DataFrame
  • Index objects
  • Reindexing
  • Dropping entries from an axis
  • Indexing, Selection and Filtering
  • Arithmetic and Data Alignment
  • Operations between DataFrame and Series
  • Function Application and Mapping
  • Sorting and Ranking
  • Axis indexes with duplicate labels
  • Computing Descriptive Statistics
  • pct_change(), Correlation and Covariance, Unique values, Value counts and membership
Matplotlib and Seaborn
  • Introduction to Matplotlib
  • Colours, Markers and line styles
  • Customization of Matplotlib
  • Plotting with Pandas
  • Barplots, Histograms plots, Density Plots
  • Introduction to Seaborn, Style Management
  • Controlling figure aesthetics
  • Colour Palettes
  • Plotting univariate Distribution
  • Plotting bivariate Distribution
  • Visualizing pairwise relationship in pairplots
  • Plotting with Categorical Data
  • Visualizing Linear Relationships
  • Plotting on Data-aware grids
  • Other Python Visualization tools
SciPy
  • Linear Algebra in SciPy
  • Sparse Matrices in SciPy
  • Constants, Cluster and FFT Packages
  • Integration using SciPy
  • Interpolation in SciPy
  • SciPy I/O, SciPy ndimage
  • Optimization and root finding
  • SciPy.Stats
Scikit learn
  • Introduction to SciKit Learn and Machine Learning
  • Sample Dataset in SciKit Learn
  • Train Test using SciKit Learn
  • Iris Classification using Decision Trees
  • Holdout Validation, K-fold cross Validation
  • Cross Validation using SciKit Learn
  • K-means Clustering in SciKit Learn
Basic Text Mining using Python
  • Introduction to the Natural Language Toolkit (NLTK)
  • Tokenization, Lower casing and removing stop words, Lemmatization, Stemming
  • ngrams, Sentence tokenization, Part of speech tagging
  • Chunking, Named Entity Recognition
  • Introduction to WordNet, and word sense disambiguation
Mini Projects
  • Word ladders game
  • Data Analysis and Prediction using the Loan Prediction Dataset

R for data science

While many programmers used Python even before they were introduced to data science, R focuses mainly on statistics, data analysis, and graphical models, and is meant mainly for data science. Just like Python, R also has very good community support. Python is a good choice for beginners, while R suits experienced data scientists. R provides some of the most comprehensive statistical analysis packages.

In this module, I will again cover both theory and hands-on work on various aspects of R. I will use RStudio for the hands-on parts. I will discuss basic programming aspects of R as well as visualization using R. Then, I will talk about how to use R for exploratory data analysis, for data wrangling, and for building models on labeled data. Overall, I will cover whatever you need to do good data science using R.

Introduction to R
  • R Vs Python
  • Basics of R
  • Data Exploration in R
  • Customizations for ggplot in R
  • Common Problems, Facets, Geoms
  • Statistical Transformation
  • Position Adjustments
  • Coordinate Systems
RStudio Basics
  • Introduction to RStudio
  • RStudio Editor
  • Keyboard shortcuts
  • RStudio Diagnostics
Data Transformation with dplyr
  • Introduction to dplyr
  • dplyr-filter
  • dplyr-arrange, select
  • dplyr-mutate
  • dplyr-summarize
  • dplyr-Grouping and Ungrouping
Exploratory Data Analysis
  • Introduction to Exploratory Data Analysis
  • Variation
  • Covariation
  • Introduction to Data Wrangling and Tibbles
  • Tibbles Vs Data Frames
Data Import with readr
  • Introduction to Readr and Read csv
  • Parsing Vector
  • Parsing a file using Readr
  • Writing to files
Tidy data with tidyr
  • Introduction to tidy data
  • Spreading and Gathering
  • Separating and Uniting
  • Missing Values
Relational Data with dplyr
  • Relational Data and Keys
  • Mutating joins in dplyr
  • Filtering joins and Set operations
Strings with stringr
  • Introduction to Strings and Combining Strings
  • Regular Expressions
Factors with forcats
  • Creating Factors using forcats
  • Visualization and reordering of categorical variables
Dates and times with lubridate
  • Creating Date/Time objects
  • Date/Time Components
  • Time Spans
Pipes with magrittr
  • Details about Pipe operator
  • Tools in magrittr
  • Functions in R
  • Conditional execution and function arguments
  • Variable Arguments in R
  • Return values in R
  • Basics of vector in R
  • Basics of Atomic vectors
  • Coercion, Test functions and Recycling rules
  • Naming and subset
  • Lists
  • Augmented vectors
Iterations with purrr
  • For loop and variations
  • Passing functions as arguments
  • Map Functions
  • Dealing with failure
  • Advanced purrr
  • Other patterns of for loops
Model basics with modelr
  • Introduction to modeling
  • Building your first simple model in R
  • Visualizing models in R
  • Modeling with categorical variables
  • Modeling with mix of categorical variables

Data Analysis using R: Why Are Low-Quality Diamonds More Expensive?

Probability and Statistics

Probability and statistics help in understanding whether data is meaningful, including inference, testing, and other methods for analyzing patterns in data and using them to predict, understand, and improve results.

We live in an uncertain and complex world, yet we continually have to make decisions in the present with uncertain future outcomes. To study, or not to study? To invest, or not to invest? To marry, or not to marry? This is what is captured mathematically using the notion of probability. Statistics, on the other hand, helps us analyze data sets, and correctly interpret results to make solid evidence-based decisions.

In this module, I will discuss some fundamental terms and concepts from probability and statistics that come up often in the literature on Machine Learning and AI. Key topics include quantifying uncertainty with probability, descriptive statistics, point and interval estimation of means, the central limit theorem, and the basics of hypothesis testing.
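
To make the central limit theorem and interval estimation concrete, here is a minimal sketch using only the Python standard library. It assumes the simplest textbook case, where the population standard deviation is known, so a normal (z) interval applies. The function names, the uniform population, and the sample sizes are illustrative choices, not part of the course material.

```python
import random
import statistics
from statistics import NormalDist


def sample_means(draw, sample_size, n_samples, rng):
    """Return the means of n_samples samples, each of size sample_size."""
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


def z_confidence_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when the population std dev is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / len(sample) ** 0.5
    centre = statistics.fmean(sample)
    return centre - margin, centre + margin


rng = random.Random(0)  # fixed seed so that the run is repeatable

# Uniform(0, 1) is far from bell-shaped, with mean 0.5 and variance 1/12.
means = sample_means(lambda r: r.random(), sample_size=30, n_samples=2000, rng=rng)

# CLT: the sample means cluster around 0.5 with spread close to sigma / sqrt(n).
sigma = (1 / 12) ** 0.5
print(statistics.fmean(means), statistics.stdev(means), sigma / 30 ** 0.5)

# A 95% interval for the mean, computed from a single sample of size 30.
sample = [rng.random() for _ in range(30)]
low, high = z_confidence_interval(sample, sigma)
print(low, high)
```

When the population standard deviation is unknown, the z value would be replaced by a quantile of the t distribution, which the standard library does not provide.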

Basics of Probability
  • Introduction to Probability
  • Events, Sample space, Simple Probability, Joint Probability
  • Mutually exclusive events, collectively exhaustive events, marginal probability
  • Addition Rule
  • Conditional Probability
  • Multiplication Rule
  • Bayes theorem
  • Counting rules (caution: advanced material)
Probability Distributions
  • What are probability distributions
  • Poisson Probability Distribution
  • Normal Probability Distribution
  • Binomial Probability Distribution
CLT and Confidence Intervals
  • Central Limit Theorem
  • CLT Example
  • CLT Using R-code
  • Confidence Intervals of Mean
  • Confidence Intervals of Mean Examples
  • Confidence interval of mean in details
  • Confidence interval for the mean with population standard deviation unknown
  • Confidence interval using Python
  • What do confidence intervals actually mean
  • Confidence intervals for pop mean with unknown pop std dev using Python
Hypothesis Testing
  • What is hypothesis testing? Null and alternative hypothesis
  • Hypothesis testing for pop mean, Type I and Type II errors
  • 1-tailed hypothesis testing (known sigma)
  • 2-tailed hypothesis testing (known sigma)
  • Hypothesis testing (unknown sigma)
  • 2-sample tests
  • Independent 2-sample t-tests
  • Paired 2-sample t-tests
  • Chi-squared tests of independence
Measures of Central Tendency and Deviation
  • Descriptive Vs Inferential statistics
  • Central Tendency (mean, median, mode)
  • Measures of dispersion (Range, IQR, std dev, variance)
  • Five Number summary and skew
  • Graphic displays of basic statistical descriptions
  • Correlation Analysis

Machine Learning

Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome.

Machine Learning is a first-class ticket to the most exciting careers in data science. As data sources proliferate along with the computing power to process them, automated predictions have become much more accurate and dependable. Machine learning brings together computer science and statistics to harness that predictive power. It’s a must-have skill for all aspiring data analysts and data scientists, or anyone else who wants to wrestle all that raw data into refined trends and predictions.

In this module, I will talk broadly about supervised as well as unsupervised learning. We will talk about multiple types of classifiers like Naïve Bayes, KNN, decision trees, SVMs, artificial neural networks, logistic regression, and ensemble learning. Further, we will also talk about linear regression analysis and sequence labeling using HMMs. As part of unsupervised learning, I will discuss clustering as well as dimensionality reduction. Finally, we will also briefly discuss semi-supervised learning, multi-task learning, architecting ML solutions, and a few ML case studies.
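
To give a flavour of supervised classification, here is a minimal sketch of a K-nearest-neighbours classifier written from scratch. It assumes Euclidean distance and a simple majority vote. The function name and the toy data are made up for illustration and are not the course's own code.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy, made-up data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))  # nearest neighbours are all "a"
print(knn_predict(points, labels, (5.1, 5.0)))  # nearest neighbours are all "b"
```

This brute-force version compares the query against every training point, which is fine for small data sets but slow for large ones.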

Introduction to Machine Learning
  • Introduction to machine learning
  • Supervised, semisupervised, unsupervised machine learning
  • Types of data sets
  • Data() in R
  • Introduction to classification
Decision Trees
  • Introduction to Decision tree
  • Hunt’s algorithm for learning a decision tree
  • Details of tree induction
  • GINI index computation
  • ID3, Entropy and information gain
  • ID3 Example
  • C4.5
  • Pruning
  • Metrics for performance Evaluation
  • Iris Decision Tree Example
K Nearest Neighbors (KNN)
  • Introduction to KNN algorithm
  • Decision boundary KNN Vs Decision tree
  • What is the best K
  • KNN Problems
  • Feature selection using KNNs
  • Wilson Editing
  • KNN Imputation
  • Speeding up KNN using KMeans
  • Coding up KNN from scratch in Python
  • KNN using sklearn
  • Digits classification using KNN in Python
Naïve Bayes
  • Examples of few text classification problems
  • Classification for text using bag of words
  • Naïve Bayes for text classification
  • Multinomial Naïve Bayes
  • Multinomial Naïve Bayes Example
  • Naïve Bayes for Hand-written digit recognition
  • Naïve Bayes for weather data
  • Numeric stability issue with Naïve bayes
  • Gaussian Naïve Bayes from scratch in Python
  • Naïve Bayes using sklearn
  • Multinomial Naïve Bayes
Support Vector Machines
  • Linear Classifiers
  • Margin of SVMs
  • SVM optimization
  • SVM for Data which is not linearly separable
  • Learning non-linear patterns
  • Kernel Trick
  • SVM Parameter Tuning
  • Handling class imbalance in SVMs
  • SVM pros and cons and summary
  • Linear SVM using Python
  • SVM with RBF kernel with Python
  • Learning SVM with noise data in Python
Ensemble Learning
  • Introduction to Ensemble learning
  • Why Ensemble learning
  • Independently constructed ensembles for classification: Majority voting
  • Independently constructed ensembles for classification: Bagging
  • Independently constructed ensembles for classification: Random forests
  • Independently constructed ensembles for classification: Error correcting output codes
  • Sequentially constructed ensembles for classification: Boosting
  • Sequentially constructed ensembles for classification: Boosting example
  • Sequentially constructed ensembles for classification: Stacking
  • Introduction to gradient boosted machines (GBM)
  • Relation between GBM and Gradient Descent
  • GBM regression with squared loss
  • Bagging in Python
  • Random forests in Python
  • Boosting in Python
  • Feature importance using ensemble classifiers
  • XGBoost in Python
  • Parameter tuning for GBMs
  • Voting classifier using skLearn
Artificial Neural Networks
  • Motivation for Artificial Neural Network
  • Mimicking a single neuron, Integration Function, Activation Function
  • Perceptron Algorithm
  • Perceptron Algorithm Example
  • Decision Boundary for a single Neuron
  • Learning Non-Linear Patterns
  • Introduction to Deep Learning
  • What can we achieve using a single hidden layer
  • MLPs with Sigmoid activation Function
  • Layers are transformation into a new space
  • Playing at the Tensorflow playground
  • Cost function, Loss function, Error Surface
  • How to learn Weights
  • Stochastic Gradient descent, Minibatch SGD, Momentum
  • Choosing a learning Rate
  • Updaters
  • Back Propagation
  • Softmax and Binary/Multi-class cross entropy loss
  • Overfitting and Regularization
  • Practical Advice on using Neural Networks
  • Autonomous Vehicles
  • Automated Feature Learning using Neural Networks
  • Deep Learning Architectures and Libraries
  • Applications of Artificial Neural Networks
  • History of Artificial Neural Networks and Revival
  • Python Code: Basic Introduction to Tensorflow: Constants, Placeholders and Variables.
  • Python Code: Learning the first Tensorflow model: Linear Regression using Tensorflow.
  • Python Code: MLP for Hand-written digit recognition with no hidden layer with 10 output neurons
  • Python Code: MLP for Hand-written digit recognition with two hidden layers
  • Python Code: Fashion Multi-class classification using MLP in Keras
Linear Regression
  • Introduction to Linear Regression
  • Understanding the real meaning of Linear Regression
  • R²: Coefficient of Determination
  • Multiple Linear Regression and Non-linear Regression
  • Assumptions for Linear Regression
  • Using Residual to Verify the Assumptions for Linear Regression
  • Deriving Linear Regression Formulas using Ordinary Least Squares Method
  • Multiple Linear Regression
  • Underfitting, Overfitting, Bias and Variance
  • Ridge Regularization
  • Lasso Regularization, Elastic Net Regularization
  • Metrics and Practical Considerations for Regression
  • Python code: Simple Linear Regression using sklearn
  • Python code: Example to code up regression using ordinary least squares method
  • Python code: Multiple Linear Regression using Gradient Descent based approach
  • Python code: Multiple Linear Regression using sklearn
  • Python code: Ridge and Lasso Regression
Logistic Regression
  • Logistic regression vs Linear Regression
  • Can we use Regression Mechanism for Classification?
  • Logistic Regression – Deriving the Formula
  • Logistic Regression for Multi-class Classification
  • Logistic Regression Decision Boundary
  • Python Code: Logistic regression on the titanic dataset- Part 1
  • Python Code: Logistic regression on the titanic dataset- Part 2
  • Python Code: Logistic regression on the titanic dataset- Part 3
  • Python Code: Logistic regression on the titanic dataset- Part 4
  • Python Code: Visualizing a logistic regression model
Feature Selection
  • What is feature selection? Why feature selection?
  • Feature selection vs feature extraction
  • Feature subset selection using Filter based methods
  • More Filter based methods for feature selection
  • Wrapper Methods and their Comparison with Filter Methods
  • Wrapper Methods
  • Embedded Methods
  • Model based machine learning with regularization
  • Regularization using L2
  • Regularization using L1
  • Python Code: Feature Extraction with Univariate Statistical Tests (Chi-squared for classification)
  • Python Code: Recursive Feature Elimination — wrapper
  • Python Code: Choosing important features (feature importance)
  • Python Code: Feature Selection using Variance Threshold
Sequence Labeling
  • Introduction to Sequence Learning
  • Sequence Labeling as Classification
  • Probabilistic Sequence Models
  • Hidden Markov Model
  • Details about HMMs
  • Dishonest Casino Example of an HMM
  • Three Problems of an HMM
  • Decoding Problem of an HMM and the Viterbi Algorithm
  • Evaluation Problem of an HMM
  • The Forward Algorithm
  • The Backward Algorithm and the Posterior Decoding
  • The Learning Problem of an HMM, The Baum Welch Algorithm
  • Conditional Random Fields (CRFs)
  • Why prefer CRFs over HMMs?
  • Python code: Creating a simple Gaussian HMM
  • Python code: Learning a Gaussian HMM
  • Python code: Sampling from HMM
  • Python Code: Use CoNLL 2002 data to build a NER system: Understand the dataset
  • Python Code: Use CoNLL 2002 data to build a NER system: Define features
  • Python Code: Use CoNLL 2002 data to build a NER system: Learn and evaluate the CRF
  • Python Code: Use CoNLL 2002 data to build a NER system: Hyper-parameter Optimization
  • Python Code: Use CoNLL 2002 data to build a NER system: Feature Importances
Clustering
  • Applications of Clustering
  • Understanding Distance
  • Basics of Clustering
  • Hierarchical (Agglomerative) clustering Part 1
  • Hierarchical (Agglomerative) clustering Part 2
  • K-means Algorithm example
  • K-means Algorithm details
  • Problems with K-means
  • Evaluation of cluster quality
  • Engineering issues with clustering
  • Soft clustering and EM algorithm example
  • Clustering summary
  • Python code: Kmeans Example
  • Python code: Kmeans on digits Example
  • Python code: Clustering for color compression
  • Mini Batch KMeans
  • Python code: Agglomerative Hierarchical Clustering
  • Ensemble Methods for Clustering: Problem Definition
  • Ensemble Methods for Clustering: Image Segmentation
  • Ensemble Methods for Clustering: Broad Approach
  • Ensemble Methods for Clustering: Finding Corresponding Clusters
  • Ensemble Methods for Clustering: Combining Corresponding Clusters
Dimensionality Reduction using PCA and LDA
  • Why PCA?
  • PCA: A Layman’s Introduction
  • Understanding Matrix Transformations and Definition of Eigen Vectors
  • How is PCA Computed?
  • PCA Examples
  • Relationship between PCA, Curve Fitting and Entropy
  • Eigenfaces in OpenCV
  • Kernel PCA
  • Python Code: Compute PCA and show components
  • Python Code: PCA as dimensionality reduction
  • Python Code: PCA for visualization: Hand-written digits
  • Python Code: Eigenfaces
  • LDA
  • PCA vs LDA
  • 2 class LDA
  • 2 class LDA: Computing within and Between Class Scatter
  • 2 class LDA Full Example
  • LDA for C classes
  • Limitations of LDA
  • Python Code: LDA on Wine dataset
  • Python Code: LDA from Scikit Learn on Iris dataset
  • Python Code: LDA on Iris dataset from scratch
Architecting ML solutions
  • Machine Learning Process
  • Qualities of a Classifier
  • Technical Practical Issues in ML
  • Non-Technical Practical Issues in ML
ML case studies
  • Machine Learning for Healthcare – Part 1
  • Machine Learning for Healthcare – Part 2
  • Machine Learning for Internet Service Providers
  • Machine Learning for People Analytics
  • Machine Learning for Retail and Telecom – Part 1
  • Machine Learning for Retail and Telecom – Part 2
  • Machine Learning for Supply Chain Management
  • Machine Learning for Agriculture
  • Machine Learning for Education
  • Machine Learning for Transportation and self-driving cars
  • Machine Learning for Connected Cars
  • Machine Learning for Legal Domain – Part 1
  • Machine Learning for Legal Domain – Part 2
  • Machine Learning for Oil Industry
  • Machine Learning for Banking Domain – Part 1
  • Machine Learning for Banking Domain – Part 2
  • Machine Learning for Insurance
  • Machine Learning for Project Management
  • Machine Learning for Fashion Industry
  • Other use-cases of Machine Learning
ML Mini Projects
  • Learning various classifiers on Iris dataset 
  • MLP for hand-written digit recognition 
  • Logistic regression on the titanic dataset 
  • Use CoNLL 2002 data to build a NER system 

Data Mining

The area of Data Mining specifically deals with topics like pattern mining, OLAP, data cubes, and outlier detection. Frequent pattern mining deals with mining frequent subsets, subsequences or subgraphs from transactional, sequence or graph datasets respectively. These are very useful for Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. OLAP enables users to quickly analyze information that has been summarized into multidimensional views and hierarchies. By summarizing predicted queries into multidimensional views prior to run time, OLAP tools provide the benefit of increased performance over traditional database access tools. Outlier analysis has numerous applications in a wide variety of domains such as the financial industry, quality control, fault diagnosis, intrusion detection, web analytics, and medical diagnosis.

In this module, I will cover basic methods for pattern mining like Apriori and FP growth. I will also cover basic concepts in OLAP and in outlier detection.
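
Before getting to Apriori and FP growth, it helps to see how the basic measures behind association rules are computed. The following is a minimal sketch that computes support, confidence, and lift by hand on a tiny, made-up set of baskets. The function names and the data are illustrative assumptions, not the course's own code.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence divided by the support of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Made-up basket data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

print(support(baskets, {"bread", "milk"}))       # 3 of the 5 baskets
print(confidence(baskets, {"bread"}, {"milk"}))  # 3 of the 4 bread baskets
print(lift(baskets, {"bread"}, {"milk"}))
```

A lift below 1 suggests that the two sides of the rule occur together less often than they would if they were independent, and a lift above 1 suggests the opposite.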

Frequent Pattern Mining and Association Rules
  • What is frequent pattern mining? What are the applications?
  • Understanding frequent patterns, association rules, support and confidence
  • Apriori Frequent Pattern Mining Method
  • Improving Apriori Frequent Pattern Mining Method: Less scans
  • FP Growth Frequent Pattern Mining Method: Building an FP tree
  • FP Growth Frequent Pattern Mining Method: Creating Conditional Pattern Bases
  • FP Growth Frequent Pattern Mining Method: Extracting Frequent Patterns
  • Comparing Apriori with FP Growth
  • ECLAT: Frequent Pattern Mining with Vertical Data Format
  • Which association rules are interesting? Lift, Chi Square
  • Which association rules are interesting? Null invariance
  • Understanding closed patterns and max patterns
  • Summary of frequent pattern mining
  • Python code: Hand-computing support and confidence
  • Python code: Association Rule Mining
  • Python code: Apriori
  • Python code: Evaluating lift for association rules
  • Python code: Problem on computing association rules with 100% confidence
  • Python code: Orange way of computing association rules and frequent patterns
Basic Concepts of Data Warehousing
  • Basic Concepts in Data Warehousing
  • OLTP vs OLAP
  • Data Warehouse Architecture
  • Data Warehouse Modeling: Data Cubes
  • Conceptual Modeling of Data Warehouses
  • Concept Hierarchies and Types of Measures
  • Data Cube Example
  • OLAP Operations
  • Data Warehouse: Design and Usage
  • Data Cube Computation and Query Processing
  • Data Cube Computation: Preliminary Concepts
  • Efficient Data Cube Computation
  • Multi-Way Array Aggregation
  • Bottom-Up Computation (BUC)
  • High-Dimensional OLAP – Part 1
  • High-Dimensional OLAP – Part 2
  • Introduction to Sampling Cube
  • Query Expansion in Sampling Cube
  • Python Code: Introduction to OLAP and OLAP Server API in Python Cubes 1.1
  • Python Code: Loading data, specifying model and building aggregates in Python Cubes 1.1
Outlier Detection
  • What are outliers? What is outlier analysis?
  • Broad overview of outlier detection Methods
  • Statistical Methods for Outlier Detection
  • Proximity based Methods for Outlier Detection: Distance based outliers
  • Proximity based Methods for Outlier Detection: Density based outliers
  • Clustering based Methods for Outlier Detection
  • Classification based Methods for Outlier Detection
  • Outlier Detection for high dimensional data
  • Python Code: Remove values > 2 std dev from mean
  • Python Code: Percentile based outliers vs median absolute deviation based outliers
  • Python Code: Example of using LOF for outlier detection
  • Python Code: Example of using Cluster-based Local Outlier Factor (CBLOF) for outlier detection
  • Python Code: Example of using one class SVM for outlier detection using pyod
  • Python Code: Example of using PCA for outlier detection
  • Python Code: One class SVM using scikit learn for outlier detection

Text Mining and Analytics

Text mining includes techniques for mining and analyzing text data to discover interesting patterns, extract useful knowledge, and support decision making, with an emphasis on statistical approaches that can be generally applied to arbitrary text data in any natural language with no or minimum human effort.

This module introduces the learner to the basics of text mining and text manipulation. The basics of text processing, including regular expressions, are covered in the R and Python modules themselves, and text classification is covered in the machine learning module. In this module, I will talk about further topics in text mining such as n-gram models, Named Entity Recognition, Natural Language Processing, Sentiment Analysis, and Summarization.
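
As a preview of n-gram models, here is a minimal sketch of a bigram (2-gram) model that predicts the most likely next word and computes an add-one (Laplace) smoothed probability. The corpus, the sentence boundary markers, and the function names are assumptions made for illustration, not the course's own code.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            follows[prev][nxt] += 1
    return follows


def predict_next(follows, word):
    """Most frequent follower of word, or None if word was never seen."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]


def laplace_prob(follows, prev, nxt, vocab_size):
    """Add-one smoothed estimate of P(nxt | prev)."""
    counts = follows.get(prev, Counter())
    return (counts[nxt] + 1) / (sum(counts.values()) + vocab_size)


# Tiny made-up corpus, for illustration only.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigrams(corpus)

# Vocabulary of possible next tokens: every word plus the end marker.
vocab = {w for s in corpus for w in s.split()} | {"</s>"}

print(predict_next(model, "the"))
print(laplace_prob(model, "the", "cat", len(vocab)))
print(laplace_prob(model, "the", "ran", len(vocab)))
```

With smoothing, a pair that never occurs in the corpus, such as "the ran", still gets a small non-zero probability instead of zero.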

N-gram Models
  • Next Word Prediction
  • Learning n-gram models
  • Text Generation using n-gram models
  • Handling low frequency n-grams
  • Google n-grams
  • Evaluation of n-gram models
  • Information Retrieval using language models
  • Query Likelihood Model
  • Smoothed Query Likelihood Model
  • Laplace Smoothing
  • Jelinek-Mercer Smoothing
  • Dirichlet Smoothing and Two-Stage Smoothing
  • Overall IR Language Model
  • Python code: Building N-Gram models
  • Python code: Next word prediction using 2-gram models (max prob)
  • Python code: Next word prediction using 2-gram models (Weighted random choice based on freq)
  • Python code: Creating Tri-grams and higher n-gram models
  • Python code: Generating text using n-gram models with n>=3
  • Python code: Laplace Smoothed n-grams
  • Python code: Computing perplexity
Named Entity Recognition
  • What is NER?
  • Why is NER challenging?
  • Applications of NER
  • Annotation and Evaluation for NER
  • Broad Approaches for NER
  • Rule based Approaches for NER: List lookup approach
  • Rule based Approaches for NER: Shallow parsing approach
  • Rule based Approaches for NER: Shallow parsing approach with context
  • Learning based Approaches for NER
  • Python Code: Read text file, extract sentences and words
  • Python Code: Part of Speech Tagging and NER
  • Python Code: Chunking/NER visualization
  • Python Code: Get complete Person Names and Location Names from any text
Natural Language Processing
  • What is NLP?
  • List of NLP Tasks
  • Why is NLP challenging?
  • Tokenization
  • Lemmatization and Stemming
  • Sentence Segmentation
  • Phrase Identification
  • Word Sense Disambiguation: Part 1
  • Word Sense Disambiguation: Part 2
  • Parsing
  • Python Code: Word Tokenization with nltk
  • Python Code: Stemming and Lemmatization with nltk
  • Python Code: Tokenization, Word Counts, Stop Word removal, and Text Normalization using Italian recipes data
  • Python Code: Text Processing with Conference Abstracts Dataset
  • Python Code: Text Classification for Reuters Dataset using Scikit-Learn
Sentiment Analysis
  • Applications of Sentiment Analysis
  • Word Classification based Approach for Sentiment Analysis
  • Naïve Bayes for Sentiment Analysis
  • Challenges in Sentiment Analysis
  • Sentiment Lexicons
  • Learning Sentiment Lexicons: “and” and “but”
  • Learning Phrasal Sentiment Lexicons: Turney’s Algorithm
  • Learning Sentiment Lexicons: WordNet approach
  • Learning Sentiment Lexicons: Domain specific
  • Python Code: Basic Sentiment Analysis using Naive Bayes and sentiment dictionaries
  • Python Code: Sentiment Analysis on Movie Reviews Dataset
  • Python Code: Sentiment analysis on Twitter Data obtained via Tweepy
Summarization
  • What is Summarization? What are its applications?
  • Genres and Types of Summaries
  • Position-based, cue phrase-based and word frequency-based approaches for extractive summarization
  • Lex Rank
  • Problems with Extractive Summarization Methods
  • Cohesion-based Methods
  • Lexical Chains Method for Extractive Summarization
  • Information Extraction based Method for Extractive Summarization
  • Interpretation Methods for Summarization
  • Multi-document Summarization
  • Evaluating Summaries – Extrinsic vs Intrinsic
  • Evaluating Summaries – ROUGE and BLEU
  • Python Code: Write a Simple Summarizer in Python from Scratch
  • Python Code: Text Summarization using Gensim (uses TextRank based summarization)
  • Python Code: Text Summarization using sumy (LSA, Word freq method, cue phrase method)
  • Python Code: LexRank using sumy
  • Python Code: Summarization using PyTeaser
  • Python Code: Text Rank using summa
Topic Modeling
  • What are topic models? Why do you need them?
  • Plate diagrams, unigram models, mixture of unigrams
  • Application of topic modeling to matrices with high dimensionality
  • Singular Value Decomposition
  • Latent Semantic Indexing/Analysis (LSI/LSA) as an application of SVD
  • Latent Semantic Indexing/Analysis (LSI/LSA): Examples, Advantages and Drawbacks
  • Probabilistic Latent Semantic Analysis (PLSA)
  • Comparison between LSI and PLSA/PLSI
  • Motivation for LDA
  • Dirichlet Distributions
  • LDA Model Details
  • Comparison between various topic models: unigrams, mixture of unigrams, PLSI, LDA
  • LDA Hyper-parameters
  • Other Topic Models
  • Python Code: LDA using gensim
  • Python Code: LDA using scikit learn
  • Mini Project: Topic Modeling with Gensim – Loading data
  • Mini Project: Topic Modeling with Gensim – Pre-processing
  • Mini Project: Topic Modeling with Gensim – Building LDA Model
  • Mini Project: Topic Modeling with Gensim – Visualization
  • Mini Project: Topic Modeling with Gensim – Mallet and Hyper-parameter Tuning
  • Mini Project: Topic Modeling with Gensim – LDA Model analysis
Word Representation Learning
  • What are word representations? Where can you use word vectors?
  • Neural Network Language Model (NNLM)
  • Word2Vec
  • CBOW and Skip-gram
  • GloVe (Global vectors for word representation)
  • Python Code: Using gensim to train your first Word2Vec model
  • Python Code: Finding similar words using gensim Word2Vec model
  • Python Code: More stuff with word2vec models: Find odd one out, compute accuracy, get the actual vector, and save model.
  • Python Code: Another gensim model example using Text8 corpus
  • Python Code: GloVe Example
  • Python Code: Using Stanford’s GloVe Embedding

Web Mining

Web Mining deals with analytics on web-related data. How do search engines return relevant results so quickly for various queries? How do these search engines work? How does Amazon recommend products to its users? How are social networks formed and how do they grow? How do people influence each other on social networks? How do search engines make money through ads? How can you use the wisdom of the crowds to generate useful and credible information?

This module takes participants through basic information retrieval concepts, web mining concepts, the architecture of search engines, and applications. It aims to provide a conceptual and practical understanding of various aspects of web mining, from the basics of web search to recent topics studied in the World Wide Web community. Topics covered include crawling, indexing, ranking, analysis of social networks, recommendation systems, and the basics of computational advertising.

Text indexing
  • Term-Document Incidence Matrices
  • Inverted Indexes
  • Inverted Index Construction
  • Sorting for Inverted Index Construction
  • Query Processing using Inverted Indexes
  • Query Optimization for Inverted Indexes
  • Phrase Query Processing using Bi-word Indexes
  • Phrase Query Processing using Positional Indexes
  • Heaps’ Law
  • Zipf’s Law
  • Motivation for Compression of Inverted Indexes
  • Dictionary Compression using Fixed-width terms or a single string
  • Dictionary Compression using blocking and front coding
  • Dictionary Compression using BTrees and Tries
  • Postings Compression by coding gaps
  • Variable Length Encoding for Postings Compression
  • Unary and Gamma codes for Postings Compression
  • What is Lucene?
  • Java code: Indexing Shakespeare’s plays.
  • Java code: Searching Shakespeare’s plays.
  • Fields in Lucene
  • Analyzers in Lucene
  • QueryParsers and Scoring in Lucene
Crawling
  • Basics of Crawling
  • What any crawler must/should do?
  • URL frontier, politeness, robots.txt
  • Processing Steps in Crawling
  • Webpage and Web Graph processing
  • Using Nutch for Crawling
Relevance ranking
  • Need for Relevance Ranking
  • Jaccard Similarity for Relevance Ranking
  • TF and IDF
  • Vector Space Model, Cosine Similarity, and Okapi BM25
  • Efficient Cosine Ranking
  • Parametric, zone and tiered indexes
  • Evaluating Search Engine Quality: Factors, NDCG
  • Evaluating Search Engine Quality: Kappa Measure, AB testing
  • Python code: TFIDF Computation from scratch
  • Python code: TFIDF computation using SKLearn
  • Python code: TFIDF computation using gensim
Link analysis algorithms
  • Link-based Ranking of Web Pages
  • Power Iterations Method
  • Random Walk Interpretation
  • Spider traps and dead-ends
  • Problems with PageRank
  • Topic Sensitive PageRank
  • HITS (Hyperlink-Induced Topic Search)
  • Web Spam
  • TrustRank to Handle Link Spam
  • Python code: PageRank and HITS using networkx
  • Python code: PageRank from Scratch
Recommendation Systems
  • Introduction to Recommender Systems
  • User-based Collaborative Filtering
  • Problems with User-based Collaborative Filtering
  • Item-based Collaborative Filtering
  • Hybrid Recommendation Methods
  • Recommendation System Case Studies: Video and Software Items
  • Tag Recommendations
  • People Recommendations within an enterprise
  • Friend Recommendation on Twitter
  • Recommendations for Groups
  • Cold Start Problem
  • Explanations for Recommendations
  • Evaluation of Recommendation Systems: Offline Evaluation
  • Evaluation of Recommendation Systems: User Studies and Online Evaluation
  • Python Code: User-user collaborative filtering and item-based CF from scratch
  • Python Code: Introduction to User Article Interaction Dataset
  • Python Code: Pre-processing dataset before building recommendation models
  • Python Code: Defining recommendation evaluation measure
  • Python Code: Popularity based recommender
  • Python Code: Content based recommender
  • Python Code: Collaborative Filtering based recommender
  • Python Code: Simple Hybrid Recommender
  • Python Code: Comparison across multiple types of recommenders
  • Python Code: Obtaining recommendations for a person
Social Network Analysis
  • Introduction to Social Network Analysis
  • Erdős-Rényi Model
  • Small World Model
  • Kleinberg’s Model
  • Power Laws
  • Preferential Attachment Model
  • Copying Model
  • Forest Fire Model
  • Model with Network Components
  • Summary of Various Network Generation Models
  • Python code: Generate Graphs, Traverse Nodes and edges, Save and Load Graphs using snap
  • Python code: Graph Manipulation using snap
  • Python code: Computing Structural Properties using Snap
  • Python code: Plot graphs and their degree distributions
Social Influence Analysis
  • What is Social Influence?
  • Does Social Influence really matter?
  • Examples of Social Influence
  • Measuring Social Influence: RCT test
  • Measuring Social Influence: Shuffle test and reverse test
  • Measuring Social Influence: Reachability and action-based methods
  • Social Theories: Structural Balance and Social Status
  • Models for Social Influence Analysis: Linear Threshold Model
  • Models for Social Influence Analysis: Independent Cascade Model
  • Influence Maximization Problem
  • Solutions for Influence Maximization Problem
  • Applications of Social Influence Analysis
  • Python Code: Independent Cascade Model on Facebook Social Circles Dataset
  • Python Code: Influence maximization heuristics on wiki-Vote data
Twitter Data Analysis
  • Twitter data characteristics and challenges
  • Burstiness to detect events from Twitter
  • Detecting Events using Graph Community Analysis
  • Detecting Events using CRFs
  • Detecting Events using Tag Correlations
  • Detecting Events by Label Propagation from News
  • Finding best phrase to describe an event
  • Finding event types
  • Finding event timespans
  • Detecting sporting events
  • Detecting local festivals
  • Detecting drug related adverse events
  • Detecting emerging controversial events
  • Python Code: Retrieving trends from Twitter
  • Python Code: Collecting search results and extracting text, screen names and hashtags from tweets
  • Python Code: Lexical analysis of tweets
  • Python Code: Analysis of retweets
Mining Structured Information from the Web
  • Introduction to Information Extraction
  • What all can be extracted?
  • Wrapper Induction: Why and what
  • Extraction rules for Wrapper Induction
  • Learning Extraction rules for Wrapper Induction
  • Wrapper Maintenance
  • Extracting Tables from the web
  • Extracting Tables from the web: Recovering relations from raw HTML tables
  • Extracting Tables from the web: Applications
  • Python Code: Get list of all Presidents of India with related information from Wikipedia page using just pandas!
  • Python Code: Understanding Basics of BeautifulSoup
  • Python Code: Scraping weather forecasts using BeautifulSoup
  • Python Code: Scraping apartment information using beautiful soup from apartments.com
  • OpenIE and Tagme
Computational Advertising
  • Introduction to Computational Advertising
  • Computational Ads Basic Concepts: Stakeholders and Revenue Models
  • Display Ads: Problems and Methods
  • Introduction to Textual Ads
  • Selection of Textual Ads: Part 1
  • Selection of Textual Ads: Part 2
  • Sponsored Search
  • Introduction to Game Theory and Nash Equilibrium
  • Game Theory for Ads
  • Vickrey Auction
  • VCG Auction
  • Auctions for Sponsored Search
  • Generalized First Price Auction
  • Generalized Second Price Auction
  • Comparison between GFP, GSP, VCG
  • Python Code: Introduction to the Ad Click Through Rate (CTR) Prediction Problem
  • Python Code: Exploratory Data Analytics for Ad CTR Prediction: Part 1
  • Python Code: Exploratory Data Analytics for Ad CTR Prediction: Part 2
  • Python Code: Developing Logistic Regression Prediction model for Ad CTR Prediction
  • Python Code: Developing Gradient Boosting Prediction Models for Ad CTR Prediction
Crowdsourcing
  • Introduction to Crowdsourcing
  • Applications of Crowdsourcing: Part 1
  • Applications of Crowdsourcing: Part 2
  • Cons of Crowdsourcing
  • Quality and Incentives Control in Crowdsourcing
  • Managing Complex tasks in Crowdsourcing
  • Security Challenges in Crowdsourcing
  • Fake reviews and social network sybils in Crowdsourcing
  • Managing Quality of Annotations
  • Weighted voting to get final labels
  • Gold testing with bad worker quality and unbalanced datasets
  • Integrating crowdsourcing with machine learning
  • Tips for Iterative HitApp Design
  • Introduction to the Amazon Mechanical Turk Platform
  • An Example of a crowdsourcing project using Mechanical Turk
Entity Resolution in the Web of Data
  • Entities and Knowledge Bases
  • The Entity Resolution Problem
  • Examples of Entity Resolution
  • Similarity Function for Entity Resolution
  • Entity Resolution Workflow
  • Standard Blocking and the Sorted Neighborhood Method
  • Canopy Clustering and Token Blocking
  • Attribute Clustering Blocking
  • ZenCrowd Blocking
  • Prefix-Infix(-Suffix) Blocking
  • Block Post-Processing
  • Meta-Blocking
  • Python Code: Link two datasets using the recordlinkage Python package
  • Python Code: Data deduplication using recordlinkage Python package
  • Python Code: Classification Algorithms for Record Linkage
  • Using the dedupe package in Python

Data Collection

Data scientist is the sexiest job of the 21st century. When performing data science, a lot of time is spent in collecting useful data and pre-processing it. If the collected data is of bad quality, it can lead to bad quality models. Hence, it is very important to understand how to collect good quality data. Also, it is important to understand various ways in which data can be collected.

In this module, I will discuss different aspects of data collection. I will begin with the decisions to make while collecting data, data collection rules and approaches, and ways of performing data collection. Data can also be collected from the web by scraping, so we will learn how to perform basic scraping. Lastly, we will briefly discuss collecting graph data as well as data collection using IoT sensors.

Basics of Data Collection
  • What is data collection?
  • Data collection decisions, rules and approaches
  • Data collection tools: Surveys and Questionnaires
  • Data collection tools: Interviews
  • Data Collection Planning
Web Scraping
  • What is web scraping?
  • Techniques for web scraping
  • Techniques to prevent web scraping
  • Scraping Amazon reviews using bash script
  • Scraping using scrapy: redditbot example
  • Scraping using scrapy: shopclues example
  • Scraping using scrapy: techcrunch example
  • Data collection APIs Examples
  • Calling APIs using Python
  • Using flask to create Python APIs
Graph data collection
  • What information to collect and boundary specification
  • Sources of Graph Data and Krackhardt CSS
  • Graph Data Repositories
IoT and Sensor Data collection
  • What is IoT?
  • RFID and other sensors
  • IoT Applications: Smart Grid and Intelligent Transportation
  • IoT Applications: ANPR and Quantified Self
  • Arduino and Proteus
  • Blinking LED with Arduino+Proteus
  • Arduino Input Output
  • Using Temperature Sensors to collect temperature data

Deep Learning

Deep learning has gained great momentum in the last few years, and research in the field is progressing remarkably fast. It is a rapidly growing area of machine learning. Machine learning has seen numerous successes, but applying learning algorithms today often means spending a long time hand-engineering the input feature representation. This is true for many problems in vision, audio, NLP, robotics, and other areas. To address this, researchers have developed deep learning algorithms that automatically learn a good representation for the input. These algorithms are today enabling many groups to achieve ground-breaking results in vision, speech, language, robotics, and other areas.

I have already discussed the basics of artificial neural networks in the machine learning module. In this module, I will focus on other popular deep learning architectures like Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) networks.

Convolutional Neural Networks (CNNs)
  • ImageNet and visual recognition problems
  • Biological inspiration for CNNs
  • Applications of CNNs
  • Why not just use MLPs for images?
  • CONV layer of a CNN
  • Details of CONV layer of a CNN
  • Stride and Pad for CONV layers of a CNN
  • Neuron view of the convolution layer
  • RELU in CNNs
  • Pooling and fully connected layers in CNNs
  • AlexNet and Hyper-parameter Optimization
  • Python Code: CNNs for Hand-written digit recognition using Tensorflow
  • Python Code: CNNs for Hand-written digit recognition using Keras
  • Python Code: Simple image classification with Inception Model
RNNs and LSTMs
  • Motivation for sequence learning models
  • Neural Language Model using MLPs
  • Introduction to Recurrent Neural Networks
  • Back-propagation for RNNs
  • RNN design options
  • CNN-RNN architecture for image captioning
  • Deep Bidirectional RNNs for opinion mining
  • Sequence Learning for machine translation using RNNs
  • Drawbacks of RNNs
  • Solutions for the exploding gradient problem
  • Memory based models: Gated Recurrent Units (GRUs)
  • Long Short-Term Memory Networks (LSTMs)
  • LSTM Variants
  • LSTM Hyperparameter tuning
  • Applications of RNNs and LSTMs: Video analytics, Hate Speech Detection, Extractive Summarization
  • Applications of RNNs and LSTMs: Translation Quality Estimation, Text Segmentation, Recommendation Systems
  • Applications of RNNs and LSTMs: Medical Social Media Analysis
  • Python Code: Classify movie reviews — binary classification using Keras.
  • Python Code: RNNs for Hand-written digit recognition using Tensorflow
  • Python Code: Bi-directional RNNs for Hand-written digit recognition using Tensorflow
  • Python Code: Next word prediction using RNNs
  • Python Code: Scalars, graphs, distributions and histograms using TensorBoard

Manish Gupta


He is an Adjunct Faculty at the International Institute of Information Technology, Hyderabad and a visiting faculty at the Indian School of Business, Hyderabad. He received his Masters in Computer Science from IIT Bombay in 2007 and his Ph.D. from the University of Illinois at Urbana-Champaign in 2013.


INR  24000

Length:  200+ Hours

Validity:  1 year (365 days)


Visualization

For any good data science story, it is very important to visualize it well. Visualizations help us understand data and insights much better.

I cover basics of visualization in R and Python in those respective modules. In this module, I will talk about innovative ways of visualizing complex and large data.

Basics of Visualization
  • Why data visualizations?
  • Guidelines for good plots: Part 1
  • Guidelines for good plots: Part 2
  • Guidelines for good plots: Part 3
  • Maintain integrity when plotting data: Avoid misleading graphs
  • Web–based visualization libraries
  • Data Analysis/Business Intelligence and Visualization Software
Plotting Large Data
  • Plotting pitfalls with large data
  • Python Code: Plotting sample of NYC taxi data using bokeh
  • Python Code: Interactive Plotting of NYC taxi data using datashader and bokeh
  • Python Code: Plotting US Census data using datashader
Visualizing Graph Data
  • Graph visualization: Why?
  • Graph visualization: Challenges
  • Graph visualization: Aesthetics
  • Graph visualization: Common Layout Algorithms
  • Graph visualization: Large graphs
  • Introduction to Gephi
Twitter Sentiment Analysis

Analyzing publicly available opinions expressed online can yield fascinating insights into popular sentiment about any product, service, or personality. The explosion of Web 2.0 has led to increased activity in podcasting, blogging, tagging, contributing to RSS, social bookmarking, and social networking. As a result, there has been a surge of interest in mining these vast sources of data for opinions.

Sentiment analysis, or opinion mining, is the mining of sentiment polarities from online social media. In this project, we describe a procedure for using and interpreting Twitter data for sentiment analysis. We perform several text pre-processing steps and then experiment with multiple classifiers. Using a dataset of 50,000 tweets and TF-IDF features, we compare the accuracy obtained by various classifiers on this task. We find that linear SVMs provide the best accuracy among the classifiers tried. A sentiment analysis classifier could be useful for many applications, like a market analysis of different features of a new product or public opinion for a new movie or speech by a political cand
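
To make the pre-processing and TF-IDF steps concrete, here is a minimal sketch in plain Python. It is not the project's actual code: the cleaning rules (lowercasing, dropping URLs and @mentions), the function names `preprocess` and `tfidf`, the particular TF-IDF variant, and the three sample tweets are all assumptions chosen for illustration. The classifier stage (for example, a linear SVM) is left out.

```python
import math
import re
from collections import Counter


def preprocess(tweet):
    """Lowercase a tweet, drop URLs and @mentions, and return word tokens."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    return re.findall(r"[a-z']+", text)        # '#' is dropped, the hashtag word stays


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample tweets, not taken from the project dataset.
tweets = [
    "Loving the new phone! http://example.com",
    "@shop the new phone is terrible",
    "Great movie, loving it #cinema",
]
tokens = [preprocess(t) for t in tweets]
features = tfidf(tokens)
```

In this toy corpus, a word that appears in only one tweet (such as "terrible") receives a higher weight than a word shared across tweets (such as "the"), which is the property that makes TF-IDF features useful as classifier input.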

Diabetic Retinopathy Detection

Diabetic retinopathy is the leading cause of blindness in the working-age population of the developed world. It is estimated to affect over 93 million people. Currently, detecting DR is a time-consuming and manual process that requires a trained clinician to examine and evaluate digital color fundus photographs of the retina.

By the time human readers submit their reviews, often a day or two later, the delayed results lead to lost follow up, miscommunication, and delayed treatment. With color fundus photography as input, the goal of this project is to build an automated detection system. You are provided with a large set of high-resolution retina images taken under a variety of imaging conditions.

A left and right field is provided for every subject. Images are labeled with a subject id as well as either left or right (e.g. 1_left.jpeg is the left eye of patient id 1). A clinician has rated the presence of diabetic retinopathy in each image on a scale of 0 to 4, according to the following scale: 0 – No DR, 1 – Mild, 2 – Moderate, 3 – Severe, 4 – Proliferative DR. Your task is to create an automated analysis system capable of assigning a score based on this scale.
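
The naming convention and grading scale described above can be sketched in a few lines of Python. This is an illustrative helper, not part of the project materials: the names `DR_SCALE`, `parse_image_name`, and `describe` are hypothetical, and the pattern assumes every file follows the `<numeric id>_<left|right>.jpeg` form of the single example given.

```python
import re

# Severity scale as stated in the project description.
DR_SCALE = {0: "No DR", 1: "Mild", 2: "Moderate", 3: "Severe", 4: "Proliferative DR"}

# Assumption: subject ids are numeric, as in the example '1_left.jpeg'.
FILENAME_RE = re.compile(r"^(\d+)_(left|right)\.jpeg$")


def parse_image_name(filename):
    """Split a name like '1_left.jpeg' into (subject_id, eye)."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected image name: {filename!r}")
    return int(match.group(1)), match.group(2)


def describe(filename, grade):
    """Return a readable summary for one image and its clinician grade."""
    if grade not in DR_SCALE:
        raise ValueError(f"grade must be 0-4, got {grade!r}")
    subject_id, eye = parse_image_name(filename)
    return f"subject {subject_id}, {eye} eye: {DR_SCALE[grade]}"


print(describe("1_left.jpeg", 0))  # prints "subject 1, left eye: No DR"
```

Grouping images by the parsed subject id is one way to keep both eyes of a patient in the same train or validation split.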

Course Features



Assignments / Quizzes

Email Assistance / Phone calls

Get Certified

Flexi Pay


Please note that the videos are not downloadable. Sharing your access or trying to sell or distribute videos is a legally punishable offence. Earlier we caught some people doing this and they were punished legally and a huge penalty was imposed on them.

Rohini kumar. M

I read a lot of DS books before joining this course, but I had difficulty understanding the intuition behind some algorithms. After watching Manish sir’s teaching of those complex algorithms, I have a clear understanding of them. Thanks to Ravi sir for bringing this course to students.


The support from the team was very quick; questions are answered within 24 hrs through mail or phone calls. This course helped me crack many interviews in the DS field. Most of the questions asked during interviews were taught in this course.


I have not seen another course that teaches both Python and R as required for DS. The mathematical explanations given for algorithms were simply awesome. Thanks to Manish sir for making concepts clear.


The course content is vast, more than enough for a fresher to start a career in the DS field. Thanks to Ravindra babu Ravula sir and Manish sir for providing such extensive content. Manish sir explained most of the complex concepts, from the history behind them to their cutting-edge use cases in industry.


Manish sir covered each and every concept from scratch. I have attended many interviews, and all the questions asked were covered in this course in the simplest way possible.

Priya Basu

The best parts of this course were the customer support and the absence of prerequisites. I feel anyone who is interested in data science can take this course. Manish sir’s way of teaching complex and advanced concepts will simply blow you away.

Anjali Thakur

I have taken many courses for DS/ML, but this course is like heaven to me. It covers end-to-end concepts in DS, from web scraping to building optimal ML models. My queries regarding concepts were solved within 24 hours. Thanks to the team for making my concepts much clearer.


I am very happy with the course content and customer support provided by MLminds. The course videos connected all the dots. Thanks to Ravindrababu ravula sir and Manish sir for providing such great lectures.

Seema Sen

After finishing the course content, I can confidently say that I can give a first-cut solution to most ML problems. The mathematical explanations given for ML algorithms were simply awesome. A huge thanks to Manish sir.


I am addicted to Manish sir’s way of teaching difficult concepts in the simplest way possible. Thanks to the team for resolving all the queries.


In my personal opinion, for those who are looking to change their careers to the data science field, Mlminds is a one-stop solution. I am completely impressed by Manish sir’s teaching.


I finished the course last month, and now I am able to crack most data science interviews very easily. Thanks to Manish sir for combining his industry experience and knowledge and delivering it.


The best part about the course is that Manish sir has explained every intricate detail of the ML algorithms with code, and the customer support provided by the team was super.


This course will definitely change the way you think about the new deep learning algorithms that are evolving nowadays. Thanks to Manish sir for explaining each and every algorithm at a very deep level.


Thanks to Manish sir for making my foundations strong in maths. Now I’m confident to learn any new ml algorithm through research papers. Thanks to the team of MLminds for patiently explaining even my tiny doubts regarding the videos.


All the old-school maths I learned during my schooling and college were just dots in my mind. Thanks a lot to Manish sir for connecting all those dots.

Everything will be taught from scratch. No prerequisite.

Data science content by Microsoft experts.

Master data science course 1000+ students registered