Course

Data Science

Course Fee

Rs 24,000/-

Basics of Business Analytics

Business analytics is the practice of iterative, methodical exploration of an organization's data, with an emphasis on statistical analysis. Business analytics is used by companies committed to data-driven decision-making. It is about using your data to derive information, insights, knowledge, and recommendations. Businesses use business analytics to improve effectiveness and efficiency of their solutions.

In this module, I will talk about how analytics has progressed from simple descriptive analytics to predictive and prescriptive analytics. I will work through multiple examples to make these ideas concrete, and discuss various industry use cases. I will also introduce multiple components of big data analysis, including data mining, machine learning, web mining, natural language processing, social network analysis, and visualization. Lastly, I will provide some tips to help learners of data science learn the subject and apply it successfully in their projects.

  • Descriptive analytics, predictive analytics, prescriptive analytics
  • Brief Introduction about Components of Big Data Analysis
  • Introduction to Hadoop and Big Data Infrastructure
  • Introduction to Data Mining
  • Introduction to Machine Learning
  • Introduction to Natural Language Processing
  • Introduction to Information Retrieval
  • Introduction to Web Mining
  • Introduction to Social Network Analytics
  • Introduction to IoT
  • Introduction to Visualization
  • Applications of Big Data Analytics
  • Challenges in Applying Analytics to Business Problems
  • Tips on Career in Data Science

Python for Data Science

Python and R are the two most popular programming languages for data scientists as of now. Python is an interpreted high-level programming language for general-purpose programming. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. Python is open source, has strong community support, and is easy to learn. It is good for quick scripting as well as for production code, and it works well for web development too.

In this module, I will start with the basics of the Python language. Theory and hands-on exercises will be intermixed, and I will use Jupyter notebooks for the hands-on parts. I will also discuss in detail topics like control flow, input/output, data structures, functions, regular expressions and object orientation in Python. Closer to data science, I will discuss popular Python libraries like NumPy, Pandas, SciPy, Matplotlib, Scikit-Learn and NLTK.

  • Why Python
  • Python Installation
  • Python 2.7 Vs 3.x
  • Introduction to Essential Python Libraries
  • Introduction to IPython and Jupyter Notebooks
  • Python Language Basics: Indentation, Comments, Function Calls, Variables and Argument Passing
  • Python Language Basics: Types, Duck-Typing, Import
  • Python Language Basics: Binary operators, Comparisons, Mutable
  • Python Language Basics: Standard Data types in Python
  • Python Language Basics: Command Line Arguments
  • Loops: for, while
  • Conditional Execution
  • Input, output, Eval, Print
  • repr, str, zfill
  • File IO
  • JSON I/O with Python Dictionary
  • JSON I/O with Generic objects
  • JSON I/O Serialization and Deserialization
  • JSON I/O File
  • Introduction to Pickle
  • cPickle
  • Pickle and Multi-Processing
  • Tuples
  • List
  • Sorting, Searching, Slicing
  • Built-In Functions-Enumerate, Sort, Zip, Reversed
  • Dictionary
  • Sets
  • Lists, Sets and Dict Comprehensions
  • Introduction to Functions and Variable Length Argument
  • Namespace, Scope, Local Functions, Local vs Global Variables
  • Returning multiple values, Pass by Reference
  • Functions are objects
  • Recursive functions, Anonymous(Lambda) Functions
  • Currying, Generators
  • Itertools Module
  • Errors and Exception Handling
  • Python Modules and Packages
  • Object-Oriented Nature of Python
  • Class Inheritance, overriding, overloading, Data Hiding
  • Searching for patterns, matching
  • groups
  • Regular expression flags
  • split, findall, finditer
  • Repetition syntax
  • Character sets, Exclusion, Character Ranges, Escape Codes
  • Substitution
  • Greedy vs non-greedy matching
  • Backreferences and anchors
  • Capturing parts of pattern match
  • split and zero-width assertions
  • Look-arounds
  • Introduction to Numpy and ndarrays
  • Datatypes of ndarrays
  • Arithmetic operations, Indexing, Slicing
  • Boolean and fancy indexing
  • Basic ndarray operations
  • Array-oriented programming with arrays
  • Conditional, Statistical and Boolean operation
  • Sorting and set operation
  • File IO with NumPy
  • Linear Algebra for Numpy
  • Reshaping, Concatenating and Splitting Arrays
  • Broadcasting
  • Series Data Structures
  • DataFrame
  • Index objects
  • Reindexing
  • Dropping entries from an axis
  • Indexing, Selection and Filtering
  • Arithmetic and Data Alignment
  • Operations between DataFrame and Series
  • Function Application and Mapping
  • Sorting and Ranking
  • Axis indexes with duplicate labels
  • Computing Descriptive Statistics
  • pct_change(), Correlation and Covariance, Unique values, Value counts and membership
  • Introduction to Matplotlib
  • Colours, Markers and line styles
  • Customization of Matplotlib
  • Plotting with Pandas
  • Barplots, Histograms plots, Density Plots
  • Introduction to Seaborn, Style Management
  • Controlling figure aesthetics
  • Colour Palettes
  • Plotting univariate Distribution
  • Plotting bivariate Distribution
  • Visualizing pairwise relationship in pairplots
  • Plotting with Categorical Data
  • Visualizing Linear Relationships
  • Plotting on Data-aware grids
  • Other Python Visualization tools
  • Linear Algebra in SciPy
  • Sparse Matrices in SciPy
  • Constants, Cluster and FFT Packages
  • Integration using SciPy
  • Interpolation in SciPy
  • SciPy I/O, SciPy ndimage
  • Optimization and root finding
  • SciPy.Stats
  • Introduction to SciKit Learn and Machine Learning
  • Sample Dataset in SciKit Learn
  • Train Test using SciKit Learn
  • Classification IRIS using Decision Trees
  • Holdout Validation, K-fold cross Validation
  • Cross Validation using SciKit Learn
  • K-means Clustering in SciKit Learn
    • Introduction to the Natural Language Toolkit (NLTK)
    • Tokenization, Lower casing and removing stop words, Lemmatization, Stemming
    • ngrams, Sentence tokenization, Part of speech tagging
    • Chunking, Named Entity Recognition
    • Introduction to WordNet, and word sense disambiguation
    • Word ladders game
    • Data Analysis and Prediction using the Loan Prediction Dataset
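As a small taste of the regular-expression topics listed above, here is a minimal sketch of greedy versus non-greedy matching using Python's standard `re` module. The sample string is made up for illustration and is not taken from the course material.

```python
import re

# Hypothetical sample text, chosen only to show the difference.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: ".*" consumes as much as possible, so a single match runs
# from the first "<" to the last ">" in the string.
greedy = re.findall(r"<.*>", html)

# Non-greedy: ".*?" consumes as little as possible, so every tag
# is matched separately.
non_greedy = re.findall(r"<.*?>", html)

print(greedy)      # ['<b>bold</b> and <i>italic</i>']
print(non_greedy)  # ['<b>', '</b>', '<i>', '</i>']
```

The only change between the two patterns is the `?` after `*`, which switches the quantifier from greedy to non-greedy.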

    R for data science

    While Python has been used by many programmers even before they were introduced to data science, R has its main focus on statistics, data analysis, and graphical models. R is meant mainly for data science. Just like Python, R also has very good community support. Python is good for beginners, while R is good for experienced data scientists. R provides the most comprehensive statistical analysis packages.

    In this module, I will again combine theory with hands-on work on various aspects of R. I will use RStudio for the hands-on parts. I will discuss basic programming aspects of R as well as visualization using R. Then, I will talk about how to use R for exploratory data analysis, for data wrangling, and for building models on labeled data. Overall, I will cover whatever you need to do good data science using R.

    • R Vs Python
    • Basics of R
    • Data Exploration in R
    • Customizations for ggplot in R
    • Common Problems, Facets, Geoms
    • Statistical Transformation
    • Position Adjustments
    • Coordinate Systems
    • Introduction to RStudio
    • RStudio Editor
    • Keyboard shortcuts
    • RStudio Diagnostics
    • Introduction to dplyr
    • dplyr-filter
    • dplyr-arrange, select
    • dplyr-mutate
    • dplyr-summarize
    • dplyr-Grouping and Ungrouping
    • Introduction to Exploratory Data Analysis
    • Variation
    • Covariation
    • Introduction to Data Wrangling and Tibbles
    • Tibbles Vs Data Frames
    • Introduction to Readr and Read csv
    • Parsing Vector
    • Parsing a file using Readr
    • Writing to files
    • Introduction to tidy data
    • Spreading and Gathering
    • Separating and Unite
    • Missing Values
    • Relational Data in Keys
    • Mutating joins in dplyr
    • Filtering joins and Set operations
    • Introduction to Strings and Combining Strings
    • Regular Expressions
    • Creating Factors using forcats
    • Visualization and reordering of categorical variables
    • Creating Date/Time objects
    • Date/Time Components
    • Time Spans
    • Details about Pipe operator
    • Tools in magrittr
    • Functions in R
    • Conditional execution and function arguments
    • Variable Arguments in R
    • Return values in R
    • Basics of vector in R
    • Basics of Atomic vectors
    • Coercion, Test functions and Recycling rules
    • Naming and subset
    • Lists
    • Augmented vectors
    • For loop and variations
    • Passing functions as arguments
    • Map Functions
    • Dealing with failure
    • Advanced purrr
    • Other patterns of for loops
    • Introduction to modeling
    • Building your first simple model in R
    • Visualizing models in R
    • Modeling with categorical variables
    • Modeling with mix of categorical variables
    • Data Analysis using R: Why Are Low-Quality Diamonds More Expensive?

    Probability and Statistics

    Probability and statistics help in understanding whether data is meaningful, including inference, testing, and other methods for analyzing patterns in data and using them to predict, understand, and improve results.

    We live in an uncertain and complex world, yet we continually have to make decisions in the present with uncertain future outcomes. To study, or not to study? To invest, or not to invest? To marry, or not to marry? This is what is captured mathematically using the notion of probability. Statistics on the other hand, helps us analyze data sets, and correctly interpret results to make solid, evidence-based decisions.

    In this module, I will discuss some very fundamental terms and concepts related to probability and statistics that often come up in literature related to Machine Learning and AI. Key topics include quantifying uncertainty with probability, descriptive statistics, point and interval estimation of means, the central limit theorem, and the basics of hypothesis testing.

    • Introduction to Probability
    • Events, Sample space, Simple Probability, Joint Probability
    • Mutually exclusive events, collectively exhaustive events, marginal probability
    • Addition Rule
    • Conditional Probability
    • Multiplication Rule
    • Bayes theorem
    • Counting rules (caution: advanced material)
    • What are probability distributions
    • Poisson Probability Distribution
    • Normal Probability Distribution
    • Binomial Probability Distribution
    • Central Limit Theorem
    • CLT Example
    • CLT Using R-code
    • Confidence Intervals of Mean
    • Confidence Intervals of Mean Examples
    • Confidence interval of mean in details
    • Confidence interval for the mean with population standard deviation unknown
    • Confidence interval using Python
    • What do confidence intervals actually mean
    • Confidence intervals for pop mean with unknown pop std dev using Python
    • What is hypothesis testing? Null and alternative hypothesis
    • Hypothesis testing for pop mean: Type I and Type II errors
    • 1-tailed hypothesis testing (known sigma)
    • 2-tailed hypothesis testing (known sigma)
    • Hypothesis testing (unknown sigma)
    • 2-sample tests
    • Independent 2-sample t-tests
    • Paired 2-sample t-tests
    • Chi-squared tests of independence
    • Descriptive Vs Inferential statistics
    • Central Tendency (mean, median, mode)
    • Measures of dispersion (Range, IQR, std dev, variance)
    • Five Number summary and skew
    • Graphic displays of basic statistical descriptions
    • Correlation Analysis
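To give a flavour of the confidence-interval topics listed above, here is a minimal sketch of a two-sided interval for a population mean when the population standard deviation is known. It uses only Python's standard library; the function name `mean_confidence_interval`, the sample values, and the assumed `sigma` are all hypothetical and chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean


def mean_confidence_interval(sample, sigma, confidence=0.95):
    """Two-sided z-interval for the mean, assuming sigma is known."""
    n = len(sample)
    x_bar = mean(sample)
    # Critical value z such that the central `confidence` mass of a
    # standard normal lies between -z and +z.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Made-up measurements; sigma = 0.2 is an assumption for this sketch.
sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]
low, high = mean_confidence_interval(sample, sigma=0.2)
print(f"95% CI for the mean: ({low:.4f}, {high:.4f})")
```

When the population standard deviation is unknown, the normal critical value is typically replaced by one from Student's t distribution, which the standard library does not provide.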

    Machine Learning

    Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine Learning is a first-class ticket to the most exciting careers in data science. As data sources proliferate along with the computing power to process them, automated predictions have become much more accurate and dependable. Machine learning brings together computer science and statistics to harness that predictive power. It’s a must-have skill for all aspiring data analysts and data scientists, or anyone else who wants to wrestle all that raw data into refined trends and predictions.

    In this module, I will broadly talk about supervised as well as unsupervised learning. We will talk about multiple types of classifiers like Naïve Bayes, KNN, decision trees, SVMs, artificial neural networks, logistic regression, and ensemble learning. Further, we will also talk about linear regression analysis and sequence labeling using HMMs. As part of unsupervised learning, I will discuss clustering as well as dimensionality reduction. Finally, we will also briefly discuss semi-supervised learning, multi-task learning, architecting ML solutions, and a few ML case studies.

    • Introduction to machine learning
    • Supervised, semisupervised, unsupervised machine learning
    • Types of data sets
    • Data() in R
    • Introduction to classification
    • Introduction to Decision tree
    • Hunt's algorithm for learning a decision tree
    • Details of tree induction
    • GINI index computation
    • ID3, Entropy and information gain
    • ID3 Example
    • C4.5
    • Pruning
    • Metrics for performance Evaluation
    • Iris Decision Tree Example
    • Introduction to KNN algorithm
    • Decision boundary KNN Vs Decision tree
    • What is the best K
    • KNN Problems
    • Feature selection using KNNs
    • Wilson Editing
    • KNN Imputation
    • Speeding up KNN using KMeans
    • Coding up KNN from scratch in Python
    • KNN using sklearn
    • Digits classification using KNN in Python
    • Examples of few text classification problems
    • Classification for text using bag of words
    • Naïve Bayes for text classification
    • Multinomial Naïve Bayes
    • Multinomial Naïve Bayes Example
    • Naïve Bayes for Hand-written digit recognition
    • Naïve Bayes for weather data
    • Numeric stability issue with Naïve bayes
    • Gaussian Naïve Bayes from scratch in Python
    • Naïve Bayes using sklearn
    • Multinomial Naïve Bayes
    • Linear Classifiers
    • Margin of SVMs
    • SVM optimization
    • SVM for data which is not linearly separable
    • Learning non-linear patterns
    • Kernel Trick
    • SVM Parameter Tuning
    • Handling class imbalance in SVMs
    • SVMs: pros and cons, and summary
    • Linear SVM using Python
    • SVM with RBF kernel with Python
    • Learning SVM with noisy data in Python
    • Introduction to Ensemble learning
    • Why Ensemble learning
    • Independently constructed ensembles for classification: Majority voting
    • Independently constructed ensembles for classification: Bagging
    • Independently constructed ensembles for classification: Random forests
    • Independently constructed ensembles for classification: Error correcting output codes
    • Sequentially constructed ensembles for classification: Boosting
    • Sequentially constructed ensembles for classification: Boosting example
    • Sequentially constructed ensembles for classification: Stacking
    • Introduction to gradient boosted machines (GBM)
    • Relation between GBM and Gradient Descent
    • GBM regression with squared loss
    • Bagging in Python
    • Random forests in Python
    • Boosting in Python
    • Feature importance using ensemble classifiers
    • XGBoost in Python
    • Parameter tuning for GBMs
    • Voting classifier using sklearn
    • Motivation for Artificial Neural Network
    • Mimicking a single neuron, Integration Function, Activation Function
    • Perceptron Algorithm
    • Perceptron Algorithm Example
    • Decision Boundary for a single Neuron
    • Learning Non-Linear Patterns
    • Introduction to Deep Learning
    • What can we achieve using a single hidden layer
    • MLPs with Sigmoid activation Function
    • Layers are transformations into a new space
    • Playing at the Tensorflow playground
    • Cost function, Loss function, Error Surface
    • How to learn Weights
    • Stochastic Gradient descent, Minibatch SGD, Momentum
    • Choosing a learning Rate
    • Updaters
    • Back Propagation
    • Softmax and Binary/Multi-class cross entropy loss
    • Overfitting and Regularization
    • Practical Advice on using Neural Networks
    • Autonomous Vehicles
    • Automated Feature Learning using Neural Networks
    • Deep Learning Architectures and Libraries
    • Applications of Artificial Neural Networks
    • History of Artificial Neural Networks and Revival
    • Python Code: Basic Introduction to Tensorflow: Constants, Placeholders and Variables.
    • Python Code: Learning the first Tensorflow model: Linear Regression using Tensorflow.
    • Python Code: MLP for Hand-written digit recognition with no hidden layer with 10 output neurons
    • Python Code: MLP for Hand-written digit recognition with two hidden layers
    • Python Code: Fashion Multi-class classification using MLP in Keras
    • Introduction to Linear Regression
    • Understanding the real meaning of Linear Regression
    • R^2: Coefficient of Determination
    • Multiple Linear Regression and Non-linear Regression
    • Assumptions for Linear Regression
    • Using Residual to Verify the Assumptions for Linear Regression
    • Deriving Linear Regression Formulas using Ordinary Least Squares Method
    • Multiple Linear Regression
    • Underfitting, Overfitting, Bias and Variance
    • Ridge Regularization
    • Lasso Regularization, Elastic Net Regularization
    • Metrics and Practical Considerations for Regression
    • Python code: Simple Linear Regression using sklearn
    • Python code: Example to code up regression using ordinary least squares method
    • Python code: Multiple Linear Regression using Gradient Descent based approach
    • Python code: Multiple Linear Regression using sklearn
    • Python code: Ridge and Lasso Regression
    • Logistic regression vs Linear Regression
    • Can we use Regression Mechanism for Classification?
    • Logistic Regression – Deriving the Formula
    • Logistic Regression for Multi-class Classification
    • Logistic Regression Decision Boundary
    • Python Code: Logistic regression on the titanic dataset- Part 1
    • Python Code: Logistic regression on the titanic dataset- Part 2
    • Python Code: Logistic regression on the titanic dataset- Part 3
    • Python Code: Logistic regression on the titanic dataset- Part 4
    • Python Code: Visualizing a logistic regression model
    • What is feature selection? Why feature selection?
    • Feature selection vs feature extraction
    • Feature subset selection using Filter based methods
    • More Filter based methods for feature selection
    • Wrapper Methods and their Comparison with Filter Methods
    • Wrapper Methods
    • Embedded Methods
    • Model based machine learning with regularization
    • Regularization using L2
    • Regularization using L1
    • Python Code: Feature Extraction with Univariate Statistical Tests (Chi-squared for classification)
    • Python Code: Recursive Feature Elimination -- wrapper
    • Python Code: Choosing important features (feature importance)
    • Python Code: Feature Selection using Variance Threshold
    • Introduction to Sequence Learning
    • Sequence Labeling as Classification
    • Probabilistic Sequence Models
    • Hidden Markov Model
    • Details about HMMs
    • Dishonest Casino Example of an HMM
    • Three Problems of an HMM
    • Decoding Problem of an HMM and the Viterbi Algorithm
    • Evaluation Problem of an HMM
    • The Forward Algorithm
    • The Backward Algorithm and the Posterior Decoding
    • The Learning Problem of an HMM, The Baum Welch Algorithm
    • Conditional Random Fields (CRFs)
    • Why prefer CRFs over HMMs?
    • Python code: Creating a simple Gaussian HMM
    • Python code: Learning a Gaussian HMM
    • Python code: Sampling from HMM
    • Python Code: Use CoNLL 2002 data to build a NER system: Understand the dataset
    • Python Code: Use CoNLL 2002 data to build a NER system: Define features
    • Python Code: Use CoNLL 2002 data to build a NER system: Learn and evaluate the CRF
    • Python Code: Use CoNLL 2002 data to build a NER system: Hyper-parameter Optimization
    • Python Code: Use CoNLL 2002 data to build a NER system: Feature Importances
    • Applications of Clustering
    • Understanding Distance
    • Basics of Clustering
    • Hierarchical (Agglomerative) clustering Part 1
    • Hierarchical (Agglomerative) clustering Part 2
    • K-means Algorithm example
    • K-means Algorithm details
    • Problems with K-means
    • Evaluation of cluster quality
    • Engineering issues with clustering
    • Soft clustering and EM algorithm example
    • Clustering summary
    • Python code: Kmeans Example
    • Python code: Kmeans on digits Example
    • Python code: Clustering for color compression
    • Mini Batch KMeans
    • Python code: Agglomerative Hierarchical Clustering
    • Ensemble Methods for Clustering: Problem Definition
    • Ensemble Methods for Clustering: Image Segmentation
    • Ensemble Methods for Clustering: Broad Approach
    • Ensemble Methods for Clustering: Finding Corresponding Clusters
    • Ensemble Methods for Clustering: Combining Corresponding Clusters
    • Why PCA?
    • PCA: A Layman's Introduction
    • Understanding Matrix Transformations and Definition of Eigen Vectors
    • How is PCA Computed?
    • PCA Examples
    • Relationship between PCA, Curve Fitting and Entropy
    • Eigenfaces in OpenCV
    • Kernel PCA
    • Python Code: Compute PCA and show components
    • Python Code: PCA as dimensionality reduction
    • Python Code: PCA for visualization: Hand-written digits
    • Python Code: Eigenfaces
    • LDA
    • PCA vs LDA
    • 2 class LDA
    • 2 class LDA: Computing within and Between Class Scatter
    • 2 class LDA Full Example
    • LDA for C classes
    • Limitations of LDA
    • Python Code: LDA on Wine dataset
    • Python Code: LDA from Scikit Learn on Iris dataset
    • Python Code: LDA on Iris dataset from scratch
    • Machine Learning Process
    • Qualities of a Classifier
    • Technical Practical Issues in ML
    • Non-Technical Practical Issues in ML
    • Machine Learning for Healthcare – Part 1
    • Machine Learning for Healthcare – Part 2
    • Machine Learning for Internet Service Providers
    • Machine Learning for People Analytics
    • Machine Learning for Retail and Telecom – Part 1
    • Machine Learning for Retail and Telecom – Part 2
    • Machine Learning for Supply Chain Management
    • Machine Learning for Agriculture
    • Machine Learning for Education
    • Machine Learning for Transportation and self-driving cars
    • Machine Learning for Connected Cars
    • Machine Learning for Legal Domain – Part 1
    • Machine Learning for Legal Domain – Part 2
    • Machine Learning for Oil Industry
    • Machine Learning for Banking Domain – Part 1
    • Machine Learning for Banking Domain – Part 2
    • Machine Learning for Insurance
    • Machine Learning for Project Management
    • Machine Learning for Fashion Industry
    • Other use-cases of Machine Learning
    • Learning various classifiers on Iris dataset
    • MLP for hand-written digit recognition
    • Logistic regression on the titanic dataset
    • Use CoNLL 2002 data to build a NER system
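The topic list above includes coding up KNN from scratch in Python. As a rough illustration of the idea, here is a minimal sketch of a k-nearest-neighbours classifier that uses Euclidean distance and majority voting. The function name `knn_predict` and the toy data are hypothetical; this is not the course's own implementation.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Indices of the training points, ordered by distance to the query.
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: dist(train_points[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy two-cluster data, made up for this sketch.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))  # a
print(knn_predict(points, labels, (5.1, 5.0)))  # b
```

This brute-force version compares the query with every training point, which is fine for small data sets but slow for large ones.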

    Data Mining

    The area of Data Mining specifically deals with topics like pattern mining, OLAP, data cubes, and outlier detection. Frequent pattern mining deals with mining frequent subsets, subsequences or subgraphs from transactional, sequence or graph datasets respectively. These are very useful for Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. OLAP enables users to quickly analyze information that has been summarized into multidimensional views and hierarchies. By summarizing predicted queries into multidimensional views prior to run time, OLAP tools provide the benefit of increased performance over traditional database access tools. Outlier analysis has numerous applications in a wide variety of domains such as the financial industry, quality control, fault diagnosis, intrusion detection, web analytics, and medical diagnosis.

    In this module, I will cover basic methods for pattern mining like Apriori and FP growth. I will also cover basic concepts in OLAP and in outlier detection.

    • What is frequent pattern mining? What are the applications?
    • Understanding frequent patterns, association rules, support and confidence
    • Apriori Frequent Pattern Mining Method
    • Improving Apriori Frequent Pattern Mining Method: Less scans
    • FP Growth Frequent Pattern Mining Method: Building an FP tree
    • FP Growth Frequent Pattern Mining Method: Creating Conditional Pattern Bases
    • FP Growth Frequent Pattern Mining Method: Extracting Frequent Patterns
    • Comparing Apriori with FP Growth
    • ECLAT: Frequent Pattern Mining with Vertical Data Format
    • Which association rules are interesting? Lift, Chi Square
    • Which association rules are interesting? Null invariance
    • Understanding closed patterns and max patterns
    • Summary of frequent pattern mining
    • Python code: Hand-computing support and confidence
    • Python code: Association Rule Mining
    • Python code: Apriori
    • Python code: Evaluating lift for association rules
    • Python code: Problem on computing association rules with 100% confidence
    • Python code: Orange way of computing association rules and frequent patterns
    • Basic Concepts in Data Warehousing
    • OLTP vs OLAP
    • Data Warehouse Architecture
    • Data Warehouse Modeling: Data Cubes
    • Conceptual Modeling of Data Warehouses
    • Concept Hierarchies and Types of Measures
    • Data Cube Example
    • OLAP Operations
    • Data Warehouse: Design and Usage
    • Data Cube Computation and Query Processing
    • Data Cube Computation: Preliminary Concepts
    • Efficient Data Cube Computation
    • Multi-Way Array Aggregation
    • Bottom-Up Computation (BUC)
    • High-Dimensional OLAP – Part 1
    • High-Dimensional OLAP – Part 2
    • Introduction to Sampling Cube
    • Query Expansion in Sampling Cube
    • Python Code: Introduction to OLAP and OLAP Server API in Python Cubes 1.1
    • Python Code: Loading data, specifying model and building aggregates in Python Cubes 1.1
    • What are outliers? What is outlier analysis?
    • Broad overview of outlier detection Methods
    • Statistical Methods for Outlier Detection
    • Proximity based Methods for Outlier Detection: Distance based outliers
    • Proximity based Methods for Outlier Detection: Density based outliers
    • Clustering based Methods for Outlier Detection
    • Classification based Methods for Outlier Detection
    • Outlier Detection for high dimensional data
    • Python Code: Remove values > 2 std dev from mean
    • Python Code: Percentile based outliers vs median absolute deviation based outliers
    • Python Code: Example of using LOF for outlier detection
    • Python Code: Example of using Cluster-based Local Outlier Factor (CBLOF) for outlier detection
    • Python Code: Example of using one class SVM for outlier detection using pyod
    • Python Code: Example of using PCA for outlier detection
    • Python Code: One class SVM using scikit learn for outlier detection
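To make the support and confidence measures from the frequent pattern mining topics concrete, here is a minimal sketch that computes them by hand for an association rule. The transactions and the helper names `support` and `confidence` are made up for illustration.

```python
# Hypothetical basket data: each transaction is a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print("support({bread, milk}) =", support({"bread", "milk"}, transactions))
print("confidence(bread -> milk) =", confidence({"bread"}, {"milk"}, transactions))
```

Here 3 of the 5 transactions contain both bread and milk, and 4 contain bread, so the rule bread -> milk has support 3/5 and confidence 3/4, up to floating-point rounding.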

    Text Mining and Analytics

    Text mining includes techniques for mining and analyzing text data to discover interesting patterns, extract useful knowledge, and support decision making, with an emphasis on statistical approaches that can be generally applied to arbitrary text data in any natural language with no or minimum human effort.

    This module will introduce the learner to text mining and text manipulation basics. The basics of text processing, including regular expressions, are covered in the R and Python modules themselves, and text classification is covered in the machine learning module. In this module, I will cover further topics in text mining such as n-gram models, Named Entity Recognition, Natural Language Processing, Sentiment Analysis, and Summarization.

    • n-gram models
    • Named entity recognition
    • Natural Language Processing
    • Sentiment Analysis
    • Summarization
    • Topic Modeling
    • Word Representation learning
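As an illustration of the n-gram models listed above, here is a minimal sketch of a bigram model that estimates the probability of the next word given the previous word from raw counts. The function name `train_bigram_model` and the toy sentence are hypothetical, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train_bigram_model(tokens):
    """Return P(next | prev) estimated by maximum likelihood from bigram counts."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {word: c / sum(nxts.values()) for word, c in nxts.items()}
        for prev, nxts in counts.items()
    }


# Toy corpus, made up for this sketch.
tokens = "the cat sat on the mat the cat ran".split()
model = train_bigram_model(tokens)

print(model["the"])  # "cat" follows "the" twice, "mat" once
print(model["cat"])  # "sat" and "ran" each follow "cat" once
```

Unseen word pairs get no probability at all in this sketch, which is why practical n-gram models add some form of smoothing.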

    Web Mining

    Web Mining deals with analytics on web related data. How do search engines return relevant results so quickly for various queries? How do these search engines work? How does Amazon recommend products to its users? How are social networks formed and how do they grow? How do people influence each other on social networks? How do search engines make money through ads? How can you use the wisdom of the crowds to generate useful and credible information?

    The course will take the participants through the basic information retrieval concepts, web mining concepts, architecture of search engines, and applications. This module aims to provide a conceptual and practical understanding of various aspects of web mining, starting with the basics of web search and moving on to recent topics studied in the World Wide Web community. Topics covered will include: crawling, indexing, ranking, analysis of social networks, recommendation systems, and basics of computational advertising.

    • Text indexing
    • Crawling
    • Relevance ranking
    • Pagerank
    • Recommendation Systems
    • Social Network Analysis
    • Social Influence Analysis
    • Event Detection from Twitter
    • Location Prediction in Twitter
    • Computational Advertising
    • Crowdsourcing
    • Mining Structured Information from the Web
    • Entity Resolution in the Web of Data
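To illustrate the PageRank topic listed above, here is a minimal sketch of PageRank computed by repeated iteration on a tiny link graph. The function name `pagerank`, the damping factor of 0.85, the iteration count, and the four-page graph are assumptions made for this example; real search engines work at a very different scale.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute PageRank for a dict of page -> list of outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A links to B and C, B to C, C to A, D to C.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

Page C ends up with the highest score because three of the four pages link to it, while D, which nobody links to, keeps only the random-jump share.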

    Data Collection

    Data scientist is the sexiest job of the 21st century. When performing data science, a lot of time is spent in collecting useful data and pre-processing it. If the collected data is of bad quality, it can lead to bad quality models. Hence, it is very important to understand how to collect good quality data. Also, it is important to understand various ways in which data can be collected.

    In this module I will discuss different aspects of data collection. I will begin with the decisions to make while doing data collection, data collection rules and approaches, and ways of performing data collection. Further, data can be collected from the web by scraping. Hence, we will learn how to perform basic scraping. Lastly, we will briefly discuss collecting graph data as well as data collection using IoT sensors.

    • Basics of Data Collection
    • Web Scraping
    • Twitter Scraping example
    • Graph data collection
    • Sensor Data collection
    • IoT

    Deep Learning

    Deep learning has gained great momentum in the last few years, and research in the field is progressing remarkably fast. Deep Learning is a rapidly growing area of machine learning. Machine learning has seen numerous successes, but applying learning algorithms today often means spending a long time hand-engineering the input feature representation. This is true for many problems in vision, audio, NLP, robotics, and other areas. To address this, researchers have developed deep learning algorithms that automatically learn a good representation for the input. These algorithms are today enabling many groups to achieve ground-breaking results in vision, speech, language, robotics, and other areas.

    I already cover the basics of artificial neural networks in the machine learning module. In this module, I will focus on other popular deep learning architectures like Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks.

    • TensorFlow
    • CNNs
    • RNNs
    • LSTMs
    • Auto-encoders

    Visualization

    For any good data science story, it is very important to visualize it nicely. Visualizations help us understand data and insights much better.

    I cover basics of visualization in R and Python in those respective modules. In this module, I will talk about innovative ways of visualizing complex and large data.

    • Complex Visualizations
    • Visualizing Large Data

    Target Audience


    The course content and teaching methodology are built to cater to the needs of students at various levels of expertise and with varied background skills and competencies.

    Learn to Excel. You have to put in time and effort to learn from this course. We teach from the basics, so all you need is a very basic knowledge of programming and a strong determination to LEARN.

    Here is a list of aspirants who would benefit from our course:
    • Undergraduate (BS/BTech/BE) students in Engineering, Technology and Science.
    • Post Graduate (MS/MTech/ME/MCA) students in Engineering, Technology and Science.
    • Working Professionals: Software Engineers, Business Analysts, Product & Program Managers, Enthusiasts involved in building ML Products & Services.

    COURSE FEATURES

  • Duration 200+ Hrs
  • Quizzes Yes
  • Assignments Yes
  • Projects Yes
    Disclaimer


    Please note that the videos are not downloadable. Sharing your access, or trying to sell or distribute the videos, is a legally punishable offence. People caught doing this earlier faced legal action, and a huge penalty was imposed on them.

