Course Overview

Course Synopsis

This course discusses techniques for preprocessing data before mining and presents the concepts related to data warehousing, online analytical processing (OLAP), and data generalization. It presents methods for mining frequent patterns, associations, and correlations. It also presents methods for data classification and prediction, data clustering approaches, and outlier analysis.

Course Learning Outcomes

Students will be able to:
 1. Understand data warehouse fundamentals and data mining principles.
 2. Design a data warehouse with dimensional modelling and apply OLAP operations.
 3. Identify appropriate data mining algorithms to solve real-world problems.
 4. Compare and evaluate different data mining techniques such as classification, prediction, clustering, and association rule mining.
 5. Describe complex data types with respect to spatial and web mining.
 6. Benefit the user experiences towards research and innovation.

Course Calendar


Week 01
1

Introduction: Why Data Mining?

2

Introduction: What Is Data Mining?

3

Introduction: A Multi-Dimensional View of Data Mining

4

Introduction: What Kind of Data Can Be Mined?

5

Introduction: Are All Patterns Interesting?

6

Introduction: What Technologies Are Used?

7

Introduction: What Kind of Applications Are Targeted?

8

Introduction: Major Issues in Data Mining

9

Data Objects and Attribute Types: Types of Data Sets

10

Data Objects and Attribute Types: Important Characteristics of Structured Data

11

Data Objects and Attribute Types: Data Objects

12

Data Objects and Attribute Types: Attributes

13

Data Objects and Attribute Types: Attribute Types

14

Data Objects and Attribute Types: Discrete vs. Continuous Attributes

Week 02
15

Data Visualization: Introduction

16

Data Visualization: Pixel-Oriented Visualization Techniques

17

Basic Statistical Descriptions of Data: Introduction

18

Basic Statistical Descriptions of Data: Measuring the Central Tendency

19

Basic Statistical Descriptions of Data: Symmetric vs. Skewed Data

20

Basic Statistical Descriptions of Data: Measuring the Dispersion of Data

21

Basic Statistical Descriptions of Data: Box plot Analysis

22

Basic Statistical Descriptions of Data: Graphic Displays of Basic Statistical Descriptions using Histogram

23

Basic Statistical Descriptions of Data: Graphic Displays of Basic Statistical Descriptions using Quantile Plot

24

Basic Statistical Descriptions of Data: Graphic Displays of Basic Statistical Descriptions using Scatter plot

Week 03
25

Data Visualization: Geometric Projection Visualization Techniques

26

Data Visualization: Icon-Based Visualization Techniques

27

Data Visualization: Hierarchical Visualization Techniques

28

Data Visualization: Hierarchical Visualization Examples

29

Measuring Data Similarity and Dissimilarity: Introduction

30

Measuring Data Similarity and Dissimilarity: Data Matrix and Dissimilarity Matrix

31

Measuring Data Similarity and Dissimilarity: Proximity Measure

32

Measuring Data Similarity and Dissimilarity: Standardizing Numeric Data

33

Measuring Data Similarity and Dissimilarity: Distance on Numeric Data

34

Measuring Data Similarity and Dissimilarity: Attributes of Mixed Type

35

Measuring Data Similarity and Dissimilarity: Cosine Similarity

36

Why Preprocess the Data: Introduction

37

Why Preprocess the Data: Why Is Data Dirty?

38

Why Preprocess the Data: Multi-Dimensional Measure of Data Quality

39

Why Preprocess the Data: Major Tasks in Data Preprocessing

40

Data Cleaning: Introduction

41

Data Cleaning: Missing Data

42

Data Cleaning: Noisy Data

43

Data Cleaning: How to Handle Noisy Data using Binning

44

Data Cleaning: How to Handle Noisy Data using Regression and Cluster Analysis

Week 04
45

Data integration and transformation: Introduction

46

Data integration and transformation: Handling Redundancy in Data Integration

47

Data integration and transformation: Detect Redundancy in Data Integration using Correlation Analysis

48

Data integration and transformation: Data Transformation methods

49

Data integration and transformation: Normalization Example

50

Data reduction: Introduction

51

Data reduction: Data cube aggregation

52

Data reduction: Data Compression

53

Data reduction: Dimensionality Reduction using Wavelet Transformation

54

Data reduction: Dimensionality Reduction using PCA

55

Data reduction: Numerosity Reduction

56

Data reduction: Numerosity Reduction using Regression and Log-Linear Models

57

Data reduction: Numerosity Reduction using Histogram

58

Data reduction: Numerosity Reduction using Clustering

59

Data reduction: Numerosity Reduction using Sampling

Week 05
60

What is a data warehouse?: Introduction

61

What is a data warehouse?: Subject-Oriented

62

Data warehouse architecture

63

What is a data warehouse?: Data Warehouse vs. Operational DBMS

64

Data warehouse architecture: Data Warehouse Models

65

Data Warehouse Metadata

66

A multidimensional data model

67

A multidimensional data model: Example of Star Schema

68

A multidimensional data model: Example of Snowflake Schema

69

A multidimensional data model: Example of Fact Constellation

71

Multidimensional Data Models

72

A multidimensional data model: Typical OLAP Operations

Week 06
79

Tokenization & Issues in Tokenization

89

Noisy channel probability

90

Bigram Based Correction

92

Text Classification Examples

94

Formalizing Text Classification

95

Bayes Classification Methods: Why?

96

Naïve Bayes Independence

97

Naïve Bayes Parameters Learning

Week 07
103

Basic Concepts of Mining Frequent Patterns: Introduction

104

Basic Concepts of Mining Frequent Patterns, Association and Correlation: Why Is Frequent Pattern Mining Important?

105

Market Basket Analysis

106

Frequent Itemset Mining Methods: Apriori: Example

107

Frequent Itemset Mining Methods: Apriori: Pseudocode

108

Frequent Itemset Mining Methods: Mining Closed Frequent Patterns and Max-Patterns

109

Frequent Itemset Mining Methods: How to Count Supports of Candidates?

110

Frequent Itemset Mining Methods: Improving the Efficiency of Apriori

111

Basic Concepts of Mining Frequent Patterns, Association and Correlation: Computational Complexity of Frequent Itemset Mining

112

Frequent Itemset Mining Methods: ECLAT: Frequent Pattern Mining with Vertical Data Format

113

Which Patterns Are Interesting? Pattern Evaluation Methods: Interest Measure

114

Which Patterns Are Interesting? Pattern Evaluation Methods: Interest Measure 2

Week 08
115

Basic Concepts of Classification: Introduction

116

Basic Concepts of Classification: Supervised vs. Unsupervised Learning

117

Basic Concepts of Classification: A Two-Step Process

118

Classification Issues

119

Classification Methods

120

Decision Tree Induction

Week 09
121

Decision Trees: Introduction

122

Decision Tree Induction Algorithm

125

Attribute Selection Introduction

129

Attribute Selection Comparison

132

Introduction to RainForest

Week 10
133

Example of RainForest

135

Rule-Based Classification: Using IF-THEN Rules for Classification

136

Rule-Based Classification: Rule Extraction from a Decision Tree

137

Rule-Based Classification: Rule Induction: Sequential Covering Method

138

Model Evaluation and Selection: Introduction

139

Model Evaluation and Selection: Confusion Matrix

140

Model Evaluation and Selection: Accuracy, Error Rate, Sensitivity and Specificity using Evaluation Matrix

141

Model Evaluation and Selection: Holdout & Cross-Validation Methods

142

Model Evaluation and Selection: Bootstrap for Evaluation of Classifiers

143

Model Evaluation and Selection: ROC Curves

144

Model Evaluation and Selection: Issues Affecting Model Selection

Week 11
145

Techniques to Improve Classification Accuracy: Ensemble Methods: Introduction

146

Techniques to Improve Classification Accuracy: Ensemble Methods: Bagging: Bootstrap Aggregation

147

Techniques to Improve Classification Accuracy: Ensemble Methods: Boosting

148

Techniques to Improve Classification Accuracy: Ensemble Methods: Random Forest

149

Classification of Imbalanced Data

150

Classification by Backpropagation: Introduction

151

Classification by Backpropagation: Neural Network as a Classifier

152

Classification by Backpropagation: A Multi-Layer Feed-Forward Neural Network

153

Classification by Backpropagation: Defining a Network Topology

154

Classification by Backpropagation: Backpropagation

155

Neural Networks Evaluation

156

Support Vector Machines: Introduction

157

Support Vector Machines: History and Applications

158

Support Vector Machines: General Philosophy

Week 12
159

Basic Concepts of Cluster Analysis: What is Cluster Analysis?

160

Basic Concepts of Cluster Analysis: Clustering for Data Understanding and Applications

161

Basic Concepts of Cluster Analysis: Clustering as a Preprocessing Tool

162

Basic Concepts of Cluster Analysis: Quality: What Is Good Clustering?

163

Basic Concepts of Cluster Analysis: Measure the Quality of Clustering

165

Basic Concepts of Cluster Analysis: Requirements and Challenges

167

Types of Data in Cluster Analysis: Interval-Valued Variables

168

Types of Data in Cluster Analysis: Binary Variables

169

Types of Data in Cluster Analysis: Ratio-Scaled Variables

170

Partitioning Methods: Basic Concept

171

Partitioning Methods: The K-Means Clustering Method

172

Partitioning Methods: Comments on the K-Means Method

173

Partitioning Methods: Variations of the K-Means Method

174

Partitioning Methods: The K-Medoids Clustering Method

175

Hierarchical Methods: Introduction

176

Hierarchical Methods: AGNES (Agglomerative Nesting)

177

Hierarchical Methods: DIANA (Divisive Analysis)

178

Density-Based Methods: Introduction

Week 13
180

Density-Based Methods: DBSCAN

181

Grid-Based Methods: Introduction

182

Grid-Based Methods: STING: A Statistical Information Grid Approach

183

Grid-Based Methods: Clustering by Wavelet Analysis

184

Model-Based Methods: Introduction

185

Model-Based Methods: EM (Expectation Maximization)

186

Model-Based Methods: Conceptual Clustering

187

Model-Based Methods: COBWEB Clustering Method

188

Model-Based Methods: Neural Network Approach

189

Model-Based Methods: Self-Organizing Feature Map (SOM)

190

Outlier Analysis: What Is Outlier Discovery?

191

Outlier Analysis: Statistical Approaches for Outlier Discovery

192

Outlier Analysis: Distance-Based Approach for Outlier Discovery

193

Outlier Analysis: Deviation-Based Approach for Outlier Discovery

194

Clustering High-Dimensional Data: Introduction

195

Clustering High-Dimensional Data: The Curse of Dimensionality

196

Clustering High-Dimensional Data: Why Subspace Clustering?

197

Clustering High-Dimensional Data: CLIQUE

Week 14
201

Preprocessing in WEKA

202

WEKA Classifier Example

203

Demo of WEKA Classifier

204

WEKA Classification Results

207

Association Findings in WEKA

Week 15
209

Web Mining Introduction

210

Introduction to Text Mining

212

Information Extraction

216

Types of Cyber Community

219

Strategies for Web Search



