Course Overview
|
Course Synopsis
|
This course discusses techniques for preprocessing data before mining and presents the concepts related to data warehousing, online analytical processing (OLAP), and data generalization. It presents methods for mining frequent patterns, associations, and correlations. It also presents methods for data classification and prediction, data-clustering approaches, and outlier analysis.
|
Course Learning Outcomes
|
Students will be able to:
- 1. Understand data warehouse fundamentals and data mining principles.
- 2. Design a data warehouse with dimensional modelling and apply OLAP operations.
- 3. Identify appropriate data mining algorithms to solve real-world problems.
- 4. Compare and evaluate different data mining techniques such as classification, prediction, clustering, and association rule mining.
- 5. Describe complex data types with respect to spatial and web mining.
- 6. Apply the experience gained towards research and innovation.
|
Course Calendar
|
|
Week 01
1
|
Introduction: Why Data Mining?
|
2
|
Introduction: What Is Data Mining?
|
3
|
Introduction: A Multi-Dimensional View of Data Mining
|
4
|
Introduction: What Kind of Data Can Be Mined?
|
5
|
Introduction: Are All Patterns Interesting?
|
6
|
Introduction: What Technologies Are Used?
|
7
|
Introduction: What Kind of Applications Are Targeted?
|
8
|
Introduction: Major Issues in Data Mining
|
9
|
Data Objects and Attribute Types: Types of Data Sets
|
10
|
Data Objects and Attribute Types: Important Characteristics of Structured Data
|
11
|
Data Objects and Attribute Types: Data Objects
|
12
|
Data Objects and Attribute Types: Attributes
|
13
|
Data Objects and Attribute Types: Attribute Types
|
14
|
Data Objects and Attribute Types: Discrete vs. Continuous Attributes
|
Week 02
15
|
Data Visualization: Introduction
|
16
|
Data Visualization: Pixel-Oriented Visualization Techniques
|
17
|
Basic Statistical Descriptions of Data: Introduction
|
18
|
Basic Statistical Descriptions of Data: Measuring the Central Tendency
|
19
|
Basic Statistical Descriptions of Data: Symmetric vs. Skewed Data
|
20
|
Basic Statistical Descriptions of Data: Measuring the Dispersion of Data
|
21
|
Basic Statistical Descriptions of Data: Box plot Analysis
|
22
|
Basic Statistical Descriptions of Data: Graphic Displays of Basic Statistical Descriptions using Histogram
|
23
|
Basic Statistical Descriptions of Data: Graphic Displays of Basic Statistical Descriptions using Quantile Plot
|
24
|
Basic Statistical Descriptions of Data: Graphic Displays of Basic Statistical Descriptions using Scatter plot
|
Week 03
25
|
Data Visualization: Geometric Projection Visualization Techniques
|
26
|
Data Visualization: Icon-Based Visualization Techniques
|
27
|
Data Visualization: Hierarchical Visualization Techniques
|
28
|
Data Visualization: Hierarchical Visualization examples
|
29
|
Measuring Data Similarity and Dissimilarity: Introduction
|
30
|
Measuring Data Similarity and Dissimilarity: Data Matrix and Dissimilarity Matrix
|
31
|
Measuring Data Similarity and Dissimilarity: Proximity Measure
|
32
|
Measuring Data Similarity and Dissimilarity: Standardizing Numeric Data
|
33
|
Measuring Data Similarity and Dissimilarity: Distance on Numeric Data
|
34
|
Measuring Data Similarity and Dissimilarity: Attributes of Mixed Type
|
35
|
Measuring Data Similarity and Dissimilarity: Cosine Similarity
|
36
|
Why Preprocess the Data: Introduction
|
37
|
Why Preprocess the Data: Why Is Data Dirty?
|
38
|
Why Preprocess the Data: Multi-Dimensional Measure of Data Quality
|
39
|
Why Preprocess the Data: Major Tasks in Data Preprocessing
|
40
|
Data Cleaning: Introduction
|
41
|
Data Cleaning: Missing Data
|
42
|
Data Cleaning: Noisy Data
|
43
|
Data Cleaning: How to Handle Noisy Data using Binning
|
44
|
Data Cleaning: How to Handle Noisy Data using Regression and Cluster Analysis
|
Week 04
45
|
Data integration and transformation: Introduction
|
46
|
Data integration and transformation: Handling Redundancy in Data Integration
|
47
|
Data integration and transformation: Detect Redundancy in Data Integration using Correlation Analysis
|
48
|
Data integration and transformation: Data Transformation methods
|
49
|
Data integration and transformation: Normalization Example
|
50
|
Data reduction: Introduction
|
51
|
Data reduction: Data cube aggregation
|
52
|
Data reduction: Data Compression
|
53
|
Data reduction: Dimensionality Reduction using Wavelet Transformation
|
54
|
Data reduction: Dimensionality Reduction using PCA
|
55
|
Data reduction: Numerosity Reduction
|
56
|
Data reduction: Numerosity Reduction using Regression and Log-Linear Models
|
57
|
Data reduction: Numerosity Reduction using Histogram
|
58
|
Data reduction: Numerosity Reduction using Clustering
|
59
|
Data reduction: Numerosity Reduction using Sampling
|
Week 05
60
|
What is a data warehouse?: Introduction
|
61
|
What is a data warehouse?: Subject-Oriented
|
62
|
Data warehouse architecture
|
63
|
What is a data warehouse?: Data Warehouse vs. Operational DBMS
|
64
|
Data warehouse architecture: Data Warehouse Models
|
65
|
Data Warehouse-Metadata
|
66
|
A multi-dimensional data model
|
67
|
A multi-dimensional data model: Example of Star Schema
|
68
|
A multi-dimensional data model: Example of Snowflake Schema
|
69
|
A multi-dimensional data model: Example of Fact Constellation
|
71
|
Multi-Dimensional Data Models
|
72
|
A multi-dimensional data model: Typical OLAP Operations
|
Week 06
79
|
Tokenization & Issues in Tokenization
|
89
|
Noisy channel probability
|
90
|
Bigram Based Correction
|
92
|
Text Classification Examples
|
94
|
Formalizing Text Classification
|
95
|
Bayes Classification Methods: Why?
|
96
|
Naïve Bayes Independence
|
97
|
Naïve Bayes Parameters Learning
|
Week 07
103
|
Basic Concepts of Mining Frequent patterns: Introduction
|
104
|
Basic Concepts of Mining Frequent Patterns, Association and Correlation: Why Is Freq. Pattern Mining Important?
|
105
|
Market Basket Analysis
|
106
|
Frequent Itemset Mining Methods: Apriori: Example
|
107
|
Frequent Itemset Mining Methods: Apriori: Pseudocode
|
108
|
Frequent Itemset Mining Methods: Mining Closed Frequent Patterns and Max-Patterns
|
109
|
Frequent Itemset Mining Methods: How to Count Supports of Candidates?
|
110
|
Frequent Itemset Mining Methods: Improving the Efficiency of Apriori
|
111
|
Basic Concepts of Mining Frequent Patterns, Association and Correlation: Computational Complexity of Frequent Itemset Mining
|
112
|
Frequent Itemset Mining Methods: ECLAT: Frequent Pattern Mining with Vertical Data Format
|
113
|
Which Patterns Are Interesting? Pattern Evaluation Methods: Interest Measure
|
114
|
Which Patterns Are Interesting? Pattern Evaluation Methods: Interest Measure (Part 2)
|
Week 08
115
|
Basic Concepts of Classification: Introduction
|
116
|
Basic Concepts of Classification: Supervised vs. Unsupervised Learning
|
117
|
Basic Concepts of Classification: A Two-Step Process
|
118
|
Classification Issues
|
119
|
Classification Methods
|
120
|
Decision Tree Induction
|
Week 09
121
|
Decision Trees: Introduction
|
122
|
Decision Tree Induction Algorithm
|
125
|
Attribute Selection Introduction
|
129
|
Attribute Selection Comparison
|
132
|
Introduction to RainForest
|
Week 10
133
|
Example of RainForest
|
135
|
Rule-Based Classification: Using IF-THEN Rules for Classification
|
136
|
Rule-Based Classification: Rule Extraction from a Decision Tree
|
137
|
Rule-Based Classification: Rule Induction: Sequential Covering Method
|
138
|
Model Evaluation and Selection: Introduction
|
139
|
Model Evaluation and Selection: Confusion Matrix
|
140
|
Model Evaluation and Selection: Accuracy, Error Rate, Sensitivity and Specificity using Evaluation Matrix
|
141
|
Model Evaluation and Selection: Holdout & Cross-Validation Methods
|
142
|
Model Evaluation and Selection: Bootstrap for evaluation of classifier
|
143
|
Model Evaluation and Selection: ROC Curves
|
144
|
Model Evaluation and Selection: Issues Affecting Model Selection
|
Week 11
145
|
Techniques to Improve Classification Accuracy: Ensemble Methods: Introduction
|
146
|
Techniques to Improve Classification Accuracy: Ensemble Methods: Bagging: Bootstrap Aggregation
|
147
|
Techniques to Improve Classification Accuracy: Ensemble Methods: Boosting
|
148
|
Techniques to Improve Classification Accuracy: Ensemble Methods: Random Forest
|
149
|
Classification of Imbalanced Data
|
150
|
Classification by Backpropagation: Introduction
|
151
|
Classification by Backpropagation: Neural Network as a Classifier
|
152
|
Classification by Backpropagation: A Multi-Layer Feed-Forward Neural Network
|
153
|
Classification by Backpropagation: Defining a Network Topology
|
154
|
Classification by Backpropagation: Backpropagation
|
155
|
Neural Networks Evaluation
|
156
|
Support Vector Machines: Introduction
|
157
|
Support Vector Machines: History and Applications
|
158
|
Support Vector Machines: General Philosophy
|
Week 12
159
|
Basic Concepts of Cluster Analysis: What is Cluster Analysis?
|
160
|
Basic Concepts of Cluster Analysis: Clustering for Data Understanding and Applications
|
161
|
Basic Concepts of Cluster Analysis: Clustering as a Preprocessing Tool
|
162
|
Basic Concepts of Cluster Analysis: Quality: What Is Good Clustering?
|
163
|
Basic Concepts of Cluster Analysis: Measure the Quality of Clustering
|
165
|
Basic Concepts of Cluster Analysis: Requirements and Challenges
|
167
|
Types of Data in Cluster Analysis: Interval-Valued Variables
|
168
|
Types of Data in Cluster Analysis: Binary Variables
|
169
|
Types of Data in Cluster Analysis: Ratio-Scaled Variables
|
170
|
Partitioning Methods: Basic Concept
|
171
|
Partitioning Methods: The K-Means Clustering Method
|
172
|
Partitioning Methods: Comments on the K-Means Method
|
173
|
Partitioning Methods: Variations of the K-Means Method
|
174
|
Partitioning Methods: The K-Medoids Clustering Method
|
175
|
Hierarchical Methods: Introduction
|
176
|
Hierarchical Methods: AGNES (Agglomerative Nesting)
|
177
|
Hierarchical Methods: DIANA (Divisive Analysis)
|
178
|
Density-Based Methods: Introduction
|
Week 13
180
|
Density-Based Methods: DBSCAN
|
181
|
Grid-Based Methods: Introduction
|
182
|
Grid-Based Methods: STING: A Statistical Information Grid Approach
|
183
|
Grid-Based Methods: Clustering by Wavelet Analysis
|
184
|
Model-Based Methods: Introduction
|
185
|
Model-Based Methods: EM — Expectation Maximization
|
186
|
Model-Based Methods: Conceptual Clustering
|
187
|
Model-Based Methods: COBWEB Clustering Method
|
188
|
Model-Based Methods: Neural Network Approach
|
189
|
Model-Based Methods: Self-Organizing Feature Map (SOM)
|
190
|
Outlier Analysis: What Is Outlier Discovery?
|
191
|
Outlier Analysis: Statistical Approaches for outlier discovery
|
192
|
Outlier Analysis: Distance-Based Approach for outlier discovery
|
193
|
Outlier Analysis: Deviation-Based Approach for outlier discovery
|
194
|
Clustering High-Dimensional Data: Introduction
|
195
|
Clustering High-Dimensional Data: The Curse of Dimensionality
|
196
|
Clustering High-Dimensional Data: Why Subspace Clustering?
|
197
|
Clustering High-Dimensional Data: CLIQUE
|
Week 14
201
|
Preprocessing in WEKA
|
202
|
WEKA Classifier-Example
|
203
|
Demo of WEKA Classifier
|
204
|
WEKA Classification-Results
|
207
|
Association Findings in WEKA
|
Week 15
209
|
Web Mining Introduction
|
210
|
Introduction to Text Mining
|
212
|
Information Extraction
|
216
|
Types of Cyber Community
|
219
|
Strategies for Web search
|