Virtual University of Pakistan Data Warehousing Lecture-31


Supervised vs. Unsupervised Learning
Ahsan Abdullah, Assoc. Prof. & Head, Center for Agro-Informatics Research, www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: [email protected]

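Data Structures in Data Mining

Data matrix: a table or database with n records and m attributes, n >> m.

$$
\begin{bmatrix}
C_{1,1} & C_{1,2} & C_{1,3} & \cdots & C_{1,m} \\
C_{2,1} & C_{2,2} & C_{2,3} & \cdots & C_{2,m} \\
C_{3,1} & C_{3,2} & C_{3,3} & \cdots & C_{3,m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
C_{n,1} & C_{n,2} & C_{n,3} & \cdots & C_{n,m}
\end{bmatrix}
$$

Similarity matrix: a symmetric square matrix, n x n or m x m, with ones on the diagonal.

$$
\begin{bmatrix}
1 & S_{1,2} & S_{1,3} & \cdots & S_{1,n} \\
S_{2,1} & 1 & S_{2,3} & \cdots & S_{2,n} \\
S_{3,1} & S_{3,2} & 1 & \cdots & S_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
S_{n,1} & S_{n,2} & S_{n,3} & \cdots & 1
\end{bmatrix}
$$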
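The two structures on this slide can be sketched in a few lines of Python. This is only an illustration: the records are made-up values, and the similarity measure, 1 / (1 + Euclidean distance), is an assumption chosen because it gives 1 on the diagonal and a symmetric matrix. The slide does not say which measure is used.

```python
import math

def similarity(a, b):
    """Assumed measure: 1 / (1 + Euclidean distance), so identical records score 1."""
    return 1.0 / (1.0 + math.dist(a, b))

def similarity_matrix(data):
    """Build the n x n record-to-record similarity matrix from an n x m data matrix."""
    n = len(data)
    return [[similarity(data[i], data[j]) for j in range(n)] for i in range(n)]

# Hypothetical data matrix: n = 4 records, m = 2 attributes.
data_matrix = [
    [25.0, 20.0],
    [27.0, 22.0],
    [55.0, 60.0],
    [58.0, 58.0],
]

S = similarity_matrix(data_matrix)
for row in S:
    print(["%.3f" % value for value in row])
```

Transposing the data matrix before calling `similarity_matrix` would give the m x m attribute-to-attribute form instead.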

Main types of DATA MINING

Supervised: Bayesian Modeling, Decision Trees, Neural Networks, etc. Type and number of classes are known in advance.

Unsupervised: One-way Clustering, Two-way Clustering. Type and number of classes are NOT known in advance.

Clustering: Min-Max Distance

Intra-cluster distances are minimized. Inter-cluster distances are maximized.

[Figure: scatter plot of Salary against Age showing clusters and an outlier.]
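A small sketch may help make the min-max idea concrete. The (age, salary) points below are invented for illustration, and the two quantities computed, the mean pairwise distance inside a cluster and the distance between cluster centroids, are just one common way to measure "intra" and "inter" distance, not necessarily the one intended on the slide.

```python
import math
from itertools import combinations

def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def mean_intra_distance(points):
    """Average Euclidean distance over all pairs of points within one cluster."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Hypothetical (age, salary) clusters.
young = [(22, 20), (25, 24), (27, 21)]
older = [(52, 55), (55, 60), (58, 57)]

intra_young = mean_intra_distance(young)
intra_older = mean_intra_distance(older)
inter = math.dist(centroid(young), centroid(older))

print("intra (young):", round(intra_young, 2))
print("intra (older):", round(intra_older, 2))
print("inter (centroids):", round(inter, 2))
```

A good clustering keeps the intra-cluster figures small relative to the inter-cluster one. A point far from every centroid would be a candidate outlier.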

How does clustering work?

One-way clustering example

[Figure: INPUT and OUTPUT panels. Black spots are noise; white spots are missing data.]

Data Mining Agriculture data: clusters

[Figure: INPUT panel and clustered OUTPUT panel.]

Classification

[Figure: unseen data is passed to a classifier (model) to answer the question "Which class?"]

How does classification work?

[Figure: Inputs, Output, Confidence Level.]

Classification Process (1): Model Construction

Relationship between shopping time and items bought.

Training Data (observations, measurements, etc.) -> Classification Algorithms -> Classifier (Model)

NAME    Time  Items  Gender
Moin    10    2      M
Munir   16    3      M
Meher   15    1      F
Javed   5     1      M
Mahin   20    1      F
Akram   20    4      M

Classifier (Model): IF time/items >= 6 THEN gender = 'F'

Classification Process (2): Use the Model in Prediction

Testing Data -> Classifier

NAME    Time  Items  Gender
Tahir   20    1      M
Younas  11    2      M
Yasin   3     1      M

Unseen Data: (Firdous, Time 15, Items 1) -> Gender?
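The two-step process can be traced with a short Python sketch. It assumes the rule reads `time/items >= 6` and that any record not classified 'F' is classified 'M'; the function and variable names are illustrative and do not come from the lecture.

```python
def predict_gender(time, items, threshold=6):
    """Assumed rule: IF time/items >= threshold THEN 'F', otherwise 'M'."""
    return "F" if time / items >= threshold else "M"

# (name, time, items, gender) rows copied from the slides.
training = [
    ("Moin", 10, 2, "M"), ("Munir", 16, 3, "M"), ("Meher", 15, 1, "F"),
    ("Javed", 5, 1, "M"), ("Mahin", 20, 1, "F"), ("Akram", 20, 4, "M"),
]
testing = [("Tahir", 20, 1, "M"), ("Younas", 11, 2, "M"), ("Yasin", 3, 1, "M")]

def accuracy(rows):
    """Fraction of rows whose predicted gender matches the recorded gender."""
    correct = sum(predict_gender(t, i) == g for _, t, i, g in rows)
    return correct / len(rows)

train_acc = accuracy(training)    # every training row satisfies the rule
test_acc = accuracy(testing)      # Tahir (20/1 = 20) is predicted 'F' but recorded 'M'
firdous = predict_gender(15, 1)   # the unseen record from the slide

print(train_acc, test_acc, firdous)
```

Under these assumptions the rule fits all six training rows, gets two of the three testing rows right, and assigns the unseen record (Firdous, 15/1 = 15) to 'F'. The miss on Tahir shows why a model is checked on testing data before it is trusted on unseen data.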

Clustering vs. Cluster Detection

Clustering vs. Cluster Detection: Example

[Figure: panels A and B.]

The K-Means Clustering

The K-Means Clustering: Example

[Figure: four scatter-plot panels A, B, C and D, each with both axes running from 0 to 10.]
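Since the panels of this example did not survive as text, the following is a minimal sketch of the standard K-Means loop (assign each point to its nearest centroid, recompute centroids, repeat until nothing changes). The points and starting centroids are hypothetical values in the same 0 to 10 range as the figure axes; they are not the data shown on the slide.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Plain K-Means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points and k = 2 starting centroids.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, [(1, 1), (9, 8)])

print(centroids)
print(clusters)
```

The result depends on the starting centroids, so in practice the algorithm is often run several times from different starting points.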

The K-Means Clustering: Comment
