Friday, February 15, 2013

Metrics of rule-based classifiers in data mining


Metrics of rule-based classifiers
Coverage and Accuracy:
Given a rule R and a class-labeled data set D, let n_covers be the number of tuples covered by R, n_correct be the number of tuples correctly classified by R, and |D| be the number of tuples in D. We can define the coverage and accuracy of R as

Coverage(R) = n_covers / |D|

Accuracy(R) = n_correct / n_covers

E.g., consider rule R1, which covers 2 of the 14 tuples and correctly classifies both of them.
Therefore coverage(R1) = 2/14 = 14.28%
Accuracy(R1) = 2/2 = 100%
Thus the accuracy of a rule is the percentage of the instances covered by the rule (i.e., those satisfying its antecedent) that also satisfy its consequent.
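For illustration, here is a minimal Python sketch (not from the book) that computes coverage and accuracy for a single rule. The representation of a rule as an antecedent function plus a predicted class, and the tiny 14-tuple data set, are assumptions made purely for this example.

def coverage_and_accuracy(rule_antecedent, rule_class, dataset):
    """rule_antecedent: function tuple -> bool; dataset: list of (tuple, label) pairs."""
    covered = [(x, y) for x, y in dataset if rule_antecedent(x)]   # tuples satisfying the antecedent
    n_covers = len(covered)
    n_correct = sum(1 for _, y in covered if y == rule_class)      # covered tuples with the rule's class
    coverage = n_covers / len(dataset)
    accuracy = n_correct / n_covers if n_covers else 0.0
    return coverage, accuracy

# Hypothetical data mirroring the text: R1 covers 2 of 14 tuples, both correctly.
data = [({"a": i}, "yes" if i < 2 else "no") for i in range(14)]
r1 = lambda x: x["a"] < 2            # antecedent of R1 (illustrative)
print(coverage_and_accuracy(r1, "yes", data))   # -> (0.1428..., 1.0), i.e. 14.28% and 100%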
However, the accuracy metric has the limitation that it does not take the rule's coverage into account. There is also a potential problem with estimating posterior probabilities from training data: if the class-conditional probability for one of the attributes is zero, then the overall posterior probability for the class vanishes. This approach is brittle, especially when few training examples are available and the number of attributes is large. To overcome these limitations of accuracy, the Laplace and m-estimates are used.

Laplace and m-estimates:

The Laplace metric and the m-estimate are given by:

Laplace(R) = (n1 + 1) / (n + k)

m-estimate(R) = (n1 + k*p) / (n + k)

where
n = number of instances covered by the rule
n1 = number of positive instances covered by the rule
k = number of classes
p = prior probability of the positive class
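As a rough Python sketch of these corrections (the counts and the prior below are illustrative values, not from the book), note how a rule with 100% raw accuracy but very small coverage gets pulled away from 100% toward the prior:

def laplace(n1, n, k):
    """Laplace-corrected accuracy: (n1 + 1) / (n + k)."""
    return (n1 + 1) / (n + k)

def m_estimate(n1, n, k, p):
    """m-estimate: (n1 + k*p) / (n + k), where p is the prior of the positive class."""
    return (n1 + k * p) / (n + k)

# A rule covering 2 tuples, both positive, in a 2-class problem with prior p = 0.5:
print(laplace(n1=2, n=2, k=2))            # 0.75, instead of the raw accuracy of 1.0
print(m_estimate(n1=2, n=2, k=2, p=0.5))  # 0.75, pulled toward the prior p

Because the rule covers so few instances, both estimates report 75% rather than the raw 100%, which reflects the uncertainty of an accuracy computed from only two examples.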

Source:
Tan, Steinbach and Kumar, Introduction to Data Mining, Addison-Wesley, Chapter 5 (Classification: Alternative Techniques).
