Friday, February 15, 2013

Metrics of rule-based classifiers in data mining


Metrics of rule-based classifiers
Coverage and Accuracy:
Given a rule R and a class-labeled data set D, let n_covers be the number of tuples covered by R, n_correct be the number of tuples correctly classified by R, and |D| be the number of tuples in D. We can define the coverage and accuracy of R as

Coverage(R) = n_covers / |D|

Accuracy(R) = n_correct / n_covers

E.g., consider rule R1, which covers 2 of the 14 tuples and correctly classifies both of them.
Therefore coverage(R1) = 2/14 = 14.28%
Accuracy(R1) = 2/2 = 100%
Thus the accuracy of a rule is the percentage of the instances covered by the rule (i.e., those satisfying its antecedent) that also satisfy its consequent.
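For illustration, here is a minimal Python sketch (not from the book) that computes coverage and accuracy for a single rule. The representation of a rule as an antecedent function plus a predicted class, and the tiny 14-tuple data set, are assumptions made purely for this example.

def coverage_and_accuracy(rule_antecedent, rule_class, dataset):
    """rule_antecedent: function tuple -> bool; dataset: list of (tuple, label) pairs."""
    covered = [(x, y) for x, y in dataset if rule_antecedent(x)]   # tuples satisfying the antecedent
    n_covers = len(covered)
    n_correct = sum(1 for _, y in covered if y == rule_class)      # covered tuples with the rule's class
    coverage = n_covers / len(dataset)
    accuracy = n_correct / n_covers if n_covers else 0.0
    return coverage, accuracy

# Hypothetical data mirroring the text: R1 covers 2 of 14 tuples, both correctly.
data = [({"a": i}, "yes" if i < 2 else "no") for i in range(14)]
r1 = lambda x: x["a"] < 2            # antecedent of R1 (illustrative)
print(coverage_and_accuracy(r1, "yes", data))   # -> (0.1428..., 1.0), i.e. 14.28% and 100%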
However, the accuracy metric has the limitation that it does not take the rule's coverage into account. There is also a potential problem with estimating posterior probabilities from training data: if the class-conditional probability for one of the attributes is zero, then the overall posterior probability for the class vanishes. This approach is brittle, especially when few training examples are available and the number of attributes is large. To overcome these limitations of accuracy, the Laplace and m-estimates are used.

Laplace and m-estimates:

The Laplace metric and the m-estimate are given by:

Laplace(R) = (n1 + 1) / (n + k)

m-estimate(R) = (n1 + k*p) / (n + k)

where
n = number of instances covered by the rule
n1 = number of positive instances covered by the rule
k = number of classes
p = prior probability of the positive class
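As a rough Python sketch of these corrections (the counts and the prior below are illustrative values, not from the book), note how a rule with 100% raw accuracy but very small coverage gets pulled away from 100% toward the prior:

def laplace(n1, n, k):
    """Laplace-corrected accuracy: (n1 + 1) / (n + k)."""
    return (n1 + 1) / (n + k)

def m_estimate(n1, n, k, p):
    """m-estimate: (n1 + k*p) / (n + k), where p is the prior of the positive class."""
    return (n1 + k * p) / (n + k)

# A rule covering 2 tuples, both positive, in a 2-class problem with prior p = 0.5:
print(laplace(n1=2, n=2, k=2))            # 0.75, instead of the raw accuracy of 1.0
print(m_estimate(n1=2, n=2, k=2, p=0.5))  # 0.75, pulled toward the prior p

Because the rule covers so few instances, both estimates report 75% rather than the raw 100%, which reflects the uncertainty of an accuracy computed from only two examples.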

Source:
Tan, Steinbach and Kumar, Introduction to Data Mining, Addison-Wesley, Chapter 5 (Classification: Alternative Techniques).
