Metrics of a rule-based classifier
Coverage and Accuracy:
Given a tuple, X, from a class-labeled data set, D, and a rule R, let n_covers be the number of tuples covered by R; n_correct be the number of tuples correctly classified by R; and |D| be the number of tuples in D. We can define the coverage and accuracy of R as
Coverage(R) = n_covers / |D|
Accuracy(R) = n_correct / n_covers
E.g., consider a rule R1 which covers 2 of the 14 tuples in D and correctly classifies both of them.
Therefore Coverage(R1) = 2/14 ≈ 14.29%
Accuracy(R1) = 2/2 = 100%
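The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name rule_metrics and its parameter names are made up for this post, and returning an accuracy of 0.0 for a rule that covers no tuples is an assumption, since the formula is undefined in that case.

```python
def rule_metrics(n_covers, n_correct, n_total):
    """Return (coverage, accuracy) of a rule from raw counts.

    n_covers  -- tuples whose attributes satisfy the rule's antecedent
    n_correct -- covered tuples whose class matches the rule's consequent
    n_total   -- |D|, the number of tuples in the data set
    """
    if n_total <= 0:
        raise ValueError("the data set must contain at least one tuple")
    if not 0 <= n_correct <= n_covers <= n_total:
        raise ValueError("expected 0 <= n_correct <= n_covers <= n_total")

    coverage = n_covers / n_total
    # Assumption: a rule that covers nothing gets accuracy 0.0,
    # because n_correct / n_covers is undefined when n_covers == 0.
    accuracy = n_correct / n_covers if n_covers else 0.0
    return coverage, accuracy


# Rule R1 from the example: covers 2 of 14 tuples, both classified correctly.
cov, acc = rule_metrics(n_covers=2, n_correct=2, n_total=14)
print(f"Coverage(R1) = {cov:.2%}, Accuracy(R1) = {acc:.2%}")
```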
Thus the accuracy of a rule is the percentage of the instances satisfying its antecedent that also satisfy its consequent.
However, the accuracy metric has a limitation: it does not take the rule's coverage into account, so a rule that covers only a handful of tuples can score 100% by chance. A related problem arises when estimating posterior probabilities from training data. If the class-conditional probability for one of the attributes is zero, then the overall posterior probability for the class vanishes. Estimating probabilities from raw fractions is therefore brittle, especially when there are few training examples available and the number of attributes is large. To overcome this limitation of accuracy, the Laplace and m-estimate measures are used.
Laplace and m-estimates:
The Laplace and m-estimate measures are given by:
Laplace = (n1 + 1) / (n + k)
M-estimate = (n1 + k*p) / (n + k)
where
n = number of instances covered by the rule
n1 = number of positive instances covered by the rule
k = number of classes
p = prior probability of the positive class
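A minimal Python sketch of the two formulas follows. The function names laplace and m_estimate are chosen for this post only, and the counts in the usage lines are made-up values for illustration, not results from the book.

```python
def laplace(n1, n, k):
    """Laplace measure: (n1 + 1) / (n + k)."""
    return (n1 + 1) / (n + k)


def m_estimate(n1, n, k, p):
    """M-estimate: (n1 + k*p) / (n + k), with p the prior of the positive class."""
    return (n1 + k * p) / (n + k)


# Illustrative counts (assumed): a two-class problem, k = 2.
small_rule = laplace(n1=2, n=2, k=2)    # covers 2 tuples, both positive
large_rule = laplace(n1=45, n=50, k=2)  # covers 50 tuples, 45 positive

print("small rule: accuracy 1.00, Laplace", round(small_rule, 3))
print("large rule: accuracy 0.90, Laplace", round(large_rule, 3))

# With a uniform prior p = 1/k, the m-estimate reduces to the Laplace measure.
print(m_estimate(n1=2, n=2, k=2, p=0.5) == small_rule)
```

Although the small rule has the higher raw accuracy, its Laplace score is lower than that of the large rule, because both measures pull the estimate towards the prior when n is small. For a rule that covers no instances (n = 0), Laplace gives 1/k and the m-estimate gives p, instead of an undefined 0/0.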
Source:
Book: Introduction to Data Mining, Addison-Wesley, Chapter 5, "Classification: Alternative Techniques"