Friday, February 15, 2013

Metrics of rule-based classifiers in data mining


Metrics of rule-based classifiers
Coverage and Accuracy:
Given a tuple, X, from a class-labeled data set, D, let n_covers be the number of tuples covered by the rule R, n_correct be the number of tuples correctly classified by R, and |D| be the number of tuples in D. We can define the coverage and accuracy of R as

Coverage(R) = n_covers / |D|

Accuracy(R) = n_correct / n_covers

For example, consider a rule R1 that covers 2 of the 14 tuples and correctly classifies both of them.
Therefore Coverage(R1) = 2/14 = 14.29%
Accuracy(R1) = 2/2 = 100%
Thus the accuracy of a rule is the percentage of the instances it covers (those that satisfy the antecedent) that also satisfy the consequent.
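The arithmetic for R1 can be checked with a few lines of MATLAB. This is only a sketch of the two definitions; the variable names (D_size, n_covers, n_correct) are chosen here for illustration.

```matlab
% Rule R1 from the example: covers 2 of 14 tuples, both classified correctly
D_size    = 14;   % |D|, number of tuples in the data set
n_covers  = 2;    % tuples covered by R1
n_correct = 2;    % covered tuples that R1 classifies correctly

coverage = n_covers / D_size;      % fraction of the data set the rule applies to
accuracy = n_correct / n_covers;   % fraction of covered tuples classified correctly

fprintf('coverage = %.2f%%, accuracy = %.2f%%\n', 100*coverage, 100*accuracy);
```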
However, the accuracy metric has the limitation that it does not take the rule's coverage into account: a rule that covers only a handful of tuples can reach 100% accuracy by chance. A related problem arises when estimating posterior probabilities from training data. If the class-conditional probability for one of the attributes is zero, then the overall posterior probability for the class vanishes. This approach is brittle, especially when few training examples are available and the number of attributes is large. To overcome this limitation of accuracy, the Laplace and m-estimate measures are used.

Laplace and m-estimates:

The Laplace measure and the m-estimate are given by:
Laplace = (n1 + 1) / (n + k)
m-estimate = (n1 + k*p) / (n + k)
where
n = number of instances covered by the rule
n1 = number of positive instances covered by the rule
k = number of classes
p = prior probability of the positive class
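
As a rough illustration, the two measures can be evaluated for the rule R1 from the coverage example (n = 2, n1 = 2). The number of classes k = 2 and the prior p = 9/14 are assumptions made only for this sketch; they are not given in the text.

```matlab
n  = 2;      % instances covered by R1 (from the example above)
n1 = 2;      % positive instances covered by R1
k  = 2;      % ASSUMPTION: a two-class problem
p  = 9/14;   % ASSUMPTION: prior probability of the positive class

accuracy   = n1 / n;                  % plain accuracy ignores how small n is
laplace    = (n1 + 1) / (n + k);      % Laplace measure
m_estimate = (n1 + k*p) / (n + k);    % m-estimate

fprintf('accuracy = %.4f, Laplace = %.4f, m-estimate = %.4f\n', ...
        accuracy, laplace, m_estimate);
```

Both measures come out below the raw accuracy of 1 because R1 covers only two instances; as n grows, they approach n1/n. When p = 1/k, the m-estimate reduces to the Laplace measure.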

Source:
Book: Introduction to Data Mining, Addison-Wesley, Chapter 5, "Classification: Alternative Techniques"

Tuesday, February 12, 2013

Final year B.E. proposal on "Data warehouse based intelligent banking analysis system"




Data warehousing and data mining have changed the decision-making process in the modern business environment by equipping companies to reach their customers with the right product and the right offer at the right time. This project concentrates on analyzing customer churn behavior, fraud detection and customer relationship management (CRM) in a banking system. It will be implemented entirely with warehouse-based business intelligence tools, with some data mining algorithms applied during the reporting phase for churn prediction and anomaly detection.

Customers move from one company to another quite often, and this is happening at an alarming rate. Churn is becoming the most important issue in customer relationship management, so customer retention is the need of the hour. Our project will implement different visualization methods and techniques through the Oracle Business Intelligence tool to analyze churn behavior. For this we will implement classification and regression tree (CART) analysis. Fraud patterns will be analyzed by location and by time. Rule-based methods such as BAYES, FOIL or RIPPER, Support Vector Machines (SVM), or unsupervised neural network (NN) algorithms such as Kohonen's self-organizing map (SOM) will be used together with meta-learning algorithms to improve prediction in fraud detection.

Introduction:


In Nepal, the number of banking customers is increasing day by day. As the number of customers increases, the number of transactions also increases, and more transaction and customer data are added to the bank's database. This makes it difficult to manage the transactions and to keep a sound relationship with each customer. Customer dissatisfaction leads to continuous loss and even the collapse of the organization. Managers and executives must therefore be able to predict the churn behavior of their customers and maintain a close relationship with all of them. Doing this with a traditional approach, without new tools and technology, is very costly. The system that we are going to develop will visualize and report on churn behavior, fraud detection and customer relationship management (CRM) in a banking system.

The customer is the heart and soul of any organization. The era of globalization and cut-throat competition has changed the basic concept of marketing. Marketing is no longer confined to selling services to customers; the objective is to reach the hearts of the customers so that they feel they belong to the organization and remain loyal. However, ever-growing databases make it difficult to analyze the data and to forecast future trends. The solution lies in the use of data mining tools for predicting the churn behavior of customers.

Churn in banking refers to a customer ceasing his or her relationship with a bank. Reducing customer churn is a key business goal of every online business. The ability to predict that a particular customer is at high risk of churning, while there is still time to do something about it, represents a huge additional potential revenue source for a bank. Besides the direct loss of revenue that results from a customer abandoning the business, the cost of initially acquiring that customer may not yet have been covered by the customer's spending to date. Furthermore, it is always more difficult and expensive to acquire a new customer than to retain a current paying customer. In order to succeed at retaining customers who would otherwise abandon the business, our system will enable the managers and executives of the bank to
 (a) predict in advance which customers are going to churn and
 (b) know which marketing actions will have the greatest retention impact on each particular customer.

Our project throws light on the underlying technology and the prospective applications of data mining in predicting the churn behavior of customers, paving the way for better customer relationship management in today's competitive banking environment.

Another aspect of the project is fraud detection. Fraud means obtaining services, goods or money by unethical means, and it is a growing problem all over the world. As a recent example, Himalayan Bank in Nepal suffered a kind of debit card fraud and stopped all debit card transactions across its branches in Nepal. Fraud involves criminal intent that is, for the most part, difficult to identify. Credit cards are one of the most common targets of fraud. Other targets are personal loans, home loans and retail. Our system will identify the different types of fraud and notify the concerned authority. We will implement different types of data mining algorithms in order to catch anomalies in credit and debit card use.


Scope of a data warehouse in the banking system in the context of Nepal
  1. Helps in carving out the future direction of the bank and identifying actionable points for the future.
  2. Used in historical analysis, performance analytics, performance budgeting, product innovation, employee performance, customer relationship management and many others.
  3. Detection of banking fraud such as identity theft and money laundering.
  4. Improvement in risk management for investments, loans and bankruptcies.
  5. Increased efficiency of ATM services.






Monday, February 11, 2013

Minor project report for an Android application project

Project reports are an important part of any project. Every project starts with a proposal report, and after the work is done, the project is not complete until a proper report has been prepared explaining what was done.

During our academic course, as a part of our academic work, we three partners did a major project, an Android application named "Restaurant Guide". We prepared the report after completing it. The content of our project report is given below.


Our project 'Restaurant Guide' is a GPS-based Android application that helps the user locate hotels near the current location and order the items available in those hotels.

The application tracks the current location of the user through the Google Maps API and lists the nearest hotels from that location in ascending order of distance. By default, the app shows the hotels within 3 km of the current location. The user can view the menu and order the available items from the hotel by email.

The user can also view the hotel's location on the map and find the distance and route to the selected hotel. Apart from this, the application can send SMS messages to friends or other people to arrange appointments at the hotels.

Overall, this project provides an easier tool for searching for nearby hotels and saves time by being simple and effective.



TABLE OF CONTENTS

COPYRIGHT
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES

1 INTRODUCTION
    1.1 Literature Review
    1.2 Motivation
    1.3 Birth of Android
    1.4 Features
        1.4.1 Application Framework
        1.4.2 Dalvik Virtual Machine
        1.4.3 Integrated Browser
        1.4.4 Optimized Graphics
        1.4.5 SQLite
        1.4.6 Java Virtual Machine
        1.4.7 Development Environment
    1.5 Architecture
    1.6 Objectives
    1.7 Outline of Report

2 IMPLEMENTATION
    2.1 Requirement Analysis
    2.2 Methodology and Tool
    2.3 Background Theories
        2.3.1 Maps, Geocoding and Location Based Services
        2.3.2 Data Storage, Retrieval and Sharing
        2.3.3 Haversine Formula

3 RESULT AND ANALYSIS
4 LIMITATIONS AND CONCLUSION
5 FURTHER ENHANCEMENT
REFERENCES


Android is an emerging platform in today's world and accounts for a large share of mobile use. An Android phone is not used just for making calls; many applications can be installed on it for the user's benefit.
In view of the growth and use of Android, we decided to make an Android app, "Restaurant Guide".
A project like this one, for searching for nearby hotels, has not yet been released in Nepal. We did not find any similar project directly related to it. However, applications that use GPS tracking are available in Nepal too. It was recently disclosed that N-connect is to be released shortly. N-connect has a GPS tracking feature: the user selects a destination and it provides the path. That application is not specifically targeted at hotels and only provides the path, whereas here we need to find hotels near our current location and order items from them. So our application is helpful for those who are particularly interested in hotels.


Saturday, February 9, 2013

Signal addition and multiplication of two signals in MATLAB


Signal addition and signal multiplication are two major operations needed for the manipulation of signals in DSAP.
For both addition and multiplication, we first need to arrange the two signals on a common index range that covers the range of both signals.

SIGNAL ADDITION:
It is implemented in MATLAB by the arithmetic operator "+". However, the two signals x1(n) and x2(n) must have the same length. If the lengths of the two signals differ, or if their sample positions differ, a slight modification of the code is needed: both signals are first extended with zeros to a common index range, as the function below does.


function [y,n] = sigadd(x1,n1,x2,n2)
% implements y(n) = x1(n)+x2(n)
% -----------------------------
% [y,n] = sigadd(x1,n1,x2,n2)
% y = sum sequence over n, which includes n1 and n2
% x1 = first sequence over n1
% x2 = second sequence over n2 (n2 can be different from n1)
%
n = min(min(n1),min(n2)):max(max(n1),max(n2)); % duration of y(n)
y1 = zeros(1,length(n)); y2 = y1; % initialization
y1(find((n>=min(n1))&(n<=max(n1))==1))=x1; % x1 with duration of y
y2(find((n>=min(n2))&(n<=max(n2))==1))=x2; % x2 with duration of y
y = y1+y2; % sequence addition


SIGNAL MULTIPLICATION

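It is implemented in MATLAB by the arithmetic operator ".*".


function [y,n] = sigmulti(x1,n1,x2,n2)
% implements y(n) = x1(n)*x2(n)
n = min(min(n1),min(n2)):max(max(n1),max(n2)); % duration of y(n)
y1 = zeros(1,length(n)); % initialization
y2 = y1;
y1(find((n>=min(n1))&(n<=max(n1))==1))=x1; % x1 with duration of y
y2(find((n>=min(n2))&(n<=max(n2))==1))=x2; % x2 with duration of y
y = y1.*y2; % sequence multiplication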


Finally, in the MATLAB command window, enter the two signals and their index vectors and call the respective function to get the output.

The logical operator "&", the relational operators "<=" and "==", and the find function are used to place x1(n) and x2(n) on a common index range so that they have equal length. The functions above are therefore general: they give the correct result for signals of equal as well as unequal length.
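
The alignment step can be tried on its own with two short signals. The sketch below is self-contained and uses plain logical indexing in place of find; the sample values and index ranges are made up for illustration.

```matlab
% ASSUMED example signals: x1 lives on n = 0..2, x2 on n = -1..2
x1 = [1 2 3];     n1 = 0:2;
x2 = [4 5 6 7];   n2 = -1:2;

% common index range that contains both n1 and n2
n = min(min(n1),min(n2)):max(max(n1),max(n2));

% zero-padded copies of x1 and x2 on the common range
y1 = zeros(1,length(n));  y2 = y1;
y1((n >= min(n1)) & (n <= max(n1))) = x1;
y2((n >= min(n2)) & (n <= max(n2))) = x2;

y_add = y1 + y2;    % sample-by-sample sum
y_mul = y1 .* y2;   % sample-by-sample product

disp(n); disp(y_add); disp(y_mul);
```

Here x1 has no sample at n = -1, so it contributes a zero there: the sum keeps the value of x2 at that position, while the product is zero.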





Friday, February 8, 2013

Convolution of two signals without using the built-in conv(x,h) function in MATLAB

Digital signal processing is very commonly done in MATLAB. MATLAB has several built-in functions and variables that make operations on signals easy and effective. For the DSAP problem of convolution, we can convolve two signals, the input signal and the impulse response, with the built-in function conv(x,h),
where x = input signal
           h = impulse response

However, the main mathematics involved is hidden inside the built-in function. To understand properly what happens inside it, we can write our own code instead of calling it. Doing the manipulation ourselves gives a better understanding of how the convolution of signals works.

Convolution:
Three things are very important in convolution:
1) Mirroring of the signal, i.e. folding the signal about the Y-axis
2) Shifting of the signal
3) Finally, multiplying the signals and adding the products
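
Taken together, the three steps evaluate the convolution sum y(k) = sum over m of x(m)*h(k-m): h(-m) is the mirrored signal, h(k-m) is the mirrored signal shifted by k, and the sum adds the products. A minimal self-contained sketch of that sum is shown here; the short signals x and h and their index vectors are assumed values for illustration, and the result is compared with the built-in conv only as a check.

```matlab
% ASSUMED example: x on n = 0..2, h on n = -1..0
x = [1 2 3];   nx = 0:2;
h = [1 1];     nh = -1:0;

% the output lives on the sum of the two index ranges
ny = (nx(1)+nh(1)):(nx(end)+nh(end));
y  = zeros(1,length(ny));

for i = 1:length(ny)
    k = ny(i);                          % current shift
    for j = 1:length(nx)
        m = nx(j);
        idx = (k - m) - nh(1) + 1;      % position of h(k-m) inside the vector h
        if idx >= 1 && idx <= length(h) % h is zero outside its index range
            y(i) = y(i) + x(j)*h(idx);  % multiply and add
        end
    end
end

disp(ny); disp(y);
```

For these inputs the loop gives y = [1 3 5 3] on ny = -1..2, which matches conv(x,h).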

These three things should be implemented in our code to perform the convolution of the given signals.
The overall code is as follows:



Problems

  1. Find the convolution of the following signals without using the built-in convolution function:
X1(n1) = [1,1,1,1,1]
n1= [-2,-1,0,1,2]

h1(n2) = [1,0,0,0,0,0,0,0,0,0]
n2 =  [-4,-3,-2,-1,0,1,2,3,4,5]
X2 is a periodic signal.
Y2= X1*X2

Solution:
Sig_Mirror_Function:

% signal mirror function

function [y,n] = Sig_Mirror_Function(x,n)
% implements y(n) = x(-n)
% -----------------------
% [y,n] = Sig_Mirror_Function(x,n)
%
y = fliplr(x);   % reverse the sample values
n = -fliplr(n);  % reverse and negate the sample indices
end


Sig_Shift_Function:

function [y,n] = Sig_Shift_Function(x,m,k)
% implements y(n) = x(n-k)
% -------------------------
% [y,n] = Sig_Shift_Function(x,m,k)
%
n = m+k;   % move every sample index by k
y = x;     % sample values are unchanged
end



Sig_Mul_Function:

function [y,n] = Sig_Mul_Function(x1,n1,x2,n2)
% implements y(n) = x1(n)*x2(n)
% -----------------------------
% [y,n] = Sig_Mul_Function(x1,n1,x2,n2)
% y = product sequence over n, which includes n1 and n2
% x1 = first sequence over n1
% x2 = second sequence over n2 (n2 can be different from n1)
%
n = min(min(n1),min(n2)):max(max(n1),max(n2)); % duration of y(n)
y1 = zeros(1,length(n)); y2 = y1; % initialization
y1(find((n>=min(n1))&(n<=max(n1))==1))=x1; % x1 with duration of y
y2(find((n>=min(n2))&(n<=max(n2))==1))=x2; % x2 with duration of y
y = y1 .* y2; % sequence multiplication
end



Sig_Conv_Function:

function Sig_Conv_Function(x1,n1,h1,n2)
% computes y(k) = sum over m of x1(m)*h1(k-m), then displays and plots it
Final_n = (min(n1)+min(n2)):(max(n1)+max(n2)); % index range of the output
Final_Result = zeros(1,length(Final_n));
[First_y,First_n] = Sig_Mirror_Function(h1,n2); % step 1: mirror, h1(-m)
for l = 1:length(Final_n)
    k = Final_n(l);
    [Second_y,Second_n] = Sig_Shift_Function(First_y,First_n,k); % step 2: shift, h1(k-m)
    Result_y = Sig_Mul_Function(x1,n1,Second_y,Second_n); % step 3: multiply
    Final_Result(l) = sum(Result_y); % ... and add the products
end
Final_Result
Final_n
stem(Final_n,Final_Result)
end

Finally, in the MATLAB command window, type the values given in the question:

x1 = [1,1,1,1,1]
n1 = [-2,-1,0,1,2]
h1 = [1,0,0,0,0,0,0,0,0,0]
n2 = [-4,-3,-2,-1,0,1,2,3,4,5]


Sig_Conv_Function(x1, n1, h1, n2)

and you will get the desired output.