Tuesday, May 14, 2013

How to load data into an Oracle data warehouse using SQL*Loader and bash scripting



Loading is an essential part of building a data warehouse: it is the final stage of the ETL process. Once all the data has been cleaned and transformed into a structure consistent with the data warehouse requirements, it is ready to be loaded into the warehouse. Here we will load the data into an Oracle database using ETL scripting such as Korn shell, bash, or other scripting options.
The key point to keep in mind is that client tools such as SQL Developer can import data from a file manually, but a data warehouse needs automatic, repeatable loading, so we should avoid such manual insertion of data through tools. Instead, we write a scripting program that loads the data.
Loading data into the Oracle database:
Insertion of data into an Oracle database is supported through the SQL*Loader utility. In this project we have used bash shell scripting. Loading the data consists of two steps:
1.      Creation of the control file
2.      Execution of the control file using SQL*Loader

1.      Creation of the control file:
Creating the control file is the first step in loading data into an Oracle database with SQL*Loader.
This sample control file (loader.ctl) will load an external data file containing delimited data:
                        

load data

infile 'c:\data\mydata.csv'

into table emp

fields terminated by "," optionally enclosed by '"'

( empno, empname, sal, deptno )
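For illustration, the data file and a matching table might look as follows. Only the column names come from the control file above; the sample rows and the datatypes are made up for the example:

c:\data\mydata.csv:
10001,"SMITH",800,20
10002,"ALLEN",1600,30

create table emp (
  empno   number,
  empname varchar2(50),
  sal     number,
  deptno  number
);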

2.      Execution of the control file
One can load data into an Oracle database using the sqlldr utility (sqlload on some platforms) as follows:

sqlldr username/password@server control=loader.ctl

e.g. sqlldr BANKING_TBL01/yomari@BANKINDEV control=loader.ctl log=loader.log bad=income.bad discard=income.discard
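Because warehouse loads run unattended, the script should check the result of sqlldr. On Unix platforms sqlldr returns exit code 0 on success, 1 on failure, 2 when some rows were rejected (a warning), and 3 on a fatal error. A minimal sketch, reusing the connection details above:

sqlldr BANKING_TBL01/yomari@BANKINDEV control=loader.ctl log=loader.log bad=income.bad discard=income.discard
rc=$?
if [ $rc -ne 0 ]; then
  # non-zero exit code: inspect the log and bad files before continuing
  echo "sqlldr failed with exit code $rc, check loader.log" >&2
  exit $rc
fi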

The steps above are the basic steps for loading data into an Oracle database. A data warehouse load follows the same basic steps, except that we create staging and target tables in the database: we first load the data from the source into the staging table, and then from the staging table into the target table, through a Korn or bash shell script.

1.      Loading data from source to the staging table
The staging table holds the identifiers coming from the source. The data from the source are loaded through the two basic steps mentioned above.
e.g. let us consider a table ATM_LOC_STG with the following columns (a possible DDL sketch is given after the list):
             ATM_LOC_IDNT,
             DISTT_IDNT,
             ZNE_IDNT,
             ATM_PLACE,
             ATM_NUM,
             ATM_LOC_DESC
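A minimal DDL sketch for such a staging table; the datatypes here are only an assumption, the actual types depend on the source data:

create table ATM_LOC_STG (
  ATM_LOC_IDNT  number,
  DISTT_IDNT    number,
  ZNE_IDNT      number,
  ATM_PLACE     varchar2(100),
  ATM_NUM       varchar2(20),
  ATM_LOC_DESC  varchar2(255)
);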


  •     Now we create a control file load_items.ctl as

OPTIONS (skip=1, errors=100, direct=True)
LOAD DATA
INFILE 'nibl_atm_loc.csv'
TRUNCATE into table ATM_LOC_STG
FIELDS TERMINATED BY ',' optionally enclosed by '"'
(
             ATM_LOC_IDNT,
             DISTT_IDNT,
             ZNE_IDNT,
             ATM_PLACE,
             ATM_NUM,
             ATM_LOC_DESC
)
NOTE: if a column has a DATE datatype, an extra format mask is needed, as in:

OPTIONS (skip=1, errors=100, direct=True)
LOAD DATA
INFILE 'nibl_atm_loc.csv'
TRUNCATE into table ATM_LOC_STG
FIELDS TERMINATED BY ',' optionally enclosed by '"'
(
             ATM_LOC_IDNT,
             DISTT_IDNT,
             ZNE_IDNT,
             ATM_PLACE,
             ATM_NUM,
             ATM_LOC_DESC,
            CUST_DATE DATE "yyyy/mm/dd"
)
  •     if the date is like 2013/05/23, use "yyyy/mm/dd"
  •     if the date is like 2013/may/23, use "yyyy/mon/dd"
  •     if the date is like 2013-05-23, use "yyyy-mm-dd"
  •     if the date is like 2013-may-23, use "yyyy-mon-dd"



  •        Then we create the bash script load_items.sh as
#!/bin/sh
sqlldr BANKING_TBL01/yomari@BANKINDEV control=load_items.ctl log=load_items.log bad=load_items.bad discard=load_items.discard


  •        Finally we make the script executable (chmod +x load_items.sh) and execute it as ./load_items.sh
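Since the whole point of scripting the load (rather than importing through a client tool) is automation, the script can then be scheduled, for example with cron. A sample crontab entry that runs the load every night at 2 a.m. (the paths are placeholders):

0 2 * * * /home/etl/load_items.sh >> /home/etl/load_items.out 2>&1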

2.      Loading data from staging to the target tables
The target tables can contain additional columns, such as a surrogate key. Surrogate keys are generated through a sequence and a trigger, which can be created in a client such as SQL Developer:

create or replace
TRIGGER ATM_LOC_TRIGGER
BEFORE INSERT ON ATM_LOC_DM
REFERENCING NEW AS NEW
FOR EACH ROW
BEGIN
  -- fill the surrogate key from the sequence when it is not supplied
  if (:NEW.ATM_LOC_KEY is null) then
    select ATM_LOC_SEQUENCE.nextval
    into :NEW.ATM_LOC_KEY
    from dual;
  end if;
END;
/
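The trigger assumes the sequence ATM_LOC_SEQUENCE already exists; a minimal sketch for creating it:

create sequence ATM_LOC_SEQUENCE
  start with 1
  increment by 1;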






Finally the target tables are loaded using a query such as:

insert into atm_loc_dm(atm_loc_idnt,distt_idnt,zne_idnt,atm_place,atm_num,atm_loc_desc)
select atm_loc_idnt,distt_idnt,zne_idnt,atm_place,atm_num,atm_loc_desc
from atm_loc_stg;
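To keep the whole load automatic, this staging-to-target step can also be run from the same shell script through sqlplus. A sketch, reusing the connection details from above:

sqlplus -s BANKING_TBL01/yomari@BANKINDEV <<'EOF'
whenever sqlerror exit failure
insert into atm_loc_dm(atm_loc_idnt,distt_idnt,zne_idnt,atm_place,atm_num,atm_loc_desc)
select atm_loc_idnt,distt_idnt,zne_idnt,atm_place,atm_num,atm_loc_desc
from atm_loc_stg;
commit;
exit;
EOF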

Note: here the target table contains the following columns (a possible DDL sketch is given after the list):
             ATM_LOC_KEY,
             ATM_LOC_IDNT,
             DISTT_IDNT,
             ZNE_IDNT,
             ATM_PLACE,
             ATM_NUM,
             ATM_LOC_DESC
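A minimal DDL sketch for the target table, again with assumed datatypes; ATM_LOC_KEY is the surrogate key filled in by the trigger:

create table ATM_LOC_DM (
  ATM_LOC_KEY   number primary key,
  ATM_LOC_IDNT  number,
  DISTT_IDNT    number,
  ZNE_IDNT      number,
  ATM_PLACE     varchar2(100),
  ATM_NUM       varchar2(20),
  ATM_LOC_DESC  varchar2(255)
);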



