Loading is an essential part of a data warehouse; it is the "L" in the ETL process in data warehouse design. Once all the data has been cleaned and transformed into a structure consistent with the data warehouse requirements, it is ready to be loaded into the data warehouse. We will load the data into the Oracle Database using ETL scripts written in Korn shell, Bash, or other scripting languages. The key point to keep in mind is that although client tools such as SQL Developer provide a mechanism to import data from files, a data warehouse needs automated data loading, so we should avoid such manual insertion of data through the tools. Instead, we write scripts that load the data.
Loading data into the Oracle database:
The insertion of data into the Oracle database is supported through SQL*Loader. In this project we have used Bash shell scripting. Loading the data consists of two steps:
1. Creation of the control file
2. Execution of the control file using SQL*Loader
1. Creation of the control file:
Creating a control file is the first step in loading data into the Oracle database using SQL*Loader.
This sample control file (loader.ctl) will load an external data file containing delimited data:
-- loader.ctl: load comma-delimited data into the emp table
load data
infile 'c:\data\mydata.csv'
into table emp
fields terminated by "," optionally enclosed by '"'
( empno, empname, sal, deptno )
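For illustration, a hypothetical row of mydata.csv that this control file would load might look like (the values are invented):
7369,"Smith",800,20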
2. Execution of the control file
One can load data into an Oracle database by using the sqlldr utility (sqlload on some platforms) as:
sqlldr username/password@server control=loader.ctl
e.g.
sqlldr BANKING_TBL01/yomari@BANKINDEV control=loader.ctl log=loader.log bad=income.bad discard=income.discard
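In an automated warehouse load it is also useful to check the sqlldr exit status rather than inspecting the output manually; a minimal sketch in a shell script (on UNIX platforms sqlldr conventionally returns 0 on success, 2 when some rows were rejected, and 1 or 3 on failures):
#!/bin/sh
# Run the load and capture the exit status
sqlldr BANKING_TBL01/yomari@BANKINDEV control=loader.ctl log=loader.log
status=$?
if [ $status -ne 0 ]; then
    echo "sqlldr exited with code $status; see loader.log" >&2
    exit $status
fi
echo "Load completed successfully"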
The steps above are the basic steps for loading data into an Oracle database. Loading a data warehouse follows the same basic steps; however, we create staging and target tables in the database, so we must load the data from the source into the staging table, and then from the staging table into the target table, through scripting such as Korn or Bash shell scripts.
1. Loading data from source to the staging table
The staging table consists of the identifier columns. The data from the source are loaded through the two basic steps mentioned above.
e.g. let us consider a table ATM_LOC_STG with the following columns (a sketch of the corresponding DDL follows the list):
ATM_LOC_IDNT,
DISTT_IDNT,
ZNE_IDNT,
ATM_PLACE,
ATM_NUM,
ATM_LOC_DESC
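The exact datatypes are not given here, so the following staging-table DDL is only a minimal sketch with assumed datatypes and lengths:
-- Sketch of the staging table; datatypes and lengths are assumptions
CREATE TABLE ATM_LOC_STG (
    ATM_LOC_IDNT  NUMBER,
    DISTT_IDNT    NUMBER,
    ZNE_IDNT      NUMBER,
    ATM_PLACE     VARCHAR2(100),
    ATM_NUM       VARCHAR2(20),
    ATM_LOC_DESC  VARCHAR2(255)
);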
- Now we create a control file load-items.ctl as
-- skip=1 skips the header row; errors=100 allows up to 100 rejected rows; direct=True uses a direct-path load
OPTIONS (skip=1, errors=100, direct=True)
LOAD DATA
INFILE 'nibl_atm_loc.csv'
TRUNCATE into table ATM_LOC_STG
FIELDS TERMINATED BY ',' optionally enclosed by '"'
(
ATM_LOC_IDNT,
DISTT_IDNT,
ZNE_IDNT,
ATM_PLACE,
ATM_NUM,
ATM_LOC_DESC
)
NOTE: if a column has a DATE datatype, an extra date mask is needed, as in:
OPTIONS (skip=1, errors=100, direct=True)
LOAD DATA
INFILE 'nibl_atm_loc.csv'
TRUNCATE into table ATM_LOC_STG
FIELDS TERMINATED BY ',' optionally enclosed by '"'
(
ATM_LOC_IDNT,
DISTT_IDNT,
ZNE_IDNT,
ATM_PLACE,
ATM_NUM,
ATM_LOC_DESC,
CUST_DATE date 'yyyy/mm/dd'
)
- if the date is like 2013/05/23, place 'yyyy/mm/dd'
- if the date is like 2013/may/23, place 'yyyy/mon/dd'
- if the date is like 2013-05-23, place 'yyyy-mm-dd'
- if the date is like 2013-may-23, place 'yyyy-mon-dd'
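For instance, a hypothetical data row in nibl_atm_loc.csv matching the 'yyyy/mm/dd' mask might look like (the values are invented for illustration):
101,12,3,"New Road, Kathmandu",4512,Main branch ATM,2013/05/23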
- Then we create the bash script load-items.sh as
#!/bin/sh
sqlldr BANKING_TBL01/yomari@BANKINDEV control=load-items.ctl log=load-items.log bad=load-items.bad discard=load-items.discard
- finally we execute the script as ./load-items.sh
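Note that the script needs execute permission before it can be run this way:
chmod +x load-items.sh
./load-items.sh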
2. Loading data from staging to the target tables
Compared with the staging tables, the target tables can contain additional columns such as a surrogate key. Surrogate keys are generated through a sequence and a trigger, created in a client tool such as SQL Developer:
create or replace TRIGGER ATM_LOC_TRIGGER
BEFORE INSERT ON ATM_LOC_DM
REFERENCING NEW AS NEW
FOR EACH ROW
BEGIN
  -- populate the surrogate key from the sequence when it is not supplied
  if (:NEW.ATM_LOC_KEY is null) then
    select ATM_LOC_SEQUENCE.nextval
    into :NEW.ATM_LOC_KEY
    from dual;
  end if;
END;
/
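The trigger assumes that the sequence ATM_LOC_SEQUENCE already exists; a minimal sketch of creating it:
-- generates successive surrogate key values starting at 1
CREATE SEQUENCE ATM_LOC_SEQUENCE START WITH 1 INCREMENT BY 1;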
Finally, the target tables are loaded using a query such as:
insert into atm_loc_dm (atm_loc_idnt, distt_idnt, zne_idnt, atm_place, atm_num, atm_loc_desc)
select atm_loc_idnt, distt_idnt, zne_idnt, atm_place, atm_num, atm_loc_desc
from atm_loc_stg;
Note: here the target table contains the following columns (ATM_LOC_KEY is filled in by the trigger above):
ATM_LOC_KEY
ATM_LOC_IDNT,
DISTT_IDNT,
ZNE_IDNT,
ATM_PLACE,
ATM_NUM,
ATM_LOC_DESC
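The staging-to-target step can be automated in the same way as the staging load, by wrapping the insert in a shell script that invokes SQL*Plus; a minimal sketch (the file name load-target.sh is hypothetical):
#!/bin/sh
# load-target.sh: load the target table from the staging table via SQL*Plus
sqlplus -s BANKING_TBL01/yomari@BANKINDEV <<EOF
insert into atm_loc_dm (atm_loc_idnt, distt_idnt, zne_idnt, atm_place, atm_num, atm_loc_desc)
select atm_loc_idnt, distt_idnt, zne_idnt, atm_place, atm_num, atm_loc_desc
from atm_loc_stg;
commit;
exit;
EOF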