Error handling made easy using Row Error Logging
Here is the details on what each property means.
Every time we start with
a Data Warehouse or Data Integration project we spent lot of time
defining our error handling approach. Still there are good chances that we
might miss some scenarios because of unexpected data issues. Here in this
article, lets us discuss an approach to handle the unexpected error.
Error handling Configuration.
We do not have any error configuration
required during the mapping development. So after the mapping is created,
during the session configuration set the session properties as shown
in below image.
Here is the details on what each property means.
- Error Log Type :- Specifies the type of error log to create. It can be relational database or flat file.
- Error Log DB Connection :- Database connection for a relational log.
- Error Log Table Name Prefix :- Specifies the table name prefix for relational logs.
- Log Row Data :- Specifies whether or not to log transformation row data.
- Log Source Row Data :- Specifies whether or not to log source row data.
- Data Column Delimiter :- Data will be delimited by the specified character in DB column.
With this configuration we specified,
Informatica PowerCenter will create four different tables for error logging and
the table details as below.
- ETL_PMERR_DATA :- Stores data about a transformation row error and its corresponding source row.
- ETL_PMERR_MSG :- Stores metadata about an error and the error message.
- ETL_PMERR_SESS :- Stores metadata about the session.
- ETL_PMERR_TRANS :- Stores metadata about the source and transformation ports, when error occurs.
With this configuration, we are done
with the setting required to capture any error during the session execution.
Now lets see how do we retrieve the data from the error log tables and report
it to Business Users or IT Department.
Report the Error
Data.
Now we have the error data stored in
the error table, it is important to share the error data to the Business Users
or to IT Department. Lets see how we can pull data from these tables.
We can pull the basic error report using the SQL. We can be more fancy with the
SQL and get more information from the error tables.
select
sess.FOLDER_NAME as 'Folder
Name',
sess.WORKFLOW_NAME as 'WorkFlow
Name',
sess.TASK_INST_PATH as 'Session
Name',
data.SOURCE_ROW_DATA as 'Source
Data',
msg.ERROR_MSG as 'Error MSG'
from
from
ETL_PMERR_SESS sess
left outer join ETL_PMERR_DATA data
on data.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID and
left outer join ETL_PMERR_DATA data
on data.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID and
data.SESS_INST_ID = sess.SESS_INST_ID
left outer join ETL_PMERR_MSG msg
on msg.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID and
left outer join ETL_PMERR_MSG msg
on msg.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID and
msg.SESS_INST_ID = sess.SESS_INST_ID
where
where
sess.FOLDER_NAME = <Project Folder
Name> and
sess.WORKFLOW_NAME = <Workflow Name> and
sess.TASK_INST_PATH = <Session Name> and
sess.WORKFLOW_NAME = <Workflow Name> and
sess.TASK_INST_PATH = <Session Name> and
sess.SESS_START_TIME = <Session Run
Time>
The
above SQL can be converted to an Informatica mapping and can be scheduled to
run after all your ETL jobs are completed to get a daily error report and the
same can be scheduled for an email delivery to the data owners to take relevant
action on the errors.
Pros and Cons of this
Approach.
We should know the Pros and Cons of
this approach before applying this to your project.
Pros.
- Out of the box Solution Provided by Informatica.
- Captures all the Error, Including the unthought error.
- Less Coding and Testing efforts required by the development team.
Cons.
- If an error occurs after an active transformation, Integration service will not be able to capture source data into the erro table.
- In sessions with multiple non-pass through partitions, Integration service can not capture source data into error table.
- Added overhead to the Session performance (Which is expected and acceptable).
No comments:
Post a Comment