ODI Repositories

The repositories are database schemas that store all the information regarding the tool and its operations. More specifically, anything done from within ODI is stored there. The only exceptions are the installation files that are used to run and configure the application itself. Each ODI repository type has a different table structure and contains a different kind of information.

There are two types of repositories:

  1. Master Repository – Contains everything related to server topology, security, environments and technologies. The data in this repository is mostly static and goes through minimal changes; usually, the changes are configurations of new server connections.
  2. Work Repository – An operational repository, which contains very dynamic information about any change that has been made, either by the client or by the server.
    Each work repository is connected to exactly one master repository.
    The work repository comes in two sub-types:

    • Execution – Stores only the scenarios and run-time related information; it is used strictly for executing generated scenarios, typically in production and test environments. When an environment is created with the execution sub-type, use of the Designer is restricted.
    • Development – Contains all the information an execution repository does but, in addition, also holds the metadata and the source code: in other words, all the Designer objects used when developing with ODI.

Although the repositories are for ODI's internal use, documentation about them is available, and they can be used as a centralized location for retrieving statistics, monitoring information and pretty much anything else you might want. The important thing is not to misuse this option: query the repositories only when necessary and when the goal cannot be achieved using standard ODI capabilities.

It seems that ODI 11g has slightly more integrated support for this kind of repository use. One indication is the “WORKREP” parameter that can be passed to odiRef.getJDBCConnection() to retrieve the work repository JDBC connection object and use it to query the work repository directly. This is different from 10g, where you would need to set up the physical and logical schemas manually, just like for any other source or destination, since odiRef.getJDBCConnection() there accepts only two parameters: “SRC” and “DEST”.
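To illustrate, here is a minimal sketch of querying the work repository for recent sessions. Inside an ODI 11g procedure the connection object would come from odiRef.getJDBCConnection("WORKREP"); the sketch below is plain Python with an in-memory SQLite database standing in for the repository, and the SNP_SESSION table and column names follow the ODI repository model but should be verified against your version.

```python
import sqlite3

def recent_sessions(cursor, limit=10):
    """Return the most recent sessions recorded in the work repository.

    SNP_SESSION and its columns (SESS_NO, SESS_NAME, SESS_STATUS) follow
    the ODI repository model; verify the exact names for your ODI version.
    LIMIT is SQLite syntax used for this demo; Oracle would use ROWNUM
    or FETCH FIRST instead.
    """
    cursor.execute(
        "SELECT SESS_NO, SESS_NAME, SESS_STATUS "
        "FROM SNP_SESSION ORDER BY SESS_NO DESC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()

# SQLite stands in for the work repository schema in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SNP_SESSION (SESS_NO INTEGER, SESS_NAME TEXT, SESS_STATUS TEXT)"
)
conn.executemany(
    "INSERT INTO SNP_SESSION VALUES (?, ?, ?)",
    [(1, "LOAD_CUSTOMERS", "D"), (2, "LOAD_ORDERS", "E"), (3, "LOAD_ORDERS", "R")],
)
for sess_no, name, status in recent_sessions(conn.cursor(), limit=2):
    print(sess_no, name, status)
```

Note that the helper only reads from the repository, in line with the read-only advice above.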

Of course, such a connection should be used in read-only mode; the repositories are part of the application and are used internally. No modification should be made to the data manually, and anyone who decides to alter the data on their own might face serious problems later on. If you are still facing an issue that requires altering the repository data manually and you cannot find another way around it, then I would advise getting the change reviewed by official support.

Another important thing about the repositories is knowing where each object is stored. Here is a table describing how ODI stores the objects behind the scenes:

Module            Master                   Work                                        Work sub-type
Designer          Versions and solutions   Everything except versions and solutions   Development only
Operator          Versions and solutions   Everything except versions and solutions   Development and Execution
Topology          All                      N/A                                         N/A
Security manager  All                      N/A                                         N/A

In conclusion, the repositories store all the information used by ODI. Getting familiar with their structure and the information that can be extracted from them is a must for anyone who uses Oracle Data Integrator for a little more than just mapping a few columns. This applies especially to 10g users.


ELT Drill Down

Today I read some articles about ELT versus ETL, and I noticed the large amount of information available about ETL compared with the much smaller amount concerning the ELT data integration approach. So, without writing yet another article about the difference between them or which one is better, I have decided to introduce this approach from my point of view.

ELT stands for Extract, Load, Transform.
This approach extracts the data from the source and then uses the database to transform it, after the load process has taken place and before integrating it into the data warehouse itself:
ELT Overview

Looking at the overview, it is a good idea to do a little drill-down of the process. The first step, of course, is extracting the data from the source, after which comes the load process. The loading stage does not necessarily load the data into the data warehouse tables themselves; usually the data is loaded into a staging area located on the destination DB. So far, it looks like this:
ELT Destination Overview
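The extract and load steps can be sketched as follows. This is a minimal illustration with hypothetical table names, using SQLite in-memory databases to stand in for the source system and the destination DB.

```python
import sqlite3

# Source system (stand-in): an operational table we want to integrate.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "new"), (2, 25.5, "shipped"), (3, 7.25, "new")],
)

# Destination DB (stand-in): hosts the staging area alongside the warehouse.
dest = sqlite3.connect(":memory:")
dest.execute("CREATE TABLE stg_orders (id INTEGER, amount REAL, status TEXT)")

# Extract: pull the raw rows from the source.
rows = source.execute("SELECT id, amount, status FROM orders").fetchall()

# Load: copy them as-is into the staging table on the destination DB.
# No transformation happens yet -- that is the point of the E-L-T ordering.
dest.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", rows)

print(dest.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0])  # 3
```

In practice the load would be done with a bulk mechanism rather than row inserts, but the shape of the process is the same.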

Focusing on the staging area only, it is common to have some kind of data flow within it. This flow may range from a single table to a complicated set of objects, depending on the transformation logic to be achieved and the tools used for the task. For example, a flow may look like this:
ELT Flow Example
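A staging-area flow like this boils down to a sequence of SQL statements executed by the destination DB itself, which is where the "T" of ELT happens. The table names below are hypothetical, and SQLite again stands in for the destination.

```python
import sqlite3

# Destination DB (stand-in): the staging flow and the warehouse table all
# live here, so every transformation is a SQL statement run by the DB.
dest = sqlite3.connect(":memory:")
dest.executescript("""
    CREATE TABLE stg_orders  (id INTEGER, amount REAL, status TEXT);  -- raw load
    CREATE TABLE stg_clean   (id INTEGER, amount REAL, status TEXT);  -- intermediate step
    CREATE TABLE fact_orders (id INTEGER, amount REAL);               -- warehouse target
""")
dest.executemany(
    "INSERT INTO stg_orders VALUES (?, ?, ?)",
    [(1, 10.0, "new"), (2, -1.0, "bad"), (3, 7.25, "shipped")],
)

# Step 1 of the flow: filter invalid rows into an intermediate table.
dest.execute(
    "INSERT INTO stg_clean SELECT id, amount, status "
    "FROM stg_orders WHERE amount >= 0"
)

# Step 2: integrate the cleansed data into the warehouse table.
dest.execute("INSERT INTO fact_orders SELECT id, amount FROM stg_clean")

print(dest.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0])  # 2
```

A real flow may add many more intermediate objects (views, error tables, lookups), but each step is still set-based SQL running inside the destination database.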

Considering that the DB is no longer only the final destination point but becomes the heart of the data integration process, it is quite understandable that developing and maintaining such systems will require more attention from the DBA in charge of the data warehouse. Other things that come to mind when thinking about ELT are the pros and cons of such a concept. It is quite obvious that the data integration process will inherit the strengths and weaknesses of both the integration tool and the database's capabilities, so when choosing a combination of these two, one should choose well.