Today I read some articles about ELT versus ETL, and I noticed the large amount of information available about ETL compared to how little there is about the ELT data integration approach. So, rather than writing yet another article on the difference between them or on which one is better, I have decided to introduce this approach from my point of view.
ELT stands for Extract, Load, Transform.
This approach extracts the data from the source and then uses the database itself to transform it, after the load process has taken place and before the data is integrated into the data warehouse tables:
Looking at the overview, it is a good idea to drill down into the process a little. The first step, of course, is extracting the data from the source, after which comes the load process. The loading stage does not necessarily put the data into the data warehouse tables themselves; usually it is loaded into a staging area located on the destination DB. So far, it looks like this:
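To make the extract-and-load steps concrete, here is a minimal sketch in Python using the built-in sqlite3 module as a stand-in for the destination DB. All table, column, and file names here are made up for illustration; the point is only that the data lands in a staging table untransformed.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV feed (schema is illustrative only)
source_csv = io.StringIO(
    "id,name,amount\n"
    "1,alice,100\n"
    "2,bob,250\n"
)

conn = sqlite3.connect(":memory:")  # stands in for the destination DB
conn.execute("CREATE TABLE stg_orders (id INTEGER, name TEXT, amount INTEGER)")

# Extract: read the raw rows from the source...
rows = [(r["id"], r["name"], r["amount"]) for r in csv.DictReader(source_csv)]

# ...Load: bulk-insert them into the staging area as-is, with no transformation
conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", rows)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0])  # 2
```

Notice that no transformation logic runs in this step at all; that is the defining difference from a classic ETL pipeline, where the data would be reshaped before it ever reached the destination machine.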
Focusing on the staging area, it is common to have some kind of data flow within it. This flow may range from a single table to a complicated set of different objects, depending on the transformation logic you want to achieve and the tools used for the task. For example, a flow may look like this:
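Such a flow can be sketched as a chain of objects inside the staging area, each one built from the previous by a plain SQL statement. Again using sqlite3 as a stand-in and made-up table names: the flow below cleanses the loaded rows into one table and then aggregates them into another, all without the data leaving the DB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Staging table as it arrived from the load step (illustrative data,
# including a duplicate row and messy name values)
conn.execute("CREATE TABLE stg_orders (id INTEGER, name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO stg_orders VALUES (?, ?, ?)",
    [(1, " alice ", 100), (2, "BOB", 250), (2, "BOB", 250)],
)

# Flow step 1: a cleansing object -- deduplicate and normalize names
conn.execute("""
    CREATE TABLE stg_orders_clean AS
    SELECT DISTINCT id, LOWER(TRIM(name)) AS name, amount
    FROM stg_orders
""")

# Flow step 2: an aggregation object feeding whatever comes next in the flow
conn.execute("""
    CREATE TABLE stg_orders_agg AS
    SELECT name, SUM(amount) AS total
    FROM stg_orders_clean
    GROUP BY name
""")

for row in conn.execute("SELECT name, total FROM stg_orders_agg ORDER BY name"):
    print(row)  # ('alice', 100) then ('bob', 250)
```

A real flow would of course involve more steps (lookups, surrogate keys, error tables, and so on), but the pattern is the same: each object in the flow is produced by the database engine itself.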
Considering that the DB is no longer just the final destination point but becomes the heart of the data integration process, it is quite understandable that developing and maintaining such systems will require more attention from the DBA in charge of the data warehouse. Other things that come to mind when thinking about ELT are the pros and cons of such a concept. It is quite obvious that the data integration process will inherit the strengths and weaknesses of both the integration tool and the database's capabilities, so when choosing a combination of the two, one should choose well.
Sergey explained the ELT process quite nicely! In my opinion, the concept of ELT is nothing more than a marketing strategy to differentiate Oracle’s product from the rest of the ETL tools. In the traditional sense, we Extract from the source, Transform the data into the format required by the destination, and then Load it into the destination tables. Oracle’s use of “Load” means that it loads data into the staging area, not the destination. Thus, to be technically correct, Oracle’s process should be called ELTL, where the first Load refers to the staging tables and the second Load refers to the destination tables. This is unnecessarily confusing. Any ETL tool performs an ELTL process; the first “L” just never made it into the traditional abbreviation. So, what’s the point of Oracle introducing a new abbreviation, ELT? I think it is just to grab the attention of the IT community, and Oracle certainly accomplished that 🙂
Hi Igor,
I hope you enjoyed reading my article; I want to add a few things.
I cannot say much about the marketing strategy, because that is not my line of work. But what I will say is that even before Oracle acquired Sunopsis, their product was already built on the ELT approach.
Another thing: this post is about the ELT approach, and Oracle’s product was never mentioned, but I will refer to it since you did. ODI is capable of doing ELT, ETL, and ELTL; it all depends on how you decide to design your data integration process. By default, though, ODI is constructed to use the ELT approach.
About the ELTL: in the approach I talked about, the second “L” never happens, since after the first load the data never leaves the target DB. All the transformations happen within the DB, until the data is integrated into the destination tables as part of the “T” process.
Igor, your point of view is convincing :). I work with OWB and ODI and do “ELTL”: after extracting, we load the data into a staging schema on the destination DB, then transform and load it again into the target data warehouse. I feel the only difference between ETL and ELT is that we are not using an extra machine for the staging area; instead we are using the destination machine. If your target DB machine is not powerful, you will be in trouble.