Thursday, March 04, 2010

Sea trials begin for new data warehouse

Over the last few months I've been working on the ETL for the first phase of a new data warehouse. Today it will be deployed to production and subjected to a burn-in period. During this time, I'll be watching performance and data quality. Surely there will be some refinements in store.

The ETL is performed exclusively using Pentaho Data Integration (3.2.0 GA) on an OpenSolaris / MySQL 5.1.43 platform. Source data is extracted from three separate production systems to a staging area, and then posted to a fact table with 8 dimensions of various types, including slowly changing hybrids.
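For readers unfamiliar with hybrid slowly changing dimensions, here is a minimal Python sketch of the general idea, not the actual Pentaho transformation used in this project. It assumes a hypothetical customer dimension in which `region` keeps history (Type 2, a new row version per change) and `email` is simply overwritten (Type 1). The column names, the `DimRow` class, and the `apply_change` function are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical split of attributes: Type 2 keeps history, Type 1 overwrites.
TYPE2_ATTRS = ("region",)
TYPE1_ATTRS = ("email",)


@dataclass
class DimRow:
    surrogate_key: int
    natural_key: str
    region: str
    email: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(dim: List[DimRow], natural_key: str, region: str,
                 email: str, as_of: date) -> int:
    """Apply one staged record to the dimension (modified in place).

    Returns the surrogate key a fact row loaded as of `as_of` would use.
    """
    incoming = {"region": region, "email": email}
    versions = [r for r in dim if r.natural_key == natural_key]
    current = next((r for r in versions if r.valid_to is None), None)
    next_key = max((r.surrogate_key for r in dim), default=0) + 1

    # Unknown natural key: insert the first version.
    if current is None:
        dim.append(DimRow(next_key, natural_key, region, email, as_of))
        return next_key

    # Type 1 attributes: overwrite on every version, so no history is kept.
    for row in versions:
        for attr in TYPE1_ATTRS:
            setattr(row, attr, incoming[attr])

    # Type 2 attributes: close the current version and open a new one.
    if any(getattr(current, a) != incoming[a] for a in TYPE2_ATTRS):
        current.valid_to = as_of
        dim.append(DimRow(next_key, natural_key, region, email, as_of))
        return next_key

    return current.surrogate_key


if __name__ == "__main__":
    dim: List[DimRow] = []
    apply_change(dim, "C1", "East", "a@example.com", date(2010, 1, 1))
    apply_change(dim, "C1", "East", "b@example.com", date(2010, 2, 1))  # Type 1 only
    apply_change(dim, "C1", "West", "b@example.com", date(2010, 3, 1))  # Type 2
    for row in dim:
        print(row)
```

An email change alone leaves the row count untouched, while a region change closes the old version and adds a new row with a fresh surrogate key, which is what lets the fact table point at the version that was current when the fact occurred.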

Although I haven't blogged in a while, I have been keeping notes on my travels, and I'll be publishing them little by little.