Tuesday, April 13, 2010

Comparing Numerics in Pentaho Data Integration / Kettle

While working on a transformation I ran into a problem with comparing two (seemingly) identical numbers using the Filter Rows step. I had a case where a transformation selected two DECIMAL(13,5) values from the database and compared them.

I could see that the numbers were identical in the MySQL database, but the Filter Rows step returned false when comparing. To troubleshoot, I tried multiplying the difference of the two numbers by 10,000,000 ( in the transform) and I actually discovered a very small difference beyond the 5th decimal place. This datatype in MySQL is considered an "exact" datatype, not to be confused with a FLOAT.

My solution is to convert the two fields to Strings and do the comparison. If you don't like that, then explicitly round the numbers using the Calculator step. The Select and Alter step, doesn't seem to truncate the numbers and I don't think it was intended to alter the raw data anyway.

I haven't had time to figure out if or how PDI is converting the numbers, but my gut feeling is that the road to flakiness passes through the MySQL jdbc driver installed with PDI 3.2. Its somewhat old, and my experiments with newer versions (of the driver) introduced new problems.

The link below is an example tranformation ( version 3.2) showing what I am talking about. See the Generate rows step and notice the very small difference between the two numbers

> numeric_compare_filter_values.ktr

Enjoy.

1 comment:

Roland Bouman said...

Hi!

are you using the Number data type in Kettle? Perhaps you should use Bignumber instead.