In the past, I've joked with colleagues about the trap doors hidden in the TIMESTAMP column, and today I fell through one of them.
I think there are some advantages to using timestamps, especially in large tables as part of a key. If you are not too worried about its limited range, the meager 4 bytes it uses can be an advantage.
But there are some issues with this datatype, since it also comes with some default behaviors to watch out for.
Today I was adding in some data from a stage table to an aggregation table using INSERT...SELECT...ON DUPLICATE KEY UPDATE. In my case, both tables had identical structure, including column names.
I've worked up an example of todays puzzle. Here is the prerequisite table definitions and pre-population:
create table jd_stage (
customer_id int not null ,
order_date timestamp not null ,
order_amount decimal(10,4) not null ,
primary key ( customer_id , order_date )
) ;
create table jd_agg (
customer_id int not null ,
order_date timestamp not null ,
order_amount decimal(10,4) not null ,
primary key ( customer_id , order_date )
) ;
insert into jd_agg values
( 1 , "2010-05-01 09:00:00" , 49.25 ) ,
( 1 , "2010-05-01 10:00:00" , 75.50 ) ,
( 2 , "2010-05-01 09:00:00" , 95.40 );
insert into jd_stage values
( 1 , "2010-05-01 09:00:00" , 50.75 ) ,
( 1 , "2010-05-01 10:00:00" , 24.50 ) ,
( 2 , "2010-05-01 09:00:00" , 4.60 );
In this example, I want to aggregate the data in jd_stage to jd_agg, and I've set the data up so that the final result will be 100.00 for the order_amounts.
My first attempt failed because of how I composed the query, which seems innocent enough:
insert into jd_agg ( customer_id, order_date , order_amount )
select customer_id , order_date , order_amount
from jd_stage
on duplicate key update
order_amount = order_amount + values(order_amount) ;
My punishment for this was :
ERROR 1052 (23000) at line 39: Column 'order_amount' in field list is ambiguous
So, I resolved this by trying....
insert into jd_agg ( customer_id, order_date , order_amount )
select customer_id , order_date , order_amount
from jd_stage
on duplicate key update jd_agg.order_amount = jd_agg.order_amount + values(order_amount) ;
And my reward was:
ERROR 1062 (23000) at line 49: Duplicate entry '1-2010-05-12 20:50:15' for key 1
What? I don't have any data with the date/time of "2010-05-12 20:50:15". Plus, that just so happened to be the time that I executed the query.
Once again, fooled by timestamps.
According to MySQL Scripture, which I've read a hundred times, but somehow seem to keep forgetting: If you don't want automatic initialization or updating of a timestamp column, then set the default to 0. So I should have done this:
create table jd_agg (
customer_id int not null ,
order_date timestamp not null DEFAULT 0,
order_amount decimal(10,4) not null ,
primary key ( customer_id , order_date )
) ;
Then the second form of the insert, shown above, is error free:
select * from jd_agg ;
+-------------+---------------------+--------------+
(sorry for formatting...blogger is not cooperating).
So, what do you do if the evil DBA won't let you add that DEFAULT? I found that adding the order_date in the "update" section also works:
insert into jd_agg ( customer_id, order_date , order_amount )
select customer_id , order_date , order_amount
from jd_stage
on duplicate key update
jd_agg.order_date = jd_agg.order_date ,
jd_agg.order_amount = jd_agg.order_amount + values(order_amount) ;
I think there are some advantages to using timestamps, especially in large tables as part of a key. If you are not too worried about its limited range, the meager 4 bytes it uses can be an advantage.
But there are some issues with this datatype, since it also comes with some default behaviors to watch out for.
Today I was adding in some data from a stage table to an aggregation table using INSERT...SELECT...ON DUPLICATE KEY UPDATE. In my case, both tables had identical structure, including column names.
I've worked up an example of todays puzzle. Here is the prerequisite table definitions and pre-population:
create table jd_stage (
customer_id int not null ,
order_date timestamp not null ,
order_amount decimal(10,4) not null ,
primary key ( customer_id , order_date )
) ;
create table jd_agg (
customer_id int not null ,
order_date timestamp not null ,
order_amount decimal(10,4) not null ,
primary key ( customer_id , order_date )
) ;
insert into jd_agg values
( 1 , "2010-05-01 09:00:00" , 49.25 ) ,
( 1 , "2010-05-01 10:00:00" , 75.50 ) ,
( 2 , "2010-05-01 09:00:00" , 95.40 );
insert into jd_stage values
( 1 , "2010-05-01 09:00:00" , 50.75 ) ,
( 1 , "2010-05-01 10:00:00" , 24.50 ) ,
( 2 , "2010-05-01 09:00:00" , 4.60 );
In this example, I want to aggregate the data in jd_stage to jd_agg, and I've set the data up so that the final result will be 100.00 for the order_amounts.
My first attempt failed because of how I composed the query, which seems innocent enough:
insert into jd_agg ( customer_id, order_date , order_amount )
select customer_id , order_date , order_amount
from jd_stage
on duplicate key update
order_amount = order_amount + values(order_amount) ;
My punishment for this was :
ERROR 1052 (23000) at line 39: Column 'order_amount' in field list is ambiguous
So, I resolved this by trying....
insert into jd_agg ( customer_id, order_date , order_amount )
select customer_id , order_date , order_amount
from jd_stage
on duplicate key update jd_agg.order_amount = jd_agg.order_amount + values(order_amount) ;
And my reward was:
ERROR 1062 (23000) at line 49: Duplicate entry '1-2010-05-12 20:50:15' for key 1
What? I don't have any data with the date/time of "2010-05-12 20:50:15". Plus, that just so happened to be the time that I executed the query.
Once again, fooled by timestamps.
According to MySQL Scripture, which I've read a hundred times, but somehow seem to keep forgetting: If you don't want automatic initialization or updating of a timestamp column, then set the default to 0. So I should have done this:
create table jd_agg (
customer_id int not null ,
order_date timestamp not null DEFAULT 0,
order_amount decimal(10,4) not null ,
primary key ( customer_id , order_date )
) ;
Then the second form of the insert, shown above, is error free:
select * from jd_agg ;
+-------------+---------------------+--------------+
| customer_id | order_date | order_amount |
+-------------+---------------------+--------------+
| 1 | 2010-05-01 09:00:00 | 100.0000 |
| 1 | 2010-05-01 10:00:00 | 100.0000 |
| 2 | 2010-05-01 09:00:00 | 100.0000 |
+-------------+---------------------+--------------+ (sorry for formatting...blogger is not cooperating).
So, what do you do if the evil DBA won't let you add that DEFAULT? I found that adding the order_date in the "update" section also works:
insert into jd_agg ( customer_id, order_date , order_amount )
select customer_id , order_date , order_amount
from jd_stage
on duplicate key update
jd_agg.order_date = jd_agg.order_date ,
jd_agg.order_amount = jd_agg.order_amount + values(order_amount) ;
2 comments:
shouldnt the red 'default 0' be on the order_date instead of on the order_amount?
Good Catch. I messed that up while fumbling with the text formatting with blogger. Thanks!
Post a Comment