<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-29711306</id><updated>2012-02-01T19:34:03.921-07:00</updated><category term='Solaris DTrace'/><category term='Fishing'/><category term='Hunting'/><category term='data quality'/><category term='mysql'/><category term='Pentaho'/><category term='Dodge Cummins Diesel'/><title type='text'>johndz</title><subtitle type='html'>Data, Trucks, and uh...Ducks.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://www.johndz.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://www.johndz.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>24</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-29711306.post-6250175045933961811</id><published>2012-01-18T14:21:00.001-07:00</published><updated>2012-01-18T14:22:01.748-07:00</updated><title type='text'>Coolest Picture Ever</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-xgWG6GKVCW8/Txc3nRcNReI/AAAAAAAABQY/wGWWbkNE7vY/s1600/sopa.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="200" src="http://4.bp.blogspot.com/-xgWG6GKVCW8/Txc3nRcNReI/AAAAAAAABQY/wGWWbkNE7vY/s200/sopa.jpg" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-6250175045933961811?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/6250175045933961811/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=6250175045933961811' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/6250175045933961811'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/6250175045933961811'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2012/01/blog-post.html' title='Coolest Picture Ever'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-xgWG6GKVCW8/Txc3nRcNReI/AAAAAAAABQY/wGWWbkNE7vY/s72-c/sopa.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-9041891685292088252</id><published>2011-07-27T00:23:00.005-06:00</published><updated>2011-07-27T00:48:43.453-06:00</updated><title type='text'>USS Doyle Decommissioning.</title><content type='html'>This week, the USS Doyle will be decommissioned after 28 years of service. During my time aboard, we made her first Med cruise which took us to Sicily, Italy, Greece, Spain, France, Rumania, and Israel. Here are a few pictures from my scrapbook. &lt;br /&gt;&lt;br /&gt;&lt;embed type="application/x-shockwave-flash" src="https://picasaweb.google.com/s/c/bin/slideshow.swf" width="600" height="400" flashvars="host=picasaweb.google.com&amp;captions=1&amp;noautoplay=1&amp;hl=en_US&amp;feat=flashalbum&amp;RGB=0x000000&amp;feed=https%3A%2F%2Fpicasaweb.google.com%2Fdata%2Ffeed%2Fapi%2Fuser%2Fanalyticarts%2Falbumid%2F5633895500244017441%3Falt%3Drss%26kind%3Dphoto%26authkey%3DGv1sRgCLe5tK3IronuAw%26hl%3Den_US" pluginspage="http://www.macromedia.com/go/getflashplayer"&gt;&lt;/embed&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-9041891685292088252?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/9041891685292088252/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=9041891685292088252' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/9041891685292088252'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/9041891685292088252'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2011/07/uss-doyle-decommissioning.html' title='USS Doyle Decommissioning.'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-205801357897112531</id><published>2010-11-19T04:58:00.004-07:00</published><updated>2010-11-19T05:31:27.839-07:00</updated><title type='text'>Exporting / importing in the Pentaho Data Integration 4.0.1 Repository</title><content type='html'>After upgrading from PDI 3.x to 4.0.1, I've found that the old way of exporting an repository subfolder in a single step is gone. It's still important for me to be able to do this since I use separate repositories for Development, Test, and Production, and these systems must remain isolated.&lt;br /&gt;&lt;br /&gt;As a work around, here is what I am doing for now:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;You can still keep your jobs and transforms organized in repository folders/sub-folders. But you have to now export each (open) job and transformation individually using File -&gt; Export -&gt; To XML&lt;br /&gt;&lt;br /&gt;I just save these files to an empty temporary directory on the file system.&lt;br /&gt;&lt;br /&gt;Then I disconnect from the source ( development) repository and connect to the destination repository, ( ie. Test or Production).&lt;br /&gt;&lt;br /&gt;In there, use :&lt;br /&gt;&lt;br /&gt;Tools -&gt; Repository -&gt; Import Repository&lt;br /&gt;&lt;br /&gt;In the dialog that displays:&lt;br /&gt;Navigate to your temporary file system directory, and select all files you want to import. You can select more then one / all files by holding the shift key,&lt;br /&gt;much like selecting multiple rows in a spreadsheet.&lt;br /&gt;&lt;br /&gt;Click on the "Open" button&lt;br /&gt;&lt;br /&gt;This will import your jobs and transformations at once. It will even create your sub-folders in the repository if they don't exist.&lt;br /&gt;&lt;br /&gt;---------------------------------------------------&lt;br /&gt;The first step, of course, is to backup your repository. My repositories  use MySQL databases, so here is the form of the command I use:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;font-size:85%;" &gt;mysqldump  --user=youruser --host=your.host --password=your.password --skip-opt   --add-locks  --create-options  --databases DATABASENAME  &gt;  backupfilename.out&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt; Between release cycles, I verify my backups by using them to  restore/refresh my Development and Test repositories from Production.&lt;br /&gt;&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;I attended the Pentaho Agile-BI tour in Denver this week and met some other folks using PDI and the BI Suite, which was great. One of the discussions some of us had was about forming a Pentaho user group in the Denver / Boulder Area. If anyone is interested look me up on LinkedIn.com or send me an email and lets at least start an online discussion group.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-205801357897112531?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/205801357897112531/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=205801357897112531' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/205801357897112531'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/205801357897112531'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2010/11/exporting-importing-in-pentaho-data.html' title='Exporting / importing in the Pentaho Data Integration 4.0.1 Repository'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-4859125740981347862</id><published>2010-08-11T07:33:00.005-06:00</published><updated>2010-08-21T08:48:56.521-06:00</updated><title type='text'>Connecting Pentaho Data Integration to hive / hadoop</title><content type='html'>My latest data integration challenge has been with a new node in my data landscape: a hadoop/hive installation. Since PDI has become my favorite hammer for many different tasks, I thought it would be handy to get connected to the hive database via jdbc. With that ability, I can enhance hive output by including lookups and joins with operational ( MySQL ) databases.&lt;br /&gt;&lt;br /&gt;Unfortunately, I didn't have much  luck using standard connections with jdbc and table input steps. I suppose this is because the hive jdbc driver is still in the embryonic stage.&lt;br /&gt;&lt;br /&gt;The turning point for my effort was the discovery of the new User Defined Java Class in Pentaho 4.0 GA. I struggled a bit before getting this to work, but I now have a simple working example that returns the result of a hive query to the stream. There was quite a bit of late night thrashing, so excuse the un-refined code.&lt;br /&gt;&lt;br /&gt;In summary, the keys to getting the udjc to work were:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Copy the ${HIVE_HOME}/lib/*.jar and $HADOOP_HOME/hadoop-*-core.jar  files into the data_integration/libext directory.&lt;/li&gt;&lt;li&gt;The test case ktr file posted at the bottom of &lt;a href="http://forums.pentaho.com/showthread.php?76943-User-Defined-Java-Class-Preview-works-Output-Wrong.&amp;amp;highlight=udjc"&gt;this thread&lt;/a&gt; on the Pentaho forum was also helpful.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.ibridge.be/?p=180"&gt;Matt Casters blog on udjc&lt;/a&gt; for the intro to using this step.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Here is the un-refined code for the udjc step:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;import java.sql.SQLException;&lt;br /&gt;import java.sql.Connection;&lt;br /&gt;import  java.sql.ResultSet;&lt;br /&gt;import java.sql.Statement;&lt;br /&gt;import  java.sql.DriverManager;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;public boolean  processRow(StepMetaInterface smi, StepDataInterface sdi) throws  KettleException&lt;br /&gt;{&lt;br /&gt;    Object[] inrow = getRow();&lt;br /&gt;    if  (inrow == null) {&lt;br /&gt;        setOutputDone();&lt;br /&gt;       return false;&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    try {&lt;br /&gt;       String driverName =  "org.apache.hadoop.hive.jdbc.HiveDriver";&lt;br /&gt;        Class.forName(driverName);&lt;br /&gt;      } catch (Exception e) {&lt;br /&gt;      e.printStackTrace();&lt;br /&gt;     System.exit(1);&lt;br /&gt;   }&lt;br /&gt;&lt;br /&gt;try {  &lt;br /&gt;Connection  con =  DriverManager.getConnection("jdbc:hive://localhost:10000/default", "",  "");&lt;br /&gt;   Statement stmt = con.createStatement();&lt;br /&gt;   String sql ;&lt;br /&gt;    String msg ;&lt;br /&gt;&lt;br /&gt;   // select * query&lt;br /&gt;   sql = "select sum(id)  from jd_test " ;&lt;br /&gt;   ResultSet res = stmt.executeQuery(sql);&lt;br /&gt;    long idval ;&lt;br /&gt;   Object[]  outrow = createOutputRow(inrow,  data.outputRowMeta.size());&lt;br /&gt;&lt;br /&gt;   while (res.next()) {&lt;br /&gt;       outrow = RowDataUtil.createResizedCopy(outrow,  data.outputRowMeta.size());&lt;br /&gt;      idval = res.getLong(1) ;&lt;br /&gt;       msg = "result id: " + idval ; &lt;br /&gt;      logBasic(msg);&lt;br /&gt;      get(Fields.Out,  "id").setValue(outrow, idval );     &lt;br /&gt;       putRow(data.outputRowMeta, outrow);&lt;br /&gt;   }&lt;br /&gt;} catch (  SQLException se ) { se.printStackTrace() ; System.exit(1) ;}&lt;br /&gt;    return true;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-4859125740981347862?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/4859125740981347862/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=4859125740981347862' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/4859125740981347862'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/4859125740981347862'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2010/08/connecting-pentaho-data-integration-to.html' title='Connecting Pentaho Data Integration to hive / hadoop'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-3724948011721870996</id><published>2010-07-12T22:46:00.014-06:00</published><updated>2010-07-12T23:27:04.822-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Fishing'/><title type='text'>Farewell to Themistocles</title><content type='html'>&lt;span style="font-size:180%;"&gt;T&lt;/span&gt;hey say that the happiest day of your life is when you buy your boat, and the second happiest day is when you sell it.&lt;br /&gt;&lt;br /&gt;In this case, I'm not sure they are right. Tonight I did the unthinkable and sold my boat to a gentleman who will give Themistocles a good home on the Texas Gulf Coast. As much as I loved this boat, I just don't find much time for boating now that I live in Colorado, and I hated to see it just sitting in the driveway.&lt;br /&gt;&lt;br /&gt;The name &lt;a href="http://en.wikipedia.org/wiki/Themistocles"&gt;Themistocles&lt;/a&gt; happened to be the name of the ship that brought my grandfather to Ellis Island from Greece in 1917, and a fitting name for my boat. I painstakingly painted the name in Greek on the stern.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_3oz1OTvwqJ4/TDvylacT2AI/AAAAAAAABAo/uNV3IUkZZt8/s1600/Themistocles.jpg"&gt;&lt;img style="float: left; margin: 0pt 0pt 10px 10px; cursor: pointer; width: 400px; height: 265px;" src="http://2.bp.blogspot.com/_3oz1OTvwqJ4/TDvylacT2AI/AAAAAAAABAo/uNV3IUkZZt8/s400/Themistocles.jpg" alt="" id="BLOGGER_PHOTO_ID_5493250895075596290" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;A quick picture with my kids after catching some Spanish Mackerel.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_3oz1OTvwqJ4/TDvwEoIx1jI/AAAAAAAABAg/iU235VRxOsw/s1600/macattack3.jpg"&gt;&lt;img style="float: left; margin: 0pt 0pt 10px 10px; cursor: pointer; width: 400px; height: 265px;" src="http://1.bp.blogspot.com/_3oz1OTvwqJ4/TDvwEoIx1jI/AAAAAAAABAg/iU235VRxOsw/s400/macattack3.jpg" alt="" id="BLOGGER_PHOTO_ID_5493248132792833586" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;And a day on Carter Lake, in Colorado.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_3oz1OTvwqJ4/TDvv6UOIhZI/AAAAAAAABAY/yAwRM-eiSG0/s1600/100_2205.JPG"&gt;&lt;img style="float: left; margin: 0pt 0pt 10px 10px; cursor: pointer; width: 400px; height: 265px;" src="http://3.bp.blogspot.com/_3oz1OTvwqJ4/TDvv6UOIhZI/AAAAAAAABAY/yAwRM-eiSG0/s400/100_2205.JPG" alt="" id="BLOGGER_PHOTO_ID_5493247955647890834" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Cruising of the coast of Tarpon Springs, Florida, near Anclote Key.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;object width="320" height="266" class="BLOG_video_class" id="BLOG_video-4d7943fbe80ef76e" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"&gt;&lt;param name="movie" value="http://www.youtube.com/get_player"&gt;&lt;param name="bgcolor" value="#FFFFFF"&gt;&lt;param name="allowfullscreen" value="true"&gt;&lt;param name="flashvars" value="flvurl=http://v23.nonxt5.googlevideo.com/videoplayback?id%3D4d7943fbe80ef76e%26itag%3D5%26app%3Dblogger%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1331234607%26sparams%3Did,itag,ip,ipbits,expire%26signature%3D3F3C4709F6EE4C1A3D96A9AB37EC2E4146DFB828.5206D50AE9749B48680D86793532C54EDE6F562B%26key%3Dck1&amp;amp;iurl=http://video.google.com/ThumbnailServer2?app%3Dblogger%26contentid%3D4d7943fbe80ef76e%26offsetms%3D5000%26itag%3Dw160%26sigh%3Dh_QvIoisXQkrCLLOnENP3AOHZ_M&amp;amp;autoplay=0&amp;amp;ps=blogger"&gt;&lt;embed src="http://www.youtube.com/get_player" type="application/x-shockwave-flash"width="320" height="266" bgcolor="#FFFFFF"flashvars="flvurl=http://v23.nonxt5.googlevideo.com/videoplayback?id%3D4d7943fbe80ef76e%26itag%3D5%26app%3Dblogger%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1331234607%26sparams%3Did,itag,ip,ipbits,expire%26signature%3D3F3C4709F6EE4C1A3D96A9AB37EC2E4146DFB828.5206D50AE9749B48680D86793532C54EDE6F562B%26key%3Dck1&amp;iurl=http://video.google.com/ThumbnailServer2?app%3Dblogger%26contentid%3D4d7943fbe80ef76e%26offsetms%3D5000%26itag%3Dw160%26sigh%3Dh_QvIoisXQkrCLLOnENP3AOHZ_M&amp;autoplay=0&amp;ps=blogger"allowFullScreen="true" /&gt;&lt;/object&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-3724948011721870996?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/3724948011721870996/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=3724948011721870996' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/3724948011721870996'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/3724948011721870996'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2010/07/farewell-to-themistocles.html' title='Farewell to Themistocles'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_3oz1OTvwqJ4/TDvylacT2AI/AAAAAAAABAo/uNV3IUkZZt8/s72-c/Themistocles.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-165666667196076579</id><published>2010-05-12T21:56:00.021-06:00</published><updated>2010-05-20T08:42:44.677-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Insert...Select...On Duplicate key with timestamps</title><content type='html'>&lt;div&gt;In the past, I've joked with colleagues about the trap doors hidden in the TIMESTAMP column, and today I fell through one of them.&lt;br /&gt;&lt;br /&gt;I think there are some advantages to using timestamps, especially in large tables as part of a key. If you are not too worried about its limited range, the meager 4 bytes it uses can be an advantage.&lt;br /&gt;&lt;br /&gt;But there are some issues with this datatype, since it also comes with some default behaviors to watch out for.&lt;br /&gt;&lt;br /&gt;Today I was adding in some data from a stage table to an aggregation table using INSERT...SELECT...ON DUPLICATE KEY UPDATE. In my case, both tables had identical structure, including column names.&lt;br /&gt;&lt;br /&gt;I've worked up an example of todays puzzle. Here is the prerequisite table definitions and pre-population:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;create table jd_stage (&lt;br /&gt;customer_id    int           not null ,&lt;br /&gt;order_date     timestamp     not null ,&lt;br /&gt;order_amount   decimal(10,4) not null ,&lt;br /&gt;primary key ( customer_id , order_date )&lt;br /&gt;) ;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;create table jd_agg&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;(&lt;/span&gt; &lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;customer_id    int           not null ,&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;order_date     timestamp     not null ,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;order_amount   decimal(10,4) not null ,&lt;/span&gt; &lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;primary key ( customer_id , order_date )&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt; &lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;) ;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt; &lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;insert into jd_agg&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt; &lt;span style="font-family:courier new;"&gt;values&lt;/span&gt; &lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;( 1 , "2010-05-01 09:00:00" , 49.25 ) ,&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;( 1 , "2010-05-01 10:00:00" , 75.50 ) ,&lt;/span&gt; &lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;( 2 , "2010-05-01 09:00:00" , 95.40 )&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;insert into jd_stage&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;values&lt;/span&gt; &lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;( 1 , "2010-05-01 09:00:00" , 50.75 ) ,&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;( 1 , "2010-05-01 10:00:00" , 24.50 ) ,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;( 2 , "2010-05-01 09:00:00" , 4.60 )&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;;&lt;/span&gt; &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In this example, I want to aggregate the data in jd_stage to jd_agg, and I've set the data up so that the final result will be 100.00 for the order_amounts.&lt;br /&gt;&lt;br /&gt;My first attempt failed because of how I composed the query, which seems innocent enough:&lt;br /&gt;&lt;br /&gt;&lt;span style=";font-family:courier new;font-size:85%;"  &gt;insert into jd_agg ( customer_id, order_date , order_amount )&lt;br /&gt;select customer_id , order_date , order_amount&lt;br /&gt;from jd_stage&lt;br /&gt;on duplicate key update&lt;br /&gt;order_amount = order_amount + values(order_amount) ;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;My punishment for this was :&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:85%;" &gt;&lt;span style="font-family:courier new;"&gt;ERROR 1052 (23000) at line 39: Column 'order_amount' in field list is ambiguous&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So, I resolved this by trying....&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;insert into jd_agg ( customer_id, order_date , order_amount )&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;select customer_id , order_date , order_amount&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;from jd_stage&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;on duplicate key update&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;span style="color: rgb(255, 0, 0); font-weight: bold;"&gt;jd_agg&lt;/span&gt;.order_amount = &lt;span style="color: rgb(255, 0, 0); font-weight: bold;"&gt;jd_agg&lt;/span&gt;.order_amount + values(order_amount)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;;&lt;/span&gt; &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;And my reward was:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:85%;" &gt;&lt;span style="font-family:courier new;"&gt;ERROR 1062 (23000) at line 49: Duplicate entry '1-2010-05-12 20:50:15' for key 1&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;What? I don't have any data with the date/time of "2010-05-12 20:50:15". Plus, that just so happened to be the time that I executed the query.&lt;br /&gt;&lt;br /&gt;Once again, fooled by timestamps.&lt;br /&gt;&lt;br /&gt;According to &lt;a href="http://dev.mysql.com/doc/refman/5.0/en/timestamp.html"&gt;MySQL Scripture&lt;/a&gt;, which I've read a hundred times, but somehow seem to keep forgetting: &lt;span style="font-style: italic; font-weight: bold;"&gt;If you don't want automatic initialization or updating of a timestamp column, then set the default to 0&lt;/span&gt;. So I should have done this:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;create table jd_agg&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;(&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;customer_id     int           not null ,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;order_date      timestamp     not null &lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt; &lt;span style="color: rgb(255, 0, 0); font-weight: bold;"&gt;DEFAULT 0,&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;order_amount    decimal(10,4) not null ,&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;primary key ( customer_id , order_date )&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;) ;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;Then the second form of the insert, shown above, is error free:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;select * from jd_agg ;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;+-------------+---------------------+--------------+&lt;/span&gt; &lt;/span&gt;&lt;div&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;| customer_id | order_date                            | order_amount |&lt;/span&gt; &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;+-------------+---------------------+--------------+&lt;/span&gt; &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;| 1                               | 2010-05-01 09:00:00 |     100.0000             |&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;|           1                               | 2010-05-01 10:00:00 |     100.0000            |&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;|           2           | 2010-05-01 09:00:00 |     100.0000             |&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;+-------------+---------------------+--------------+&lt;/span&gt; &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;(sorry for formatting...blogger is not cooperating).&lt;br /&gt;&lt;br /&gt;So, what do you do if the evil DBA won't let you add that DEFAULT? I found that adding the order_date in the "update" section also works:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;insert into jd_agg ( customer_id, order_date , order_amount ) &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;select customer_id , order_date , order_amount &lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;from jd_stage&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;on duplicate key update&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;span style="color: rgb(255, 0, 0); font-weight: bold;"&gt;&lt;br /&gt;jd_agg.order_date = jd_agg.order_date &lt;/span&gt;, &lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;jd_agg.order_amount = jd_agg.order_amount + values(order_amount) &lt;/span&gt;&lt;span style="font-family:courier new;"&gt;;&lt;/span&gt; &lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-165666667196076579?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/165666667196076579/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=165666667196076579' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/165666667196076579'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/165666667196076579'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2010/05/insertselecton-duplicate-key-with.html' title='Insert...Select...On Duplicate key with timestamps'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-989120166219919657</id><published>2010-05-04T22:13:00.001-06:00</published><updated>2010-05-04T22:16:15.341-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data quality'/><category scheme='http://www.blogger.com/atom/ns#' term='Pentaho'/><title type='text'>Part 2: Comparing Numerics in Pentaho Data Integration</title><content type='html'>As a followup to my &lt;a href="http://www.johndz.com/2010/03/comparing-numerics-in-pentaho-data.html"&gt;previous post&lt;/a&gt; about comparing numeric values, I've since discovered a little more about the problem. To repeat my original problem: certain numeric field values that should be equal are being detected as different in the Filter rows step.  I think it's important to be able to perform accurate comparisons since it is a frequent task in data quality analysis.&lt;br /&gt;&lt;br /&gt;Originally, I assumed this had something to do with jdbc. However, since I can re-produce the issue without any SQL, I'm sure this has nothing to do with the version of the MySQL Connector/J jdbc driver. I tried the 5.0.8 version of the driver and I observed the same behavior. I couldn't even get my transform to work correctly with the 5.1.12 version of the connector -- it does not recognize column aliases in my SQL query.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Now for the rest of the story: &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;My comparison of numeric data was between 2 fields from two data streams, initiated by two separate SQL table inputs.&lt;ul&gt;&lt;li&gt;The first data stream is from a "raw" table. From the input step it is passed through a "sort rows" and then a "group by" step to aggregate the numeric values.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;The second data stream is from a "rollup" table where the raw data is summarized.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;The two streams are then merged ( by a unique id) and compared in order to validate the data in the rollup.&lt;br /&gt;&lt;br /&gt;At this point, the problem seems more related to the metadata of the fields. I found two resolutions to choose from:&lt;br /&gt;&lt;br /&gt;Use "group by" in SQL. The data types of the output numerics are magically set to BigNumber.&lt;br /&gt;&lt;br /&gt;or&lt;br /&gt;&lt;br /&gt;Place a "Select/Rename Values" step &lt;span style="font-style: italic;"&gt;after &lt;/span&gt;the sort and &lt;span style="font-style: italic;"&gt;before &lt;/span&gt;the "Group By" step to coerce the metadata of the fields to be of  BigNumber type.&lt;br /&gt;&lt;br /&gt;Personally, I prefer the second option because I like to extract the data as quickly as possible. Pentaho can handle sorting and grouping of somewhat large datasets just fine.&lt;br /&gt;&lt;br /&gt;I have a new example transform here: &lt;a href="http://docs.google.com/leaf?id=0Bwxr6BDRJ4-VYzQ1ZGIxNDUtYzdjNi00NTM0LTk1ODAtYzkwY2E2NmUyMjJh&amp;amp;hl=en"&gt;numeric_compare_filter_values_try2.ktr&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here is a picture of the example transform:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_3oz1OTvwqJ4/S-DtikYnaJI/AAAAAAAAA9g/0Vba3Cet8eU/s1600/numerics_transform.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 513px; height: 344px;" src="http://4.bp.blogspot.com/_3oz1OTvwqJ4/S-DtikYnaJI/AAAAAAAAA9g/0Vba3Cet8eU/s400/numerics_transform.gif" alt="" id="BLOGGER_PHOTO_ID_5467631125766039698" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-989120166219919657?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/989120166219919657/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=989120166219919657' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/989120166219919657'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/989120166219919657'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2010/05/part-2-comparing-numerics-in-pentaho.html' title='Part 2: Comparing Numerics in Pentaho Data Integration'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_3oz1OTvwqJ4/S-DtikYnaJI/AAAAAAAAA9g/0Vba3Cet8eU/s72-c/numerics_transform.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-7288536554416550355</id><published>2010-04-26T12:32:00.002-06:00</published><updated>2010-04-26T12:45:02.042-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Hunting'/><title type='text'>Grilled Wild Duck with Spicy Orange Glaze</title><content type='html'>This isn't really a recipe, but rather a chronicle of my experience grilling wild duck. I've tried a few different ways of grilling duck and up to now the results have been "ok", but less than perfect. &lt;br /&gt;&lt;br /&gt;Last night was different, possibly my finest hour at the grill.  &lt;br /&gt;&lt;br /&gt;The lucky duck was a small mallard hen that I shot during hunting season. For those of you who have never had wild game, it is not like any domestic bird you can buy. The meat is lean and dark red. The skin has a layer of fat and is quite thick. I'm sure that a big reason why ducks have no trouble swimming in near-freezing water. &lt;br /&gt;&lt;br /&gt;Cooking these birds can be tricky because if you over-cook the meat it tastes like liver. To compound this, the juices flow red/pink if you cook it properly, and we all know that a big no-no when grilling chicken. I know that I had a tendency to overcook duck because of the "clear juice" hang-up. Conventional wisdom says that wild duck should be cooked until it appears like rare steak. While there are recipes on the internet, there's not a lot of information about specific cooking techniques. &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Here's how I did it. I'm the only one in the family who will eat game birds, so this recipe feeds one:&lt;br /&gt;&lt;br /&gt;Step one, the spice. There's no need to re-invent the wheel here. My favorite spice is Emeril's Essence. I made a batch for some seafood pasta a few months ago and I've since found it to be useful for many dishes. You can make it by following the instructions here:&lt;br /&gt; &lt;br /&gt;&lt;a href="http://www.foodnetwork.com/recipes/emeril-lagasse/emerils-creole-seasoning-essence-recipe3/index.html"&gt;http://www.foodnetwork.com/recipes/emeril-lagasse/emerils-creole-seasoning-essence-recipe3/index.html&lt;br /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Before you start the next steps, get the duck cleaned up and resting to room temperature. Don't cook a cold duck. They just don't like it. &lt;br /&gt;&lt;br /&gt;Next, prepare the orange sauce. Combine 2/3 cup Orange Juice and a chicken bouillon cube in a pan and heat until there is no evidence that you stooped so low as to use a bouillon cube instead of chicken stock. The spice is going to make up for it. Nobody will know. &lt;br /&gt;&lt;br /&gt;Add about 1/3 cup of Orange Marmalade to the pan and simmer over medium heat until the sauce becomes syrup-like, stirring frequently.  &lt;br /&gt;&lt;br /&gt;Once that done, add two teaspoons of the Emeril Essence spice. Watch the orange sauce come to life. &lt;br /&gt;&lt;br /&gt;Get the grill started and ready. HOT, HOT, HOT. About 450 degrees. Keep the lid down. &lt;br /&gt;&lt;br /&gt;Prepare the Duck: Use shears or a strong knife to split the bottom, and crack the bones so that it lays flat, breast side up. This means there is no “cavity”. You don't want to cook this like a whole chicken because the cavity will require you to cook it longer and that guarantees the liver taste. Use a sharp knife and "gently" make some criss-cross cuts on the skin. This is going to help the fat flow a little. It's also going to make Mrs. Mallard flammable. Make sure the skin is dry too by patting with a paper towel. &lt;br /&gt;&lt;br /&gt;Now brush on the orange spicy sauce on both sides, especially the skin. Don't be shy. Put on a good layer. &lt;br /&gt;&lt;br /&gt;Pour your beer now, and get a stop watch. You will want to time this carefully, without distraction. &lt;br /&gt;&lt;br /&gt;Ready. Set. Grill! Place the duck on the hot grill, Breast side up/Bone side down. Work quick and get that lid closed ASAP. Cook for 10 minutes and then flip. Quickly so you don't lose too much heat!&lt;br /&gt;&lt;br /&gt;Now things may get violent. I did have a few times when the duck actually caught on fire. A little bit is ok, but you've got to pay attention and move it to a higher rack for a few moments, but always try to keep it in a high temperature with the lid closed. After about 6 minutes, it should be done. The skin will be slightly burnt, but still quite savory, and the juices will flow red. The meat will look like rare steak. &lt;br /&gt;&lt;br /&gt;Some points to ponder:&lt;br /&gt;&lt;br /&gt;No need for butter or oil. The fat in the skin is more then enough. &lt;br /&gt;&lt;br /&gt;Cook fast and hot. Turning only once. I think this made the difference. &lt;br /&gt;&lt;br /&gt;No body cavity. By laying flat on the grill, the duck cooks evenly and much faster. &lt;br /&gt;&lt;br /&gt;Most of the wild game sauces that I've tried seem to mask the taste of the duck. The spice coupled with a simple, mild orange sauce really helped highlight the dish.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-7288536554416550355?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/7288536554416550355/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=7288536554416550355' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/7288536554416550355'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/7288536554416550355'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2010/04/grilled-wild-duck-with-spicy-orange.html' title='Grilled Wild Duck with Spicy Orange Glaze'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-7277183797836170454</id><published>2010-04-13T20:55:00.000-06:00</published><updated>2010-04-13T20:54:52.525-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='Pentaho'/><title type='text'>Comparing Numerics in Pentaho Data Integration / Kettle</title><content type='html'>&lt;div&gt;While working on a transformation I ran into a problem with comparing two (seemingly) identical numbers using the Filter Rows step. I had a case where a transformation selected two DECIMAL(13,5) values from the database and compared them.&lt;br /&gt;&lt;br /&gt;I could see that the numbers were identical in the MySQL database, but the Filter Rows step returned false when comparing.  To troubleshoot, I tried multiplying the difference of the two numbers by 10,000,000 ( in the transform) and I actually discovered a very small difference beyond the 5th decimal place. This datatype in MySQL is considered an "exact" datatype, not to be confused with a FLOAT.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;My solution is to convert the two fields to Strings and do the comparison. If you don't like that, then explicitly round the numbers using the Calculator step. The Select and Alter step, doesn't seem to truncate the numbers and I don't think it was intended to alter the raw data anyway.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;I haven't had time to figure out if or how PDI is converting the numbers, but my gut feeling is that the road to flakiness passes through the MySQL jdbc driver installed with PDI 3.2. Its somewhat old, and my experiments with newer versions (of the driver) introduced new problems.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;The link below is an example tranformation ( version 3.2) showing what I am talking about. See the Generate rows step and notice the very small difference between the two numbers&lt;br /&gt;&lt;br /&gt;&gt; &lt;a href="http://docs.google.com/leaf?id=0Bwxr6BDRJ4-VNGNiYjE3ZGYtZWZmOC00MWU1LTkxOWItNTZmZGJmNzU5Mjk0&amp;amp;hl=en"&gt;numeric_compare_filter_values.ktr&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Enjoy.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-7277183797836170454?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/7277183797836170454/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=7277183797836170454' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/7277183797836170454'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/7277183797836170454'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2010/03/comparing-numerics-in-pentaho-data.html' title='Comparing Numerics in Pentaho Data Integration / Kettle'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-6078662924290059098</id><published>2010-03-04T05:03:00.002-07:00</published><updated>2010-03-04T05:16:56.123-07:00</updated><title type='text'>Sea trials begin for new data warehouse</title><content type='html'>Over the last few months I've been working on the ETL for the first phase of a new data warehouse. Today this will be deployed to production and subjected to a burn in period. During this time, I'll be watching performance and data quality.  Surely there will be some refinements in store.&lt;br /&gt;&lt;br /&gt;The ETL is performed exclusively using Pentaho Data Integration ( 3.2.0 GA)  on an Open Solaris / MySQL 5.1.43 platform. Source data is extracted from 3 separate production systems to a staging area,  and then posted to a fact table with 8 dimensions of various types, including slowly changing hybrids.&lt;br /&gt;&lt;br /&gt;Although I haven't blogged in a while, I have been keeping notes of my travels and I'll be publishing those little by little.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-6078662924290059098?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/6078662924290059098/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=6078662924290059098' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/6078662924290059098'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/6078662924290059098'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2010/03/sea-trials-begin-for-new-data-warehouse.html' title='Sea trials begin for new data warehouse'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-998743593312307742</id><published>2009-11-09T21:10:00.005-07:00</published><updated>2009-11-09T21:50:21.350-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data quality'/><title type='text'>Data Quality -- Manufacturing Perspective</title><content type='html'>&lt;span style="font-size:180%;"&gt;O&lt;/span&gt;&lt;span id="SPELLING_ERROR_0" class="blsp-spelling-error"&gt;ver&lt;/span&gt; the years I’&lt;span id="SPELLING_ERROR_1" class="blsp-spelling-error"&gt;ve&lt;/span&gt; had many conversations, and a few arguments, about the subject of Data Quality. The topic often arises after an executive suffers a brain explosion during a how-could-this-happen? meeting.&lt;br /&gt;&lt;br /&gt;Some folks look at Data Quality as the effort to find problems in the database, and once they are found and fixed(maybe), case closed, game over. Actually, to truly improve the quality of data, you'll need a bit of a corporate lifestyle change, and some pain and resistance to go with it. To realize what you are getting into, it's probably helpful to view information systems in a different light. So lets step away from the computer for a moment and let this thought simmer:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;strong&gt;If you have data intensive operations going on in your organization then you are manufacturing one or more information products.&lt;br /&gt;&lt;/strong&gt;&lt;/em&gt;&lt;br /&gt;Reflect on this analogy [1] and notice how it changes the view of your company. The software, operating systems, and databases are the machinery. The entire staff becomes Customers – functionally dependent consumers of at least some of your information products for day to day ops or strategic decision making. Your paying customers suddenly become consumers of products you don’t even sell such as web-based order entry or any website with a “Customer &lt;span id="SPELLING_ERROR_2" class="blsp-spelling-error"&gt;Login&lt;/span&gt;” screen.&lt;br /&gt;&lt;br /&gt;The manufacturing perspective expands your product line too. Think of every quantifiable unit of information provided to the Customers and you’ll see a whole new product list. Consider every information product that your organization manufactures and ask yourself a simple question: &lt;strong&gt;&lt;em&gt;Is it right?&lt;/em&gt;&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;For Reference&lt;br /&gt;&lt;span style="font-size:85%;"&gt;1. The manufacturing perspective &lt;span id="SPELLING_ERROR_3" class="blsp-spelling-error"&gt;isn&lt;/span&gt;’t my idea. I stumbled on this back in 2003 and I think it is still very &lt;span id="SPELLING_ERROR_4" class="blsp-spelling-corrected"&gt;relevant&lt;/span&gt; as a foundation for a data quality system:&lt;br /&gt;&lt;/span&gt;&lt;a href="http://web.mit.edu/tdqm/www/tdqmpub/WangCACMFeb98.pdf"&gt;&lt;span style="font-size:85%;"&gt;A Product Perspective on Total Data Quality Management ( Richard Y. Yang)&lt;/span&gt;&lt;/a&gt;&lt;span style="font-size:85%;"&gt; &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-998743593312307742?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/998743593312307742/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=998743593312307742' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/998743593312307742'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/998743593312307742'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2009/11/data-quality-manufacturing-perspective.html' title='Data Quality -- Manufacturing Perspective'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-7905000255502525440</id><published>2009-10-26T22:40:00.000-06:00</published><updated>2009-10-26T22:47:44.594-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='Pentaho'/><title type='text'>Setting up Development and Production Pentaho PDI Repositories</title><content type='html'>I’ve been setting up a Pentaho Data Integration system with the goals of supporting collaboration with my team, allowing easy deployment to test or production, and enabling remote monitoring and troubleshooting of jobs and tranformations.&lt;br /&gt;&lt;br /&gt;I’ve finally figured out a way to achieve these goals, so I’ll try to pass this on now. I found the book "&lt;em&gt;Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL&lt;/em&gt;", by Roland Bouman and Jos van Dongen to be a big help in figuring out how to export/import. It definitely helped me get up and running quickly.&lt;br /&gt;&lt;br /&gt;My first decision was to bet the farm on the use of a repository. A file based system would probably work, but I felt that it would require too much file distribution and usage of remote terminals. So I’ve setup two separate repositories hosted on MySQL databases: One for development (DEV), and one for Production (PRD). Here are the steps I followed.&lt;br /&gt;&lt;br /&gt;Create the DEV repository first and then use the Repository Explorer to configure the database connections. Simply add each connection, one each for each database referenced by your ETL process. Configure these connections point to the development database instances. The gotcha I found was that any local connections I had defined outside the repository were eventually loaded in the repository. The workaround is to simply stop maintaining connection information in your user Kettle configuration, which is maintained in the file: ~/.kettle/shared.xml. I removed this file after deciding to “Go Repo”. So we have Recommendation #1:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;-- Use the repository to store database connections and do not use localy defined connections.&lt;br /&gt;&lt;/em&gt;&lt;br /&gt;In the repository explorer, create your user accounts. I created separate “administrative” accounts for my colleagues, and a special “etl_user” account to be used only for execution ( using kitchen.sh) of the jobs.&lt;br /&gt;&lt;br /&gt;Create the Production (PRD) Repository. You can cheat a bit, by using mysqldump to backup your development database and simply restore it on the host database that will be used for the production repository.&lt;br /&gt;&lt;br /&gt;Connect to the new PRD repository and open the repository explorer. Then modify the existing database connections so that they have the same name, but point to production instances of the databases. &lt;strong&gt;It is very important that you not change the name of the connections.&lt;/strong&gt; Your jobs and transformations will reference these “standardized” names. When you deploy a job to PRD ( explained below), you will not have to change the database connections if you do it this way.&lt;br /&gt;&lt;br /&gt;Now you have two (almost) identical repositories, one for Development, and one for Production. So what are you waiting for? Start developing!&lt;br /&gt;&lt;br /&gt;In the new development repository, create a subdirectory for your first real job and its transformation steps. This will help exploit the export functionality of Spoon. It is very easy to dump all of the jobs in a repository subdirectory. And that brings me to my next recommendation:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;-- Bundle a job and its individual steps ( other jobs and transformations) into a single repository subdirectory. Basically anything that is used by a single line invocation of kitchen.sh belongs in this subdirectory.&lt;br /&gt;&lt;/em&gt;&lt;br /&gt;After development and testing of the job is done, you are ready to deploy this to production. This is a 2 phase process: Export your job and its components from the DEV repository, then import into the PRD repository. First some prerequisites. We need to tidy up the global options of Spoon. In spoon, choose the menu option:&lt;br /&gt;&lt;br /&gt;Edit -&gt; Options&lt;br /&gt;&lt;br /&gt;Make sure the following two options are checked:&lt;br /&gt;“Only save used connections to XML”&lt;br /&gt;“Ask about replacing existing connections on open/import”&lt;br /&gt;&lt;br /&gt;And &lt;strong&gt;&lt;span style="color:#ff0000;"&gt;UNCHECK&lt;/span&gt;&lt;/strong&gt; ( please or this whole process will break) :&lt;br /&gt;“Replace existing connections on open/import”&lt;br /&gt;&lt;br /&gt;Now you are ready to export your development job. Using the repository explorer on your DEV repository, expand the “Jobs” folder. Select the sub-folder you want to export. Right click on it and select “Export all objects to an XML file” . Using the displayed dialog box, browse to a temporary directory where you want to save to, and give the file a name. Any name will do. For good luck, call it johndz. Click OK. This automatically includes the transformations that are in the jobs. A very cool feature in my opinion.&lt;br /&gt;&lt;br /&gt;Disconnect from your DEV repository.&lt;br /&gt;&lt;br /&gt;Connect to your PRD repository, and open the repository explorer. The repository explorer has a single menu item: File. Select the menu option:&lt;br /&gt;&lt;br /&gt;File -&gt; Import repository into directory&lt;br /&gt;&lt;br /&gt;Navigate to the file that you just exported and Click OK. A dialog “Directory Selection dialog” appears. Select the root “/” line, and click OK.&lt;br /&gt;&lt;br /&gt;For first time imports, a new dialog appears: “Create Directory” . This will ask to create a new repository directory. Answer “Yes” .&lt;br /&gt;&lt;br /&gt;The import will proceed. Sometimes it can take a few minutes. When the last line of the “Repository Import” window says “Import finished”, you are done.&lt;br /&gt;&lt;br /&gt;To invoke your new job using kitchen, reference the repository, and the job. Here is a sample kitchen.sh line out of my script. ( remember that kitchen.sh references ~/.kettle/repositories.xml ) :&lt;br /&gt;&lt;br /&gt;#!/bin/ksh&lt;br /&gt;installdir=/usr/local/pentaho/data-integration/&lt;br /&gt;${installdir}/kitchen.sh -rep=dev_pentaho -user=etl_user -pass=whatever -dir='source_customers_to _staging' -job='job_load_stage_customers' -level=Basic&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-7905000255502525440?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/7905000255502525440/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=7905000255502525440' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/7905000255502525440'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/7905000255502525440'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2009/10/setting-up-development-and-production.html' title='Setting up Development and Production Pentaho PDI Repositories'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-1071770291102928621</id><published>2009-10-15T12:06:00.002-06:00</published><updated>2010-04-15T09:13:40.641-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='Pentaho'/><title type='text'>Displaying stored procedure result set meta data in Pentaho</title><content type='html'>This probably won't be a very well written post since I am working frantically on a proof of concept using Pentaho Data Integration / Kettle for the etl in a new data warehouse project. I have just a couple days to get it to work or I'll end up having to use perl...which will make me hurl.&lt;br /&gt;&lt;br /&gt;I want to use a mysql stored procedure for the transformation input, which is easy to do with the "Table Input" step ( just CALL the stored proc in the SQL section), but the field names of the result set don't show up downstream in subsequent steps. When I right click on a downstream step and select "show input fields", an "I Can't find any fields" messagebox pops up.&lt;br /&gt;&lt;br /&gt;Some may find this a minor annoyance, but it makes subsequent steps difficult to deal with if you cant visualize the structure of the data stream in the transformation.&lt;br /&gt;&lt;br /&gt;I saw some posts recommending the use of a "Select Values" Step, but for some reason, I could not get that to work. I spent way too much time trying several different permutations in this step.&lt;br /&gt;&lt;br /&gt;This can't really consider this a bug since it seems unreasonable to expect software to reverse engineer a stored procedure to extract result set metadata.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;RESOLUTION:&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;My work around seems to be working ok so far:&lt;br /&gt;&lt;br /&gt;1) Create a "Generate Rows" Step, with 0 ( zero) rows output. In this step, configure the fields and datatypes to match the result set of your stored procedure. This step is essentially nothing more than a specification.&lt;br /&gt;&lt;br /&gt;2) Create a "Table Input" step that calls your stored procedure.&lt;br /&gt;&lt;br /&gt;3 ) Create an "Append Streams" step that merges 1) and 2) above. Set the Head Hop to 1) above, and the Tail Hop to the step from 2) above.&lt;br /&gt;&lt;br /&gt;4) create your downstream step and create a hop to it from the "Append Streams". If you right click on this step and select "show input field fields", you can now see the structure of the stream.&lt;br /&gt;&lt;br /&gt;Here is a rough picture:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_3oz1OTvwqJ4/Stdqjvq-EVI/AAAAAAAAA84/pwv-GYBqiUU/s1600-h/image002.jpg"&gt;&lt;img style="MARGIN: 0px 10px 10px 0px; WIDTH: 320px; FLOAT: left; HEIGHT: 240px; CURSOR: hand" id="BLOGGER_PHOTO_ID_5392896241124118866" border="0" alt="" src="http://2.bp.blogspot.com/_3oz1OTvwqJ4/Stdqjvq-EVI/AAAAAAAAA84/pwv-GYBqiUU/s320/image002.jpg" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_3oz1OTvwqJ4/Stdqjvq-EVI/AAAAAAAAA84/pwv-GYBqiUU/s1600-h/image002.jpg"&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_3oz1OTvwqJ4/Stdqjvq-EVI/AAAAAAAAA84/pwv-GYBqiUU/s1600-h/image002.jpg"&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-1071770291102928621?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/1071770291102928621/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=1071770291102928621' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/1071770291102928621'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/1071770291102928621'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2009/10/displaying-stored-procedure-result-set.html' title='Displaying stored procedure result set meta data in Pentaho'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_3oz1OTvwqJ4/Stdqjvq-EVI/AAAAAAAAA84/pwv-GYBqiUU/s72-c/image002.jpg' height='72' width='72'/><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-2044260084606922931</id><published>2009-09-21T23:03:00.000-06:00</published><updated>2009-09-21T23:31:47.812-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Hunting'/><title type='text'>Grilled Dove</title><content type='html'>&lt;span style="font-size:180%;"&gt;&lt;strong&gt;D&lt;/strong&gt;&lt;/span&gt;ove season opened on September 1 for Colorado hunters, and I was anxious to get my new 12 Ga. out in the field. I've never hunted for doves before, but my friend Mark promised a fun day and something good to take home for dinner. 130 or so shots later, I have some nice dove breasts in the freezer.&lt;br /&gt;&lt;br /&gt;Sadly, it looks like I will be eating them myself. It turns out that friends and family alike find doves to be similar to pidgeons: slow, ugly, sidewalk grazing, garbage eating, and rather un-appetizing.&lt;br /&gt;&lt;br /&gt;For some reason, The Dove never made it into the class of sexy game birds such as Pheasant, Grouse, or Quail. I think I'll try to give a little boost to my new friend, the bird of peace.&lt;br /&gt;&lt;br /&gt;First of all, they fly fast and furious, rarely stopping, except to laugh at me after I've emptied my gun. So lets call "the slow thing" Myth #1.&lt;br /&gt;&lt;br /&gt;As far as eating habits go, the 35 acre property that we hunted on was lush with wild seed bearing plants, and when we cleaned the doves we'd shot it was amazing to see how gorged they were with these seeds. So, I'd say that qualifies as Free-Range, 100% Organic.&lt;br /&gt;&lt;br /&gt;On to my favorite part of hunting, which happens to take place in the kitchen. To me, wild game is a treat, and should be prepared accordingly. I couldn't really find a good recipe on the internet, but I did find one that sounded good for the ever popular Chicken, which just so happens to be the gold-standard for determining fitness for human consumption.&lt;br /&gt;&lt;br /&gt;This recipe comes from "&lt;em&gt;Webers Big Book of Grilling&lt;/em&gt;" by Jamie Purviance and Sandra S. McRae. Its called Orange-Pine nut Chicken Breasts, but I've adapted it for the far more tasty Dove.&lt;br /&gt;&lt;br /&gt;Combine 1/2 cups each of Orange Juice, white wine, and throw in a tablespoon each of minced garlic, finely chopped Italian Parsley, and Olive Oil. Finally, add a teaspoon each of grated orange zest, dried thyme leaves and a half teaspoon of Kosher Salt.&lt;br /&gt;&lt;br /&gt;Mix all of those ingredients together and marinate the dove breasts for at least an hour ( I used a plastic sealed bag).&lt;br /&gt;&lt;br /&gt;After that long hour, I skewered the breasts and lightly brushed on some olive oil. (Save the Marinade!! )&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_3oz1OTvwqJ4/Srhcwrk5KHI/AAAAAAAAA8I/Rtyeu7QV4NQ/s1600-h/dove_1.JPG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5384155345922369650" style="MARGIN: 0px 10px 10px 0px; WIDTH: 320px; CURSOR: hand; HEIGHT: 240px" alt="" src="http://2.bp.blogspot.com/_3oz1OTvwqJ4/Srhcwrk5KHI/AAAAAAAAA8I/Rtyeu7QV4NQ/s320/dove_1.JPG" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;While the breasts were grilling on indirect medium heat, I toasted 1/3 cup of pine nuts and then added the entire contents of the marinade, along with a few squirts of honey. This was simmered very gently until it thickened. I also brushed some of this on the dove breasts when they were close to being done.&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;a href="http://2.bp.blogspot.com/_3oz1OTvwqJ4/SrhdMDkjNPI/AAAAAAAAA8Q/gorQ5pXnAgo/s1600-h/dove_3.JPG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5384155816219849970" style="MARGIN: 0px 10px 10px 0px; WIDTH: 320px; CURSOR: hand; HEIGHT: 240px" alt="" src="http://2.bp.blogspot.com/_3oz1OTvwqJ4/SrhdMDkjNPI/AAAAAAAAA8Q/gorQ5pXnAgo/s320/dove_3.JPG" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;In spite of slightly over-cooking the doves, I have to say this was my finest hour on the grill. The taste was pleasant, not too gamey ( even though I like "gamey" ), tender and lean. And the pine nut sauce defintely added an exotic flavor. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/_3oz1OTvwqJ4/Srhdbdqn8ZI/AAAAAAAAA8Y/14qibGUeORI/s1600-h/dove_4.JPG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5384156080922685842" style="MARGIN: 0px 10px 10px 0px; WIDTH: 320px; CURSOR: hand; HEIGHT: 240px" alt="" src="http://3.bp.blogspot.com/_3oz1OTvwqJ4/Srhdbdqn8ZI/AAAAAAAAA8Y/14qibGUeORI/s320/dove_4.JPG" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-2044260084606922931?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/2044260084606922931/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=2044260084606922931' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/2044260084606922931'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/2044260084606922931'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2009/09/grilled-dove.html' title='Grilled Dove'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_3oz1OTvwqJ4/Srhcwrk5KHI/AAAAAAAAA8I/Rtyeu7QV4NQ/s72-c/dove_1.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-7293831241176228215</id><published>2009-08-26T19:53:00.000-06:00</published><updated>2009-08-26T08:16:07.920-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>file io tracing on centos ( part 1)</title><content type='html'>I'm now working in a CentOS Linux environment, and since my Solaris days are over, I'm feeling a bit handicapped by not having DTrace and iosnoop. It still amazes me that monitoring MyISAM table data and index IO is not more common, especially in heavy duty environments.&lt;br /&gt;&lt;br /&gt;My search for an analogous tracing system on Linux lead me to &lt;span style="FONT-WEIGHT: bold;font-family:courier new;" &gt;strace.&lt;/span&gt; I have been playing with this a little and although I am disappointed overall, I have been able to figure out how to gather what I need. It isn't pretty.&lt;br /&gt;&lt;br /&gt;So far, my favored invocation of strace on an active mysqld process is :&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="FONT-WEIGHT: bold;font-family:arial;" &gt;strace -q -tt -e trace=open,close,read,write,pread64,pwrite64 -s 256 -f -v -p &lt;mysql&gt;2&gt;&amp;amp;1 &lt;/mysql&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;strace by default directs it's output to STDERR, unless you specify a file ( -o ). I wanted the output to go to STDOUT so I can pipe it into a script that will roll up the output and report by the minute. My goal will be to print an example output something like :&lt;br /&gt;(blogger doesn't usually honor my formatting..not sure how this will look)&lt;br /&gt;&lt;br /&gt;&lt;span style="FONT-WEIGHT: bold;font-size:85%;" &gt;&lt;span style="font-family:arial;"&gt;time file MbReadMin ReadMin MbWriteMin WriteMin &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;==== ======================== &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;13:04 /mnt/mysql1/customers.MYI 26 134 0 0 &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;13:04 /mnt/mysql1/sales_summary.MYD 0 0 15 124 &lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This involves a bit of parsing of the output depending on the syscall. The open() syscall gives the actual file name and the "id" , which can be cross referenced with the pread64() and pwrite64() functions, where I am getting the size.&lt;br /&gt;&lt;br /&gt;The problem is that you can only get the file name if the table was not open before you invoke strace. And beware, this is a cpu hog.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-7293831241176228215?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/7293831241176228215/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=7293831241176228215' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/7293831241176228215'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/7293831241176228215'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2009/08/file-io-tracing-on-centos-part-1.html' title='file io tracing on centos ( part 1)'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-3168667941020253701</id><published>2009-08-15T08:52:00.000-06:00</published><updated>2009-08-16T21:16:33.998-06:00</updated><title type='text'>parting shots</title><content type='html'>This was my last week working at Wiland Direct and as a parting shot, I really wanted to get this project out of the way. The adventures started on Saturday with the rebuilding of a data warehouse server. The goal was to convert the existing ZFS file system to UFS, and standardize the disk layouts of the MyISAM tables to a new scheme that was more friendly to our day to day processes. This dw server, one of 5, was an experiment with ZFS. My beef with ZFS was that I could not trace table level data and index IO using Dtrace ( &lt;a href="http://opensolaris.org/jive/thread.jspa?threadID=70175"&gt;http://opensolaris.org/jive/thread.jspa?threadID=70175&lt;/a&gt; ) .&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The game plan was pretty simple:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Halt Processing&lt;/li&gt;&lt;li&gt;Backup&lt;/li&gt;&lt;li&gt;Validate backup&lt;/li&gt;&lt;li&gt;Shutdown MySql&lt;/li&gt;&lt;li&gt;Capture "before" catalog of all database files. ( my backup system does this). &lt;/li&gt;&lt;li&gt;Remove existing file systems and database files. &lt;/li&gt;&lt;li&gt;Create new file systems. &lt;/li&gt;&lt;li&gt;Touch and symbolic link "empty" database files to the default data directory. &lt;/li&gt;&lt;li&gt;Restore ( cp -p honors symbolic links) &lt;/li&gt;&lt;li&gt;Validate restore. &lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;This went well. And it was so much fun, that we decided to do the remaining 4 servers, at the same time, on Tuesday night. It was a long day, but I went fishing during the backup phase and caught this guy:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img id="BLOGGER_PHOTO_ID_5370763302473636834" style="DISPLAY: block; MARGIN: 0px auto 10px; WIDTH: 200px; CURSOR: hand; HEIGHT: 150px; TEXT-ALIGN: center" alt="" src="http://4.bp.blogspot.com/_3oz1OTvwqJ4/SojIxKClG-I/AAAAAAAAA8A/rgNwAFvbg-8/s200/0811091927.jpg" border="0" /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;When all was said and done, 5 transaction servers, with 53,000 tables consuming 2.5 Tera-bytes of MyISAM data and index were backed up, rebuilt, and restored.&lt;br /&gt;-----------------------------------------------------&lt;br /&gt;&lt;br /&gt;Friday was my last day before starting a new job at Lijit Networks. The many kind wishes from my former colleagues leaves me truly humbled. This was a really great gig with some awesome folks.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-3168667941020253701?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/3168667941020253701/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=3168667941020253701' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/3168667941020253701'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/3168667941020253701'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2009/08/parting-shots.html' title='parting shots'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_3oz1OTvwqJ4/SojIxKClG-I/AAAAAAAAA8A/rgNwAFvbg-8/s72-c/0811091927.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-6251720295244222021</id><published>2008-11-14T07:07:00.001-07:00</published><updated>2009-08-14T00:18:03.887-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>On-the-fly Mysql Query tracing with DTrace</title><content type='html'>Hearing that the MySQL database server is delivering corrupt data is not how I like to start the day, but that was the greeting when I arrived to work a couple days ago.&lt;br /&gt;&lt;br /&gt;The claim was that an INSERT...SELECT query buried in a batch process was producing a different result than running the same query manually via the mysql client.&lt;br /&gt;&lt;br /&gt;Since this is a production system, and a busy one at that, we don't keep the general log enabled. Restarting the server to enable logging is not an option during peak loads either.&lt;br /&gt;&lt;br /&gt;Luckily, I stumbled on a DTrace script out there in internet land and it let me see exactly what was happening. I wish I can remember where I got it so I could give due credit. Here is the script:&lt;br /&gt;&lt;br /&gt;#!/usr/sbin/dtrace -qs&lt;br /&gt;pid$1::*dispatch_command*:entry&lt;br /&gt;{&lt;br /&gt;printf("%d::%s\n",tid,copyinstr(arg2));&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;Thats it. Just execute it on your Solaris 10 server passing the mysqld pid as a command line parameter and you see each query issued to the server. When you have seen enough, just cntl + c to quit.&lt;br /&gt;&lt;br /&gt;I used this to trace the batch process and in no time, we determined that there was not one, but two inserts buried within. MySQL was acquitted. Case closed.&lt;br /&gt;&lt;br /&gt;----------&lt;br /&gt;This hardly scratches the surface of what I believe DTrace can be used for. I've used it a bit to trace some function calls in mysqld ( using the pid provider), and it does offer an interesting glimpse into MySQL's inner self.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-6251720295244222021?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/6251720295244222021/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=6251720295244222021' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/6251720295244222021'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/6251720295244222021'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2008/11/dtrace-saves-day_14.html' title='On-the-fly Mysql Query tracing with DTrace'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-1709374848433517791</id><published>2008-09-25T20:43:00.001-06:00</published><updated>2009-08-14T00:18:40.338-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Preventing self-inflicted wounds in MySQL</title><content type='html'>&lt;span style="font-size:180%;"&gt;My&lt;/span&gt; 8 year old son is learning to ride a skateboard and his most recent trick was to skip wearing a helmet. As a father, I had no choice but to enforce that safety requirement. I'll gladly be unpopular in order to avoid seeing my son with a head injury.&lt;br /&gt;&lt;br /&gt;Today, a user accidentally deleted 100 or so rows from a critical table in the MySQL database, with the "aid" of one of the popular and free GUI query utilities available. While I loath the use of these utilities, I understand that sometimes ad-hoc updates are unavoidable, especially in new systems.&lt;br /&gt;&lt;br /&gt;It wasn't a problem for me to recover, and actually in a sick way, I enjoy recoveries. Wierdly, I even practice them sometimes, mostly so I don't look like a fool when it's showtime for real. Fun stuff as long as the backups are good!&lt;br /&gt;&lt;br /&gt;But the fact is that I would be negligent as a DBA if I didn't try to find ways to motivate our teams to operate safely. I wish the popular query utilities had built in safety mechanisms, but since they don't, I have asked the teams to execute these statements in our InnoDB databases before performing uncontrolled data manipulation:&lt;br /&gt;&lt;br /&gt;&lt;span style="COLOR: rgb(51,0,153)"&gt;-&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="COLOR: rgb(51,0,153);font-family:courier new;" &gt;- requires explicit "commit" for changes to take affect.&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;SET autocommit = 0 ;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="COLOR: rgb(51,0,153)"&gt;-- forces "well formed" dml queries&lt;/span&gt; &lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;SET sql_mode='TRADITIONAL';&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="COLOR: rgb(51,0,153);font-family:courier new;" &gt;-- requires that you specify key values in the WHERE clause of your update statements&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;SET sql_safe_updates=1 ;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;START TRANSACTION ;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;insert...&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;br /&gt;update....&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;select row_count() as "rows_updated" ;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;delete ....&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;br /&gt;select row_count() as "rows_deleted" ;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="COLOR: rgb(51,0,153)"&gt;-- review results:&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;select....from...where....&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="COLOR: rgb(51,0,153)"&gt;-- Flip commented lines after initial review. &lt;/span&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;rollback ;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;-- commit ; &lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Finally, I try to encourage folks to contact me ahead of time so we can coordinate a backout plan. Often I can make a hot backup of the data before the change so we can backtrack if things go wrong.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-1709374848433517791?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/1709374848433517791/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=1709374848433517791' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/1709374848433517791'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/1709374848433517791'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2008/09/preventing-self-inflicted-mysql-wounds_25.html' title='Preventing self-inflicted wounds in MySQL'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-5887189545183824204</id><published>2008-09-16T08:28:00.000-06:00</published><updated>2009-08-09T23:06:42.892-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Every DBA's Nightmare -- Got backups?</title><content type='html'>Here's a wakeup call to all the good folks on planet MySQL :&lt;br /&gt;&lt;br /&gt;&lt;a href="http://neighbors.denverpost.com/viewtopic.php?f=56&amp;amp;t=123003000"&gt;Denver Post: Gone: Comments...&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Today's a great day to:&lt;br /&gt;&lt;br /&gt;1) Review your backup logs.&lt;br /&gt;2) Perform a recovery drill on a development server.&lt;br /&gt;3) Validate the backed up data by running checksums, comparing tables or file sizes, etc.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-5887189545183824204?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/5887189545183824204/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=5887189545183824204' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/5887189545183824204'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/5887189545183824204'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2008/09/every-dba-nightmare-got-backups.html' title='Every DBA&amp;#39;s Nightmare -- Got backups?'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-4140814693577928168</id><published>2008-08-15T06:03:00.000-06:00</published><updated>2009-08-14T00:19:26.097-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='Solaris DTrace'/><title type='text'>MyISAM table level IO on Solaris using Dtrace</title><content type='html'>I've stumbled around for awhile searching for a way to observe IO at the table or file level on a MyISAM engine. Since I'm running on Solaris 10 I decided to explore dtrace and in my travels I found some handy scripts that made it easier.&lt;br /&gt;&lt;br /&gt;I used iosnoop which can be found here : &lt;a href="http://brendangregg.com/dtrace.html#DTraceToolkit"&gt;http://brendangregg.com/dtrace.html#DTraceToolkit&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;iosnoop can be pointed at a specific pid, in my case the pid of the mysqld process:&lt;br /&gt;&lt;br /&gt;iosnoop -v -p123456&lt;br /&gt;&lt;br /&gt;Since the probe fires when IO happens, this will send out quite a bit of output to STDOUT. So I wrote a perl script to call iosnoop and summarize Read and write IO Counts and Bytes by the minute. I still have more testing and refinements but it seems to work correctly. I have probably over-scaled the bytes by converting to Mbytes.&lt;br /&gt;&lt;br /&gt;Here is a sample output:&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;HR:MN FILE R_Mbyte R_count W_Mbyte W_count&lt;br /&gt;00:08 /p732/cust/cust_xref_3.MYI 0 0 0 3&lt;br /&gt;00:08 /p749/cust/sales_fact_6.MYI 1 152 0 0&lt;br /&gt;00:08 /p744/cust/sales_fact_2.MYD 2 326 0 0&lt;br /&gt;00:08 /p745/cust/sales_fact_3.MYD 27 3527 0 0&lt;br /&gt;00:08 /p746/cust/sales_fact_4.MYD 12 1554 0 0&lt;br /&gt;00:08 /p743/cust/sales_fact_1.MYD 4 572 0 0&lt;br /&gt;00:08 /p738/cust/cust_xref_9.MYI 0 1 0 5&lt;br /&gt;00:08 /p747/cust/sales_fact_4.MYI 1 214 0 0&lt;br /&gt;00:08 /p741/cust/sales_fact_9.MYD 11 1546 0 0&lt;br /&gt;00:08 /p740/cust/sales_fact_7.MYI 1 175 0 0&lt;br /&gt;00:08 /p747/cust/sales_fact_5.MYD 4 596 0 0&lt;br /&gt;00:08 /p743/cust/sales_fact_0.MYI 0 130 0 0&lt;br /&gt;00:09 /p745/cust/sales_fact_2.MYI 1 166 0 0&lt;br /&gt;00:09 /p737/cust/cust_xref_8.MYI 0 0 0 7&lt;br /&gt;00:09 /p742/cust/sales_fact_0.MYD 15 1963 0 0&lt;br /&gt;00:09 /p749/cust/sales_fact_7.MYD 4 653 0 0&lt;br /&gt;00:09 /p742/cust/sales_fact_9.MYI 0 114 0 0&lt;br /&gt;00:09 /p748/cust/sales_fact_6.MYD 11 1514 0 0&lt;br /&gt;00:09 /p744/cust/sales_fact_1.MYI 1 220 0 0&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-4140814693577928168?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/4140814693577928168/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=4140814693577928168' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/4140814693577928168'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/4140814693577928168'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2008/08/myisam-table-level-io-on-solaris-using.html' title='MyISAM table level IO on Solaris using Dtrace'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-5285429981418234367</id><published>2008-02-03T00:21:00.000-07:00</published><updated>2009-08-14T01:13:12.798-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Dodge Cummins Diesel'/><title type='text'>Valley Cover Gasket Replacement</title><content type='html'>&lt;div&gt;&lt;div&gt;&lt;div&gt;My Dodge Diesel developed an oil leak on the drivers side of the engine, just below the head gasket. At first I thought it might actually be a blown head gasket, but after looking deeper I learned that it was leaking around the gasket of the "valley cover" also known as the Pushrod Cover.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This was definitely the most daunting repair I've ever attempted, mostly because I also had to remove the vp44 injection pump. I looked around on the internet and found this &lt;a href="http://www.youtube.com/watch?v=lPZR5an9LCY"&gt;instruction video&lt;/a&gt;, and I followed this procedure as precisely as a could...right down to replacing the camshaft position sensor as he recommends.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;What made this project so interesting was: &lt;/div&gt;&lt;ul&gt;&lt;li&gt;This was my first winter in Colorado and it was freakin' cold. I had to do the repairs in my driveway between snow storms. &lt;/li&gt;&lt;br /&gt;&lt;li&gt;I got to do it twice. Yes, twice. After I got the entire system back together, I started the engine and it still leaked oil. As it turned out, the actual cover was slightly warped. So the second time around, I replaced the cover as well as the gasket. The cost of the pushrod cover was something like $40...so if you have to do this repair, I'd recommend you splurge and do the same.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;I took some pictures, mostly to help in case I forgot how something went together. &lt;/p&gt;&lt;div&gt;Here the intake manifold is gone, next is the injection lines:&lt;/div&gt;&lt;img id="BLOGGER_PHOTO_ID_5369708952255842210" style="DISPLAY: block; MARGIN: 0px auto 10px; WIDTH: 320px; CURSOR: hand; HEIGHT: 240px; TEXT-ALIGN: center" alt="" src="http://4.bp.blogspot.com/_3oz1OTvwqJ4/SoUJ15Zxx6I/AAAAAAAAA7M/RnMUU-m1gJ4/s320/100_1507.JPG" border="0" /&gt; &lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;The injection pump is gone...the pushrod cover, with the rust stain is behind it.&lt;br /&gt;&lt;img id="BLOGGER_PHOTO_ID_5369709849136801698" style="DISPLAY: block; MARGIN: 0px auto 10px; WIDTH: 320px; CURSOR: hand; HEIGHT: 240px; TEXT-ALIGN: center" alt="" src="http://3.bp.blogspot.com/_3oz1OTvwqJ4/SoUKqGi3J6I/AAAAAAAAA7U/mE2nhU2MCDU/s320/100_1515.JPG" border="0" /&gt;&lt;br /&gt;I bought plastic caps to cover the ports on the injection pump. I placed the injection lines in plastic garbage bags and sealed them shut.... One spider can ruin a very expensive injection system:&lt;/div&gt;&lt;img id="BLOGGER_PHOTO_ID_5369710472695128434" style="DISPLAY: block; MARGIN: 0px auto 10px; WIDTH: 320px; CURSOR: hand; HEIGHT: 240px; TEXT-ALIGN: center" alt="" src="http://1.bp.blogspot.com/_3oz1OTvwqJ4/SoULOZewIXI/AAAAAAAAA7c/VT5hom_cBYo/s320/100_1516.JPG" border="0" /&gt; &lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;Pushrods exposed after I removed the cover: &lt;/div&gt;&lt;img id="BLOGGER_PHOTO_ID_5369711088959683218" style="DISPLAY: block; MARGIN: 0px auto 10px; WIDTH: 320px; CURSOR: hand; HEIGHT: 240px; TEXT-ALIGN: center" alt="" src="http://3.bp.blogspot.com/_3oz1OTvwqJ4/SoULyRPq-pI/AAAAAAAAA7k/_n6J0N7E1iQ/s320/100_1518.JPG" border="0" /&gt; &lt;div&gt;&lt;/div&gt;&lt;p&gt;&lt;/p&gt;&lt;div&gt;Running Engine after I got it back together.&lt;/div&gt;&lt;div&gt; &lt;/div&gt;&lt;p&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;&lt;object width="320" height="266" class="BLOG_video_class" id="BLOG_video-ee894f011221d29b" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"&gt;&lt;param name="movie" value="http://www.youtube.com/get_player"&gt;&lt;param name="bgcolor" value="#FFFFFF"&gt;&lt;param name="allowfullscreen" value="true"&gt;&lt;param name="flashvars" value="flvurl=http://v10.nonxt2.googlevideo.com/videoplayback?id%3Dee894f011221d29b%26itag%3D5%26app%3Dblogger%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1331234607%26sparams%3Did,itag,ip,ipbits,expire%26signature%3D690F35673C5F5B2168A08E2B16281F8581865C75.20AFAA24F662E570E024E49F4412A30F187424CF%26key%3Dck1&amp;amp;iurl=http://video.google.com/ThumbnailServer2?app%3Dblogger%26contentid%3Dee894f011221d29b%26offsetms%3D5000%26itag%3Dw160%26sigh%3DPcBkWEWM5x6S-MWZOWXHF-j9qMw&amp;amp;autoplay=0&amp;amp;ps=blogger"&gt;&lt;embed src="http://www.youtube.com/get_player" type="application/x-shockwave-flash"width="320" height="266" bgcolor="#FFFFFF"flashvars="flvurl=http://v10.nonxt2.googlevideo.com/videoplayback?id%3Dee894f011221d29b%26itag%3D5%26app%3Dblogger%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1331234607%26sparams%3Did,itag,ip,ipbits,expire%26signature%3D690F35673C5F5B2168A08E2B16281F8581865C75.20AFAA24F662E570E024E49F4412A30F187424CF%26key%3Dck1&amp;iurl=http://video.google.com/ThumbnailServer2?app%3Dblogger%26contentid%3Dee894f011221d29b%26offsetms%3D5000%26itag%3Dw160%26sigh%3DPcBkWEWM5x6S-MWZOWXHF-j9qMw&amp;autoplay=0&amp;ps=blogger"allowFullScreen="true" /&gt;&lt;/object&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-5285429981418234367?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='enclosure' type='video/mp4' href='http://www.blogger.com/video-play.mp4?contentId=ee894f011221d29b&amp;type=video%2Fmp4' length='0'/><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/5285429981418234367/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=5285429981418234367' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/5285429981418234367'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/5285429981418234367'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2008/02/valley-cover-gasket-replacement.html' title='Valley Cover Gasket Replacement'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_3oz1OTvwqJ4/SoUJ15Zxx6I/AAAAAAAAA7M/RnMUU-m1gJ4/s72-c/100_1507.JPG' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-5607230998392792406</id><published>2008-01-31T20:33:00.000-07:00</published><updated>2009-08-09T22:26:17.305-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Unusable space in MySQL Key Buffers</title><content type='html'>&lt;span style="font-family:arial;"&gt;What started out as a key buffer efficiency study turned to a fruitless search for ways to overcome a somewhat high ( ~20%) amount of unusable key buffer space. This behavior seems to be platform and version independent as I observed this with versions 5.0.45, 5.0.52, and 5.1.22 on Solaris 10, Windows XP and Linux.&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;I'm not the first person to discover this, but I haven't had much luck finding an explanation. Just like this poor soul whose post fell into the swirling vortex of the un-interesting:&lt;br /&gt;&lt;/span&gt;&lt;a href="http://forums.mysql.com/read.php?20,127934"&gt;&lt;span style="font-family:arial;"&gt;http://forums.mysql.com/read.php?20,127934&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family:arial;"&gt; &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Finding usage, and physical read/write information about individual key buffers is slightly obscure. The only way I know of is to run "mysqladmin debug" and view the information in the MySQL server error log.&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;The output looks like this: &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;Key caches: &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;default&lt;br /&gt;Buffer_size: 16777216&lt;br /&gt;Block_size: 1024&lt;br /&gt;Division_limit: 100&lt;br /&gt;Age_limit: 300&lt;br /&gt;blocks used: 0&lt;br /&gt;not flushed: 0&lt;br /&gt;w_requests: 0&lt;br /&gt;writes: 0&lt;br /&gt;r_requests: 0&lt;br /&gt;reads: 0&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;custcache&lt;br /&gt;Buffer_size: 27262976&lt;br /&gt;Block_size: 1024&lt;br /&gt;Division_limit: 100&lt;br /&gt;Age_limit: 300&lt;br /&gt;blocks used: 23513&lt;br /&gt;not flushed: 0&lt;br /&gt;w_requests: 0&lt;br /&gt;writes: 0&lt;br /&gt;r_requests: 24479&lt;br /&gt;reads: 24479&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;I experimented by creating and populating an indexed MYISAM table , and then binding the index to key caches of a varying sizes. My hopes where to stumble on a way to harvest that other 20 percent. I tried adjusting the division_limit, which changes the replacement strategy, but found this had no effect.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;As a result, I've given up trying to use the entire configured key buffer size...for now. Instead, I am exploring how the "MAX_ROWS" option in the CREATE TABLE statement affects index size. I'd rather get some more empirical data before saying much more. But I have found that by sizing MAX_ROWS to be slightly higher than the actual number of rows, the index size is reduced and therefore can fit in a smaller key buffer.&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;The gory details of my experiments:&lt;br /&gt;---------------------------------&lt;br /&gt;&lt;strong&gt;TEST 1: KEY BUFFER TOO SMALL:&lt;/strong&gt;&lt;br /&gt;---------------------------------&lt;br /&gt;customers index file is 52 meg. created custcache with 8 meg. loaded customers index.&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-size:85%;"&gt;set @@global.custcache.key_buffer_size=8*1024*1024 ;&lt;br /&gt;cache index customers in custcache ;&lt;br /&gt;load index into cache customers ;&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:+0;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:arial;"&gt;Result from MySQL error log after running " mysqladmin debug":&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;custcache&lt;br /&gt;Buffer_size: 8388608&lt;br /&gt;Block_size: 1024&lt;br /&gt;Division_limit: 100&lt;br /&gt;Age_limit: 300&lt;br /&gt;blocks used: 7173&lt;br /&gt;not flushed: 0&lt;br /&gt;w_requests: 0&lt;br /&gt;writes: 0&lt;br /&gt;r_requests: 51795&lt;br /&gt;reads: 51795&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Ran query to see hit rate ( explain executed to show plan) :&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;select sql_no_cache customer_id from customers ;&lt;br /&gt;mysql&gt; explain select sql_no_cache customer_id from customers ;&lt;br /&gt;+----+-------------+-----------+-------+---------------+---------+---------+------+--------+-------------+&lt;br /&gt;id select_type table type possible_keys key key_len ref rows Extra&lt;br /&gt;+----+-------------+-----------+-------+---------------+---------+---------+------+--------+-------------+&lt;br /&gt;1 SIMPLE customers index NULL PRIMARY 4 NULL 921338 Using index&lt;br /&gt;+----+-------------+-----------+-------+---------------+---------+---------+------+--------+-------------+&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Result from MySQL error log after running " mysqladmin debug":&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;custcache&lt;br /&gt;Buffer_size: 8388608&lt;br /&gt;Block_size: 1024&lt;br /&gt;Division_limit: 100&lt;br /&gt;Age_limit: 300&lt;br /&gt;blocks used: 7173&lt;br /&gt;not flushed: 0&lt;br /&gt;w_requests: 0&lt;br /&gt;writes: 0&lt;br /&gt;r_requests: 113466&lt;br /&gt;reads: 62146&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;&lt;strong&gt;Interpretation&lt;/strong&gt;: The index was not completely cached so there were some physical index reads during the select statement.&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;---------------------------------&lt;br /&gt;&lt;strong&gt;TEST 2: KEY BUFFER TOO BIG:&lt;br /&gt;&lt;/strong&gt;---------------------------------&lt;br /&gt;restarted mysql server. created custcache with 80 meg. loaded customers index.&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;set @@global.custcache.key_buffer_size=80*1024*1024 ;&lt;br /&gt;cache index customers in custcache ;&lt;br /&gt;load index into cache customers ;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Result from MySQL error log after running " mysqladmin debug":&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;custcache&lt;br /&gt;Buffer_size: 83886080&lt;br /&gt;Block_size: 1024&lt;br /&gt;Division_limit: 100&lt;br /&gt;Age_limit: 300&lt;br /&gt;blocks used: 51795&lt;br /&gt;not flushed: 0&lt;br /&gt;w_requests: 0&lt;br /&gt;writes: 0&lt;br /&gt;r_requests: 51795&lt;br /&gt;reads: 51795&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;ran query to see hit rate:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;select sql_no_cache customer_id from customers ;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Result from MySQL error log after running " mysqladmin debug":&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;custcache&lt;br /&gt;Buffer_size: 83886080&lt;br /&gt;Block_size: 1024&lt;br /&gt;Division_limit: 100&lt;br /&gt;Age_limit: 300&lt;br /&gt;blocks used: 51795&lt;br /&gt;not flushed: 0&lt;br /&gt;w_requests: 0&lt;br /&gt;writes: 0&lt;br /&gt;r_requests: 113466&lt;br /&gt;reads: 51795&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;&lt;strong&gt;Interpretation:&lt;/strong&gt; The index was completely cached so there were no physical index reads during the select statement. Note also that the blocks used section is very close to the actual index file (customers.MYI) size.&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;---------------------------------&lt;br /&gt;&lt;strong&gt;TEST 3: KEY BUFFER SLIGHTLY LARGER THAN THE INDEX:&lt;/strong&gt;&lt;br /&gt;--------------------------------- &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;restarted mysql server. created custcache with 54 meg. loaded customers index.&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;set @@global.custcache.key_buffer_size=54*1024*1024 ;&lt;br /&gt;cache index customers in custcache ;&lt;br /&gt;load index into cache customers ;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Result from MySQL error log after running " mysqladmin debug":&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;custcache&lt;br /&gt;Buffer_size: 56623104&lt;br /&gt;Block_size: 1024&lt;br /&gt;Division_limit: 100&lt;br /&gt;Age_limit: 300&lt;br /&gt;blocks used: 48346&lt;br /&gt;not flushed: 0&lt;br /&gt;w_requests: 0&lt;br /&gt;writes: 0&lt;br /&gt;r_requests: 51795&lt;br /&gt;reads: 51795&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;ran query to see hit rate:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;select sql_no_cache customer_id from customers ;&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Result from MySQL error log after running " mysqladmin debug":&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;custcache&lt;br /&gt;Buffer_size: 56623104&lt;br /&gt;Block_size: 1024&lt;br /&gt;Division_limit: 100&lt;br /&gt;Age_limit: 300&lt;br /&gt;blocks used: 48346&lt;br /&gt;not flushed: 0&lt;br /&gt;w_requests: 0&lt;br /&gt;writes: 0&lt;br /&gt;r_requests: 113466&lt;br /&gt;reads: 53099&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;&lt;strong&gt;Interpretation&lt;/strong&gt;: There were some physical reads. Even though the key buffer should be big enough to hold the index, there is some empty space.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-5607230998392792406?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/5607230998392792406/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=5607230998392792406' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/5607230998392792406'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/5607230998392792406'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2008/01/unusable-space-in-mysql-key-buffers.html' title='Unusable space in MySQL Key Buffers'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-1209516891133502648</id><published>2008-01-14T21:06:00.000-07:00</published><updated>2009-08-14T00:20:14.883-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>MySQL version 5.0.52 Upgrade</title><content type='html'>Last night I performed an upgrade of 5 of our production servers to version 5.0.52 ( solaris 10 / 64 bit x86). We have been testing for over a month and in the 11th hour discovered that there is a bug in the &lt;strong&gt;mysql&lt;/strong&gt; command line utility.&lt;br /&gt;&lt;br /&gt;I was creating a blank copy of one of our production databases ( schema only) on an upgraded test system using a script generated by our design tool. During this, I encountered an interesting error that prevented the creation of a table with a column named "extra_file_delimiter". Further testing revealed that the 5.0.52 through 5.0.54 releases have a bug in the &lt;strong&gt;mysql&lt;/strong&gt; client which parses the word "delimiter" incorrectly.&lt;br /&gt;&lt;br /&gt;As a work-around, I've also deployed the 5.0.50 release in a separate directory and replaced the 5.0.52 &lt;strong&gt;mysql&lt;/strong&gt; utility with a symbolic link to the 5.0.50 version. Ugly for sure, but we decided to do that since we had so much time invested in the pre-deployment testing of the 5.0.52 version.&lt;br /&gt;&lt;br /&gt;MySQL tech support has created a bug for it so I'll have to wait and see where that goes.&lt;br /&gt;&lt;br /&gt;Nevertheless, the upgraded servers appear to be performing well. The only real glitch I have been fighting is the fact that the MySQL Enterprise Monitor agent won't run on one of the upgraded servers...the other 4 work fine so right now I'm baffled.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-1209516891133502648?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/1209516891133502648/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=1209516891133502648' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/1209516891133502648'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/1209516891133502648'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2008/01/mysql-version-5052-upgrade.html' title='MySQL version 5.0.52 Upgrade'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-29711306.post-6658958026077860235</id><published>2008-01-02T22:08:00.000-07:00</published><updated>2009-08-14T00:20:01.330-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data quality'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>SQL_MODE and MySQL Data Quality</title><content type='html'>&lt;p&gt;&lt;span style="font-family:arial;"&gt;As my former boss will attest, I have a reputation for being a bit of a data quality zealot. The storage of data that is unfit for use leads to many problems, but I suppose that’s another subject for another day.&lt;br /&gt;&lt;br /&gt;It’s tough enough to manage data quality problems introduced by source code errors, system failures, and requirements misunderstandings…But a default installation of MySQL introduces a new and exciting way to give us data quality evangelists fits: It allows unfit data to be inserted in the database. That’s the bad news. The good news is that by making a simple configuration change you can prevent this, and override the setting when you don’t care.&lt;br /&gt;&lt;br /&gt;In a default MySQL installation, the value of the SQL_MODE system variable is set to ‘’. This allows you to force inserts and updates that may violate the intended design of the table. This point is more philosophical than technical, but in a mission critical database I believe that your first line of defense against poor data is requiring well formed data manipulation statements.&lt;br /&gt;&lt;br /&gt;Here is an example:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;font-size:85%;"&gt;Create table jd_test ( id int NOT NULL , name varchar(30) NOT NULL ) ;&lt;br /&gt;&lt;br /&gt;Insert into jd_test ( id ) values ( 1) ;&lt;br /&gt;&lt;br /&gt;Select * from jd_test ;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;In my opinion, this insert is ill-formed since it is missing non-optional columns in the specification. Therefore, I prefer that the statement fail and return an error to the application. In this example, with SQL_MODE = ‘’, MySQL returns a warning but the data still gets inserted with an empty string assigned to the name column.&lt;br /&gt;&lt;br /&gt;Try that example with a DATETIME column and you will see similar behavior.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Recommendations&lt;/strong&gt;:&lt;br /&gt;&lt;br /&gt;First of all, check the value of SQL_MODE to see what it is set at. This is defined at both the global ( server-wide) and session level. Some third party utilities ( like SQLYog) set the session level value to ‘’, overriding the server value.&lt;br /&gt;&lt;br /&gt;Set the global value of the SQL_MODE to ‘TRADITIONAL’ in the server configuration file ( /etc/my.cnf). Hold it! Be careful here because if your MySQL database serves an existing production system, then you may have to test the effects of this in a development or test environment first to see what breaks. For new databases, start off with this setting on your development, testing, and production systems. It’s best to be in the habit of working with this setting before your system goes live in production.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Encourage your developers to explicitly check the session value of this variable :&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;p align="left"&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-family:courier new;"&gt;Select @@session.sql_mode&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Discourage development of application code that specifically overrides the value of SQL_MODE at the session level with the exception of clearly defined processes requiring it.&lt;br /&gt;&lt;br /&gt;When developing stored procedures and triggers, explicitly set the SQL_MODE to ‘TRADITIONAL’ at the top of the source code file before the “CREATE…” statement. These server objects enforce the value of SQL_MODE in effect at the time of creation.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Set session SQL_MODE = ‘TRADITIONAL’ ;&lt;br /&gt;CREATE PROCEDURE myproc()&lt;br /&gt;…..&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Recommended Reading:&lt;br /&gt;&lt;br /&gt;Section 5.2.6 “SQL Modes” of the Mysql 5.1 Reference Manual&lt;br /&gt;&lt;/p&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29711306-6658958026077860235?l=www.johndz.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.johndz.com/feeds/6658958026077860235/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=29711306&amp;postID=6658958026077860235' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/6658958026077860235'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/29711306/posts/default/6658958026077860235'/><link rel='alternate' type='text/html' href='http://www.johndz.com/2008/01/sqlmode-and-mysql-data-quality.html' title='SQL_MODE and MySQL Data Quality'/><author><name>John Dzilvelis</name><uri>http://www.blogger.com/profile/06400950302176530075</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
