
Fri, August 11, 2006 - 5:48 AM
I wake up every morning at 5 a.m. I know that sounds strange, but I truly am both a morning and an evening person. I get up early each morning and I go to bed late each night. Five to six hours of sleep a day is more than enough for me. In fact, if I get more than that, I'm cranky all day long.
Anyway, when I get up I normally check my BlackBerry to see what messages came in during the night. My team has set up all of our computers at work to send us e-mails letting us know the status of things, especially any errors that occur. If disk space is getting low on a system, we get an e-mail; if an application hits an error, we get an e-mail; even if a computer crashes and halts, we get an e-mail.
Yesterday morning, when I checked my e-mail at 5 a.m., I found that our main merchandising system had hit an extremely serious error at 4:47 a.m., just minutes before I woke up. One of my guys was already on it, and he and I spent the next six hours working through the issue.
You see, each day our stores enter orders into the system so they have the products on their shelves later in the day. Each store orders dairy, cheese, bread, frozen foods, and so forth at least once per day. When a store finishes entering an order, it sends the order to the system, which then creates the appropriate purchase orders, trucking orders, delivery orders and so forth.
The e-mail message indicated that the system had detected that over 80% of the store order lines (the detail lines) had been rejected by the application. This was bad because the warehouses would receive the orders, but the orders would be missing detail lines. The warehouses would think they had the complete orders when they didn't. They would go ahead and fulfill them and ship them to the stores, which would then discover that major parts of their orders were missing. Fortunately, we had been smart enough to code in a check that detects this condition and sends an error message, so we can handle it well before the stores discover that the system had a problem like this. Undetected, an error such as this could cost hundreds of thousands of dollars in lost sales.
As it turned out, the error was introduced during our application update the night before. A particular program module had been corrected, and that correction was fine, but the application person didn't realize that the module was also used by another routine, and the changed module didn't handle that other routine properly. The error took three hours to find, less than five minutes to fix, and another three hours to recover the data it had damaged.
On top of that, I got some personal news, as I described in my last couple of entries, which was very depressing. By the middle of the afternoon I was a little bit in despair. But then my mind began working and I thought, "Well, we got through the work stuff, and this trip is important to me." So with a little work and a little strategy I figured things out, and by the end of the day I was doing pretty well.
I'm taking today off. I have no idea what I'm going to do, except that it won't be work. At least I hope not.
Jocelyn Hall (Fri, August 11, 2006 - 8:24 AM)
I hope you don't have to work today, either. :)
Unless otherwise noted, all photos and text are Copyright © Richard G Lowe, Jr.