Sunday, February 27, 2011

Intermittent Bugs

Like a bad earworm that won't go away, intermittent bugs have resurfaced to bring progress to a standstill. The last set of runs left a strange conundrum: according to screen (a Unix utility that keeps a terminal session running after your remote session logs off, so you can reattach and check on it later), the full run went through smoothly. But all logs after the baseline were MIA. A head-scratching conference with Dr. Rivoire followed. We decided to run the test again; it could have just been a fluke. Open up a new instance of screen, begin the test again, check in later on the logs... and we have the same problem, only this time there is just one set of logs for one frequency.

What's a lowly research assistant to do? This all worked fine not more than a week ago, and nothing has changed, except for the bout of bad weather during the first fluke, and that does not excuse the weirdness of the second run (which had fine weather). Time to stalk the logs in real time! Sadly there will be no British narrator doing a voice-over as if this were a National Geographic documentary. Research is far from error-proof, which will be a hard lesson for the young research assistant to learn.

Stalking the logs in real time is easier than combing through a huge batch of them once a run is finished. The reason for doing it in real time is that when the bug occurs it is highly visible: the moment something pops up in the error log you can begin investigating while letting the run continue. For example, if this gem pops up in the error log during the baseline:

wattsup: [error] Reading final time stamp: Bad address

We can begin by checking the date on both rickroll and lolcat at the same time to see if maybe they've gotten out of sync. That potential cause is ruled out when both show the correct time. Here begins another round of head scratching, because in the error logs following the baseline there are no more errors to be found (using a shell wildcard as a quick way to look at all the error logs at once: tail testSpockCat*.err). Too bad the idea of naming the run Spock didn't strike some fear into the computers. Now if you'll excuse me, I've got to resume stalking the logs for that most elusive and shy of prey, the pesky error.
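For anyone who wants to stalk along at home: the quick look across all the error logs could also be scripted. What follows is only a rough sketch, not what was actually run for this project. The file pattern testSpockCat*.err comes from the command above, but the helper name find_errors is made up for illustration, and the assumption that every error line contains the marker "[error]" rests on the single sample line shown earlier.

```python
import glob
import re

# Assumption: error lines carry an "[error]" marker, as in the one
# sample line "wattsup: [error] Reading final time stamp: Bad address".
ERROR_PATTERN = re.compile(r"\[error\]")


def find_errors(pattern="testSpockCat*.err"):
    """Return {filename: [error lines]} for every log matching the wildcard.

    Hypothetical helper; logs with no matching lines are left out.
    """
    hits = {}
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps stray bytes in a log from stopping the scan.
        with open(path, errors="replace") as log:
            lines = [line.rstrip("\n") for line in log
                     if ERROR_PATTERN.search(line)]
        if lines:
            hits[path] = lines
    return hits


if __name__ == "__main__":
    # Print each log that contains errors, followed by the offending lines.
    for path, lines in find_errors().items():
        print(path)
        for line in lines:
            print("    " + line)
```

Run from the directory holding the logs, this lists only the files that actually contain an error line, which saves scrolling past the clean ones. It does not replace watching the logs live; it is just a faster way to confirm that the logs after the baseline really are clean.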
