Sunday, February 27, 2011

Intermittent Bugs

Like a bad earworm that won't go away, intermittent bugs are resurfacing to grind any progress to a standstill. After running the last set of logs there was a strange conundrum: according to screen (a Unix utility that keeps a terminal session alive after your remote login disconnects), the full run went through smoothly. But all logs after the baseline were MIA. A head-scratching conference with Dr. Rivoire followed. We decided to run the test again; it could have just been a fluke. Open up a new instance of screen, begin the test again, check in on the logs later... we have the same problem, only this time there is just one set of logs, for one frequency.

What's a lowly research assistant to do? This all worked fine not more than a week ago, and nothing has changed, except for a bout of bad weather during the first fluke. And that does not excuse the weirdness of the second run (which had fine weather). Time to stalk the logs in real time! Sadly there will be no British narrator doing a voice-over as if this were a National Geographic documentary. Research is far from error-proof, which will be a hard lesson for the young research assistant to learn.

Stalking the logs in real time is easier than combing through a huge batch of them once a run is finished. The advantage of watching in real time is that the bug is highly visible when it occurs: the moment something pops up in the error log, you can begin investigating while the log keeps running. For example, if this gem pops up in the error log during the baseline:

wattsup: [error] Reading final time stamp: Bad address

We can begin by checking the date simultaneously on both rickroll and lolcat to see if maybe they've gotten out of sync. That potential cause is then ruled out when the timestamps from both show the correct time. Here begins another round of head scratching, because in the error logs following the baseline there are no more errors to be found (a shell wildcard is a quick way to look at all the error logs at once: tail testSpockCat*.err). Too bad naming the run Spock didn't strike any fear into the computers. Now if you'll excuse me, I've got to resume stalking the logs for that most elusive and shy of prey: the pesky error.
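For anyone wanting to try the same stalking routine, here is a minimal sketch. The scratch log contents are made up for illustration; only the testSpockCat*.err naming and the "Bad address" message come from our actual run:

```shell
# Make a couple of scratch error logs mirroring our naming scheme
# (the contents here are fabricated for the demo).
printf 'wattsup: [error] Reading final time stamp: Bad address\n' > testSpockCat1.err
printf '' > testSpockCat2.err

# One tail over every error log at once; -n +1 prints each file
# in full, with a ==> filename <== header between them.
tail -n +1 testSpockCat*.err

# During a live run, swap in -f to follow the logs as they grow:
#   tail -f testSpockCat*.err
```

The glob expands to every matching log, so a single tail watches them all; with -f it blocks and streams new lines as the run writes them.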

Friday, February 18, 2011

Progress is the Name, Success is the Game.

After God knows how long toiling with Perl and Unix, and keeping our Google search bible on hand at all times, the script we have been aiming for is complete! The Perl script (which, as you can see, also runs shell commands from within it) is pasted below. It is a work in progress that will change as we make our testing more efficient and effective. But, the point is, it gets the job done for now!

Essentially, this script reads a configuration text file listing the names of three files that contain our consumption information (AC, DISK, and CPU) from Stephanie's tests, and then gives each one a makeover by getting rid of the junk. After all the makeovers are done, the cleaned-up text files are joined and copied into a .csv file so that they fit nicely into an Excel spreadsheet. Rinse and repeat until the end of the configuration text file is reached!
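Since the script splits each config line on whitespace into a file type, a throwaway middle field, and a file name, a config file would look something like this (the log file names here are hypothetical, and the middle column is ignored):

```
ac   -   testZulu_800MHz_ac.log
cpu  -   testZulu_800MHz_cpu.log
disk -   testZulu_800MHz_disk.log
```

The disk line comes last in each group because, in the script, it is the disk branch that triggers the joins and writes out the numbered .csv file.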

Now that we have this script, we can organize the log files that Stephanie has created, place their names into the configuration text file, and spit out tens or hundreds of CSV files in a few minutes. From there, we will run a regression analysis and have our first batch, admittedly hideous at first, of power consumption statistics! Now you know what CS majors accomplish on Friday nights!

Cheers to a successful week of progression - Hopefully this is the trend for the rest of the semester!


---------------------------------------


#!/usr/bin/perl

# Read the configuration file named by the first command-line argument;
# each line lists a file type (ac, cpu, or disk) and a log file name.

$finish = 0;

open(config, "<$ARGV[0]") || die("Could not open config file!");

foreach $line (<config>)
{
($filetype, $junk, $filename) = split(' ',$line);

#AC TEST

if ($filetype eq "ac")
{
open(ac, "$filename") || die("Could not open AC file!");
system("./testac.pl $filename > actest.txt");
}

#CPU TEST

if ($filetype eq "cpu")
{
open(cpu, "$filename") || die("Could not open CPU file!");
system("./testcpu.pl $filename > cputest.txt");
}


#DISK TEST

if ($filetype eq "disk")
{
open(disk, "$filename") || die("Could not open DISK file!");
system("./testdisk.pl $filename > disktest.txt");
system('join -t"," disktest.txt actest.txt > test.txt');
system('join -t"," test.txt cputest.txt > final.txt');
system("cat final.txt > $finish.csv");
$finish++;
}
}
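The heart of the script is the pair of join calls. A quick shell demo with two made-up sample files shows what they do: join -t"," merges lines from two sorted files whose first comma-separated field matches (the timestamps and readings below are fabricated for illustration):

```shell
# Two tiny stand-in files keyed by timestamp (fabricated data).
printf '10:00,5.1\n10:01,5.3\n' > disktest.txt
printf '10:00,120\n10:01,119\n' > actest.txt

# Merge rows sharing the same first field; -t"," sets comma as the delimiter.
# Prints: 10:00,5.1,120
#         10:01,5.3,119
join -t"," disktest.txt actest.txt
```

Note that join expects both inputs sorted on the join field, which our per-frequency logs already are since they're written in time order.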

Wednesday, February 9, 2011

Moving forward at last!

It's not always finding the big bugs that leads to breakthroughs in research. Sometimes it's the little ones that trail into different problems until, at long last, the real cause of all those headaches is found. Last semester the scripts gave a strange error during one of the test runs to collect data: killall had nothing to kill. Strange, because according to the machine collecting data, the machine under test was supposed to be in the middle of running the disk at a particular frequency while AC measurements were collected once every minute. Curious, I was sent off to comb the logs.

The logs revealed a bit of oversight and assumption coming back to bite us squarely you-know-where. The two computers' timestamps were not in sync; not just the hour but the minute and second had drifted apart by several minutes. Ubuntu checks a computer's clock against an external time server only on boot, and we leave rickroll and lolcat running all the time. Not a problem: server administrators the world over have solved the time-drift problem before. With Google to the rescue, and help from the lovely folks on the Ubuntu community forums, we had a quick fix. Thankfully it was quick and easy: add a script to the cron.daily directory. cron is a Unix utility that runs the scripts inside those directories at a set interval (daily, hourly, etc.).

Wait overnight. Come back the next day to find the timestamps still don't match. Move the fix to the cron.hourly directory. Repeat the waiting. No dice the next day. Then discover the network time protocol daemon hadn't been installed! Install ntp, wait overnight. I did a little dance the next morning when I saw rickroll and lolcat's timestamps. They actually stayed in sync!
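For the record, the two-part fix looked roughly like this. The settime file name and the pool.ntp.org server are my placeholders, not necessarily what we used:

```shell
# The quick fix we tried first: a one-line script dropped into
# /etc/cron.hourly that re-syncs the clock (requires the ntpdate package).
# cron.hourly scripts must be executable and have no file extension.
cat > settime <<'EOF'
#!/bin/sh
ntpdate pool.ntp.org
EOF
chmod +x settime
sh -n settime && echo "cron script parses OK"
# On the real machine: sudo mv settime /etc/cron.hourly/

# What finally worked: the NTP daemon itself, which disciplines the
# clock continuously instead of jumping it once an hour.
#   sudo apt-get install ntp
```

The daemon approach also avoids the abrupt clock jumps that ntpdate causes, which matters when timestamps are being compared across machines mid-run.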

Now, once Vince finishes his script to parse our data logs into an easier-to-read format, we'll be able to take the data from a run and actually analyze it. Of course the first model we create with testZulu will be ugly, but at least then we'll be moving forward!

Saturday, February 5, 2011

Welcome Back !

Good evening, and welcome to the new year everyone!

After getting settled this past week for the new semester, Stephanie and I have begun working towards our end goal of modeling the power consumption of GPUs. Our current task is to create a bash script that runs through our three separate Perl parsing scripts and joins the output into an Excel-friendly file, as well as cleaning up the file names in our rather large log database.

Unfortunately, we also discovered a slight issue towards the end of our work last semester. For the sake of efficiency, we opted to run our benchmark programs on one machine and collect the data on another, and in doing so found that the two machines' clocks had seemingly drifted apart during longer benchmark runs. Thus, our main concern is alleviating this issue as soon as possible in order to be able to accurately line up our data in the future.

Lastly, we are quite excited to be gearing up to attend the 2011 Tapia Conference! We will be spending 3 days in San Francisco with other CREU recipients from around the country and meeting with some of the industry's leading innovators!

With all of this exciting potential this semester, we will hopefully be updating you with great news as the days progress. With that being said - Stay tuned!