Friday, August 5, 2011

Research = Rainbow Bright?

Analyzing data is not the photogenic "fun" side of research--at least from the outside. Chugging away at entering the new GPU-aware data into an Excel sheet can be a little mind-numbing: 10 variables (Watts, Disk Utilization, GPU Temperature, GPU Fan Speed, Utilization GPU, Utilization Memory, CPU0, CPU1, CPU2, CPU3), each requiring 6 calculations (Min, 1st quartile, Median, Mean, 3rd quartile, Max), for each of 12 frequencies PER Excel sheet (6 sheets total). Doing the math on how many cells that adds up to makes these moments hurt more, because it feels like they'll take forever.
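
If I get tired enough of Excel, scripting is the obvious escape. Here's a minimal Perl sketch of those six statistics for one column of a log. The comma-separated layout and the column index are assumptions for illustration, not our actual log format:

#!/usr/bin/perl
# Sketch: compute Min, 1st quartile, Median, Mean, 3rd quartile, and Max
# for one column of a comma-separated log. Column 0 (Watts) is assumed.
use strict;
use warnings;

my $col = 0;
my @values;
while (my $line = <>) {
    chomp $line;
    my @fields = split /,/, $line;
    next unless defined $fields[$col];
    push @values, $fields[$col] if $fields[$col] =~ /^-?\d+(?:\.\d+)?$/;
}

@values = sort { $a <=> $b } @values;
my $n = @values;
die "no numeric data\n" unless $n;

my $mean = 0;
$mean += $_ for @values;
$mean /= $n;

# Quantile by linear interpolation between the two closest ranks.
sub quantile {
    my ($p) = @_;
    my $idx = $p * $#values;
    my $lo  = int($idx);
    my $hi  = ($lo == $#values) ? $lo : $lo + 1;
    return $values[$lo] + ($idx - $lo) * ($values[$hi] - $values[$lo]);
}

printf "Min %.2f | 1stQ %.2f | Median %.2f | Mean %.2f | 3rdQ %.2f | Max %.2f\n",
    $values[0], quantile(0.25), quantile(0.50), $mean,
    quantile(0.75), $values[-1];

Point it at a log (perl summary.pl run.csv) and it prints in one line what would otherwise be six Excel formulas.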

Plotting the data, though, yields graphs like this:


Makes you wonder what the heck is going on with this workload!

Monday, July 25, 2011

Recipe for Debugging

How do you take a working script and convince yourself it is broken?

1. Try to use outdated sub-scripts in another directory
2. Don't check the test inputs as a potential error
3. All of the above.

In other words, the CSV-generating scripts finally have headers! Better analysis, here we come. The other reason for the victory-dance atmosphere on the project is that (knock on wood) all the scripts are working and we're collecting the last run of GPU-aware workloads!! The end of data collection is finally nigh. I'm sure I'll regret this phase ending when I have to sit down and write about the analysis. In the meantime it wouldn't be unreasonable to feel a little burnout from stalking data logs.
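
For posterity, the fix itself is tiny -- print the column names once before any data rows. A sketch, with the column names and order being my guesses rather than the scripts' actual output:

# Header row, emitted once before any data rows so downstream analysis
# knows what it is reading. (Column names here are illustrative.)
print join(',', qw(
    Timestamp Watts DiskUtil GpuTemp GpuFanSpeed GpuUtil MemUtil
    CPU0 CPU1 CPU2 CPU3
)), "\n";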

Thursday, July 21, 2011

Cable Drama (Not of the Soap Variety)

Let me begin with a sincere thank you to the folks at Microsoft Research in Mountain View for donating a WattsUp-meter-friendly USB cable. Anyone who has to depend on these meters knows their cables can't be easily replaced with a quick trip to the store, thanks to the recessed port on the actual meter.

The story behind this donation revolves around two WattsUp meters [see left in the red circles].

One of these meters began to return too many errors and bad packets. Not a problem; we avoided it by changing the setup so the Machine Under Test (MUT) used the "good" meter. Sharing this "good" meter between the two computers did cause a minor headache, though, because it left room for errors. For each workload we wanted to run on both boxes (meaning all benchmarks), we would have to swap meters: shut down the computers, unplug a great many cables, and hope nothing got messed up when the cables were plugged back into their new configuration.

How does a new cable play into this meter drama? The last time we re-configured the wiring, our "good" meter began to exhibit the same behavior as the "bad" meter. The only things changed this time were the USB cables relaying the data from the MUT's meter to the Data Acquisition Machine. Turns out both meters work fine; it was a dud cable.

Dr. Rivoire, our faculty adviser, had recently replaced the cables on the WattsUp meters used in her research at Microsoft because of a similar problem. QED, let's change out the cable. In short, the bottleneck on progress has been resolved, thanks to a coworker of Dr. Rivoire's who cut and refit the casing on a USB cable to fit WattsUp's unique recessed port.

Workloads ahoy! I'll finally have a chance to use my new parsing script, written in Perl, which abuses some regular-expression tricks on our unique logs.
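
As a teaser, the heart of that parsing script is just a loop of regular expressions. A stripped-down sketch -- the log line format shown here is made up for illustration; the real logs are messier:

#!/usr/bin/perl
# Pull (timestamp, watts) pairs out of a log and emit CSV rows.
use strict;
use warnings;

while (my $line = <>) {
    # Hypothetical line format: "1312561845 watts=212.4"
    if ($line =~ /^(\d+)\s+watts=(\d+(?:\.\d+)?)/) {
        print "$1,$2\n";
    }
}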

Friday, July 1, 2011

Progress Summary

Our project's model is currently trained on workloads where the CPU is the main consumer of dynamic power. Results are fragmented between the two test machines, lolcat and rickroll, due to data collection errors. The next steps planned are finishing data collection, examining GPU benchmarks (possibly with the added benefit of instrumenting the GPU), and analyzing oddness within the results.

For FDTD3d (a GPU benchmark), rickroll's MSE, rMSE/mean, and DRE all jump between two frequencies. The MSE at frequency 2000 is 4.13 and goes to 318.29 at frequency 2200. DRE repeats this jump, moving from 0.10 to 0.63, and root MSE/mean changes from 0.01 to 0.12. A reasonable hypothesis would be that below 2200 the workload is CPU-bound, but other data does not presently support this explanation.
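
For anyone keeping score at home, the metrics are as follows, where $p_i$ is the measured power at sample $i$, $\hat{p}_i$ the model's prediction, and $\bar{p}$ the mean measured power. (The exact DRE normalization below is my shorthand for error relative to the measured dynamic range; treat it as an assumption.)

\[
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{p}_i - p_i\right)^2,
\qquad
\mathrm{rMSE}/\mathrm{mean} = \frac{\sqrt{\mathrm{MSE}}}{\bar{p}},
\qquad
\mathrm{DRE} \approx \frac{\frac{1}{n}\sum_{i=1}^{n}\left|\hat{p}_i - p_i\right|}{p_{\max} - p_{\min}}
\]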

Using two benchmarks, nbody and binomialOptions, as train and test sets (same benchmark for both train and test, nbody as train with binomialOptions as test, and vice versa), lolcat's results stress how unaware the model is of the GPU's influence on the expected power (though the power does correlate well with CPU & disk for this workload). The model cannot predict a reasonable expected power when the GPU is stressed in addition to the CPU, or when the GPU is stressed but the CPU is not.

Once calibration data recollection on lolcat finishes and is analyzed for errors, the next step will be proceeding on GPU awareness. For more insight into the GPU's role in power consumption, nvidia-smi will be used for GPU instrumentation. The model can't predict beyond the CPU exercising at 100%, but if a GPU-aware component is added, the prediction should be less erroneous.
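
The instrumentation plan, roughly: poll nvidia-smi on an interval and log what it reports alongside the meter data. A Perl sketch -- nvidia-smi's -q (query) flag is real, but the output fields vary by driver version, so the regexes below are assumptions to be adapted:

#!/usr/bin/perl
# Poll nvidia-smi once a second; log timestamp, GPU utilization, temperature.
use strict;
use warnings;

while (1) {
    my $out = `nvidia-smi -q`;
    my ($util) = $out =~ /Gpu\s*:\s*(\d+)\s*%/;          # utilization percent
    my ($temp) = $out =~ /Temperature.*?(\d+)\s*C\b/s;   # first temperature in C
    printf "%d,%s,%s\n", time(),
        (defined $util ? $util : 'NA'),
        (defined $temp ? $temp : 'NA');
    sleep 1;    # one sample per second, like the meters
}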

Friday, June 24, 2011

Resolving a Heisenbug

Resolving a Heisenbug can be tricky, since by its nature it resists the debugger. Instead we decided to go the unscientific way of throwing a bunch of solutions at the problem, then in true programmer fashion not questioning what fixed wattsup.

Part one of the solution was to rewrite the packet-handling code within wattsup to fix several line errors and the handling of mis-sized packets. Originally the code would give up and kill the run if the packet size (packets being what the watts data is sent inside) changed due to the program falling out of sync. Now wattsup recaptures the mis-sized packet and concatenates it with the following packet (which the majority of the time will also be mis-sized and follow the first packet in data order).
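
In sketch form, the new handling looks something like the Perl below. The '#'...';' framing matches WattsUp's serial record style as I understand it, and read_from_meter and handle_packet are stand-ins for the real wattsup internals, not actual code from it:

my $buffer = '';
while (defined(my $chunk = read_from_meter())) {   # stand-in read routine
    $buffer .= $chunk;
    # Emit every complete record; a trailing fragment stays in $buffer
    # and gets concatenated with the next (likely also mis-sized) read.
    while ($buffer =~ s/\A.*?(#[^;]*;)//s) {
        handle_packet($1);                         # stand-in handler
    }
}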

Part two involved unplugging the meters for a while to clear their caches. While the meters were unplugged we checked whether the physical meters were overheating. One meter did feel warm to the touch, which could be a result of overheating or normal leftover heat from finishing a workload. A last-minute idea when we went back to reconnect all the cables was to switch which meter measured which computer. The work space was also rearranged to give the meters more air flow.


Wattsup now runs without problems! What actually fixed the Heisenbug? Probably a combination of all three solutions (rewriting code, switching meters, and increasing air circulation). The important work now moves toward analyzing all the data we have finished collecting.

Here are a few graphs from our data that will inspire future analysis headaches:




Saturday, June 18, 2011

Heisenbug and Martha Stewart Cleaning


Bottlenecks are incredibly frustrating; they hamper progress because the research can go only as fast as the slowest component. If the slowest component has intermittent bugs, that pace is a frustrating crawl. At long last the bottleneck on this research is software no one involved in the project wrote! That shouldn't be cause for celebration, but as childish as it sounds, it's nice for once that the blame isn't on the research team.

There's even a name for the type of bug causing our bottleneck:

Heisenbug - A bug that disappears or alters its behavior when one attempts to probe or isolate it ... the use of a debugger sometimes alters a program's operating environment significantly enough that buggy code ... behaves quite differently.

Turn the debugger on, the problem vanishes. Turn the debugger off, the problems resurface, but we can't recreate the error with the debugger back on. This is incredibly frustrating to realize, to try to debug, and to work around.

Sadly it is our data collection software, wattsup, that is dealing with said Heisenbug. The research uses wattsup to take a reading of the power (in watts) at a set time interval for the length of the benchmark run. The logs show a normal run. The error logs have only one error, which appears randomly during the course of a run. But look inside the .ac file (where a timestamp and a measure of watts at that time are expected) and lo, three to four minutes before the end of the run the data cuts off.

What does one do while isolating Heisenbugs? You have to let the run go and tail the logs, but these runs take from twenty minutes to hours; you can't sit there hitting ls -l on the logs. Instead it's the perfect time to go Martha Stewart spring cleaning on all the versions of research code between rickroll and lolcat: basically clean up the update logs, output, and naming conventions of automatically generated files. No more should we need a nifty flow chart.

Organization and version control are crucial! Actually executing organization and version control, however, involves hopping between multiple languages. Unix shell, Perl, Python, and some regular expressions made life easy, even as the brain began to feel the differences in syntax. Hopefully from this point onward (once wattsup is "debugged" for good) the project can get some solid data from lolcat. Then onward to find fresh headaches analyzing our model against actual data from rickroll and lolcat.

Tuesday, June 7, 2011

The Wonder of More Time

Not even a week back at the research and the progress is unbelievable! What could possibly be the reason behind so much happening in only two days? Was it the short break away from the data? The new eight hours the research is getting each day? Who knows! I just hope the progress keeps running full tilt. Let's recap what's happened so far since coming back from the brief hiatus at the end of last semester and the parting of Vince.

Day 1 
Getting back to the code actually turned into spring cleaning. In the mad dash of running ahead with the progress last semester...things got a little hairy in the directories. Test runs went everywhere, and scripts written in haste proved they needed to be rewritten. So the first day back was spent cleaning up directories, sorting/rerunning data runs, and patching mischievous scripts in dire need of documentation. It's amazing feeling the difference of having the time to not only spot errors but correct them right away, compared to fighting for time between research and homework and projects during the semester. I'll happily take my eight-hour work day this summer over the last two crazy 10-hour-a-week semesters.

Day 2
Analyzing data has never been so interesting! The day began with basically filling out a very fancy Excel sheet and utilizing functions in R. This was all done on data from our calibration, CPU, and GPU benchmarks for Rickroll. With more initial data to go on than previously, two things jumped out:

1) Our GPU numbers for mean squared error and dynamic range error are curiously the opposite of what we initially predicted (for certain frequencies the numbers are inverted: we expected higher numbers at lower frequencies and lower numbers at higher frequencies).

2) Due to how specJBB stresses the CPU and what happens with the power consumption, there is serious discussion of adding another CPU benchmark to really stress the CPU further.

I can only imagine how much further we'll be at the end of this week.

Friday, May 20, 2011

Brief Hiatus but Not Senioritis

While Vince will be leaving the project, I'll be staying on for the summer (sadly not graduating until next semester). After a brief hiatus to marathon through finals, I'll be resuming work on the project for the summer. Besides the change in personnel, there will also be a change for me: moving from part time (10 hours per week) to full time (40 hours per week), which will be an exciting way to experience how graduate students conduct their research.


Tentative results from one of the machines used to record measurements show not a lot of difference between predicting FDTD3d power vs. specjbb (the two benchmarks being used to stress the power consumption of the GPU and CPUs). If these results hold for the next sets of analysis on most/all frequencies, I'll look at potentially installing some more GPU- and CPU-intensive benchmarks. Adding benchmarks could help us understand in more detail what the model is good and bad at.


In addition to analysis, more data collecting, and benchmark experimentation, I'll also be looking to improve some of the existing code. A busy summer indeed!

Thursday, May 19, 2011

Morrow, out!

Ok, so the Seacrest reference is terrible in and of itself, but it is bittersweet because this particular blog post will also be my last.

It has been a long year of working towards modeling GPU power consumption, and just as the last week of the semester gives way, leaving 9 days before Stephanie and I graduate, we are at the cusp of accomplishing our goal. At times I've wished that French was as easy as coding in Perl, but then I remember that sometimes Perl and our benchmark runs come up with things missing out of the blue, then 'fix' themselves after cutting/pasting the same exact code into the script only to have it magically work, and I reconsider the ramifications of such a wish.

This past week and a half Stephanie has been running the NVIDIA GPU benchmarks (FDTD3d), stressing the GPU with basic graphics runs, in order to gather data about the power usage of the GPU. Doing so kicks out .csv (comma-separated values) files that are accessible via Microsoft Excel, as well as R (a statistics program), which we can then manipulate and create models with. Creating these models basically tells us that our prediction of GPU power consumption is horrendously inaccurate (as expected), and creates another goal to strive towards - creating (somewhat) accurate models of GPU power consumption. This will be the goal that Stephanie works towards over the summer.

As for myself? I shall be traveling to my home state of Arizona from the end of July through the middle of September to work a wine harvest (wine in Arizona...?) in Sonoita, AZ, and then travel back to Rutherford, CA to work another wine harvest until December. After that, the plan is New Zealand or Australia, and then the wine pinnacle of the world, France. How does this all relate to CREU? Being involved with CREU has helped me gain the confidence of learning a new perspective and way of thinking (via Perl), as well as the discipline and motivation to learn said language on my own - we didn't take any classes or training courses for it. Because of this experience, I've also gained more confidence in tackling other tasks, such as learning French and blending it into my career path: marketing, wine, and computers. Quel bien (How nice)!

It has been a wonderful experience my last year in college, and we certainly couldn't have reached our goal without the support of CREU, our advisor, Dr. Rivoire, and our hordes and throngs of fans. It has been a pleasure - thank you and au revoir!

Tuesday, May 3, 2011

Dude Where's My Data?

Coming down from the high of industry talks, vacationing, and socializing with the family, it was back to the binary trenches. Because errors and code bugs never rest!

Part of using scripts to automate data retrieval from the benchmarks, so we can run them in successive order across controlled frequencies, is that when things go wrong...they go missing. No more non-compiling code or segfaults, just missing records that aren't apparent even when stalking the logs in real time! Recently our error logs were deceptively empty because the data logs were empty as well. The power meter (delightfully from a brand called WattsUp) was fine. Maybe the cable was shorting out? Back to running baseline tests: we concluded the meter had hiccuped, and settled on running the few troublesome frequencies one at a time instead of as a set. This resolved the problem and began a new one.

Reading is not hard, but for some reason it's deceptive when sleep deprived. When the time came to run the benchmarks related to the GPU (as opposed to the troublesome CPU), well, that didn't run so smoothly either, as no plan survives contact with the enemy on the field. Forgetting to rename the tests surprisingly did not harm anything, because the test failed after one frequency and only destroyed a very small dataset that can easily be recovered.

Sunday, April 24, 2011

April Showers....Recapping an Insane Month

With Tapia, Midterms, and now Spring Break finally behind us, we are digging in deep for the home stretch of the semester. As I prepare to graduate in less than 34 days, we are pushing to wrap up the bugs that have been keeping us from preparing a data set that includes not just the CPU models, but the GPU as well. We are aiming to have at least a few models to go off of in order to send Steph off into the summer program with a good foundation for increasing the accuracy of the GPU models.

Given that she was back home in Southern CA for the break, and I spent mine working and traveling the Bay Area, we are refreshed and ready to hit the ground running this week, even in the midst of finals and graduation coming up.

As far as the Tapia SF Conference was concerned, we had the wonderful opportunity to meet with our research adviser, as well as her adviser (our research grandpa!), for coffee beforehand. Following that, over the course of 3 days, we attended symposiums on topics varying from robotic climate modeling in Antarctica to 3D rendering and photo-matching programs people can use to link their photos to specific street locations on Google Maps. Each day was non-stop from 8am to 10pm, and definitely filled with more information than we expected. All in all, it was an amazing experience that allowed us to network, make new friends, and solidify some of our goals for the future around computing.

With that being said, progress reports will be posted soon, hopefully with some new pretty data sets!


Sunday, March 27, 2011

Half empty, or half full?

As we near the halfway point of the semester, with the Tapia conference and spring break both right around the corner, it certainly warrants another moment to reflect on everything we have done. This is especially fitting given that we are waiting on our GPU programs to become agreeable and new data sets to become available to analyze.

As I mentioned in my previous post, we have achieved the milestone we have essentially been aiming for since the beginning of this research - modeling the CPU, AC power, and disk usage of a computer. In doing so, we have paved the way for (attempting) modeling the GPU as well, on some recognizable, not-so-horrific data set. With exactly 2 months until commencement, the clock is ticking on producing said models. But we are confident that we will do so by then.

To expand upon that, our goal right now is to have these models ready by the beginning of spring break (April 18th), in order to be able to improve upon them when we return the following week. If we are able to do so, we will have surpassed the previous gains from last year's SSU CREU research. Eventually, we will hopefully discover better ways to model GPU power consumption and make, if even slightly, a little more sense of a relatively unexplored topic. For now, unfortunately, we are at the mercy of NVIDIA's GPU programs and CUDA, which don't quite agree with our OS the way NVIDIA intends them to on more established OSes.

To be continued.... !

Friday, March 11, 2011

Meet the team!

After the good news you'll hear about from Vince, it's hard to go back to the dark side of research. Once the victory dances and celebratory cheering are over, the return to grunt work is a little daunting. With every step forward there is a laundry list of new things to do, like installing benchmark software, more data modeling, running new tests...so instead of talking about that, we'll introduce you to the intrepid research team behind this project! Pictures are in large part, and with great thanks, courtesy of Roger Mamer.

First, meet our research advisor, professor at SSU, and fearless leader, Dr. Suzanne Rivoire! Dr. Rivoire is the one armed with the Diet Pepsi on the far right. Disclosure: PepsiCo is not funding our project. Nor are they probably even aware of our project. If PepsiCo would like to invest in our research, they are more than welcome to, of course!

Next, say hi to Vincent Morrow, research assistant and fellow SSU student! He's obviously pointing out something important and very interesting with the cursor. What could possibly be on that screen? It doesn't look like code or benchmarks, so I'm guessing it's probably the Ubuntu community forum. They're the silent, helpful, overlooked honorary partner in this project. A big shout-out and thanks to all those awesome folks! They've helped immensely when the drivers get cranky over kernel updates.

This is also yet another view of our physical setup. We're located in a corner of a secret room in the Darwin building basement. Pretty roomy, I'd say, for undergraduate research: a nice table for the two computers, a monitor, and spare parts. We even have our own whiteboard!

If you thought research was only people, you'd be mistaken! Meet Lolcat, also sometimes called Lolcat the Cranky, Lolcat-why-can't-you-work?!, and Lolcat...ugh. Looking past the two turned-off monitors, Lolcat is the tower behind the monitor with the cat sticker. This is how to tell the computers apart at a glance when they camouflage themselves against non-researcher intruders.

Lastly we have Stephanie Schmidt and Rickroll! Steph is the human on the right, looking confused about why her picture is being taken. Steph is also the other research assistant on the team and an SSU student. Rickroll is the silver tower beside Steph (hiding behind the sans-sticker monitor) and her favorite part of this project. Rickroll's been nicknamed The Perfect Child after only one hiccup, where he really wouldn't let go of Fedora. Some things really do live up to their namesake.

We, the team behind the Modeling the Power Consumption of Computer Systems with Graphics Processing Units project, hope you got to know us better. Our research happens in secret basement rooms with lots of caffeine and code. Look forward to future posts made of more win and awesome graphs modeling our success.

Sunday, February 27, 2011

Intermittent Bugs

Like a bad earworm that won't go away, intermittent bugs are resurfacing to bring progress to a standstill. After running the last set of logs there was a strange conundrum: according to screen (a Unix utility that keeps a session running after your remote login disconnects), the full run went through smoothly. But all logs after the baseline were MIA. A head-scratching conference with Dr. Rivoire followed. We decided to run the test again; it could have just been a fluke. Open up a new instance of screen, begin the test again, check in later on the logs...we have the same problem, only this time there is just one set of logs for one frequency.

What's a lowly research assistant to do? This all worked fine not more than a week ago, and nothing has changed, except for the bout of bad weather during the first fluke, and that does not excuse the weirdness of the second run (which had fine weather). Time to stalk the logs in real time! Sadly there will be no British narrator doing a voice-over as if this were a National Geographic documentary. Research is far from error-proof, which will be a hard lesson for the young research assistant to learn.

Stalking the logs in real time is easier than combing through a huge batch of them once a run has finished. The reason for doing it in real time is that when the bug occurs it is highly visible: the moment something pops up in the error log you can begin investigating while letting the log run onward. For example, if during the baseline this gem pops up in the error log:

wattsup: [error] Reading final time stamp: Bad address

We can begin by checking the date simultaneously on both rickroll and lolcat to see if maybe they've gotten out of sync. That potential cause is then ruled out when the timestamps from both show the correct time. Here begins another round of head scratching, because in the error logs following the baseline there are no more errors to be found (using a shell glob as a quick way to look at all the error logs at once: tail testSpockCat*.err). Too bad the idea of naming the run Spock didn't spark some fear into the computers. Now if you'll excuse me, I've got to resume stalking the logs for that most elusive and shy of prey, the pesky error.
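
Stalking, automated: rather than re-running tail by hand, a few lines of Perl with the CPAN File::Tail module will follow an error log and timestamp anything new. A sketch, using one of the Spock run's log names:

#!/usr/bin/perl
# Follow an error log and print new lines with a wall-clock timestamp.
use strict;
use warnings;
use File::Tail;

my $tail = File::Tail->new(name => "testSpockCat.err", maxinterval => 5);
while (defined(my $line = $tail->read)) {
    print scalar(localtime), "  $line";
}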

Friday, February 18, 2011

Progress is the Name, Success is the Game.

After God knows how long of toiling with Perl and Unix, and keeping our Google search bible on hand at all times, the script we have been aiming for is complete! The Perl script (which, as you can see, also runs shell commands from within it) is pasted below. It is living code that will change as we make our testing more efficient and effective. But the point is, it gets the job done for now!

Essentially, this script reads a configuration text file that lists the names of 3 different files containing our consumption information (AC, DISK, and CPU), which comes from Stephanie's tests, and then gives each one a makeover by getting rid of the junk. After all the makeovers are done, the outputted text files are joined up and copied into a .csv file so that they fit nicely into an Excel spreadsheet. Rinse and repeat until the end of the configuration text file is reached!

Now that we have this script, we can organize the log files that Stephanie has created, place their names into the configuration text file, and spit out tens or hundreds of CSV files in a few minutes. From there, we will run a regression analysis and have our first batch, admittedly hideous at first, of power consumption statistics! Now you know what CS majors accomplish on Friday nights!
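
(For the curious, "regression analysis" here boils down to least-squares fitting of measured watts against the utilization counters. A toy single-predictor version in Perl, with an assumed two-column CSV of utilization,watts -- the real analysis is multi-variable and lives in R/Excel:)

#!/usr/bin/perl
# Toy regression step: least-squares fit of watts = a + b * utilization.
use strict;
use warnings;

my (@x, @y);
while (<>) {
    chomp;
    my ($util, $watts) = split /,/;    # assumed two-column CSV
    next unless defined $watts;
    push @x, $util;
    push @y, $watts;
}

my $n = @x;
die "no data\n" unless $n;

my ($sx, $sy, $sxx, $sxy) = (0, 0, 0, 0);
for my $i (0 .. $n - 1) {
    $sx  += $x[$i];
    $sy  += $y[$i];
    $sxx += $x[$i] * $x[$i];
    $sxy += $x[$i] * $y[$i];
}

my $b = ($n * $sxy - $sx * $sy) / ($n * $sxx - $sx * $sx);
my $a = ($sy - $b * $sx) / $n;
printf "watts ~ %.2f + %.2f * utilization\n", $a, $b;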

Cheers to a successful week of progression - Hopefully this is the trend for the rest of the semester!


---------------------------------------


#!/usr/bin/perl

use Shell;
package test;

# Counter used to name the output files (0.csv, 1.csv, ...)
$finish = 0;

# The blog's formatting ate the angle brackets on the next two lines;
# reconstructed here. "config.txt" is a stand-in name for the
# configuration text file described above.
open(config, "config.txt") || die("Could not open config file!");

foreach $line (<config>)
{
    # Each config line holds a file type, a separator, and a log filename.
    ($filetype, $junk, $filename) = split(' ', $line);

    # AC TEST
    if ($filetype eq "ac")
    {
        open(ac, "$filename") || die("Could not open AC file!");
        system("./testac.pl $filename > actest.txt");
    }

    # CPU TEST
    if ($filetype eq "cpu")
    {
        open(cpu, "$filename") || die("Could not open CPU file!");
        system("./testcpu.pl $filename > cputest.txt");
    }

    # DISK TEST -- the disk entry comes last, so this branch also joins
    # the three cleaned files on their comma-separated key column and
    # writes out the numbered .csv for Excel.
    if ($filetype eq "disk")
    {
        open(disk, "$filename") || die("Could not open DISK file!");
        system("./testdisk.pl $filename > disktest.txt");
        system('join -t"," disktest.txt actest.txt > test.txt');
        system('join -t"," test.txt cputest.txt > final.txt');
        system("cat final.txt > $finish.csv");
        $finish++;
    }
}

Wednesday, February 9, 2011

Moving forward at last!

It's not always finding the big bugs that leads to breakthroughs in research. Sometimes it's the little ones that trail into different problems until, at long last, the real cause of all those headaches is found. Last semester the scripts gave a strange error during one of the test runs to collect data: killall had nothing to kill. Strange; according to the machine collecting data, the machine under test was supposed to be in the middle of running the disk at a particular frequency and collecting the AC measurements once every minute. Curious, I was sent off to comb the logs.

The logs revealed a bit of oversight and assumption coming back to bite us squarely you-know-where. The two computers' timestamps were not in sync. Not just the hour was off, but the minute and second; they had drifted apart by several minutes. Ubuntu checks a computer's time against an external server only on boot-up, and we leave rickroll and lolcat running all the time. Not a problem: server administrators the world over have solved time drift before. Google to the rescue, and with help from the lovely folks on the Ubuntu community forums, we had a quick fix. Thankfully this fix was quick and easy: add a script to the cron.daily directory. Cron is a Unix utility that runs the scripts inside its directories at a set interval, like daily, hourly, etc.

Wait overnight. Come back to find the timestamps don't match the next day. Move the fix to the cron.hourly directory. Repeat the waiting. No dice the next day. Then we discover the Network Time Protocol daemon hadn't been installed! Install ntp, wait overnight. I did a little dance the next morning on seeing rickroll and lolcat's timestamps. They actually stayed in sync!
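
For the record, the whole cron fix fits in a few lines dropped into /etc/cron.daily (later cron.hourly). A sketch -- ntpdate is the real utility, while pool.ntp.org stands in for whatever server we actually sync against:

#!/usr/bin/perl
# One-shot clock sync; cron runs everything in its directory on schedule.
exec "ntpdate", "pool.ntp.org";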

Now once Vince finishes off his script to parse our data logs into a format that's easier to read, we'll be able to take data from a run and actually analyze it. Of course the first model we create with testZulu will be ugly, but at least then we'll be moving forward!

Saturday, February 5, 2011

Welcome Back !

Good evening, and welcome to the new year everyone!

After getting settled this past week for the new semester, Stephanie and I have begun working towards our end goal of modeling the power consumption of GPUs. Our current task is to create a bash script that runs through our 3 separate Perl parsing scripts and joins the files into an Excel-friendly file, as well as cleaning up the file names in our quite-large log database.

Unfortunately, we also discovered a slight issue towards the end of our work last semester. For the sake of efficiency, we opted to run our benchmark programs on one machine and parse the data on another. In doing so, the machines' timestamps have seemingly drifted apart during longer benchmark runs. Thus, our main concern is alleviating this issue as soon as possible in order to be able to accurately parse our data in the future.

Lastly, we are quite excited to be gearing up to attend the 2011 Tapia Conference! We will be spending 3 days in San Francisco with other CREU recipients from around the country and meeting with some of the industry's leading innovators!

With all of this exciting potential this semester, we will hopefully be updating you with great news as the days progress. With that being said - Stay tuned!