Sunday, March 4, 2012

Winning!

Who knew before last week that a computer science project from a small liberal-arts college stood a chance against bigger universities (like Virginia Tech, Duke, and Carnegie Mellon)? As the title implies, despite going to SIGCSE with little expectation beyond having fun at the poster session, I had better odds than I imagined!

Firstly some action shots from the poster session:
Me with the project poster.

Two of the judges for the competition hearing my elevator pitch.
After the poster session wrapped up, the semi-finalists were posted. Naturally, as the title implies, I didn't just pack up and go home: I made it to the semi-finals! This kicked off a mad dash of prep work under Dr. Rivoire's supervision to organize a talk that would make or break my chances of becoming a finalist.

Each of the five semi-finalists had to give a talk and be ready for a Q&A session afterwards.

Giving a talk - less scary once I got started.
The judges must have seen something in the talk, because I came home with this lovely third-place medal and a chance to go on to the Grand Finals!


Based on the SIGCSE results, Dr. Rivoire and I are currently looking to enter this project in the CSU Student Research Competition at CSU Long Beach and to continue onward in the ACM Student Research Competition Grand Finals.

Wednesday, January 25, 2012

And the nominees are....

After a nerve-wracking wait, we are pleased to announce that this research has been accepted into the SIGCSE Student Research Competition!! The peer-review-induced wait between submitting an abstract and hearing back meant many a night refreshing email with bated breath and fingers crossed. There was much rejoicing when the email that began "Congratulations" arrived in my inbox.


I'll be presenting a poster on this very research this March, ready to outshine and throw down with the other undergraduates for the glory & prizes.

Friday, August 5, 2011

Research = Rainbow Bright?

Analyzing data is not the photogenic "fun" side of research--at least from the outside. Chugging away at inputting the new GPU-aware data into an Excel sheet can be a little mind-numbing: 10 variables (Watts, Disk Utilization, GPU Temperature, GPU Fan Speed, Utilization GPU, Utilization Memory, CPU0, CPU1, CPU2, CPU3), each needing six summary statistics (min, 1st quartile, median, mean, 3rd quartile, max), for each of 12 frequencies PER Excel sheet (6 sheets total). Doing the math makes these moments hurt more: 10 × 6 × 12 × 6 works out to 4,320 numbers, and it feels like they'll take forever.
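
If nothing else, this is the kind of chore a short script can shoulder. Here's a minimal sketch of the per-frequency summaries, assuming the CPAN module Statistics::Descriptive is installed and that frequency and watts sit in the first two columns (both assumptions about a layout I haven't shown):

    #!/usr/bin/perl
    # Sketch: per-frequency summary stats for one column of a run's CSV.
    # Assumes frequency and watts are the first two columns (hypothetical).
    use strict;
    use warnings;
    use Statistics::Descriptive;

    my %watts_by_freq;
    open my $fh, '<', 'run.csv' or die "run.csv: $!";
    <$fh>;    # skip the header row
    while (<$fh>) {
        chomp;
        my ($freq, $watts) = (split /,/)[0, 1];
        push @{ $watts_by_freq{$freq} }, $watts;
    }
    close $fh;

    for my $freq (sort { $a <=> $b } keys %watts_by_freq) {
        my $stat = Statistics::Descriptive::Full->new();
        $stat->add_data(@{ $watts_by_freq{$freq} });
        printf "%s: min=%.2f q1=%.2f median=%.2f mean=%.2f q3=%.2f max=%.2f\n",
            $freq, $stat->min, $stat->percentile(25), $stat->median,
            $stat->mean, $stat->percentile(75), $stat->max;
    }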

Looking at graphs of the data, though, like this:


Makes you wonder what the heck is going on with this workload!

Monday, July 25, 2011

Recipe for Debugging

How do you take a working script and convince yourself it is broken?

1. Try to use outdated sub-scripts in another directory
2. Don't check the test inputs as a potential source of error
3. All of the above.

In other words, the CSV-generating scripts finally have headers! Better analysis, here we come. The other reason for the victory-dance atmosphere on the project is that (knock on wood) all the scripts are working and we're collecting the last run of GPU-aware workloads!! The end of data collection is finally nigh. I'm sure I'll regret this phase ending when I have to sit down and write up the analysis. In the meantime, it wouldn't be unreasonable to feel a little burned out on stalking data logs.
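
For the curious, the header fix itself boils down to printing one extra line before the data rows. A tiny sketch, with a hypothetical column order:

    #!/usr/bin/perl
    # Sketch: print a header line before the data rows so downstream
    # analysis can refer to columns by name. Column order is hypothetical.
    use strict;
    use warnings;

    # One arrayref of readings per logging interval (made-up sample).
    my @rows = ( [103.2, 0.12, 61, 40, 98, 74, 100, 2, 1, 0] );

    my @columns = qw(Watts DiskUtil GPUTemp GPUFan GPUUtil MemUtil
                     CPU0 CPU1 CPU2 CPU3);
    print join(',', @columns), "\n";        # the new header row
    print join(',', @$_), "\n" for @rows;   # then the data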

Thursday, July 21, 2011

Cable Drama (Not of the Soap Variety)

Let me begin with a sincere thank-you to the folks at Microsoft Research in Mountain View for donating a WattsUp-friendly USB cable. Anyone who has to depend on these meters knows their cables can't easily be replaced with a quick trip to the store, thanks to the recessed port on the meter itself.

The story behind this donation revolves around two WattsUp meters [see left in the red circles].

One of these meters began to return too many errors and bad packets. Not a problem: we avoided it by changing the setup so the Machine Under Test (MUT) used the "good" meter. Sharing the one "good" meter between the two computers did cause a minor headache, because it left room for errors. For each workload we wanted to run on both boxes (meaning all benchmarks), we had to swap meters. That meant shutting down the computers, unplugging a great many cables, and hoping nothing got messed up when the cables were plugged back in to their new configuration.

How does a new cable play into this meter drama? The last time we reconfigured the wiring, our "good" meter began to exhibit the same behavior as the "bad" one. The only things that had changed were the USB cables relaying data from the MUT's meter to the Data Acquisition Machine. It turns out both meters work fine; it was a dud cable.

Dr. Rivoire, our faculty adviser, had recently replaced the cables on the WattsUp meters used in her research at Microsoft because of a similar problem. QED: let's change out the cable. In short, the bottleneck on progress has been resolved, thanks to a coworker of Dr. Rivoire's who cut and refit the casing on a USB cable to fit the WattsUp's unique recessed port.

Workloads ahoy! I'll finally have a chance to use my new parsing script, written in Perl, which abuses a few regular-expression tricks on our unique logs.
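
As a teaser, here's the flavor of that parsing, sketched with an invented log format (the real logs look different):

    #!/usr/bin/perl
    # Sketch of the regex-driven parsing described above; the log line
    # format here is invented for illustration.
    use strict;
    use warnings;

    my @log = (
        '2011-07-25 14:02:11 watts=103.2 cpu0=100 cpu1=2',
        '2011-07-25 14:02:12 watts=104.8 cpu0=98 cpu1=3',
    );

    for my $line (@log) {
        next unless $line =~ /^(\S+ \S+)\s+watts=([\d.]+)/;
        my ($timestamp, $watts) = ($1, $2);
        my %cpu = $line =~ /cpu(\d+)=(\d+)/g;   # core number => utilization
        print join(',', $timestamp, $watts,
                   map { $cpu{$_} } sort { $a <=> $b } keys %cpu), "\n";
    }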

Friday, July 1, 2011

Progress Summary

Our project's model is currently trained on workloads where the CPU is the main consumer of dynamic power. Results are fragmented between the two test machines, lolcat and rickroll, due to data-collection errors. The next steps planned are finishing data collection, examining GPU benchmarks (possibly with the added benefit of instrumenting the GPU), and analyzing oddities in the results.

For FDTD3D (a GPU benchmark), rickroll's MSE, root-MSE/mean, and DRE all display a jump in the results. The MSE at frequency 2000 is 4.13 and climbs to 318.29 at frequency 2200. DRE repeats this jump at the two frequencies, moving from 0.10 to 0.63, and root-MSE/mean changes from 0.01 to 0.12. A reasonable hypothesis would be that below 2200 the workload is CPU-bound, but the rest of the data does not presently support this explanation.
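
For anyone reconstructing the numbers, the two metrics I can spell out from their names work like this (DRE, our dynamic-range metric, is omitted, and the data below is made up):

    #!/usr/bin/perl
    # Sketch: MSE and root-MSE/mean for predicted vs. measured power.
    use strict;
    use warnings;
    use List::Util qw(sum);

    my @measured  = (100.1, 103.4, 99.8, 101.0);
    my @predicted = (100.9, 101.2, 100.3, 100.5);

    my $n    = @measured;
    my $mse  = sum(map { ($measured[$_] - $predicted[$_]) ** 2 } 0 .. $n - 1) / $n;
    my $mean = sum(@measured) / $n;

    printf "MSE = %.2f, rMSE/mean = %.4f\n", $mse, sqrt($mse) / $mean;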

Using two benchmarks, nbody and binomialOptions, as train and test sets (the same benchmark for both training and testing; training on nbody but testing on binomialOptions; and vice versa), lolcat's results stress how unaware the model is of the GPU's influence on expected power (though power does correlate well with CPU & disk for this workload). The model cannot predict a reasonable expected power when the GPU is stressed in addition to the CPU, or when the GPU is stressed but the CPU is not.
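
The train/test idea itself is simple enough to sketch. This toy version fits power against a single CPU-utilization input on one benchmark and scores it on the other; the real model uses more inputs, and the numbers below are invented:

    #!/usr/bin/perl
    # Sketch of cross-workload train/test: fit power ~ a + b * cpu on
    # one benchmark's samples, then score the fit on the other's.
    use strict;
    use warnings;
    use List::Util qw(sum);

    sub fit {    # ordinary least squares with a single predictor
        my ($x, $y) = @_;
        my $n    = @$x;
        my $xbar = sum(@$x) / $n;
        my $ybar = sum(@$y) / $n;
        my $b    = sum(map { ($x->[$_] - $xbar) * ($y->[$_] - $ybar) } 0 .. $n - 1)
                 / sum(map { ($x->[$_] - $xbar) ** 2 } 0 .. $n - 1);
        return ($ybar - $b * $xbar, $b);    # (intercept, slope)
    }

    sub mse {
        my ($a, $b, $x, $y) = @_;
        my $n = @$x;
        return sum(map { ($y->[$_] - ($a + $b * $x->[$_])) ** 2 } 0 .. $n - 1) / $n;
    }

    # Made-up samples; binomialOptions draws extra GPU power the
    # CPU-only model never sees, so the test error blows up.
    my %nbody = (cpu => [20, 40, 60, 80], watts => [110, 118, 126, 134]);
    my %binom = (cpu => [20, 40, 60, 80], watts => [140, 149, 158, 166]);

    my ($a, $b) = fit($nbody{cpu}, $nbody{watts});
    printf "train on nbody, test on binomialOptions: MSE = %.2f\n",
        mse($a, $b, $binom{cpu}, $binom{watts});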

Once the calibration-data recollection on lolcat finishes and is checked for errors, the next step will be GPU awareness. For more insight into the GPU's role in power consumption, nvidia-smi will be used to instrument the GPU. The model can't predict beyond the CPU exercising at 100%, but with a GPU-aware component added, the predictions should be less erroneous.
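
A plausible shape for that instrumentation is just polling nvidia-smi in a loop. A sketch, with the caveat that the exact query flags come from current nvidia-smi and may not match older drivers:

    #!/usr/bin/perl
    # Sketch: poll nvidia-smi once a second and log the GPU counters.
    # Treat the exact field names below as assumptions.
    use strict;
    use warnings;

    my $query = join ',', qw(temperature.gpu fan.speed
                             utilization.gpu utilization.memory);

    while (1) {
        my $line = `nvidia-smi --query-gpu=$query --format=csv,noheader,nounits`;
        chomp $line;
        print scalar localtime, ",$line\n";   # timestamp + the counters
        sleep 1;
    }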

Friday, June 24, 2011

Resolving a Heisenbug

Resolving a Heisenbug can be tricky, since by its nature it resists the debugger. Instead we decided to go the unscientific route of throwing a bunch of solutions at the problem and then, in true programmer fashion, not questioning what fixed wattsup.

Part one of the solution was to rewrite the packet-handling code within wattsup to fix several line errors and the way wattsup handled mis-sized packets. Originally the code would give up and kill the run if the packet size changed because the program had fallen out of sync (packets being how the watts data is delivered). Now wattsup recaptures the mis-sized packet and concatenates it with the following packet (which, the majority of the time, will also be mis-sized and follow the first packet in data order).
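
To illustrate the idea (not wattsup's actual C code), here is a Perl sketch of that buffering, assuming the "#"-to-";" packet framing I understand the WattsUp serial protocol to use:

    #!/usr/bin/perl
    # Sketch: stitch out-of-sync reads back together by buffering
    # until each packet's closing ';' arrives.
    use strict;
    use warnings;

    my $buffer = '';

    sub handle_read {
        my ($chunk) = @_;          # whatever the serial read returned
        $buffer .= $chunk;         # concatenate mis-sized pieces
        while ($buffer =~ s/#(.*?);//s) {    # peel off complete packets
            print "packet: $1\n";  # stand-in for the real parsing
        }
    }

    # Simulated out-of-sync reads: one packet split across two chunks.
    handle_read('#d,1,18,103');
    handle_read('.2,0,0;#d,1,18,104.8,0,0;');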

Part two involved unplugging the meters for a while to clear their caches. While the meters were unplugged, we checked, just in case, whether they were overheating. One meter did feel warm to the touch, which could be a result of overheating or just normal heat left over from finishing a workload. A last-minute idea when we went back to reconnect all the cables was to switch which meter measured which computer. The workspace was also rearranged to provide more airflow around the meters, since they had felt warm to the touch.


Wattsup now runs without problems! What actually fixed the Heisenbug? Probably a combination of all three solutions (rewriting the code, switching the meters, and increasing air circulation). The important work now moves toward analyzing all the data we have finished collecting.

Here are a few graphs from our data that will inspire future analysis headaches: