Thursday, September 9, 2010

Week 1 - CPUs, GPUs, Power Modeling Oh My

I knew before this project that GPUs lived to make graphics-heavy games like Left 4 Dead and Halo look awesome. I never bothered to learn the how behind crunching instructions so quickly. The speed comes from the many processing units inside the hardware. Unlike in a CPU, those cores pay off because the mathematics involved in graphics is a good fit for parallel programming [1]. Sanford Russell of Nvidia gives a lay-friendly example contrasting the traditional CPU and GPU roles in computing [1]:

If you were looking for a word in a book, and handed the task to a CPU, it would start at page 1 and read it all the way to the end … It would be fast, but would take time because it has to go in order. A GPU … "would tear [the book] into a thousand pieces" and read it all at the same time.
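Russell's analogy maps naturally onto data-parallel code. As a rough sketch (in plain Python with threads rather than on actual GPU hardware, and with function names of my own invention), searching a text for a word can be split into chunks that are all scanned at once:

```python
from concurrent.futures import ThreadPoolExecutor

def count_in_chunk(chunk, word):
    # Each worker reads only its own "torn-out pages".
    return chunk.count(word)

def parallel_count(text, word, workers=4):
    # Tear the book into pieces and search them all at the same time.
    step = max(1, len(text) // workers)
    # Overlap neighbouring chunks by len(word) - 1 characters so an
    # occurrence straddling a seam is still seen by exactly one worker.
    chunks = [text[i:i + step + len(word) - 1] for i in range(0, len(text), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda c: count_in_chunk(c, word), chunks))

print(parallel_count("the cat sat on the mat the end", "the"))  # 3
```

A real GPU kernel would do the same split across thousands of hardware threads instead of four; the chunking idea is what carries over.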

Industry is showing a surprising openness by supporting OpenCL (Open Computing Language) as a standard across different GPU hardware. Nvidia came out first with CUDA, a proprietary GPU programming model. The speedup GPU computing offers has become recognized enough that Mac OS X Snow Leopard incorporated OpenCL.

[1] Matt Buchanan. Giz Explains: GPGPU Computing, and Why It'll Melt Your Face Off. Gizmodo.com, May 2009.

---

Unlike the language side of parallel programming, there hasn't been an industry standard defining how to test and measure the total power of this improved hardware. Some attempts have been made, such as the Green Grid's Datacenter Performance Efficiency (DCPE). DCPE is normally expressed as a simple factored formula [1]: overall efficiency = (1/PUE) × (1/SPUE) × (computation / total energy delivered to the electronic components).

PUE stands for Power Usage Effectiveness: the ratio of the power drawn by the entire datacenter building infrastructure to the power drawn by the actual computing equipment (like servers). PUE measurements are performed by electrical equipment without any disruption to normal operations, which is great when a datacenter cannot be taken offline for benchmarking. SPUE stands for Server PUE, a ratio of total server input power to the power consumed by useful components like motherboards, CPUs, and disks. But there is no protocol for measuring SPUE, and the actual power a server draws can be erratic depending on its activity. That introduces another set of numbers to consider: benchmarks measuring the energy efficiency of servers, JouleSort and SPECpower_ssj2008 [2]. Both benchmarks attempt to isolate efficiency differences in the hardware.
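To make the two ratios concrete, here is a minimal Python sketch. The function names and all the wattage figures below are made up for illustration; they are not measurements from any real facility:

```python
def pue(total_facility_power, it_equipment_power):
    # Power Usage Effectiveness: everything entering the building
    # (cooling, power distribution, lighting) over what the IT gear draws.
    return total_facility_power / it_equipment_power

def spue(server_input_power, useful_component_power):
    # Server PUE: total power at the server's plug over the power reaching
    # components doing useful work (CPUs, DRAM, disks, motherboard); the
    # rest is lost in the power supply, fans, and voltage regulators.
    return server_input_power / useful_component_power

# Hypothetical example: a 10 MW building feeding 5 MW of IT equipment,
# and a server drawing 500 W at the plug with 400 W reaching components.
building_pue = pue(10_000_000, 5_000_000)   # 2.0
server_spue = spue(500, 400)                # 1.25
# Multiplying the two gives the end-to-end overhead from the utility
# feed down to the useful components.
overall = building_pue * server_spue        # 2.5
```

In this made-up example, only 1 watt in 2.5 drawn from the utility actually reaches a component doing computation.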

[1] Luiz André Barroso and Urs Hölzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Chapter 5. Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, May 2009.

[2] In the interest of fair disclosure: our faculty advisor, Dr. Rivoire, is also on the team behind JouleSort.

---

The two popular approaches to real-time power modeling are detailed analytical power models and high-level blackbox models. For a blackbox model, OS-reported statistics make up a large component of present power modeling and benchmarking schemes. This has been accepted as an accurate look at the system under the assumption that the CPU, memory, and disk are the main draws of power [1]. In other words, graphics processors and power-aware networking equipment fall through the cracks of this assumption-bound testing. To learn a great deal about one component, though, an analytical power model is superior to a blackbox model: it looks at only a single component within a system. Extrapolating such a model to cover an entire system would be impractical. Hardware is moving away from being CPU-dominated, which makes it ever harder for these models to paint an accurate picture.
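A high-level blackbox model of this sort often boils down to a linear fit over OS-reported utilization counters. Here is a minimal sketch; the function name, coefficients, and utilization values are invented for illustration (a real model like those compared in [1] would calibrate the weights against an actual power meter):

```python
def blackbox_power(cpu_util, mem_util, disk_util,
                   idle_w=100.0, cpu_w=80.0, mem_w=30.0, disk_w=20.0):
    # Linear blackbox model: idle power plus a calibrated weight per
    # subsystem. Utilizations are fractions in [0, 1]; weights are watts.
    # Note what's missing: no GPU term and no NIC term -- exactly the
    # blind spot described above when hardware is no longer CPU-dominated.
    return idle_w + cpu_w * cpu_util + mem_w * mem_util + disk_w * disk_util

print(blackbox_power(1.0, 1.0, 1.0))  # 230.0 (fully loaded, per these made-up weights)
print(blackbox_power(0.0, 0.0, 0.0))  # 100.0 (idle)
```

The appeal is that everything on the right-hand side comes from counters the OS already exposes, with no extra instrumentation; the cost is that any component without a term in the sum is invisible to the model.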

Only recently have detailed analytical power models or high-level blackbox models been attempted for modeling GPU power consumption [2]. Ma et al. found that with more work being sent to the GPU there is not only a noticeable increase in power consumption but also a greater demand for cooling (which itself increases the power consumption). A current problem with testing GPUs, Ma et al. found, is that power consumption can spike rapidly (high or low) during testing, exceeding the model's parameters [2]. Given how much more GPU workloads vary compared to CPU workloads, setting the boundaries for a model's parameters may, as Ma et al. found, require different models depending on the benchmark being run.

[1] Suzanne Rivoire, Parthasarathy Ranganathan, Christos Kozyrakis. A Comparison of High-Level Full-System Power Models. Hotpower, 2008.

[2] Xiaohan Ma, Mian Dong, Lin Zhong, Zhigang Deng. Statistical Power Consumption Analysis and Modeling for GPU-based Computing. Hotpower, 2009.
