Tuning, Measurement, And Analysis
Before any measurements are made, both the hardware and software configurations are tuned. Tuning is an iterative cycle of adjusting parameters and measuring the result. It involves measuring aspects of the system such as CPU utilization and memory usage, and possibly adjusting system hardware parameters, system resource parameters, and middleware parameters. Tuning is one of the first steps of performance analysis. Without it, scaling results may be misleading: they may reflect some other problem in the system configuration rather than a limitation of the kernel.
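The iterative tune-and-measure cycle can be sketched in Python as a simple search that changes one parameter at a time. This is a minimal illustration under stated assumptions, not our actual procedure: the parameter names (`worker_threads`, `buffer_kb`), their candidate values, and the `fake_benchmark` function are hypothetical stand-ins for real tunables and a real benchmark run, and a higher score is assumed to be better.

```python
from typing import Callable, Dict, List

Config = Dict[str, int]


def tune(
    config: Config,
    candidates: Dict[str, List[int]],
    measure: Callable[[Config], float],
    max_rounds: int = 5,
) -> Config:
    """Adjust one parameter at a time, re-measuring after every change,
    until a full round of trials brings no improvement.

    `measure` stands in for a benchmark run; higher scores are better."""
    best = dict(config)
    best_score = measure(best)
    for _ in range(max_rounds):
        improved = False
        for name, values in candidates.items():
            for value in values:
                trial = dict(best)
                trial[name] = value          # tune: change one setting
                score = measure(trial)       # measure: rerun the benchmark
                if score > best_score:
                    best, best_score = trial, score
                    improved = True
        if not improved:
            break                            # another cycle would change nothing
    return best


# Hypothetical tunables and a synthetic benchmark that peaks at (16, 64).
def fake_benchmark(c: Config) -> float:
    return -abs(c["worker_threads"] - 16) - abs(c["buffer_kb"] - 64)


print(tune({"worker_threads": 4, "buffer_kb": 8},
           {"worker_threads": [4, 8, 16, 32], "buffer_kb": [8, 32, 64, 128]},
           fake_benchmark))
# {'worker_threads': 16, 'buffer_kb': 64}
```

Searching one parameter at a time is a simplifying choice; it can miss settings that only help in combination.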
The benchmark runs are made according to the run rules so that both performance and scalability can be measured in terms of the defined performance metric. When calculating SMP scalability for a given machine, we had to choose between basing this metric on the performance of a UP kernel and basing it on the performance of an SMP kernel with the number of processors set to 1 (1P). We chose to compute SMP scalability from UP measurements so that it more accurately reflects the performance improvements of the SMP kernel.
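The scalability calculation reduces to a ratio of metric values. The sketch below assumes the performance metric is a throughput (higher is better), and all numbers are invented for illustration. It shows why the choice of baseline matters: a 1P run of an SMP kernel typically carries overhead that a UP build avoids (for example, spinlocks that a UP kernel compiles out), so dividing by a 1P baseline tends to yield a larger scalability figure than dividing by a UP baseline.

```python
def smp_scalability(nway_throughput: float, baseline_throughput: float) -> float:
    """SMP scalability of an N-way run relative to a single-processor baseline.

    Both arguments are values of the benchmark's performance metric,
    assumed here to be a throughput (work per unit time)."""
    if baseline_throughput <= 0:
        raise ValueError("baseline throughput must be positive")
    return nway_throughput / baseline_throughput


# Hypothetical measurements, for illustration only.
up = 1000.0         # UP kernel
one_p = 950.0       # SMP kernel booted with 1 processor (1P)
eight_way = 5700.0  # SMP kernel on 8 processors

print(smp_scalability(eight_way, up))     # 5.7  (UP baseline)
print(smp_scalability(eight_way, one_p))  # 6.0  (1P baseline)
```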
A baseline measurement is made using the previously determined version of the Linux kernel. For most benchmarks, both UP and SMP baseline measurements are made. For a few benchmarks, only 8-way performance is measured: these benchmarks complete a fixed amount of work, so collecting UP performance information would take prohibitively long. Most other benchmarks measure the amount of work completed in a fixed time period, which takes no longer to measure on a UP than on an 8-way.
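To see why a fixed-work benchmark can make UP measurements time prohibitive, note that its run time grows roughly in proportion to the speedup that is lost. The arithmetic below is a rough sketch with invented figures; it assumes the metric scales inversely with elapsed time.

```python
def estimated_up_hours(nway_hours: float, scalability: float) -> float:
    """Rough UP run time for a fixed amount of work: if the N-way system
    is `scalability` times faster than UP, the UP run takes that much longer."""
    return nway_hours * scalability


# Invented figures: an 8-way run of 3 hours with a scalability of 6.
print(estimated_up_hours(3.0, 6.0))  # 18.0
```

A fixed-time benchmark, by contrast, runs for the same measurement period regardless of the number of processors.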
The first step in analyzing the performance and scalability of the SUT (System Under Test) is to understand the benchmark and the workload being tested. Initial performance analysis is made against a tuned system, and sometimes the analysis reveals further changes to be made to tuning parameters.
Analysis of the performance and scalability of the SUT requires a set of performance tools. Our strategy is to use Open Source community (OSC) tools whenever possible. This allows us to post analysis data to the OSC to illustrate performance and scalability bottlenecks. It also allows members of the OSC to replicate our results with the same tool, or to understand the results after experimenting with the tool on another application. Any ad hoc performance tool we develop to better understand a specific bottleneck is generally shared with the OSC. Ad hoc performance tools are usually simple tools that instrument a specific component of the Linux kernel. The performance tools we used include:
- /proc file system
meminfo, slabinfo, interrupts, network stats, I/O stats, etc.
- SGI's lockmeter
For SMP lock analysis
- SGI's kernel profiler (kernprof)
Time-based and performance counter-based profiling, and an annotated call graph (ACG), for kernel space only
- IBM Trace Facility
Single step (mtrace) and both time-based and performance counter-based profiling for both user and system space
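As an example of collecting data from the /proc file system, the following Python sketch parses the text of /proc/meminfo into a dictionary. The sample text is invented, and the set of fields varies between kernel versions, so the code does not assume any particular field exists. On a live system the same function could be applied to the contents of /proc/meminfo, read periodically during a benchmark run.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}.

    Lines look like 'MemTotal:       16318732 kB'. Values are returned as
    integers in the units shown (kB for most fields, a bare count for
    fields such as HugePages_Total)."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


# Invented sample in the /proc/meminfo format.
sample = """MemTotal:       16318732 kB
MemFree:         1203456 kB
Buffers:          204800 kB
HugePages_Total:       0
"""
info = parse_meminfo(sample)
print(info["MemTotal"] - info["MemFree"])  # 15115276 (kB in use)
```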
Ad hoc performance tools are developed to further understand a specific aspect of the system. The ad hoc tools we used:
- collect scheduler statistics
- determine which kernel functions are blocking, for investigation of idle time
- post-process the kernprof ACG
- instrument copy in/out, determining the alignment of buffers, the size of each copy, and the CPU utilization of the copy in/out algorithm
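The alignment and size analysis performed by the copy in/out instrumentation could be post-processed along the lines of the sketch below. It assumes a hypothetical trace format of one (buffer address, byte count) pair per copy; the in-kernel instrumentation that would produce such records, and the CPU utilization measurement, are not shown. The 64-byte alignment cap is an arbitrary choice for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def alignment(addr: int, max_align: int = 64) -> int:
    """Largest power of two (capped at max_align) that divides addr."""
    if addr == 0:
        return max_align
    lowest_set_bit = addr & -addr   # isolate the lowest set bit of addr
    return min(lowest_set_bit, max_align)


def size_bucket(nbytes: int) -> int:
    """Round a copy size up to the next power of two (0 stays 0)."""
    return 1 << (nbytes - 1).bit_length() if nbytes > 0 else 0


def summarize(copies: Iterable[Tuple[int, int]]) -> Tuple[Counter, Counter]:
    """Build histograms of buffer alignment and copy size."""
    align_hist: Counter = Counter()
    size_hist: Counter = Counter()
    for addr, nbytes in copies:
        align_hist[alignment(addr)] += 1
        size_hist[size_bucket(nbytes)] += 1
    return align_hist, size_hist


# Hypothetical trace records: (buffer address, bytes copied).
records = [(0x7F001000, 4096), (0x7F001004, 1500), (0x7F001001, 1)]
aligns, sizes = summarize(records)
print(dict(aligns))  # {64: 1, 4: 1, 1: 1}
print(dict(sizes))   # {4096: 1, 2048: 1, 1: 1}
```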
Performance analysis data is then used to identify performance and scalability bottlenecks. Locating a bottleneck requires a broad understanding of the SUT and a more specific understanding of the Linux kernel components that the benchmark stresses. It also requires an understanding of the Linux kernel source code that causes the bottleneck. In addition, we work very closely with the LTC Linux kernel development teams and the OSC so that a patch can be developed to fix the bottleneck.
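A common first pass at locating a bottleneck in profiling data is to rank kernel functions by their share of samples. The sketch below assumes a hypothetical plain-text profile with one "samples function" pair per line; it does not reproduce the output format of kernprof or any other tool, and the sample data is invented.

```python
from typing import Dict, List, Tuple


def top_hotspots(profile_text: str, n: int = 3) -> List[Tuple[str, float]]:
    """Return the n functions with the most samples and their share in percent."""
    counts: Dict[str, int] = {}
    for line in profile_text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue                    # skip headers and malformed lines
        samples, func = int(parts[0]), parts[1]
        counts[func] = counts.get(func, 0) + samples
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(func, 100.0 * c / total) for func, c in ranked[:n]]


# Invented profile in the hypothetical "samples function" format.
profile = """samples function
500 default_idle
300 schedule
150 file_read_actor
50 do_page_fault
"""
print(top_hotspots(profile))
# [('default_idle', 50.0), ('schedule', 30.0), ('file_read_actor', 15.0)]
```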