Here we discuss the analysis methodology we used to quantify Linux performance for SMP scalability. If you prefer, you can skip ahead to the section.
Our strategy for improving Linux performance and scalability consists of running several industry-accepted and component-level benchmarks, selecting the appropriate hardware and software, developing benchmark run rules, setting performance and scalability targets, and measuring, analyzing, and improving performance and scalability. This section details each of these steps.
Performance is defined as raw throughput on a uniprocessor (UP) or SMP system. We distinguish between SMP scalability (number of CPUs) and resource scalability (number of network connections, for example).
Hardware and software
The architecture used for the majority of this work is IA-32 (in other words, x86), from one to eight processors. We also study the issues associated with future use of non-uniform memory access (NUMA) IA-32 and NUMA IA-64 architectures. The selection of hardware typically aligns with the selection of the benchmark and the associated workload. The selection of software aligns with IBM's Linux middleware strategy and/or open source middleware. For example:
- Database
We use a query database benchmark, and the hardware is an 8-way SMP system with a large disk configuration. IBM DB2 for Linux is the database software used, and the SCSI controllers are IBM ServeRAID 4H. The database is targeted for 8-way SMP.
- SMB file serving
The benchmark is NetBench and the hardware is a 4-way SMP system with as many as 48 clients driving the SMP server. The middleware is Samba (open source). SMB file serving is targeted for 4-way SMP.
- Web serving
The benchmark is SPECweb99, and the hardware is an 8-way SMP system with a large memory configuration and as many as 32 clients. The benchmarking was conducted for research purposes only and was non-compliant (more on this in the Benchmarks section). The Web server is Apache, which is the basis for the IBM HTTP Server. We chose an 8-way system in order to investigate scalability, and we chose Apache because it enables the measurement and analysis of Next Generation POSIX Threads (NGPT) (see Resources). In addition, it is open source and the most popular Web server.
- Linux kernel version
The version of the kernel.org Linux kernel used (2.2.x, 2.4.x, or 2.5.x) is benchmark-dependent; this is discussed further in the Benchmarks section. We selected Red Hat 7.1 or 7.2 as the Linux distribution in order to simplify our administration. Our focus is kernel performance, not the performance of the distribution, so we replaced the Red Hat kernel with one from kernel.org plus the patches we evaluated.
During benchmark setup, we developed run rules to detail how the benchmark is installed, configured, and run, and how results are to be interpreted. The run rules serve several purposes:
- Define the metric that will be used to measure benchmark performance and scalability (for example, messages/sec).
- Ensure that the benchmark results are suitable for measuring the performance and scalability of the workload and kernel components.
- Provide a documented set of instructions that will allow others to repeat the performance tests.
- Define the set of data that is collected so that performance and scalability of the System Under Test (SUT) can be analyzed to determine where bottlenecks exist.
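As an illustration of what the run rules cover, they could be captured in a small record that is checked for completeness before a run. This is only a sketch: the `RunRules` class, its field names, and the placeholder benchmark name are hypothetical and do not come from the run rules we actually wrote.

```python
from dataclasses import dataclass, field


@dataclass
class RunRules:
    """Hypothetical record of one benchmark's run rules (names are illustrative)."""
    benchmark: str
    metric: str = ""                                          # e.g. "messages/sec"
    procedure: list[str] = field(default_factory=list)        # install/configure/run steps
    collected_data: list[str] = field(default_factory=list)   # data gathered on the SUT

    def missing(self) -> list[str]:
        """Return the names of the run-rule parts that are still empty."""
        parts = {
            "metric": self.metric,
            "procedure": self.procedure,
            "collected_data": self.collected_data,
        }
        return [name for name, value in parts.items() if not value]


rules = RunRules(
    benchmark="example-benchmark",   # placeholder name, not a real benchmark
    metric="messages/sec",
    procedure=["install", "configure", "run"],
)
print(rules.missing())  # the data-collection list has not been filled in yet
```

A check like this only confirms that each part of the run rules has been written down; whether the chosen metric and data are suitable for the workload still has to be judged by hand.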
Performance and scalability targets for a benchmark are associated with a specific SUT (hardware and software configuration). Setting performance and scalability targets requires the following:
- Baseline measurements to determine the performance of the benchmark on the baseline kernel version. Baseline scalability is then calculated.
- Initial performance analysis to determine a promising direction for performance gains (for example, a profile indicating the scheduler is very busy might suggest trying an O(1) scheduler).
- Comparison of baseline results with similar published results (for example, find SPECweb99 publications on the same Web server on a similar 8-way from spec.org).
If external published results are not available, we attempt to use internal results. We also attempt to compare to other operating systems. Given the competitive data and our baseline, we select a performance target for UP and SMP machines.
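The baseline scalability calculation and the comparison against a target can be sketched as follows. The sketch assumes that scalability is expressed as SMP throughput divided by UP throughput; the function names, the throughput figures, and the 6.0x target are made up for illustration and are not measured results.

```python
def scalability(throughput: dict[int, float]) -> dict[int, float]:
    """Return throughput at each CPU count relative to the 1-CPU (UP) baseline.

    Assumption: scalability = SMP throughput / UP throughput.
    """
    if 1 not in throughput or throughput[1] <= 0:
        raise ValueError("a positive UP (1-CPU) measurement is required")
    base = throughput[1]
    return {cpus: value / base for cpus, value in sorted(throughput.items())}


def meets_target(throughput: dict[int, float], cpus: int, target: float) -> bool:
    """Check whether the speedup at the given CPU count reaches the target."""
    return scalability(throughput)[cpus] >= target


# Made-up baseline throughput (in the benchmark's metric) keyed by CPU count.
baseline = {1: 1000.0, 2: 1800.0, 4: 3000.0, 8: 4000.0}

for cpus, speedup in scalability(baseline).items():
    print(f"{cpus}-way: {speedup:.2f}x UP, {speedup / cpus:.0%} per-CPU efficiency")

print(meets_target(baseline, cpus=8, target=6.0))  # hypothetical 8-way target
```

Dividing the speedup by the CPU count gives a per-CPU efficiency, which makes it easier to see at which CPU count scaling starts to fall off.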
Finally, a target may be predicated on getting a change in the application. For example, if we know that the way the application does asynchronous I/O is inefficient, then we may publish the performance target assuming the I/O method will be changed.