Hyper-Threading Speeds Linux

By Duc Vianney, Ph.D. - 2003-12-31

Linux kernel benchmarks

To measure Linux kernel performance, five benchmarks were used: LMbench, AIM Benchmark Suite IX (AIM9), chat, dbench, and tbench. The LMbench benchmark times various Linux application programming interfaces (APIs), such as basic system calls, context switching latency, and memory bandwidth. The AIM9 benchmark provides measurements of user application workload. The chat benchmark is a client-server workload modeled after a chat room. The dbench benchmark is a file server workload, and tbench is a TCP workload. Chat, dbench, and tbench are multithreaded benchmarks, while the others are single-threaded benchmarks.

Effects of Hyper-Threading on Linux APIs

The effects of Hyper-Threading on Linux APIs were measured by LMbench, which is a microbenchmark containing a suite of bandwidth and latency measurements. Among these are cached file read, memory copy (bcopy), memory read/write (and latency), pipe, context switching, networking, filesystem creates and deletes, process creation, signal handling, and processor clock latency. LMbench stresses the following kernel components: scheduler, process management, communication, networking, memory map, and filesystem. These low-level kernel primitives provide a good indicator of the underlying hardware capabilities and performance.

To study the effects of Hyper-Threading, we focused on the latency measurements, which measure the time of message control (in other words, how fast a system can perform some operation). The latency numbers are reported in microseconds per operation.

Table 1 shows a partial list of kernel functions tested by LMbench. Each data point is the average of three runs, and the data have been tested for convergence to assure that they are repeatable when subjected to the same test environment. In general, there is no performance difference between Hyper-Threading and no Hyper-Threading for those functions that run as a single thread. However, for the tests that require two threads to run, such as the pipe latency test and the three process latency tests, Hyper-Threading degrades latency by 1% to 6%. The configured stock SMP kernel is denoted as 2419s. If the kernel was configured without Hyper-Threading support, it is denoted as 2419s-noht. With Hyper-Threading support, the kernel is listed as 2419s-ht.

Table 1. Effects of Hyper-Threading on Linux APIs

Kernel function	2419s-noht	2419s-ht	Speed-up
Simple syscall	1.10	1.10	0%
Simple read	1.49	1.49	0%
Simple write	1.40	1.40	0%
Simple stat	5.12	5.14	0%
Simple fstat	1.50	1.50	0%
Simple open/close	7.38	7.38	0%
Select on 10 fd's	5.41	5.41	0%
Select on 10 tcp fd's	5.69	5.70	0%
Signal handler installation	1.56	1.55	0%
Signal handler overhead	4.29	4.27	0%
Pipe latency	11.16	11.31	-1%
Process fork+exit	190.75	198.84	-4%
Process fork+execve	581.55	617.11	-6%
Process fork+/bin/sh -c	3051.28	3118.08	-2%
Note: Data are in microseconds per operation; smaller is better.
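
The Speed-up column can be reproduced from the two latency columns. The sketch below assumes the column is the relative reduction in latency, (noht - ht) / noht, truncated toward zero to a whole percent. The article does not state the formula, and the name speedup_percent is introduced here only for illustration.

```python
# Latencies in microseconds, copied from Table 1: (2419s-noht, 2419s-ht).
TABLE_1 = {
    "Simple syscall": (1.10, 1.10),
    "Simple read": (1.49, 1.49),
    "Simple write": (1.40, 1.40),
    "Simple stat": (5.12, 5.14),
    "Simple fstat": (1.50, 1.50),
    "Simple open/close": (7.38, 7.38),
    "Select on 10 fd's": (5.41, 5.41),
    "Select on 10 tcp fd's": (5.69, 5.70),
    "Signal handler installation": (1.56, 1.55),
    "Signal handler overhead": (4.29, 4.27),
    "Pipe latency": (11.16, 11.31),
    "Process fork+exit": (190.75, 198.84),
    "Process fork+execve": (581.55, 617.11),
    "Process fork+/bin/sh -c": (3051.28, 3118.08),
}


def speedup_percent(noht, ht):
    """Relative latency reduction in whole percent (assumed formula).

    Negative values mean the Hyper-Threading kernel was slower.
    """
    return int((noht - ht) / noht * 100)  # int() truncates toward zero


if __name__ == "__main__":
    for name, (noht, ht) in TABLE_1.items():
        print(f"{name:30s} {speedup_percent(noht, ht):3d}%")
```

Under this assumption all fourteen rows match the published column. Ordinary rounding would instead give 1% for signal handler installation, which is why truncation is used here.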

The pipe latency test uses two processes communicating through a UNIX pipe to measure interprocess communication latency. The benchmark passes a token back and forth between the two processes. The degradation is 1%, which is small enough to be insignificant.
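
The token-passing pattern described above can be sketched as follows. This is not LMbench's code: it is a minimal Python illustration, assuming a Unix system with fork(), and the function name pipe_round_trip_us and the iteration count are hypothetical. Interpreter overhead makes the absolute numbers incomparable to Table 1.

```python
import os
import time


def pipe_round_trip_us(iterations=10000):
    """Average time, in microseconds, for one token round trip over two pipes."""
    parent_r, child_w = os.pipe()  # child -> parent direction
    child_r, parent_w = os.pipe()  # parent -> child direction

    pid = os.fork()
    if pid == 0:
        # Child: close the parent's ends, then echo each token straight back.
        os.close(parent_r)
        os.close(parent_w)
        try:
            for _ in range(iterations):
                token = os.read(child_r, 1)
                os.write(child_w, token)
        finally:
            os._exit(0)  # never fall through into the parent's code

    # Parent: close the child's ends, then time the exchange.
    os.close(child_r)
    os.close(child_w)
    start = time.perf_counter()
    for _ in range(iterations):
        os.write(parent_w, b"x")  # send the token
        os.read(parent_r, 1)      # block until it comes back
    elapsed = time.perf_counter() - start

    os.waitpid(pid, 0)
    os.close(parent_r)
    os.close(parent_w)
    return elapsed / iterations * 1e6


if __name__ == "__main__":
    print(f"pipe round trip: {pipe_round_trip_us():.2f} us")
```

Because each side blocks until the other responds, the two processes are never runnable at the same time, so the test gives a second logical processor little useful work to overlap.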

The three process tests involve process creation and execution under Linux. The purpose is to measure the time taken to create a basic thread of control. For the process fork+exit test, the data represent the time taken to split a process into two (nearly) identical copies and have one exit. This is how new processes are created, but on its own it is not very useful, since both processes are doing the same thing. In this test, Hyper-Threading causes a 4% degradation.
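
A rough sketch of the fork+exit measurement, assuming a Unix system: the parent forks, the child exits immediately, and the parent waits for it. The function name fork_exit_us is hypothetical, and a Python process is much heavier to fork than LMbench's small C program, so only the shape of the loop carries over.

```python
import os
import time


def fork_exit_us(iterations=200):
    """Average time, in microseconds, to fork a child that exits at once."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            os._exit(0)      # child: exit without running any cleanup
        os.waitpid(pid, 0)   # parent: reap the child before the next fork
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6


if __name__ == "__main__":
    print(f"fork+exit: {fork_exit_us():.2f} us")
```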

In the process fork+execve, the data represents the time it takes to create a new process and have that new process run a new program. This is the inner loop of all shells (command interpreters). This test sees 6% degradation due to Hyper-Threading.
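
The fork+execve pattern can be sketched in the same way. The example assumes a `true` utility is installed and locates it with shutil.which; the function name fork_execve_us and the choice of program are illustrative, not what LMbench runs.

```python
import os
import shutil
import time


def fork_execve_us(program, iterations=100):
    """Average time, in microseconds, to fork and exec `program` in the child."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            try:
                # On success the child's image is replaced and never returns.
                os.execve(program, [program], os.environ)
            finally:
                os._exit(127)  # reached only if execve failed
        os.waitpid(pid, 0)
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6


if __name__ == "__main__":
    # Assumption: a `true` binary exists; fall back to /bin/true otherwise.
    program = shutil.which("true") or "/bin/true"
    print(f"fork+execve {program}: {fork_execve_us(program):.2f} us")
```

A shell does essentially this for every external command it runs, which is why the article calls it the inner loop of all shells.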

In the process fork+/bin/sh -c test, the data represent the time taken to create a new process and have that new process run a new program by asking the system shell to find that program and run it. This is how the C library function system() is implemented. This call is the most general and the most expensive. Under Hyper-Threading, this test runs 2% slower than without it.
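
To make the relationship to system() concrete, the sketch below hand-rolls the same sequence: fork, exec /bin/sh with -c and the command string, then wait. It assumes a Unix system with /bin/sh, and run_via_shell is a hypothetical name. A real system() implementation also adjusts signal handling while it waits, which is omitted here.

```python
import os


def run_via_shell(command):
    """Simplified analogue of system(): fork, exec /bin/sh -c, wait.

    Returns the command's exit code, or -1 if the shell did not exit normally.
    """
    pid = os.fork()
    if pid == 0:
        try:
            # The shell parses the string, finds the program, and runs it.
            os.execv("/bin/sh", ["sh", "-c", command])
        finally:
            os._exit(127)  # reached only if the shell could not be started
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    # Compare the hand-rolled version with the C library call via os.system.
    print(run_via_shell("exit 3"))
    print(os.WEXITSTATUS(os.system("exit 3")))
```

The extra cost relative to fork+execve comes from starting the shell itself and letting it locate and launch the target program.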

First published by IBM developerWorks