Find and secure the gateways into your program
This article discusses various ways data gets into your program, emphasizing how to deal appropriately with them; you might not even know about them all! It first discusses how to design your program to limit the ways data can get into your program, and how your design influences what is an input. It then discusses various input channels and what to do about them, including environment variables, files, file descriptors, the command line, the graphical user interface (GUI), network data, and miscellaneous inputs.
In early 2001, many large companies installed the application "SAP R/3 Web Application Server demo," unaware that it included a dangerous vulnerability. The application included a program named saposcol, which failed to protect itself against malicious input values. An attacker could set the PATH environment variable to change where saposcol looked for other programs, and then create a malicious "expand" program for saposcol to run. Because saposcol had setuid root privileges, a single programming mistake let local users quickly gain control over the entire computer system as root (see Resources for a link to more on this and other incidents mentioned in this article).
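To make the mistake concrete, here is a minimal sketch in Python of the defensive pattern a privileged program could follow: build a fresh environment instead of trusting the inherited one, and name helper programs by absolute path so that no search path is consulted. The names `TRUSTED_PATH`, `ALLOWED_VARS`, `minimal_env`, and `run_helper` are hypothetical, and the chosen path and variable list are assumptions that would differ from system to system; this is an illustration, not how saposcol was actually fixed.

```python
import os
import subprocess

# Assumption: a fixed search path containing only administrator-controlled
# directories. The right value depends on the system.
TRUSTED_PATH = "/usr/bin:/bin"

# Assumption: the only inherited variables this hypothetical program needs.
ALLOWED_VARS = ("LANG", "TZ")


def minimal_env(inherited):
    """Build a fresh environment rather than trusting the caller's."""
    env = {"PATH": TRUSTED_PATH}
    for name in ALLOWED_VARS:
        value = inherited.get(name)
        # Copy a variable only if its value is short and printable.
        if value is not None and len(value) < 64 and value.isprintable():
            env[name] = value
    return env


def run_helper(argv, inherited=None):
    """Run a helper named by absolute path, using the minimal environment."""
    if not os.path.isabs(argv[0]):
        raise ValueError("helper must be named by absolute path")
    if inherited is None:
        inherited = os.environ
    return subprocess.run(argv, env=minimal_env(inherited), check=True)
```

With this approach, an attacker-supplied PATH such as `/tmp/evil:/usr/bin` is simply discarded, and a bare helper name like `expand` is rejected before anything is executed.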
The previous installment in this column identified some common input data types and how to check them. But knowing how to check data types isn't enough if you don't know where all of your data comes from. This article discusses various ways that data gets into your program -- some of which aren't obvious -- and emphasizes how to deal appropriately with them.
If you're not in control, your attacker is
The first line of defense in a secure program is to check every untrusted input. But what does that mean? It comes down to three things:
- Limit the portions of your program that are exposed. If your program is subdivided into pieces -- and this is often a good idea -- try to design it so that an attacker can't communicate at all with most of the pieces. That includes being unable to exploit the communication paths between the pieces. It's best if attackers cannot view, modify, or insert their own data into those communication paths (including slipping in as a middleman between the pieces). If that's not possible -- such as when the pieces communicate using a network -- use mechanisms such as encryption to counter attackers. Later articles will discuss this in more detail.
- Limit the types of inputs allowed by the exposed portions. Sometimes you can change your design so that only a few inputs are even possible -- if you can, do so.
- Ruthlessly check the untrusted inputs. A truly "secure" program would have no inputs, but that program would be useless. Thus, you need to ruthlessly check data on all input paths into your program from untrusted sources. The previous installment discussed how to check different types of data; this article will help you identify where that data comes from. That doesn't mean you should check data only at the point where it enters your program. Checking data in multiple places is often wise, but you must check all data at least once, and it's usually best for one of those checks to happen when the data first arrives.
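The third point, checking data when it first comes in, can be sketched as a small set of entry-point functions that every untrusted value must pass through before the rest of the program sees it. The functions `check_username` and `check_port`, and the rules they enforce, are hypothetical examples chosen for illustration; a real program would define the legal patterns for its own data.

```python
import re

# Assumption: a hypothetical rule for legal usernames in this program:
# a lowercase ASCII letter followed by up to 31 letters, digits, or "_".
_USERNAME = re.compile(r"[a-z][a-z0-9_]{0,31}")
_PORT = re.compile(r"[0-9]{1,5}")


def check_username(raw):
    """Return raw unchanged if it matches the whitelist, else raise."""
    # fullmatch requires the whole string to match, so a trailing
    # newline or extra characters cause rejection.
    if not isinstance(raw, str) or _USERNAME.fullmatch(raw) is None:
        raise ValueError("rejected untrusted username")
    return raw


def check_port(raw):
    """Convert raw to an integer port number in 1..65535, else raise."""
    if not isinstance(raw, str) or _PORT.fullmatch(raw) is None:
        raise ValueError("rejected untrusted port")
    port = int(raw)
    if not 1 <= port <= 65535:
        raise ValueError("rejected untrusted port")
    return port
```

The functions accept only what is known to be legal and reject everything else, rather than trying to list known-bad values. Code deeper in the program can still repeat the checks, but nothing reaches it unchecked.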
It all depends on the type of program
You must check all untrusted inputs -- but what are they? Some of them depend on what your program does. If your program is a viewer or editor of data -- such as a word processor or image displayer -- that data might be from an attacker, so it's an untrusted input. If your program responds to requests over a network, those requests might very well come from an attacker -- so the network connection is an untrusted input.
Another important factor is how your program is designed. If parts of your program run as "root" or some other privileged user, or have privileged access to data (such as the data in a database), then inputs to those parts from the unprivileged parts and programs are untrusted.
An especially important case is any program that is "setuid" or "setgid." Just running a setuid/setgid program turns on special privileges, and these programs are especially hard to make secure. Why? Because setuid/setgid programs have an especially large set of inputs -- many of them surprising -- that can be controlled by an attacker.
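To give a feel for how much a process receives from whoever started it, the following sketch collects a few pieces of such state. The function name `inherited_inputs` and the selection of items are assumptions made for illustration; the list is far from complete, and a real setuid/setgid program would more likely be written in a language such as C.

```python
import os
import sys


def inherited_inputs(max_fd=16):
    """Collect some of the state this process was handed at startup."""
    # The umask can be read only by setting it, so set a temporary
    # value and immediately restore the one that was in effect.
    mask = os.umask(0o077)
    os.umask(mask)

    # Probe low-numbered file descriptors to see which are open.
    # Some may have been opened by the interpreter itself rather
    # than inherited from the parent process.
    open_fds = []
    for fd in range(max_fd):
        try:
            os.fstat(fd)
        except OSError:
            continue
        open_fds.append(fd)

    return {
        "argv": list(sys.argv),        # command-line arguments
        "environ": dict(os.environ),   # every environment variable
        "cwd": os.getcwd(),            # current working directory
        "umask": mask,                 # file-creation mask
        "open_fds": open_fds,          # descriptors currently open
    }
```

Each of these values is chosen by the invoking user, not by the program, so a privileged program has to treat all of them as untrusted inputs.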