The big mistake: Looking for incorrect input
One of the biggest mistakes developers of secure programs make is to try to check for "illegal" data values. It's a mistake because attackers are quite clever; they can often think of yet another dangerous data value. Instead, determine what is legal, check if the data matches that definition, and reject anything that doesn't match that definition. For security it's best to be extremely conservative to start with, and allow just the data that you know is legal. After all, if you're too restrictive, users will quickly report that the program won't allow legitimate data to be entered. On the other hand, if you're too permissive, you may not find that out until after your program has been subverted.
For example, let's say that you're going to create filenames based on certain inputs from a user. You may know that allowing users to include "/" would be a bad idea, but just checking for this one character would probably be a mistake. What about control characters? Would spaces be a problem? How about leading dashes (which can cause problems in poorly-written scripts)? And could certain phrases cause a problem? If you create a list of "illegal" characters, the odds are good that an attacker will find a dangerous value you left off the list. Instead, check to make sure the input matches a certain pattern that you know is safe, and reject anything not matching the pattern.
It's still a good idea to identify values you know are dangerous: you can use them to (mentally) check your validation routines. Thus, if you know that "/" is dangerous, look at your pattern to make sure it wouldn't let that character through.
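Here is a minimal sketch of that approach in Python. The pattern, the length limit, and the names `SAFE_NAME`, `is_safe_name`, and `KNOWN_BAD` are all assumptions made up for illustration; your own definition of "legal" will depend on your program. The list of dangerous values is not used to filter input. It is only used to double-check that the pattern rejects them.

```python
import re

# Hypothetical policy: 1 to 64 characters; the first must be an ASCII letter
# or digit, the rest may also include '.', '_' and '-'.
SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,63}")


def is_safe_name(text: str) -> bool:
    """Return True only if the whole input matches the known-safe pattern."""
    # fullmatch() requires the entire string to match. With match() and a
    # trailing '$', a name ending in a newline would slip through.
    return SAFE_NAME.fullmatch(text) is not None


# Values known to be dangerous, used only to check the pattern itself.
KNOWN_BAD = [
    "",             # empty name
    "/etc/passwd",  # absolute path
    "a/b",          # embedded slash
    "..",           # parent directory
    "-rf",          # leading dash
    "two words",    # space
    "tab\there",    # control character
    "name\n",       # trailing newline
    "x" * 65,       # too long
]

for bad in KNOWN_BAD:
    assert not is_safe_name(bad), f"pattern wrongly accepts {bad!r}"
```

Note that the leading dash and ".." are rejected without being listed anywhere in the pattern; they simply don't match what is known to be safe.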
Of course, all of this raises the question: what are the legal values? The answer depends, in part, on the kind of data that you're expecting. So the next few sections will describe some common kinds of data that programs expect -- and what to do about them.
Let's start with what would appear to be one of the easiest kinds of information to read -- numbers. If you're expecting a number, make sure your data is in number format -- typically, that means only digits, and at least one digit (you can check this using the regular expression ^[0-9]+$). In most cases there is a minimum value (often zero) and a maximum value; if so, make sure the number is inside its legal range.
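As a rough sketch, the format check and the range check might be combined like this in Python. The function name `parse_bounded_int` and the 18-character length cap are assumptions chosen for illustration, not anything the text prescribes.

```python
import re

# ASCII digits only. In Python, \d would also accept other Unicode digits.
DIGITS = re.compile(r"[0-9]+")


def parse_bounded_int(text: str, minimum: int, maximum: int) -> int:
    """Parse a non-negative decimal number and enforce its legal range."""
    # Check the format first: only digits, at least one digit, and not
    # absurdly long. fullmatch() also rejects a trailing newline, which
    # '^[0-9]+$' with re.match() would let through in Python.
    if len(text) > 18 or DIGITS.fullmatch(text) is None:
        raise ValueError("not a plain decimal number")
    value = int(text)
    # Then check both ends of the range explicitly.
    if not (minimum <= value <= maximum):
        raise ValueError("number out of range")
    return value
```

A caller would then write something like `port = parse_bounded_int(user_input, 1, 65535)` and treat `ValueError` as "reject this input".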
Don't depend on the lack of a minus sign to mean that there are no negative numbers. Many number-reading routines, if presented with an excessively large number, will "roll over" the value into a negative number. In fact, a clever attack against Sendmail was based on this insight. Sendmail checked that "debug flag" values weren't larger than the legal value, but it didn't check if the number was negative. Sendmail developers presumed that since they didn't allow minus signs, there was no need to check. The problem is that the number-reading routines took numbers of 2^31 or larger, such as 4,294,967,269 (which is 2^32 - 27, and so reads as -27 in a 32-bit signed integer), and converted them into negative numbers. Attackers could then use this fact to overwrite critically important data and take control of Sendmail.
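The rollover is easy to reproduce. The following Python sketch mimics what happens when a large value is stored in a 32-bit signed integer; it is not Sendmail's actual code, and `MAX_FLAG`, `to_int32`, `flawed_check`, and `safer_check` are hypothetical names invented for this illustration.

```python
MAX_FLAG = 100  # hypothetical upper limit for a flag value


def to_int32(n: int) -> int:
    """Mimic storing n in a 32-bit two's-complement signed integer."""
    return (n + 2**31) % 2**32 - 2**31


def flawed_check(text: str) -> bool:
    """Only checks the upper bound, trusting that no minus sign was typed."""
    value = to_int32(int(text))
    return value <= MAX_FLAG


def safer_check(text: str) -> bool:
    """Checks both bounds on the full-precision value, before any narrowing."""
    value = int(text)  # Python integers do not roll over
    return 0 <= value <= MAX_FLAG


wrapped = to_int32(4294967269)  # 2**32 - 27 rolls over to -27
```

With these definitions, `flawed_check("4294967269")` returns `True` even though the input contains no minus sign, while `safer_check("4294967269")` returns `False`. Checking the range before the value is narrowed also catches inputs such as 2^32 + 5, which would otherwise roll over to a small, legal-looking positive number.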
If you're reading a floating point number, there are other concerns. Many routines designed to read floats may allow values such as "NaN" (not a number). This can really confuse later processing routines, because every equality and ordering comparison involving NaN is false (in particular, NaN is not equal to NaN!). Standard IEEE floating point has other oddities that you need to be prepared for, such as positive and negative infinities, and negative zero (as well as positive zero). Any input data that your program isn't prepared for may be exploitable later.
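Python's built-in `float()` is one such routine: it happily accepts "nan", "inf", and exponents that overflow to infinity. One possible defense, sketched below, is to accept only a plain decimal format and then check the result. The name `parse_finite_float`, the accepted format, and the 32-character cap are assumptions for illustration; a real program may need to allow exponents or a different range.

```python
import math
import re

# Hypothetical format: optional minus sign, digits, optional fractional part.
# No exponents, no "nan", no "inf", no surrounding whitespace.
PLAIN_DECIMAL = re.compile(r"-?[0-9]+(\.[0-9]+)?")


def parse_finite_float(text: str, minimum: float, maximum: float) -> float:
    """Parse a plain decimal number, rejecting NaN, infinities, and out-of-range values."""
    if len(text) > 32 or PLAIN_DECIMAL.fullmatch(text) is None:
        raise ValueError("not a plain decimal number")
    value = float(text)
    # The format above should already exclude NaN and infinities; this is a
    # second line of defense in case the pattern is loosened later.
    if not math.isfinite(value):
        raise ValueError("number is not finite")
    if not (minimum <= value <= maximum):
        raise ValueError("number out of range")
    # Normalize negative zero so later code only ever sees positive zero.
    if value == 0:
        return 0.0
    return value


nan = float("nan")
nan_equals_itself = (nan == nan)  # False: NaN is not equal to NaN
```

The explicit `math.isfinite()` test matters because a range check written as `value < minimum or value > maximum` would silently let NaN through: both comparisons are false.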