
Saturday, February 27, 2010

Form Validation

Here's a maxim for you: Users can, and will, enter anything and/or nothing in your HTML forms, no matter how easy you make it to use the forms or what you expect the users to enter. Likewise, malicious users will deliberately enter oddball data to try to break your applications. Form validation on both the client and the server is your main defense.
One way to deal with this is to limit the values permitted in a certain text box. In loan.php, for example, you'll recall that the program passed the person's age into a variable called $age, and the PHP page then validated that value against a more realistic range of age values:


if ($age < 1 or $age > 130)
{
     echo "Incorrect age value entered. Please enter an age between 1 and 130.";
     break;
}

Of course, you're not limited to simply informing the user he's entered an inappropriate age value. You can take any other steps you want at this point, now that you've identified an inappropriate value.

Using the exit Statement

Performing validation is great, but if you encounter incorrect data, what can you do? Sometimes you just need to quit processing, and that's what the exit statement is for. The break statement only leaves the current loop or switch structure, whereas the exit statement ends all processing. Because exit is so abrupt, be careful where you use it: no further HTML, PHP code, or text is processed after an exit is encountered, and unless you're very careful, the user may get back an unexpected result, such as a partially completed page. With that in mind, rebuild the loan application example and tighten it up against possible user errors by incorporating some form validation logic.


Try it Out: Form Validation
  1. Open loan.php (from Chapter 4), save it as loan_fv.php, and insert the following changes:
    <html>
      <head><title></title></head>
      <body>
    <b>Namllu credit bank loan form</b>
    <?php
    if (isset($_POST['posted'])) {
    
       $age = $_POST['age'];
       $first_name = $_POST['first_name'];
       $last_name = $_POST['last_name'];
       $address = $_POST['address'];
       $loan = $_POST['loan'];
       $month = $_POST['month'];
    
       //validation
       if ($age < 10 OR $age > 130)
       {
         echo "Incorrect Age entered - Press back button to try again";
         exit;
       }
       if ($first_name == "" or $last_name == "")
       {
         echo "You must enter your name - Press back button to try again";
         exit;
       }
       if ($address == "")
       {
         echo "You must enter your address - Press back button to try again";
         exit;
       }
       if ($loan != 1000 and $loan != 5000 and $loan != 10000)
       {
            echo "You must enter a loan value -- Press back button to try again";
            exit;
       }
       $duration = 0;
       switch ($loan) {
       case "1000":
          $interest = 5;
          break;
       case "5000":
          $interest = 6.5;
          break;
       case "10000":
          $interest = 8;
          break;
       default:
          echo "You didn't enter a loan package!<hr>";
          exit;
       }
       while ($loan > 0)
        {
          $duration = $duration + 1;
          $monthly = $month - ($loan * $interest / 100);
          if ($monthly <= 0)
           {
             echo "You need larger repayments to pay off your loan!<hr>";
             exit;
           }
       $loan = $loan - $monthly;
       }
        echo "This would take you $duration months to pay this off
      at the interest rate of $interest percent.<hr>";
    
      }
      ?>
      <form method="POST" action="loan_fv.php">
      <input type="hidden" name="posted" value="true">
      <br>
    
      First Name:
      <input name="first_name" type="text">
      Last Name:
      <input name="last_name" type="text">
      Age:
      <input name="age" type="text" size="3">
      <br>
      <br>
      address:
      <textarea name="address" rows="4" cols="40">
      </textarea>
      <br>
      <br>
      what is your current salary?
      <select name="salary">
      <option value=0>under $10000</option>
      <option value=10000>$10,000 to $25,000</option>
      <option value=25000>$25,000 to $50,000</option>
      <option value=50000>over $50,000</option>
       </select>
       <br>
       <br>
       How much do you want to borrow? <br><br>
       <input name="loan" type="radio" value="1000">Our $1,000 package at 5.0%
    interest
       <br>
       <input name="loan" type="radio" value="5000">Our $5,000 package at 6.5%
    interest
       <br>
       <input name="loan" type="radio" value="10000">Our $10,000 package at 8.0% interest
       <br>
       <br>
       How much do you want to pay a month?
       <input name="month" type="text" size="5">
       <br>
       <br>
       <input type="submit" value="calculate">
       </form>
       </body>
       </html>
    
  2. Save this file as loan_fv.php and then close it.
  3. Open the file in your browser and enter some values that are out of bounds or otherwise incorrect. 



How it Works

You could enter someone else's address, or an age other than your own, and there's no way PHP can check for incorrect entries of this type. What the new code can do is make sure that the user hasn't simply forgotten to add a detail, or maliciously supplied obviously wrong information about their age. Four if statements do this. The first checks whether the age entered is between 10 and 130; if it isn't, you can be pretty sure that the value is wrong:


if ($age < 10 OR $age > 130)
{
   echo "Incorrect Age entered - Press back button to try again";
   exit;
}

If the age falls outside that range, you display the error message and exit there. If it is within the range, you don't need to do anything further.


The second if statement checks for first and last names being present. The "" denotes an empty string (a string with no characters in it), and this is how you check for one:


if ($first_name == "" or $last_name == "")
{
    echo "You must enter your name - Press back button to try again";
    exit;
}

Do the same check for the $address variable:

if ($address == "")
{
   echo "You must enter your address - Press back button to try again";
   exit;
}

And then check the values of the radio buttons to validate that one was selected:

if ($loan != 1000 and $loan != 5000 and $loan != 10000)
{
   echo "You must enter a loan value - Press back button to try again";
   exit;
}

If the $loan variable is not equal to any of these values, you know that the user didn't select a value.
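
The four checks can also be gathered into one function that reports every problem at once, rather than exiting at the first one. What follows is only a sketch under that assumption: the function name validate_loan_form() and the exact message wording are hypothetical, not part of the loan example, and the field names are taken from the form above.

```php
<?php
// Hypothetical helper: returns a list of error messages for the loan form.
// An empty array means every check passed.
function validate_loan_form(array $input)
{
    $errors = array();

    // Same range as the example: 10 to 130 inclusive
    $age = isset($input['age']) ? $input['age'] : '';
    if (!is_numeric($age) || $age < 10 || $age > 130) {
        $errors[] = "Incorrect age entered";
    }

    // Both names must be non-empty once surrounding whitespace is removed
    $first = isset($input['first_name']) ? trim($input['first_name']) : '';
    $last = isset($input['last_name']) ? trim($input['last_name']) : '';
    if ($first == "" || $last == "") {
        $errors[] = "You must enter your name";
    }

    $address = isset($input['address']) ? trim($input['address']) : '';
    if ($address == "") {
        $errors[] = "You must enter your address";
    }

    // Form values arrive as strings, so compare against strings strictly
    $loan = isset($input['loan']) ? $input['loan'] : '';
    if (!in_array($loan, array("1000", "5000", "10000"), true)) {
        $errors[] = "You must enter a loan value";
    }

    return $errors;
}

// Usage with a made-up submission that is missing the last name
$errors = validate_loan_form(array(
    'age' => '35',
    'first_name' => 'Ann',
    'last_name' => '',
    'address' => '1 Main St',
    'loan' => '5000',
));
foreach ($errors as $message) {
    echo $message, "\n";   // prints: You must enter your name
}
```

Returning a list leaves the decision about whether to exit, or to redisplay the form with the messages, to the calling page.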

Preventing User-Inserted HTML: HTMLSpecialChars()

Another way users can abuse your applications is by entering HTML code as part of the data they type into your form fields. This works because HTML is plain text: when the data is echoed back into a page, the browser processes any tags in it just like the rest of the page's HTML. For example, if you have created an application that accepts user input for directory listings, a slick user might insert <b> before his name, and </b> after his name, so that his listing would have his name in bold. Nice trick, but unfair to the other users.


While this exact situation might not be a serious problem, under other circumstances it could be used to break your HTML or otherwise thwart your intent. Fortunately, PHP provides the HTMLSpecialChars() function, which converts the characters that make up HTML tags into harmless HTML entities (more on this shortly). It just requires a string argument to work, like this:


$String = HTMLSpecialChars("<b>The bold tags won't appear after processing </b>");

A variable name will also do the trick:


$String ="<B> The bold tags won't appear after processing </B>";

$String = HTMLSpecialChars($String);

The HTMLSpecialChars() function converts the characters that have special meaning in HTML (<, >, &, and quotation marks) into what are called special characters, or entities. An entity is simply a code that represents the character it was translated from. For example, <b> is translated into &lt; (for the less-than sign), the letter b, and &gt; (for the greater-than sign). When the browser receives these entities, it displays them on the screen as the characters they represent, instead of processing and rendering them as HTML tags.


This feature is often used when you want to make a Web page that discusses HTML tags (when you need to display the tags in plain text rather than letting the browser process them), but it certainly comes in handy for preventing users from entering their own HTML into your PHP application.
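
As a small illustration of the conversion described above, the following sketch escapes a made-up directory-listing name. The variable names are hypothetical, and the ENT_QUOTES flag and 'UTF-8' charset are assumptions about how the page is set up rather than anything the loan example requires.

```php
<?php
// Hypothetical name submitted through a directory-listing form
$name = "<b>Joe</b>";

// ENT_QUOTES also converts single and double quotes;
// 'UTF-8' is the assumed page encoding
$safe = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

echo $safe;   // prints: &lt;b&gt;Joe&lt;/b&gt;
```

The browser shows the escaped value as the literal text <b>Joe</b> instead of rendering the name in bold.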


Ultimately, there is no limit to the degree of error checking you can perform. In fact, you could preset all variable values in advance, but then what would be the point of providing a form? You could just automatically do everything in advance. Seriously, it really helps to try to think like a user. The ratio of users to hackers is pretty small, meaning that most users who break your application do so unintentionally. They'll either not understand what you meant, or simply make a mistake. So attempt to think like your users, try your applications out on family members and friends, and try to anticipate all the possible responses, accounting for them as comprehensively as possible. The extra work you put in will be well worth it.

String Validation and Regular Expressions

All the data your application receives from the user's browser is formatted as strings, as you know. So PHP's wealth of string manipulation functions comes in very handy for validating data entered by the user, or reformatting string data into sequences acceptable as other data types. But PHP contains other functions, called regular expression functions, that are a quantum leap more powerful when it comes to manipulating data. You'll explore both of these subjects in the next few sections.

String Validation

You can use PHP's string manipulation functions in a variety of clever ways to perform basic validation of data being entered by users. In this section you'll see a few common ones, but remember, you can easily devise your own, using the functions covered here, other PHP functions, or combinations of all of them. You've already had a go at a few of these (like strlen() and substr()) in Chapter 2 and in some of our other example scripts, but it doesn't hurt to use them again in a validation context.
Using strlen()
Some data is always a certain length, such as U.S. ZIP codes, which are always 5 digits or, in the case of ZIP+4, 5 digits, a hyphen, and 4 more digits. So one way to validate data entered as postal codes is to use the strlen() function like this:
if (strlen($postal_code) == 5 or strlen($postal_code) == 10) {
   //if the length is 10, check that the hyphen is in spot 6
   //do something
} else {
   //send error message
}
Using strstr()
In the preceding example you needed to find out if the character in the sixth spot of the incoming string was actually a hyphen. The strstr() function is a first step toward this because, as its name implies, it looks for a string within a string. In this example, you want it to look for a one-character string consisting of a hyphen. Note that strstr() only tells you that a hyphen occurs somewhere in the string, not where it occurs. This code would work as a basic check:
if (strlen($postal_code) == 5
  or strlen($postal_code) == 10) {
    if (strlen($postal_code) == 10) {
       if (strstr($postal_code, "-")) {
          //do something
       }
    }
} else {
    //send error message
    }
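You may also want to check for a space in place of the hyphen, just in case the user omitted the hyphen but entered a space. Because strstr() reports only whether the character occurs, confirming that it sits at the sixth position takes a position-aware function such as strpos() or substr().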
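
One way to pin the hyphen to the sixth spot is strpos(), which returns the zero-based offset of the first match. The sketch below is an assumption about how such a check might look: the function name is_valid_zip_format() is hypothetical, and the digit checks with ctype_digit() go slightly beyond the length and hyphen tests discussed above.

```php
<?php
// Hypothetical helper: true for input shaped like "12345" or "12345-6789".
function is_valid_zip_format($postal_code)
{
    $length = strlen($postal_code);

    if ($length == 5) {
        // Plain ZIP code: five digits and nothing else
        return ctype_digit($postal_code);
    }

    if ($length == 10) {
        // Offsets start at zero, so the sixth character is at offset 5
        return strpos($postal_code, "-") === 5
            && ctype_digit(substr($postal_code, 0, 5))
            && ctype_digit(substr($postal_code, 6, 4));
    }

    return false;
}

var_dump(is_valid_zip_format("12345-6789"));   // bool(true)
var_dump(is_valid_zip_format("12345 6789"));   // bool(false)
```

The strict === comparison matters here, because strpos() returns false when nothing is found, and false would loosely equal offset 0.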
Using substr()
Continuing with the same example, suppose you want to separate out the +4 portion of the ZIP+4. You know that this portion begins at the seventh character, which is offset 6 because substr() counts positions from zero, and should be four characters long. You could employ the substr() function for this purpose. As arguments, the substr() function takes the string in question, an integer representing the position to start looking at, and an optional integer specifying the number of characters to return. Code like this would work:


if (strlen($postal_code) == 5
 or strlen($postal_code) == 10) {
   if (strlen($postal_code) == 10) {
      // offset 6 is the seventh character, just past the hyphen
      $plus4_portion = substr($postal_code, 6, 4);
   }
} else {
  //send error message
}
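
To make the offsets concrete, here is a short sketch that splits a sample ZIP+4 value into its two parts. The sample value and the variable names are made up for illustration.

```php
<?php
$postal_code = "12345-6789";   // sample value, not real user input

// Offsets 0 through 4 hold the five-digit ZIP code
$zip5 = substr($postal_code, 0, 5);

// The hyphen sits at offset 5, so the +4 portion starts at offset 6
$plus4 = substr($postal_code, 6, 4);

echo "$zip5 / $plus4";   // prints: 12345 / 6789
```

Starting at offset 7 instead would skip the first digit of the +4 portion and return only three characters.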
Using addslashes() and stripslashes()
For applications in which you allow the user to enter data that may be going into a database, it's a good idea to escape that data, and the addslashes() function is one way to do it. This function adds a backslash before each character that might cause a problem for database entries (the ', ", \, and NUL characters). If the backslashes are still present when you read the data back, the stripslashes() function removes them.
Why is there a function dedicated to protecting your database entries? If you've ever composed a SQL string for inserting a record (or for just about any database function, for that matter), you're aware that SQL tends to be very intolerant of misplaced apostrophes (if you've not used SQL yet, you'll get that experience in Chapters 9–11).


So perhaps your SQL string is supposed to look like this:


$query = "INSERT INTO clients (username) values('$username')";
mysql_query($query);

Now, if the user enters joeblow for his username, the query should run without a hitch. But if the user enters joe'blow, your query blows up because your database sees:


$query = "INSERT INTO clients (username) values('joe'blow')";

It might be a little hard to see, but if you look carefully you can see that the apostrophe in the user's username looks to the database like a broken set of delimiters. How can you ensure this doesn't happen? Apply the addslashes() function to the value entered by the user, and the username stored in the database will be correct. For example, using this would work:


$username = addslashes($username);
$query = "INSERT INTO clients (username) values('$username')";
mysql_query($query);

When this code runs, if the user enters joe'blow, addslashes() converts that value to joe\'blow. The backslash escapes the apostrophe, so your database treats the apostrophe as part of the value rather than as the end of the string, and the query no longer blows up.


Use stripslashes() only when the backslashes actually show up in the data you read back, which happens when the value was escaped twice before it was stored (for example, by addslashes() applied on top of PHP's older magic quotes setting). In that case the user sees joe\'blow as his username, and if the backslash isn't removed, the next time he edits his username he'll end up with joe\\\'blow, because addslashes() escapes both the backslash and the apostrophe, and this could cause problems.


You must also apply the same escaping in every query that uses the username (such as the lookup when he logs in), because an unescaped apostrophe breaks that query in the same way. The database removes the escaping backslash as it parses the statement, so the stored value is joe'blow, and a correctly escaped lookup matches it.
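
The round trip can be tried without a database at all. The sketch below only builds and prints the strings, under the assumption that the username is the joe'blow value from the text; it does not run the query. It is an illustration of what addslashes() does rather than a complete defense, and a database extension's own escaping function or prepared statements are generally the safer choice.

```php
<?php
$username = "joe'blow";

// A backslash is inserted before the apostrophe
$escaped = addslashes($username);
echo $escaped, "\n";    // prints: joe\'blow

// The escaped value is safe to place between the SQL delimiters
$query = "INSERT INTO clients (username) values('$escaped')";
echo $query, "\n";      // prints: INSERT INTO clients (username) values('joe\'blow')

// stripslashes() reverses addslashes()
echo stripslashes($escaped), "\n";   // prints: joe'blow

// Escaping an already escaped value doubles up the backslashes
echo addslashes($escaped), "\n";     // prints: joe\\\'blow
```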

Regular Expressions

Finding a specific string within another one is quite helpful in some situations, and of course if you know exactly what string you're looking for, the strstr() function can handle the situation for you. But suppose you don't know exactly what string you'll be looking for. If you know even a little about the string, you can use regular expressions and PHP's regular expression functions to help you find it.


Suppose, for example, that you know you'll be looking for a string made up of all alphabetic characters and no numerical characters. In that case, you at least know the pattern you're looking for, and that's enough to start with. The simplest pattern is a word or a single character, as in the earlier strstr() example, which is looking for the hyphen:


if (strstr($postal_code, "-")) {

And to separate out data values in a string (a string of comma-separated values, for instance), you can use the explode() function, which splits a string at each occurrence of a delimiter and returns the pieces as an array. The function takes two arguments: first the delimiter to split on, and then the string to split. You can use explode() with a simple if statement to test for the existence of a particular word within a string, as shown in the following code:


<?php
$words = "you, should, vote, happily";
$wordarray = explode(",", $words);
foreach ($wordarray as $word) {
   // each piece after the first keeps the space that followed the comma
   if (trim($word) == "vote") {
      echo "Found string 'vote'";
   }
}
?>
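
A slightly more compact variant of the same test is sketched here, assuming the same sample string. It relies on explode() returning the array directly, then trims every piece with array_map() and searches the result with in_array().

```php
<?php
$words = "you, should, vote, happily";

// Delimiter first, then the string to split; the array is the return value.
// trim() strips the space that follows each comma.
$wordarray = array_map('trim', explode(",", $words));

if (in_array("vote", $wordarray)) {
    echo "Found string 'vote'";   // prints: Found string 'vote'
}
```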

But although you can use a simple PHP function like strstr() to find a matching character within a larger string of characters, or the explode() function for slightly more complex matching operations, you'll often find the need for much more complex patterns to match. That's where regular expressions come in handy.


Regular expressions, called regexps, are like a mini-programming language for creating very powerful patterns. They use a special notation to form the patterns that are used to match the values (or parts of values) that you provide. Certain characters take on special meanings in the context of a regexp, enabling you to broaden or narrow matches against sub-strings in the data. Some regexps will find characters belonging to a specified group; others find characters repeated a certain number of times. Regular expressions necessarily follow certain rules of syntax, which will be outlined as you read on.
Regular expressions are not limited to PHP. Languages such as Perl and Python, along with UNIX utilities like sed and egrep, use similar notation for finding patterns in text. PHP's regular expression functions that follow Perl notation are called PCRE (Perl-Compatible Regular Expression) functions and begin with preg, whereas the ordinary PHP regular expression functions are termed POSIX-extended regular expression functions. Don't use the ordinary (POSIX-extended) PHP regexp functions on binary data (they're not binary-safe); use the PCRE regexp functions instead. Note also that the POSIX-extended functions, ereg() included, were deprecated in PHP 5.3 and removed in PHP 7, so they are available only on older versions of PHP.
So let's take a look at how to pattern match with some of the PHP regular expression functions, starting with the ereg() function.

Using ereg()

The explode() approach shown earlier works (after a fashion), but it's clunky, complicated, and hard coded (the word "vote" is actually part of the code, instead of coming as input; indeed, the entire array is hard-coded). Worse still, explode() keeps everything around the delimiter: splitting on a comma leaves a leading space on every word after the first, so " vote" has to be trimmed before it compares equal to "vote". This looks like a difficult problem, but it should be easy. Here's how it looks using a regular expression:


<?php
$words = "you, should, vote, happily";
if (ereg("vote", $words)) {
   echo "Found string 'vote'";
}
?>

Use the PHP function ereg() and just specify the pattern (the word you want to match that constitutes the actual regexp) and the string you want to match it against. It returns True if the pattern match was successful (in this case, on finding the character sequence "vote" in the string held by $words) and False if it wasn't.


You can also specify a third argument in ereg(): the name of an array, which is used to store successfully matched expressions. Here's the preceding example modified to make use of it:


<?php
$words = "you, should, vote, happily";
if (ereg("vote", $words, $reg)) echo "Found string '$reg[0]'";
?>
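
On PHP versions where ereg() is not available, the same search can be written with the PCRE function preg_match(). This is a sketch of the equivalent call, not part of the original example: the pattern must be wrapped in delimiters (slashes here), and the variable name $matches is arbitrary.

```php
<?php
$words = "you, should, vote, happily";

// preg_match() returns 1 on a match and fills $matches,
// with the full matched text at index 0
if (preg_match("/vote/", $words, $matches)) {
    echo "Found string '$matches[0]'";   // prints: Found string 'vote'
}
```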

Literal text written as a string is the simplest regular expression of all to look for, but you don't have to search for just the one word—you could look for any particular phrase. 


However, all the characters you're searching for must match exactly—words (with correct capitalization), numbers, punctuation, and even whitespace:


<?php
$words = "Vote twice or more if you can.";
if (ereg("twice if", $words, $reg)) echo "Found string '$reg[0]'";
?>

This string won't match, because the characters "twice if" never appear together in that exact sequence. Similarly, spaces inside the pattern are significant:


<?php
$words1 = "The bigdog is in the pound...";
$words2 = "...but the dog is in the cornfield";
$regexp = " dog";
if (ereg($regexp, $words1, $reg)) echo "Found string '$reg[0]'";
if (ereg($regexp, $words2, $reg)) echo "Found string '$reg[0]'";
?>

This finds only the second dog because both ereg() calls are specifically looking for a space followed by the three letters "d", "o", and "g".
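
The same exactness applies to preg_match(). The sketch below reruns the two dog strings and the earlier "twice if" phrase with PCRE patterns; the variable names $found1, $found2, and $found3 are made up for the illustration.

```php
<?php
$words1 = "The bigdog is in the pound...";
$words2 = "...but the dog is in the cornfield";
$pattern = "/ dog/";   // a space followed by d, o, g

$found1 = preg_match($pattern, $words1);   // no space before "dog" in "bigdog"
$found2 = preg_match($pattern, $words2);   // " dog" appears after "the"

echo "words1: $found1, words2: $found2\n";   // prints: words1: 0, words2: 1

// The words in between break the literal sequence, so nothing matches
$found3 = preg_match("/twice if/", "Vote twice or more if you can.");
echo "twice if: $found3\n";                  // prints: twice if: 0
```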