Search This Blog

Monday, November 29, 2010

Storing Data

Whenever you start work on a data-driven application, one of the first design decisions you have to make is this: what's the best data store to use? In other words, how and where is the application going to store and access its data. The correct choice is always going to depend on the application's requirements. At the simplest level, you should be asking questions like:
  • How much data will the application use?
  • How often will it need access to the data?
  • How often will it need to modify the data?
  • How many users are likely to want access to the data at once?
  • How will the data grow over time?
  • How much do I stand to lose if the data is broken, stolen, or lost?
If the answer to any of these questions is "a lot," then you probably want to steer clear of using plain text files to store your data.
That's not to say that text files are useless—for instance, when visitors attempt to enter a password-protected area, an Apache Web server (unless configured otherwise) authenticates them against a plain text file containing a complete list of user IDs and passwords. This isn't a problem as long as it's dealing with a small group of users, but if it has to validate hundreds at the same time, it must scan the text file one line at a time until it finds a match. One unlucky user would have to wait for Apache to find his details on the last line of the file before it let him enter the password-protected area, and that could cost quite some time.
Often, the most efficient alternative is to use a database engine—or to use its more technical name, a Database Management System (DBMS)—to store, retrieve, and modify the data for you. A good database engine really just serves as a very, very smart go-between for you and your data. It organizes, catalogs, backs up, and a whole lot more besides, all with a view toward making that data quicker and easier to work with.


So where does all the data go? Well, it depends to some extent on the database engine you're using. Chances are, though, it'll end up being stored as streams of bits and bytes in a number of files—yes, files! Truth is you can't really get away from using files at some point. The trick is in finding ways to use them as efficiently as possible, and a good database engine has many, many such tricks up its metaphorical sleeves.

Databases and Databases

At this point it's worth taking a moment to look at exactly what the word database refers to. Strictly speaking, a database is just an efficiently organized collection of data, just as a library is really no more than an ordered collection of books. However, everyone tends to use the word library to refer to the institution as a whole—not just the books, but the staff, its working practices, and the building, too—and the same is true of databases. So, when you hear someone talk about a database, he's more than likely referring to the whole caboodle: the database engine and the data.


This isn't so surprising because under normal circumstances the only contact you ever have with the raw data is through the DBMS, or database engine to be slightly less formal. Most of the time you never see the join, so there's no real practical distinction. Later we discuss installing the database, connecting to the database, talking to the database, and so on—just remember that we're using the word in its more common, all-encompassing sense.

Database Architectures

Before we get going here, you need to settle on a particular database with which to experiment, and that means first deciding on the type of database architecture you're going to use. There are two main options: embedded and client/server. Let's take a quick look at both.

Embedded Databases

An embedded database runs—and stores its data—on the same machine as the program that wants to use it (PHP in this case). The database is not networked, and only one program can connect to it at any given time. Moreover, the database can't be shared between different machines because each one would simply end up storing and manipulating its own separate version of the data. It's analogous to having your own personal library, staff and all, to which you have exclusive access. For smaller applications there can often be advantages to using an embedded database. For larger systems, the advantages provided by the larger Relational Database Management Systems (RDBMS) tip the scales in their favor and as a result, most commercial enterprises use client/server database architectures.


Long-standing, popular examples of embedded database engines include dBase and DBM, and PHP provides connectivity to both. A more recent addition to the fold is SQLite, which is not just available as a PHP5 extension, but is actually bundled as part of the PHP5 download. For that reason alone it's well worth a look, and some impressive performance stats certainly help back up its placement as the rising star of PHP database technologies. You can learn more about SQLite in Appendix C, Using SQLite.

Client/Server Databases

Client/server databases are designed for use over networks, enabling multiple users (who may be scattered across a whole host of different locations) to work simultaneously with the same data. The database (the term database is also often used as shorthand for Database Management System, or DBMS) itself acts as a server, not unlike the Web servers discussed in the opening chapters. In principle it can field requests from just about anywhere with a network connection and a suitable client program. That said, there's no reason why you can't run both on the same machine.


Client/server databases are more closely analogous to the earlier metaphor of the public lending library. It's open to anyone who has a membership card (is registered with the library) and, with several staff members on the reception desk, can deal with requests from a number of visitors at once. The visitors may be there in person, or may call in on the telephone.


The actual job of finding or returning a book to the shelf might then be delegated to an assistant, but visitors don't need to worry about that. Each receptionist may have to deal with several visitors at once, but if he's good at his job, he'll spend a reasonable amount of time on each, and all visitors feel that they're getting good service. Likewise, when several users access a client/server database engine at the same time, the engine regularly switches its attention among them, giving each one the sense that she's being attended to all the while.
This is the kind of database you're more likely to find being used in a large company, where large quantities of data need to be shared among large numbers of people, where access may be needed from all sorts of different locations, and where having a single centralized data store makes important jobs like administration and backup relatively straightforward. Any applications that need to access the database use specialized, lightweight client programs to communicate with the server.


RDBMSs (Relational Database Management Systems) are often expensive and complex to set up and administer. The widely acknowledged big three in this field are Oracle, DB/2 (from IBM), and SQL Server (from Microsoft). All three are massive, feature-rich systems, seemingly capable of just about any kind of data storage and processing that a modern business could need. The flip side of the coin is that these systems are big and expensive, and may contain more functionality than you will ever need.


Fortunately there are alternatives such as PostgreSQL and MySQL, which are both Open Source client/server database systems that have proven very popular with PHP developers for many years. They're fast, stable, easily meet the needs of most small-to-medium sized projects, and, to top it all, they're free!

Choosing a Database

In principle, you can use any of these database systems in your PHP applications. There's even no reason why you can't hook one application up to several different database engines. To keep these chapters to a reasonable length, however, the focus will be on just one—MySQL.


Compared to the other choices, it offers several advantages:
  • It's one of the most popular databases being used on the Web today.
  • It's freely available as a download to install and run on your own machine.
  • It's easy to install on a wide range of operating systems (including Windows and UNIX).
  • It's available as a relatively cheap feature in many Web-hosting packages.
  • It's simple to use and includes some handy administration tools.
  • It's a fast, powerful client/server system that copes well with very large, complex databases, and should stand you in good stead when it comes to larger projects.
If you're not too fussed about the last criterion (and particularly if you don't want to pay out extra for database functionality on your Web account!) you might well find that an embedded database such as SQLite does a perfectly good job.
MySQL is a freely available RDBMS, which fully joined the Open Source Community only recently, when it was released under the GNU Public License (GPL). Even before it went free, you didn't need a license unless you wanted to make money out of it, or run the server on the Windows platform (the Windows version of MySQL was shareware). Because you now don't have to pay a dime to use it, this alone makes MySQL a solid candidate for developing database applications. If the GPL worries you for any reason, or you need to incorporate MySQL into a commercial application, you can still buy a commercially licensed version from the developers at www.mysql.com.
That said, a lot of the concepts to be introduced here and in the following chapters extend well beyond one specific database engine, so I still recommend following along with MySQL for the time being.