Simple Page Scraper due Mon 22 Jan 11:00

\begin{purpose}
Completing this assignment will provide experience in:
\begin{...
...tem leveraging the power of server-side web coding
\end{itemize}
\end{purpose}

Overview

This assignment is the first in a series of exercises that will produce a functional web application: a textbook clearinghouse in which sellers can list books they have for sale and buyers can browse the listings. Details of the site will be explained as the assignments progress.

IMPORTANT: We can look up book data from Amazon via the book's ISBN by using the following URL: http://www.amazon.com/exec/obidos/ISBN=0123456789

On our book site we are going to have users identify their book by its ISBN. We will then look up the details of the book so they don't have to type that information in when listing a book.

Steps

If you haven't done so already, complete the lab day assignment because this builds on it.

In scraper.php add additional validation code that makes sure the price is a number in the range 0.00 to 200.00.
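A minimal sketch of such a range check, assuming the price arrives as a raw string from the form. The function name is_valid_price is hypothetical and not part of the lab code, and is_numeric is only one of several reasonable ways to test for a number (it also accepts forms such as "1e2").

```php
<?php
// Hypothetical helper: returns true when $raw is a number in [0.00, 200.00].
function is_valid_price($raw): bool
{
    // Form values normally arrive as strings; reject arrays, null, etc.
    if (!is_string($raw) && !is_int($raw) && !is_float($raw)) {
        return false;
    }
    if (is_string($raw)) {
        $raw = trim($raw); // ignore surrounding whitespace
    }
    // is_numeric rejects empty strings and text such as "abc".
    if (!is_numeric($raw)) {
        return false;
    }
    $price = (float) $raw;
    return $price >= 0.0 && $price <= 200.0;
}
```

In scraper.php this could be called on the submitted price field before any lookup is attempted, with an error message shown when it returns false.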

Modify the code that reads the poem file to instead read the URL described above (but with the ISBN filled in from the value provided in the form). The result should be that Amazon's book page is displayed from your page.
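One way to build and read the lookup URL is sketched below. The helper names build_book_url and fetch_book_page are hypothetical, and stripping hyphens from the ISBN is an assumption based on the example URL above, which shows a bare 10-character ISBN. Reading a URL with file() also assumes that PHP's allow_url_fopen setting is enabled on the server.

```php
<?php
// Base URL taken from the assignment text; the ISBN is appended to it.
const AMAZON_ISBN_URL = 'http://www.amazon.com/exec/obidos/ISBN=';

// Hypothetical helper: removes hyphens and whitespace (an assumption),
// upper-cases a trailing "x" check digit, and builds the lookup URL.
function build_book_url(string $isbn): string
{
    $clean = strtoupper(preg_replace('/[\s-]+/', '', $isbn));
    return AMAZON_ISBN_URL . rawurlencode($clean);
}

// Hypothetical helper: returns the page as an array of lines, or an
// empty array when the request fails.
function fetch_book_page(string $isbn): array
{
    $lines = @file(build_book_url($isbn));
    return $lines === false ? [] : $lines;
}
```

Returning the page as an array of lines keeps the rest of the code close to the line-by-line loop used for the poem file.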

Rather than displaying the page we want to extract the data of interest to us. In particular you should parse the book page to obtain: the list of authors as a single string, the year, the publisher, the edition, and the title of the specified book. NOTE: Not all information is available for all books. Your extraction code should work for most books, but don't spend forever trying to make it perfect for all cases. Here are a couple of ISBNs that may be useful for test purposes: 0-8120-4152-6 and 0-07-050606-X.

If no matching ISBN is found then provide a simple message indicating that fact. Otherwise, display the extracted information with one item per line.
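The output step might look like the following sketch. The function name render_book and the convention that an empty array means "no match" are assumptions made for illustration; the labels used as array keys are up to you.

```php
<?php
// Hypothetical helper: $book maps labels (e.g. "Title") to extracted
// strings. An empty array is treated as "no matching ISBN found".
function render_book(string $isbn, array $book): string
{
    if (count($book) === 0) {
        return 'No book found for ISBN ' . htmlspecialchars($isbn) . '.';
    }
    $out = [];
    foreach ($book as $label => $value) {
        // Escape scraped text so stray markup cannot break the page.
        $out[] = htmlspecialchars($label) . ': ' . htmlspecialchars($value);
    }
    // One item per line in the rendered HTML.
    return implode("<br>\n", $out);
}
```

Building the string first and echoing it once keeps the display logic separate from the extraction code.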

Hints

Here is an example of the code I used to extract a title:
if (preg_match("/<span id=\"productTitle\".+>(.+)<\/span>/",$line,$result)) {
   // $result[0] holds the whole match; $result[1] holds the first parenthesized group
   $title= $result[1];
}

NOTE: $line is the current line of HTML code we are looking at. If the line contains a span with an id of ``productTitle'' then I know it contains the title, which I capture into $result[1] using the parenthesized .+. I determined the string to look for by inspecting sample pages at Amazon.

Take time to read the documentation for the preg_match function rather than blindly trying to use it. Then come up with additional patterns to extract the other required elements.
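To see what preg_match returns and how it fills its third argument, a small experiment such as the one below may help. The helper first_match and the sample lines are made up for illustration and are not real Amazon markup. The pattern passed in must contain one parenthesized group.

```php
<?php
// Hypothetical helper: scans $lines and returns the first capture group
// of the first line that matches $pattern, or null when no line matches.
function first_match(string $pattern, array $lines): ?string
{
    foreach ($lines as $line) {
        // preg_match returns 1 on a match, 0 on no match, false on error.
        if (preg_match($pattern, $line, $result) === 1) {
            // $result[0] is the whole match, $result[1] the first group.
            return trim($result[1]);
        }
    }
    return null;
}

// Made-up sample lines standing in for a fetched page.
$sample = [
    '<div class="header">Books</div>',
    '<span id="productTitle" class="large"> Example Title </span>',
];
$title = first_match('/<span id="productTitle".+>(.+)<\/span>/', $sample);
```

The same helper can be reused with a different pattern for each remaining field, once you have found a reliable marker for that field in the page source.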

Quick Links