In completing this assignment you MAY use/access the following resources:
You may NOT use/access:
- Resources not expressly listed above, including, but not limited to,
the following ...
- Source code not provided as part of this assignment. (Obviously, this
includes, but is not limited to, source code written by other students
whether current or in the past).
- Code-generating tools (of which ChatGPT is one example).
- Any web sites not directly linked to from the homework assignment.
Failure to abide by these guidelines will result in a zero for the assignment
and the incident will be reported to the university provost as a violation of
the university academic integrity policy. A second incident of academic
dishonesty (whether from this course or another computer science course) will
result in an F in the course.
We will extend the behavior of the work in the prelab assignment so that when
a user enters a valid ISBN we will access information about the book from
Amazon's website. We can look up book data from Amazon via the book's ISBN by
using the following URL: https://www.amazon.com/exec/obidos/ISBN=0123456789
Instead of using fopen to open a local file you can use it to open a
URL and read the contents on the page one line at a time.
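A minimal sketch of that line-by-line read, assuming the usual fopen/fgets
loop. The helper name readLines is hypothetical, and a URL will only open
this way if PHP's allow_url_fopen setting is enabled on the server.

```php
<?php
// Hypothetical helper: returns every line of $source (a local path or, when
// allow_url_fopen is enabled, a URL) with trailing newlines removed.
function readLines(string $source): array
{
    $handle = @fopen($source, 'r');
    if ($handle === false) {
        return [];                      // could not open the file or URL
    }
    $lines = [];
    while (($line = fgets($handle)) !== false) {
        $lines[] = rtrim($line, "\r\n");
    }
    fclose($handle);
    return $lines;
}
```

Calling readLines('page1.html') and readLines on the Amazon URL would go
through the same code path, which is what makes the local-file approach
useful for debugging.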
IMPORTANT: Amazon doesn't take kindly to lots of automated requests to its system.
So, while you are debugging your code, use a static local file rather than
making a new request to Amazon each time you run it.
- If you haven't done so already, complete the lab day assignment because
this builds on it.
- Here are a couple of ISBNs that may be useful for test purposes:
0-8120-4152-6 and 0-07-050606-X. Start by visiting:
https://www.amazon.com/exec/obidos/ISBN=0812041526 in your browser
and then save the source code to a file named page1.html. Do the
same for the second ISBN and save it as page2.html. Save the contents of
these files in your hw01 directory on the CSCI server. You can do this
by transferring with an sftp client or by copy/pasting.
- Modify scraper.php to read page2.html instead of the poem file.
- Rather than displaying the page we want to extract the data of
interest to us. In particular you should parse the page to obtain: author names,
publisher, and book title. NOTE: Not all information is available for all
books. Also, there is some variation in how the information
is provided for different books. Your extraction code should work for most
books, but don't spend forever trying to make it perfect for all cases.
- If no matching ISBN is found then provide a simple message indicating
that fact. Otherwise, display the extracted information with one item per
line. Here is my output for ISBN 0-07-050606-X:
Title: A Tutorial Introduction to Occam Programming
Author: Dick Pountain
Author: David May
Publisher: McGraw-Hill (date)
- Once your code is properly scraping the desired data and is working for
both page1.html and page2.html, you can modify your fopen statement so
it reads directly from the Amazon URL (with the ISBN entered by the user
embedded in the URL).
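The steps above can be sketched roughly as follows. The function names
(amazonUrlForIsbn, pageSourceFor, formatBookInfo), the $useLocalFile flag,
and the array keys are all hypothetical, and treating an empty title as
"no matching ISBN" is an assumption. Extracting the values themselves is
left to your own preg_match code.

```php
<?php
// Builds the lookup URL given in the assignment, dropping hyphens and spaces
// so that 0-07-050606-X becomes 007050606X.
function amazonUrlForIsbn(string $isbn): string
{
    $compact = str_replace(['-', ' '], '', $isbn);
    return 'https://www.amazon.com/exec/obidos/ISBN=' . $compact;
}

// While debugging, read the saved page; switch to Amazon only once it works.
function pageSourceFor(string $isbn, bool $useLocalFile, string $localFile = 'page2.html'): string
{
    return $useLocalFile ? $localFile : amazonUrlForIsbn($isbn);
}

// Formats extracted data one item per line. The keys 'title', 'authors',
// and 'publisher' are assumed names, not part of the assignment.
function formatBookInfo(array $info): string
{
    if (empty($info['title'])) {
        return "No book found for that ISBN.<br>\n";
    }
    $out = "Title: {$info['title']}<br>\n";
    foreach ($info['authors'] ?? [] as $author) {
        $out .= "Author: $author<br>\n";
    }
    if (!empty($info['publisher'])) {
        $out .= "Publisher: {$info['publisher']}<br>\n";
    }
    return $out;
}
```

Keeping the choice of source in one place means the switch from page2.html
to the live URL is a one-argument change.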
Here is an example of the code I used to extract a title:
// $result[1] holds the text captured by the parenthesized (.+)
if (preg_match("/<span id=\"productTitle\".+>(.+)<\/span> <\/h1>/", $line, $result)) {
    echo "Title: $result[1]<br>\n";
}
NOTE: $line is the current line of HTML code we are looking at. If the line
has an id of “productTitle” then I know it contains the title, which I capture
into $result using the parenthesized .+. I determined the string to look for
by inspecting sample pages at Amazon.
Take time to read the documentation for the preg_match function rather than
blindly trying to use it. Then come up with additional preg_match calls to
extract the other required elements.
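To make the return value and the $result array concrete, here is a small
sketch that runs the title pattern against a single line. The sample line is
made up for illustration and is not copied from a real Amazon page, so the
markup you see in page1.html or page2.html may differ.

```php
<?php
// Made-up sample line (an assumption, not real Amazon markup).
$line = '<h1><span id="productTitle" class="sample">A Tutorial Introduction to Occam Programming</span> </h1>';

$found = preg_match("/<span id=\"productTitle\".+>(.+)<\/span> <\/h1>/", $line, $result);

// preg_match returns 1 on a match, 0 on no match, and false on error.
// $result[0] is the full matched text; $result[1] is the first (...) group.
if ($found === 1) {
    echo "Title: " . trim($result[1]) . "<br>\n";
}
```

Because preg_match stops at the first match, a book with several authors
needs the check to run on each line (or a different function) to collect
them all.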
Next week we will be starting the first stage of a series of assignments that will
build for most of the semester. One of the steps of next week's assignment is to
build HTML and CSS pages for that site as a starting point. That task alone will
likely take a couple of hours. After that you will add a fair amount of database
and form handling code, which will also take time. Since this week's lab
day and homework assignments were short, my advice is that you build your HTML
and CSS starting point this week.
If you choose to do this, here are some thoughts to consider:
- The site we will be building is a simple community book selling website
in which users can list books for sale (title, condition, and price) and
interested shoppers can contact the seller to arrange for the sale.
- The application will need a menu with three items: Home, Add Book, and
Login.
- For the next homework you'll need a home page that will list the books
currently for sale (show title and price only). You can just put in a
couple of hard-coded books to get styling worked out.
- The “Add Book” page should use the same basic format as the home page
but should present a form that allows a user to enter a book title, the
condition of the book, and the price of the book.
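One possible shape for the hard-coded starting point is sketched below. The
helper names, the file names in the menu links, the CSS class, and the book
titles and prices are all placeholders invented for styling purposes; none
of them come from the assignment.

```php
<?php
// Placeholder data: invented titles and prices, just to get styling worked out.
$books = [
    ['title' => 'Sample Book One', 'price' => 12.50],
    ['title' => 'Sample Book Two', 'price' => 8.00],
];

// Hypothetical helper: renders the three-item menu as an unordered list.
function renderMenu(): string
{
    $items = ['Home' => 'index.php', 'Add Book' => 'add.php', 'Login' => 'login.php'];
    $html = "<nav><ul>\n";
    foreach ($items as $label => $href) {
        $html .= "  <li><a href=\"$href\">$label</a></li>\n";
    }
    return $html . "</ul></nav>\n";
}

// Hypothetical helper: lists title and price only, escaping the title for HTML.
function renderBookList(array $books): string
{
    $html = "<ul class=\"books\">\n";
    foreach ($books as $book) {
        $title = htmlspecialchars($book['title']);
        $price = number_format($book['price'], 2);
        $html .= "  <li>$title - \$$price</li>\n";
    }
    return $html . "</ul>\n";
}

echo renderMenu(), renderBookList($books);
```

Building the list from an array now should make it easier to swap in real
data later, since only the source of $books has to change.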
It would make sense for you to use the basic styling and page arrangement you
used for your final web page from last semester, but you are welcome to branch
out and do something different.