Linked List and Phonetic Spell Checker due Wed 06 Sep 14:30

\begin{purpose}
This program requires a variety of programming skills including ...
...class written by someone else, and using a
hand-coded linked list.
\end{purpose}

Overview

Programs that perform spell checking often provide suggested spellings for words that do not appear in its dictionary. Deciding what words might be ``similar'' to a misspelled word can be a challenge. One approach to solving this problem is to take a phonetic approach. That is, you imagine what the misspelled word would sound like if pronounced and then find other words with similar pronunciations.

There have been a variety of algorithms proposed to give a phonetic representation of a word. One well-research approach is called Metaphone 3. We will use a provided implementation of this algorithm to explore a possible approach to suggesting alternatives for words not found in a dictionary.

Details

For this assignment you will write a program that loads words from a provided dictionary and calculates and stores a phonetic representation of each word. Then the program will read a text file containing Herman Melville's Moby Dick, parse it to isolate words, and then look those words up in the dictionary. If the word is not found then it is considered to be a misspelling and you will search for phonetic matches for the word in the dictionary, printing out the phonetic matches.

The program should report the following information:

When parsing the Moby Dick text use the following rules for each line of the file:

If you follow these rules you will eliminate all words that start a sentence. For the purposes of this assignment that is okay. Here is the (not super efficient) code I used to employ those rules. You are welcome to use this code as-is or you can do something more efficient yourself:

int i;
String [] splitted;
Scanner f;
String str;
 ...

   // inside loop where I'm reading from melville.txt
   str= f.nextLine();
   str= str.replaceAll("[^a-zA-Z]", " "); // get rid of non-alphanumeric
   splitted = str.split("\\s+");          // split based on whitespace
   for (i=0; i<splitted.length; i++) {
   if (splitted[i].length() > 2 && !splitted[i].matches(".*[A-Z].*"))
   {
      // we have extracted a "word" (splitted[i]) from melville.txt
      // this word contains only lowercase letters and is more than 2 characters
	}

Sample Output

Here is a small snippet of the output produced by my program for this assignment:
Suggestions for 'dotings': tidings tattings
Suggestions for 'maketh':
Suggestions for 'lookest': luckiest loggiest locust likest leggiest ...
   .         .          .
   .         .          .
   .         .          .
Suggestions for 'coincidings': conceiting conceding coinciding  ...
Suggestions for 'halfspent':
Words in Dictionary: 355097
Words Checked      : 155401
Words Not Found    : 631
Time Spent Loading : x.xx seconds
Time Spent Parsing : y.yy seconds
Total Time Spent   : z.zz seconds

Requirements

For this assignment the dictionary should be implemented as a hand-coded linked list. You may use any code you have previously written. All code must be yours unless it was supplied by the instructor as part of the assignment.

Your program should be documented using JavaDoc conventions and should follow other common Java programming conventions (http://josephus.hsutx.edu/classes/all/javaconventions/).

The program should employ an object-oriented design with each class stored as a separate file.

I would like start making use of discussion streams available in Canvas as a way to enhance learning and to increase the quality of turned-in work. For this assignment you will be asked to do the following:

Helpful Stuff

The starter code for this assignment is distributed in the hw01 directory of the ds_homework git repository described below. In the hw01 directory you'll find the Metaphone 3 libary along with a demo of how to use it. You have probably timed code before. If not I'm providing a Timer class you can use to make timing the code easier. Feel free to use or write your own Timer class.

The repository also contains a data directory that contains a dictionary data file as well as a text file containing the text of Moby Dick.

Submitting Your Program

This semester you will be submitting homework using a homework git repository as we did in Program Design 2. The URL for repository from which I'll be providing resources is: https://tsergeant@bitbucket.org/tsergeant/ds_homework.git.

If you don't remember how to use the repository for turning in homework, follow the instructions at http://josephus.hsutx.edu/classes/all/usinggit/. As part of the setup instructions you will establish your own (private) copy of the repository at bitbucket.org and will give terrys@gmail.com write privileges to your repository.

Prior to the due date you will push your completed work to your repository where it will be graded.

This program (and every other program you write this semester) should conform to the documentation guidelines given at http://josephus.hsutx.edu/classes/all/documentation/ as well as the Java conventions outlined at http://josephus.hsutx.edu/classes/all/javaconventions/.

Your program will be graded according the following criteria:

Correctness/Completeness 20 pts
Documentation 4 pts
Conventions 3 pts
OOP Design 5 pts
Discussion 3 pts
Total 35 pts

Quick Links