HW15: MongoDB GeoSpatial Operators due Wed 01 May 13:00

\begin{purpose}
This assignment provides practice in using the aggregate framew...
...nd MongoDB's geospatial operators to produce some useful results.
\end{purpose}

Allowed and Disallowed Resources

In completing this assignment you MAY use/access the following resources:

You may NOT use/access:

Failure to abide by these guidelines will result in a zero for the assignment and the incident will be reported to the university provost as a violation of the university academic integrity policy. A second incident of academic dishonesty (whether from this course or another computer science course) will result in an F in the course.

Overview

In this assignment you will make use of two collections both of which are provided in the /home/data/json directory on the csci server:

You will be producing a script that will define functions (named a(), b(), etc.) that will perform relevant tasks for the purpose of helping the company get information about which countries their customers reside in.

Preparing the Database

This assignment will be done in a database called hw15. Begin by using the mongoload command to create two collections (named customers and countries) that are populated from the data files people.json and countries.json respectively. The data files are available in /home/data/json on the csci server.

The Functions

All functions should begin by switching to the hw15 database. When working directly in the shell you would do this by typing: use hw15. That will not work when executed as a function. Instead you'll put this at the beginning of the function:
db = db.getSiblingDB("hw15")

Your script, when finished should define the following functions:

a()
(2 pts) Switch to the hw15 database and drop these collections: customers, countries, and geocustomers. Obviously, after you run this function you will have to re-create the customers and countries collections as described above. (So after you get it working don't run it unless you need to!)

HINT: I have written this function below to introduce some notation.

b()
(6 pts) When executed this script should create a new collection named “geocustomers” based on the contents of the customers collection. Each document in the new collection will have _id to match the id value in the customer collection. The geocustomers collection will also have a gender field and a name field that are copied from customers. Finally, add a new field called “location” that is a GeoJSON object containing the customer's location. To make this transformation happen you'll need to rearrange the coordinate values into a new object and convert the coordinate values from string to double.

HINT: I used an aggregate pipeline with $project and $out stages to accomplish this.

c()
(6 pts) Create indexes on the geocustomers location field and on the countries geometry field. Then determine how many customers have their location in each country. Update the countries collection to have a new field called customerCount that is set to the number of customers whose locations are within that country's borders.

HINT: I used the .forEach() method to iterate over each country and did a $geoWithin query to get the count of people within that country's border. Then I used an update query to add the new field.

d()
(6 pts) Display list of countries that have 5 or more customers. The list should be sorted to show countries with highest customer count first. The list should show the country name and the customerCount only. I used the properties.admin field as the country name.

HINT: I used an aggregate pipeline because I wanted more control over the format of my output document (but you can also use find() if you want to). When calling .aggregate in a function the output is squelched, so I used this to get the output I wanted:

let result = db.countries.aggregate([ .... ])
result.forEach(doc => printjson(doc))

e()
(8 pts) Find a delete all customers whose location is not found in any of the countries in the database. The customers should be deleted from both the customers collection and the geocustomers collection.

HINT: Remember that if you did b() correctly then entries in both collections share the same _id value. So, when you learn that a customer in geocustomers is not found in any country, you can use the id to delete from both collections. It took about 20 minutes for this function to run to completion when I had it working. I put an output statement in my loop to display the id of each customer I was deleting to be able to see/track progress. At one point the script stopped making progress (probably a network issue). So, I restarted the function and it finished fine. Also, I used $geoIntersects rather than $geoWithin for this particular query.

Possible Workflow

From the MongoDB shell you can load a script file using the load() command:
load("hw15.js")

Then you can test a function by simply running it from the shell: a()

My solution to a() was this:

function a() {
   db = db.getSiblingDB("hw15") // switch to hw15 DB
   db.customers.drop()
   db.countries.drop()
   db.geocustomers.drop()
}

Grading and Submission

In Canvas, upload your final working hw15.js script that will define the functions described above in a database named hw15.