In completing this assignment you MAY use/access the following resources:
- Examples and sample code found here:
https://josephus.hsutx.edu/classes/db/source/
- The SQL handout provided in class and also available here:
https://josephus.hsutx.edu/classes/db/source/sql.pdf
- A text editor (or VSCode) that provides syntax highlighting. You MAY
NOT USE an extensions to such editors/IDEs that produce code.
- The phpPgAdmin front end to your personal database (login is here:
https://csci.hsutx.edu/phpPgAdmin/
- Video instructions provided in Canvas as part of this course. You MAY
NOT USE any other video resources.
- Any handouts provided by the instructor as part of this course.
- Your own course notes
- Your instructor
- Discussions about the assignment with other students as long as you
never look at the code produced by another student and you never receive
instructions about solving the homework. That is, discussions need to
be about concepts and understanding the technologies and not about
how to solve the particular problem posed in this assignment.
You may NOT use/access:
- Resources not expressly listed above, including, but not limited to,
the following ...
- Source code not provided as part of this assignment. (Obviously, this
includes, but is not limited to, source code written by other students
whether current or in the past).
- Code-generating tools (of which ChatGPT is one example).
Failure to abide by these guidelines will result in a zero for the assignment
and the incident will be reported to the university provost as a violation of
the university academic integrity policy. A second incident of academic
dishonesty (whether from this course or another computer science course) will
result in an F in the course.
In this assignment you will make use of two collections both of which are provided
in the /home/data/json directory on the csci server:
- people.json contains lots of information about customers for a company.
- countries.json contains information about many countries in the world
including a multipolygon designating each country's borders.
You will be producing a script that will define functions (named a(), b(), etc.) that
will perform relevant tasks for the purpose of helping the company get information
about which countries their customers reside in.
This assignment will be done in a database called hw15. Begin by using the
mongoload command to create two collections (named customers and countries)
that are populated from the data files people.json and countries.json
respectively. The data files are available in /home/data/json on the csci server.
All functions should begin by switching to the hw15 database. When working directly
in the shell you would do this by typing: use hw15. That will not work when executed as
a function. Instead you'll put this at the beginning of the function:
db = db.getSiblingDB("hw15")
Your script, when finished should define the following functions:
- a()
- (2 pts) Switch to the hw15 database and drop these
collections: customers, countries, and geocustomers. Obviously, after you
run this function you will have to re-create the customers and countries
collections as described above. (So after you get it working don't run it
unless you need to!)
HINT: I have written this function below to introduce some notation.
- b()
- (6 pts) When executed this script should create a new collection named
“geocustomers” based on the contents of the customers collection. Each
document in the new collection will have
_id
to match the id value in
the customer collection. The geocustomers collection will also have a
gender field and a name field that are copied from customers. Finally, add
a new field called “location” that is a GeoJSON object containing the
customer's location. To make this transformation happen you'll need to
rearrange the coordinate values into a new object and convert the coordinate
values from string to double.
HINT: I used an aggregate pipeline with $project
and $out
stages
to accomplish this.
- c()
- (6 pts) Create indexes on the geocustomers location field and on the
countries geometry field. Then determine how many customers have their location
in each country. Update the countries collection to have a new field called
customerCount that is set to the number of customers whose locations are within
that country's borders.
HINT: I used the .forEach() method to iterate over each country and
did a $geoWithin
query to get the count of people within that
country's border. Then I used an update query to add the new field.
- d()
- (6 pts) Display list of countries that have 5 or more customers.
The list should be sorted to show countries with highest customer count
first. The list should show the country name and the customerCount only.
I used the properties.admin field as the country name.
HINT: I used an aggregate pipeline because I wanted more control over the
format of my output document (but you can also use find() if you want to).
When calling .aggregate in a function the output is squelched, so I used
this to get the output I wanted:
let result = db.countries.aggregate([ .... ])
result.forEach(doc => printjson(doc))
- e()
- (8 pts) Find a delete all customers whose location is not found in
any of the countries in the database. The customers should be deleted from
both the customers collection and the geocustomers collection.
HINT: Remember that if you did b() correctly then entries in both
collections share the same _id
value. So, when you learn that a
customer in geocustomers is not found in any country, you can use the id
to delete from both collections. It took about 20 minutes for this function
to run to completion when I had it working. I put an output statement in
my loop to display the id of each customer I was deleting to be able to
see/track progress. At one point the script stopped making progress
(probably a network issue). So, I restarted the function and it finished
fine. Also, I used $geoIntersects
rather than $geoWithin
for this particular query.
From the MongoDB shell you can load a script file using the load() command:
load("hw15.js")
Then you can test a function by simply running it from the shell: a()
My solution to a() was this:
function a() {
db = db.getSiblingDB("hw15") // switch to hw15 DB
db.customers.drop()
db.countries.drop()
db.geocustomers.drop()
}
In Canvas, upload your final working hw15.js script that will define the functions
described above in a database named hw15.