DH Bridge

Encouraging Computational Thinking and Digital Skills in the Humanities

Moving to Phase Two

In this module we will learn to:

  1. add additional parameters to our API call
  2. create an empty list

It is time stop briefly and think through what we have already learned. So far, we have learned to interact with your computer using the terminal, and with Python (a programming language) we used to execute a simple program. We have also learned how to use pip to add additional libraries to Python, and thought about how we can combine libraries to create new programs.

We also learned about data and APIs. We looked at JSON as a way of structuring data and thought about what it means to represent things in this format. Then, we looked at the DPLA API and learned how to leverage libraries like DPyLA to use the API in our code. Finally, we saved our code in a Python file that we learned how to execute in the terminal.

Well Done!

So far you have been writing code that does one thing: it get search results or prints a particular item. The power of computation, however, is being able to do the same thing to a series of items. This means thinking in terms of iterating through a series of items, doing the same thing to each in order to work to an answer. To do this, we will learn two very powerful programming concepts: functions and loops.

Anatomy of a Python Script File

Just like when we are writing papers or structuring an argument, order matters in our script files. The computer will execute the code in the order it encounters it.

One good way to keep your code organized and avoid bugs caused by organization errors is to follow a set pattern when constructing your scripts.

First, at the top of your files, place the list of libraries you're importing. This way they are ready for use throughout the script. We did this when we started with from dpla.api import DPLA.

Next, declare (or set) the variables (names) that you will use through the script. In code-speak, these are set at the "global scope" meaning that they are available to all functions in the file. We did this when we assigned our API key to the "dpla" variable with dpla = DPLA("YourAPIKey") and when we saved our search to "results" with result = dpla.search("cooking").

After the variables, we will put our functions. Functions are like mini-scripts or packets of code that do one job. We can then combine functions to create more complicated programs. We will add some functions to our script in the next modules.

Finally, at the send of the file, we put the "calls to the functions" or commands to execute functions. To do this, we use the function name and (usually) parens "()". One exception is the "print" function, which doesn't require parens in the version of Python we are using for this workshop. We called the "print" function to print one item from our search results. It is important to call the functions in the order that you want them to be run.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# Load Libraries
import [library name]

# Set Variables
my_string = "information to be stored in my_var"

# Define Functions
def my_function(my_string):
	#do something to "my_string"
	my_list = str.split(my_string)
	return my_list

# Make Function Calls
my_var = my_function()

print my_var[1]

Asking Questions of Our Data

We have been able to do a lot with filtering the data. But what if we want to ask questions about how the objects associated with "cooking" are described across all 10,000+ items? Say we are interested in how those descriptions of items related to cooking are gendered. How would we investigate patterns across the entirety of the DPLA's holdings related to "cooking" and gender?

There are many ways one could go about investigating the descriptions. Work with your table to brainstorm a couple of approaches.

1. Adding Parameters to the Query

The first step is to get all of the relevant data. To do that, we need to restructure our query so we can control the number of items and the which "page" of results we retrieve. Remember that the API by default gives us 10 items at a time. We can add a parameter to our query to get up to 500 items at a time, but there were nearly 11,000 items associated with "cooking". This means we need to retrieve all 23 pages of 500 items.

Open your "my_first_script.py" file. Look at the line that says:

1
result = dpla.search('cooking')

First, let's narrow down the information we get back for each item from the DPLA. Looking at the documentation, we can do this by identifying the field names we are interested in. Looking at the search results, the description information, the titles, and subject headings are all found under 'sourceResource'. We can limit our results by adding fields=['sourceResource'] to our search:

1
result = dpla.search('cooking', fields=['sourceResource'])

This gives us a smaller set of data that is applicable to our questions, but with enough context in case we need to work back to the original item.

Next, let's change the number of items we get back by adding an additional parameter to what we pass to "dpla.search". Looking at the documentation, we learn that the syntax for setting the number of items is page_size= and the number of items we want. To get 50 items rather than 10, change that line to:

1
result = dpla.search('cooking', fields=['sourceResource'], page_size = 50)

To check that this worked, let's add a line telling the computer to print out the 40th item:

1
print result.items[39]

Remember, counting within a list begins at 0.

Group Challenge

Using the documentation, work with your group to add another parameter to get the information from page 3 of the search results.

2. Creating a Variable

To get all 10,000+ items associated with cooking, we need to do two things: we need to get the items from all of the pages and we need a place to store them, so that as we get new items, our collection grows.

Let's tackle the second problem first. Remember back when we discussed variables? Variables are names we use to hold values. We can also use variables to hold lists. A list holds a series of items, such as fruit = ['apple', 'orange', 'banana']

We are going to use a similar structure to save all of the items we get back from the DPLA.

Go back to "my_first_script.py" in your editor. "Comment out" the line result = dpla.search('cooking', fields=['sourceResource'], page_size=50) by putting a # at the beginning of the line. When your computer executes the file, it will skip all lines that start with a pound sign. This allows you to leave comments for yourself or to test new ways of doing things without loosing your work and tells the computer to skip this line.

Also comment out the line you wrote to print out one of the items.

Your file should look like:

1
2
3
4
5
6
7
8
9
10
11
#Load Libraries
from dpla.api import DPLA

#Set Variables
dpla = DPLA('YourKeyHere')
# result = dpla.search('cooking', fields=['sourceResource'], page_size = 50)

# Define Functions

# Make Function Calls
# print result.items[39]

Now, let's create a new variable, "all_records" and set it equal to an empty list.

To do this, add a new line to the variables section of the code:

1
all_records = []

This tells the computer that we have a variable, "all_records", that this variable will hold a list, and that we currently have no items in that list.

Now that we have a place to store our values, we now have to tell the computer to get the search results from each page and save those results to the "all_records" list.

Previous Module Next Module