DH Bridge

Encouraging Computational Thinking and Digital Skills in the Humanities

Writings Search Results to a File

In this module we will learn:

  1. how to save our results to a file

In this module, we will focus on writing the results of our functions to a text file. This gives us a local copy of the data so that we only hit the DPLA servers once for the entire collection of files.

Saving our Search Results

In order to save our search results, we first need to add a line to our code that creates the file we will save to. In your "my_first_script.py" file, right under the "all_records" variable in the "Define Variables" section, add:

1
f = open('search_results.json', 'w')

Your file should now look like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
# Load Libraries
from dpla.api import DPLA

# Set Variables
dpla = DPLA('Your-Key-Here')
# result = dpla.search('cooking', fields=['sourceResource'], page_size = 50)
all_records = []

f = open('search_results.json', 'w')

# Define Functions
def pull_records(pages, end, size):
	while(pages <= end):
		paged_search = dpla.search('cooking',fields=['sourceResource'], page_size=size, page=pages)
		# print paged_search.items[2]
		save_each(paged_search)
		print 'finished page ' + str(pages)
		pages = pages + 1

def save_each(x):
	for each in x.items:
		all_records.append(each)

# Make Function Calls
# print result.items[1]
pull_records(2, 5, 50)
print all_records[150]

Here you are assigning the results of a function - open("search_results.json") - to the variable f. The "open" method both opens an existing file and creates a file if the file does not already exist on your computer. The "w" argument indicates that the file should be opened for "writing". One thing to note about "w" - "write" gives the program permission to overwrite the data inside the file, which is why we are opening the file once and writing the entire list at the end. If we were to write each item as we looped, we would end up with only the last item in the file. This code would write and overwrite each item as it went along. If you need to write inside a loop, you can open the file as "a" instead. This tells the computer to "append" the information to the end of the file, rather than overwrite the existing information.

Python has libraries for working with JSON, but because these are more specialized libraries, they are not automatically loaded into Python. Since we will be writing a JSON object, let's load that library into our file with import json:

1
2
3
# Load Libraries
from dpla.api import DPLA
import json

Now that we have a file to write to, let's create a third function named "save_results" to write our search results to the file. Place this function after the "save_each" function.

1
2
3
4
def save_results():
	data = json.dumps(all_records)
	f.write(data)
	f.close

Let's look at how this function works: first, we create a variable named "data" which contains the results of "json.dumps" which we passed the all_records list to. We have access to a new method (json.dumps) because we loaded the "json" library in the Python program. Next, it takes the file variable from the first part of the module and writes all of the information contained in "data" to it (f.write(data)). Lastly we clean up by terminating the program's ability to write to the file with f.close().

Calling the "Write" Function

To run the "save_results" function, call the function at the end of the file with:

1
save_results()

Our file should now look like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# Load Libraries
from dpla.api import DPLA
import json

# Set Variables
dpla = DPLA('Your-Key-Here')
# result = dpla.search('cooking', fields=['sourceResource'], page_size = 50)
all_records = []
f = open('search_results.json', 'w')

# Define Functions
def pull_records(pages, end, size):
	while(pages <= end):
		paged_search = dpla.search('cooking',fields=['sourceResource'], page_size=size, page=pages)
		# print paged_search.items[2]
		save_each(paged_search)
		print 'finished page ' + str(pages)
		pages = pages + 1

def save_each(x):
	for each in x.items:
		all_records.append(each)

def save_results():
	data = json.dumps(all_records)
	f.write(data)
	f.close

# Make Function Calls
# print result.items[1]
pull_records(2, 5, 50)
print all_records[150]
save_results()

Test that everything is working by running your "my_first_script.py" in Terminal.

Open your new "search_results.json" file and check that 200 items made it in. Close the file once you're done.

Congratulations! You've written your first results file!

Saving all the Search Results

Finally, let's change the parameters we pass to the "pull_records" function to get all of the search results.

Remember, the first number we pass to "pull_records" corresponds to the first page of search results, the second number to the last page of search results, and the third number is the number of items per page. The DPLA will cap us at 500 items per page, so let's take "500" for our third variable. We also want to start with the first page, so "1" is our first variable.

To figure out the value we want for "end", we need to do a little math. If we have 10,909 items and can get 500 items a page, how many pages do we have to work through?

Update your "pull_records" line to:

1
pull_records(1, 22, 500)

Save and run your script.

Well done! You now have a local copy of all the data that we want to analyze! You have done a lot of work! Take a moment, refresh your brain with some coffee and sugar, and come back for the last leg of the tutorial!

Previous Module Next Module