DH Bridge

Encouraging Computational Thinking and Digital Skills in the Humanities

Looping through the Pages

In this module, we will learn to:

  1. use a "while loop"
  2. create a "counter"

We're getting there! We have a function to gather all of the search results on any given page. We now need a way to work systematically through all of the pages available.

To do that, we will add another loop to our "pull_records" function that allows us to move through more than one page of search results.

1. Introducing the while Loop

The "for loop" allows us to do something to each item in a list. The "while loop" is a powerful tool that tells the computer to continue doing something as long as some criteria is true. We can use the "while loop" and a "counter" to work through all of the pages of search results.

To use a while loop, let's look again at our "pull_records" function.

1
2
3
4
def pull_records(pages, end, size):
	paged_search = dpla.search('cooking',fields=['sourceResource'], page_size=size, page=pages)
	# print paged_search.items[2]
	save_each(paged_search)

Remember, "pages" is a variable for the first page and "end" is a variable for the last page of search results we want. We want the pull_records function to retrieve every page of search results. In other words, if the page number is less than or equal to the total number of pages available, we want to get the search results from that page. Once we hit the end, we want to stop.

To write this logic into our existing function code, we will add:

1
while(pages <= end):

so that our function now looks like this:

1
2
3
4
5
def pull_records(pages, end, size):
	while(pages <= end):
		paged_search = dpla.search('cooking',fields=['sourceResource'], page_size=size, page=pages)
		# print paged_search.items[2]
		save_each(paged_search)

Can you see the problem with our current function? As it currently is written, "pages" is always less than "end" because it never increases. This means we would get stuck in an "infinite loop" if we tried to run the code right now — don't run the code at this point! If you did, that's OK, just stop the execution of the program with by hitting ctrl + c on the keyboard.

To avoid an "infinite loop", we need to increase the value of "pages" each time we work through the loop. We can do this by overwriting the value of "pages" to be "pages + 1".

Add "pages = pages +1" after save_each(paged_search) like this:

1
2
3
4
5
6
def pull_records(pages, end, size):
		while(pages <= end):
			paged_search = dpla.search('cooking',fields=['sourceResource'], page_size=size, page=pages)
			# print paged_search.items[2]					
			save_each(paged_search)
			pages = pages + 1

Let's also add a print command to check that things are working as we expect. Above pages = pages + 1 add:

1
print "finished page " + str(pages)

Our file should now look like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Load Libraries
from dpla.api import DPLA

# Set Variables
dpla = DPLA('Your-Key-Here')
# result = dpla.search('cooking', fields=['sourceResource'], page_size = 50)
all_records = []

# Define Functions
def pull_records(pages, end, size):
	while(pages <= end):
		paged_search = dpla.search('cooking',fields=['sourceResource'], page_size=size, page=pages)
		# print paged_search.items[2]
		save_each(paged_search)
		print "finished page " + str(pages)
		pages = pages + 1

def save_each(x):
	for each in x.items:
		all_records.append(each)

# Make Function Calls
# print result.items[1]
pull_records(2, 3, 50)
print all_records[30]

Let's test our function on a subset of the pages. Change pull_records(2, 3, 50) to pull_records(2, 5, 50) and change print all_records[30] to print all_records[150]

Save and run in Terminal.

Group Challenge:

Go back to your pen and paper and diagram a while loop. How does that diagram compare to the one you drew for the for loop?

Previous Module Next Module