DH Bridge

Encouraging Computational Thinking and Digital Skills in the Humanities

Getting Data

In this module, you will learn to:

  1. install new libraries using pip
  2. use the Python Interactive Interpreter to run basic Python commands
  3. load the DPyLA library and make an API request

So far, we have looked at the data in our browser and have a sense of the scope of the information we can work with and the questions we might want to ask.

In this module, we will use pip to install the DPyLA Python library, which will make it easy to use the DPLA API on our machines. We will also begin writing our first lines of Python to interact with that data using the Python Interactive Interpreter.

1. Install DPyLA Library

We talked earlier about libraries, those bundles of code that we can load and use in creating our own scripts. Just like in cooking, it is possible to make everything from scratch. However, this is often not necessary, nor is it the most efficient way to get the same results.

The DPyLA library makes it easy to work with the DPLA API.

We can easily get this library for use with our own code using pip.

Type pip install dpla.

Look at the last lines in the Terminal to make sure the command worked. If you see something like 'error: could not create' or 'failed with error code' then the command didn't finish properly. If it says 'Successfully installed dpla requests' then you're good to go.

If you get a permissions error and you're a Mac user, type sudo pip install dpla and enter your password. If you get a permissions error and you're a Windows user, type Start-Process powershell -Verb runAs to open Powershell as an administrator, then type pip install dpla.

You have just installed your first Python library.

2. Introducing the Python Interactive Interpreter

Python is a language we can use to write and execute scripts. But it also comes with a handy feature called the Interactive Interpreter. Using the Interactive Interpreter, we will first experiment with lines of Python and see what the language can do.

To start up the Interactive Interpreter, go to your Terminal window and type python and press "Enter".

Your Terminal should now look like this:

1
2
3
4
5
jeriwieringa$ python
Python 2.7.5 (default, Mar  9 2014, 22:15:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Everything you type now will be processed by Python. Let's try it out a bit. Type print "Hello World" and press "Enter". Now type print "My Name is [Your Name]" and press "Enter". Now type print 5 + 8 and press "Enter".

Here you have discovered a couple of very useful things about programming languages.

"print" is a function in Python that returns whatever follows it next.

The content contained inside the " " marks is called a string. This is especially important for humanities research, because often when we work with texts, or metadata, we are working with strings.

Finally, if we input numbers or equations into Python, it will compute them. If you don't want Python to compute, you should treat the numbers like a string. Experiment in the Interactive Interpreter to see if you can get it to display 5 + 8 rather than 13.

3. Loading DPyLA in the Interactive Interpreter and Make an API Request

Let's start using our new DPyLA library to make an API call to the DPLA. Following the library documentation, run:

1
from dpla.api import DPLA 

You just loaded the DPyLA library into your Python Interpreter, making all of its code available for you to use.

Next run:

1
dpla = DPLA('your-key-here')

What you have just done is create what is called a "variable"– dpla –that stores your API key.

Next run:

1
result = dpla.search('cooking')

You have just created a new variable, "result", that stores the result of a DPLA search for "cooking". Now, you might be wondering what happened to your API key. This is part of the magic of using a library. When you save your key as "dpla", you make it known to the library. Then, when you call "dpla.search", you're combining your key with code in the DPyLA library that then executes the search and saves all of the data in a new variable called "result".

But we still haven't seen any of the results from the API. To show the results, run:

1
result.items

You should see something like this:

1
[{u'_id': u'digitalnc--urn:brevard.lib.unc.eduecu_c5:oai:digital.lib.ecu.edu/7394', u'admin': {u'sourceResource': {u'title': u'Cooking'}, u'validation_message': None, u'valid_after_enrich': True}, u'sourceResource': {u'isPartOf': [u'https://digital.lib.ecu.edu/encore/ncgre000/00000008/00007394/00007394_tn_0001.gif'], u'description': [u'Boys and girls cooking during home economics class. Dates from negative sleeve.'], u'language': [{u'iso639_3': u'eng', u'name': u'English'}], u'rights': u'Copyright held by Joyner Library. Permission to reuse this work is granted for all non-commercial purposes.', u'@id': u'http://dp.la/api/items/7cb32765b538a57a35fbdbfad03be57b#sourceResource', u'format': u'negatives (photographic)'

You just asked the computer to give you all of the items it got back from the search for "cooking".

Notice the structure of this data. Can you find the key:value pairs?

You might notice the " u' " in front of each string. This is a feature of Python 2. The u' indicates that what follows is a unicode string. Don't worry too much about what that means - it is information about how the data is being encoded - but if you want to really geek out, you can read all about encoding in the language documentation.

But what if you only want to see one item at a time?

The information you get back from the DPLA API comes in the form of an array, or list ("array" is the general programming word for a series of things, "list" is Python's name for a series of things). Lists are super powerful, enabling us to load up a lot of information and then work through each bit individually.

One general thing we can do with lists is get items according to their position in the list.

For example, if you want the first item only:

1
result.items[0]

You should now see a much smaller dump of data. You might be wondering why we used 0. Programming languages start counting with 0 rather than one.

Try to get the 3rd item in the search results.

Group Challenge

Can you use this tutorial to figure out how to get items 1 - 3?

Previous Module Next Module