1.4 Getting data from the Comtrade API
Hopefully, you have a few ideas about data you’d like to explore from the Comtrade database.
In the previous section, I managed to identify a set of data that describes the amount of unset diamonds (commodity code 7102) imported into the UK from the Russian Federation, Angola and South Africa in 2013 and 2014.
You can export the data you have selected as a CSV file that will be downloaded to your own computer by clicking on the Download CSV button. You may find it useful to change the filename of the downloaded file to something more meaningful than the comtrade.csv default name.
If you moved the downloaded CSV file into the same folder as your Exercise notebook 4 (that you’ll download later), you could use the following command to load the data into a pandas dataframe:
In []:
filename='comtrade.csv
' df=read_csv(filename, dtype={'Commodity Code':str, 'Reporter Code':str })
The ‘Commodity Code’ and ‘Reporter Code’ values are explicitly read in as a string (str) otherwise codes like 0401 will be returned as the number 401.
One of the problems of working with real data like this is that it may not be just the data you want. The data returned from Comtrade includes several columns that are essentially surplus to requirements for the reports you will produce. I suggest that you clean the dataframes so that they contain at most the following key columns: ‘Year’, ‘Period’, ‘Trade Flow’, ‘Reporter’, ‘Partner’, ‘Commodity’, ‘Commodity Code’, ‘Trade Value (US$)’.
In []:
COLUMNS = ['Year', 'Period','Trade Flow','Reporter','Partner', 'Commodity','Commodity Code','Trade Value (US$)']
df=df[COLUMNS]
To avoid conflating data relating to all countries (the ‘World’ partner), and each separate country, create separate dataframes for each, using the comparison operators introduced in Week 3.
In []:
world = df[df['Partner'] == 'World']
countries = df[df['Partner'] != 'World']
A More Direct Way of Getting the Data
Just as there was a method for downloading data directly from the World Bank, there is also a more direct way of getting the Comtrade data into a dataframe – directly from the Comtrade website. You might have noticed that when you downloaded the file from the Comtrade website, a link appeared on the site labelled ‘View API Call’.
An API is an ‘application programming interface’ that provides a means for one computer to talk to another ‘in machine terms’. When you extracted data from the World Bank, you were calling the World Bank API using a set of functions provided by the pandas library. Behind the scenes, these functions create URLs (that is, web addresses) that call the World Bank API and allow requests to be made directly from it, putting the response into a pandas dataframe.
In the case of Comtrade, clicking the View API Link reveals a URL that requests the data you selected in the search form as a data file, though not, by default, as a CSV data file.
This link can be used to download data directly into a pandas dataframe from Comtrade, although you will need to make a couple of modifications to the URL first. In particular, change the max value to 5000 (to increase the amount of data returned by each request) and add &fmt=csv to the end of the URL to ensure that the data is returned in a CSV format.
For example, if you copied the URL:
http://comtrade.un.org/api/get?max=500&type=C&freq=M&px=HS&ps=2015&r=826&p=all&rg=1%2C2&cc=0401%2C0402
you would need to modify it as follows:
http://comtrade.un.org/api/get?max= 5000 &type=C&freq=M&px=HS&ps=2015&r=826&p=all&rg=1%2C2&cc=0401%2C0402 &fmt=csv
You can then load the data in using the panda read_csv() function.
Note that if you are using the CoCalc free plan, you will not be able to download data directly from the Comtrade API into a pandas dataframe.
Set the datatypes as shown using the dtype argument to ensure that the codes are read in correctly.
In []:
URL='http://comtrade.un.org/api/get?max=5000&type=C&freq=A&px=HS&ps=2014%2C2013%2C2012&r=826&p=all&rg=all&cc
=0401%2C0402&fmt=csv'
df=read_csv(URL, dtype={'Commodity Code':str, 'Reporter Code':str})
Having downloaded the data, you should then separate out the World data as before.
If you want to save a copy of data downloaded into pandas directly from the Comtrade API, call the to_csv() method on your dataframe, pasting in the filename you want to save the file under, and setting index=False so that the dataframe’s automatically introduced index column is not included. For example:
countries.to_csv('saved_country_data_example.csv', index=False)
The file will be saved in the same folder as the notebook.