Learn to code for data analysis
Learn to code for data analysis

Start this free course now. Just create an account and sign in. Enrol and complete the course for a free statement of participation or digital badge if available.

Free course

Learn to code for data analysis

1.2 Loading the data

Many applications can read files in Excel format, and pandas can too. Asking the computer to read the data looks like this:

In []:

data = read_excel('WHO POP TB some.xls')



CountryPopulation (1000s)TB deaths
3Equatorial Guinea75767
8Russian Federation14283417000
9Sao Tome and Principe19318
10South Africa5277625000

The variable name data is not descriptive, but as there is only one dataset in our analysis, there is no possible confusion with other data, and short names help to keep the lines of code short.

The function read_excel() takes a file name as an argument and returns the table contained in the file. In pandas, tables are called dataframes . To load the data , I simply call the function and store the returned dataframe in a variable.

A file name must be given as a string , a piece of text surrounded by quotes. The quote marks tell Python that this isn’t a variable, function or module name. Also, the quote marks state that this is a single name, even if it contains spaces, punctuation and other characters besides letters.

Misspelling the file name, or not having the file in the same folder as the notebook containing the code, results in a file not found error. In the example below there is an error in the file name.

In []:

data = read_excel('WHO POP TB same.xls')



FileNotFoundError Traceback (most recent call last)

<ipython-input-25-c017b2500afa> in <module>()

----> 1 data = read_excel(‘WHO POP TB same.xls’)

2 data

/Users/mw4687/anaconda/lib/python3.4/site-packages/pandas/io/excel.py in read_excel(io, sheetname, **kwds)

130 engine = kwds.pop(‘engine’, None)


--> 132 return ExcelFile(io, engine=engine).parse(sheetname=sheetname, **kwds)



/Users/mw4687/anaconda/lib/python3.4/site-packages/pandas/io/excel.py in __init__(self, io, **kwds)

167 self.book = xlrd.open_workbook(file_contents=data)

168 else:

--> 169 self.book = xlrd.open_workbook(io)

170 elif engine == ‘xlrd’ and isinstance(io, xlrd.Book):

171 self.book = io

/Users/mw4687/anaconda/lib/python3.4/site-packages/xlrd/__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)

392 peek = file_contents[:peeksz]

393 else:

--> 394 f = open(filename, "rb")

395 peek = f.read(peeksz)

396 f.close()

FileNotFoundError: [Errno 2] No such file or directory: ‘WHO POP TB same.xls’

Jupyter notebooks show strings in red. If you see red characters until the end of the line, you have forgotten to type the second quote that marks the end of the string.

In the next section, find out how to select a column.


Take your learning further

Making the decision to study can be a big step, which is why you'll want a trusted University. The Open University has 50 years’ experience delivering flexible learning and 170,000 students are studying with us right now. Take a look at all Open University courses.

If you are new to University-level study, we offer two introductory routes to our qualifications. You could either choose to start with an Access module, or a module which allows you to count your previous learning towards an Open University qualification. Read our guide on Where to take your learning next for more information.

Not ready for formal University study? Then browse over 1000 free courses on OpenLearn and sign up to our newsletter to hear about new free courses as they are released.

Every year, thousands of students decide to study with The Open University. With over 120 qualifications, we’ve got the right course for you.

Request an Open University prospectus371