Learn to code for data analysis

1.2 Removing extra characters

If you opened London_2014.csv in a text editor once again and looked at the last column name, you would see that the name is 'WindDirDegrees<br />'.

What has happened here is that when the dataset was exported from the Weather Underground website, an HTML line break (<br />) was added after the line of column headers, which read_csv() has interpreted as the end part of the final column’s name.
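
To see how read_csv() ends up with such a column name, here is a minimal sketch. The two-row CSV text is made up for illustration (it is not the real London_2014.csv), and it assumes the stray characters are exactly <br />.

```python
import io

import pandas as pd

# Hypothetical sample text standing in for the exported file.
# Each line ends with an HTML line break before the newline.
sample_csv = (
    "GMT,Max TemperatureC,WindDirDegrees<br />\n"
    "2014-1-1,11,186<br />\n"
    "2014-1-2,11,214<br />\n"
)

# read_csv() splits on commas only, so the line break stays attached
# to the last column name and to every value in that column.
sample = pd.read_csv(io.StringIO(sample_csv))

last_column = sample.columns[-1]
first_value = sample[last_column].iloc[0]

print(repr(last_column))
print(repr(first_value))
```

Printing with repr() makes trailing characters visible, which is handy when a column name looks right on screen but cannot be selected.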

An image of two bouncers in suits standing in a corridor
Figure 4

In fact, the problem is worse than this. Let’s look at some values in the final column:

In []:

london[['WindDirDegrees<br />']].head()

Out[]:

   WindDirDegrees<br />
0  186<br />
1  214<br />
2  219<br />
3  211<br />
4  199<br />

It seems there is an HTML line break at the end of each line. If I opened ‘London_2014.csv’ in a text editor and looked at the ends of all lines in the file, this would be confirmed.

Once again I’m not going to edit the CSV file but rather fix the problem in the dataframe. To change 'WindDirDegrees<br />' to 'WindDirDegrees' all I have to do is use the rename() method as follows:

In []:

london = london.rename(columns={'WindDirDegrees<br />':'WindDirDegrees'})

Don’t worry about the syntax of the argument for rename(); just use this example as a template for whenever you need to change the name of a column.
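
If you are curious about the template, the sketch below tries it on a tiny dataframe. The dataframe called sample is invented for illustration, and the damaged name is assumed to end in <br />. The columns argument is a dictionary that maps each old name to its new name, and rename() returns a new dataframe, which is why the result is assigned to a variable.

```python
import pandas as pd

# Hypothetical miniature dataframe standing in for `london`.
sample = pd.DataFrame({
    "Max TemperatureC": [11, 11],
    "WindDirDegrees<br />": ["186<br />", "214<br />"],
})

# Map the old column name to the new one; columns not mentioned
# in the dictionary keep their names.
renamed = sample.rename(columns={"WindDirDegrees<br />": "WindDirDegrees"})

print(list(sample.columns))   # the original dataframe is not modified
print(list(renamed.columns))  # the copy has the cleaned-up name
```

Only the column name changes at this point; the values in the column still carry their trailing characters.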

Now I need to get rid of those pesky HTML line breaks from the ends of the values in the 'WindDirDegrees' column, so that they become something sensible. I can do that using the string method rstrip(), which is used to remove characters from the end or ‘rear’ of a string, just like this:

In []:

london['WindDirDegrees'] = london['WindDirDegrees'].str.rstrip('<br />')

Again, don’t worry too much about the syntax of the code and simply use it as a template for whenever you need to process a whole column of values, stripping characters from the end of each string value.
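
One detail is worth knowing before reusing this template: rstrip() treats its argument as a set of characters to remove, not as an exact ending. The sketch below shows this with plain Python strings, using made-up values and assuming the unwanted ending is <br />. The .str.rstrip() call on a column applies the same rule to each value.

```python
# Hypothetical values like those in the damaged column.
values = ["186<br />", "214<br />"]

# Any run of the characters '<', 'b', 'r', ' ', '/' and '>' is removed
# from the right-hand end. Digits are not in that set, so they survive.
cleaned = [value.rstrip("<br />") for value in values]
print(cleaned)

# With text that itself ends in one of those characters, rstrip()
# keeps going and removes too much: the final 'r' of 'number' is lost.
over_stripped = "number<br />".rstrip("<br />")
print(over_stripped)

# removesuffix() (Python 3.9 or later) removes the exact ending only.
suffix_only = "number<br />".removesuffix("<br />")
print(suffix_only)
```

For a column of numbers followed by a line break, the character-set behaviour does no harm, so the template is safe for this dataset.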

Let’s display the first few rows of the 'WindDirDegrees' column to confirm the changes:

In []:

london[['WindDirDegrees']].head()

Out[]:

   WindDirDegrees
0  186
1  214
2  219
3  211
4  199