1.2 Removing extra characters
If you opened London_2014.csv in a text editor once again and looked at the last column name, you would see that the name is 'WindDirDegrees<br />'.
What has happened here is that when the dataset was exported from the Weather Underground website, an html line break (<br />) was added after the line of column headers, which read_csv() has interpreted as the end part of the final column’s name.
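To see how this happens, here is a minimal sketch that reads a tiny made-up CSV rather than the real file. The sample rows and the 'GMT' column are only illustrative, and I am assuming the line break is written as <br /> at the end of every line.

```python
import io
import pandas as pd

# Hypothetical two-row sample imitating the problem described above;
# the real London_2014.csv has many more columns and rows.
sample = io.StringIO(
    "GMT,WindDirDegrees<br />\n"
    "2014-1-1,186<br />\n"
    "2014-1-2,214<br />\n"
)
df = pd.read_csv(sample)

# The line break has become part of the last column's name
last_column = df.columns[-1]
print(repr(last_column))

# ... and part of every value in that column, which is read in as text
print(df[last_column].tolist())
```

Because the values carry extra text, pandas cannot treat the column as numbers and stores each value as a string instead.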
In fact, the problem is worse than this. Let’s look at some values in the final column:
In []:
london[['WindDirDegrees<br />']].head()
Out[]:
 | WindDirDegrees<br /> |
---|---|
0 | 186<br /> |
1 | 214<br /> |
2 | 219<br /> |
3 | 211<br /> |
4 | 199<br /> |
It seems there is an html line break at the end of each line. If I opened ‘London_2014.csv’ in a text editor and looked at the ends of all lines in the file, this would be confirmed.
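The same check can be done in code instead of by eye. The sketch below counts how many lines end with the line break. It uses a small made-up sample in place of the real file, and again assumes the break is written as <br />.

```python
import io

# Hypothetical stand-in for the file; with the real data you would use
# open('London_2014.csv') instead of io.StringIO(...).
sample_file = io.StringIO(
    "GMT,WindDirDegrees<br />\n"
    "2014-1-1,186<br />\n"
    "2014-1-2,214<br />\n"
)

# Drop the newline character from each line, then test how the line ends
lines = [line.rstrip('\n') for line in sample_file]
lines_with_break = [line for line in lines if line.endswith('<br />')]

print(len(lines_with_break), 'of', len(lines), 'lines end with <br />')
```

If the two counts match, every line in the file, including the header line, carries the unwanted line break.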
Once again I’m not going to edit the CSV file but rather fix the problem in the dataframe. To change 'WindDirDegrees<br />' to 'WindDirDegrees' all I have to do is use the rename() method as follows:
In []:
london = london.rename(columns={'WindDirDegrees<br />':'WindDirDegrees'})
Don’t worry about the syntax of the argument for rename(); just use this example as a template for whenever you need to change the name of a column.
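If you would like to see what the template does, here is a small sketch on a made-up dataframe. The 'Temp' column and all the values are only illustrative. The columns argument maps each old name to its new name, and rename() returns a new dataframe rather than changing the existing one, which is why the result is assigned back to london above.

```python
import pandas as pd

# Hypothetical dataframe with one good and one bad column name
df = pd.DataFrame({
    'Temp': [5, 7],
    'WindDirDegrees<br />': ['186<br />', '214<br />'],
})

# Map the old name to the new one; columns not mentioned keep their names
renamed = df.rename(columns={'WindDirDegrees<br />': 'WindDirDegrees'})

print(list(renamed.columns))  # the new dataframe has the clean name
print(list(df.columns))       # the original dataframe is unchanged
```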
Now I need to get rid of those pesky html line breaks from the ends of the values in the 'WindDirDegrees' column, so that they become something sensible. I can do that using the string method rstrip() which is used to remove characters from the end or ‘rear’ of a string, just like this:
In []:
london['WindDirDegrees'] = london['WindDirDegrees'].str.rstrip('<br />')
Again, don’t worry too much about the syntax of the code and simply use it as a template for whenever you need to process a whole column of values, stripping characters from the end of each string value.
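One detail is worth knowing before reusing the template. The argument to rstrip() is treated as a set of characters to remove, not as an exact ending. The sketch below uses Python's ordinary string method, which behaves the same way on a single value, with made-up example strings and the line break assumed to be <br />.

```python
# Characters are removed from the end for as long as they appear in the
# argument, in any order.
value = '186<br />'
cleaned = value.rstrip('<br />')
print(repr(cleaned))

# Digits are not in the set, so numbers are safe. Text that happens to end
# in one of the listed characters would lose more than the line break:
word = 'number<br />'
over_stripped = word.rstrip('<br />')
print(repr(over_stripped))
```

The wind direction values are numbers followed by the line break, so the template is safe here, but take care when applying it to columns of words.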
Let’s display the first few rows of the 'WindDirDegrees' column to confirm the changes:
In []:
london[['WindDirDegrees']].head()
Out[]:
 | WindDirDegrees |
---|---|
0 | 186 |
1 | 214 |
2 | 219 |
3 | 211 |
4 | 199 |