1.1 Removing rogue spaces
One of the problems often encountered with CSV files is rogue spaces before or after data values or column names.

You learned earlier, in What is a CSV file?, that each value or column name is separated by a comma. However, if you opened ‘London_2014.csv’ in a text editor, you would see that in the row of column names there is sometimes a space after a comma:
GMT,Max TemperatureC,Mean TemperatureC,Min TemperatureC,Dew PointC,MeanDew PointC,Min DewpointC,Max Humidity, Mean Humidity, Min Humidity, Max Sea Level PressurehPa, Mean Sea Level PressurehPa, Min Sea Level PressurehPa, Max VisibilityKm, Mean VisibilityKm, Min VisibilitykM, Max Wind SpeedKm/h, Mean Wind SpeedKm/h, Max Gust SpeedKm/h,Precipitationmm, CloudCover, Events,WindDirDegrees
For example, there is a space after the comma between Max Humidity and Mean Humidity. This means that when read_csv() reads the row of column names, it will interpret a space after a comma as part of the next column name. So, for example, the column name after 'Max Humidity' will be interpreted as ' Mean Humidity' rather than what was intended, which is 'Mean Humidity'. The ramification of this is that code such as:
london[['Mean Humidity']]
will cause a key error (see Selecting a column), as the column name is, confusingly, ' Mean Humidity'.
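The effect can be reproduced without the full file. The following is a minimal sketch, assuming pandas is imported as pd (the course calls read_csv() directly) and using a small made-up sample held in memory; the dates and the Max Humidity values are hypothetical, and only the header style imitates 'London_2014.csv'.

```python
from io import StringIO

import pandas as pd

# Hypothetical in-memory sample: note the space after the second comma
# in the header row. The dates and Max Humidity values are made up.
sample = StringIO(
    "GMT,Max Humidity, Mean Humidity\n"
    "2014-1-1,94,86\n"
    "2014-1-2,94,81\n"
)

london_sample = pd.read_csv(sample)

# The space after the comma has become part of the third column name.
column_names = list(london_sample.columns)
print(column_names)

# The intended name is absent; only the name with a leading space exists.
has_clean_name = 'Mean Humidity' in london_sample.columns
has_spaced_name = ' Mean Humidity' in london_sample.columns
print(has_clean_name, has_spaced_name)
```

Printing the list of column names is a quick way to spot a leading space that is invisible when the dataframe is displayed as a table.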
This can easily be rectified by adding another argument to the read_csv() function:
skipinitialspace=True
which will tell read_csv() to ignore any spaces after a comma:
london = read_csv('London_2014.csv', skipinitialspace=True)
The rogue spaces will no longer be in the dataframe and we can write code such as:
london[['Mean Humidity']].head()
|   | Mean Humidity |
|---|---|
| 0 | 86 |
| 1 | 81 |
| 2 | 76 |
| 3 | 85 |
| 4 | 88 |
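To see the argument at work on something smaller than the full file, here is a sketch that assumes pandas is imported as pd and reads a made-up two-row sample from memory. The dates and Max Humidity values are hypothetical; the header deliberately contains a space after a comma.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample text with a rogue space before 'Mean Humidity'.
csv_text = (
    "GMT,Max Humidity, Mean Humidity\n"
    "2014-1-1,94,86\n"
    "2014-1-2,94,81\n"
)

# skipinitialspace=True makes read_csv() skip spaces that follow a comma.
cleaned = pd.read_csv(StringIO(csv_text), skipinitialspace=True)

cleaned_names = list(cleaned.columns)
print(cleaned_names)

# The intended column name now works as a key.
mean_humidity = cleaned['Mean Humidity'].tolist()
print(mean_humidity)
```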
Note that the skipinitialspace=True argument won’t remove a trailing space at the end of a column name.
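One possible way to deal with trailing spaces, which is not covered in the text above, is to strip whitespace from the column names after loading. The sketch below assumes pandas is imported as pd and uses a hypothetical sample whose second column name ends in a space; rename() with str.strip is one option among several.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: 'Max Humidity ' has a trailing space in the header.
csv_text = (
    "GMT,Max Humidity ,Mean Humidity\n"
    "2014-1-1,94,86\n"
    "2014-1-2,94,81\n"
)

loaded = pd.read_csv(StringIO(csv_text), skipinitialspace=True)

# skipinitialspace only skips spaces after a comma, so the trailing space stays.
names_before = list(loaded.columns)
print(names_before)

# Apply str.strip to every column name to remove leading and trailing spaces.
stripped = loaded.rename(columns=str.strip)
names_after = list(stripped.columns)
print(names_after)
```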
Next, find out about extra characters and how to remove them.
Except for third party materials and otherwise, this content is made available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 Licence, full copyright detail can be found in the acknowledgements section. Please see full copyright statement for details.
