1.1 Removing rogue spaces

One of the problems often encountered with CSV files is rogue spaces before or after data values or column names.

Figure 3: An image of empty, numbered parking spaces

You learned earlier, in What is a CSV file?, that each value or column name is separated by a comma. However, if you opened ‘London_2014.csv’ in a text editor, you would see that in the row of column names there are sometimes spaces after a comma:

GMT,Max TemperatureC,Mean TemperatureC,Min TemperatureC,Dew PointC,MeanDew PointC,Min DewpointC,Max Humidity, Mean Humidity, Min Humidity, Max Sea Level PressurehPa, Mean Sea Level PressurehPa, Min Sea Level PressurehPa, Max VisibilityKm, Mean VisibilityKm, Min VisibilitykM, Max Wind SpeedKm/h, Mean Wind SpeedKm/h, Max Gust SpeedKm/h,Precipitationmm, CloudCover, Events,WindDirDegrees

For example, there is a space after the comma between 'Max Humidity' and 'Mean Humidity'. This means that when read_csv() reads the row of column names, it will interpret a space after a comma as part of the next column name. So, for example, the column name after 'Max Humidity' will be interpreted as ' Mean Humidity' rather than what was intended, which is 'Mean Humidity'. The ramification of this is that code such as:

london[['Mean Humidity']]

will cause a key error (see Selecting a column), as the column name is confusingly ' Mean Humidity'.
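To see the problem without opening the file in a text editor, you can list the column names as pandas reads them. The sketch below is illustrative only: it uses a small in-memory sample (the text in `sample` and the name `london_sample` are made up for this example) rather than the real 'London_2014.csv'.

```python
from io import StringIO
from pandas import read_csv

# Hypothetical three-column extract mimicking the header of 'London_2014.csv'.
sample = "GMT,Max Humidity, Mean Humidity\n2014-1-1,94,86\n2014-1-2,94,81\n"

london_sample = read_csv(StringIO(sample))

# The space after the comma is kept as part of the column name.
column_names = list(london_sample.columns)
print(column_names)

# Selecting the intended name therefore fails with a KeyError.
try:
    london_sample[['Mean Humidity']]
    key_error_raised = False
except KeyError:
    key_error_raised = True
print(key_error_raised)
```

Printing the column names as a list makes the leading space visible, because each name is shown inside quotation marks.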

This can easily be rectified by adding another argument to the read_csv() function:

skipinitialspace=True

which will tell read_csv() to ignore any spaces after a comma:

london = read_csv('London_2014.csv', skipinitialspace=True)

The rogue spaces will no longer be in the dataframe and we can write code such as:

london[['Mean Humidity']].head()

   Mean Humidity
0             86
1             81
2             76
3             85
4             88
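The effect of the argument can also be checked on a small in-memory sample by comparing the column names with and without skipinitialspace=True. The sample text and the names `raw` and `clean` below are made up for illustration; this is a sketch, not the real file.

```python
from io import StringIO
from pandas import read_csv

# Hypothetical extract: spaces follow some of the commas, as in the real header.
sample = (
    "GMT,Max Humidity, Mean Humidity, Min Humidity\n"
    "2014-1-1,94, 86, 73\n"
    "2014-1-2,94, 81, 60\n"
)

raw = read_csv(StringIO(sample))
clean = read_csv(StringIO(sample), skipinitialspace=True)

print(list(raw.columns))    # names keep their leading spaces
print(list(clean.columns))  # leading spaces are dropped

# The intended column name can now be used directly.
mean_humidity = clean[['Mean Humidity']]
print(mean_humidity.head())
```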

Note that the skipinitialspace=True argument won’t remove a trailing space at the end of a column name.
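As a sketch of that limitation, the made-up sample below has a space before a comma as well as after one. One possible way to tidy the remaining trailing space is to strip whitespace from every column name with the pandas str.strip() method on the columns. This is an assumption about what you might want to do, not necessarily the approach this course takes.

```python
from io import StringIO
from pandas import read_csv

# Hypothetical header: 'Max Humidity ' has a trailing space before the comma.
sample = "GMT,Max Humidity , Mean Humidity\n2014-1-1,94,86\n"

df = read_csv(StringIO(sample), skipinitialspace=True)
names_before = list(df.columns)
print(names_before)  # the trailing space survives

# Strip leading and trailing whitespace from all column names.
df.columns = df.columns.str.strip()
names_after = list(df.columns)
print(names_after)
```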

Next, find out about extra characters and how to remove them.