Learn to code for data analysis

1.1 Removing rogue spaces

One of the problems often encountered with CSV files is rogue spaces before or after data values or column names.

You learned earlier, in What is a CSV file?, that each value or column name is separated by a comma. However, if you opened ‘London_2014.csv’ in a text editor, you would see that in the row of column names there is sometimes a space after a comma:

GMT,Max TemperatureC,Mean TemperatureC,Min TemperatureC,Dew PointC,MeanDew PointC,Min DewpointC,Max Humidity, Mean Humidity, Min Humidity, Max Sea Level PressurehPa, Mean Sea Level PressurehPa, Min Sea Level PressurehPa, Max VisibilityKm, Mean VisibilityKm, Min VisibilitykM, Max Wind SpeedKm/h, Mean Wind SpeedKm/h, Max Gust SpeedKm/h,Precipitationmm, CloudCover, Events,WindDirDegrees

For example, there is a space after the comma between Max Humidity and Mean Humidity. This means that when read_csv() reads the row of column names, it will interpret a space after a comma as part of the next column name. So, for example, the column name after 'Max Humidity' will be interpreted as ' Mean Humidity' rather than what was intended, which is 'Mean Humidity'. The ramification of this is that code such as:

london[['Mean Humidity']]

will cause a key error (see Selecting a column), as the column name is, confusingly, ' Mean Humidity' with a leading space.
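The problem can be reproduced without the full file. The following is a minimal sketch, assuming a small made-up sample held in a string (the values are hypothetical and not taken from 'London_2014.csv'); only the spacing of the header row imitates the real file.

```python
from io import StringIO
from pandas import read_csv

# Hypothetical sample: the header has a space after some commas,
# as in the row of column names shown above.
sample = "GMT,Max Humidity, Mean Humidity, Min Humidity\n2014-1-1,94,86,73\n"

df = read_csv(StringIO(sample))

# The leading spaces have become part of the column names.
columns = list(df.columns)
print(columns)

# Selecting the intended name fails because ' Mean Humidity' != 'Mean Humidity'.
try:
    df[['Mean Humidity']]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)
```

Printing the list of column names is a quick way to spot rogue spaces, because the quotes around each name make them visible.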

This can easily be rectified by adding another argument to the read_csv() function:

skipinitialspace=True

which will tell read_csv() to ignore any spaces after a comma:

In []:

london = read_csv('London_2014.csv', skipinitialspace=True)

The rogue spaces will no longer be in the dataframe and we can write code such as:

In []:

london[['Mean Humidity']].head()


Mean Humidity
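The effect of the argument can also be checked on a small in-memory sample. This is a sketch under the assumption of made-up values (they are hypothetical, not rows from 'London_2014.csv'); only read_csv() and skipinitialspace=True come from the text above.

```python
from io import StringIO
from pandas import read_csv

# Hypothetical sample with a space after some commas in the header row.
sample = (
    "GMT,Max Humidity, Mean Humidity, Min Humidity\n"
    "2014-1-1,94,86,73\n"
    "2014-1-2,94,87,69\n"
)

# skipinitialspace=True drops spaces that directly follow a comma.
clean = read_csv(StringIO(sample), skipinitialspace=True)

clean_columns = list(clean.columns)
print(clean_columns)

# The intended column name can now be selected without a key error.
mean_humidity = clean[['Mean Humidity']]
print(mean_humidity.head())
```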

Note that the skipinitialspace=True argument only skips spaces that follow a comma; it won’t remove a trailing space at the end of a column name.
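To illustrate the limitation, here is a minimal sketch using a hypothetical header in which one column name ends with a space. Stripping the names with rename(columns=str.strip) is one possible clean-up, offered here as an assumption rather than as the method this course goes on to use.

```python
from io import StringIO
from pandas import read_csv

# Hypothetical sample: 'Max Humidity ' has a trailing space before the comma.
sample = "GMT,Max Humidity , Mean Humidity\n2014-1-1,94,86\n"

df = read_csv(StringIO(sample), skipinitialspace=True)

# The leading space of ' Mean Humidity' is gone, the trailing space remains.
before = list(df.columns)
print(before)

# One possible fix: strip whitespace from both ends of every column name.
df = df.rename(columns=str.strip)
after = list(df.columns)
print(after)
```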

Next, find out about extra characters and how to remove them.

