Learn to code for data analysis

1.2 Removing extra characters

If you opened London_2014.csv in a text editor once again and looked at the last column name, you would see that the name is 'WindDirDegrees<br />'.

What has happened here is that when the dataset was exported from the Weather Underground website, an HTML line break (<br />) was added after the line of column headers, which read_csv() has interpreted as the end of the final column’s name.

In fact, the problem is worse than this. Let’s look at some values in the final column:

In []:

london[['WindDirDegrees<br />']].head()

Out[]:

   WindDirDegrees<br />
0  186<br />
1  214<br />
2  219<br />
3  211<br />
4  199<br />

It seems there is an HTML line break at the end of each line. If I opened London_2014.csv in a text editor and looked at the ends of all the lines in the file, this would be confirmed.
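To see how a trailing line break ends up in both the column name and the values, here is a minimal sketch that reads a tiny in-memory CSV. The three lines of text are a made-up extract, not the real London_2014.csv, and the names csv_text and sample are only for illustration; the one thing they share with the real file is the '<br />' at the end of each line.

```python
import io
import pandas as pd

# Hypothetical three-line extract standing in for London_2014.csv;
# only the trailing '<br />' on each line mirrors the real export.
csv_text = (
    "GMT,Max TemperatureC,WindDirDegrees<br />\n"
    "2014-1-1,11,186<br />\n"
    "2014-1-2,11,214<br />\n"
)

sample = pd.read_csv(io.StringIO(csv_text))

# The line break has become part of the last column's name ...
print(sample.columns.tolist())

# ... and part of every value, so the column is read as text, not numbers.
print(sample['WindDirDegrees<br />'].tolist())
```

Because read_csv() only splits on commas and line ends, anything else at the end of a line is treated as part of the last field.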

Once again, I’m not going to edit the CSV file but rather fix the problem in the dataframe. To change 'WindDirDegrees<br />' to 'WindDirDegrees', all I have to do is use the rename() method as follows:

In []:

london = london.rename(columns={'WindDirDegrees<br />': 'WindDirDegrees'})

Don’t worry about the syntax of the argument for rename(); just use this example as a template for whenever you need to change the name of a column.
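If you are curious, the argument is a dictionary that maps each old column name to its new name. The sketch below tries the template on a small made-up dataframe (sample is a hypothetical stand-in for london, with only two rows).

```python
import pandas as pd

# Hypothetical stand-in for the london dataframe.
sample = pd.DataFrame({
    'GMT': ['2014-1-1', '2014-1-2'],
    'WindDirDegrees<br />': ['186<br />', '214<br />'],
})

# Each key is a current column name, each value is its replacement.
# Columns not mentioned in the dictionary keep their names.
sample = sample.rename(columns={'WindDirDegrees<br />': 'WindDirDegrees'})

print(sample.columns.tolist())
```

Note that rename() returns a new dataframe rather than changing the existing one, which is why the result is assigned back to the same variable.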

Now I need to get rid of those pesky HTML line breaks from the ends of the values in the 'WindDirDegrees' column, so that they become something sensible. I can do that using the string method rstrip(), which removes characters from the end or ‘rear’ of a string, just like this:

In []:

london['WindDirDegrees'] = london['WindDirDegrees'].str.rstrip('<br />')

Again, don’t worry too much about the syntax of the code; simply use it as a template for whenever you need to process a whole column of values, stripping characters from the end of each string value.
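One detail worth knowing when reusing this template: rstrip() treats its argument as a set of characters to remove, not as an exact piece of text. That makes no difference for numbers like 186, but it can eat into text values. The sketch below uses made-up values (the third one is invented purely to show the effect) and, as one possible alternative, a regular expression that removes only the exact ending.

```python
import pandas as pd

# Hypothetical values: two wind directions and one made-up text value.
values = pd.Series(['186<br />', '214<br />', 'number<br />'])

# rstrip('<br />') removes any trailing run of the characters
# '<', 'b', 'r', ' ', '/' and '>', in any order.
stripped = values.str.rstrip('<br />')
print(stripped.tolist())   # ['186', '214', 'numbe']

# Removing only the exact suffix, using a regular expression
# anchored at the end of the string.
exact = values.str.replace(r'<br />$', '', regex=True)
print(exact.tolist())      # ['186', '214', 'number']
```

For the 'WindDirDegrees' column both approaches give the same result, because digits are not among the characters being stripped.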

Let’s display the first few rows of the 'WindDirDegrees' column to confirm the changes:

In []:

london[['WindDirDegrees']].head()

Out[]:

   WindDirDegrees
0  186
1  214
2  219
3  211
4  199
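Looking at the first five rows is a quick check, but it says nothing about the rest of the column. As a rough sketch, the whole column could be checked in one go like this; sample is a hypothetical stand-in holding just the five values shown above, and the same two lines should work on london['WindDirDegrees'] provided the column has no missing values.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned column.
sample = pd.DataFrame({'WindDirDegrees': ['186', '214', '219', '211', '199']})

# True if any value still ends with the line break.
leftover = sample['WindDirDegrees'].str.endswith('<br />').any()
print(leftover)

# True if every value now consists only of digits.
all_digits = sample['WindDirDegrees'].str.isdigit().all()
print(all_digits)
```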