4.1 Week 4 glossary
Here is an alphabetical list of the terms introduced this week, for quick look-up.
Programming and data analysis concepts
The bitwise operators & (and) and | (or) are used in pandas to build more complicated expressions from two comparison expressions (typically involving column comparisons).
A Boolean has one of two possible values: True or False .
A Comma Separated Values (CSV) file is a plain text file that is used to hold tabular data.
A list is a sequence of values, separated by commas, and written within square brackets.
There are six comparison operators that can be used to compare number, string and date values. Expressions composed of these operators evaluate to True or False . These operators can also be used to compare every value in a column, row by row, against some number, string or date value. When used in this manner the operators return a series of Boolean values.
The ‘dot’ notation is used to access a dataframe’s methods and attributes.
The Series data type is a collection of values with an integer index that starts from zero. Each column in a dataframe is an example of the Series data type. The Series data type has many of the same methods as the DataFrame data type.
The object data type is how pandas represents strings.
The datetime64 data type is how pandas represents dates.
The int64 data type is how pandas represents integers (whole numbers).
The float64 data type is how pandas represents floating point numbers (decimals).
Functions and methods
asType(aType) when applied to a dataframe column, the method changes the data type of each value in that column to the type given by the string aType .
datetime(yyyy, mm, dd) the function takes three arguments, yyyy a four digit integer representing a year, mm a two digit integer representing a month and dd a two digit integer representing a day. From these arguments the function creates and returns a value of datetime64 .
dropna() when applied to a dataframe returns a new dataframe without the rows that have at least one missing value.
head() gets and displays the first five rows of a dataframe. Optionally the method can take an integer argument to specify how many rows (from and including row 0) to get and display.
iloc[index] gets and displays the row in the dataframe indicated by the integer argument index .
isnull() is a series method that checks which rows in that series have a missing value.
fillna(value) is a series method that returns a new series in which all missing values have been filled with the given value.
plot() when applied to a dataframe column of numeric values, the method displays a graph of those values. The x-axis shows the dataframe’s index and the y-axis the range of the column’s values. Before the method is called you first need to execute %matplotlib inline .
read_csv(csvFile) creates a dataframe from the dataset in the CSV file.
rename(columns={oldName : newName}) renames the column oldName to newName .
str.rstrip(suffix) when applied to a dataframe column of string values, the method removes the argument suffix from the end of each string value in the column.
tail() gets and displays the last five rows of a dataframe. Optionally the method can take an integer argument to specify how many rows (until and including the last row) to get and display.
to_datetime(aSeries) when applied to a series, typically a column from a dataframe, this function returns a new series in which each value in aSeries has been changed to type datetime64 .