4 The statistics reports

4.1 Test statistics

Very important

These statistics are designed for use with summative iCMAs where students have just one attempt and complete that attempt

Average grade:

For discriminating, deferred feedback, tests aim for between 50% and 75%. Values outside these limits need thinking about. Interactive tests with multiple tries invariably lead to higher averages

Median grade:

Half the students score less than this figure.

Standard deviation:

A measure of the spread of scores about the mean. Aim for values between 12% and 18%. A smaller value suggests that scores are too bunched up.


A measure of the asymmetry of the distribution of scores. Zero implies a perfectly symmetrical distribution, positive values a ‘tail’ to the right and negative values a ‘tail’ to the left.


Aim for a value of -1.0. If it is too negative, it may indicate lack of discrimination between students who do better than average. Similarly, a large positive value (greater than 1.0) may indicate a lack of discrimination near the pass fail border.


Kurtosis is a measure of the flatness of the distribution. A normal, bell shaped, distribution has a kurtosis of zero. The greater the kurtosis, the more peaked is the distribution, without much of a tail on either side. Aim for a value in the range 0-1. A value greater than 1 may indicate that the test is not discriminating very well between very good or very bad students and those who are average.

Coefficient of internal consistency (CIC):

It is impossible to get internal consistency much above 90%. Anything above 75% is satisfactory. If the value is below 64%, the test as a whole is unsatisfactory and remedial measures should be considered.

A low value indicates either that some of the questions are not very good at discriminating between students of different ability and hence that the differences between total scores owe a good deal to chance or that some of the questions are testing a different quality from the rest and that these two qualities do not correlate well – i.e. the test as a whole is inhomogeneous.

Error ratio (ER):

This is related to CIC according to the following table: it estimates the percentage of the standard deviation which is due to chance effects rather than to genuine differences of ability between students. Values of ER in excess of 50% cannot be regarded as satisfactory: they imply that less than half the standard deviation is due to differences in ability and the rest to chance effects.

4.1 CIC and ER

Standard error (SE):

This is SD x ER/100. It estimates how much of the SD is due to chance effects and is a measure of the uncertainty in any given student’s score. If the same student took an equivalent iCMA, his or her score could be expected to lie within ±SE of the previous score. The smaller the value of SE the better the iCMA, but it is difficult to get it below 5% or 6%. A value of 8% corresponds to half a grade difference on the University Scale – if the SE exceeds this, it is likely that a substantial proportion of the students will be wrongly graded in the sense that the grades awarded do not accurately indicate their true abilities.

3 The responses report

4.2 Question statistics