Reed-Solomon codes and error correction continued
An interesting feature of RS codes is that if you already know the locations of the errors – that is, which bytes contain errors – more errors can be corrected than if you don’t know where the errors are. In effect, all the information provided by the check digits can be used in correcting the bytes, instead of some of it being needed to identify which bytes have errors. It might seem a strange idea that the location of the error could already be known, but this feature is exploited in some practical applications – one of which, Blu-ray discs, shall be explained in a little more detail shortly. This type of error, where the location is known, is called an erasure. Bytes that have errors where the location is not known in advance are just called symbol errors. (Confusingly, ‘symbol’ and ‘digit’ are used to mean the same thing in this context, and the symbols/digits are bytes.)
In general, in each block of n digits an (n, k) RS code can:
- correct up to symbol errors
- correct up to n − k erasures.
More generally, if there are both erasures and symbol errors to be corrected, the code can correct ν symbol errors and ρ erasures, where:
2ν + ρ ≤ n − k.
Activity 2.5 (self-assessment)
A popular base-256 (byte) RS code is the (255, 223) code.
- a.How many symbol errors can this code correct? How many bits is that?
- b.How many erasures can it correct? How many bits is that?
- c.What is the code rate and the redundancy of this code?
For this code, n = 255 and k = 223.
- a.The number of symbol errors it can correct is given by:
So the code can correct up to 16 symbol errors (16 bytes) in each block of 255 bytes. In terms of bits, it can correct up to 16 × 8 = 128 bits in each block of 255 × 8 = 2040 bits.
- b.The number of erasures the code can correct is given by:
n – k = 255 – 223 = 32.
So the code can correct up to 32 erasures (32 bytes) in each block of 255 bytes. In terms of bits, it can correct up to 32 × 8 = 256 bits in each block of 255 × 8 = 2040 bits.
- c.The code rate is 223/255 = 0.87 (to 2 s.f.). The redundancy (to 2 s.f.) is
which is 13%.
Activity 2.5 showed that RS codes can correct a large number of bits. These errored bits can all be adjacent in the received bit sequence, so they could form long bursts of errors. This ability to deal with long bursts of errors is a key feature of RS codes, which is the reason for their selection in many applications.
Blu-ray discs, for example, use RS codes in a way that not only exploits their inherently good burst-error protection, but also extends it through the way data is written in the disc. Two RS codes are used, both working with bytes (base 256) and shortened from the (255, 223) code described in Activity 2.5, and they encode different kinds of data. One code, referred to as the long-distance code (LDC), protects user data – that is, the content of the disc intended for its user. This uses a (248, 216) RS code. The other code, called the burst-indicating subcode (BIS), encodes information needed for addressing and control within the disc. This is a (62, 30) RS code.
User data on the disc is organised into 64 KB chunks or ‘clusters’ (see box below). After LDC encoding, the clusters are interleaved. Interleaving is a common way of combatting bursts of noise that could affect several consecutive data units (such as clusters in the present context, or frames in other contexts). Prior to transmission, the order of the clusters is shuffled. When the clusters are restored to their correct order by the receiver, any noise bursts should have their effect dispersed, so that affected clusters are distributed among unaffected clusters, rather than being consecutive. So, if there is a mark on the disc, instead of possibly obliterating an entire coded block of data, it should cause lesser damage to a number of blocks. This by itself increases the burst lengths that can be corrected, but the BIS helps further.
Working with G, M, K in memory sizes
The 64 KB clusters on Blu-ray discs contain 65 536 bytes, because in memory sizes the multipliers K, M and G mean the powers of 2 that fall closest to 103, 106 and 109 respectively. Thus:
K is 210 = 1024 (i.e. close to 103 = 1000)
M is 220 = 1048 576 (i.e. close to 106 = 1000 000)
G is 230 = 1073 741 824 (i.e. close to 109 = 1000 000 000).
Notice the relationship between K, M and G: each is 210 = 1024 times the previous one. So if you have a number expressed using the multiplier M, say 2 Mbits, you can express this using the multiplier K by multiplying by 1024. Thus 2 Mbits = 2048 Kbits. Similarly, you can multiply a number expressed using the multiplier G by 1024 in order to express it using the multiplier M. Loosely, we might say G = 1024M and M = 1024K, and G = M × K.
Note that K has a different meaning from k, which always means 103 rather than 210.
The burst-indicating subcode, as its name suggests, is involved in detecting error bursts. The BIS data is recorded on the disc at frequent and regular intervals, so that there is a short length of encoded user data (38 bytes) between single bytes of BIS data. This can be seen in Figure 2.2, which shows the structure of a 64 KB ECC (error-correcting code) cluster, including the so-called ‘picket codes’ of the BIS data. (This particular cluster has 496 rows and 155 columns. ‘Rows’ and ‘columns’ here refer to the arrangement of bytes on a Blu-ray disc, and are not directly connected with the RS codes.)
If it is found that two or more consecutive bytes of BIS data have been corrupted, there is a fair chance that the LDC data between them will have been corrupted as well. This information is passed to the LDC decoder, which now treats the bytes in question as erasures, meaning it can correct twice as many of them compared to what it could have done if it had not known they were at fault.