BBCeng.info  
Recollections of BBC engineering from 1922 to 1997
The British Broadcasting Corporation
web site is:
www.bbc.co.uk

An Introduction to Broadcast Technology - Digital Systems

2.1        Digital audio coding

2.1.1      Outline of PCM

2.1.2      Companding (Compression And Expansion)

2.1.3      Near-Instantaneous (NI) Companding

2.1.4      NICAM 3

2.1.5      NICAM 728

2.1.6      Two Channel Sound-in-Syncs

2.1.7      MUSICAM

2.2        Digital video coding

2.2.1      Linear Video Coders

2.2.2      Video coders with bitrate reduction

2.2.3      MPEG

2.2.4      MPEG Video Bitrate reduction

2.3        Digital multiplexing systems

2.4        Digital transmission systems

2.4.1      Transmission on Optical Fibres

2.4.2      Transmission on Copper Cables

2.4.3      Digital Cable Broadcasting

2.4.4      Digital Satellite Broadcasting

2.4.5      Digital Terrestrial Broadcasting

2.5        Digital systems ancillary to analogue services

2.5.1      Teletext

2.5.2      Radio Data System (RDS)

2.1         Digital audio coding

2.1.1        Outline of PCM

Pulse Code Modulation is the most common method used to convert an analogue signal into digital form. 

Fig 22   Conversion from an analogue to a digital signal

The instantaneous voltage is measured, or sampled, at a rate which is at least twice the maximum frequency of the signal.  (Broadcast sound normally has a maximum frequency of 15 kHz and the sampling frequency is usually 32 kHz.)  The voltage of each sample is then expressed as a number.  In practical systems there is a limited range of numbers to choose from, so the nearest possible number is used; this process is called quantization.  The number is usually expressed in binary form by a combination of high and low voltages.

For example, familiar decimal numbers can be converted to binary as follows:-

Decimal     Binary
0                 000
1                 001
2                 010
3                 011
4                 100
5                 101
6                 110
7                 111

Taking the measurements shown in Fig 22 (3, 5 and 6), they can be represented as 011, 101 and 110 or, by stringing them all together, 011101110.  By using, say, +5 volts for "1" and -5 volts for "0", this series of numbers can be turned into a digital signal or bitstream consisting of nine bits:-

Fig 23   A digital signal representing the numbers 3, 5 and 6
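The example above can be sketched in a few lines of Python.  This is only an illustration: the function names and the analogue readings (3.2, 4.9 and 6.4) are assumptions chosen so that they quantize to the 3, 5 and 6 used in the text.

```python
def quantize(voltage, levels=8):
    """Pick the nearest available whole-number level, clamped to the valid range."""
    return max(0, min(levels - 1, round(voltage)))


def to_bitstream(samples, bits_per_sample=3):
    """Express each quantized sample as a fixed-width binary word and join them."""
    return "".join(format(s, "0{}b".format(bits_per_sample)) for s in samples)


def to_voltages(bitstream, high=5.0, low=-5.0):
    """Map each '1' to the high voltage and each '0' to the low voltage."""
    return [high if bit == "1" else low for bit in bitstream]


readings = [3.2, 4.9, 6.4]            # hypothetical analogue measurements
samples = [quantize(v) for v in readings]
bitstream = to_bitstream(samples)
voltages = to_voltages(bitstream)

print(samples)     # [3, 5, 6]
print(bitstream)   # 011101110
```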

Once this conversion has taken place the information can be transmitted over great distances.  The noise and distortion produced by radio links etc. have no effect at the receiving end, provided that each binary digit can be correctly identified as being above or below a threshold level.  The diagram below illustrates this by showing a signal affected by noise and distortion.

Fig 24   Data being recovered from a noisy and distorted digital signal, without error

After recovering the data, the original signal can be reconstituted in analogue form by a reversal of the coding process.
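A minimal sketch of the threshold decision is given below.  The noise model (uniform noise of up to 3 volts either way) and the threshold of 0 volts are assumptions made for illustration; the point is simply that the bits are recovered without error as long as the noise never carries a sample across the threshold.

```python
import random


def recover_bits(received_voltages, threshold=0.0):
    """Decide each bit by comparing the received voltage with the threshold."""
    return "".join("1" if v > threshold else "0" for v in received_voltages)


random.seed(1)                         # fixed seed so the example is repeatable
sent_bits = "011101110"
sent = [5.0 if b == "1" else -5.0 for b in sent_bits]

# Assumed noise: anywhere between -3 V and +3 V, i.e. always less than the
# 5 V distance between each signal level and the threshold.
received = [v + random.uniform(-3.0, 3.0) for v in sent]
recovered = recover_bits(received)

print(recovered == sent_bits)   # True
```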

In reality each measurement of an audio signal needs at least 14 bits, not 3 as shown above, and the measurements need to be taken 32,000 times a second.
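The bit rate implied by these figures is simply the number of bits per sample multiplied by the number of samples per second.  The short sketch below works this out (the function name is illustrative only).

```python
def pcm_bit_rate(bits_per_sample, sample_rate_hz, channels=1):
    """Bit rate of linear PCM in bits per second, ignoring any framing overhead."""
    return bits_per_sample * sample_rate_hz * channels


mono = pcm_bit_rate(14, 32000)
stereo = pcm_bit_rate(14, 32000, channels=2)

print(mono)     # 448000 bit/s, i.e. 448 kbit/s
print(stereo)   # 896000 bit/s, i.e. 896 kbit/s
```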

2.1.2        Companding (Compression And Expansion)

The system described above uses linear quantization, which means that all quantizing levels are evenly spaced in voltage terms.  This is the simplest arrangement for PCM but it is fairly inefficient, as a relatively large number of bits is needed to define each sample.  Within individual pieces of equipment or even within studio centres this is not usually a problem, but when transmitting over a long distance better efficiency is desirable in order to make economic use of the channel capacity that is available.

To achieve better efficiency it is necessary to exploit some form of redundancy in the original analogue signal to be coded.  In audio systems this usually means exploiting the fact that loud sounds tend to mask quiet sounds.  Coders and decoders therefore use various systems for compressing and expanding respectively - i.e. companding - some of which are described below.

2.1.3        Near-Instantaneous (NI) Companding

This is the basis of the NICAM systems used by broadcasters for internal distribution purposes and for Stereo TV sound.  NICAM stands for Near Instantaneously Companded Audio Multiplex.

The input signal is first quantized linearly using 14 bits per sample (just like a simple linear PCM system).  However only 10 bits of each 14 bit sample are actually sent to the decoder - the problem is:  which ones?

The amplitude of each sample is represented by a 14 bit binary number.  The most significant bits give the coarse amplitude of the sample and the least significant bits give the fine detail.  It was mentioned earlier that loud sounds tend to mask quiet sounds, so when the sound to be coded is loud it is not necessary to send the fine detail of the least significant bits.  Conversely quiet sounds need to be precisely defined, but by definition are low level and so the most significant bits are not required.  This is an over-simplification but hopefully conveys the general idea.

By choosing the most significant 10 bits out of 14, the least significant 10 bits out of 14 or a group in between, 5 different ranges can be selected.  The range selected depends upon the values of a block of 32 samples and, once selected, the range remains constant for those 32 samples.  As the sampling rate is 32 kHz, the range can change at 1 ms intervals.  This is frequent enough to cope adequately with high quality audio.

The coder has to tell the decoder which of the 5 ranges has been selected each millisecond.  This information is vital but it only adds a very small amount to the bit rate required, so the overall channel capacity required is reduced by a ratio of almost 14:10.

There are two different varieties of NICAM, both of which use 32 kHz sampling and compression from 14 to 10 bits in one millisecond blocks, as described above.
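The range selection can be sketched as follows.  This is a simplified illustration under stated assumptions, not the NICAM specification: samples are treated as signed 14-bit integers, the range is expressed as the number of least significant bits dropped (0 to 4), and the test signals are invented.

```python
BLOCK_LENGTH = 32   # samples per 1 ms block at 32 kHz


def choose_range(block):
    """Return how many least significant bits (0 to 4) must be dropped so that
    every sample in the block fits in a signed 10-bit word."""
    for dropped in range(5):
        if all(-512 <= (s >> dropped) <= 511 for s in block):
            return dropped
    raise ValueError("samples exceed the 14-bit range")


def encode_block(block):
    """Coder: pick one range for the whole block and keep 10 bits per sample."""
    dropped = choose_range(block)
    return dropped, [s >> dropped for s in block]


def decode_block(dropped, codes):
    """Decoder: restore the original scale using the range code."""
    return [c << dropped for c in codes]


# Invented test signals: one quiet block and one loud block.
quiet = [(i * 37) % 400 - 200 for i in range(BLOCK_LENGTH)]
loud = [(i * 997) % 16000 - 8000 for i in range(BLOCK_LENGTH)]

quiet_range, quiet_codes = encode_block(quiet)
loud_range, loud_codes = encode_block(loud)
quiet_out = decode_block(quiet_range, quiet_codes)
loud_out = decode_block(loud_range, loud_codes)
```

In this sketch the quiet block is reproduced exactly because no bits are dropped, while the loud block loses only its four least significant bits.  Each 1 ms block carries 32 x 10 = 320 bits plus a range code (three bits are enough to distinguish five ranges), compared with 32 x 14 = 448 bits for linear coding.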

2.1.4        NICAM 3

NICAM 3 was developed in the 1970s (NICAM 1 and 2 were early versions of the specification that did not go into production).

In NICAM 3 the message which tells the decoder which range is being used is only sent every 3 milliseconds.  So, every 3 milliseconds, the decoder is told the required ranges for the subsequent three 1 millisecond blocks.  This bitstream economy is needed to squeeze 6 channels into 2048 kbit/s.  As this is an internationally agreed standard bit rate, the signal can be sent over long distances by a number of methods.

2.1.5        NICAM 728

NICAM 728 was developed in the early 1980s (728 refers to the bitrate used).

NICAM 728 was developed specifically for broadcasting stereo sound with terrestrial television and, unlike NICAM 3, there is no requirement to get 6 channels into 2048 kbit/s, so a higher bit rate can be tolerated.  Two audio channels (stereo) occupy 728 kbit/s as opposed to 676 kbit/s for NICAM 3.  This higher bitrate provides the greater resilience needed for transmitting to domestic television sets.

The NI companding system is the same as NICAM 3 but range code information is sent every millisecond instead of every 3 milliseconds. Also a more elaborate error processing system is used and there is more signalling capacity for miscellaneous data.
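The bit rates quoted for the two systems can be checked with some simple arithmetic.  The sketch below uses only the figures given above; the variable names are illustrative, and "overhead" here just means everything other than the 10-bit companded samples (range codes, error protection, signalling and so on).

```python
LINK_KBIT_S = 2048             # standard link bit rate
NICAM3_STEREO_KBIT_S = 676     # two channels of NICAM 3
NICAM728_STEREO_KBIT_S = 728   # two channels of NICAM 728

# Companded audio alone: 2 channels x 32,000 samples/s x 10 bits.
audio_payload_kbit_s = 2 * 32 * 10

nicam3_six_channels = 3 * NICAM3_STEREO_KBIT_S        # three stereo pairs
nicam728_six_channels = 3 * NICAM728_STEREO_KBIT_S

print(audio_payload_kbit_s)                            # 640
print(nicam3_six_channels, LINK_KBIT_S - nicam3_six_channels)   # 2028 20
print(nicam728_six_channels > LINK_KBIT_S)             # True: would not fit
print(NICAM3_STEREO_KBIT_S - audio_payload_kbit_s)     # 36 kbit/s overhead
print(NICAM728_STEREO_KBIT_S - audio_payload_kbit_s)   # 88 kbit/s overhead
```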

2.1.6        Two Channel Sound-in-Syncs

This system inserts a NICAM 728 bitstream into the sync pulses of a video signal and it is used for distributing TV sound to transmitting stations.  It is also used for vision contributions.

2.1.7        MUSICAM

MUSICAM was developed in the late 1980s.

This is the bitrate reduction technique that is used for Digital Audio Broadcasting (DAB).  The MUSICAM technique can be used to produce bitrates from 32 to 244 kbit/s per mono channel, according to the quality required.

Bitrate reduction is achieved by exploiting the psycho-acoustic effect whereby loud sounds mask quiet sounds, especially if they are near to the same frequency.

The first stage is to split the spectrum of the audio signal into 32 separate frequency bands so that, in essence, there are 32 separate audio signals.  Each signal, if connected to a loudspeaker, would only produce a narrow range of frequencies, rather like the effect of putting one control of a Hi-Fi graphic equaliser at maximum and all the rest at minimum, only more extreme.

If the 32 audio signals were combined together the original sound would be recovered.  However, at any particular time it is likely that most of the sound will be coming from just a few of the 32 signals and if the rest are slightly distorted, particularly those with frequencies close to the loud signals, then the difference would be undetectable by the human ear.

The next stage is to use the standard PCM technique to quantize, separately, each of the 32 signals.  Some channels will need very precise quantization (16 bits per sample), but for the reason outlined above others will only need coarse quantization, sometimes as low as 4 bits per sample or even no sample at all.  In order to decide how many bits to use for the quantization, the entire audio spectrum is analysed every 12 ms.

All of the processes outlined above are carried out by processing information in digital form (the 32 channels could not really be connected to loudspeakers because they only exist as data inside a microchip.)
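The idea of giving each band only as many bits as it needs can be sketched as below.  This is a much simplified illustration, not the MUSICAM algorithm: the signal levels and masking thresholds are invented numbers (a real coder derives the thresholds from a psycho-acoustic model), and the rule of roughly 6 dB of signal-to-noise ratio per bit is used to convert the margin above the masking threshold into a word length.

```python
import math

DB_PER_BIT = 6.02   # each extra bit improves signal-to-noise ratio by about 6 dB


def allocate_bits(band_levels_db, mask_levels_db, max_bits=16):
    """Give each band enough bits to keep quantizing noise below its masking
    threshold; bands that are entirely masked get no bits at all."""
    allocation = []
    for level, mask in zip(band_levels_db, mask_levels_db):
        margin_db = level - mask
        if margin_db <= 0:
            allocation.append(0)
        else:
            allocation.append(min(max_bits, math.ceil(margin_db / DB_PER_BIT)))
    return allocation


# Invented example: most of the sound is in the first two of the 32 bands.
levels = [80, 60, 30, 20] + [10] * 28
masks = [20, 50, 45, 25] + [15] * 28

bits = allocate_bits(levels, masks)
print(bits[:4])    # [10, 2, 0, 0]
print(sum(bits))   # 12
```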

2.2         Digital video coding

2.2.1        Linear Video Coders

The basic principles are the same as for digital audio coding (described earlier).

Fig 25   Conversion from an analogue to a digital signal

The simplest form of PCM video coder takes 9 bit samples of an analogue PAL video signal, at three times colour sub-carrier frequency.  The resultant bitstream, together with audio/data channels, fits comfortably within the standard transmission bitrate of 140 Mbit/s.  This is the type of coder used on the Energis network.

The advantage of this type of coder is that the decoder faithfully reproduces all parts of the analogue video signal, including teletext and Sound-in-Syncs.  Furthermore several such coders and decoders can be connected in tandem with minimal degradation of picture quality.  A high bitrate is required but this can be accommodated on an optical fibre network.

2.2.2        Video coders with bitrate reduction

There are many applications which require video coders that have a much lower bitrate and the specification known as MPEG2 is the coding system used for digital television broadcasts from satellite, cable and terrestrial transmitters.

2.2.3        MPEG

MPEG stands for the Moving Picture Experts Group, which started work in 1988 and developed the specifications.  MPEG1 was specified in 1993 for applications such as video and sound on CD-ROM.  It normally operates at about 1.5 Mbit/s and provides quality similar to VHS video recorders.

MPEG2 was developed between 1990 and 1994 and offers the following improvements compared with MPEG1:-

        It is optimised for interlaced as opposed to progressively scanned pictures.

        It can code 625 line, normal or wide screen pictures, with good quality, operating between about 3 and 9Mbit/s.  (It can also code High Definition television at 15 to 25 Mbit/s.)

        It provides multi-channel surround sound.

        It supports multiple programmes.

        It can tolerate error prone transmission systems.

        It supports scrambling (as needed for conditional access subscription programmes).

The specification covers video coding, audio coding and systems management/multiplexing.  It is a generic specification that supports different applications that have different requirements, and variants are defined by "profiles" and "levels".  Profile refers to a particular application and level refers to such things as picture resolution.  The variant of chief interest for digital broadcasting is "main profile/main level".

2.2.4        MPEG Video Bitrate reduction

Bitrate reduction is achieved by exploiting the predictable nature of television pictures and the limited visual perception of the human eye.  Consider the two pictures below, which are intended to represent consecutive frames of a television picture.

Fig 26   Consecutive frames of a moving television picture

At this stage, rather than thinking about television, imagine having to describe picture B over the 'phone to someone who already has picture A. 

1.       One way would be to ignore the fact that the other person has picture A, treat the picture as 400,000 evenly spaced picture elements (pixels), assign a number to the brightness of each one, and read out all the numbers over the phone!

2.       Another way would be to say that picture B is very similar to picture A but the yacht has moved 12mm to the right.

The second method is much easier to do, but how accurate is it?  In order to determine the exact movement between the pictures we would have to overlay picture B with picture A, move it around to get the best fit and measure the offset.  The problem is that it is not always possible to get a perfect fit.  Did you spot the other difference between the pictures?  Someone has moved to a front window in picture B so, even after shifting picture A to the right there is still a difference:-

Fig 27   The difference between picture A shifted to the right and picture B

So, to describe picture B  more precisely to the person at the other end of the phone we need to describe image C.  The description could be: "A black patch 2mm high and 1mm wide, 45mm from the left and 32mm from the bottom".  It may not be exact, but nobody will notice that in fact it is a bit wider and dark grey at the top.

In essence these are the main bitrate reduction techniques used by MPEG2.  The first technique is called motion compensated prediction and the second is called DCT.  This stands for Discrete Cosine Transform and it enables simple images such as image C above to be described with just a few numbers.
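The first technique can be sketched with two tiny invented pictures.  In this illustration (all names and pixel values are assumptions) picture A is tried at several horizontal shifts, the shift giving the smallest total difference from picture B is taken as the motion, and what is left over is the small difference picture that still has to be described.

```python
def shift_right(picture, dx, fill=0):
    """Shift every row of a picture dx pixels to the right (dx >= 0)."""
    width = len(picture[0])
    return [[fill] * dx + row[:width - dx] for row in picture]


def difference(predicted, actual):
    """Pixel-by-pixel difference between the actual and predicted pictures."""
    return [[a - p for p, a in zip(rp, ra)] for rp, ra in zip(predicted, actual)]


def best_shift(previous, current, max_dx):
    """Try each shift and return the one with the smallest total difference."""
    def cost(dx):
        residual = difference(shift_right(previous, dx), current)
        return sum(abs(v) for row in residual for v in row)
    return min(range(max_dx + 1), key=cost)


picture_a = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
# Picture B: the same object moved 2 pixels right, plus one new detail.
picture_b = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 9, 9, 0, 0, 0],
    [0, 0, 0, 9, 9, 0, 4, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

motion = best_shift(picture_a, picture_b, max_dx=3)
residual = difference(shift_right(picture_a, motion), picture_b)
print(motion)        # 2
print(residual[2])   # [0, 0, 0, 0, 0, 0, 4, 0]
```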

Obviously the outline above is far from thorough, so here are a few more details.

The overall picture is divided up into fairly small squares and the process takes place on each square individually, not the whole picture as in the outline description above.

There are three types of prediction mode: none, motion-compensated prediction from the past frame, and motion-compensated prediction from the past and future frames.  All three types normally occur within every half second or so. 

When there is a shot change the first option is used because the previous frame is of no use for prediction.  Fortunately the eye is relatively insensitive to detail immediately after a shot change, so the subsequent DCT process does not have to be very accurate, which is just as well because it has a more complicated picture to code.

The third option gives the best prediction, but the penalty is that there is more delay through the system.

The DCT system usually operates on an 8 x 8 block of pixels and usually, due to the nature of pictures, the values (luminance/chrominance) of each pixel in the block will be about the same.

 

5    5    5    5    5    6    6    6
5    5    5    5    5    5    6    6
5    5    5    5    5    5    5    6
5    5    5    5    5    5    5    5
4    5    5    5    5    5    5    5
4    4    5    5    5    5    5    5
4    4    4    4    5    5    5    5
4    4    4    4    4    5    5    5

is far more likely than:-

1    5    1    5    15   5    1    5
7    2    7    2    7    2    7    2
3    6    9    3    6    9    3    6
1    1    1    1    9    9    9    9
8    8    8    8    2    2    2    29
2    2    2    7    7    8    8    8
9    9    9    1    9    9    9    1
1    1    1    8    2    2    2    5

Fig 28   Values of pixels presented to the DCT

Taking the first block, the DCT will do some calculations and produce another 8 x 8 block of different numbers.  The easiest thing to describe is the average of all the numbers in the input block, which in this case is about 4.91, so the DCT would put this number in the top left hand corner.  Then a few other numbers would be used to describe the variations such as the 'tilt' from mainly fours to mainly sixes.  The result would be something like this.

4.91   1.6    0.6    0.07   0      0      0      0
2.45   1.1    0.1    0      0      0      0      0
0.4    0.15   0      0      0      0      0      0
0.1    0      0      0      0      0      0      0
0      0      0      0      0      0      0      0
0      0      0      0      0      0      0      0
0      0      0      0      0      0      0      0
0      0      0      0      0      0      0      0

Fig 29   Illustration of the output from a Discrete Cosine Transform

Due to the usual lack of variation in the input block, the output block has many zeros, which can be ignored and so the original block can be described with just ten numbers in this case.  (More detail would result in more non-zero numbers.)
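For readers who want to experiment, the sketch below computes a two-dimensional DCT of the first input block in plain Python.  The scaling convention is an assumption (the common "orthonormal" form), under which the top left hand coefficient is 8 times the block average rather than the average itself; the figures in the text are illustrative and use a simpler scaling.

```python
import math

N = 8

block = [
    [5, 5, 5, 5, 5, 6, 6, 6],
    [5, 5, 5, 5, 5, 5, 6, 6],
    [5, 5, 5, 5, 5, 5, 5, 6],
    [5, 5, 5, 5, 5, 5, 5, 5],
    [4, 5, 5, 5, 5, 5, 5, 5],
    [4, 4, 5, 5, 5, 5, 5, 5],
    [4, 4, 4, 4, 5, 5, 5, 5],
    [4, 4, 4, 4, 4, 5, 5, 5],
]


def dct_2d(pixels):
    """Orthonormal 2-D DCT (type II) of an N x N block, computed directly."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (pixels[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


coefficients = dct_2d(block)
average = sum(sum(row) for row in block) / (N * N)

# With this scaling the top left coefficient divided by 8 is the block average.
print(round(average, 2), round(coefficients[0][0] / N, 2))

# Share of the total "energy" carried by the top left coefficient alone.
energy = sum(c * c for row in coefficients for c in row)
top_left_share = coefficients[0][0] ** 2 / energy
```

For a smooth block like this one, almost all of the energy ends up in the top left coefficient, which is why so few numbers are needed to describe it.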

If there is time to send all of the ten numbers then they are sent.  However if the transmission channel can't cope even with this reduced amount of data then the numbers can be quantized or rounded, so 0.07 might become 0.1 and 2.45 might become 2.5 etc.  Also the non-zero numbers furthest away from the top left hand corner may be progressively discarded until the resultant bitrate is low enough to fit in the transmission channel.  These two processes impair the decoded picture, but the MPEG2 parameters have been chosen to minimise the effect.
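Both processes can be sketched as follows, using the variation terms from Fig 29 (the top left average term is left out for brevity).  The rounding step of 0.1, the ordering rule and the function names are assumptions for illustration; MPEG2 defines its own quantizing tables and scanning order.

```python
from decimal import Decimal, ROUND_HALF_UP


def quantize(value, step="0.1"):
    """Round a coefficient to the precision of step, with halves rounded up."""
    return Decimal(value).quantize(Decimal(step), rounding=ROUND_HALF_UP)


def keep_nearest(coefficients, count):
    """Keep only the count coefficients nearest the top left hand corner."""
    ordered = sorted(coefficients, key=lambda rc: (rc[0] + rc[1], rc[0]))
    return {rc: coefficients[rc] for rc in ordered[:count]}


# (row, column): value, written as strings so that Decimal is exact.
terms = {
    (0, 1): "1.6", (0, 2): "0.6", (0, 3): "0.07",
    (1, 0): "2.45", (1, 1): "1.1", (1, 2): "0.1",
    (2, 0): "0.4", (2, 1): "0.15",
    (3, 0): "0.1",
}

quantized = {position: quantize(value) for position, value in terms.items()}
kept = keep_nearest(quantized, 5)

print(quantized[(0, 3)], quantized[(1, 0)])   # 0.1 2.5
print(sorted(kept))   # [(0, 1), (0, 2), (1, 0), (1, 1), (2, 0)]
```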

MPEG2 decoding is simpler than coding because the decoder does not have to search for the motion between pictures.  The decoder simply (!!) carries out an inverse DCT and uses the motion information from the coder to reconstruct the original picture.

2.3         Digital multiplexing systems

The MPEG2 specification includes details of arrangements for multiplexing several programmes together, as well as managing the configuration of coders and decoders.

The video and audio coding processes described above result in Elementary Streams.

Each Programme Stream consists of one or more Elementary Streams, such as video, audio in one or more languages, teletext etc.

Several programmes may be multiplexed into a Transport Stream, together with timing information, Service Information for all the Programmes in the multiplex, Private Data and support for scrambling (but MPEG does not specify a scrambling or access control system).  The result is a series of 188 byte long packets of data which can be conveyed on a digital transmission system e.g. Digital Terrestrial Television, using COFDM modulation, or Digital Satellite Broadcasting using QPSK modulation or Digital Cable Broadcasting using QAM.
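The packet structure can be sketched as below.  This is a simplified illustration only: the packet identifiers (0x100 and 0x101) are arbitrary, the payloads are dummy data, short packets are simply padded with 0xFF bytes, and a real multiplexer also carries the timing and Service Information mentioned above.

```python
from itertools import zip_longest

SYNC_BYTE = 0x47
PACKET_SIZE = 188
PAYLOAD_SIZE = PACKET_SIZE - 4     # 4-byte header, 184-byte payload


def make_packets(pid, data):
    """Split one stream's data into 188-byte packets labelled with its PID."""
    packets = []
    for n, offset in enumerate(range(0, len(data), PAYLOAD_SIZE)):
        chunk = data[offset:offset + PAYLOAD_SIZE]
        chunk += bytes([0xFF]) * (PAYLOAD_SIZE - len(chunk))   # simplified padding
        start_flag = 0x40 if n == 0 else 0x00    # marks the first packet of a unit
        header = bytes([
            SYNC_BYTE,
            start_flag | ((pid >> 8) & 0x1F),    # top 5 bits of the 13-bit PID
            pid & 0xFF,                          # bottom 8 bits of the PID
            0x10 | (n & 0x0F),                   # payload only + 4-bit counter
        ])
        packets.append(header + chunk)
    return packets


def multiplex(*streams):
    """Interleave the packets of several streams in turn."""
    return [p for group in zip_longest(*streams) for p in group if p is not None]


def packet_pid(packet):
    """Read the 13-bit packet identifier back out of the header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


video = make_packets(0x100, bytes(500))   # hypothetical video data
audio = make_packets(0x101, bytes(200))   # hypothetical audio data
transport_stream = multiplex(video, audio)

print(len(transport_stream))                          # 5
print([hex(packet_pid(p)) for p in transport_stream])
```

The decoder uses the packet identifier to pick out the packets belonging to the stream it wants and ignores the rest.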

2.4         Digital transmission systems

Once a signal has been produced by a coder and multiplexed if necessary, the bitstream can be transmitted to a distant location by a number of means.  The possibilities are almost endless, but a few systems are outlined below.

2.4.1        Transmission on Optical Fibres

Optical fibres offer a means of transmitting a very large amount of information over long distances.  They are also inherently immune to most types of interference.  Typically a Public Telecommunications Operator (PTO) sends about 2.5 Gbit/s along each fibre, with repeaters every 30 km or so.  However advances in technology such as Wavelength Division Multiplexing are facilitating much higher bitrates (WDM, in effect, uses separate colours to send several bitstreams along the same fibre simultaneously).  In essence a laser shines light into the glass fibre and the light can't escape, for much the same reason that it is not possible to see through the surface of water unless looking at a fairly steep angle.  However the light does appear at the other end of the fibre and a light sensitive detector produces an electrical signal with two voltages according to whether the laser is on or off.

2.4.2        Transmission on Copper Cables

Bitstreams of up to about 2 Mbit/s can be sent a few kilometres on ordinary telephone lines, although some fairly sophisticated signal processing is needed to achieve this.  It does however offer the possibility of conveying one 'VHS quality' MPEG coded programme to an individual household.  The transmission technique is called Asymmetric Digital Subscriber Line (ADSL) and it is asymmetric because the bitrate capacity to the subscriber (for TV) is much greater than the capacity from the subscriber (for selecting programmes etc.).

2.4.3        Digital Cable Broadcasting

Coaxial copper cables (basically similar to TV antenna cable) can carry hundreds of Mbit/s a few kilometres, depending on the exact construction of the cable.  Digital Cable Television uses basically the same MPEG2 bitstream as Digital Satellite or Digital Terrestrial, but the bitstream is used to modulate a carrier which is fed over a cable, usually along with other services at different carrier frequencies.  Quadrature Amplitude Modulation (QAM) is used where, in essence, the input bitstream (see Fig 30) is split into two bitstreams (B1 and B2) of half the rate; each one amplitude modulates a carrier and the results are added together.  Prior to modulation, each sub-bitstream is divided into blocks of usually 8 bits, but consider 2 for explanation purposes.  Since there are 4 (decimal) combinations of 2 bits they are used to give four amplitude modulation levels.  The two carriers have the same frequency, but their phase is shifted by 90 degrees, hence the word quadrature.

Fig 30   Illustration of Quadrature Amplitude Modulation

The signal on the right shows the result of adding the two modulated signals together.  This is the signal that is applied to the cable.  Note that as well as the amplitude varying there is also a shift on the time axis i.e. a variation in phase.
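The process can be sketched in Python using 2-bit blocks as in the explanation above.  The details are assumptions for illustration: alternate bits are sent to the two half-rate bitstreams, the four amplitude levels are taken as -3, -1, +1 and +3, and the carrier frequency is arbitrary.

```python
import math

# Assumed mapping of each 2-bit block to one of four amplitude levels.
LEVELS = {"00": -3, "01": -1, "11": 1, "10": 3}


def split_bitstream(bits):
    """Send alternate bits to the two half-rate bitstreams."""
    return bits[0::2], bits[1::2]


def to_levels(bits):
    """Divide a bitstream into 2-bit blocks and map each to an amplitude level."""
    return [LEVELS[bits[i:i + 2]] for i in range(0, len(bits), 2)]


def qam_signal(level_1, level_2, carrier_hz, t):
    """Sum of two amplitude modulated carriers that are 90 degrees apart."""
    angle = 2 * math.pi * carrier_hz * t
    return level_1 * math.cos(angle) - level_2 * math.sin(angle)


bits = "0010110100011110"
stream_1, stream_2 = split_bitstream(bits)
levels_1 = to_levels(stream_1)
levels_2 = to_levels(stream_2)

# Each pair of levels gives the combined signal its own amplitude and phase.
for a, b in zip(levels_1, levels_2):
    amplitude = math.hypot(a, b)
    phase_degrees = math.degrees(math.atan2(b, a))
    print(a, b, round(amplitude, 2), round(phase_degrees, 1))
```

With four levels on each carrier there are 16 possible combinations of amplitude and phase, so each transmitted symbol carries 4 bits.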

2.4.4        Digital Satellite Broadcasting

Digital Satellite Broadcasting uses basically the same MPEG2 bitstream as Digital Cable or Digital Terrestrial Broadcasting, but the bitstream is used to modulate an SHF carrier which is transmitted to a transponder in a satellite.  The transponder changes the frequency, amplifies the signal and then transmits it back to earth.  The type of modulation used is Quadrature Phase Shift Keying (QPSK), which is fairly similar to QAM (See Fig 30) but has characteristics more suited to a satellite channel.

2.4.5        Digital Terrestrial Broadcasting

Digital Terrestrial Broadcasting uses basically the same MPEG2 bitstream as Digital Cable or Digital Satellite Broadcasting, but a completely different type of modulation is used - Coded Orthogonal Frequency Division Multiplex (COFDM).  COFDM is also used for Digital Audio Broadcasting (DAB).

If a carrier is modulated with a very high bitrate it cannot be received reliably in the presence of multipath propagation, which is an effect that occurs on terrestrial UHF transmission.

Fig 31   Illustration of multipath interference

The signal travels on a direct path to the receiver and it is also reflected from buildings, following a longer path to the receiver.  The signal travels at the speed of light (300,000 km/s), so if the path difference is 17 metres, two 18Mbit/s signals would arrive shifted by one bit, making the result impossible to decipher.  Greater path delay would be just as bad until path B was so long that it significantly reduced the signal strength.

The solution to this problem is to divide the bitstream into many (e.g. 2000) separate, slower bitstreams and use each of them to modulate carriers of different frequencies.  So the 18Mbit/s bitstream mentioned above is divided into 2000 bitstreams, of 9kbit/s each.  Then each of these bitstreams modulates its own carrier. 

Considering just one of these 9 kbit/s bitstreams, and referring back to Fig 31, the difference in path length would have to be 33 km to delay by one bit period, and at this distance the signal strength of the reflected signal would be negligible.
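The path differences quoted above follow directly from the speed of light and the bit rate, as the short calculation below shows (the function name is illustrative).

```python
SPEED_OF_LIGHT_M_PER_S = 3.0e8    # 300,000 km/s


def one_bit_path_difference_m(bit_rate):
    """Extra path length, in metres, that delays a signal by one bit period."""
    return SPEED_OF_LIGHT_M_PER_S / bit_rate


single_carrier = one_bit_path_difference_m(18e6)
per_carrier_rate = 18e6 / 2000
many_carriers = one_bit_path_difference_m(per_carrier_rate)

print(round(single_carrier, 1))         # 16.7 metres
print(per_carrier_rate)                 # 9000.0 bit/s
print(round(many_carriers / 1000, 1))   # 33.3 km
```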

It is not feasible to draw 2000 modulators, so the diagram below illustrates the process with just ten.

Fig 32   Illustration of Coded Orthogonal Frequency Division Multiplex

The carrier frequencies are closely spaced so there is some overlap between the sidebands, but the modulation system is carefully arranged so that they do not interfere - the signals are orthogonal.

In reality all of the above processes, except the Radio Frequency amplification, are carried out by processing digital signals in a set of microchips.  Broadly speaking a COFDM demodulator carries out the process in reverse.
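Orthogonality can be demonstrated numerically.  In the sketch below (a simplified illustration with invented data, ignoring error coding and the other refinements of a real system) each carrier completes a whole number of cycles in one symbol period.  Three adjacent carriers are each given a data value and added together, and each value is then recovered by correlating the sum with the corresponding carrier.

```python
import cmath

N_SAMPLES = 64    # samples in one symbol period


def carrier(k, n_samples=N_SAMPLES):
    """One symbol period of carrier k: exactly k cycles in n_samples samples."""
    return [cmath.exp(2j * cmath.pi * k * n / n_samples) for n in range(n_samples)]


def correlate(a, b):
    """Average of a multiplied by the complex conjugate of b."""
    return sum(x * y.conjugate() for x, y in zip(a, b)) / len(a)


# Invented data values carried by three adjacent carriers.
data = {3: 1 + 1j, 4: -1 + 1j, 5: 1 - 1j}

carriers = {k: carrier(k) for k in data}
composite = [sum(data[k] * carriers[k][n] for k in data) for n in range(N_SAMPLES)]
recovered = {k: correlate(composite, carriers[k]) for k in data}

# Different carriers do not interfere: their correlation is zero.
cross_talk = abs(correlate(carriers[3], carriers[4]))
```

In practice the whole set of correlations is carried out at once by a Fast Fourier Transform.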

2.5         Digital systems ancillary to analogue services

2.5.1        Teletext

Pages of text and graphics are compiled using a computer system and the resulting data is added to an analogue television signal.  The basic television signal controls the brightness of red, green and blue beams which scan across a television picture and work their way from top to bottom.  When the beams reach the bottom of the picture it takes a finite time for them to move back to the top and it is during this period that teletext data is inserted into the television signal. 

Fig 33   Teletext data in a video signal

A special module in a television set decodes the teletext data and displays the resultant text and graphics on the screen.  In an ideal world, the entire transmission system from the output of Television Centre through to the domestic television sets, should convey teletext without any further processing.  However in order to minimise the likelihood of incorrect characters being displayed, the signal is "regenerated" at most of the larger transmitting stations.  There is another complication in that the television distribution network usually goes via regional studio centres in order for them to insert opt outs and "teletext bridges" are needed to ensure that the teletext signal passes through even though the studio is opting out the vision and sound.

2.5.2        Radio Data System (RDS)

RDS means Radio Data System and it is used to provide a service which is associated with VHF/FM radio.  The system enables radios to be tuned more easily and offers other advantages such as automatic selection of traffic announcements.  Information is compiled on a computer system and distributed over data circuits to the main FM transmitters.  Associated with each individual transmitter there is a piece of equipment called an RDS assembler, which takes the incoming data feed, combines it with some relatively fixed information which is held locally and produces a digitally modulated sub-carrier which is combined with the audio to feed the transmitter’s FM modulator.  RDS radios have a module which decodes the data on the sub-carrier and controls the tuning of the radio as well as driving an alphanumeric display.

A traffic announcement system can be initiated from any local radio studio.  Just before the announcement is due to take place (and usually automatically initiated by a jingle) a piece of data is sent from the local radio studio to the central RDS computer which sends out a message to all transmitting stations.  This message is interpreted by the appropriate transmitting station which covers the road traffic area of interest, and this message is then broadcast to RDS receivers in this area.  On receiving the signal the RDS receivers automatically re-tune to the local radio station for the duration of the traffic announcement.