Chapter 5: Spectral Pattern Discovery
Analysing a spectrum to identify chemicals and molecules is an important subject in biochemistry research. Spectrometric data is complex, however, because a spectrum is usually a mixture of several signals and a complicated background, also called the baseline. The signals can be reliably discovered only after the background has been accurately identified, so background estimation (baseline removal) is the first step in spectral pattern discovery. The difficulty is that the baseline of a spectrum is rarely a linear function (a straight line) or any other simple function that is easy to estimate. Instead, it is commonly a complex, unknown and non-analytic function. Many algorithms have therefore been developed for estimating the baseline so that peaks can be extracted and signals discovered as accurately as possible. Only when the baseline has been accurately estimated and removed can the number of falsely discovered signals be minimised and the number of true signals maximised. Among these algorithms, the Whittaker-Henderson smoother is one of the best for spectral pattern discovery.
This chapter introduces this algorithm and its variants, as well as other algorithms used for spectral pattern discovery, and shows how they can be applied to real spectrometric data. It also covers how to accurately extract peaks, discover signals, separate signals from artifacts and align discovered signals across replicates.
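To give a first feel for the Whittaker-Henderson smoother before it is treated in detail, the following is a minimal sketch of its usual penalised least-squares form: the smoothed curve z minimises the weighted squared distance to the data y plus lam times the sum of squared differences of z of a chosen order. The function name `whittaker_smooth`, the parameter names `lam`, `order` and `weights`, and the dense-matrix solve are assumptions made for illustration; this is not the chapter's own implementation, and the dense solve is practical only for short spectra.

```python
import numpy as np


def whittaker_smooth(y, lam=100.0, order=2, weights=None):
    """Illustrative Whittaker-type smoother (hypothetical helper).

    Solves (W + lam * D^T D) z = W y, where D is the difference
    matrix of the given order and W is a diagonal weight matrix.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Unit weights by default; other weights are the caller's choice.
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    # Row i of D takes the order-th difference of z starting at point i.
    D = np.diff(np.eye(n), n=order, axis=0)
    A = np.diag(w) + lam * (D.T @ D)
    return np.linalg.solve(A, w * y)


if __name__ == "__main__":
    # Synthetic data only: a straight line plus random noise.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 50)
    noisy = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, x.size)
    smooth = whittaker_smooth(noisy, lam=1e3)
    print("roughness before:", np.sum(np.diff(noisy, 2) ** 2))
    print("roughness after: ", np.sum(np.diff(smooth, 2) ** 2))
```

A larger `lam` gives a smoother curve and a smaller `lam` follows the data more closely. With second-order differences, a perfectly straight line incurs no penalty and is returned unchanged.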