Chapter 3: Protease Cleavage Pattern Discovery
A protein functions by interacting with other molecules. Among the many forms of interaction, protease cleavage has been widely researched for several decades. This research aims to build a predictive model from collected laboratory data in order to discover novel interactions. Such a model is commonly built on laboratory-verified protease cleavage data, in which the association between protease cleavage structure and protease cleavage function can be examined. A protease cleavage structure commonly means a primary sequence (or a sub-sequence or peptide) that is believed to contain a specific amino acid composition pattern or trend related to the protease cleavage function. In other words, a protease cleavage pattern is not expected to show a random amino acid composition. Instead, the amino acid composition in a data set of cleaved peptides should demonstrate a trend that a specific protease recognises for the interaction.
To make a protease cleavage pattern discovery model work efficiently, two types of peptides are collected and pooled to construct the model: cleaved peptides and non-cleaved peptides. Non-cleaved peptides are assumed to show no such trend; their amino acids should be distributed approximately at random. By contrasting a data set with a trend against a data set without one, a pattern that discriminates the two types of data can be discovered and formulated as knowledge, or intelligence. The purpose of discovering such intelligence is prediction. Protease cleavage pattern discovery is therefore a classification problem in machine learning. This chapter introduces several classification (discriminant analysis) algorithms and discusses how they can be used for protease cleavage pattern discovery.
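The contrast between a data set with a trend and one without can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not one of the algorithms introduced in this chapter: it estimates position-specific amino acid frequencies for each class and scores a peptide by a summed log-odds ratio, which is a simple naive-Bayes-style stand-in. The peptides, the function names, the pseudocount and the threshold are all hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_frequencies(peptides, pseudocount=1.0):
    """Estimate smoothed amino acid frequencies at each peptide position."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    table = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        table.append({aa: (counts[aa] + pseudocount) / total
                      for aa in AMINO_ACIDS})
    return table

def log_odds_score(peptide, cleaved_freq, non_cleaved_freq):
    """Sum over positions of log(P(residue | cleaved) / P(residue | non-cleaved))."""
    return sum(math.log(cleaved_freq[i][aa] / non_cleaved_freq[i][aa])
               for i, aa in enumerate(peptide))

def predict_cleaved(peptide, cleaved_freq, non_cleaved_freq, threshold=0.0):
    """Classify a peptide as cleaved when its log-odds score exceeds the threshold."""
    return log_odds_score(peptide, cleaved_freq, non_cleaved_freq) > threshold

# Hypothetical toy data: the cleaved set shares a trend, the non-cleaved set does not.
cleaved = ["AVLF", "GVLF", "SVLY", "TVLF"]
non_cleaved = ["KDEG", "PQRS", "WHNT", "CMIA"]

cleaved_freq = position_frequencies(cleaved)
non_cleaved_freq = position_frequencies(non_cleaved)

print(predict_cleaved("AVLF", cleaved_freq, non_cleaved_freq))
print(predict_cleaved("KDEG", cleaved_freq, non_cleaved_freq))
```

Treating positions as independent is a strong simplification, which is one reason more capable classification algorithms are of interest for this problem.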
Importantly, encoding peptides so that the encoded data are biologically sound is the key to a successful protease cleavage pattern discovery task. This chapter therefore introduces the bio-basis function, a novel approach for encoding peptides, followed by several advanced algorithms built on it for protease cleavage pattern discovery.
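As a rough preview of the encoding idea, the sketch below uses one commonly cited form of the bio-basis function, K(x, b) = exp(gamma * (s(x, b) - s(b, b)) / s(b, b)), where s is a similarity score between a peptide x and a basis peptide b. This form is assumed here and may differ in detail from the definition given later in the chapter. In practice s would be computed from a substitution matrix; the `toy_similarity` scores, the basis peptides and the value of `gamma` below are hypothetical placeholders chosen only to keep the example self-contained.

```python
import math

def toy_similarity(a, b):
    """Hypothetical residue score; a real substitution matrix would be used instead."""
    return 2.0 if a == b else -1.0

def peptide_similarity(x, b, score=toy_similarity):
    """Sum residue-by-residue similarity scores for two equal-length peptides."""
    if len(x) != len(b):
        raise ValueError("peptides must have the same length")
    return sum(score(p, q) for p, q in zip(x, b))

def bio_basis(x, b, gamma=1.0, score=toy_similarity):
    """Assumed bio-basis form: exp(gamma * (s(x, b) - s(b, b)) / s(b, b))."""
    s_bb = peptide_similarity(b, b, score)  # self-similarity of the basis peptide
    s_xb = peptide_similarity(x, b, score)
    return math.exp(gamma * (s_xb - s_bb) / s_bb)

def encode(peptide, bases, gamma=1.0, score=toy_similarity):
    """Map a peptide to a numeric vector with one component per basis peptide."""
    return [bio_basis(peptide, b, gamma, score) for b in bases]

# Hypothetical basis peptides.
bases = ["AVLF", "KDEG"]
print(encode("AVLY", bases))
```

Each peptide becomes a vector with one component per basis peptide, and a component equals 1 only when the peptide scores as highly against that basis peptide as the basis peptide scores against itself. A classifier can then be trained on these vectors rather than on raw sequence symbols.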