APPLICATION OF FRACTAL REPRESENTATION OF GENETIC TEXTS FOR RECOGNITION OF GENOME FUNCTIONAL AND CODING REGIONS
By using the fractal representation of nucleotide sequences to plot sets of functionally similar sequences, a new approach to classifying nucleic acid sequences has been suggested [1,2], and several measures of sequence similarity have been introduced. Many examples showed good separation of sequences belonging to different gene families, including good classification of subfamilies such as α- and β-actins and α-, β-, and γ-interferons. The method requires no alignment procedure, either to generate a recognition matrix from a learning set or to search for homologous regions. Computing time does not depend on the length of the search sequence, and fractal images of sequence sets can be compared easily both by computer procedures and visually. Visual comparison is especially convenient when the density of the fractal mask is represented as a third coordinate of the image. The method has been applied successfully to searching both for genes (globins, histones, etc.) and for different kinds of repetitive DNA sequences (Alu, LTR, etc.) [3].

The FRS approach is also used to reveal gene structure in uncharacterized sequences. Fractal images of exons, introns, and 5′- and 3′-regions have significantly different patterns, which allow preliminary localization of these gene regions. We obtained functions that recognize short coding regions with 90% accuracy for 54 bp windows and 95% accuracy for 108 bp windows, as well as functions, based on 8–9 bp oligonucleotide statistics, that recognize the 5′- and 3′-regions of eukaryotic genes; these functions can easily be recomputed as the current database is extended. Some examples of applying the fractal representation of amino acid sequences to protein classification are also given.
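The abstract does not spell out how a fractal image or its "fractal mask" is built. The sketch below therefore assumes a Jeffrey-style chaos-game construction: the unit square has corners A(0,0), C(0,1), G(1,1), T(1,0), and each nucleotide moves the current point halfway toward its corner. It also assumes the mask is the point count in each cell of a 2^k × 2^k grid, which is the "density" that can be drawn as a third coordinate. Under these assumptions each cell count equals the count of one k-mer, which links the images to the oligonucleotide statistics mentioned in the text. The function names `cgr_points`, `fractal_mask`, and `mask_similarity` are hypothetical, and cosine similarity is only a stand-in for the unspecified similarity measures, not the authors' method.

```python
import math

# Assumed corner layout (Jeffrey-style chaos game); the authors' layout may differ.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(seq):
    """Fractal (chaos-game) image of a sequence as a list of (x, y) points.

    Each step moves halfway from the current point toward the corner of the
    next nucleotide, starting from the centre of the unit square.
    Characters other than A, C, G, T are skipped.
    """
    x, y = 0.5, 0.5
    points = []
    for nt in seq.upper():
        if nt not in CORNERS:
            continue
        cx, cy = CORNERS[nt]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def fractal_mask(seq, k):
    """Point density of the fractal image on a 2^k x 2^k grid (hypothetical name).

    The point plotted after a nucleotide falls in the cell addressed by the last
    k nucleotides, so each cell count equals a k-mer count. The cell is computed
    from the k-mer with integer bits, which avoids the floating-point round-off
    that long homopolymer runs would cause in cgr_points. k-mers containing
    characters other than A, C, G, T are ignored.
    """
    size = 2 ** k
    mask = [[0] * size for _ in range(size)]
    s = seq.upper()
    for i in range(k - 1, len(s)):
        kmer = s[i - k + 1 : i + 1]
        if any(nt not in CORNERS for nt in kmer):
            continue
        col = row = 0
        # The most recent nucleotide sets the most significant bit of the cell.
        for pos, nt in enumerate(reversed(kmer)):
            cx, cy = CORNERS[nt]
            col |= cx << (k - 1 - pos)
            row |= cy << (k - 1 - pos)
        mask[row][col] += 1
    return mask


def mask_similarity(m1, m2):
    """Cosine similarity of two masks: a stand-in, not the authors' measure."""
    a = [v for row in m1 for v in row]
    b = [v for row in m2 for v in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    mask = fractal_mask("ACGTTGCA", 2)
    for row in reversed(mask):  # print rows with y increasing upward
        print(row)
```

With k = 8 or 9 the grid has 256 × 256 or 512 × 512 cells, one per 8- or 9-bp oligonucleotide, matching the oligonucleotide lengths whose statistics the abstract mentions. A set of sequences can be summarized by adding their masks cell by cell before comparing sets.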