LOCATION OF REPETITIVE REGIONS IN SEQUENCES BY OPTIMIZING A COMPRESSION METHOD
Suppose that a biologist wishes to study some local property P of genetic sequences. If the biologist can design, together with a computer scientist, an algorithm C that efficiently compresses the parts of a sequence satisfying P, then our algorithm TURBOOPTLIFT quickly locates where P occurs by chance in a sequence and where it occurs as the result of a significant process. Under some conditions, the time complexity of TURBOOPTLIFT is O(n log n). We illustrate its use on the practical problem of locating approximate tandem repeats in DNA sequences.
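The idea of using compression to separate chance occurrences from significant ones can be sketched as follows. This is a simplified sketch, not TURBOOPTLIFT. It makes several assumptions. A hypothetical compressor C has already produced `gains`, the bits saved at each position relative to a plain encoding of 2 bits per nucleotide. The single best region is found with Kadane's maximum-sum-segment algorithm in O(n). An overhead of about 2·log2(n) bits pays for stating where the region starts and ends. The region is then judged by the standard compression bound: under a uniform random model, a prefix-free code saves k or more bits with probability at most 2^-k. The names `best_segment`, `locate_significant_region` and `alpha` are illustrative only.

```python
import math
from typing import List, Tuple


def best_segment(gains: List[float]) -> Tuple[int, int, float]:
    """Return (start, end, total) for the contiguous slice gains[start:end]
    with the largest total gain, using Kadane's algorithm in O(n)."""
    best = (0, 0, 0.0)            # empty segment: nothing compressed, no gain
    cur_start, cur_sum = 0, 0.0
    for i, g in enumerate(gains):
        if cur_sum <= 0:          # a non-positive prefix never helps; restart at i
            cur_start, cur_sum = i, 0.0
        cur_sum += g
        if cur_sum > best[2]:
            best = (cur_start, i + 1, cur_sum)
    return best


def locate_significant_region(
    gains: List[float], alpha: float = 0.01
) -> Tuple[int, int, float, float, bool]:
    """Hypothetical helper: find the best-compressed region and test it.

    gains[i] is the (assumed) number of bits a compressor C saves at
    position i compared with 2 bits per nucleotide; it may be negative.
    """
    n = len(gains)
    # Assumed cost of encoding the segment boundaries (start and end positions).
    overhead = 2 * math.log2(n) if n > 1 else 0.0
    start, end, raw_gain = best_segment(gains)
    net_gain = raw_gain - overhead
    # Compression bound: saving >= k bits by chance has probability <= 2**-k.
    p_bound = 2.0 ** (-net_gain) if net_gain > 0 else 1.0
    return start, end, net_gain, p_bound, p_bound <= alpha
```

For example, `locate_significant_region([-1, -1, 5, 5, 5, -1, -1, -2])` selects positions 2 to 4. Their raw gain is 15 bits, and 6 bits go to the boundaries, which leaves a net gain of 9 bits. The chance probability is therefore bounded by 2^-9 ≈ 0.002, so the region is reported as significant at `alpha = 0.01`. If no region saves more bits than the boundary overhead costs, the bound is 1 and nothing is reported.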