Since the beginning of the genome project, the necessary involvement of scientists of widely divergent backgrounds has been evident. The proper handling, analysis, and dissemination of information, as well as the control of and data gathering from automated processes, are areas where computers are directly involved. Thus computers are intimately tied into the production and analysis of biological data. However, many challenges lie ahead.
This volume is a collection of selected oral and poster presentations given at The Second International Conference on Bioinformatics, Supercomputing and Complex Genome Analysis, organized to address some of these challenges. The topics include the current status and future prospects of genome mapping and sequencing, complex genome analysis, linguistic and neural network approaches, database issues, and computer tools in the genome project. The volume will be ideal for students, newcomers, young researchers and experts alike, whether computationally or experimentally oriented.
Keynote Speakers: C L Smith, D Grothues, T Ito, T Sano, D Wang, Y-W Zhu, C R Cantor & R J Robbins.
https://doi.org/10.1142/9789814503655_fmatter
The following sections are included:
https://doi.org/10.1142/9789814503655_0001
At some point, analysis of the data will certainly become the rate-limiting step in the Human Genome Project. Our goal is to get to that point as quickly as possible. The purpose is not to make life hard. What we are interested in doing is developing methods that will provide mapping and sequencing information ever more rapidly. This presentation is mostly a progress report on a couple of new tricks that we are trying to develop that could, in principle, speed mapping and sequencing tremendously, possibly by factors of 10³.
https://doi.org/10.1142/9789814503655_0002
Informatics of some kind will play a role in every aspect of the Human Genome Project (HGP): data acquisition, data analysis, data exchange, data publication, and data visualization. What are the real requirements and challenges? The primary requirement is clear thinking and the main challenge is design. If good design is lacking, the price will be failure of genome informatics and ultimately failure of the genome project itself. We need good designs to deliver the tools necessary for acquiring and analyzing DNA sequences. As these tools become more efficient, we will need new tools for comparative genomic analyses. To make the tools work, we will need to address and solve nomenclature issues that are essential, if also tedious. We must devise systems that will scale gracefully with the increasing flow of data. We must be able to move data easily from one system to another, with no loss of content. As scientists, we will have failed in our responsibility to share results, should repeating experiments ever become preferable to searching the literature. Our databases must become a new kind of scientific literature and we must develop ways to make electronic data publishing as routine as traditional journal publishing. Ultimately, we must build systems so advanced that they are virtually invisible. In summary, the HGP can be considered the most ambitious, most audacious information-management project ever undertaken. In the HGP, computers will not merely serve as tools for cataloging existing knowledge. Rather, they will serve as instruments, helping to create new knowledge by changing the way we see the biological world. Computers will allow us to see genomes, just as radio telescopes let us see quasars and electron microscopes let us see viruses.
https://doi.org/10.1142/9789814503655_0003
The theme of this section of the “Second International Conference on Bioinformatics, Supercomputing, and Complex Genome Analysis” is linguistic approaches to understanding the meaning of DNA. Webster's New World Dictionary defines linguistics as: ‘(1) the science of language, including phonology, morphology, syntax, and semantics; often “general linguistics”; usually subdivided into descriptive, historical, comparative, and geographical linguistics; (2) the study of the structure, development, etc., of a particular language and of its relationship to other languages, as the English language.’ When the international attendees of this conference were queried to find out how many spoke German, Russian, French, etc., the expected response was a serious show of hands. But, when the same group was asked how many “spoke” DNA, there was the predictable response of laughter.
https://doi.org/10.1142/9789814503655_0004
Many biological questions require reasoning with structure and function. We have begun by developing representations for structures which can be extended in a symmetric manner to functions. Each structure is represented as an arrangement of functional groups in a layered grammar. Expressing the function of these groups will then be equivalent to forming the antigerund of a noun. We illustrate these points with a biochemical example.
https://doi.org/10.1142/9789814503655_0005
To sequence all the genes in human DNA and analyze their functions, the Human Genome Project has been collecting a large body of data in data banks such as GenBank©. One of the important issues for computational biologists is the availability of tools which can provide easy and efficient access to these data banks. In this paper, we give a brief overview of different approaches to the design of a relational database interface. The paper concentrates on the design of GenEng,1 a dialogue-based natural language interface for information retrieval from the GenBank relational database.2
https://doi.org/10.1142/9789814503655_0006
The overlapping word paradox, known in combinatorics for 20 years, is to this day disregarded in many papers on DNA statistics. We consider Conway's equation for the best bet for simpletons as an example of the overlapping word paradox. We give a new short proof of Conway's equation and discuss the implications of the overlapping word paradox for DNA statistics. In particular, we demonstrate that ignoring the overlapping word paradox in DNA statistics can easily lead to 500% errors in estimates of statistical significance. We also present formulas allowing one to find ‘anomalous’ words in DNA texts.
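As an illustration of the overlap effect behind Conway's equation, here is a minimal Python sketch (not from the chapter) that computes the leading numbers and the resulting odds in the "best bet for simpletons", assuming independent, equiprobable letters; the function names are ours.

```python
# A minimal sketch (not from the chapter) of Conway's leading numbers and the
# odds they give in Penney's game, assuming independent, equiprobable letters.
def leading_number(x, y, q=2):
    """Sum of q**(k-1) over every k where the length-k suffix of x equals
    the length-k prefix of y (the overlap correlation of x against y)."""
    return sum(q ** (k - 1)
               for k in range(1, min(len(x), len(y)) + 1)
               if x[-k:] == y[:k])

def odds_second_beats_first(a, b, q=2):
    """Conway's equation: the odds that pattern b occurs before pattern a in a
    random stream are (AA - AB) : (BB - BA), written with leading numbers."""
    return (leading_number(a, a, q) - leading_number(a, b, q),
            leading_number(b, b, q) - leading_number(b, a, q))

if __name__ == "__main__":
    # Classic coin-flipping example: THH beats HHH with odds 7:1.
    print(odds_second_beats_first("HHH", "THH"))          # -> (7, 1)
    # The same overlap structure biases word statistics in DNA (q = 4 bases).
    print(odds_second_beats_first("AATT", "TTAA", q=4))   # -> (59, 59), an even race
```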
https://doi.org/10.1142/9789814503655_0007
The following sections are included:
https://doi.org/10.1142/9789814503655_0008
We review both theoretical and practical results of a linguistic approach to studying the structure of features of DNA sequences. Using generative grammars, complex assemblages can not only be described and analyzed abstractly, but also concretely, such that features can be searched for by a general-purpose parser. Our parser, called GENLANG, uses an extended logic grammar formalism and has found features as complex as tRNA genes, group I introns, and protein-encoding genes, within input sequences on a genomic scale.
https://doi.org/10.1142/9789814503655_0009
The classical triplet code is not the only code carried by the sequences. They contain, for example, the gene-splicing code, transcription codes and many other codes. By analyzing a large volume of the available nucleotide sequences, i.e., by performing various computer experiments with the sequences, one can decipher them and extract from them valuable biological information. At the DNA level there are at least two more codes — the DNA shape code and the chromatin code. The overall DNA shape is sequence-dependent and can be described by a set of angles characteristic of the various dinucleotide elements — the codons of the DNA shape code. The chromatin code provides instructions to histone octamers on where along the DNA to form the nucleosomes. This code is expressed as a positional periodicity of, primarily, AA and TT dinucleotides. A new RNA code has been described — the translation framing code. The frame seems to be maintained by a synchronizing pattern GCUGCUGCU… hidden in mRNA. Most enigmatic of all is, perhaps, the gene-splicing code. An interesting recent development indicates that the gene-splicing pattern in the sequences and the nucleosomal pattern have some common features. This has to do with the superposition of patterns that is characteristic of the sequence language in general, which carries many codes simultaneously in one and the same text. This results in an increased complexity of the sequences. Analysis of protein-coding sequence complexity in eukaryotes and in prokaryotes revealed that the former are simpler. This is interpreted as the result of a spatial separation of the triplet code (carried by exons) and the chromatin code (carried by introns). Perhaps the necessity of separating otherwise conflicting codes is one of the reasons why the intervening sequences were introduced at all. The nucleotide sequences are written in an unbroken manner. One way to detect “words” in such a continuous text is to evaluate the degree of internal correlation by calculating contrast values for the words. This technique allows one to derive vocabularies which are species- and function-specific. The nucleotide sequences thus carry numerous superimposed messages. We understand only a few of these messages, while many more are waiting for their turn to be deciphered.
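To make the chromatin-code signal concrete, the following Python sketch (ours, not the chapter's) tabulates the distances between AA dinucleotide occurrences; in real nucleosomal DNA such a spectrum shows peaks recurring at roughly 10-11 bp. The input sequence below is a placeholder.

```python
# A sketch (ours, not the chapter's) of probing the chromatin code: histogram
# the distances between AA dinucleotide starts; in nucleosomal DNA the counts
# peak at multiples of roughly 10.4 bp. The input sequence is a placeholder.
from collections import Counter

def dinucleotide_positions(seq, dinuc):
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]

def distance_spectrum(seq, dinuc="AA", max_dist=50):
    """Count ordered pairs of dinucleotide occurrences at each separation."""
    pos = dinucleotide_positions(seq.upper(), dinuc)
    spectrum = Counter()
    for i, p in enumerate(pos):
        for q in pos[i + 1:]:
            if q - p > max_dist:
                break
            spectrum[q - p] += 1
    return spectrum

if __name__ == "__main__":
    seq = "AATTGCAGCTAAGTCGAATTCGAA" * 40   # placeholder, not real chromatin DNA
    spec = distance_spectrum(seq, "AA")
    for d in range(2, 25):
        print(d, spec.get(d, 0))
```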
https://doi.org/10.1142/9789814503655_0010
A major goal of our laboratory is to identify as completely as possible the expressed gene complement of the human brain. Toward this end, we have collected partial sequence data (expressed sequence tags, ESTs) from over 6000 human brain cDNA clones using automated fluorescence-based sequencers. Over 80% of these sequences represent genes not previously described in humans and up to one third may represent coding sequences with no matches in the existing databases. Database searches have identified several hundred clones with significant similarity to known genes, including novel genes similar to Notch/Tan-1 and the Drosophila discs-large tumor suppressor, a new neurotransmitter transporter gene, a new member of the multidrug resistance gene family, new members of the Ca++−ATPase, ADP-ribosylation factor, alpha-actinin, and neural cell adhesion molecule gene families, and at least seven new C2-H2 zinc finger proteins. Various brain libraries contain an excellent diversity of clones; with judicious screening of a relatively small number of highly and moderately represented sequences, large numbers of different genes can be tagged. Thus, normalization is not required at this stage to reduce the redundancy of sequencing. We have constructed a relational EST database for storage and integration of cDNA sequence analysis data (see abstract by Fields et al.). EST sequence data and cDNA clones are available to researchers through GenBank and the American Type Culture Collection.
https://doi.org/10.1142/9789814503655_0011
A variant of sequencing by hybridization (SBH) is under development with a potential to inexpensively determine up to 100 million base pairs per year. The method comprises five experimental steps: 1) arraying short clones in 864-well plates; 2) growth of the M13 clones or PCR of the inserts; 3) automated spotting of DNAs by corresponding pin-arrays; 4) hybridization of dotted samples with 200-3000 32P- or 33P-labeled 6- to 8-mer probes; and 5) scoring hybridization signals using storage phosphor plates. The method opens up intriguing possibilities for genome analysis. Some 200 7- to 8-mers can provide an inventory of the genes if cDNA clones are hybridized, or can define the order of 2-kb genomic clones, creating physical and structural maps with 100-bp resolution; the distribution of G+C, LINEs, SINEs, and gene families would be revealed. cDNAs that represent new genes and genomic clones in regions of interest selected by SBH can be sequenced by a gel method. Uniformly distributed clones from the previous step (20% of all) will be hybridized with 2000–3000 6- to 8-mers. As a result, approximately 50-60% of the genomic regions containing members of large repetitive and gene families and those families represented in GenBank would be completely sequenced. In the less redundant regions, every base pair is expected to be read with 3-4 probes, but the complete sequence can not be reconstructed. Such partial sequences allow the inference of similarity and the recognition of coding, regulatory, and repetitive sequences, as well as study of the evolutionary processes all the way up to the species delineation. Targeted gel sequencing with up to 10% error can be effectively used to complete sequences of genomic segments more than 70% similar to the treated ones. A 1000-bp read from a single gel strip would be sufficient to complete several thousand base pairs of sequence generated by duplications or present in genomes of closely related species. More interestingly, partial sequences generated with the same probes on three to four 70– to 90%-similar genomes may mutually complete each other, minimizing additional experimental data. This SBH variant can effectively fill a gap between expensive 300- to 600-bp runs on sequencing gels and the 10- to 100-kb resolution of the presently available mapping techniques.
https://doi.org/10.1142/9789814503655_0012
This paper describes the mathematical and computational techniques used at Généthon to obtain within one year a YAC contig map covering more than 50% of the whole human genome. The fingerprinting approach used has already yielded more than 1,000 contigs totalling more than 3,600 clones, after 16,896 clones from the CEPH YAC library have been analysed. The resulting map will be a powerful tool for the identification of unknown genes, particularly those responsible for genetic diseases.
https://doi.org/10.1142/9789814503655_0013
Gene mapping assigns chromosomal coordinates to genetic loci based on analysis of fragmentary ordering and metric data. In assembling genetic maps, geneticists use rules of inference to derive new facts about order and distance between loci from experimentally derived conclusions about order and distance. They construct comprehensive maps by merging related sets of data and resolving conflicts between them. In this article we describe software which formalizes and automates some of these rules of inference to yield a useful map construction utility called CPROP.
https://doi.org/10.1142/9789814503655_0014
This article presents an application of the simulated annealing algorithm used at Généthon for the STS-content map of chromosome 21. This algorithm is part of an integrated system which starts from PCR gel analysis and produces ordered contigs that can be handled with a graphical user interface. For this project, 250 STSs have been used to screen a 14-genome-equivalent YAC library. The result is a map of the whole long arm of chromosome 21 (21q). This map contains 210 STSs and 770 YACs and covers a 45 megabase region with an average resolution of 1 STS per 220 kb.1 The order obtained by simulated annealing is consistent both with genetic data and with other methods of physical mapping (RCRF1, Alu-PCR). This map will be a powerful tool for gene analysis, especially in the study of Down syndrome and Alzheimer's disease.
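A minimal sketch of the underlying optimization, assuming hypothetical clone-by-STS incidence data rather than the Généthon pipeline: simulated annealing searches for an STS order that minimizes breaks in each YAC's run of positive hits (the consecutive-ones criterion).

```python
# A minimal sketch (not the Généthon implementation) of ordering STS markers
# by simulated annealing. Data below are hypothetical.
import math
import random

def cost(order, clones):
    """Total number of internal gaps across clones for a given STS order."""
    pos = {sts: i for i, sts in enumerate(order)}
    total = 0
    for hits in clones:
        idx = sorted(pos[s] for s in hits)
        # each jump > 1 between consecutive hit positions is a gap
        total += sum(1 for a, b in zip(idx, idx[1:]) if b - a > 1)
    return total

def anneal(stss, clones, steps=20000, t0=2.0, alpha=0.9995, seed=0):
    rng = random.Random(seed)
    order = list(stss)
    rng.shuffle(order)
    best = cur = cost(order, clones)
    best_order, t = list(order), t0
    for _ in range(steps):
        i, j = sorted(rng.sample(range(len(order)), 2))
        order[i:j + 1] = reversed(order[i:j + 1])        # propose a segment reversal
        new = cost(order, clones)
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            cur = new
            if cur < best:
                best, best_order = cur, list(order)
        else:
            order[i:j + 1] = reversed(order[i:j + 1])    # undo the move
        t *= alpha
    return best_order, best

if __name__ == "__main__":
    stss = list("ABCDEFGH")                               # hypothetical markers
    clones = [set("ABC"), set("BCD"), set("DEF"), set("FGH"), set("EFG")]
    order, gaps = anneal(stss, clones)
    print("order:", "".join(order), "gaps:", gaps)
```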
https://doi.org/10.1142/9789814503655_0015
The following sections are included:
https://doi.org/10.1142/9789814503655_0016
In order to improve our ability to simulate the complex behavior of polymers, we have introduced a new “two-space” algorithm that is well suited to parallel processing. This algorithm can simulate the dynamics of abstract polymers and is suitable for modeling a variety of polymeric systems including dilute, dense, confined and grafted polymers. A medium such as a model of a gel may be implemented through the initial conditions without change in the algorithm. Using the two-space algorithm, the microscopic behavior of DNA during electrophoretic separation may be simulated in the diffusive regime. The diffusive motion of long polymers in a medium is simulated using microscopic Monte-Carlo steps. We describe preliminary simulations of polymers migrating under an external field through a random medium of obstacles in 2-dimensions. Two sequences of simulations are performed with different obstacle densities corresponding to pore sizes larger and smaller than the polymer radius of gyration. In the dilute medium, polymers are characteristically draped on single obstacles. In the denser medium, draping across multiple obstacles results in reduced orientation in the field direction. Simulations of 90° field direction switching at different rates demonstrate the reorientation time and the influence of field pulse duration. The preliminary simulations were performed on a Cellular Automaton Machine, CAM-6, and further investigations are to be performed on the newly completed CAM-8, as well as other massively parallel computers.
https://doi.org/10.1142/9789814503655_0017
We are presently sequencing the entire genome of Mycoplasma capricolum, one of the smallest free-living organisms, by a Multiplex Genomic Walking strategy. This technique involves the repetitive hybridization of sequencing membranes with oligonucleotide probes to acquire sequence data in discrete steps along the genome. The technique allows one to walk a genome in a directed manner, eliminating the problems associated with random shotgun assembly. Furthermore, the repetitive stripping and hybridization process is relatively simple to reproduce and has the potential to be easily automated. The Genetic Data Environment (GDE), an X Windows-based graphical user interface, has allowed the seamless integration of a core multiple sequence editor with pre-existing external sequence analysis programs and internally developed programs into a single prototypic environment. This system has facilitated linkage of the Harvard Genome Lab's internal database and automated data control systems into one graphical user interface which can handle the archiving and analysis of both random fluorescent sequencing data and genomic walking data from the Mycoplasma project. Finally, it has facilitated the integration of the genomic sequence data into a PROLOG database environment for the comparative analysis of Mycoplasma capricolum and other organisms.
https://doi.org/10.1142/9789814503655_0018
The human genome project has greatly stimulated the advancement of techniques to sequence large fragments of DNA. The development of improved molecular methods has also simplified the process of comparing shorter, homologous DNA sequences from different individuals and species. This process of ‘re-sequencing’ DNA has applications in medical genetics, in evolutionary studies, and for the identification of complex molecular variation that may explain multifactorial traits. Intrinsic differences in the processes of ‘sequencing’ and ‘re-sequencing’ suggest new requirements for data management tools. A data management scheme for a ‘re-sequencing’ project is demonstrated using the Virtual Notebook System, a flexible multi-user tool designed as a metaphor of the laboratory notebook.
https://doi.org/10.1142/9789814503655_0019
The success of an effort to produce an overlapping set of clones covering all of a complex genome depends, to a large degree, on the efficiency of detecting overlaps. Two kinds of approach exist for this purpose. The first involves a quantitative and statistical matching of restriction fragments, where the more extensive the overlap or the better the accuracy of fragment sizing, the more confident is the assignment. The second approach is more qualitative in nature, involving the use of distinct, unique reference points which, when found to be shared between clones, demonstrate an overlap between them. My colleagues and I have developed such a landmark-based overlap detection strategy, and we have used it to complete a map of the 4.1 Mbp genome of the archaeon Haloferax volcanii. A second map, that of Halobacterium sp. GRB (2 Mbp), is nearly complete. In mapping genomes of this size using our method, there is but a minimal need for computer assistance. For larger, more complex genomes, the method is amenable to less tedious, more computer-intensive analysis. With the business of detecting overlaps simplified, finding solutions to the bigger problems in mapping, such as cloning the unclonable, can now occupy the bulk of one's effort.
https://doi.org/10.1142/9789814503655_0020
Protein coding and non-coding regions of the DNA primary structure can be represented by non-homogeneous and homogeneous Markov chain models, respectively. These models can be employed by an algorithm predicting gene locations in a newly sequenced DNA. The key notion of this algorithm is the a posteriori probability of protein coding function for a given fragment of DNA sequence. We use Markov chain models from the first through the fifth order for the calculation of this probability. The parameters of the non-homogeneous and homogeneous Markov chain models have been derived from a training set of 479,589 bp of coding and 245,307 bp of non-coding prokaryotic (E. coli) DNA sequences. The predictive accuracy of the method has been determined on a control set of 373,845 bp of coding and 131,538 bp of non-coding E. coli DNA sequences. For instance, the version of the algorithm that employs fourth order Markov chain models gives a 10.0% false negative rate and a 25.2% false positive rate when coding function is identified for fragments 96 bp in length.
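The scoring idea can be sketched as follows; this is our simplified illustration, not the authors' code, and it uses homogeneous models for both classes (the chapter's coding model is non-homogeneous, i.e., phase-dependent). Training strings are placeholders.

```python
# A simplified illustration (not the authors' code) of the Bayesian scoring:
# k-th order Markov models trained on coding and non-coding DNA are combined
# by Bayes' rule into a posterior probability of coding function.
from collections import defaultdict
from math import exp, log

def train_markov(seqs, k):
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        s = s.upper()
        for i in range(k, len(s)):
            counts[s[i - k:i]][s[i]] += 1
    model = {}
    for ctx, nxt in counts.items():
        total = sum(nxt.values()) + 4.0                      # add-one smoothing
        model[ctx] = {b: (nxt.get(b, 0.0) + 1.0) / total for b in "ACGT"}
    return model

def log_likelihood(seq, model, k):
    return sum(log(model.get(seq[i - k:i], {}).get(seq[i], 0.25))
               for i in range(k, len(seq)))                  # back off to uniform

def posterior_coding(frag, coding_model, noncoding_model, k, prior=0.5):
    lc = log_likelihood(frag.upper(), coding_model, k)
    ln = log_likelihood(frag.upper(), noncoding_model, k)
    return 1.0 / (1.0 + (1.0 - prior) / prior * exp(ln - lc))

if __name__ == "__main__":
    coding = ["ATGGCTGCTAAAGCTGCTGGTGCT" * 5]                 # placeholder training data
    noncoding = ["TTATATAAATTTATATTAATATAT" * 5]
    mc, mn = train_markov(coding, 2), train_markov(noncoding, 2)
    print(round(posterior_coding("GCTGCTAAAGCTGCTGGT", mc, mn, 2), 3))
```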
https://doi.org/10.1142/9789814503655_0021
A massively parallel computing system is one tool that has been adopted by researchers in the Human Genome Project. This tool is one of many in a toolbox of theories, algorithms, and systems that are used to attack the many questions posed by the project. A good tool functions well when applied alone to the problem for which it was devised. A superior tool achieves its solitary goal, and supports and interacts with other tools to achieve goals beyond the scope of any individual tool. It is our thesis that Intel's massively parallel Paragon™ XP/S system is a superior tool. This paper presents specific requirements for a superior computing tool for the Human Genome Project (HGP) and shows how the Paragon system addresses these requirements.
Computing requirements for HGP are based on three factors:
1. computing requirements of algorithms currently used in sequence homology, protein folding, and database insertion/retrieval;
2. estimates of the computing requirements of new applications arising from evolving biological theories; and
3. the requirements for facilities that support collaboration among scientists in a project of this magnitude.
The Paragon system provides many hardware and software features that effectively address these requirements. These features include high-performance RISC processors for compute-intensive applications, a scalable architecture for increasing problem sizes, flexible resource management for ease of use and machine sharing, adherence to communication standards, and remote access facilities for interactive and collaborative work with applications and researchers at remote facilities.
The power of these system features is exemplified by the results of applications in sequence alignment, sequence analysis, and molecular phylogeny, developed by computer scientists at Argonne National Laboratory and biologists at the University of Illinois, and executed on the massively parallel Intel Touchstone Delta system, a one-of-a-kind prototype of the Paragon XP/S system.
https://doi.org/10.1142/9789814503655_0022
There is a growing need for a well-designed database system for searching and analyzing nucleotide sequence data. We developed the Overlapping Oligonucleotide Database for Signal Sequence Search (ODS), the first relational database that integrates information on biological features into the search for signal sequences. Furthermore, based on it, a deductive database system to search and analyze nucleotide sequence data was developed. A deductive database system is a next-generation database system that contains an inference system. Database queries in ODS are described in both SQL and logical rules. These queries are simple even for molecular biologists who are not expert computer programmers. In particular, queries based on logical rules are declarative and more powerful than those of SQL. Recursive rules are suitable for examining secondary structures of nucleotide sequences. In our analysis of TfR's IRE, we noted five stem-and-loop structures.
https://doi.org/10.1142/9789814503655_0023
Physical map assembly typically begins with a number of pairwise relationships between clones, and from these produces an overall arrangement of the clones. When there are only a few clones, an investigator can keep in mind all of the relevant data, and can weigh the evidence to produce a map that fits all the experimental results reasonably well. Today, however, it is common to build maps with thousands of clones and millions of pairwise relationships. Computer aided map assembly is thus required. Current computer algorithms typically use only a small fraction of available experimental results, and sometimes fail to deal adequately with inconsistency in the data. The assembly problem is here framed as optimizing a map to fit all the experimental data, and a genetic algorithm to search for optimal maps is described (in genetic algorithms, possible solutions to a problem are treated as individuals in an evolving population). The method has been used to construct or improve ordered clone maps for large parts of human chromosome 16.
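A minimal sketch of the genetic-algorithm idea, with hypothetical clones and overlap weights (not the authors' implementation): individuals are clone orders, and fitness rewards placing strongly supported overlaps adjacently.

```python
# A minimal sketch (not the authors' system) of a genetic algorithm for map
# assembly. Overlap weights below are hypothetical.
import random

def fitness(order, weights):
    """Sum of pairwise overlap evidence between adjacent clones."""
    return sum(weights.get(frozenset((a, b)), 0.0) for a, b in zip(order, order[1:]))

def order_crossover(p1, p2, rng):
    """Copy a slice from one parent, fill the rest in the other parent's order."""
    i, j = sorted(rng.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[i:j] = p1[i:j]
    rest = [c for c in p2 if c not in child]
    for k in range(len(child)):
        if child[k] is None:
            child[k] = rest.pop(0)
    return child

def evolve(clones, weights, pop_size=60, generations=300, seed=0):
    rng = random.Random(seed)
    pop = [rng.sample(clones, len(clones)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, weights), reverse=True)
        next_pop = pop[:10]                                   # elitism
        while len(next_pop) < pop_size:
            a, b = rng.sample(pop[:30], 2)                    # parents from the fitter half
            child = order_crossover(a, b, rng)
            if rng.random() < 0.3:                            # mutation: swap two clones
                x, y = rng.sample(range(len(child)), 2)
                child[x], child[y] = child[y], child[x]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=lambda ind: fitness(ind, weights))

if __name__ == "__main__":
    clones = list("ABCDEF")
    weights = {frozenset(p): w for p, w in
               [("AB", 5), ("BC", 4), ("CD", 5), ("DE", 3), ("EF", 4), ("AC", 1)]}
    print(evolve(clones, weights))
```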
https://doi.org/10.1142/9789814503655_0024
We are developing a computer representation of pathways of intermediary metabolism in cells. We have established a database of enzymes and associated information including reactants, cofactors, and subcellular reactant space. This information is maintained under dBASE, to which we have recently added a related database of enzyme inhibitors. The original programs were coded in dBASE III+. We have now developed a graphical user interface using MS-Windows and ‘C’ to traverse the metabolic network. The structural formulae of pathway substrates and of inhibitors are now displayed in a window, and associated information is displayed in other windows in the program. It is now possible to step through pathways, store trip logs, bring up ancillary information on each step, and maintain a ‘scratch pad’. We have also formulated a simple method to handle the myriad of cell types present in the biological kingdom: a single field database consisting of only the key field of the main database (Enzyme Commission number) serves as a mask to create a subset database. Future development includes more complete population of the databases, new cross search features for the information, and new graphical representations of layers of the metabolic pathways. The database representation will also be used as the underlying structure of quantitative metabolic models.
https://doi.org/10.1142/9789814503655_0025
One of the goals of any large scale DNA sequencing project is to understand the molecular details about the metabolic control sites that will be found in the sequence of the chromosome region being studied. In addition, once an interesting observation has been made, questions will quickly arise concerning the distribution of such sites within the genome and how well the same observations hold between related species.
This paper will discuss our approach toward building a flexible analysis environment that facilitates the analysis of genomic sequence data. The Integrated Genomic Database (IGD),5 developed by Ray Hagstrom, Ross Overbeek, Morgan Price and Dave Zawada at the Argonne National Laboratory, organizes genome mapping and sequencing data to provide a global chromosome view for multiple genomes. We describe here our use of the IGD system and how we employ it for relational analysis of sequence features that are found distributed throughout the genome under study. The primary goal of this work is to provide a system to support research on the global organization of genomic regulation patterns.
https://doi.org/10.1142/9789814503655_0026
Like many other domains of research and application, scientific research requires the use of databases and the support of database management systems. The data characteristics and access requirements of scientific databases are quite different from those of business databases. Clearly, existing database management systems, which were mainly developed for business applications, do not provide scientists with adequate modeling, processing, and analysis tools and capabilities to meet their database needs. This paper introduces an object-oriented knowledge base management technology which has a number of desirable features. First, an object-oriented semantic association model, OSAM*, provides general structural constructs to model complex objects and their various types of semantic associations. It also allows the user to define the behavioral properties of objects through user-defined operations and knowledge rules, which results in an active knowledge base management system (KBMS). Second, a pattern-based query language, OQL, allows complex search conditions and constraints to be easily specified. Third, a set of intelligent graphical interface tools greatly eases scientists' tasks in defining and querying complex knowledge bases. Fourth, the system can be extended to meet the changing requirements of applications by extending the modeling capabilities of the data model, and by modifying the structure of system components. Lastly, the efficiency of processing large knowledge bases is achieved by using a transputer-based multiprocessor system and some multi-wavefront parallel processing algorithms. A prototype KBMS with the above features has been developed which runs on IBM and SUN workstations.
https://doi.org/10.1142/9789814503655_0027
The following sections are included:
https://doi.org/10.1142/9789814503655_0028
A neural network classification method has been developed as an alternative approach to the large database search/organization problem. The system, termed Protein Classification Artificial Neural System (ProCANS), is implemented on a Cray Y-MP8/864 supercomputer for rapid superfamily classification of unknown proteins based on the information content of the neural interconnections. The system employs an n-gram hashing function for sequence encoding and modular back-propagation networks for classification. The system was developed with the first 2,724 entries in 690 superfamilies of the annotated PIR (Protein Identification Resource) protein sequence database. Three prediction sets were used to evaluate the system performance. The first consists of 651 annotated entries randomly chosen from the 690 superfamilies. The second set consists of 482 unclassified entries from the preliminary PIR database, whose superfamilies were identified by the fasta, blastp and sp database search methods. The third set is a subset of data set 2 with only superfamilies of more than 20 entries. At a low cut-off score of 0.01, the sensitivity is 92, 82 and 100%, respectively, for the three prediction sets. At a high cut-off score of 0.9, on the other hand, close to 100% specificity is achieved with a reduced sensitivity. The classification accuracy is determined by three factors: the degree of similarity, the sequence length, and the size of the superfamily. The classification on neural nets is fast (i.e., less than 0.5 Cray CPU second per sequence on a full-scale system). The speed would not be constrained by database size because the search time grows with the number of superfamilies, which is likely to remain low. Therefore, ProCANS can be used as a filter program to provide a reduced search space and speed up database searches. The rapid superfamily identification provided by ProCANS would be particularly valuable to the organization of protein sequence databases and to gene recognition in large sequencing projects. A current extension to ProCANS is the incorporation of motif information to further improve its sensitivity. The design concept has also been applied to the classification of nucleic acid sequences. A preliminary result showed a 96% accuracy for 16S ribosomal RNA classification. The software tool is generally applicable to any second generation databases that are organized according to family relationships.
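The n-gram encoding step can be illustrated with a short sketch (ours, not ProCANS): a protein sequence becomes a fixed-length vector of n-gram counts that can feed a back-propagation classifier.

```python
# A minimal sketch (not ProCANS itself) of n-gram sequence encoding: each
# sequence maps to a fixed-length count vector indexed by all n-grams over a
# chosen alphabet, suitable as input to a feed-forward classifier.
from itertools import product

def ngram_vector(seq, n=2, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    index = {"".join(g): i for i, g in enumerate(product(alphabet, repeat=n))}
    vec = [0.0] * len(index)
    seq = [c for c in seq.upper() if c in alphabet]
    for i in range(len(seq) - n + 1):
        vec[index["".join(seq[i:i + n])]] += 1.0
    total = max(sum(vec), 1.0)
    return [v / total for v in vec]          # normalize so length does not dominate

if __name__ == "__main__":
    v = ngram_vector("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")  # hypothetical fragment
    print(len(v), max(v))                    # 400 bigram features for 20 letters
```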
https://doi.org/10.1142/9789814503655_0029
We have used a large scale backpropagation neural network simulator, BigNet, running on Cray 2 and X-MP machines, to learn, recall and predict protein structures from sequence. In the present study, we extended previous work with a revised training/testing set (20 training, 4 testing) and a more detailed analysis of BigNet's operation. We describe an enhanced training development environment and new methods, including training data preprocessing and sequence shifting, that maximize generalization and minimize artifacts in distance matrices produced by the neural network. The results demonstrate improved learning and generalization performance relative to previous reports. The trained network produced good predictions of distance matrices when presented with novel sequence data from proteins homologous to proteins in the training set.
https://doi.org/10.1142/9789814503655_0030
Artificial neural networks have proven to be a useful technique for analyzing biological data and automatically producing accurate pattern recognizers. However, most applications of neural networks have not taken advantage of existing knowledge about the task at hand. This paper presents a method for using such problem-specific knowledge. The KBANN algorithm uses inference rules about the current biological problem, which need only be approximately correct, to initially configure a neural network. This network is then refined by analyzing sample examples and counterexamples of the concept being learned. An application of KBANN to the prediction of E. coli transcriptional promoters demonstrates its superiority to alternative techniques, taken from both the machine-learning and molecular biology literatures. In addition, since KBANN uses a human comprehensible “theory” of the current problem to define the initial neural network topology, it is possible to extract a refined set of inference rules following training. A refined theory for promoter recognition is presented; the extracted rules are roughly as accurate on novel data as the trained neural network from which they came.
https://doi.org/10.1142/9789814503655_0031
Spatial structures, characterized by invariant geometric feature vectors, can be learned by neural networks or other statistical methods. The protein folding problem is formulated as a learnable mapping from amino acid sequences to invariant geometric feature vectors which are calculated from atomic coordinates of the polypeptide backbone and side chains. This extends to tertiary structures directly the usual neural network method of a mapping from amino acid sequences to secondary structures: α-helix, β-strand, and random coil.
https://doi.org/10.1142/9789814503655_0032
Studies of homolog evolution and interpretation of mutational patterns are useful approaches for investigating the structural and functional information contained in sequences. To increase the accuracy and reliability of these approaches, a systematic comparative analysis of the evolutionary modes of sequence families is needed. The first step in such an analysis is the compilation of possible families from databases. The goal of this work is to develop algorithms and software for compiling homologous genes coding for proteins from the GenBank database. Stages in the database compilation are described, and the resulting database is used for studying evolutionary modes of gene families.
https://doi.org/10.1142/9789814503655_0033
An algorithm for contig joining which can be used in programs for primary analysis of the results of automatic DNA sequencing is described. It permits us to find the subset of contigs which can be joined into one sequence and then to reconstruct the consensus sequence, which provides the best choice of bases for the regions of contig overlap.
https://doi.org/10.1142/9789814503655_0034
A fast algorithm for multiple sequence alignment based on new approaches to tree construction and sequence comparison is suggested. We developed a version of the pairwise sequence alignment algorithm,1 which was based on analysis of DOT matrix Diagonal fragments (Df) followed by joining of significant Dfs in the final alignment. The algorithm maintains some methodological features of Needleman-Wunsch (NW) type algorithms and uses statistical estimations of similarity of various Dfs. The estimations were entered into a compact “competition matrix” (CM). Homology of sequence positions for multiple alignment was changed to homology of the corresponding rows in the aligned subsets. In addition, instead of one-iteration filling of the CM by Df information, a multi-iteration method was suggested. We assumed that the minimal length of Df used for each iteration must be selected so that the probability of occurrence of homologous subsequences of a given length, by chance, would be low. On the basis of these significant Dfs we reconstructed an initial rough alignment. In the next iteration, calculations were repeated for all gaps in the previous alignment, where no significant homology between aligned sets was fixed. For each such gap, its length was used for the estimation of a new minimal Df length. The method of multiple alignment presented here also uses a new approach for tree reconstruction based on the analysis of the relatively conserved oligonucleotides in a given set. This approach has some advantages compared to traditional methods of phylogenetic tree reconstruction, which, as a rule, attribute a definite weight to mutations independent of their location along the sequence. The tree construction is a very fast procedure which does not require preliminary alignment of sequences.2 The method was tested on 5S RNA sequences and its application to contig joining is discussed.
https://doi.org/10.1142/9789814503655_0035
Currently-available software tools are capable of predicting the locations of most protein-coding genes in anonymous genomic DNA sequences. The use of predicted exons to select primers for PCR amplification from cDNA libraries allows the complete structures of novel genes to be determined efficiently. As the number of expressed sequence tag (EST) sequences increases, the fraction of genes that can be localized in genomic sequences by searching EST databases will rapidly approach unity. The challenge for automated DNA sequence analysis is now to develop methods for accurately predicting gene structure and alternative splicing patterns. Substantially improving current accuracies in gene structure prediction will require retrospective comparative analysis of sequences from different organisms and gene families.
https://doi.org/10.1142/9789814503655_0036
In the interest of providing more efficient computer-based analysis of DNA and protein sequences, Cray Research has developed a high performance implementation of the sequence alignment method of Needleman and Wunsch using the programming technique of pocket arithmetic. The basis for this implementation is the program SEQHDP, which finds locally homologous subsequences of a protein sequence pair and determines the statistical significance of the homology. Pocket arithmetic takes advantage of the 64-bit width of an operand on the Cray Y-MP by packing more than one integer value per word, then performing logical or integer operations on the packed word to yield multiple results per operation. This technique, in combination with the vector processing capabilities of the Cray Y-MP CPU, produces substantially improved performance over the conventionally coded version of the same algorithm. We will introduce the programming technique of pocket arithmetic, then describe its implementation in the Needleman-Wunsch sequence comparison function in SEQHDP. Performance results based on actual protein sequence comparisons are presented.
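The packing idea can be illustrated in a few lines; this is a generic sketch of packed ("pocket") integer arithmetic, not the Cray SEQHDP code, and it assumes lane sums always fit within the lane width so no carries cross lane boundaries.

```python
# A minimal, generic sketch of "pocket" (packed) arithmetic: several small
# non-negative integers share one 64-bit word, so one addition updates all
# lanes at once. Lane width and overflow handling are simplified.
LANES, WIDTH = 4, 16
MASK = (1 << WIDTH) - 1

def pack(values):
    assert len(values) == LANES and all(0 <= v <= MASK for v in values)
    word = 0
    for v in reversed(values):
        word = (word << WIDTH) | v
    return word

def unpack(word):
    return [(word >> (WIDTH * i)) & MASK for i in range(LANES)]

def packed_add(a, b):
    """One addition updates all lanes, provided each lane's sum fits in WIDTH bits."""
    return (a + b) & ((1 << (LANES * WIDTH)) - 1)

if __name__ == "__main__":
    a = pack([3, 7, 1, 250])          # e.g. four cells of a score matrix row
    b = pack([2, 5, 9, 4])            # four match/mismatch increments
    print(unpack(packed_add(a, b)))   # -> [5, 12, 10, 254]
```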
https://doi.org/10.1142/9789814503655_0037
Patterns of short oligonucleotide distribution within DNA and RNA functional sites have been analysed using the “Site-Video” computer system. The group of DNA functional sites comprised nucleosome binding sites, gyrase cleavage sites, and promoters of E. coli and man. The group of RNA functional sites comprised donor and acceptor splice sites of man, translation initiation sites of E. coli and man, and translation frameshift sites. For each type of site, a specific set of oligonucleotide distribution patterns important for functioning and recognition has been revealed. At the same time, the number of specific patterns revealed in RNA sites was significantly higher than in DNA sites. On the basis of the results obtained, a scenario for the evolutionary emergence of functional sites is proposed. According to it, two types of context feature selection took place: (1) positive selection, leading to the appearance of definite types of context features in particular regions of functional sites; and (2) negative selection, leading to the elimination of definite types of context features in particular regions of functional sites. We suppose that the evolutionary formation of any functional site is a multistep process realized via a combination of positive and negative selections. Negative selection, via fixation of a specific pattern of mutations, eliminates false signals for regulatory proteins binding to the functional site. Positive selection leads to the appearance of local context features (signals) which provide for the specificity and efficiency of site functioning.
https://doi.org/10.1142/9789814503655_0038
GRAIL is a comprehensive system being constructed to analyze and characterize the genetic structure of DNA sequences. A number of program modules supply information to the system including the Coding Recognition Module (CRM), which forms the basis of the current e-mail GRAIL server system. Additional modules determine the positions and scores of possible splice junctions, the positions of potential translation-initiation sites, the coding strand for each gene, and the probable-translation-frame function over the sequence. The Gene Assembly Program module (GAP) attempts to predict the sequence of the spliced mRNA for a gene from the genomic DNA sequence. It constructs and scores gene models, given a DNA sequence and the outputs of the other GRAIL modules for the sequence. GAP tests combinations of those splice junctions which are within acceptable distance from the initial predicted edges of the coding regions. Every complete gene model, comprising translation-initiation site, splice junctions and stop codon, which agrees with GAP's set of rules is scored, and the ten highest-scoring models are saved. Each gene model's score depends on the input scores of splice junctions used in the model, their positions relative to the initial predicted edges of the included coding regions, and the degree of agreement of the entire model with the probable-translation-frame function. If error conditions are detected, the present version of GAP attempts to correct them by the insertion and/or deletion of one or more coding regions. These insertions and deletions have resulted in a net improvement of gene models, and a particularly large improvement in the recognition and characterization of very short coding regions. The results of GRAIL including the GAP module for 26 sequences from GenBank, each with an experimentally characterized gene, are quite promising and demonstrate the feasibility of constructing largely accurate gene models strictly on the basis of DNA sequence data.
https://doi.org/10.1142/9789814503655_0039
The availability of new ‘non-Von Neumann’ hardware architectures, and the consequent speed-up in performance, permits the development and implementation of more efficient algorithms for massive analysis of biosequences. At present, one of the most successful classes of new computational methodologies is artificial neural networks. In this paper we present an application of Kohonen's Self-Organizing Maps (KFM) to the recognition of uncommon domains in cDNA sequences using an FPS 500 EA vector computer.
https://doi.org/10.1142/9789814503655_0040
One of the major theoretical concerns associated with the Human Genome Project is that of the methodology to decipher “raw” sequences of DNA. This work is concerned with a subsequent problem, that of how the huge amounts of already deciphered information that will emerge in the near future can be integrated in order to enhance our biological understanding. The formal foundations for a linguistic theory of the regulation of gene expression will be discussed. The linguistic analysis presented here is restricted to sequences with known biological function since: i) there is no way to obtain, from DNA sequences alone, a regulatory representation of transcription units, and ii) the elements of substitution, methodologically equivalent to phonemes, are the complete sequences of the binding sites of proteins.
We have recently collected and analyzed the regulatory regions of a large number of E. coli promoters. The collection of sigma 70 promoters studied may well represent the largest homogeneous body of knowledge of gene regulation at present. This collection is a data set for the construction of a grammar of the sigma 70 system of transcription and regulation. This grammatical model generates all the arrays of the collection, as well as novel combinations predicted to be consistent with the principles of the data set. This grammar is testable, as well as expandable if the analysis of emerging data requires it. The elaboration of a linguistic methodology capable of integrating prokaryotic data constitutes a preliminary step towards the analysis and integration of the more complex eukaryotic systems of regulation.
https://doi.org/10.1142/9789814503655_0041
A new agarose gel model is introduced, which corresponds to what we believe agarose gels look like microscopically. While the scientific literature is filled with studies of the microscopic structure of agarose, the fact remains that there is no unambiguous and exact model of its underlying structure. Given this, we are left to construct our own model numerically.
https://doi.org/10.1142/9789814503655_0042
Producing a structure/function prediction from amino acid sequence is currently a major problem in sequence interpretation. We have developed a visual method to display and demonstrate conserved structural domains in protein and nucleic acid sequences, and have used this method to describe structural conservation in vertebrate lipase families3 and to define structural domains for subsequent database searching. We have developed strategies to search the databases using previously defined consensus domains in order to derive an empirical basis for the determination of higher order structures in proteins. This method was used successfully to define important structurally conserved regions in proteins.
https://doi.org/10.1142/9789814503655_0043
Several computer programs now available will predict exons based upon naive genomic sequence data, but they generally require access to a UNIX workstation or e-mail access to the Internet. We have developed a program, called SORFIND, which predicts vertebrate internal exons at 5 different confidence levels and which runs on an IBM-PC computer. The program reads sequence data in several formats, identifies “spliceable open reading frames” (SORFs) possessing high consensus scores with known acceptor and donor splice junctions, and analyzes codon usage. Potential exons are filtered through successive stages; in a data set of 130 human genes this results in the identification of 89.6% of the internal exons greater than 60 base pairs in length (62.5% predicted with exact splice junctions and reading frame, and a further 27.1% predicted with at least one exact splice junction and an average 77.3% overlap with true internal exons). Specificity (the percentage of SORFs that either completely or partially match a true exon) is 91.8%, 90%, 75.5%, 53.2% and 38.4% for the combined confidence levels 1, 1 and 2, 1 to 3, 1 to 4 and 1 to 5, respectively. The program's output displays nucleotide position, confidence level, reading frame phase at the 5′ and 3′ ends, acceptor and donor sequences and scoring statistics. It also generates an amino acid translation which can be used in protein database homology searches. The program compares favourably with the CRM module of GRAIL and with the GeneID program on an analysis of a 105 kilobase contig from human chromosome 4. It also successfully predicts exons from other vertebrates.
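Splice-junction consensus scoring of the kind SORFIND relies on is commonly done with a position weight matrix; the sketch below (ours, with a toy donor-site training set) shows the log-odds scoring step.

```python
# A minimal sketch (not SORFIND) of scoring a candidate donor splice junction
# with a position weight matrix built from aligned known sites. The tiny
# training set below is hypothetical; real tables come from curated databases.
from math import log

def build_pwm(sites, pseudocount=1.0):
    length = len(sites[0])
    pwm = []
    for j in range(length):
        col = [s[j] for s in sites]
        freqs = {b: (col.count(b) + pseudocount) / (len(sites) + 4 * pseudocount)
                 for b in "ACGT"}
        pwm.append({b: log(freqs[b] / 0.25) for b in "ACGT"})   # log-odds vs background
    return pwm

def score(window, pwm):
    return sum(col.get(b, log(0.25)) for b, col in zip(window.upper(), pwm))

if __name__ == "__main__":
    donors = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGG", "TCGGTAAGT"]   # toy examples
    pwm = build_pwm(donors)
    print(round(score("CAGGTAAGT", pwm), 2))   # a consensus-like window scores high
    print(round(score("CATTCCAGA", pwm), 2))   # an unrelated window scores low
```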
https://doi.org/10.1142/9789814503655_0044
The problems of functional site analysis and recognition are considered in this article. Methods for the selection of context features important for site functioning and recognition, via interactive and automatic analysis of the nucleotide sequences of functional sites, are described. The first method, which is based on utility theory, is applied for the generation and estimation of hypotheses on functional site context features with a high level of recognition ability and specificity. The second method permits us to reveal nonrandom patterns of short oligonucleotide distribution within the functional sites.
The construction of methods for functional site recognition using the revealed context features is also presented. Two different approaches are described: the “Site-Video” system for functional site analysis and recognition, and the method of large-scale oligonucleotide distributions. Using these approaches, recognition programs for splice sites and promoters of eukaryotes were constructed. The mean error of recognition is from 10 to 15% for the functional sites studied.
https://doi.org/10.1142/9789814503655_0045
We have developed a system, GeneID, for prediction of coding sequences in vertebrate DNA sequences. The approach relies on an ensemble of simple algorithms, the output of which is hierarchically assembled into likely models of gene structure. When tested on vertebrate genes smaller than 15,000 base pairs, the actual coding sequence is predicted with an average correlation coefficient of 0.70. This accuracy is higher than that of any other method currently available. On average, 69% of the coding region is predicted, which in most cases will be sufficient for a search against protein databases. GeneID is available as an electronic mail server; the predicted gene is automatically compared to protein databases, and any similarities to known proteins are reported.
https://doi.org/10.1142/9789814503655_0046
The sequencing by hybridization (SBH) method has been developed for assaying millions of 0.5- to 2-kb-long clones. This opens up an efficient way for defining the order of short clones and creating a physical map at 100-bp resolution. Moreover, complete sequences can be obtained using a modest number (about 3000) of probes if hybridization and gel sequence data from overlapped or similar sequences are used. In light of these possibilities, various heuristic algorithms have been developed and tested in simulation experiments. This approach can influence the interpretation of the intuitively obvious term, “known sequence”.
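A minimal sketch of the combinatorial core, with a hypothetical clone insert (not the authors' heuristics): the clone's k-mer spectrum is computed and a greedy walk over (k-1)-mer overlaps attempts a reconstruction, stopping where the spectrum branches.

```python
# A minimal sketch of the combinatorial core of SBH: approximate a clone's
# hybridization data by its k-mer spectrum, then walk (k-1)-mer overlaps
# greedily. Real data are unordered and noisy, which is why the chapter
# resorts to heuristics and partial sequences.
def spectrum(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def greedy_reconstruct(spec, start, k):
    used, seq = {start}, start
    while True:
        nexts = [w for w in spec if w not in used and w[:k - 1] == seq[-(k - 1):]]
        if len(nexts) != 1:              # stop at a branch or a dead end
            return seq, len(nexts)
        seq += nexts[0][-1]
        used.add(nexts[0])

if __name__ == "__main__":
    target = "ATGGCGTACGTTAGC"           # hypothetical clone insert
    k = 7
    rebuilt, branches = greedy_reconstruct(spectrum(target, k), target[:k], k)
    print(rebuilt == target, branches)   # -> True 0 for this unambiguous toy case
```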
https://doi.org/10.1142/9789814503655_0047
This paper presents a time- and space-efficient algorithm that searches for all homologous regions between two DNA sequences. A homologous region is defined by exact match segments interspersed with poorly matched segments (gaps) that contain a high proportion of errors. This definition is subject to two input parameters of the algorithm: the minimum length of exact match segments and the maximum length of gaps. We believe that the distribution of the exact match segments in a homologous region reveals more about the quality of homology than an error rate measure based on base-to-base substitution, deletion, and insertion errors.
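A minimal sketch of the stated model, not the paper's algorithm: shared exact segments of at least `min_match` bases are found by seeding and extension, then neighboring segments whose separations on both sequences stay within `max_gap` are chained into candidate homologous regions. The input sequences are placeholders.

```python
# A minimal sketch (not the paper's algorithm) of the homology model described
# above: exact segments of length >= min_match chained across gaps <= max_gap.
from collections import defaultdict

def exact_matches(a, b, min_match):
    """Seed with shared min_match-mers and extend each seed to a maximal exact segment."""
    index = defaultdict(list)
    for j in range(len(b) - min_match + 1):
        index[b[j:j + min_match]].append(j)
    seeds = set()
    for i in range(len(a) - min_match + 1):
        for j in index.get(a[i:i + min_match], ()):
            x, y = i, j
            while x > 0 and y > 0 and a[x - 1] == b[y - 1]:
                x, y = x - 1, y - 1
            k = min_match
            while x + k < len(a) and y + k < len(b) and a[x + k] == b[y + k]:
                k += 1
            seeds.add((x, y, k))                 # (start in a, start in b, length)
    return sorted(seeds)

def chain(segments, max_gap):
    """Group segments whose gaps on both sequences are between 0 and max_gap."""
    regions, current = [], []
    for seg in segments:
        if current:
            pi, pj, pk = current[-1]
            gap_a, gap_b = seg[0] - (pi + pk), seg[1] - (pj + pk)
            if not (0 <= gap_a <= max_gap and 0 <= gap_b <= max_gap):
                regions.append(current)
                current = []
        current.append(seg)
    if current:
        regions.append(current)
    return regions

if __name__ == "__main__":
    a = "ATGCCGTAAAACCCGGGTTACTGAC"       # hypothetical sequences
    b = "ATGCCGTATTTCCCGGGTTAGTGAC"
    print(chain(exact_matches(a, b, min_match=5), max_gap=4))
```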
https://doi.org/10.1142/9789814503655_0048
In this paper a new computing tool, Gen Viewer, is described. It is designed for recognition of gene coding regions in human nucleotide sequences. The recognition is based on a combined technique which involves searching for potential splice sites and creating a set of potential coding regions along with an estimation of their coding potential. Finally, through combined selection of the potential coding regions, a search for the gene with the maximal average coding potential is made. Gen Viewer works in two modes: automated or dialog. The automated version, ORFJ, allows a linear recognition scenario to be realized using its standard programs. An advanced interface provides for careful analysis of the obtained results and ensures an alternative, interactive prediction.
https://doi.org/10.1142/9789814503655_0049
The frequency and distribution of all possible purine/pyrimidine reverse complement hexamer pairs were determined in DNA sequences representing two different types of mammalian chromatin and in random sequence. Consistent deviations from random, both in the frequencies and in the clustering of specific patterns in the real sequences, were detected. Filtering the data for frequencies and distributions correlating with known chromatin structure revealed the informative pattern rryyrr/yyrryy. This motif has a periodic distribution in the range of 0.9–1.4 kb in a variety of sequences and is also in period with the distribution of 90% of the transcriptional regulatory domains. The periodic distribution of this motif is on the order of the repeat length of one turn (∼1.2 kb) of the 30 nm chromatin fiber. The observed distributions would result in a vertical alignment of clusters of this motif and transcriptional regulatory domains along one face of a 30 nm fiber. The relationship of this informative motif (rryyrr/yyrryy) to other non-randomly occurring conformational motifs is discussed.
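A minimal sketch (ours, with a placeholder sequence) of the counting step: the sequence is reduced to the purine/pyrimidine alphabet, hexamers are tallied together with their reverse complements, and occurrences of a chosen pair such as RRYYRR/YYRRYY are located so their spacing can be examined.

```python
# A sketch (ours) of purine/pyrimidine hexamer-pair counting and motif location.
from collections import Counter

PURINES = set("AG")

def to_ry(seq):
    return "".join("R" if b in PURINES else "Y" for b in seq.upper())

def ry_revcomp(word):
    swap = {"R": "Y", "Y": "R"}
    return "".join(swap[c] for c in reversed(word))

def hexamer_pair_counts(seq):
    ry = to_ry(seq)
    counts = Counter(ry[i:i + 6] for i in range(len(ry) - 5))
    pairs = Counter()
    for word, c in counts.items():
        pairs[min(word, ry_revcomp(word))] += c     # one key per complementary pair
    return pairs

def motif_positions(seq, motif="RRYYRR"):
    ry, rc = to_ry(seq), ry_revcomp(motif)
    return [i for i in range(len(ry) - 5) if ry[i:i + 6] in (motif, rc)]

if __name__ == "__main__":
    seq = "ATGGTTAACGCTTGGCCATCGAAC" * 10             # hypothetical input
    print(hexamer_pair_counts(seq).most_common(3))
    print(motif_positions(seq)[:10])
```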
https://doi.org/10.1142/9789814503655_0050
An effective interchange between biologists and scientists in other disciplines is essential to the success of the Human Genome Initiative. A hypertext bibliography is seen as a unique resource designed to bring non-biologists, and biologists with other specialties, up to speed in the field of Sequence Analysis. The character, advantages, and problems with hypertext are reviewed, and a specific plan is presented.
https://doi.org/10.1142/9789814503655_0051
By applying a fractal representation of nucleotide sequences to the plotting of sets of functionally similar sequences, a new approach for the classification of nucleic acid sequences was suggested1,2 and some measures of sequence similarity were introduced. Many examples of good separation of sequences belonging to different gene families were shown, among them good classification of such subfamilies as α- and β-actins and α-, β-, and γ-interferons. The method does not require an alignment procedure, either for generating a recognition matrix from the learning set or for searching for homologous regions. The computer time does not depend on the length of the query sequence, and the fractal images of sequence sets can be compared easily by computer procedures as well as visually. The latter is especially convenient when the density of the fractal mask is represented as a third coordinate of the image. The method has been successfully applied both to searching for genes (globins, histones, etc.) and for different kinds of repetitive DNA sequences (Alu, LTR, etc.).3 The FRS approach is also used for revealing gene structure in uncharacterized sequences. The fractal images for exons, introns, and 5′- and 3′-regions have significantly different patterns, which permits preliminary localization of these gene regions. We obtained functions which recognize small coding regions with 90% accuracy for 54 bp windows and 95% for 108 bp windows, and functions which recognize 5′- and 3′-regions of eukaryotic genes based on 8–9 bp oligonucleotide statistics, which can be simply recomputed as the current database grows. Some examples of the application of the fractal representation of amino acid sequences to protein classification are also given.
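One widely used fractal representation is the chaos game representation (CGR); the chapter's FRS method is in this spirit, though not necessarily identical. The sketch below maps a sequence into the unit square and coarse-grains the point cloud into a density grid that can serve as the sequence's "image".

```python
# A minimal sketch of the chaos game representation (CGR) of DNA, given here
# as an illustration of fractal sequence images; it is not necessarily the
# chapter's FRS construction. Each base pulls the current point halfway toward
# its corner of the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    x, y, pts = 0.5, 0.5, []
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        pts.append((x, y))
    return pts

def cgr_density(seq, grid=16):
    """Coarse-grain the point cloud onto a grid; cell counts double as k-mer counts."""
    counts = [[0] * grid for _ in range(grid)]
    for x, y in cgr_points(seq):
        counts[min(int(y * grid), grid - 1)][min(int(x * grid), grid - 1)] += 1
    return counts

if __name__ == "__main__":
    density = cgr_density("ATGGCGTACGTTAGCATTGCCGATATATGCGC" * 8)   # placeholder
    print(sum(map(sum, density)))     # one grid point per base consumed
```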
https://doi.org/10.1142/9789814503655_0052
The aim of this work is to model the Dayhoff matrix from an experimentally estimated matrix of nucleotide substitutions. The close resemblance observed between the Dayhoff matrix and the modeled matrix suggests that the frequencies of transitions and transversions in the genome determine the frequencies of amino acid substitutions. The substitutions apparently are not a consequence of selection for interchangeable amino acids.
https://doi.org/10.1142/9789814503655_0053
The fractal dimension (FD) of oligonucleotide composition is presented as an analog of genetic text complexity. FD values for prokaryotic and eukaryotic sequences are estimated. Reliable differences between the FD of coding and non-coding sequences in higher organisms are demonstrated. At the same time, the similar FD values of coding regions from different sources illustrate the stability of such sequences against evolutionary processes. The proposed method provides fast calculation of the FD value for sequences of any length.
https://doi.org/10.1142/9789814503655_bmatter
The following sections are included: