
Automated Gene Identification in Large-Scale Genomic Sequences

Y. Xu, E. Uberbacher
Published 1997 · Biology, Medicine, Computer Science

Computational methods for gene identification in genomic sequences typically have two phases: coding region recognition and gene parsing. While there are a number of effective methods for recognizing coding regions (exons), parsing the recognized exons into proper gene structures remains, to a large extent, an unsolved problem. We have developed a computer program that automatically parses the recognized exons into the gene models most consistent with the available Expressed Sequence Tags (ESTs) and a set of empirically derived biological heuristics. The gene modeling algorithm used in this program provides a general framework for applying EST information, so that modeling accuracy improves as the amount of available EST information increases. In preliminary tests on a number of large DNA sequences using the dbEST database, we observed that the algorithm can (1) accurately model complicated multiple-gene structures, including embedded genes, (2) identify exons falsely recognized, and locate exons missed, by the initial exon recognition phase, and (3) predict exon boundaries more accurately, if the necessary EST information is available.

We have extended this EST-based gene modeling algorithm to model genes on the unfinished DNA contigs that remain at the end of shotgun sequencing. Before the gene modeling phase, the extended version automatically determines the orientations and relative order of the DNA contigs (with gaps between them), using the available ESTs as reference models.
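The abstract does not spell out the gene-parsing algorithm. As a rough illustration only, the sketch below treats parsing as choosing a chain of non-overlapping candidate exons that maximizes the recognition score plus a bonus for EST support, solved by dynamic programming (weighted interval scheduling). This is not the authors' algorithm: the `Exon` type, `est_support`, `best_gene_model`, and the parameters `est_weight` and `min_intron` are hypothetical, and reading frames, splice signals, strand, and multiple genes are all ignored.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Exon:
    start: int    # 0-based start, inclusive
    end: int      # end, exclusive
    score: float  # score from a (hypothetical) exon-recognition step


def est_support(exon: Exon, est_hits: Sequence[Tuple[int, int]]) -> int:
    """Count EST alignment blocks overlapping the exon (hypothetical evidence measure)."""
    return sum(1 for s, e in est_hits if s < exon.end and exon.start < e)


def best_gene_model(
    exons: Sequence[Exon],
    est_hits: Sequence[Tuple[int, int]],
    est_weight: float = 1.0,  # hypothetical bonus per supporting EST block
    min_intron: int = 20,     # hypothetical minimum gap between chained exons
) -> List[Exon]:
    """Return the highest-weight chain of compatible exons (weighted interval scheduling)."""
    if not exons:
        return []
    order = sorted(exons, key=lambda x: x.end)
    n = len(order)
    weight = [x.score + est_weight * est_support(x, est_hits) for x in order]
    best = [0.0] * n  # best[i]: best chain score whose last exon is order[i]
    prev = [-1] * n   # back-pointer to the previous exon in that chain
    for i in range(n):
        best[i] = weight[i]
        for j in range(i):
            compatible = order[j].end + min_intron <= order[i].start
            if compatible and best[j] + weight[i] > best[i]:
                best[i] = best[j] + weight[i]
                prev[i] = j
    # Trace back from the best-scoring chain end.
    i = max(range(n), key=lambda k: best[k])
    chain = []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]


if __name__ == "__main__":
    exons = [Exon(100, 200, 2.0), Exon(150, 260, 2.5), Exon(300, 400, 1.5), Exon(500, 620, 1.0)]
    est_hits = [(110, 140), (310, 390), (510, 600)]
    print([(x.start, x.end) for x in best_gene_model(exons, est_hits)])
    # [(100, 200), (300, 400), (500, 620)]
    print([(x.start, x.end) for x in best_gene_model(exons, est_hits, est_weight=0.0)])
    # [(150, 260), (300, 400), (500, 620)]
```

In this toy example, an EST block overlapping the first candidate exon is enough to prefer it over a higher-scoring but unsupported overlapping candidate; with `est_weight=0.0` the higher-scoring candidate wins. This mirrors, in a very simplified form, the idea that added EST evidence can change which recognized exons end up in the gene model.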

This paper is referenced by
Sequence Analysis of Industrially Important Genes from Trichoderma
A. M. El-Bondkly (2014)
Gene Prediction, homology‐based (Extrinsic Gene Prediction, Look‐Up Gene Prediction)
Enrique Blanco (2014)
Fishing and Fish Consumption Patterns in the Gullah/Geechee Sea Island Population
J. H. Ellis (2013)
SnowyOwl: accurate prediction of fungal genes by using RNA-Seq and homology information to select among ab initio models
I. Reid (2014)
Reinforcement Learning for Improving Gene Identification Accuracy by Combination of Gene-Finding Programs
P. Yin (2012)
Functional characterization of the human translocator protein (18kDa) gene promoter in human breast cancer cell lines.
Amani M. Batarseh (2012)
Multicofactor proteins: structure, prediction, function
S. Hearnshaw (2011)
The Populus Genome and Comparative Genomics
C. Douglas (2010)
Genome wide analysis and comparative docking studies of new diaryl furan derivatives against human cyclooxygenase-2, lipoxygenase, thromboxane synthase and prostacyclin synthase enzymes involved in inflammatory pathway.
P. N. Sekhar (2009)
Classification d'ARN codants et d'ARN non-codants [Classification of coding and non-coding RNAs]
A. Fontaine (2009)
Analysis of the NAC gene family in Triticum aestivum employing novel bioinformatics tools
D. Oren (2008)
Bioinformatic environment for analyzing large scale genome sequence
Krzysztof Sarapata (2008)
Improved genome assembly and evidence-based global gene model set for the chordate Ciona intestinalis: new insight into intron and operon populations
Y. Satou (2008)
Gene/Protein Sequence Analysis
B. Rehm (2008)
ORNL/CP-94756: An Editing Environment for DNA Sequence Analysis and Annotation
Uberbacher (2008)
An artificial neural network method for combining gene prediction based on equitable weights
Y. Zhou (2008)
Characterization of the differentially methylated region of the Impact gene that exhibits Glires-specific imprinting
Kohji Okamura (2008)
Ab initio gene identification: Prokaryote genome annotation with GeneScan and GLIMMER
G. Aggarwal (2002)
A seventh locus for otosclerosis, OTSC7, maps to chromosome 6q13–16.1
M. Thys (2007)
Genome-wide analysis and identification of genes related to potassium transporter families in rice (Oryza sativa L.)
R. Amrutha (2007)
Statistical Feature Selection: With Applications in Life Science
R. Nilsson (2007)
BAL1 and BBAP Are Regulated by a Gamma Interferon-Responsive Bidirectional Promoter and Are Overexpressed in Diffuse Large B-Cell Lymphomas with a Prominent Inflammatory Infiltrate
Przemysław Juszczyński (2006)
The Genome of Black Cottonwood, Populus trichocarpa (Torr. & Gray)
G. Tuskan (2006)
Bioinformatics for plant genome annotation
M. Fiers (2006)
Computational analysis of the Phanerochaete chrysosporium v2.0 genome database and mass spectrometry identification of peptides in ligninolytic cultures reveal complex mixtures of secreted proteins.
A. Vanden Wymelenberg (2006)
Molecular cloning, characterization, and expression studies of a novel chitinase gene (ech30) from the mycoparasite Trichoderma atroviride strain P1.
S. Klemsdal (2006)
Contemporary Progress in Gene Structure Prediction
A. Churbanov (2006)
Meta-alignment of biological sequences
Enrique Blanco García (2006)
Roles of Wrky proteins in mediating the crosstalk of hormone signaling pathways: An approach integrating bioinformatics and experimental biology
Zhen Xie (2006)
Bioinformatic Tools for Gene and Protein Sequence Analysis
B. Rehm (2005)
Evaluation of five ab initio gene prediction programs for the discovery of maize genes
H. Yao (2005)
Some data provided by Semantic Scholar.