
Automated Gene Identification In Large-Scale Genomic Sequences

Y. Xu, E. Uberbacher
Published 1997 · Biology, Medicine, Computer Science

Computational methods for gene identification in genomic sequences typically have two phases: coding region recognition and gene parsing. While a number of effective methods exist for recognizing coding regions (exons), parsing the recognized exons into proper gene structures remains, to a large extent, an unsolved problem. We have developed a computer program that automatically parses the recognized exons into the gene models most consistent with the available Expressed Sequence Tags (ESTs) and a set of empirically derived biological heuristics. The gene modeling algorithm used in this program provides a general framework for applying EST information, so modeling accuracy improves as the amount of available EST information increases. Based on preliminary tests on a number of large DNA sequences, using the dbEST database, we have observed that the algorithm can (1) accurately model complicated multiple-gene structures, including embedded genes, (2) identify falsely recognized exons and locate exons missed by the initial exon recognition phase, and (3) make more accurate exon boundary predictions, if the necessary EST information is available. We have extended this EST-based gene modeling algorithm to model genes on unfinished DNA contigs at the end of shotgun sequencing. Before the gene modeling phase, this extended version automatically determines the orientations and the relative order of the DNA contigs (with gaps between them), using the available ESTs as reference models.
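The abstract does not spell out the gene modeling algorithm, so the following is only a toy sketch of the general idea of parsing candidate exons into a model that agrees with EST evidence. It assumes a simplified setting: each candidate exon carries a recognition score, an exon earns a fixed bonus if it overlaps an EST-aligned interval, and the "gene model" is the set of non-overlapping exons with the highest total. The names `Exon`, `est_bonus`, `best_chain`, and the `bonus` parameter are hypothetical and do not come from the paper, and the dynamic program used here (weighted interval chaining) stands in for whatever the authors actually implemented.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    start: int    # first base of the candidate exon (inclusive)
    end: int      # last base of the candidate exon (inclusive)
    score: float  # score from the exon recognition phase (hypothetical scale)


def est_bonus(exon, est_hits, bonus=1.0):
    """Return `bonus` if the exon overlaps any EST-aligned interval, else 0."""
    for est_start, est_end in est_hits:
        if exon.start <= est_end and est_start <= exon.end:
            return bonus
    return 0.0


def best_chain(exons, est_hits, bonus=1.0):
    """Choose non-overlapping exons maximizing recognition score plus EST bonus.

    This is weighted interval chaining by dynamic programming, used here as
    an assumed stand-in for the gene parsing step.
    """
    exons = sorted(exons, key=lambda x: x.end)
    ends = [x.end for x in exons]
    n = len(exons)
    best = [0.0] * (n + 1)   # best[i]: best total using the first i exons
    take = [False] * (n + 1)  # take[i]: whether exon i-1 is used in best[i]
    prev = [0] * (n + 1)      # prev[i]: prefix length to continue from

    for i, exon in enumerate(exons, start=1):
        # Count earlier exons that end strictly before this exon starts.
        p = bisect_right(ends, exon.start - 1, 0, i - 1)
        with_exon = best[p] + exon.score + est_bonus(exon, est_hits, bonus)
        if with_exon > best[i - 1]:
            best[i], take[i], prev[i] = with_exon, True, p
        else:
            best[i] = best[i - 1]

    # Trace back the chosen exons.
    chain = []
    i = n
    while i > 0:
        if take[i]:
            chain.append(exons[i - 1])
            i = prev[i]
        else:
            i -= 1
    chain.reverse()
    return chain, best[n]
```

In this simplified setting, calling `best_chain(candidates, est_hits)` with and without EST intervals can return different exon sets, which mirrors the abstract's point that EST evidence can overturn falsely recognized exons and favor ones the recognition phase scored lower. Embedded genes, strand, reading frame, and splice-site compatibility are all ignored here.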