
A Comparison Of Expressed Sequence Tags (ESTs) To Human Genomic Sequences.

T. Wolfsberg, D. Landsman
Published 1997 · Biology, Medicine

The Expressed Sequence Tag (EST) division of GenBank, dbEST, is a large repository of the data being generated by human genome sequencing centers. ESTs are short, single-pass cDNA sequences generated from randomly selected library clones. The approximately 415 000 human ESTs represent a valuable, low-priced, and easily accessible biological reagent. As many ESTs are derived from as yet uncharacterized genes, dbEST is a prime starting point for the identification of novel mRNAs. Conversely, other genes are represented by hundreds of ESTs, a redundancy that may provide data about rare mRNA isoforms. Here we present an analysis of >1000 ESTs generated by the WashU-Merck EST project. These ESTs were collected by querying dbEST with the genomic sequences of 15 human genes. When we aligned the matching ESTs to the genomic sequences, we found that in one gene, 73% of the ESTs that derive from spliced or partially spliced transcripts either contain intron sequences or are spliced at previously unreported sites; other genes have lower percentages of such ESTs, and some have none. This finding suggests that ESTs could provide researchers with novel information about alternative splicing in certain genes. In a related analysis of pairs of ESTs that are reported to derive from a single gene, we found that as many as 26% of the pairs do not both align with the sequence of the same gene. We suspect that some of these unusual ESTs result from artifacts in EST generation, and caution researchers that they may find such clones while analyzing sequences in dbEST.
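The paired-EST check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the clone IDs, gene names, and the `discordant_fraction` helper are hypothetical, and the alignment step itself is assumed to have been done already, so each EST is represented only by the name of the gene it aligned to (or `None` if it aligned to none of the query sequences).

```python
from typing import Optional

# Hypothetical alignment results: clone ID -> (gene hit of 5' EST, gene hit of 3' EST).
# None means the EST did not align to any of the query genomic sequences.
PairHits = dict[str, tuple[Optional[str], Optional[str]]]


def discordant_fraction(pairs: PairHits, gene: str) -> float:
    """Fraction of EST pairs touching `gene` in which the two reads do not both align to it."""
    # Keep only the pairs in which at least one read aligned to the gene of interest.
    relevant = {clone: hits for clone, hits in pairs.items() if gene in hits}
    if not relevant:
        return 0.0
    # A pair is discordant unless both of its reads aligned to the same gene.
    discordant = sum(
        1 for five, three in relevant.values()
        if not (five == gene and three == gene)
    )
    return discordant / len(relevant)


# Made-up example data, for illustration only.
example: PairHits = {
    "cloneA": ("GENE1", "GENE1"),
    "cloneB": ("GENE1", None),      # 3' read aligns to none of the query genes
    "cloneC": ("GENE1", "GENE2"),   # reads align to different genes
    "cloneD": ("GENE1", "GENE1"),
}

print(discordant_fraction(example, "GENE1"))  # 0.5
```

In this toy data, two of the four pairs are discordant. The 26% figure in the abstract is the largest such fraction the authors report, and nothing in this sketch reproduces their data.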


