
HTSeq—a Python Framework To Work With High-throughput Sequencing Data

S. Anders, Paul Theodor Pyl, W. Huber
Published 2015 · Computer Science, Medicine

Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed.
Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls. It also provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq.
Contact: sanders@fs.tum.de
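The idea of counting the overlap of reads with genes can be illustrated with a small, dependency-free Python sketch. It does not use HTSeq and is not htseq-count's implementation: the function names (`overlaps`, `count_reads`), the half-open interval convention, the toy coordinates, and the `no_feature` / `ambiguous` labels are all assumptions chosen for illustration. A read is credited to a gene only when it overlaps exactly one gene.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def count_reads(genes, reads):
    """Count reads per gene (illustrative rule, not htseq-count itself).

    genes: dict mapping gene name -> (chrom, start, end)
    reads: iterable of (chrom, start, end) aligned-read intervals
    """
    counts = Counter()
    for chrom, start, end in reads:
        # Collect every gene on the same chromosome that the read touches.
        hits = {
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and overlaps(start, end, g_start, g_end)
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1       # unambiguous assignment
        elif not hits:
            counts["no_feature"] += 1     # read overlaps no gene
        else:
            counts["ambiguous"] += 1      # read overlaps several genes
    return counts


# Hypothetical toy data: two partially overlapping genes and five reads.
genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 180, 300)}
reads = [
    ("chr1", 110, 160),  # inside geneA only
    ("chr1", 170, 195),  # spans the geneA/geneB overlap
    ("chr1", 250, 290),  # inside geneB only
    ("chr2", 10, 60),    # different chromosome
    ("chr1", 400, 450),  # intergenic
]
counts = count_reads(genes, reads)
print(dict(counts))
```

The linear scan over all genes for each read keeps the sketch short. For real data sets, a coordinate-indexed structure of the kind the abstract describes would avoid that cost.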

This paper is referenced by
10.1101/700997
An hourglass pattern of inter-embryo gene expression variability and of histone regulation in fly embryogenesis
Jialin Liu (2019)
10.3389/fimmu.2020.01039
Aire Gene Influences the Length of the 3′ UTR of mRNAs in Medullary Thymic Epithelial Cells
Ernna Hérida Oliveira (2020)
10.1093/hmg/ddy444
Molecular signatures of X chromosome inactivation and associations with clinical outcomes in epithelial ovarian cancer.
S. Winham (2019)
10.1186/s13059-020-02062-8
Removal of H2Aub1 by ubiquitin-specific proteases 12 and 13 is required for stable Polycomb-mediated gene repression in Arabidopsis
Lejon Kralemann (2020)
10.1093/femsle/fnz161
Metatranscriptomics reveals mycoviral populations in the ovine rumen.
Thomas C. A. Hitch (2019)
10.1093/jmcb/mjy064
Single-cell analyses identify distinct and intermediate states of zebrafish pancreatic islet development
Chong-Jian Lu (2018)
10.1152/ajplung.00103.2020
The importance of reporting house dust mite endotoxin abundance: impact on the lung transcriptome.
C. D. Pascoe (2020)
10.1101/410563
Evolution of gastrulation in cavefish: heterochronic cell movements and maternal factors
Jorge Torres-Paz (2018)
10.1091/mbc.E18-04-0258
Brr6 plays a role in gene recruitment and transcriptional regulation at the nuclear envelope
Anne de Bruyn Kops (2018)
10.1016/J.PLANTSCI.2019.110178
Tempo of gene regulation in wild and cultivated Vitis species shows coordination between cold deacclimation and budbreak.
A. P. Kovaleski (2019)
10.4109/JSLAB.26.124
Analysis methods for next-generation sequencer data, Part 4: Quality control and program installation
建強 孫 (2015)
10.1111/tpj.13014
Genomic limitations to RNA sequencing expression profiling.
C. Hirsch (2015)
10.1621/nrs.12001
Transcriptomic analysis identifies gene networks regulated by estrogen receptor α (ERα) and ERβ that control distinct effects of different botanical estrogens
Ping Gong (2014)
10.1128/JVI.00589-18
Transcriptional and Translational Landscape of Equine Torovirus
H. Stewart (2018)
10.1016/j.ymeth.2014.06.004
DegPack: a web package using a non-parametric and information theoretic algorithm to identify differentially expressed genes in multiclass RNA-seq samples.
Jaehyun An (2014)
10.1038/srep39043
Systemic Wound Healing Associated with local sub-Cutaneous Mechanical Stimulation
C. Nardini (2016)
10.1186/s13059-016-1126-6
The genome of the Gulf pipefish enables understanding of evolutionary innovations
C. Small (2016)
10.1534/g3.116.037069
Genome-Wide Analysis of lncRNA and mRNA Expression During Differentiation of Abdominal Preadipocytes in the Chicken
Tao Zhang (2017)
10.1038/ncomms12589
Cripto is essential to capture mouse epiblast stem cell and human embryonic stem cell pluripotency
A. Fiorenzano (2016)
10.3390/ijms18040805
Radiogenomic Analysis of Oncological Data: A Technical Survey
M. Incoronato (2017)
10.1038/s41598-020-68185-x
AChR antibodies show a complex interaction with human skeletal muscle cells in a transcriptomic study
Yu Chan Hong (2020)
10.1038/leu.2016.165
Systematic chemical and molecular profiling of MLL-rearranged infant acute lymphoblastic leukemia reveals efficacy of romidepsin
M. Cruickshank (2017)
10.1371/journal.pone.0188441
Transcriptome-wide analysis of the Trypanosoma cruzi proliferative cycle identifies the periodically expressed mRNAs and their multiple levels of control
Santiago Chávez (2017)
10.1111/nph.13946
Phytophthora infestans Argonaute 1 binds microRNA and small RNAs from effector genes and transposable elements.
Anna K. M. Åsman (2016)
10.1038/srep29338
Immunophenotyping of rheumatoid arthritis reveals a linkage between HLA-DRB1 genotype, CXCR4 expression on memory CD4+ T cells, and disease activity
Y. Nagafuchi (2016)
10.1016/B978-0-12-803697-6.00007-2
From Big Data Analytics and Network Inference to Systems Modeling
Paweł Michalak (2016)
10.1038/ncomms15045
Bone-in-culture array as a platform to model early-stage bone metastases and discover anti-metastasis therapies
H. Wang (2017)
10.1101/069971
Premature termination codons signaled targeted gene repair by nonsense mRNA-mediated gene editing in E. coli
Xiaolong Wang (2017)
10.1038/s41598-017-06110-5
High resolution temporal transcriptomics of mouse embryoid body development reveals complex expression dynamics of coding and noncoding loci
B. Gloss (2017)
10.1007/s00429-017-1389-z
MicroRNAs contribute to postnatal development of laminar differences and neuronal subtypes in the rat medial entorhinal cortex
Lene C Olsen (2017)
10.1371/journal.pone.0189476
Transcriptomic analysis of chicken Myozenin 3 regulation reveals its potential role in cell proliferation
Maosen Ye (2017)
10.1186/s12864-018-4451-1
Differential gene expression in the evolution of sex pheromone communication in New Zealand’s endemic leafroller moths of the genera Ctenopseustis and Planotortrix
Alessandro Grapputo (2018)