
HTSeq—a Python Framework To Work With High-throughput Sequencing Data

S. Anders, Paul Theodor Pyl, W. Huber
Published 2015 · Computer Science, Medicine

Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed.

Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.

Availability and implementation: HTSeq is released as an open-source software under the GNU General Public Licence and available from or from the Python Package Index at

Contact:
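
The abstract describes htseq-count as counting the overlap of reads with genes. The following is a minimal pure-Python sketch of that idea, not HTSeq's actual API or htseq-count's implementation. The gene table, the read tuples, the function names `overlaps` and `count_reads`, and the `no_feature` and `ambiguous` labels are all assumptions made for illustration. Coordinates are assumed to be 0-based and half-open, and strand is ignored.

```python
from collections import Counter

# Hypothetical toy annotation: gene name -> (chromosome, start, end),
# using 0-based, half-open coordinates. Not taken from the paper.
GENES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 180, 300),
    "geneC": ("chr2", 50, 150),
}


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def count_reads(reads, genes):
    """Count reads per gene.

    A read is assigned to a gene only if it overlaps exactly one gene.
    Reads overlapping no gene or several genes are tallied under the
    illustrative labels "no_feature" and "ambiguous".
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in reads:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and overlaps(start, end, g_start, g_end)
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["no_feature"] += 1
        else:
            counts["ambiguous"] += 1
    return counts


# Hypothetical aligned reads as (chromosome, start, end).
READS = [
    ("chr1", 110, 160),  # overlaps geneA only
    ("chr1", 170, 190),  # overlaps both geneA and geneB
    ("chr1", 250, 290),  # overlaps geneB only
    ("chr2", 10, 40),    # overlaps no gene
    ("chr2", 140, 180),  # overlaps geneC only
]

COUNTS = count_reads(READS, GENES)
for label, n in sorted(COUNTS.items()):
    print(label, n)
```

The linear scan over all genes for every read keeps the sketch short. A data structure that can be queried by genomic coordinates, as the abstract says HTSeq provides, would avoid that scan on real annotations.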

This paper is referenced by
An hourglass pattern of inter-embryo gene expression variability and of histone regulation in fly embryogenesis
Jialin Liu (2019)
Aire Gene Influences the Length of the 3′ UTR of mRNAs in Medullary Thymic Epithelial Cells
Ernna Hérida Oliveira (2020)
Molecular signatures of X chromosome inactivation and associations with clinical outcomes in epithelial ovarian cancer.
S. Winham (2019)
Removal of H2Aub1 by ubiquitin-specific proteases 12 and 13 is required for stable Polycomb-mediated gene repression in Arabidopsis
Lejon Kralemann (2020)
Metatranscriptomics reveals mycoviral populations in the ovine rumen.
Thomas C. A. Hitch (2019)
Single-cell analyses identify distinct and intermediate states of zebrafish pancreatic islet development
Chong-Jian Lu (2018)
The importance of reporting house dust mite endotoxin abundance: impact on the lung transcriptome.
C. D. Pascoe (2020)
Evolution of gastrulation in cavefish: heterochronic cell movements and maternal factors
Jorge Torres-Paz (2018)
Brr6 plays a role in gene recruitment and transcriptional regulation at the nuclear envelope
Anne de Bruyn Kops (2018)
Tempo of gene regulation in wild and cultivated Vitis species shows coordination between cold deacclimation and budbreak.
A. P. Kovaleski (2019)
Analysis methods for next-generation sequencer data, Part 4: Quality control and program installation
建強 孫 (2015)
Genomic limitations to RNA sequencing expression profiling.
C. Hirsch (2015)
Transcriptomic analysis identifies gene networks regulated by estrogen receptor α (ERα) and ERβ that control distinct effects of different botanical estrogens
Ping Gong (2014)
Transcriptional and Translational Landscape of Equine Torovirus
H. Stewart (2018)
DegPack: a web package using a non-parametric and information theoretic algorithm to identify differentially expressed genes in multiclass RNA-seq samples.
Jaehyun An (2014)
Systemic Wound Healing Associated with local sub-Cutaneous Mechanical Stimulation
C. Nardini (2016)
The genome of the Gulf pipefish enables understanding of evolutionary innovations
C. Small (2016)
Genome-Wide Analysis of lncRNA and mRNA Expression During Differentiation of Abdominal Preadipocytes in the Chicken
Tao Zhang (2017)
Cripto is essential to capture mouse epiblast stem cell and human embryonic stem cell pluripotency
A. Fiorenzano (2016)
Radiogenomic Analysis of Oncological Data: A Technical Survey
M. Incoronato (2017)
AChR antibodies show a complex interaction with human skeletal muscle cells in a transcriptomic study
Yu Chan Hong (2020)
Systematic chemical and molecular profiling of MLL-rearranged infant acute lymphoblastic leukemia reveals efficacy of romidepsin
M. Cruickshank (2017)
Transcriptome-wide analysis of the Trypanosoma cruzi proliferative cycle identifies the periodically expressed mRNAs and their multiple levels of control
Santiago Chávez (2017)
Phytophthora infestans Argonaute 1 binds microRNA and small RNAs from effector genes and transposable elements.
Anna K. M. Åsman (2016)
Immunophenotyping of rheumatoid arthritis reveals a linkage between HLA-DRB1 genotype, CXCR4 expression on memory CD4+ T cells, and disease activity
Y. Nagafuchi (2016)
From Big Data Analytics and Network Inference to Systems Modeling
Paweł Michalak (2016)
Bone-in-culture array as a platform to model early-stage bone metastases and discover anti-metastasis therapies
H. Wang (2017)
Premature termination codons signaled targeted gene repair by nonsense mRNA-mediated gene editing in E. coli
Xiaolong Wang (2017)
High resolution temporal transcriptomics of mouse embryoid body development reveals complex expression dynamics of coding and noncoding loci
B. Gloss (2017)
MicroRNAs contribute to postnatal development of laminar differences and neuronal subtypes in the rat medial entorhinal cortex
Lene C Olsen (2017)
Transcriptomic analysis of chicken Myozenin 3 regulation reveals its potential role in cell proliferation
Maosen Ye (2017)
Differential gene expression in the evolution of sex pheromone communication in New Zealand’s endemic leafroller moths of the genera Ctenopseustis and Planotortrix
Alessandro Grapputo (2018)