
Speech Recognition With Weighted Finite-State Transducers

M. Mohri, F. Pereira, M. Riley
Published 2008 · Computer Science

This chapter describes a general representation and algorithmic framework for speech recognition based on weighted finite-state transducers. These transducers provide a common and natural representation for major components of speech recognition systems, including hidden Markov models (HMMs), context-dependency models, pronunciation dictionaries, statistical grammars, and word or phone lattices. General algorithms for building and optimizing transducer models are presented, including composition for combining models, weighted determinization and minimization for optimizing time and space requirements, and a weight pushing algorithm for redistributing transition weights optimally for speech recognition. The application of these methods to large-vocabulary recognition tasks is explained in detail, and experimental results are given, in particular for the North American Business News (NAB) task, in which these methods were used to combine HMMs, full cross-word triphones, a lexicon of 40,000 words, and a large trigram grammar into a single weighted transducer that is only somewhat larger than the trigram word grammar and that runs NAB in real time on a very simple decoder. Another example demonstrates that the same methods can be used to optimize lattices for second-pass recognition.
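As a rough illustration of the "composition for combining models" step mentioned in the abstract, the following is a minimal sketch of composing two weighted transducers. It is not the chapter's implementation: it assumes epsilon-free transducers and the tropical semiring (weights add along a path), and the dictionary layout (`start`, `finals`, `arcs`) and the toy transducers `t1` and `t2` are hypothetical choices made only for this example.

```python
from collections import deque


def compose(t1, t2):
    """Compose two epsilon-free weighted transducers (tropical semiring).

    Hypothetical representation used only in this sketch:
      'start':  start state
      'finals': {state: final_weight}
      'arcs':   {state: [(in_label, out_label, weight, next_state), ...]}
    """
    start = (t1["start"], t2["start"])
    arcs, finals = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        q1, q2 = state
        arcs[state] = []
        # A pair state is final only if both components are final;
        # in the tropical semiring the final weights are added.
        if q1 in t1["finals"] and q2 in t2["finals"]:
            finals[state] = t1["finals"][q1] + t2["finals"][q2]
        # Match each output label of t1 against each input label of t2.
        for i1, o1, w1, n1 in t1["arcs"].get(q1, []):
            for i2, o2, w2, n2 in t2["arcs"].get(q2, []):
                if o1 == i2:
                    nxt = (n1, n2)
                    arcs[state].append((i1, o2, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return {"start": start, "finals": finals, "arcs": arcs}


# Toy transducers: t1 rewrites "a" to "b", t2 rewrites "b" to "c".
t1 = {"start": 0, "finals": {1: 0.0}, "arcs": {0: [("a", "b", 0.5, 1)]}}
t2 = {"start": 0, "finals": {1: 0.25}, "arcs": {0: [("b", "c", 1.5, 1)]}}
result = compose(t1, t2)
```

In this toy case the composed transducer maps "a" directly to "c", and the arc weight is the sum of the two component weights. Handling epsilon labels, which practical recognition transducers contain, needs an additional composition filter that this sketch leaves out.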
This paper references
10.1007/978-3-642-59126-6_10
Digital Images and Formal Languages
K. Culik (1997)
Weighted Automata in Text and Speech Processing
M. Mohri (2005)
10.1109/ICSLP.1996.607084
Scalable backoff language models
K. Seymore (1996)
10.3115/1218955.1218963
Statistical Modeling for Unit Selection in Speech Synthesis
M. Mohri (2004)
10.1007/978-3-663-09367-1
Transductions and context-free languages
J. Berstel (1979)
10.1016/0096-0551(80)90011-9
Introduction to Automata Theory, Languages and Computation
J. Hopcroft (1979)
10.1016/S0167-6393(98)00043-0
Network optimizations for large-vocabulary speech recognition
M. Mohri (1999)
10.2307/417760
Finite-State Language Processing
Emmanuel Roche (1997)
A weight pushing algorithm for large vocabulary speech recognition
M. Mohri (2001)
10.1007/978-94-015-9719-7
Robustness in language and speech technology
J. Junqua (2001)
10.1007/978-3-642-59126-6
Handbook of Formal Languages
G. Rozenberg (1997)
Finite-State Transducers in Language and Speech Processing
M. Mohri (1997)
10.1007/978-3-642-69959-7
Semirings, Automata, Languages
W. Kuich (1985)
10.3115/981658.981661
The Replace Operator
L. Karttunen (1995)
10.1007/978-1-4612-6264-0
Automata-Theoretic Aspects of Formal Power Series
A. Salomaa (1978)
Integrated context-dependent networks in very large vocabulary speech recognition
M. Mohri (1999)
Towards automatic closed captioning: low latency real time broadcast news transcription
M. Saraçlar (2002)
A comparison of two LVR search optimization techniques
Stephan Kanthak (2002)
10.1109/TASSP.1987.1165125
Estimation of probabilities from sparse data for the language model component of a speech recognizer
S. M. Katz (1987)
10.1016/S0022-0000(71)80005-3
Realizations by Stochastic Finite Automata
J. Carlyle (1971)
10.1016/S0304-3975(99)00014-6
The Design Principles of a Weighted Finite-State Transducer Library
M. Mohri (2000)
10.5860/choice.32-5696
Text Algorithms
M. Crochemore (1994)
10.1007/978-3-540-76336-9_3
OpenFst: A General and Efficient Weighted Finite-State Transducer Library
Cyril Allauzen (2007)
Transducer composition for context-dependent network expansion
M. Riley (1997)
10.1109/ICSLP.1996.607215
Language-model look-ahead for large vocabulary speech recognition
S. Ortmanns (1996)
10.1109/ICASSP.1998.675352
Full expansion of context-dependent networks in large vocabulary speech recognition
M. Mohri (1998)
10.1006/csla.1995.0010
Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
C. Leggetter (1995)
10.3115/981344.981376
Finite-State Approximation of Phrase Structure Grammars
Fernando C Pereira (1991)
10.25596/jalc-2002-321
Semiring Frameworks and Algorithms for Shortest-Distance Problems
M. Mohri (2002)
10.3115/980691.980716
Dynamic Compilation of Weighted Context-Free Grammars
M. Mohri (1998)
Regular Models of Phonological Rule Systems
R. Kaplan (1994)
10.1016/0304-3975(92)90142-3
Minimisation of Acyclic Deterministic Automata in Linear Time
Dominique Revuz (1992)
10.1016/S0167-6393(99)00037-0
Stochastic pronunciation modelling from hand-labelled phonetic corpora
M. Riley (1999)
10.1007/978-3-642-73235-5
Rational series and their languages
J. Berstel (1988)
10.1016/0304-3975(77)90056-1
Algebraic Structures for Transitive Closure
D. Lehmann (1977)
The Design and Analysis of Computer Algorithms
A. Aho (1974)
Efficient general lattice generation and rescoring
A. Ljolje (1999)
10.1162/089120100561610
Practical Experiments with Regular Approximation of Context-Free Languages
M. Nederhof (2000)
10.1109/ICASSP.2004.1326097
A generalized construction of integrated speech recognition transducers
Cyril Allauzen (2004)
10.25596/jalc-2003-117
Efficient Algorithms for Testing the Twins Property
Cyril Allauzen (2003)
10.1017/S1351324997001654
Multilingual text analysis for text-to-speech synthesis
R. Sproat (1996)
10.1016/j.tcs.2004.07.003
An optimal pre-determinization algorithm for weighted transducers
Cyril Allauzen (2004)
10.1007/978-94-015-9719-7_6
Regular Approximation of Context-Free Grammars through Transformation
M. Mohri (2001)
Compilers: Principles, Techniques, and Tools
A. Aho (1986)



This paper is referenced by
10.3115/v1/P14-1001
Learning Ensembles of Structured Prediction Rules
Corinna Cortes (2014)
10.1007/978-3-319-07569-3_11
EAR-TUKE: The Acoustic Event Detection System
Martin Lojka (2014)
Multilabel Classification through Structured Output Learning - Methods and Applications
Hongyu Su (2015)
10.1109/ICDAR.2017.91
An Open Vocabulary OCR System with Hybrid Word-Subword Language Models
Meng Cai (2017)
10.18653/v1/N16-1079
Phonological Pun-derstanding
Aaron Jaech (2016)
Boosting Ensembles of Structured Prediction Rules
Corinna Cortes (2014)
Accurate and compact large vocabulary speech recognition on mobile devices
Xin Lei (2013)
Multimodal Interfaces for Active and Assisted Living Systems
A. Florea (2019)
10.1109/ICASSP.2016.7472820
Personalized speech recognition on mobile devices
Ian McGraw (2016)
10.1007/978-3-319-24033-6_49
Phonetic Segmentation Using KALDI and Reduced Pronunciation Detection in Casual Czech Speech
Zdenek Patc (2015)
New Baseline in Automatic Speech Recognition for Northern Sámi
New (2017)
10.1016/j.tcs.2016.08.019
A disambiguation algorithm for weighted automata
M. Mohri (2017)
10.1587/TRANSINF.E95.D.614
Bayesian Learning of a Language Model from Continuous Speech
Graham Neubig (2012)
Speech Emotion Recognition of Old People based on Weighted Finite-State Transducer
W. Li (2017)
Investigations on acoustic model speaker adaptation
J. Lööf (2016)
10.1007/978-3-642-15760-8_34
Towards the Optimal Minimization of a Pronunciation Dictionary Model
Simon Dobrisek (2010)
10.1109/ICASSP.2015.7178962
A keyword-aware grammar framework for LVCSR-based spoken keyword search
I-Fan Chen (2015)
10.18653/v1/N19-1096
The problem with probabilistic DAG automata for semantic graphs
Ieva Vasiljeva (2019)
Hybrid Autoregressive Transducer (HAT)
Ehsan Variani (2020)
Bringing contextual information to google speech recognition
P. Aleksic (2015)
10.1016/j.csl.2013.04.006
Direct construction of compact context-dependency transducers from data
David Rybach (2010)
10.1016/j.ic.2009.10.008
Subsequential transducers: a coalgebraic perspective
Helle Hvid Hansen (2010)
10.1109/ICASSP.2012.6288901
Joining advantages of word-conditioned and token-passing decoding
David Nolden (2012)
Rapid vocabulary addition to context-dependent decoder graphs
Cyril Allauzen (2015)
10.1017/S0960129519000094
Singular value automata and approximate minimization
B. Balle (2019)
Semi-Supervised Model Training for Unbounded Conversational Speech Recognition
Shane Walker (2017)
10.21437/INTERSPEECH.2017-103
Improved Subword Modeling for WFST-Based Speech Recognition
Peter Smit (2017)
Nonparametric Bayesian approaches for acoustic modeling
Harati Nejad Torbati (2015)
AF: Small: Collaborative Research: On-Line Learning Algorithms for Path Experts with Non-Additive Losses
(2015)
Re-Scoring Word Lattices from Automatic Speech Recognition System Based on Manual Error Corrections
Anna Vigdís Rúnarsdóttir (2018)
10.1109/ICASSP.2013.6639275
Open vocabulary handwriting recognition using combined word-level and character-level language models
M. Kozielski (2013)
Teme za studentske radove u Samsungovoj laboratoriji za razvoj aplikacija na Fakultetu tehničkih nauka
(2015)