The Measurement of Observer Agreement for Categorical Data.

J. Landis, G. Koch
Published 1977 · Mathematics, Medicine

This paper presents a general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies. The procedure essentially involves the construction of functions of the observed proportions which are directed at the extent to which the observers agree among themselves and the construction of test statistics for hypotheses involving these functions. Tests for interobserver bias are presented in terms of first-order marginal homogeneity and measures of interobserver agreement are developed as generalized kappa-type statistics. These procedures are illustrated with a clinical diagnosis example from the epidemiological literature.
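As a rough illustration of the kind of kappa-type statistic the abstract mentions, the following sketch computes only the simplest case: the unweighted kappa of Cohen (1960) for two observers rating the same subjects. It is not the generalized methodology of the paper. The function name `cohen_kappa` and the sample ratings are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two observers rating the same subjects.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e is the agreement expected by chance from the
    two observers' marginal category frequencies.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both observers must rate the same non-empty set of subjects")

    n = len(ratings_a)

    # Observed agreement: fraction of subjects given the same category.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of the marginal proportions, summed over categories.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical ratings of ten subjects by two observers.
observer_1 = ["pos", "pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg"]
observer_2 = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos"]

# Observed agreement is 0.8 and chance agreement is 0.5, so kappa is about 0.6.
print(round(cohen_kappa(observer_1, observer_2), 3))
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance. Weighted and multi-observer variants, such as those listed in the references, need additional machinery that this sketch does not attempt.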
This paper references
Some Further Remarks Concerning “A General Approach to the Estimation of Variance Components”
G. Koch (1968)
Analysis of categorical data by linear models.
J. Grizzle (1969)
An analysis for compounded functions of categorical data.
R. Forthofer (1973)
A review of statistical methods in the analysis of data arising from observer reliability studies (Part II)
J. Landis (1975)
A Note on the Equivalence of Two Test Criteria for Hypotheses in Categorical Data
V. P. Bhapkar (1966)
Landis, J
Mimeo Series No. (1022)
Measures of response agreement for qualitative data: Some generalizations and alternatives.
R. Light (1971)
Measures of Association for Cross Classifications III: Approximate Sampling Theory
L. Goodman (1963)
Assessing the Accuracy of Multivariate Observations
J. Fleiss (1966)
Reliability of measurements for studies of cerebrovascular atherosclerosis.
R. Loewenson (1972)
Studies on multiple sclerosis in Winnipeg, Manitoba, and New Orleans, Louisiana. II. A controlled investigation of factors in the life history of the Winnipeg patients.
K. Westlund (1953)
Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit.
J. Cohen (1968)
A general methodology for the analysis of experiments with repeated measurement of categorical data.
G. Koch (1977)
The analysis of categorical data from mixed models
G. Koch (1971)
On the Hypotheses of 'No Interaction' in Contingency Tables
V. P. Bhapkar (1968)
An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers.
J. Landis (1977)
On the analysis of contingency tables with a quantitative response.
Vasant P. Bhapkar (1968)
The Measuring Process
J. Mandel (1959)
Large sample standard errors of kappa and weighted kappa.
J. Fleiss (1969)
Measuring nominal scale agreement among many raters.
J. Fleiss (1971)
Contribution to the theory of the χ² test
J. Neyman (1949)
On Estimating Precision of Measuring Instruments and Product Variability
F. Grubbs (1948)
A coefficient of agreement for nominal scales
J. Cohen (1960)
The Equivalence of Weighted Kappa and the Intraclass Correlation Coefficient as Measures of Reliability
J. Fleiss (1973)
Estimating Individual Rater Reliabilities from Analysis of Treatment Effects
J. Overall (1968)
A computer program for the generalized chi-square analysis of categorical data using weighted least squares (GENCAT).
J. Landis (1976)
Statistical Theory in Research
R. L. Anderson (1952)
Errors of Measurement, Precision, Accuracy and the Statistical Comparison of Measuring Instruments
F. Grubbs (1973)
Measuring agreement between two judges on the presence or absence of a trait.
J. Fleiss (1975)
Tests of statistical hypotheses concerning several parameters when the number of observations is large
A. Wald (1943)
A general approach to estimation of variance components
G. Koch (1965)
A new measure of agreement between rank ordered variables.
D. Cicchetti (1972)
Hypotheses of ‘No Interaction’ in Multi-dimensional Contingency Tables
V. P. Bhapkar (1968)
A general methodology for the measurement of observer agreement when the data are categorical
J. Landis (1975)

This paper is referenced by
An Examination of the Effects of Spatial Resolution and Image Analysis Technique on Indirect Fuel Mapping
M. Tanase (2008)
The test-retest reliability and concurrent validity of the Subjective Complaints Questionnaire for low back pain.
J. Ford (2009)
Contrast-Enhanced Magnetic Resonance Angiography Findings Prior to Hemodialysis Vascular access Creation: A Prospective Analysis
R. N. Planken (2008)
Analyzing unstructured text data: Using latent categorization to identify intellectual communities in information systems
Kai R. Larsen (2008)
Inconclusive results in conventional serological screening for Chagas' disease in blood banks: evaluation of cellular and humoral response.
C. R. Furucho (2008)
Accuracy of the Clinical Diagnosis of Corticobasal Degeneration
I. Litvan (1997)
Intra- and inter-rater reliability of 3D passive intervertebral motion in subjects with non-specific neck pain assessed by physical therapy students: A pilot study
G. Rossettini (2016)
IRIS: English-Irish Machine Translation System
Mihael Arcan (2016)
Running quietly reduces ground reaction force and vertical loading rate and alters foot strike technique
Xuan Phan (2017)
An investigation of aphasic naming error evolution following phonomotor treatment
Irene Minkina (2016)
Alternative stable states of tidal marsh vegetation patterns and channel complexity
Kevan B. Moffett (2016)
The role of prosody and voice quality in indirect storytelling speech: A cross-narrator perspective in four European languages
R. Montaño (2017)
"Algorithms ruin everything": #RIPTwitter, Folk Theories, and Resistance to Algorithmic Change in Social Media
Michael A. DeVito (2017)
Exploratory examination of the utility of demoralization as a diagnostic specifier for adjustment disorder and major depression.
D. Kissane (2017)
Cross-cultural adaptation, reliability, and validity of the Japanese version of the Cumberland ankle instability tool.
Shun Kunugi (2017)
The Spanish version of the Alberta Infant Motor Scale: Validity and reliability analysis
Erica Morales-Monforte (2017)
Subtype distribution of Blastocystis spp. isolated from children in Eskisehir, Turkey.
Nihal Doğan (2017)
Effectiveness and content analysis of interventions to enhance medication adherence and blood pressure control in hypertension: A systematic review and meta-analysis
E. Morrissey (2017)
An exploration of teacher learning from an educative reform‐oriented science curriculum: Case studies of teacher curriculum use
Lisa M. Marco-Bujosa (2017)
Benchmarking Swarm Rebalancing Algorithm for Relieving Imbalanced Machine Learning Problems
Jinyan Li (2018)
Multi-expert analysis and validation of objective vascular tortuosity measurements
L. Ramos (2018)
Tongue Image Database Construction Based on the Expert Opinions: Assessment for Individual Agreement and Methods for Expert Selection
Zhen Qi (2018)
Häävikko's method to assess dental age in Italian children.
A. Butti (2009)
Evaluation of an automated breast volume scanner according to the fifth edition of BI-RADS for breast ultrasound compared with hand-held ultrasound.
Eun Jung Choi (2018)
Could we use parent report as a valid proxy of child report on anxiety, depression, and distress? A systematic investigation of father–mother–child triads in children successfully treated for leukemia
C. Abate (2018)
Evidence of genotoxicity and cytotoxicity of X-rays in the oral mucosa epithelium of adults subjected to cone beam CT.
J. B. da Fonte (2018)
Internalizing the spirit of entrepreneurship in early childhood education through traditional games
Muhammad Jufri (2018)
The reliability and validity of the standardized Mensendieck test in relation to disability in patients with chronic pain
Paul Keessen (2018)
Accuracy and Inter-Analyst Agreement of Visually Estimated Sea Ice Concentrations in Canadian Ice Service Ice Charts
Angela Cheng (2019)
City dweller aspirations for cities of the future: How do environmental and personal wellbeing feature?
H. Joffe (2016)
Contextualizing Individual Competencies for Managing the Corporate Social Responsibility Adaptation Process: The Apparent Influence of the Business Case Logic
E. R. Osagie (2019)
Mindfulness and Its Association With Varied Types of Motivation: A Systematic Review and Meta-Analysis Using Self-Determination Theory
James N Donald (2019)