TPF vs FPF for 108 US radiologists in study by Beam et al
( Chest film study by E. James Potchen, M.D., 1999 )
An Overview of Contemporary ROC Methodology in Medical Imaging and Computer-Assist Modalities

1. An Overview of Contemporary ROC Methodology in Medical Imaging and Computer-Assist Modalities. Robert F. Wagner, Ph.D., OST, CDRH, FDA.

2. ROC
- Receiver Operating Characteristic (historic name from radar studies)
- Relative Operating Characteristic (psychology, psychophysics)
- Operating Characteristic (preferred by some)

3. [...] development on present issues
- The ROC Paradigm
- The complication of reader variability
- The multiple-reader multiple-case (MRMC) ROC paradigm
- The measurement scales: categories; patient-management/action; probability scale
- Complications from: location uncertainty; truth uncertainty; effective sample number uncertainty; reader vigilance
- Summary

4. EFFORTS TOWARD CONSENSUS DEVELOPMENT ON THE PRESENT ISSUES
- How to use classic concepts of Sensitivity, Specificity, and ROC analysis to assess performance of diagnostic imaging and computer-assist systems?
- Many new issues and levels of complexity coming to the fore as more complex technologies emerge.

5. EFFORTS TOWARD CONSENSUS DEVELOPMENT ON THE PRESENT ISSUES (II)
RSNA/SPIE/MIPS Various Workshops & Literature
- an evolving Work-in-Progress
FDA/CDRH use of multiple-reader multiple-case (MRMC) ROC
- Digital Mammography PMAs
- Computer Aid for lung nodule detection on CXR (film)
NCI Lung Image Database Consortium (LIDC) & Workshops
- Consensus seeking on many issues
- Two CDRH active members
Communication of these resources with incoming sponsors.

6. Fundamentals of the ROC paradigm.

7. [Figure: distributions of non-diseased cases and diseased cases, with a threshold. Axis: test result value or subjective judgement of likelihood that case is diseased.]

8. [Figure: distributions of non-diseased cases and diseased cases, "more typically". Axis: test result value or subjective judgement of likelihood that case is diseased.]

9. [Figure: non-diseased and diseased cases with threshold set for a "less aggressive mindset", and the resulting operating point. Axes: TPF, sensitivity vs FPF, 1-specificity.]

10. [Figure: non-diseased and diseased cases with threshold set for a "moderate mindset", and the resulting operating point. Axes: TPF, sensitivity vs FPF, 1-specificity.]

11. [Figure: non-diseased and diseased cases with threshold set for a "more aggressive mindset", and the resulting operating point. Axes: TPF, sensitivity vs FPF, 1-specificity.]

12. [Figure: entire ROC curve traced from the non-diseased and diseased case distributions. Axes: TPF, sensitivity vs FPF, 1-specificity.]

13. [Figure: entire ROC curve and chance line; curves rise with reader skill and/or level of technology. Axes: TPF, sensitivity vs FPF, 1-specificity.]

14. . . . at least that's the idea . . . now to what happens in the real world . . . The Complication of Reader Variability.

15. In the following example from mammography, readers were asked to set their threshold for action . . . at their sense of the boundary between category 3 and category 4 of the BIRADS scale.

16. TPF vs FPF for 108 US radiologists in study by Beam et al.

17.
- There is no unique ROC operating point, i.e., no unique (TPF, FPF) point
- There is no unique ROC curve, i.e., there is a band or region of ROCs.

18. . . . dozens of examples of this phenomenon exist . . . The following is an example from plain film chest radiography (CXR).

19. ( Chest film study by E. James Potchen, M.D., 1999 ).

20. The Multiple-Reader Multiple-Case (MRMC) paradigm
Fully-Crossed Design
* Cases matched across modalities (i.e., same cases read unaided vs aided)
* Readers matched across modalities (i.e., same readers read unaided vs aided)
* This design has the most statistical power for a given number of readers and a given number of cases with verified truth; thus, it's least demanding of these resources (least burdensome).

21. The Multiple-Reader Multiple-Case (MRMC) paradigm
Enabled by resampling strategies
- Jackknife plus ANOVA (parametric) (Dorfman, Berbaum, Metz [DBM] 1992)
- Bootstrap the experiment of interest (nonparametric)
  Draw random readers, random cases
  Carry out the experiment of interest.

22. Some possible bootstrap samples of size 15 from a dataset with 15 elements:
[14, 6, 3, 5, 12, 9, 11, 14, 4, 10, 7, 12, 3, 14, 2]
. . .
[9, 15, 11, 2, 13, 1, 6, 7, 12, 4, 8, 1, 12, 6, 14].

23. The Multiple-Reader Multiple-Case (MRMC) paradigm
Enabled by resampling strategies
- Jackknife plus ANOVA (parametric) (Dorfman, Berbaum, Metz [DBM] 1992)
- Bootstrap the experiment of interest (nonparametric)
  Draw random readers, random cases
  Carry out the experiment of interest
- Obtain mean performance over readers, cases
- Obtain error bars that account for variability of readers and cases.

24. Scales used for reporting and measurements:
- Historic ordered categories (usu. 5 or 6) (almost definitely no . . . maybe . . . almost definitely yes)
- "Action item" or patient management scale (e.g., no action vs F/U . . . or F/U vs biopsy) . . . BIRADS scale is classic example . . .
- Continuous probability rating scale (e.g., probability of disease or probability of cancer) . . . actually recommended in BIRADS doc . . .

25. Scales used for reporting and measurements
Example of "Best of both worlds": Classification of benign vs malignant microcalcification clusters (Jiang, Nishikawa, Schmidt, Metz, Giger, Doi)
Authors studied ROC curves, ROC areas . . . and (Sensitivity, Specificity) operating point (means and uncertainties).

26. [no text]

27. Possible reasons why we do not see more of "Best of both worlds"
ROC total area is TPF (Se) averaged over FPF (Sp)
- Var(ROC area) ~ (Binomial Var)/2
- Var(Se) when Sp is known = Binomial Var
- Var(Se) when Sp is estimated > Binomial Var
Var(ROC area) is least burdensome
- "Both worlds" requires consistent conventions . . . plus training (little documentation so far)
- May require consensus bodies to promote the practice.

28. The most famous slides in the ROC archives . . .

29. Dilemma: Which modality is better?
[Figure: operating points of Modality A and Modality B. Axes: True Positive Fraction = Sensitivity vs False Positive Fraction = 1.0 - Specificity, each from 0.0 to 1.0.]

30. The dilemma is resolved after ROCs are determined (one scenario):
Conclusion: Modality B is better: higher TPF at same FPF, or lower FPF at same TPF.
[Figure: ROC curves of Modality A and Modality B. Axes: True Positive Fraction = Sensitivity vs False Positive Fraction = 1.0 - Specificity, each from 0.0 to 1.0.]

31. A different scenario: Same ROC.
[Figure: Modality A and Modality B on a single ROC curve. Axes: True Positive Fraction = Sensitivity vs False Positive Fraction = 1.0 - Specificity, each from 0.0 to 1.0.]

32. . . . yet another scenario:
Conclusion: Modality A is better: higher TPF at same FPF, or lower FPF at same TPF.
[Figure: ROC curves of Modality A and Modality B. Axes: True Positive Fraction = Sensitivity vs False Positive Fraction = 1.0 - Specificity, each from 0.0 to 1.0.]

33. When ROC curves cross . . . total area under the ROC curve is not a sufficient summary measure of performance . . . other summary measures may be necessary. When this is anticipated, the study protocol is expected to address this.

34. Location scoring:
- The basic ROC paradigm is an assessment of the decision making at the level of the patient.
- In complex imaging, assessment of decision making at a finer level is desired, i.e., assessment of localization is desired.
- Localization => adds more information => more statistical power.

35. The problem of location-specific ROC or LROC analysis
- Measurement of a "hit" depends on localization criterion (thus, results are not unique)
- Monotonic relationship between ROC and LROC for special case of zero or one lesion
- More elaborate models require assumptions of independence among multiple lesions, regions
- Lack of validated software for analysis of experiments.

36. Region-of-interest (ROI) approach to location-specific ROC analysis
. . . only require localization to within a quadrant . . . or some other unit . . .

37. Region-of-interest (ROI) approach to location-specific ROC analysis . . .
- Disadvantages: Does not correspond to the clinical task . . . etc. . .
- Advantages: Straightforward to account for correlations w/o additional assumptions
- The most straightforward method is simply to resample using the patient as the statistical unit.

38. [...] STATE
Classic paper: Revesz, Kundel, Bonitatibus (1983) included various ways of obtaining panel consensus truth.
Authors compared three imaging methods.
Any one of the three could outperform the others depending on rule used for reducing panel to truth.
HOWEVER, TODAY TARGET IS "ACTIONABLE NODULE" ACCORDING TO EXPERT PANEL
Classic ref. above indicates additional uncertainty present => Resample panel to assess additional uncertainty.

39. UNCERTAINTY OF TRUTH STATE => UNCERTAINTY IN EFFECTIVE SAMPLE SIZE
- Uncertainty in TPF depends on the number of actually diseased cases
- Uncertainty in FPF depends on the number of actually nondiseased cases
- Uncertainty in total area under ROC curve depends on the effective number of cases: the harmonic mean of the numbers in the two classes . . . & is a function of the panel sample.

40. Given: 100 patients. What is the best split between normals and abnormals for purposes of estimating area under ROC?

41. . . . relaxing panel criterion from unanimous to majority
- allows resampling to assess variability
- may increase effective number of samples
. . . these effects may tend to cancel.

42. THE PROBLEM OF CONTROLLING FOR READER VIGILANCE
Any measurement setting has artificial conditions vis-à-vis actual practice:
Are readers more vigilant in unaided reading when they're subjects in a study?
Are readers less vigilant in unaided reading when they're not subjects in a study?
One early suggestion: Control the time available to readers to mimic the clinic (Chan et al., Invest. Radiol. 1990).

43. IN SUMMARY
These points reflect the current status of on-going interactions between and among FDA, Academia, Industry sponsors, NCI and the LIDC on the topic and issues for submissions like the present one.

44. Selected References
Metz CE. Basic principles of ROC analysis. Seminars in Nuclear Medicine 1978; 8: 283-298.
Metz CE. ROC methodology in radiologic imaging. Invest Radiol 1986; 21: 720-33.
Metz CE. Some practical issues of experimental design and data analysis in radiological ROC studies. Invest Radiol 1989; 24: 234-245.
Metz CE. Fundamentals of ROC Analysis. [In] Handbook of Medical Imaging. Vol. 1. Physics and Psychophysics. Beutel J, Kundel HL, and Van Metter RL, Eds. SPIE Press (Bellingham WA 2000), Chapter 15: 751-769.
Swets JA and Pickett RM. Evaluation of Diagnostic Systems. Academic Press, New York, 1982.
Wagner RF, Beiden SV, Campbell G, Metz CE, and Sacks WM. Assessment of medical imaging and computer-assist systems: Lessons from recent experience. Acad Radiol 2002; 9: 1264-1277.
Wagner RF, Beiden SV, Campbell G, Metz CE, and Sacks WM. Contemporary issues for experimental design in assessment of medical imaging and computer-assist systems. Proc. of the SPIE-Medical Imaging 2003; 5034: 213-224.
Dodd LE, Wagner RF, Armato SG, McNitt-Gray MF, et al. Assessment methodologies and statistical issues for computer-aided diagnosis of lung nodules in computed tomography: Contemporary research topics relevant to the Lung Image Database Consortium. Acad Radiol (in print, Apr. 2004).

45. Toledano AY, Gatsonis C. Ordinal regression methodology for ROC curves derived from correlated data. Statistics in Medicine 1996; 15: 1807-1826.
Nishikawa RM and Yarusso LM. Variations in measured performance of CAD schemes due to database composition and scoring protocol. Proc. of the SPIE 1998; 3338: 840-844.
Giger ML. Current issues in CAD for mammography. In: Doi K, Giger ML, Nishikawa RM, and Schmidt RA, Eds. Digital Mammography '96. Elsevier Science B.V. 1996, 53-59.
Clarke LP, Croft BY, Staab E, Baker H, Sullivan DC. National Cancer Institute initiative: Lung image database resource for imaging research. Acad Radiol 2001 May; 8(5): 447-50.
Wagner RF, Beiden SV, Metz CE. Continuous versus categorical data for ROC analysis: Some quantitative considerations. Acad Radiol 2001; 8: 328-334.
Revesz G, Kundel HL, and Bonitatibus M. The effect of verification on the assessment of imaging techniques. Invest. Radiol. 1983; 18: 194-198.
Beiden SV, Wagner RF, Campbell G. Components-of-variance models and multiple-bootstrap experiments: An alternative method for random-effects receiver operating characteristic analysis. Acad Radiol 2000; 7: 341-349.
Obuchowski NA. Multireader, multimodality receiver operating characteristic curve studies: Hypothesis testing and sample size estimation using an analysis of variance approach with dependent observations. Acad Radiol 1995; 2 (Supplement 1): S22-S29.
Chan HP, Doi K, Vyborny CJ et al. Improvement in radiologists' detection of clustered microcalcifications on mammograms. Invest Radiol 1990; 25: 1102.

46. Chakraborty DP and Winter L. Free-response methodology: Alternate analysis and a new observer-performance experiment. Radiology 1990; 174: 873-881.
Metz CE, Starr SJ, Lusted LB. Observer performance in detecting multiple radiographic signals: prediction and analysis using a generalized ROC approach. Radiology 1976; 121: 337-347.
Starr SJ, Metz CE, Lusted LB, Goodenough DJ. Visual detection and localization of radiographic images. Radiology 1975; 116: 533-538.
Swensson RG. Unified measurement of observer performance in detecting and localizing target objects on images. Medical Physics 1996; 23: 1709-1725.
Chakraborty DP. The FROC, AFROC and DROC variants of the ROC analysis. [In] Handbook of Medical Imaging. Vol. 1. Physics and Psychophysics. Beutel J, Kundel HL, and Van Metter RL, Eds. SPIE Press (Bellingham WA 2000), Chapter 16: 771-796.
Obuchowski NA. Multireader receiver operating characteristic studies: A comparison of study designs. Acad Radiol 1995; 2: 709-716.
Gatsonis CA, Begg CB, Wieand S. Advances in Statistical Methods for Diagnostic Radiology: A Symposium. Acad Radiol 1995; 2 (Supplement 1): S1-S84 (the entire supplement is the Proceedings of the Symposium).
Beiden SV, Wagner RF, Doi K, Nishikawa RM, Freedman M, Lo S-C B, and Xu X-W. Independent versus sequential reading in ROC studies of computer-assist modalities: Analysis of components of variance. Acad Radiol 2002; 9: 1036-1043.

47. Metz CE. Evaluation of CAD Methods. In: Doi K, MacMahon H, Giger ML, and Hoffmann KR, eds. Computer-Aided Diagnosis in Medical Imaging. Amsterdam: Elsevier Science B.V. (Excerpta Medica International Congress Series, Vol. 1182), 1999, 543-554.
Chakraborty DP. Statistical power in observer performance studies: Comparison of the receiver operating characteristic and free-response methods in tasks involving localization. Acad Radiol 2002; 9: 147-156.
Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis: generalization to the population of readers and patients with the jackknife method. Invest Radiol 1992; 27: 723-731.
Chakraborty DP and Berbaum KS. Comparing Inter-Modality Diagnostic Accuracies in Tasks Involving Lesion Localization: A Jackknife AFROC Approach. Supplement to Radiology, Volume 225 (P), 259, 2002.
Obuchowski NA, Lieber ML, Powell KA. Data analysis for detection and localization of multiple abnormalities with application to mammography. Acad Radiol 2000; 7: 516-525.
Rutter CM. Bootstrap estimation of diagnostic accuracy with patient-clustered data. Acad Radiol 2000; 7: 413-419.
Efron B, Tibshirani RJ. An introduction to the bootstrap. Chapman and Hall, New York, 1993.
Beam CA, Layde PM, Sullivan DC. Variability in the interpretation of screening mammograms by US radiologists. Arch Intern Med 1996; 156: 209-213.
Jiang Y, Nishikawa RM, Schmidt RA, Metz CE, Giger ML, Doi K. Improving breast cancer diagnosis with computer-aided diagnosis. Acad Radiol 1999; 6: 22-33.
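
The bootstrap strategy named on slides 21 to 23 (draw random readers and random cases, carry out the experiment of interest, then take the mean and error bars over the replicates) can be sketched roughly as follows. This is a minimal illustration, not the method of any cited paper: the function names `auc` and `bootstrap_mrmc` are hypothetical, the figure of merit is assumed to be the Mann-Whitney estimate of ROC area, and diseased and non-diseased cases are resampled separately (an assumption made here so that every replicate contains both classes).

```python
import random
from statistics import mean, stdev

def auc(scores, truth):
    """Mann-Whitney estimate of ROC area; tied scores count as 1/2."""
    pos = [s for s, t in zip(scores, truth) if t == 1]
    neg = [s for s, t in zip(scores, truth) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_mrmc(ratings, truth, n_boot=500, seed=0):
    """ratings[r][c] is reader r's score for case c; truth[c] is 0 or 1.

    Each replicate resamples readers and cases with replacement,
    then averages the ROC area over the sampled readers.
    Returns (mean, standard deviation) over the replicates.
    """
    rng = random.Random(seed)
    n_readers = len(ratings)
    pos_idx = [i for i, t in enumerate(truth) if t == 1]
    neg_idx = [i for i, t in enumerate(truth) if t == 0]
    estimates = []
    for _ in range(n_boot):
        # Draw random readers (with replacement).
        readers = [rng.randrange(n_readers) for _ in range(n_readers)]
        # Draw random cases (with replacement), assumed stratified by truth.
        cases = (rng.choices(pos_idx, k=len(pos_idx))
                 + rng.choices(neg_idx, k=len(neg_idx)))
        t = [truth[c] for c in cases]
        # Carry out the "experiment of interest": reader-averaged ROC area.
        estimates.append(
            mean(auc([ratings[r][c] for c in cases], t) for r in readers))
    return mean(estimates), stdev(estimates)
```

The standard deviation returned here reflects reader and case variability together; separating the two components, as in the components-of-variance work listed in the references, would need a more elaborate design than this sketch.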


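Slide 39 ties the uncertainty in ROC area to an effective number of cases given by the harmonic mean of the two class sizes, and slide 40 asks how best to split 100 patients. Taking that harmonic-mean heuristic at face value (the slides do not state the answer, and the function names below are hypothetical), a short sketch can tabulate the effective number of cases for each split:

```python
def effective_cases(n_diseased, n_nondiseased):
    """Harmonic mean of the two class sizes (0 if either class is empty)."""
    if n_diseased == 0 or n_nondiseased == 0:
        return 0.0
    return 2.0 * n_diseased * n_nondiseased / (n_diseased + n_nondiseased)

def best_split(total):
    """Return the (n_diseased, n_nondiseased) split with the largest harmonic mean."""
    return max(((d, total - d) for d in range(1, total)),
               key=lambda split: effective_cases(*split))

if __name__ == "__main__":
    for d in (10, 20, 50):
        print(d, 100 - d, effective_cases(d, 100 - d))  # 18.0, 32.0, 50.0
    print(best_split(100))  # (50, 50)
```

Under this heuristic an even split gives the largest effective number of cases, and a 10/90 split is worth only 18 effective cases out of 100. Which cases count as diseased depends on the panel truth, which is why slide 39 notes that the effective number is itself a function of the panel sample.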