An Overview of Contemporary ROC Methodology in Medical Imaging and Computer-Assist Modalities

1 An Overview of Contemporary ROC Methodology in Medical Imaging and Computer-Assist Modalities

Robert F. Wagner, Ph.D., OST, CDRH, FDA

2 ROC

- Receiver Operating Characteristic (historic name, from radar studies)
- Relative Operating Characteristic (psychology, psychophysics)
- Operating Characteristic (preferred by some)

3 OUTLINE

- Efforts toward consensus development on present issues
- The ROC paradigm
- The complication of reader variability
- The multiple-reader multiple-case (MRMC) ROC paradigm
- The measurement scales: categories; patient-management/action; probability scale
- Complications from location uncertainty, truth uncertainty, effective-sample-size uncertainty, and reader vigilance
- Summary

4 EFFORTS TOWARD CONSENSUS DEVELOPMENT ON THE PRESENT ISSUES

- How to use the classic concepts of sensitivity, specificity, and ROC analysis to assess the performance of diagnostic imaging and computer-assist systems?
- Many new issues and levels of complexity are coming to the fore as more complex technologies emerge

5 EFFORTS TOWARD CONSENSUS DEVELOPMENT ON THE PRESENT ISSUES (II)

- RSNA/SPIE/MIPS: various workshops and literature, an evolving work in progress
- FDA/CDRH use of multiple-reader multiple-case (MRMC) ROC:
  - Digital mammography PMAs
  - Computer aid for lung nodule detection on CXR (film)
- NCI Lung Image Database Consortium (LIDC) and workshops:
  - Consensus seeking on many issues
  - Two active CDRH members
- Communication of these resources with incoming sponsors

6 Fundamentals of the ROC paradigm

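7 Non-diseased and diseased cases

[Figure: distributions of the test result for non-diseased and diseased cases, with a decision threshold. Horizontal axis: test result value or subjective judgement of the likelihood that the case is diseased.]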

8 Non-diseased and diseased cases, more typically

[Figure: the more typical situation for the distributions of non-diseased and diseased cases. Horizontal axis: test result value or subjective judgement of the likelihood that the case is diseased.]

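9 Threshold for a less aggressive mindset

[Figure: the threshold placed on the non-diseased and diseased distributions for a less aggressive mindset, and the resulting operating point on the ROC axes TPF (sensitivity) vs. FPF (1 - specificity).]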

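10 Threshold for a moderate mindset

[Figure: the threshold placed on the non-diseased and diseased distributions for a moderate mindset, and the resulting operating point on the ROC axes TPF (sensitivity) vs. FPF (1 - specificity).]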

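11 Threshold for a more aggressive mindset

[Figure: the threshold placed on the non-diseased and diseased distributions for a more aggressive mindset, and the resulting operating point on the ROC axes TPF (sensitivity) vs. FPF (1 - specificity).]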
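
The threshold mechanics of slides 7 to 11 can be sketched in a few lines. This is a minimal illustration under three assumptions. The scores are hypothetical values on a 0-100 likelihood scale. A case is called diseased when its score is at or above the threshold. The three threshold values that stand in for the three mindsets are arbitrary.

```python
def operating_point(nondiseased, diseased, threshold):
    """Return (TPF, FPF) when every case scoring >= threshold is called diseased."""
    tpf = sum(s >= threshold for s in diseased) / len(diseased)        # sensitivity
    fpf = sum(s >= threshold for s in nondiseased) / len(nondiseased)  # 1 - specificity
    return tpf, fpf


# Hypothetical scores (likelihood of disease on a 0-100 scale), not study data.
nondiseased = [5, 12, 20, 28, 35, 41, 50, 58]
diseased = [30, 45, 55, 62, 70, 78, 85, 93]

# Lowering the threshold (a more aggressive mindset) raises TPF and FPF together.
for mindset, threshold in [("less aggressive", 60), ("moderate", 45), ("more aggressive", 25)]:
    tpf, fpf = operating_point(nondiseased, diseased, threshold)
    print(f"{mindset:16s} threshold={threshold:3d}  TPF={tpf:.3f}  FPF={fpf:.3f}")
```

With these numbers the three operating points are (FPF, TPF) = (0, 0.625), (0.25, 0.875) and (0.625, 1.0). The point moves up and to the right as the reader becomes more aggressive.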

12 Entire ROC curve

[Figure: the entire ROC curve traced out by all possible thresholds on the non-diseased and diseased distributions; axes TPF (sensitivity) vs. FPF (1 - specificity).]
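
Sweeping the threshold through every observed score traces the empirical ROC curve of slides 12 and 13. This is a minimal sketch, assuming hypothetical scores and straight-line (trapezoidal) interpolation between operating points. Fitted smooth curves, such as binormal fits, are a different approach and are not shown here. The last line illustrates the chance line: when both classes have the same scores, the area is 0.5.

```python
def empirical_roc(nondiseased, diseased):
    """(FPF, TPF) points from (0, 0) to (1, 1), sweeping the threshold downward
    through every distinct observed score (score >= t means 'diseased')."""
    thresholds = sorted(set(nondiseased) | set(diseased), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpf = sum(s >= t for s in nondiseased) / len(nondiseased)
        tpf = sum(s >= t for s in diseased) / len(diseased)
        points.append((fpf, tpf))
    return points  # the lowest threshold calls every case diseased: (1.0, 1.0)


def trapezoid_area(points):
    """Area under the piecewise-linear curve through the points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


nondiseased = [5, 12, 20, 28, 35, 41, 50, 58]   # hypothetical scores
diseased = [30, 45, 55, 62, 70, 78, 85, 93]
curve = empirical_roc(nondiseased, diseased)
print(trapezoid_area(curve))

# Identical score distributions for both classes fall on the chance line.
print(trapezoid_area(empirical_roc([1, 2, 3], [1, 2, 3])))
```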

13 Entire ROC curve

[Figure: ROC curves on axes TPF (sensitivity) vs. FPF (1 - specificity), with the chance line; curves lie higher with increasing reader skill and/or level of technology.]

14 . . . at least that's the idea

. . . now to what happens in the real world: the complication of reader variability

15 In the following example from mammography, readers were asked to set their threshold for action at their sense of the boundary between category 3 and category 4 of the BI-RADS scale.

16 TPF vs. FPF for 108 US radiologists in the study by Beam et al.

17

- There is no unique ROC operating point, i.e., no unique (TPF, FPF) point
- There is no unique ROC curve, i.e., there is a band or region of ROCs

18 . . . dozens of examples of this phenomenon exist

. . . the following is an example from plain-film chest radiography (CXR)

19 (Chest film study by E. James Potchen, M.D., 1999)

20 The Multiple-Reader Multiple-Case (MRMC) paradigm: Fully-Crossed Design

* Cases matched across modalities (i.e., the same cases are read unaided and aided)
* Readers matched across modalities (i.e., the same readers read unaided and aided)
* For a given number of readers and a given number of cases with verified truth, this design has the most statistical power; thus, it is the least demanding of these resources (least burdensome)

21 The Multiple-Reader Multiple-Case (MRMC) paradigm: Enabled by resampling strategies

- Jackknife plus ANOVA (parametric) (Dorfman, Berbaum, Metz: DBM, 1992)
- Bootstrap the experiment of interest (nonparametric):
  draw random readers and random cases;
  carry out the experiment of interest
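
In the jackknife step of the DBM approach, each case is turned into a pseudovalue of the accuracy index. DBM then runs an ANOVA on the pseudovalues across readers and modalities; this sketch does not show that step. For simplicity it assumes the nonparametric Wilcoxon-Mann-Whitney ROC area in place of a parametric ROC-area estimate. It uses hypothetical data for one reader and one modality.

```python
def auc(nondiseased, diseased):
    """Wilcoxon-Mann-Whitney ROC area: fraction of (diseased, non-diseased)
    pairs ranked correctly, with ties counted as 1/2."""
    wins = sum(1.0 if d > n else 0.5 if d == n else 0.0
               for d in diseased for n in nondiseased)
    return wins / (len(diseased) * len(nondiseased))


def jackknife_pseudovalues(cases):
    """cases: list of (score, truth), truth 1 = diseased, 0 = non-diseased.
    Pseudovalue for case k: N * A_all - (N - 1) * A_without_k.
    Assumes each class has at least two cases, so no deletion empties a class."""
    def area(subset):
        return auc([s for s, t in subset if t == 0], [s for s, t in subset if t == 1])

    n = len(cases)
    a_all = area(cases)
    return [n * a_all - (n - 1) * area(cases[:k] + cases[k + 1:]) for k in range(n)]


# One reader, one modality, four hypothetical cases.
cases = [(1, 0), (3, 0), (2, 1), (4, 1)]
print(jackknife_pseudovalues(cases))  # [1.5, 0.0, 0.0, 1.5]
```

Here the full-sample area is 0.75. The pseudovalues spread around that value, and their spread is what the subsequent ANOVA works with.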

22 Some possible bootstrap samples of size 15 from a dataset with 15 elements

[14, 6, 3, 5, 12, 9, 11, 14, 4, 10, 7, 12, 3, 14, 2]
. . .
[9, 15, 11, 2, 13, 1, 6, 7, 12, 4, 8, 1, 12, 6, 14]
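
Samples like these are generated by drawing with replacement. This is a minimal sketch using only the Python standard library. The seed is arbitrary, so the printed samples will differ from the ones on the slide.

```python
import random


def bootstrap_sample(items, rng):
    """One bootstrap sample: len(items) draws with replacement, so some
    elements typically repeat and others are left out."""
    return rng.choices(items, k=len(items))


rng = random.Random(2004)          # arbitrary seed, fixed only for reproducibility
elements = list(range(1, 16))      # the 15 elements, labelled 1..15 as on the slide
for _ in range(2):
    print(bootstrap_sample(elements, rng))
```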

23 The Multiple-Reader Multiple-Case (MRMC) paradigm: Enabled by resampling strategies

- Jackknife plus ANOVA (parametric) (Dorfman, Berbaum, Metz: DBM, 1992)
- Bootstrap the experiment of interest (nonparametric):
  draw random readers and random cases;
  carry out the experiment of interest
- Obtain mean performance over readers and cases
- Obtain error bars that account for the variability of both readers and cases
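
The following is a minimal sketch of "bootstrap the experiment of interest" for a fully crossed design. It assumes hypothetical 5-point ratings, the Wilcoxon-Mann-Whitney ROC area for each reader, and the difference between reader-averaged areas (aided minus unaided) as the quantity of interest. It is only the simplest version. It is not the components-of-variance multiple-bootstrap method of Beiden, Wagner and Campbell listed in the references. The SD it reports combines reader and case variability without separating them.

```python
import random
import statistics


def auc(nondiseased, diseased):
    """Wilcoxon-Mann-Whitney ROC area (ties count 1/2)."""
    wins = sum(1.0 if d > n else 0.5 if d == n else 0.0
               for d in diseased for n in nondiseased)
    return wins / (len(diseased) * len(nondiseased))


def reader_mean_auc(scores, truth, readers, cases):
    """Average over the listed readers of each reader's ROC area on the listed cases."""
    return statistics.mean([
        auc([scores[r][c] for c in cases if truth[c] == 0],
            [scores[r][c] for c in cases if truth[c] == 1])
        for r in readers])


def mrmc_bootstrap(scores_a, scores_b, truth, n_boot=500, seed=0):
    """Resample readers and cases; return (mean, SD) of the reader-averaged
    AUC difference B - A. Cases are drawn within each truth class so both
    classes are always present, and the same draw is used for both
    modalities (fully crossed design)."""
    rng = random.Random(seed)
    n_readers = len(scores_a)
    neg = [c for c, t in enumerate(truth) if t == 0]
    pos = [c for c, t in enumerate(truth) if t == 1]
    diffs = []
    for _ in range(n_boot):
        readers = rng.choices(range(n_readers), k=n_readers)
        cases = rng.choices(neg, k=len(neg)) + rng.choices(pos, k=len(pos))
        diffs.append(reader_mean_auc(scores_b, truth, readers, cases)
                     - reader_mean_auc(scores_a, truth, readers, cases))
    return statistics.mean(diffs), statistics.stdev(diffs)


# Hypothetical 5-point ratings: 3 readers x 8 cases, unaided (A) and aided (B).
truth = [0, 0, 0, 0, 1, 1, 1, 1]
scores_a = [[1, 2, 3, 4, 2, 3, 4, 5], [1, 1, 2, 3, 2, 4, 3, 5], [2, 1, 3, 2, 3, 2, 4, 4]]
scores_b = [[1, 1, 2, 3, 3, 4, 5, 5], [1, 2, 1, 2, 3, 4, 4, 5], [1, 2, 2, 3, 3, 3, 4, 5]]
mean_diff, sd_diff = mrmc_bootstrap(scores_a, scores_b, truth)
print(f"AUC(B) - AUC(A): mean {mean_diff:.3f}, bootstrap SD {sd_diff:.3f}")
```

Resampling readers and cases together in one loop gives error bars that reflect both sources of variability, as the slide asks. Separating the two sources needs the more elaborate designs in the references.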

24 Scales used for reporting and measurements

- Historic ordered categories (usually 5 or 6): almost definitely no . . . maybe . . . almost definitely yes
- Action-item or patient-management scale (e.g., no action vs. follow-up, or follow-up vs. biopsy); the BI-RADS scale is the classic example
- Continuous probability rating scale (e.g., probability of disease or probability of cancer); this is actually recommended in the BI-RADS document

25 Scales used for reporting and measurements

Example of "best of both worlds": classification of benign vs. malignant microcalcification clusters (Jiang, Nishikawa, Schmidt, Metz, Giger, Doi). The authors studied ROC curves and ROC areas . . . and the (sensitivity, specificity) operating point (means and uncertainties).


27 Possible reasons why we do not see more "best of both worlds" reporting

- ROC total area is TPF (Se) averaged over FPF (i.e., over 1 - Sp)
- Var(ROC area) ≈ (binomial variance)/2
- Var(Se) when Sp is known = binomial variance
- Var(Se) when Sp is estimated > binomial variance
  Hence Var(ROC area) is the least burdensome
- "Both worlds" requires consistent conventions . . . plus training (little documentation so far)
- May require consensus bodies to promote the practice
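
The first bullet can be checked numerically. Place the threshold at each non-diseased case in turn and average the resulting TPF. The result equals the Wilcoxon-Mann-Whitney estimate of the total ROC area. This is a minimal sketch with hypothetical scores. It also includes the binomial variance p(1 - p)/n that the Var(Se) bullets refer to. It does not attempt to reproduce the approximate factor of 1/2 in the Var(ROC area) bullet.

```python
def auc_mann_whitney(nondiseased, diseased):
    """Fraction of (diseased, non-diseased) pairs ranked correctly; ties count 1/2."""
    wins = sum(1.0 if d > n else 0.5 if d == n else 0.0
               for d in diseased for n in nondiseased)
    return wins / (len(diseased) * len(nondiseased))


def auc_as_mean_tpf(nondiseased, diseased):
    """Set the threshold at each non-diseased score in turn and average the TPF.
    Each non-diseased case is one 1/len(nondiseased) step along the FPF axis,
    so this is TPF averaged over FPF (ties again count 1/2)."""
    tpfs = [(sum(d > n for d in diseased) + 0.5 * sum(d == n for d in diseased)) / len(diseased)
            for n in nondiseased]
    return sum(tpfs) / len(tpfs)


def binomial_var(p, n):
    """Variance of a proportion estimated from n independent cases,
    e.g. Se estimated from n diseased cases when Sp is known."""
    return p * (1 - p) / n


nondiseased = [5, 12, 20, 28, 35, 41, 50, 58]   # hypothetical scores
diseased = [30, 45, 55, 62, 70, 78, 85, 93]
print(auc_mann_whitney(nondiseased, diseased), auc_as_mean_tpf(nondiseased, diseased))
print(binomial_var(0.8, 100))
```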

28 The most famous slides in the ROC archives

29 Dilemma: Which modality is better?

[Figure: Modality A and Modality B plotted on axes True Positive Fraction = Sensitivity (0.0 to 1.0) vs. False Positive Fraction = 1.0 - Specificity (0.0 to 1.0).]

30 The dilemma is resolved after ROCs are determined (one scenario):

Conclusion: Modality B is better: higher TPF at the same FPF, or lower FPF at the same TPF.

[Figure: ROC curves for Modality A and Modality B; axes True Positive Fraction = Sensitivity vs. False Positive Fraction = 1.0 - Specificity, both 0.0 to 1.0.]

31 A different scenario: Same ROC

[Figure: Modality A and Modality B on the same ROC curve; axes True Positive Fraction = Sensitivity vs. False Positive Fraction = 1.0 - Specificity, both 0.0 to 1.0.]

32 . . . yet another scenario:

Conclusion: Modality A is better: higher TPF at the same FPF, or lower FPF at the same TPF.

[Figure: ROC curves for Modality A and Modality B; axes True Positive Fraction = Sensitivity vs. False Positive Fraction = 1.0 - Specificity, both 0.0 to 1.0.]

33 When ROC curves cross

. . . the total area under the ROC curve is not a sufficient summary measure of performance; other summary measures may be necessary. When this is anticipated, the study protocol is expected to address it.
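
One possible alternative summary, offered here only as an example, is the partial area over a restricted FPF range. The two piecewise-linear curves below are hypothetical. They cross and have the same total area (0.7), yet they differ in partial area for FPF between 0 and 0.2. The FPF range is an arbitrary choice for the illustration. Which range (or which other summary) is clinically relevant is exactly what the protocol would need to specify.

```python
def interp_tpf(points, fpf):
    """Linearly interpolate TPF at a given FPF along a piecewise-linear ROC
    whose points are sorted by FPF."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpf <= x1:
            if x1 == x0:          # vertical step: take its upper end
                return max(y0, y1)
            return y0 + (y1 - y0) * (fpf - x0) / (x1 - x0)
    raise ValueError("fpf lies outside the curve")


def partial_area(points, fpf_lo, fpf_hi):
    """Trapezoidal area under the curve between fpf_lo and fpf_hi."""
    xs = sorted({fpf_lo, fpf_hi} | {x for x, _ in points if fpf_lo < x < fpf_hi})
    return sum((x1 - x0) * (interp_tpf(points, x0) + interp_tpf(points, x1)) / 2
               for x0, x1 in zip(xs, xs[1:]))


curve_a = [(0.0, 0.0), (0.1, 0.5), (1.0, 1.0)]   # hypothetical: better at low FPF
curve_b = [(0.0, 0.0), (0.5, 0.9), (1.0, 1.0)]   # hypothetical: better at high FPF
for name, curve in [("A", curve_a), ("B", curve_b)]:
    total = partial_area(curve, 0.0, 1.0)
    low_fpf = partial_area(curve, 0.0, 0.2)
    print(f"{name}: total area {total:.4f}, area for FPF in [0, 0.2] {low_fpf:.4f}")
```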

34 Location scoring

- The basic ROC paradigm is an assessment of decision making at the level of the patient.
- In complex imaging, assessment of decision making at a finer level is desired, i.e., assessment of localization.
- Localization adds more information and yields more statistical power.

35 The problem of location-specific ROC or LROC analysis

- Measurement of a hit depends on the localization criterion (thus, results are not unique)
- Monotonic relationship between ROC and LROC for the special case of zero or one lesion
- More elaborate models require assumptions of independence among multiple lesions or regions
- Lack of validated software for analysis of experiments
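
The following small sketch illustrates the first bullet. With a distance-based hit criterion, the same reader marks give different localized true-positive fractions depending on the acceptance radius. The distance criterion, the radii and the coordinates are all hypothetical. The sketch assumes one lesion and one mark per diseased image, as in the zero-or-one-lesion special case.

```python
import math


def count_hits(marks, lesions, radius):
    """Count reader marks lying within `radius` of the true lesion centre.
    Assumes one lesion per diseased image and one mark per image (paired by index)."""
    return sum(math.hypot(mx - lx, my - ly) <= radius
               for (mx, my), (lx, ly) in zip(marks, lesions))


# Hypothetical pixel coordinates for three diseased images.
lesions = [(10, 10), (50, 40), (30, 70)]
marks = [(12, 11), (58, 40), (30, 85)]
for radius in (5, 10, 20):
    print(radius, count_hits(marks, lesions, radius) / len(lesions))
```

The mark-to-lesion distances are about 2.2, 8 and 15 pixels. As a result, radii of 5, 10 and 20 score one, two and three hits respectively on identical reader data.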

36 Region-of-interest (ROI) approach to location-specific ROC analysis

. . . only require localization to within a quadrant . . .
. . . or some other unit . . .

37 Region-of-interest (ROI) approach to location-specific ROC analysis

- Disadvantages: does not correspond to the clinical task . . . etc.
- Advantages: straightforward to account for correlations without additional assumptions
- The most straightforward method is simply to resample using the patient as the statistical unit
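
The last bullet can be sketched as a patient-level (cluster) bootstrap. The sketch makes three assumptions. The quadrant ratings are hypothetical. The ROC area is pooled over all ROIs using the Wilcoxon-Mann-Whitney estimate. Any draw that happens to lack one of the classes is simply discarded. Drawing whole patients keeps the four quadrants of a patient together, so correlations among them carry into every replicate without a model for them.

```python
import random
import statistics


def auc(nondiseased, diseased):
    """Wilcoxon-Mann-Whitney ROC area (ties count 1/2)."""
    wins = sum(1.0 if d > n else 0.5 if d == n else 0.0
               for d in diseased for n in nondiseased)
    return wins / (len(diseased) * len(nondiseased))


def roi_auc(patients):
    """patients: list of per-patient lists of (score, truth) ROIs, e.g. four
    quadrants each. ROC area pooled over all ROIs, each ROI counted as a case."""
    rois = [roi for patient in patients for roi in patient]
    return auc([s for s, t in rois if t == 0], [s for s, t in rois if t == 1])


def patient_bootstrap_sd(patients, n_boot=1000, seed=0):
    """Bootstrap SD of the ROI-level area, drawing whole patients with
    replacement so ROIs from one patient stay together (patient = statistical unit)."""
    rng = random.Random(seed)
    areas = []
    for _ in range(n_boot):
        sample = rng.choices(patients, k=len(patients))
        truths = {t for patient in sample for _, t in patient}
        if truths == {0, 1}:  # skip any draw that lacks one of the classes
            areas.append(roi_auc(sample))
    return statistics.stdev(areas)


# Hypothetical ratings for four quadrants per patient; truth 1 = lesion in quadrant.
patients = [
    [(0.1, 0), (0.2, 0), (0.9, 1), (0.3, 0)],
    [(0.2, 0), (0.1, 0), (0.4, 0), (0.3, 0)],
    [(0.7, 1), (0.2, 0), (0.1, 0), (0.6, 0)],
    [(0.5, 0), (0.8, 1), (0.6, 1), (0.2, 0)],
    [(0.3, 0), (0.1, 0), (0.2, 0), (0.45, 1)],
]
print(round(roi_auc(patients), 3), round(patient_bootstrap_sd(patients), 3))
```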

38 THE PROBLEM OF UNCERTAINTY OF TRUTH STATE

Classic paper: Revesz, Kundel, Bonitatibus (1983) included various ways of obtaining panel consensus truth.
- The authors compared three imaging methods.
- Any one of the three could outperform the others, depending on the rule used for reducing the panel to truth.

HOWEVER, TODAY THE TARGET IS AN ACTIONABLE NODULE ACCORDING TO AN EXPERT PANEL.
- The classic reference above indicates that additional uncertainty is present.
=> Resample the panel to assess the additional uncertainty.
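
The following is a minimal sketch of "resample the panel". It assumes a per-case majority rule for reducing the panel to truth. Other reduction rules are possible, and the Revesz et al. result shows that the choice matters. It also assumes an odd-sized hypothetical panel, so majority votes cannot tie, and hypothetical scores for a single reader. The spread of the ROC area across resampled panels reflects uncertainty in the truth state itself, on top of the usual case sampling.

```python
import random
import statistics


def auc(nondiseased, diseased):
    """Wilcoxon-Mann-Whitney ROC area (ties count 1/2)."""
    wins = sum(1.0 if d > n else 0.5 if d == n else 0.0
               for d in diseased for n in nondiseased)
    return wins / (len(diseased) * len(nondiseased))


def majority_truth(votes, members):
    """votes[case][member] = 1 if that panelist calls the case actionable.
    Majority truth per case over the chosen members; None for a tie."""
    truth = []
    for case_votes in votes:
        yes = sum(case_votes[m] for m in members)
        no = len(members) - yes
        truth.append(1 if yes > no else 0 if no > yes else None)
    return truth


def panel_bootstrap(scores, votes, n_boot=500, seed=0):
    """Redraw panel members with replacement, rebuild truth, and recompute
    the reader's ROC area; return its mean and SD over the redraws."""
    rng = random.Random(seed)
    n_members = len(votes[0])
    areas = []
    for _ in range(n_boot):
        truth = majority_truth(votes, rng.choices(range(n_members), k=n_members))
        neg = [s for s, t in zip(scores, truth) if t == 0]
        pos = [s for s, t in zip(scores, truth) if t == 1]
        if neg and pos:
            areas.append(auc(neg, pos))
    return statistics.mean(areas), statistics.stdev(areas)


# Hypothetical: one reader's scores for 6 cases, and votes from a 5-member panel.
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1]
votes = [[1, 1, 1, 1, 1], [0, 0, 0, 0, 0], [1, 1, 1, 0, 1],
         [0, 1, 0, 0, 0], [1, 1, 0, 0, 1], [0, 0, 0, 0, 0]]
mean_area, sd_area = panel_bootstrap(scores, votes)
print(f"ROC area over resampled panels: mean {mean_area:.3f}, SD {sd_area:.3f}")
```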

39 UNCERTAINTY OF TRUTH STATE

UNCERTAINTY IN EFFECTIVE SAMPLE SIZE
- Uncertainty in TPF depends on the number of actually diseased cases
- Uncertainty in FPF depends on the number of actually non-diseased cases
- Uncertainty in the total area under the ROC curve depends on the effective number of cases: the harmonic mean of the numbers in the two classes . . . and this is a function of the panel sample

40 Given: 100 patients. What is the best split between normals and abnormals for purposes of estimating the area under the ROC curve?
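
Taking the harmonic-mean approximation of slide 39 on its own gives a simple answer to this question. The harmonic mean of two class sizes with a fixed total is largest when the classes are equal. This approximation ignores other factors, such as the actual ROC area and the shape of the score distributions, which can shift the optimum. Treat it as an illustration of the approximation rather than a design rule.

```python
def effective_cases(n_abnormal, n_normal):
    """Harmonic mean of the two class sizes (slide 39's effective number of cases)."""
    return 2 * n_abnormal * n_normal / (n_abnormal + n_normal)


total = 100
best = max(range(1, total), key=lambda k: effective_cases(k, total - k))
print(best, effective_cases(best, total - best))        # even split
print(effective_cases(20, 80), effective_cases(10, 90))
```

Under this approximation, a 50/50 split yields 50 effective cases. An 80/20 split yields 32 and a 90/10 split only 18.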

41 . . . relaxing the panel criterion from unanimous to majority
- allows resampling to assess variability
- may increase the effective number of samples
. . . these effects may tend to cancel
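
The "more effective samples" side of this trade-off can be illustrated by counting how many cases receive a definite truth label under each rule. The votes are hypothetical. Under the unanimous rule, cases without full agreement are treated here as unusable; this is one possible convention and an assumption of the sketch.

```python
def panel_label(case_votes, rule):
    """Reduce one case's panel votes (1 = abnormal) to truth, or None if undecided."""
    yes = sum(case_votes)
    no = len(case_votes) - yes
    if rule == "unanimous":
        return 1 if no == 0 else 0 if yes == 0 else None
    if rule == "majority":
        return 1 if yes > no else 0 if no > yes else None
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical votes of a 5-member panel on 6 cases.
votes = [[1, 1, 1, 1, 1], [0, 0, 0, 0, 0], [1, 1, 1, 0, 1],
         [0, 1, 0, 0, 0], [1, 1, 0, 0, 1], [0, 0, 0, 0, 0]]
for rule in ("unanimous", "majority"):
    labels = [panel_label(v, rule) for v in votes]
    usable = sum(label is not None for label in labels)
    print(f"{rule:9s} labels={labels} usable cases={usable}")
```

Here the unanimous rule leaves 3 of the 6 cases usable, while the majority rule labels all 6.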

42 THE PROBLEM OF CONTROLLING FOR READER VIGILANCE

Any measurement setting has artificial conditions vis-à-vis actual practice:
- Are readers more vigilant in unaided reading when they're subjects in a study?
- Are readers less vigilant in unaided reading when they're not subjects in a study?

One early suggestion: control the time available to readers to mimic the clinic (Chan et al., Invest. Radiol. 1990).

43 IN SUMMARY

These points reflect the current status of ongoing interactions between and among FDA, academia, industry sponsors, NCI, and the LIDC on the topic and on issues for submissions like the present one.

44 Selected References

Metz CE. Basic principles of ROC analysis. Seminars in Nuclear Medicine 1978; 8: 283-298.
Metz CE. ROC methodology in radiologic imaging. Invest Radiol 1986; 21: 720-733.
Metz CE. Some practical issues of experimental design and data analysis in radiological ROC studies. Invest Radiol 1989; 24: 234-245.
Metz CE. Fundamentals of ROC analysis. In: Beutel J, Kundel HL, Van Metter RL, Eds. Handbook of Medical Imaging. Vol. 1. Physics and Psychophysics. SPIE Press (Bellingham WA 2000), Chapter 15: 751-769.
Swets JA, Pickett RM. Evaluation of Diagnostic Systems. Academic Press, New York, 1982.
Wagner RF, Beiden SV, Campbell G, Metz CE, Sacks WM. Assessment of medical imaging and computer-assist systems: Lessons from recent experience. Acad Radiol 2002; 9: 1264-1277.
Wagner RF, Beiden SV, Campbell G, Metz CE, Sacks WM. Contemporary issues for experimental design in assessment of medical imaging and computer-assist systems. Proc. of the SPIE-Medical Imaging 2003; 5034: 213-224.
Dodd LE, Wagner RF, Armato SG, McNitt-Gray MF, et al. Assessment methodologies and statistical issues for computer-aided diagnosis of lung nodules in computed tomography: Contemporary research topics relevant to the Lung Image Database Consortium. Acad Radiol (in press, Apr. 2004).

Toledano AY, Gatsonis C. Ordinal regression methodology for ROC curves derived from correlated data. Statistics in Medicine 1996; 15: 1807-1826.
Nishikawa RM, Yarusso LM. Variations in measured performance of CAD schemes due to database composition and scoring protocol. Proc. of the SPIE 1998; 3338: 840-844.
Giger ML. Current issues in CAD for mammography. In: Doi K, Giger ML, Nishikawa RM, Schmidt RA, Eds. Digital Mammography '96. Elsevier Science B.V. 1996, 53-59.
Clarke LP, Croft BY, Staab E, Baker H, Sullivan DC. National Cancer Institute initiative: Lung image database resource for imaging research. Acad Radiol 2001 May; 8(5): 447-450.
Wagner RF, Beiden SV, Metz CE. Continuous versus categorical data for ROC analysis: Some quantitative considerations. Acad Radiol 2001; 8: 328-334.
Revesz G, Kundel HL, Bonitatibus M. The effect of verification on the assessment of imaging techniques. Invest Radiol 1983; 18: 194-198.
Beiden SV, Wagner RF, Campbell G. Components-of-variance models and multiple-bootstrap experiments: An alternative method for random-effects receiver operating characteristic analysis. Acad Radiol 2000; 7: 341-349.
Obuchowski NA. Multireader, multimodality receiver operating characteristic curve studies: Hypothesis testing and sample size estimation using an analysis of variance approach with dependent observations. Acad Radiol 1995; 2 (Suppl 1): S22-S29.
Chan HP, Doi K, Vyborny CJ, et al. Improvement in radiologists' detection of clustered microcalcifications on mammograms. Invest Radiol 1990; 25: 1102.

Chakraborty DP, Winter L. Free-response methodology: Alternate analysis and a new observer-performance experiment. Radiology 1990; 174: 873-881.
Metz CE, Starr SJ, Lusted LB. Observer performance in detecting multiple radiographic signals: Prediction and analysis using a generalized ROC approach. Radiology 1976; 121: 337-347.
Starr SJ, Metz CE, Lusted LB, Goodenough DJ. Visual detection and localization of radiographic images. Radiology 1975; 116: 533-538.
Swensson RG. Unified measurement of observer performance in detecting and localizing target objects on images. Medical Physics 1996; 23: 1709-1725.
Chakraborty DP. The FROC, AFROC and DROC variants of the ROC analysis. In: Beutel J, Kundel HL, Van Metter RL, Eds. Handbook of Medical Imaging. Vol. 1. Physics and Psychophysics. SPIE Press (Bellingham WA 2000), Chapter 16: 771-796.
Obuchowski NA. Multireader receiver operating characteristic studies: A comparison of study designs. Acad Radiol 1995; 2: 709-716.
Gatsonis CA, Begg CB, Wieand S. Advances in statistical methods for diagnostic radiology: A symposium. Acad Radiol 1995; 2 (Suppl 1): S1-S84 (the entire supplement is the Proceedings of the Symposium).
Beiden SV, Wagner RF, Doi K, Nishikawa RM, Freedman M, Lo S-C B, Xu X-W. Independent versus sequential reading in ROC studies of computer-assist modalities: Analysis of components of variance. Acad Radiol 2002; 9: 1036-1043.

Metz CE. Evaluation of CAD methods. In: Doi K, MacMahon H, Giger ML, Hoffmann KR, Eds. Computer-Aided Diagnosis in Medical Imaging. Amsterdam: Elsevier Science B.V. (Excerpta Medica International Congress Series, Vol. 1182), 1999, 543-554.
Chakraborty DP. Statistical power in observer performance studies: Comparison of the receiver operating characteristic and free-response methods in tasks involving localization. Acad Radiol 2002; 9: 147-156.
Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis: Generalization to the population of readers and patients with the jackknife method. Invest Radiol 1992; 27: 723-731.
Chakraborty DP, Berbaum KS. Comparing inter-modality diagnostic accuracies in tasks involving lesion localization: A jackknife AFROC approach. Supplement to Radiology 2002; 225(P): 259.
Obuchowski NA, Lieber ML, Powell KA. Data analysis for detection and localization of multiple abnormalities with application to mammography. Acad Radiol 2000; 7: 516-525.
Rutter CM. Bootstrap estimation of diagnostic accuracy with patient-clustered data. Acad Radiol 2000; 7: 413-419.
Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Chapman and Hall, New York, 1993.
Beam CA, Layde PM, Sullivan DC. Variability in the interpretation of screening mammograms by US radiologists. Arch Intern Med 1996; 156: 209-213.
Jiang Y, Nishikawa RM, Schmidt RA, Metz CE, Giger ML, Doi K. Improving breast cancer diagnosis with computer-aided diagnosis. Acad Radiol 1999; 6: 22-33.
