Remote tool-based adjudication for grading diabetic retinopathy

M Schaekermann, N Hammel, M Terry, TK Ali, Y Liu, B Basham, B Campana, W Chen, X Ji…
Translational Vision Science & Technology, 2019 - tvst.arvojournals.org
Abstract
Purpose: To present and evaluate a remote, tool-based system and structured grading rubric for adjudicating image-based diabetic retinopathy (DR) grades.
Methods: We compared three different procedures for adjudicating DR severity assessments among retina specialist panels, including (1) in-person adjudication based on a previously described procedure (Baseline), (2) remote, tool-based adjudication for assessing DR severity alone (TA), and (3) remote, tool-based adjudication using a feature-based rubric (TA-F). We developed a system allowing graders to review images remotely and asynchronously. For both TA and TA-F approaches, images with disagreement were reviewed by all graders in a round-robin fashion until disagreements were resolved. Five panels of three retina specialists each adjudicated a set of 499 retinal fundus images (1 panel using Baseline, 2 using TA, and 2 using TA-F adjudication). Reliability was measured as grade agreement among the panels using Cohen's quadratically weighted kappa. Efficiency was measured as the number of rounds needed to reach a consensus for tool-based adjudication.
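Quadratically weighted kappa compares observed agreement with chance agreement while penalizing each disagreement by the squared distance between the two ordinal grades, so a one-step disagreement costs far less than a four-step one: kappa_w = 1 - sum(w_ij * O_ij) / sum(w_ij * E_ij), with w_ij = (i - j)^2 / (K - 1)^2, O the observed grade-pair proportions, and E the proportions expected from the marginals alone. The following is a minimal NumPy sketch, assuming grades are integer-coded 0..K-1 with K = 5 for a five-level DR severity scale (the abstract does not state the scale); the function name is hypothetical. scikit-learn's `cohen_kappa_score(y1, y2, weights="quadratic")` computes the same statistic.

```python
import numpy as np


def quadratic_weighted_kappa(a, b, num_grades=5):
    """Cohen's kappa with quadratic disagreement weights for two sets of grades.

    a, b: integer grades in [0, num_grades), one per image, same image order.
    num_grades=5 is an assumption (a five-level DR severity scale).
    Undefined (division by zero) if every image receives the same single grade.
    """
    a = np.asarray(a, dtype=int)
    b = np.asarray(b, dtype=int)
    k = num_grades

    # Observed joint distribution of grade pairs (confusion matrix as proportions).
    observed = np.zeros((k, k))
    np.add.at(observed, (a, b), 1)
    observed /= observed.sum()

    # Expected joint distribution if the two grade sets were independent.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))

    # Quadratic weights: 0 on the diagonal, growing with squared grade distance.
    i, j = np.indices((k, k))
    weights = (i - j) ** 2 / (k - 1) ** 2

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```

Passing one panel's adjudicated grades and the Baseline grades for the same images gives a single agreement score per panel; the normalization by (K - 1)^2 cancels in the ratio and only keeps the weights in [0, 1].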
Results: The grades from remote, tool-based adjudication showed high agreement with the Baseline procedure, with Cohen's quadratically weighted kappa scores of 0.948 and 0.943 for the two TA panels, and 0.921 and 0.963 for the two TA-F panels. Cases adjudicated using TA-F were resolved in fewer rounds compared with TA (P < 0.001; standard permutation test).
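The abstract says only that a "standard permutation test" compared the number of rounds under TA and TA-F. A permutation test asks how often a random relabeling of the pooled cases produces a group difference at least as large as the one observed. The sketch below is one common form, assuming a two-sided test on the difference in mean rounds per case; the test statistic, the function name, and its parameters are assumptions, not the authors' exact procedure.

```python
import numpy as np


def permutation_test_mean_diff(x, y, num_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between two samples.

    x, y: per-case counts of adjudication rounds (e.g. TA cases vs. TA-F cases).
    The mean-difference statistic is an assumed choice.
    Returns (observed_difference, p_value).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    n_x = len(x)
    observed = x.mean() - y.mean()

    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(num_permutations):
        # Reassign the pooled values to two groups of the original sizes at random.
        perm = rng.permutation(pooled)
        diff = perm[:n_x].mean() - perm[n_x:].mean()
        # Small tolerance so ties are not lost to floating-point rounding.
        if abs(diff) >= abs(observed) - 1e-12:
            count += 1

    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (count + 1) / (num_permutations + 1)
    return observed, p_value
```

With a finite number of random permutations the smallest attainable p-value is 1 / (num_permutations + 1), so a threshold such as P < 0.001 requires at least about a thousand permutations.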
Conclusions: Remote, tool-based adjudication presents a flexible and reliable alternative to in-person adjudication for DR diagnosis. Feature-based rubrics can help accelerate consensus for tool-based adjudication of DR without compromising label quality.
Translational Relevance: This approach can generate reference standards to validate automated methods and, by integrating into existing telemedical workflows, resolve ambiguous diagnoses.