Authors

Jan Cychnerski, Tomasz Dziubich

Read online

https://link.springer.com/chapter/10.1007/978-3-030-85082-1_19

Abstract

Deployment of deep learning techniques, including Convolutional Neural Networks (CNNs), in image classification systems has achieved outstanding results. However, the advantages and potential impact of such a system can be completely negated if it does not reach a target accuracy. Achieving high classification accuracy with low variance in a medical image classification system requires a large training data set of suitable quality. This paper presents a study on the use of various consistency checking methods to refine the quality of annotations, under the assumption that tagging was done by volunteers (a crowd-sourcing model). The aim of this work was to evaluate the fitness of this approach in the medical field and the usefulness of our web tool, called MedTagger, designed to facilitate large-scale annotation of magnetic resonance (MR) images, as well as the accuracy of crowd-sourced assessment using this tool compared to expert classification. We present the methodology followed to annotate a collection of kidney MR scans. All 156 images were acquired from the Medical University of Gdansk. Two groups of students (with and without a medical educational background) and three nephrologists were engaged. This research supports the thesis that some types of MR image annotations provided by naive individuals are comparable to expert annotations, while the annotation process takes less time and is more cost-effective, without loss of image analysis accuracy. With pixel-wise majority voting, it was possible to create crowd-sourced organ segmentations that match the quality of those created by individual medical experts (mAP up to 94% ±3.9%).
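
The abstract mentions pixel-wise majority voting over annotator segmentations. As a minimal sketch of that idea (not the paper's implementation), assuming binary organ masks from several annotators and a simple "more than half of the votes" rule, the consensus mask could be computed as follows:

```python
import numpy as np

def majority_vote_mask(masks):
    """Combine binary segmentation masks from several annotators
    by pixel-wise majority voting.

    masks : array of shape (n_annotators, H, W) with values in {0, 1}.
    Returns a binary mask of shape (H, W): a pixel is foreground
    when more than half of the annotators marked it as foreground.
    """
    masks = np.asarray(masks)
    votes = masks.sum(axis=0)  # number of annotators labelling each pixel as organ
    return (votes > masks.shape[0] / 2).astype(np.uint8)

# Toy example: three 4x4 annotator masks for one slice (illustrative data only).
annotator_masks = np.array([
    [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]],
    [[0, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 0], [0, 0, 0, 0]],
    [[0, 0, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]],
])
consensus = majority_vote_mask(annotator_masks)
print(consensus)
```

The function name and the strict-majority threshold are illustrative assumptions; the paper itself may use a different tie-breaking rule or weight annotators differently.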