A Thrifty Annotation Generation Approach for Semantic Segmentation of Biofilms

Page created by Jaime Chen

Travel

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

A Thrifty Annotation Generation Approach for Semantic Segmentation of Biofilms

2020 IEEE 20th International Conference on BioInformatics and BioEngineering (BIBE)

           A Thrifty Annotation Generation Approach for
                Semantic Segmentation of Biofilms
          Adithi D. Chakravarthy                                   Parvathi Chundi                                   Mahadevan Subramaniam
              College of IS&T                                      College of IS&T                                        College of IS&T
      University of Nebraska at Omaha                      University of Nebraska at Omaha                        University of Nebraska at Omaha
             Omaha, NE, USA                                       Omaha, NE, USA                                         Omaha, NE, USA
       achakravarthy@unomaha.edu                               pchundi@unomaha.edu                                 msubramaniam@unomaha.edu

                       Shankarachary Ragi                                                              Venkata R. Gadhamshetty
               Department of Electrical Engineering                                               Civil & Environmental Engineering
            South Dakota School of Mines & Technology                                         South Dakota School of Mines & Technology
                      Rapid City, SD, USA                                                                Rapid City, SD, USA
                  shankarachary.ragi@sdsmt.edu                                                     venkata.gadhamshetty@sdsmt.edu

      Abstract— Recent advances in semantic segmentation using
  deep learning methods have achieved promising results on several
  benchmark datasets. However, the primary challenge involved in                 equipment to be generated, and must be further annotated by
  such segmentation approaches is the availability of applicable                 multiple physicians or engineering scientists. Annotating
  training data. Since only experts are equipped to effectively
                                                                                 multiple entities in each image such as bacterial cells or biofilms
  annotate (or label) any available data for training semantic
                                                                                 on materials (technologically relevant metals, polymers and in
  segmentation networks, the effort and cost involved can be
  considerable, especially for larger datasets. In this paper, we aim
                                                                                 certain cases living substances such as human skin and tissue) is
  to address this problem by proposing a Thrifty Annotation                      a time consuming and tedious task. Consequently, a common
  Generation approach that records high performance on                           limitation of image segmentation in these domains is that
  segmentation networks with minimal expert effort and cost                      datasets include scarce (not enough training examples) or weak
  (intervention). We present a deep active learning framework that               image annotations (training examples are annotated at the image
  combines the use of marker-controlled watershed (MC-WS)                        level and no annotation is available at the pixel level) resulting
  algorithm to generate pseudo labels for segmentation networks (U-              in limited training data. In these settings, even the most
  Net) and active learning to significantly minimize effort and cost             advanced image segmentation models may fail to generalize
  by selecting only the most impactful training data for labeling. We            from training examples to real-world scenarios. Therefore, it is
  built the initial U-Net model by generating pseudo labels for the              important to develop solutions that can deal with scarce or weak
  training data using MC-WS. We then make use of the uncertainty                 image annotations for semantic segmentation.
  information (entropy) of each image provided by the U-Net to
  determine the most uncertain or effective images for expert                        In this paper, we propose a technique, called thrifty
  labeling. We evaluated the TAG approach using the 2012 ISBI                    annotation generation (TAG) based on the cost-effective
  Challenge dataset for 2D segmentation and a novel Biofilm                      annotation approach to build a model for semantic segmentation
  dataset. Our approach achieved promising segmentation accuracy                 of datasets for which no manual annotations are available. We
  (IoU) and classification accuracy with minimal expert                          focus on datasets where the foreground is an object of interest
  intervention. The results of our experiments also indicate that the            (such as neural cells or a biofilm on a background material
  TAG approach can be generalized to achieve high-performance                    surface). The TAG approach is based on the semi-supervised
  segmentation results on any dataset using minimal expert effort                learning with pseudo labels. It first generates pseudo labels
  and cost.                                                                      (      ) by applying the popular watershed segmentation
                                                                                 algorithm on a given unlabeled dataset . The pseudo labels are
     Keywords— Watershed algorithm, Semantic segmentation,                       used to train a model,   for semantic segmentation of .
  Pseudo labels, Biofilms, and Active learning.
                                                                                     The TAG approach uses a cost-effective active learning
                          I. INTRODUCTION                                        method based on entropy to choose images from for which
      Semantic segmentation of two-dimensional (2D) images is                    labels are obtained from experts. If the model obtained using
  one of key problems in computer vision applications in medical                        were successful in identifying distinct features within an
  and bioengineering fields. Recently semantic segmentation has                  image, then classification probabilities output by model       on
  made much progress due to the design and performance of deep                   that image would not be noisy. So, the TAG approach chooses
  convolutional models for image segmentation [1], [2]. However,                 those images whose classification probabilities when labeled by
  these advancements require large, high-quality annotated data                      have large entropy as candidates for expert annotation. The
  sets which are expensive to acquire particularly in medical and                pseudo labels are replaced by expert labels for these images to
  bioengineering domains where images need expensive                             train another classifier for semantic segmentation.

2471-7819/20/$31.00 ©2020 IEEE                                             602
DOI 10.1109/BIBE50027.2020.00103

             Authorized licensed use limited to: ASU Library. Downloaded on June 23,2021 at 15:38:12 UTC from IEEE Xplore. Restrictions apply.

We study if the proposed TAG approach would be effective II. RELATED WORK
at all by first simulating it on a scarce annotation setting. Here,
the dataset contains a large number of labeled images with A. Watershed Transform
ground truth labels ( ) obtained from experts. However, only Due to the interesting properties of the watershed transform
a small number (up to 10%) of the images from are used [3], its application has been very useful especially in medical
each time, on-demand, in training to mimic a scarce annotation image segmentations [4]–[6]. However, a well-known
setting. The scarce annotation dataset simulation study challenge of the watershed transform reported in earlier works
establishes the legitimacy of the proposed TAG approach, which was over-segmentation. In this paper, we utilize a marker-
is then be used on datasets with no annotations. controlled watershed algorithm (MC-WS) [7] to alleviate over-
So, we conduct the following two studies segmentation and obtain initial pseudo labels for training
images without labels.
x Scarce Annotation Simulation Study: Let be the
model constructed using the labeled dataset and B. Active Learning
be the (classification or segmentation) accuracy Similar to active learning frameworks [8], [9], the TAG
of . Generate the labeled dataset by applying employs an iterative approach where in each iteration the current
watershed segmentation on all the unlabeled images . model is applied to classify (segment) a set of unlabeled
Let be the model built using the dataset . instances out of which a few are selected for manual annotation
Iterative improvement: Evolve model to . Done based on the uncertainty of the model, and added to the training
only when the accuracy of built using the current set to generate the next model. Recently there has been a lot of
is less than . Iterative improvement selects a interest in developing deep active learning approaches with
few images with maximum noise from the output of CNN-based network for semantic segmentation of medical
model , produces the next by replacing pseudo- images [10]–[12] given the high cost and potential variability
labels in the current by labels from and manual image annotations.
generates the model . The TAG active deep learning approach differs from these
x No Annotation Study: In this case there are no deep active learning approaches in using an automated
annotations, i.e., is empty. Generate using segmentation method such as the watershed to generate a
watershed and and iteratively generate next model preliminary set of annotations for the entire training dataset.
and next by identifying images with highest noise Unlike the above approaches, the TAG can choose for correction
in the current model output and ask an expert to provide either the output of the watershed method or the output of the
labels for these images. The expert annotated images are model, whichever has a lower entropy. The TAG approach
added to . We continue the process of successive allows for varying amounts of expert annotations resulting in a
refinement as long as the average noise in the model that has been trained mostly on pseudo annotations unlike
classification probabilities of the model output the above approaches that require some form of human input for
decreases. Finally, the model with the best accuracy, each training data item. Further, the use of automated segmented
which is determined using accumulated , is output. images for training the model not only reduces the burden on
human annotators for deep networks but also has the potential to
In both above situations, the TAG approach is guaranteed to reduce the inter-rater variability. To the best of our knowledge
terminate since only a finite number of pseudo label to expert our work is the first application of active deep learning for
label replacements are possible. We consider the TAG approach semantic segmentation of biofilms in material science domain.
to be legitimate for a scarce annotation and therefore, applicable
to a no annotation study if the model output in the scarce III. APPROACH
annotation simulation, i) achieves accuracy ± , for a A. Datasets
small chosen threshold , ii) the average noise in the successive
model outputs is non-increasing and iii) the dataset , the 1) EM Dataset
subset of the dataset used across iterations, | | ≪ | |, The Electron Microscope (EM) data is a set of grayscale
i.e., size of is much smaller than the size of . images (512 × 512 pixels) from a serial section Transmission
Electron Microscopy data set of the Drosophila first instar larva
We applied the TAG approach to two datasets (one scarce ventral nerve cord [13]. This dataset was published as part of the
annotation simulation study and one no annotation study). The IEEE ISBI 2012 challenge on 2D segmentation. The goal of the
models built using the TAG approach were able to achieve challenge was to determine the boundary map (or binary label)
greater than 80% segmentation accuracy with less than 7% of each grayscale image, where “1” or white indicates a pixel
expert effort. inside a cell, and “0” indicates a pixel at the boundary between
The rest of this paper is organized as follows. Section II cross sections. A binary label was considered equivalent to a
discusses related work on semantic segmentation with scarce segmentation of the image. The ground truth binary labels for
labels. The description of datasets, the TAG approach, pre- the training images were provided as part of the challenge.
processing of the images using the watershed method along with 2) BF Dataset
the network architectures used for building models are described The Biofilm (BF) dataset consists of Scanning Electron
in Section III. Experimental results are presented in Section IV Microscope (SEM) images of Desulfovibrio alaskensis G20
and conclusions are discussed in Section V. (DA-G20, a sulfate reducing bacteria (SRB) and their biofilm

This work is partially supported by the NSF grant #1920954.

603

Authorized licensed use limited to: ASU Library. Downloaded on June 23,2021 at 15:38:12 UTC from IEEE Xplore. Restrictions apply.

grown on bare mild steel surfaces in batch microbiologically sure foreground and sure background regions. Marker labeling
influenced corrosion (MIC) experiments). The details of the was implemented by labeling all sure regions with positive
growth procedures and biocorrosion tests were discussed in [14]. integers and labeling all unknown (or boundary) regions with a
Owing to its high ductility, weldability, and low cost, mild steel 0. Finally, watershed was applied on the maker image to modify
remains a popular choice of metal in in civil infrastructure, the boundary region to obtain the watershed segmentation mask
transportation and oil and gas industry applications, and routine or binary label of the image.
applications. However, under aqueous conditions, mild steel is
susceptible to MIC caused by microorganisms including SRB. C. U-Net for Segmenation
The goal of semantic segmentation of BF dataset is to identify The U-Net [16] is an improved FCN which consists of an
the shape and size of each bacterial cell or a cluster of cells from encoder (contracting path) and decoder (expansive path)
each image to detect and track metal corrosion. designed specifically to perform segmentation tasks on medical
images. The contracting path is a stack of convolutional and
max-pooling layers where high-level semantic information at
each layer is acquired, while the expansive path recovers the
spatial information of the image at each layer using transposed
convolutions. Bottleneck layers combine the information from
the contracting and expansive paths by concatenating the feature
maps, resulting in a symmetrical network in contrast to
traditional FCNs. The U-Net architecture used in this paper is
similar to the one proposed by Ronneberger et. al [16] and
accepts a set of unlabeled images with corresponding binary
labels as input to train a model.
D. TAG Algorithm
The inputs to the TAG algorithm are – a set of unlabeled or
original images = { … } and an optional set of ground
truth binary labels, = { , … , }, ≤ . The output of
Fig. 1. (A)–(C) depict an EM dataset original unlabeled image, the the algorithm is a model and (binary) labels that semantically
coresponding watershed binary label of (A), and the corresponsing ground truth segment unlabeled images . In this paper, we focus on the
binary label of (A) respectively. (D), (E) depict a BF dataset original unlabeled binary segmentation of image pixels. The TAG employs an
patch, and the corresponsing watershed binary label of the original patch. iterative algorithm that uses a sequence of training set of pseudo
labels = ( , … , ), to build a sequence of models
B. Pre-Processing and Watershed Segmentation ( , … , ), that are used to produce a sequence of sets of
Every input image in the EM and BF datasets was considered binary labels = ( , … , ) for . The model is generated
as an unlabeled image. Contrast limited adaptive histogram at the iteration using the training set . The model is applied
equalization [15] was applied to improve edge definitions and to the set to generate a set of binary labels = ( , … , ).
contrast. To account for the low data volume in the BF dataset, The binary labels, are used to segment (annotate) the
each unlabeled training image was divided into non-overlapping corresponding images in using model . The algorithm
patches each 128 × 128 pixels in height and width. Next a uses a set of ground truth labels to successively refine the pseudo
marker-controlled watershed algorithm (MC-WS) [7] along labels. Let denote the set of ground truth labels used across
with distance transform was applied to the processed EM images all the iterations. Initially, = {}.
and patches from the BF dataset respectively to automatically
generate binary labels corresponding to each image and each The TAG algorithm also takes a parameter as its input,
patch. Finally, every patch and its corresponding binary label which specifies the number of images for which are obtained
from the BF dataset was resized to 512 × 512 pixels in height (from experts) in each step of the iteration.
and width. We use the term image to refer to images as well as The main steps of the TAG algorithm are given below:
patches henceforth in the paper.
1. Generate Initial Model ( ): Create an ensemble of
Noise and local irregularities often lead to over-
three watershed segmentations , , and
segmentation while using watershed transform. The MC-WS
apply it to the set to generate labels
enhancement is used to flood the topographic image surface
, , . Use majority voting to determine
from a pre-defined a set of markers, thereby preventing over-
the initial set of pseudo binary labels, . Train a
segmentation. To apply MC-WS on each image, an approximate
segmentation network on pair < , > to obtain
estimate of the foreground objects in the image was first found
using binarization. White noise and small holes in the image initial model .
were removed using morphological opening and closing, 2. Generate Next Models using Experts: Apply model
respectively. In order to extract the sure foreground region of the (1 ≤ ≤ ) to to generate the next set of binary
image, distance transform was then used to apply a threshold. In labels . Each element of is a set of binary labels,
order to extract the sure background region of the image, dilation one per image in . Identify elements, ( , … , )
was applied on the image. Finally, the boundaries of the from with the highest entropy value, calculated using
foreground objects were computed as the difference between the prediction confidence values obtained from output of

604

Authorized licensed use limited to: ASU Library. Downloaded on June 23,2021 at 15:38:12 UTC from IEEE Xplore. Restrictions apply.

. Obtain expert annotated binary labels ( , … , ) The experiments were carried out in a two-stage approach –
corresponding to each of these elements of U. Add by 1) evaluating the TAG approach using the EM dataset and 2)
( , … , ) to . Generate the next training set studying the effectiveness of the TAG approach on the BF
by replacing binary labels corresponding to ( , … , ) dataset. For both EM and BF datasets, thresholds of 100, 110
in the training set with expert annotated ground truth and 120 were used while implementing MC-WS. Note that
binary labels ( , … , ) respectively and generate model built on the < , > pair was built using the initial
pseudo labels, for any value of . Fig. 1 (A) and (B) show
next model by training the segmentation network
an unlabeled image from the EM dataset and its pseudo label in
on pair < , >.
T1. Fig. 1 (D) and (E) show an unlabeled image from the BF
3. Test and Terminate: Apply on to generate the dataset and its pseudo label in .
next set of masks . When ( )>
B. Evaluation Metrics
( ) , i.e., the confidence of model is
lesser than that of stop. The decrease in model We evaluated the results of the TAG approach using
confidence indicates that the model is unable to learn intersection over union (IoU) and classification accuracy. IoU
(also known as segmentation accuracy) measures the percentage
any new patterns during training at the ( + 1)
of overlap between the ground truth labels and the predicted
iteration. Evaluate the performance of all + 1 models
outputs given by (2) below. IoU is preferred over classification
using intersection over union (IoU) and accuracy. The
accuracy when there are only a few pixels in an image
accuracy of the models is calculated using all available
representing objects. In such a case the overlap between the
ground truth labels (i.e., ∪ ). Choose the
ground truth and the prediction pixels measured how many of
with the highest mean IoU and mean accuracy as the
the pixels representing objects were classified correctly by the
best or most thrifty model to obtain binary labels for
model. Classification accuracy, given in (3) includes both true
using the least expert intervention.
negatives and true positives, giving a more balanced measure of
Entropy, a measure of image information content can be the model performance.
understood as the average degree of uncertainty in the image.
Higher entropy values highlight images in the data that are ( )=
important or interesting in terms of exhibiting more variation or =
change in their local neighborhood compared with other images.
The entropy of an image is found by applying the following C. EM Dataset
formula to the entire image: For the EM dataset, we used the values = {1, 3} to apply
−∑ log the TAG algorithm and generated three more models ,
and for every value of . The values of were chosen to
where is the number of gray levels (usually 256 for 8 bit reflect the minimum possible value ( = 1) and 10% ( = 3) of
images but in this paper, we bucketed 256 levels further into 10 the training set.
levels), is the probability of a pixel having gray level and
is the base of the logarithm function (here = 2 ). Above,
denotes the mean entropy over a set of images.
Note that in the scarce annotation simulation study, the
inputs of the algorithm include the optional input , where
| | = . In this case, the ground truth labels in each iteration
are generated by a simple lookup of and these labels are
accumulated in . In the no annotation study, | | = 0 and Fig. 2. Output of TAG approach on Fig. 1(A). (A) Binary label from
the ground truth labels in each iteration are obtained by querying , = 1 and (B) Segmentation of Fig. 1(A) using binary label . (C) Binary
label from , = 1 (D) Segmentation of Fig. 1(A) using binary label .
the expert. The size of the set provides a measure of the
human annotation effort involved. In order to generate models for = {1,3} we first
IV. EXPERIMENTS & RESULTS computed the entropy for all binary labels obtained from
using the entropy formula given by (1). binary labels with the
A. Setup highest entropy were then picked to be replaced with the
Training for the U-Net models was implemented using Keras corresponding expert annotations in set to generate training
with a Tensorflow backend as the deep learning framework on set , i.e., for = 1, binary label with highest entropy was
an Ubuntu workstation with 12-Core Intel iO-9920x and 128GB replaced with corresponding (expert annotations) to
RAM. A random selection of × 0.3 was used in the generate training set for training . Similarly, for = 3,
iteration for validation within 25 epochs having a batch size binary labels ( , ) with top-3 highest entropies
of 16 and the prediction of the model was tested on . The were replaced with corresponding expert annotations
model was then compiled with Adam [17] optimizer using ( , ) to generate training set for training .
binary cross entropy loss function since each pixel gets either a To generate training set for training , binary labels with
“0” or “1” value. We used early-stop mechanism on the highest entropy are replaced in for corresponding values.
validation set to avoid over-fitting.

605

Authorized licensed use limited to: ASU Library. Downloaded on June 23,2021 at 15:38:12 UTC from IEEE Xplore. Restrictions apply.

Fig. 3. (A) Classification accuracy of all models on EM dataset (B) Segmentation accuracy (IoU) and (B) Classification accuracy of all models on BF dataset while
applying the TAG approach.

By picking the binary labels with higher entropy we The image level comparison of IoU values between binary
intuitively replaced the labeled images exhibiting high labels generated by the model , = 1 and built using all
classification uncertainty with the corresponding ground truth of the ground truth labels can be found in Fig. 4. The figure
labels to train the next model thereby reducing – 1) the overall shows two bar plots for each training image in the EM dataset.
uncertainty of segmentation output and 2) the need for expert The Y-axis plots the IoU values. The height of each bar shows
annotations for all input images. the IoU between the binary label generated from the model (
or ) and the ground truth label. The mean IoU of , = 1
Fig. 2 illustrates the output of models constructed in an
surpassed the mean IoU of by 0.7%. However, the IoU of
iterative manner for one image, , depicted in Fig. 1(A) from
is higher than that of , = 1 for 12 out of 30 images. The
the EM dataset. Fig. 2(A) and Fig. 2(B) show the binary label
IoU of is lower than that of , = 1 for 9 out of 30 images.
of from , = 1 and the segmentation of using 21 .
For the remaining 9 images, the IoU values computed from both
Fig. 2(D) shows the label generated by the model , = 1,
models are approximately the same.
constructed in the iterative step, Step 2 of the TAG approach.

TABLE I. ENTROPY CHANGE ON APPLYING THE TAG APPROACH
EM Dataset BF Dataset
Model = . = .
= = = =
2.568 2.658 1.855 1.721
2.423 2.465 1.589 1.712
2.576 2.897 1.581 2.917
- - 1.883 -

From Table I. we can observe the entropy values of the
segmentation labels for each model constructed iteratively by Fig. 4. Segmentation accuracy (IoU) of . , = 1 on the EM dataset
the TAG approach. Initially, the binary labels computed from while applying the TAG approach.
had the highest mean entropy of 3.029. Model , = 1 has
the lowest mean entropy of 2.423. However, the mean entropy Although Fig. 3(A) and Fig. 4 show a weak link between
of predicted labels for , = 1 increases to 2.576 and the TAG mean entropy and mean classification accuracy, i.e., higher the
approach terminates. We also observe how mean entropy values entropy, lower the classification accuracy, the link did not hold
decrease for models , , and increases for when = 3. when we ran more experiments. More study is required to
The TAG approach terminates at this point. establish the presence or absence of a relationship between these
two measures.
Fig. 3(A) shows the classification accuracy of all the models
computed by the TAG approach for the EM dataset. The mean From these experimental results, we established that the
classification accuracy of model (constructed using all of the model computed by the TAG approach generated binary labels
with optimal IoU values for 70% of the training images. It also
ground truth labels) is the highest at 0.828. Model , = 1 has
has the lowest entropy as well as the highest IoU and mean
recorded the highest mean classification accuracy at 0.832 of all
accuracy, higher than the mean accuracy obtained from using all
the models computed by the TAG approach. This is slightly
of the available ground truth labels ( ). Thus, we achieved the
higher (0.004) than that of which may be somewhat
best performance using < 7% expert intervention (only 2 labels
surprising and needs further study.
out of 30 labels and | | = 2 ). These experiments
Although the mean entropy for , = 1 and , = 3 are establish the legitimacy of the TAG approach.
close to , = 1 (2.576 and 2.465 respectively), their mean
classifications accuracies are significantly lower (0.649 and D. BF Dataset
0.799 respectively) than , = 1 and they involve more For the BF dataset, we adjusted the values since the size of
replacements (or additional expert intervention). Hence their the BF dataset was different than that of the EM dataset to =
performance is not optimal in line with our ‘thrifty’ approach. {1, 16} consistent with using the minimum possible value and

606

Authorized licensed use limited to: ASU Library. Downloaded on June 23,2021 at 15:38:12 UTC from IEEE Xplore. Restrictions apply.

10% of the training set to apply the TAG algorithm. We first                        using < 7% expert intervention. Next, we applied the TAG
compute model       using the pseudo labels obtained from the                       approach on a novel Biofilm dataset and attained an IoU of 0.809
MC-WS algorithm. Then, we generated three more models           ,                   using < 2% expert intervention. To the best of our knowledge
    and     for = 16 and four more models           ,   ,    and                    this is the first application of active deep learning for semantic
    for = 1 to reach the terminating condition for the TAG                          segmentation of biofilms, specifically the microbial corrosion
approach. Note that we could not compute       for the BF dataset                   domain. The results of our extensive experiments using the TAG
as we did not have access to    for the entire dataset.                             approach demonstrated that high-performance segmentation
                                                                                    output can be achieved on any dataset with limited or minimal
    Table I. illustrates the entropy results of the TAG approach                    expert effort and cost.
on the BF dataset.       had the lowest mean entropy of 1.048.
Since we could not compute , we needed to construct more                                We plan to study the proposed TAG approach further by
models to see the gradient of mean entropy. For = 1, mean                           evaluating it on more benchmark datasets and fine-tuning the U-
entropy of      increases to 1.855 and drops to 1.581 at     only                   Net architecture to achieve state-of-the-art performance. We
to spike to 1.883 at         meeting the terminating condition.                     also plan to evaluate the model performance in terms of other
Similarly, for = 16, mean entropy of         decreases from 1.721                   metrics like pixel errors, and random errors.
( ) to 1.712 and increases to 2.917 at               meeting the
terminating condition.                                                                                                 REFERENCES
                                                                                    [1]   Y. Guo, Y. Liu, T. Georgiou, and M. S. Lew, “A review of semantic segmentation
    Although all of these models had higher mean entropy                                  using deep neural networks,” Int. J. Multimed. Inf. Retr., vol. 7, no. 2, pp. 87–93,
values when compared to , we needed additional measures                                   Jun. 2018.
such as IoU and classification accuracy to establish the best                       [2]   N. Tajbakhsh, L. Jeyaseelan, Q. Li, J. N. Chiang, Z. Wu, and X. Ding, “Embracing
model. In contrast to the EM dataset, the BF dataset came with                            Imperfect Datasets: A Review of Deep Learning Solutions for Medical Image
                                                                                          Segmentation.”
only 50% ground truth binary labels, . Hence, we used this
                                                                                    [3]   A. Kornilov and I. Safonov, “An Overview of Watershed Algorithm
50%      to estimate IoU and accuracy for all the models as                               Implementations in Open Source Libraries,” J. Imaging, vol. 4, no. 10, p. 123, Oct.
shown in Fig. 3(B) and Fig. 3(C). , = 1 recorded the highest                              2018.
mean IoU of 0.809 and mean accuracy of 0.626 across all                             [4]   H. P. Ng, S. H. Ong, K. W. C. Foong, P. S. Goh, and W. L. Nowinski, “Medical
models. Though , = 16 obtained a high mean IoU of 0.772,                                  Image Segmentation Using K-Means Clustering and Improved Watershed
the mean accuracy of the model was only 0.464.                                            Algorithm,” 2006 IEEE Southwest Symp. Image Anal. Interpret., no. February 2001,
                                                                                          pp. 61–65, 2001.
    The difference in segmentation outputs of              for both                 [5]   V. Grau, R. Kikinis, M. Alcañiz, and S. K. Warfield, “Cortical gray matter
values of can be seen in Fig. 5. Moreover, , = 16 took a                                  segmentation using an improved watershed transform,” in Annual International
total 48 label replacements over four iterations to achieve a mean                        Conference of the IEEE Engineering in Medicine and Biology - Proceedings, 2003,
                                                                                          vol. 1, pp. 618–621.
IoU comparable to        , = 1, whereas        , = 1 took only 3
label replacements (|      | = 3) over four iterations to record the                [6]   V. Grau, A. U. J. Mewes, M. Alcañiz, R. Kikinis, and S. K. Warfield, “Improved
                                                                                          watershed transform for medical image segmentation using prior information,”
highest mean IoU and mean accuracy. Thus, we chose             , =                        IEEE Trans. Med. Imaging, vol. 23, no. 4, pp. 447–458, Apr. 2004.
1 as the thriftiest model to obtain binary labels for segmentation                  [7]   F. Meyer and S. Beucher, “Morphological segmentation,” J. Vis. Commun. Image
using MC-WS and < 2% expert intervention i.e., experts                                    Represent., vol. 1, no. 1, pp. 21–46, Sep. 1990.
provided       labels only for 3 out of the 160 training images.                    [8]   D. A. Cohn, Z. Ghahramani, and M. I. Jordan, “Active learning with statistical
                                                                                          models,” J. Artif. Intell. Res., vol. 4, pp. 129–145, Mar. 1996.
                                                                                    [9]   B. Settles, “Computer Sciences Department Active Learning Literature Survey,”
                                                                                          2009.
                                                                                    [10] L. Yang, Y. Zhang, J. Chen, S. Zhang, and D. Z. Chen, “Suggestive Annotation: A
                                                                                          Deep Active Learning Framework for Biomedical Image Segmentation,” Lect. Notes
                                                                                          Comput. Sci., vol. 10435 LNCS, pp. 399–407, Jun. 2017.
                                                                                    [11] M. L. di Scandalea, C. S. Perone, M. Boudreau, and J. Cohen-Adad, “Deep Active
                                                                                          Learning for Axon-Myelin Segmentation on Histology Data,” Jul. 2019.
Fig. 5. Output of TAG approach on Fig. 1 (D). (A) Binary label     from ,           [12] T. Kim et al., “Active learning for accuracy enhancement of semantic segmentation
  = 16 and (B) Segmentation of Fig. 1 (D) using binary label   . (C) Binary               with CNN-corrected label curations: Evaluation on kidney segmentation in
label    from , = 1 (D) Segmentation of Fig. 1 (D) using binary label     .               abdominal CT,” Sci. Rep., vol. 10, no. 1, pp. 1–7, Dec. 2020.
                                                                                    [13] I. Arganda-Carreras et al., “Crowdsourcing the creation of image segmentation
               V. CONCLUSIONS & FUTURE WORK                                               algorithms for connectomics,” Front. Neuroanat., vol. 9, no. November, pp. 1–13,
                                                                                          Nov. 2015.
    In this paper, we presented a new framework for biomedical                      [14] G. Chilkoor et al., “Maleic anhydride-functionalized graphene nanofillers render
image and biofilm segmentation by combining semi-supervised                               epoxy coatings highly resistant to corrosion and microbial attack,” Carbon N. Y.,
learning and active learning with minimal expert intervention.                            vol. 159, pp. 586–597, Apr. 2020.
Our new method provides two main contributions – (1) a MC-                          [15] K. Zuiderveld, “Contrast Limited Adaptive Histogram Equalization,” in Graphics
WS based approach that can generate pseudo labels for images                              Gems, 1994, pp. 474–485.
without annotation labels, to build segmentation models; (2) a                      [16] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for
                                                                                          biomedical image segmentation,” in Lecture Notes in Computer Science, 2015, vol.
cost-effective (thrifty) annotation generation approach that can                          9351, pp. 234–241.
then direct expert intervention to the most effective label areas
                                                                                    [17] D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” in 3rd
to achieve high-performance segmentation output. We first                                 International Conference on Learning Representations, ICLR 2015 - Conference
validated the TAG approach using the 2012 ISBI Challenge                                  Track Proceedings, 2015.
dataset for 2D segmentation and achieved an mean IoU of 0.807

                                                                              607

            Authorized licensed use limited to: ASU Library. Downloaded on June 23,2021 at 15:38:12 UTC from IEEE Xplore. Restrictions apply.

You can also read