Revisiting animal photo-identification using deep metric learning and network analysis
Received: 19 June 2020 | Accepted: 28 January 2021
DOI: 10.1111/2041-210X.13577

RESEARCH ARTICLE

Revisiting animal photo-identification using deep metric learning and network analysis

Vincent Miele1 | Gaspard Dussert1 | Bruno Spataro1 | Simon Chamaillé-Jammes2,3,4 | Dominique Allainé1,4 | Christophe Bonenfant1,4

1 Université de Lyon, Université Lyon 1, CNRS UMR5558, Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
2 CEFE, Univ Montpellier, CNRS, EPHE, IRD, Univ Paul Valéry Montpellier 3, Montpellier, France
3 Department of Zoology & Entomology, Mammal Research Institute, University of Pretoria, Pretoria, South Africa
4 LTSER France, Zone Atelier 'Hwange', Hwange National Park, Dete, Zimbabwe

Correspondence: Vincent Miele. Email: vincent.miele@univ-lyon1.fr

Funding information: French National Center for Scientific Research (CNRS); Statistical Ecology Research Group (EcoStat)

Handling Editor: Robert Freckleton

Abstract
1. An increasing number of ecological monitoring programmes rely on photographic capture–recapture of individuals to study distribution, demography and abundance of species. Photo-identification of individuals can sometimes be done using idiosyncratic coat or skin patterns, instead of using tags or loggers. However, when performed manually, the task of going through photographs is tedious and rapidly becomes too time-consuming as the number of pictures grows.
2. Computer vision techniques are an appealing and unavoidable help to tackle this apparently simple task in the big-data era. In this context, we propose to revisit animal re-identification using image similarity networks and metric learning with convolutional neural networks (CNNs), taking the giraffe as a working example.
3. We first developed an end-to-end pipeline to retrieve a comprehensive set of re-identified giraffes from about 4,000 raw photographs. To do so, we combined CNN-based object detection, SIFT pattern matching and image similarity networks.
We then quantified the performance of deep metric learning to retrieve the identity of known individuals, and to detect unknown individuals never seen in the previous years of monitoring.
4. After a data augmentation procedure, the re-identification performance of the CNN reached a Top-1 accuracy of about 90%, despite the very small number of images per individual in the training dataset. While the complete pipeline succeeded in re-identifying known individuals, it slightly under-performed with unknown individuals.
5. Fully based on open-source software packages, our work paves the way for further attempts to build automatic pipelines for re-identification of individual animals, not only in giraffes but also in other species.

KEYWORDS
deep metric learning, image similarity networks, individual identification, open-source software

Methods Ecol Evol. 2021;00:1–11. wileyonlinelibrary.com/journal/mee3 © 2021 British Ecological Society

1 | INTRODUCTION

In many respects, population and behavioural ecology have immensely benefited from individual-based, long-term monitoring of animals in wild populations (Clutton-Brock & Sheldon, 2010; Hayes & Schradin, 2017). At the heart of such monitoring is the ability to recognize individuals. Individual identification is often achieved by actively marking animals, such as deploying ear-tags or
leg rings, cutting fingers or feathers, or scratching scales in reptiles (Silvy et al., 2005). In some species, however, individuals display natural marks that make them uniquely identifiable. For instance, many large African mammals such as leopard Panthera pardus, zebra Equus sp., kudu Tragelaphus strepsiceros, wildebeest Connochaetes taurinus or giraffe Giraffa camelopardalis all present idiosyncratic fur and coat patterns. Non-invasive and reliable identification of individuals in the wild has long been known to be feasible from comparisons of these distinctive coat patterns (Estes, 1991). As the number of individuals to identify increases, however, people-based visual comparisons of pictures can rapidly become overwhelming. With the recent move to digital technologies (namely digital cameras and camera traps), the problem becomes even more acute as the number of pictures to process can easily reach the thousands or tens of thousands.

Over the last decade, the use of computer vision rapidly spread into biological sciences to become a standard tool in animal ecology for many repetitive tasks (Weinstein, 2018). In a seminal publication, Bolger et al. (2012) first presented computer-aided photo-identification, initially for giraffes but more recently applied to dolphins (Renó et al., 2019). The underlying computer technique is a feature matching algorithm, the Scale Invariant Feature Transform operator (SIFT; Lowe, 2004), where each image is associated with the k-nearest best matches. The current use of SIFT by ecologists requires human intervention to validate the proposed candidate images within a graphical interface (Bolger et al., 2011). In the same vein, other feature-based proposals were developed in the last decade to apply computer vision to different types of idiosyncrasies (Hartog & Reijns, 2014; Moya et al., 2015). A drawback of the method frequently arises when two images are considered similar not because of similar skin or coat patterns of animals, but because of similarities in the backgrounds (the presence of a distinctive tree for instance), hence leading to false-positive matches. For the best results with computer vision, all images should be cropped beforehand so that only the relevant part of the animal appears in the images to be analysed and compared (e.g. excluding most of the neck, head, legs and background for large herbivores). Until now, this cropping operation was most often done manually (Halloran et al., 2015), despite being a highly time-consuming task when processing thousands of images.

Meanwhile, the Deep Learning (DL) revolution was underway in computer vision, showing breakthrough performance improvements (Christin et al., 2019). In particular, convolutional neural networks (CNNs) are now the front-line computer technique to deal with a large range of image processing questions in ecology and environmental sciences (Lamba et al., 2019). Many recent studies tackle the general problem of re-identification using CNNs, which has been mostly developed and extensively used for humans (Wu et al., 2019). Technically, re-identification consists in using a CNN to classify images of different individuals, some of them not necessarily seen before, that is, unknown individuals. However, despite the availability of proven and efficient techniques (Zheng et al., 2016), and several successful attempts to apply the method to non-human species (Bogucki et al., 2019; Bouma et al., 2019; Chen et al., 2020; Ferreira et al., 2020; Hansen et al., 2018; He et al., 2019; Körschens et al., 2018; Moskvyak et al., 2019; Schneider et al., 2020; Schofield et al., 2019), re-identification remains a challenging task when applied to animals in the wild, where re-observations are limited in number to train the model satisfactorily sensu largo (Schneider et al., 2019).

In practice, current CNN-based approaches have to be tailored to the needs of field ecologists interested in using these tools for individual recognition. For instance, batches of new images are regularly added to the reference database following yearly fieldwork sessions because of the recruitment of newborns or of immigrants if the study population is demographically open. Therefore, we expect the re-sighting of known individuals, as well as the observation of individuals never seen before. In other words, this standard sampling design implies solving re-identification in a mixture of known and unknown individuals. Chen et al. (2020) referred to this problem as the 'open set' identification problem, and they proposed to identify images from unknown individuals and to assign them a single 'unknown' label. Automatically identifying currently unknown individuals speeds up the picture sorting process, and facilitates adding them to the database of individuals whose life history is monitored.

A classical CNN classifier can re-identify already known individuals (usually with a softmax last layer) but will fail to identify new individuals because the number of predicted classes must match the number of known individuals. We therefore crucially need a CNN-based approach that can filter out individuals unknown at the time of the analysis. We propose to rely on deep metric learning (DML, see Hoffer & Ailon, 2015) as an ideal candidate to solve the 'open set' identification problem. DML consists in training a CNN model to embed the input data (input images) into a multidimensional Euclidean space such that data from a common class (e.g. images of a given individual) are, in terms of Euclidean distance, much closer to each other than to the rest of the data.

Here, we addressed the problem of photo-identification with an updated, open-source and end-to-end automatic pipeline applied to the case of the iconic, endangered giraffe. In the first step, we applied state-of-the-art techniques for object detection with CNNs (Lin et al., 2017) to automatically crop giraffe flanks from about 4,000 raw photographs shot in the field at Hwange National Park, Zimbabwe. Indeed, the most recent CNN approaches clearly outperformed other approaches (Girshick et al., 2014), including the Histogram of Oriented Gradients (HOG) approach that was recently used with giraffes too (Buehler et al., 2019). Second, following Bolger et al. (2012), we used the SIFT operator to calculate a numeric distance between all pairs of giraffe flanks. From the n × n calculated distances, we followed the new framework of image similarity networks (Wang et al., 2018) and applied unsupervised learning to retrieve different clusters of images coming from different individuals, hence removing any human intervention in the process of individual identification. Third, we manually validated a subset of our results to build a ground-truth dataset of different individuals (n = 82). Using this dataset as a training set, we developed a supervised learning strategy using CNNs and evaluated its predictive accuracy with a cross-validation procedure.

2 | MATERIALS AND METHODS

2.1 | Photograph database

We carried out this study in the northeast of Hwange National Park (HNP), Zimbabwe. HNP covers a 14,650 km² area (Chamaillé-Jammes et al., 2009). The giraffe sub-species currently present in HNP could be either G. c. angolensis or G. c. giraffa according to the IUCN (Muller et al., 2018). Here, we used data from a regular monitoring of individuals conducted between 2014 and 2018. Each year, for at least three consecutive weeks, we drove the road network daily within

Transfer learning is a specific method aiming at training a CNN on a small number of images that does not start CNN training 'from scratch' with some random model parameters, but uses the parameters of a model previously trained on a large dataset and for similar tasks as the one of interest (Willi et al., 2019). This approach works because the pre-trained model has already learnt a wide range of relevant and generic features.

We manually prepared our training dataset by cropping bounding boxes around giraffe flanks, excluding most of the neck, head, legs and background, with the labelImg open-source program for image annotation (https://github.com/tzutalin/labelImg). We performed transfer learning with RetinaNet to detect a single object class, the giraffe flank, from a pre-trained model shipped with RetinaNet, that is, a ResNet50 backbone trained on the COCO dataset (80 different classes of common objects including giraffes among a few other animal species; see Lin et al., 2014). We trained the model with 30
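The labelImg annotations mentioned above are saved as Pascal VOC XML files, one per photograph. As a minimal standard-library sketch of how such a file could be turned into bounding boxes for detector training, assuming a single class named 'giraffe_flank' (the file contents and coordinates below are invented for illustration):

```python
import xml.etree.ElementTree as ET

def parse_labelimg_xml(xml_string):
    """Extract (label, xmin, ymin, xmax, ymax) boxes from a labelImg
    Pascal VOC annotation string."""
    root = ET.fromstring(xml_string)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((
            label,
            int(bb.findtext("xmin")), int(bb.findtext("ymin")),
            int(bb.findtext("xmax")), int(bb.findtext("ymax")),
        ))
    return boxes

# Hypothetical annotation with a single 'giraffe_flank' box
example = """<annotation>
  <filename>IMG_0001.jpg</filename>
  <object>
    <name>giraffe_flank</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>540</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>"""

print(parse_labelimg_xml(example))  # [('giraffe_flank', 120, 80, 540, 360)]
```

Boxes parsed this way can then be fed to whichever detector implementation is used; the authors' pipeline passed them to RetinaNet.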
2.3.2 | Image similarity network, community detection and clusters of images

Following the computation of distances between all pairs of giraffe flanks obtained with the SIFT operator approach, we searched for clusters of flank images that should come from one single individual giraffe. We first defined a network made of nodes, representing giraffe flank images, and of edges: we considered that two nodes were connected by an edge, that is, two flanks were similar and came from the same giraffe, if the SIFT-based distance between paired images fell below a given threshold (see below for more details). Therefore, the so-called connected components of this network should associate images from different individuals.

We estimated this distance threshold value by taking advantage of a property of complex networks called explosive percolation (Achlioptas et al., 2009). Explosive percolation predicts a phase transition of the network just above a distance threshold point. At this point, adding a small number of edges in the network, for example by slightly increasing the distance threshold (Hayasaka, 2016), leads to the sudden appearance of a giant component encompassing the majority of nodes. In other words, at some point, a small increase in the distance threshold leads to considering that almost all images come from the same giraffe. We determined this threshold value graphically, selecting the transition point where the giant component starts to increase dramatically (Supporting Information Figure S2).

An additional issue arose when different nodes were erroneously connected (example in Figure S1), that is, when two flanks were erroneously considered similar. Moreover, in some cases, the bodies of two or more giraffes could overlap in one photograph. In this situation, two or more nodes might be linked by edges when we actually should consider different giraffes. To solve this problem, we applied a network clustering algorithm called community detection, developed in network science (Fortunato, 2010), to split—only when relevant—any connected component into different groups of nodes that are significantly much more connected between themselves than with the others, the so-called communities. Indeed, the presence of many edges inside a group of images suggested it was consistent and taken from the same individual, whereas the absence of many edges between two groups clearly informed about their inconsistency and heterogeneity (i.e. from two different individuals). We applied community detection with the InfoMap algorithm (Rosvall & Bergstrom, 2008). The final product of the community detection algorithm was a set of clusters of images corresponding either to a connected component or to a community retrieved by InfoMap.

2.4 | Re-identification of individuals, using supervised learning

2.4.1 | Deep metric learning and triplet loss with CNN

The principle of deep metric learning is to find an optimal way to project images into a Euclidean space such that the Euclidean distance can be used for machine learning tasks. In this context, we trained a CNN model using the triplet loss (Hermans et al., 2017), in line with recent studies on other species (Bouma et al., 2019; Moskvyak et al., 2019). The triplet loss principle relies on triplets of images composed of a first image called the anchor, a positive image of the same class (same giraffe here) and a third, negative image of another class (any different giraffe; see Bouma et al., 2019, for details). The training step consists in optimizing the CNN model such that the Euclidean distance computed using the last CNN layer (hereafter called the CNN-based distance) between any anchor and its positive image is minimal, while maximizing the distance between this anchor image and its negative counterpart. We used an improved algorithm called semi-hard triplet loss (Schroff et al., 2015) that deals only with triplets where the positive and negative images are close (in other words, the 'hard' cases), using the TripletSemiHardLoss function in TensorFlow Addons. After training completion, we computed the Euclidean distances between any pair of giraffe flank photographs, again using the vector composing the last layer of our CNN model.

2.4.2 | Data augmentation, training and test datasets

We derived the training and test datasets required for the CNN approach from the photograph clusters identified by the SIFT algorithm. We retained only those clusters fulfilling the following conditions: (a) the cluster contains a minimum of two sequences of images shot at least 1 hr apart; (b) the cluster can be divided into a first set of sequences large enough to perform training (we imposed at least five images), and a second set of sequences; (c) the cluster demonstrated a perfect and verified consistency. We used the first set of sequences for CNN training, and the second as an independent test dataset to assess the model performance. The first condition ensured complete independence between training and test datasets, that is, giraffes being seen under different conditions (time, season or location). The third condition is of utmost importance because errors in the dataset would lead to sub-optimal performance of the machine learning approach. We therefore carefully checked, manually, that the SIFT-based clusters we used in the CNN were perfectly unambiguous. We achieved this high level of data quality by discarding all cases where two or more giraffes overlapped on the same frame, or when giraffes were indifferently oriented from the back to the front (orientation ambiguities).

We cropped all flank images to focus on the central part of the flank, keeping 80% of the original width and 60% of the height (in particular excluding the neck and its background). By doing so, we wanted to prevent our CNN model from capturing background noise. Additionally, we homogenized the contrast of images by normalizing the three colour channels using the imagemagick package (normalize option; https://imagemagick.org). In a final step, we resized all images to 224 × 224 pixels.

We ended up with five flanks per individual at least, and a median of seven (Table 1) in the training set. This particularly low number of images available to train the CNN led us to consider the few-shot
TABLE 1 Flank images were selected to ensure independence of observation, and then used for individual giraffe re-identification from coat patterns with a convolutional neural network. We tabulated the average number (and the associated range in square brackets) of images and sequences (i.e. separated by at least a 1-hr interval) per individual in the train, test and unknown datasets over 10 trials

                 Nb. images      Nb. indiv.   Nb. images per indiv.   Nb. sequences per indiv.
Train            503 [479–529]   62           7 [5–24]                2 [1–5]
Test             121 [118–126]   62           2 [1–5]                 1 [1–4]
Unknown indiv.   40              20           2                       2

learning framework, a class of problems where only a few images are available for training. We implemented a 10-fold data augmentation procedure where we made extensive use of image augmentation using the imgaug Python library (https://github.com/aleju/imgaug). For each image in the training dataset, we performed a random set of transformations such as modifying orientation and size, adding blur, performing edge detection, adding Gaussian noise and modifying colours or brightness (details in the available Python code). We finally used this set of 11 images per original image to train our CNN model, that is, the original one and ten modified versions of this image.

2.5 | Evaluation of CNN-based re-identification

To quantify the overall predictive performance of our CNN deep metric learning, we replicated the following procedure 10 times. We first randomly selected 25% of the individuals of the dataset and, for the purpose of the evaluation here, considered these as unknown individuals. Then, for each of them, we randomly selected two images, one in each of the sequences (see above). With this dataset, we aimed to test the ability of the CNN model to detect unknown individuals. The remaining 75% of individuals were considered known individuals. For these known individuals, we selected all photographs from the first sequence and used them to build a training dataset for the CNN. We kept all images from the remaining sequences as the test dataset for known individuals. This ensured a good independence between training and test data, mostly thanks to the 1 hr (at least) time lag between observations. Once the selection of individuals was completed, we performed transfer learning using the pre-trained model ResNetV2 readily available in Keras. We estimated the model parameters using the augmented training dataset with 80 epochs and batches of size 42. We used the stochastic gradient descent optimizer with a rate of 0.2. Our pipeline was implemented with Keras 2.3.0.

To mimic re-identification per se, literally re-seeing known individuals, we considered that we had a 'reference book' with five representative images per known individual: these images were randomly drawn out of the training dataset. We then calculated the CNN-based distance between these representative images and each image from the test dataset. In essence, we expected small distances between test images and representative ones when they came from the same known individual. Similarly, we calculated the CNN-based distance between representative images and images of the so-called unknown individuals. We also considered that two images can come from the same individual if their distance was below a given threshold. This distance threshold was a stringency condition that arbitrarily varied between 0 and 1.

We quantified the predictive performance of the trained CNN model on the range of distance threshold values. First, we computed Top-1 accuracy for known individuals, consisting in checking for each query image whether a representative image from the same individual was the one with the smallest distance (i.e. the Top-1 image) and with a distance below the threshold. In the following, Top-1 accuracy is also called the true-positive (TP) rate. Then, we computed the false-positive rate (FP), checking cases where the Top-1 image was from a different individual. Finally, we quantified the CNN's ability to sort out images from unknown individuals. Again, over the range of distance threshold values, we checked whether the Top-1 image of unknown individual images fell below the threshold. If not, we considered that we successfully detected an unknown individual, hence computing the true-negative (TN) rate.

3 | RESULTS

3.1 | From thousands of photographs to thousands of images of giraffe flank

We trained the object detection method with RetinaNet (Lin et al., 2017) on a set of 400 photographs for which the cropping of the 469 giraffe flanks had been previously done manually. Training took approximately 30 min on a Titan X card. When applying the automatic cropping procedure on our 3,940 photographs (see Figure 1a), we retrieved 5,019 images with associated bounding boxes, each supposed to contain a single giraffe flank (see Figure 2a). The cropping failed for 186 photographs (failure rate: 4.7%), mostly due to foreground vegetation and to unusual and difficult orientations of giraffes in the photograph (see examples in Figure 1b). In a few cases, a bounding box could contain the bodies of two overlapping giraffes, one being partially in front of the other (see Figure 2a). Similarly, in some rare instances, giraffes were standing very close to each other on a photograph, a situation where RetinaNet could fail in retrieving the exact boundaries of each giraffe flank (see the worst case that we experienced, from a partially blurry photograph, in Figure 2b).

3.2 | From thousands of images down to hundreds of identified individuals

Running the SIFT algorithm (Lowe, 2004) to compare all pairs of flanks took about 800 CPU hours of heterogeneous computing resources. We estimated the threshold value for the giant component (see Section 2) at a distance of 340 (see Figure S2a), and obtained an image similarity network composed of 5,019 nodes and 11,249
FIGURE 1 Performance of RetinaNet flank detection of giraffes from a set of 3,940 photographs taken at Hwange National Park, Zimbabwe, between 2014 and 2018. In total, we could extract 5,019 images of giraffe flanks automatically. (a) Number of identified flanks per image; (b) manual classification of cropping problems encountered in the 186 images where RetinaNet failed to identify a giraffe flank in the photographs

FIGURE 2 Examples of automatic cropping of giraffe photographs with RetinaNet to retrieve the flank of the animal body (red squares). Photographs were shot at Hwange National Park, Zimbabwe, between 2014 and 2018. (a) The best-case scenario, where all giraffes stand separately on the photograph, and RetinaNet successfully finds the flanks of the four individuals; (b) worst-case, but rare, scenario where the bodies of the different individuals overlap, combined with a blur caused by the car window on the right-hand side of the photograph. In this case, RetinaNet missed two individuals, and cropped the bodies of two giraffes into one single image

edges, yielding 1,417 connected components among which 781 were singletons of one image.

Our network-based approach, relying on community detection, retrieved consistent clusters of flank images (different colours in Figure 3). The cluster size distribution is by definition more concentrated after network clustering (see Figure S3), with a maximal size of 35 instead of 373. Indeed, this very large connected component was clearly an artefact due to a chain of giraffe overlaps, and has been successfully split by our procedure (see Figure S4). We detected 316 clusters with more than 5 images, and 105 with more than 10 images. However, in rare cases, some images from the same individuals were found in different clusters (see Figure S4). Because these clusters arose from a single connected component, we could a posteriori check for consistencies by comparing clusters of the same component manually (as performed for Figure S4).

3.3 | From identified individuals to a deep learning approach for re-identification

To perform a fair evaluation of the CNN performance, we saved 82 human-validated, unambiguous SIFT-based clusters that contained at least two different sequences of photographs shot at least with a 1-hr interval (see Section 2). Those 82 clusters were made of 822 images of giraffe flanks, from which we evaluated the performance of our re-identification pipeline based on deep metric learning. Once trained using data augmentation (Figure 4), the CNN returned a Top-1 accuracy (TP rate) of about 85% on average (Figure 5) for images of known individuals. However, 11 images were found to be repeatedly impossible to classify because of the bad orientation of the giraffe body on the photograph, or because of the presence of conspicuous and disturbing elements in the foreground (Figure S6). Without these problematic images, we achieved a Top-1 accuracy > 90%, on average. Interestingly, the associated false-positive rate was close to 0 (Figure 5). In other words, when a Top-1 image existed below a given threshold (here 1 at most), this Top-1 image was almost always from the correct known individual (Figure S5a).

With our deep metric learning approach, images were projected into a Euclidean space. We expected images from the same known individual to be close in this space, whereas images from unknown individuals should be distant from those of
known individuals. This prediction was only partly supported. Although the true-negative rate was TN > 95% for small distance threshold values (d ≤ 0.1), TN decreased markedly with the distance threshold (Figure 5). At the same time, the positive rate started from TP < 70% for d ≤ 0.1 but rapidly levelled off to 80% as the distance threshold increased (Figure 5). Hence, our CNN often predicted an unexpectedly small distance between a given image of an unknown individual and another image of a known individual (Figure S5b). Interestingly, a particular threshold value (d = 0.25; crossing point in Figure 5) where both TP and TN rates reached 80% offered the best compromise.

FIGURE 3 Example of a connected component split into four clusters using the InfoMap algorithm (see Section 2) to assign images of giraffe flanks to a given individual for re-identification. Each cluster, representing one individual giraffe, is delineated by an ellipse of a different colour. Node 2 is an image with two giraffes that we also have in images 1 and 3, respectively, accounting for why their two respective clusters (on the left) are connected. Clusters can sometimes be connected even if the flanks belong to two different giraffes. We illustrate this case with images 3 and 4, which are considered similar because of the presence of the same tree in the background. The same issue arises for images 5 and 6. We applied this method to re-identify giraffes from coat patterns on a collection of photographs taken at Hwange National Park, Zimbabwe, between 2014 and 2018

FIGURE 5 Performance of our convolutional neural network (CNN) pipeline for the re-identification of giraffes at Hwange National Park, Zimbabwe (between 2014 and 2018). We decided that two flank images came from the same giraffe using the Euclidean distance between the two images defined by our deep metric learning method. If the distance between the two images fell below a certain threshold distance, it was concluded that they belonged to the same individual. Here, we report the true-positive rate (TP), or Top-1 accuracy, as a function of the distance threshold, calculated on images of known individuals in the test dataset, with (plain) or without (dashed) the 11 problematic images. We also report the corresponding false-positive rate (FP), and the true-negative rate (TN) calculated on images of unknown individuals. The true-negative rate displays the performance of the CNN model in detecting new giraffes entering the dataset, that is, those individuals never seen before when training the CNN
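The thresholded Top-1 rule evaluated above can be sketched as follows: a query embedding is compared against the representative images in the 'reference book', and the query is declared unknown when its smallest (Top-1) CNN-based distance exceeds the threshold. This is an illustrative NumPy sketch only, with toy 2-D embeddings standing in for the CNN's last layer; all names and values are hypothetical:

```python
import numpy as np

def top1_decision(query, reference_book, threshold):
    """Return (identity, distance) of the Top-1 match in the reference book,
    or (None, distance) when the smallest Euclidean distance exceeds the
    threshold, i.e. the query is treated as an unknown individual."""
    best_id, best_d = None, np.inf
    for identity, reps in reference_book.items():
        d = np.linalg.norm(np.asarray(reps) - query, axis=1).min()
        if d < best_d:
            best_id, best_d = identity, d
    return (best_id, best_d) if best_d <= threshold else (None, best_d)

# Toy reference book: five 2-D 'embeddings' per known individual
rng = np.random.default_rng(0)
book = {
    "giraffe_A": rng.normal([0, 0], 0.05, size=(5, 2)),
    "giraffe_B": rng.normal([1, 1], 0.05, size=(5, 2)),
}
known_query = np.array([0.02, -0.01])   # lies close to giraffe_A's cluster
unknown_query = np.array([5.0, 5.0])    # far from every known individual

print(top1_decision(known_query, book, threshold=0.5)[0])    # giraffe_A
print(top1_decision(unknown_query, book, threshold=0.5)[0])  # None
```

Sweeping the threshold over a grid and counting correct matches (TP), wrong matches (FP) and rejected unknowns (TN) reproduces the kind of trade-off curves shown in Figure 5.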
positive rate (TP), or Top-1 accuracy, as function of the distance Clusters can sometimes be connected even if the flanks belong to threshold and calculated on images of know individuals in the test two different giraffes. We illustrate this case with images 3 and 4, dataset, with (plain) or without (dashed) 11 problematic images. We which are considered similar because of the presence of the same also report the corresponding false-positive rate (FP), and the true- tree in the background. The same issue arises for images 5 and 6. negative rate (TN) calculated on images of unknown individuals. We applied this method to re-identify giraffes from coat patterns True-negative rate displays the performance of the CNN model to on a collection of photographs taken at Hwange National Park, detect new giraffes entering the dataset that is those individuals Zimbabwe, between 2014 and 2018 never seen before when training the CNN F I G U R E 4 Training a convolutional neural network (CNN) requires a large and varied set of images (here giraffe flanks) to achieve reasonable performance when applied on new cases. In this study, we took giraffe photographs at Hwange National Park, Zimbabwe, between 2014 and 2018 but in the field, the opportunity to shoot pictures of the same giraffe in a variety of situations in terms of location or light condition is very limited. Therefore, we performed image data augmentation by randomly changing orientation and size, adding blur, performing edge detection, adding noise and modifying colours or brightness using the imgaug Python library (see Section 2). Here, we show an example of data augmentation, with the original image (left) and four different modified versions used to train our CNN for giraffe re-identification
8 | Methods in Ecology and Evolution                                                                   MIELE et al.

4 | DISCUSSION

We propose two complementary approaches to re-identify individual giraffes from a set of photographs taken in the field. Based on the new framework of image similarity networks, our unsupervised method goes one step further than previous solutions from the literature, since its end product is a comprehensive list of clusters of images, one cluster per identified individual. Our supervised method, which relies on deep metric learning, achieves a very good re-identification of giraffes from a 'reference book' of known individuals despite the rather small number of photographs per individual available to train the model.

As a first step, we took advantage of the most recent computer vision techniques to perform object detection and crop the giraffe flanks before comparing coat patterns. Image cropping proves to be particularly efficient when the bodies of several giraffes do not overlap in photographs. However, a cascade of problems arises when overlapping occurs, including erroneous cropping and difficulties in assigning a bounding box to a single individual because, in this case, the coat patterns of two individuals are mixed. We show that a limited number of labelled photographs (a few hundred) is needed to train RetinaNet with a very good performance on new photographs. To what extent our RetinaNet model parameters could be efficient in other study sites with different background vegetation (in 'Terra Incognita', quoting Beery et al., 2018) remains an open question. Nevertheless, fine-tuning RetinaNet for a particular task and dataset is within the reach of many researchers dealing with animal photographs, thanks to the associated code we provide. Further perspectives now arise with contour segmentation methods (He et al., 2017) that can extract the contours of an object, such as the whole body or any part of an animal, by creating a so-called segmentation mask (Brodrick et al., 2019). Giraffe body contouring could possibly help individual re-identification by removing residual background noise, but building a training set by manually contouring hundreds of animal bodies remains a huge effort.

We then recast the animal identification problem from photographs into a statistical one, namely a clustering problem in an image similarity network. In other words, given a network that we build using a distance between pairs of images, we can efficiently retrieve the image set of a given individual as a cluster in the network. We computed a distance based on pattern matching between flanks with the well-known SIFT operator (Bellavia & Colombo, 2020), as used by Bolger et al. (2012). The proposed network-based approach was particularly useful and efficient in dealing with false-positive matches. False-positive matches are a recurrent issue occurring when two images have a very similar background. This situation is often found when the same tree appears in two images (see nodes 3 and 4 in Figure 3), when giraffe orientation perfectly matches (see Figure S1), or when the bodies of two giraffes overlap in the same image, which is the most frequent configuration we faced (see node 2 in Figure 3). In this latter case, the image linked two sets of images corresponding to the two overlapping individuals. Our network-based approach also handles false-negative cases (e.g. two images of the same animal are declared different because of differences in lighting conditions or animal orientation), since community detection is robust to possibly missing edges: indeed, a missing edge can be compensated by the other edges inside a cluster. This step is fully reproducible and applicable to other animal species, as long as a feature matching algorithm can be used, be it SIFT or any alternative method such as Oriented FAST and Rotated BRIEF (ORB; Rublee et al., 2011) or deep features (Dusmanu et al., 2019; Ma et al., 2020).

We tackled the problem of animal re-identification, literally detecting and identifying previously seen animals, considering that we had a 'reference book' with photographs of these known individuals. This fits the needs of field researchers who want to monitor the fate of animals by regularly adding new observations over time, for instance by collecting photographs with camera traps. To do so, we evaluated the possibility of using the rapidly developing convolutional neural networks in a supervised learning framework to achieve deep metric learning. Solving this problem was particularly challenging because of the size of our dataset. Previous studies on animal re-identification with CNNs indeed relied on a high number of photographs per individual (Ferreira et al., 2020; Schneider et al., 2020). In our case, we had to train the CNN with only a few images per individual (see Snell et al., 2017, on few-shot learning methods), shot in the field under contrasting environmental and light conditions. This situation corresponds to many field studies, particularly on large mammals (possibly with the exception of primates), for which population density and animal detection rate are low, limiting the expected number of photographs per individual. To circumvent this problem, we developed a data augmentation strategy to artificially increase the variability of observation conditions encountered in the training dataset, which improved the model performance substantially.

In terms of overall predictive performance, we reached about 90% Top-1 accuracy, which is comparable to previously reported performance in animal re-identification of known individuals (see Schneider et al., 2019, for a review) but usually achieved with a much higher number of photographs. The combination of recent deep learning algorithms and data augmentation appears very competitive and efficient, with possible application to difficult practical cases such as endangered or elusive species living at very low abundance, for example the leopard Panthera pardus or the Iberian lynx Lynx pardinus. Compared to the more robust SIFT operator, we found that the performance of the CNN is affected by the orientation of the giraffe body, noticeably by deviation from a perfect side shot. In terms of computing requirements, training our CNN remained time-consuming because the number of images to process is increased dramatically by the data augmentation. This problem is partially counter-balanced by the more computationally efficient calculation of CNN-based distances, which increases linearly with the number of photographs (computing one projection per image), compared to the SIFT-based approach, for which the computing time is proportional to the square of the number of photographs (computing one matching per image pair). For instance, we got all distances in about a minute with the CNN and about 2 hr with the SIFT operator when applied to the same test dataset (see Table 2).
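The clustering step described above, building an image similarity network from pairwise distances and reading off one cluster per individual, can be sketched in a few lines. This is a simplified stand-in: the distance matrix is a toy, and plain connected components replace the community detection used in the study, which is more robust to spurious edges.

```python
import numpy as np

def similarity_clusters(dist, threshold):
    """Connect pairs of images whose distance falls below `threshold`,
    then return the connected components of the resulting image
    similarity network as clusters (one per putative individual)."""
    n = dist.shape[0]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] < threshold:       # add an edge for similar images
                adj[i].add(j)
                adj[j].add(i)
    seen, clusters = set(), []
    for start in range(n):                   # depth-first component search
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        clusters.append(sorted(comp))
    return clusters

# Toy distance matrix: images 0-2 depict one individual, images 3-4 another.
d = np.array([[0, 1, 2, 9, 9],
              [1, 0, 1, 9, 9],
              [2, 1, 0, 9, 9],
              [9, 9, 9, 0, 1],
              [9, 9, 9, 1, 0]], dtype=float)
print(similarity_clusters(d, threshold=3.0))  # → [[0, 1, 2], [3, 4]]
```

A missing edge (a false negative, e.g. images 0 and 2 under a stricter threshold) is compensated by the path through image 1, which is the robustness property exploited above.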
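Likewise, re-identification against a 'reference book' reduces to a nearest-neighbour query in the learned metric space, with a distance threshold to reject individuals never seen before. All embeddings, labels and the threshold value below are toy assumptions, not values from the study.

```python
import numpy as np

def reidentify(query_emb, ref_embs, ref_labels, threshold):
    """Open-set re-identification sketch: return the label of the
    closest reference image (Top-1) if its distance is below
    `threshold`, and flag the query as a new individual otherwise."""
    dists = np.linalg.norm(ref_embs - query_emb, axis=1)  # Euclidean distances
    i = int(np.argmin(dists))
    if dists[i] < threshold:
        return ref_labels[i], dists[i]   # known individual
    return "unknown", dists[i]           # likely a giraffe never seen before

# Toy 2-D embeddings standing in for CNN projections of flank images.
refs = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
labels = ["giraffe_A", "giraffe_A", "giraffe_B"]
print(reidentify(np.array([0.05, 0.02]), refs, labels, threshold=1.0)[0])   # → giraffe_A
print(reidentify(np.array([10.0, -10.0]), refs, labels, threshold=1.0)[0])  # → unknown
```

Because one projection per image suffices before the cheap nearest-neighbour search, this query scales linearly with the number of photographs, in contrast to pairwise SIFT matching.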
TABLE 2 Computing time needed to compare 310 representative images versus 121 test images (CNN training with about 5,500 images) extracted from giraffe photographs shot at Hwange National Park, Zimbabwe, between 2014 and 2018. The hardware used for these calculations was an Intel Xeon CPU E5-2650 v4 2.30 GHz (CPU) and an Nvidia Titan X card (GPU)

Task                        Avg. computing time
SIFT-based distance         About 1 hr 45 min
CNN-based distance          About 1 min
CNN training (with GPU)     About 3 hr 45 min

Our approach was also designed to deal with datasets where both known and unknown individuals are present. Dealing with unknown individuals is extremely challenging because no images of these new individuals are available in the training dataset. Indeed, most classical CNN-based approaches solve classification problems where the number of classes (the number of individuals, in our case) is fixed. We showed here that it is possible to filter out unknown from known individuals while re-identifying a large fraction of known individuals at the same time, with a success of 80% (for both TP and TN). However, this trade-off came at the cost of a lower Top-1 accuracy, which we acknowledge is not fully satisfying and has already been experienced by other authors (Ferreira et al., 2020). Still, in most cases, we could validate the proposed identification by examining the Top-1 for each query image (i.e. checking its closest image) for both known and unknown individuals. Despite not being fully automated, our CNN approach would require little human intervention.

To what extent could the performance of our CNN-based pipeline be improved with more data? Since it is suitable for any species, further data analysis on other species will help answer this question. Additional strategies would also help, including the integration of contextual information (Beery et al., 2019; Terry et al., 2020) such as time, GPS positioning or animal social context. Using accurate segmentation of the animal body (Brodrick et al., 2019; He et al., 2017) will undoubtedly be a solution against the side effects of rectangular cropping. Moreover, this pipeline can be used in an active learning strategy, where the machine learning model is assisted by human intervention on some specific cases (Norouzzadeh et al., 2021). Indeed, using the proposed distance threshold in the Euclidean space, one can iteratively enrich the training dataset after manual checking of the most confident Top-1 candidates (below a small distance threshold, to guarantee an optimal TN rate) and re-run the estimation procedure.

Finally, this interdisciplinary work provides guidelines about best practices for collecting identification images in the field, if they are to be used later with an automated pipeline such as the one presented here. Better results can be achieved with simple rules for framing animals with the camera. First, the field operator should avoid as much as possible overlapping bodies of two or more individuals, as this was the most acute issue in our giraffe experience. Note that several well-separated individuals in the same photograph are not a problem at all, thanks to the CNN cropping performed at the preliminary stage. Another point to pay attention to is the background which, if too similar across images (e.g. photographs shot from the very same spot) with obvious structures (tree, pond, rocks, etc.), will likely mislead the computer vision algorithm, even on cropped images, because cropping is rectangular and does not delineate the animal body. This situation often arises while photographing animals moving in line, as giraffes and many others often do. A last point is the heterogeneity of situations under which animals were observed. We did our best to improve the training dataset with data augmentation; however, photographing animals in as many different conditions as possible could improve the results. This includes light conditions (dawn, dusk, noon), orientation of the individual, or background (open vs. more densely vegetated areas). More specific to CNN re-identification is the need for a greater number of photographs per individual (>50) than what is currently available, so particular attention should be given, in the field under optimal shooting conditions, to the opportunity to take more photographs of each observed individual.

ACKNOWLEDGEMENTS
We thank Jeanne Duhayer for her considerable help in analysing our preliminary findings, and Laurent Jacob and Franck Picard for their insights on deep learning. This work was performed using the computing facilities of the CC LBBE/PRABI. Funding was provided by the French National Center for Scientific Research (CNRS) and the Statistical Ecology Research Group (EcoStat) of the CNRS. We are also grateful to Derek Lee for his kind advice in processing photographs, and for sharing with us his experience in the monitoring of giraffes. Finally, we acknowledge the director of the Zimbabwe Parks and Wildlife Management Authority for authorizing this research, and support from the CNRS Zone Atelier/LTSER program for fieldwork and some of the photographs (collection by P.A. Seeber).

AUTHORS' CONTRIBUTIONS
V.M., D.A. and C.B. conceived the study with some inputs from S.C.-J.; V.M. and G.D. developed the approach and performed the analysis; V.M. and S.C.-J. supervised G.D.; D.A. and C.B. provided the photographs; B.S. set up the computing architecture. All authors contributed to the writing of the manuscript.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13577.

DATA AVAILABILITY STATEMENT
The curated dataset of re-identified giraffe individuals is freely available at ftp://pbil.univ-lyon1.fr/pub/datasets/miele2021. The code to reproduce the analysis is available at https://plmlab.math.cnrs.fr/vmiele/animal-reid/ with explanations and test cases.

ORCID
Vincent Miele https://orcid.org/0000-0001-7584-0088
Simon Chamaillé-Jammes https://orcid.org/0000-0003-0505-6620
Christophe Bonenfant https://orcid.org/0000-0002-9924-419X
REFERENCES
Achlioptas, D., D'Souza, R. M., & Spencer, J. (2009). Explosive percolation in random networks. Science, 323, 1453–1455. https://doi.org/10.1126/science.1167782
Beery, S., Van Horn, G., & Perona, P. (2018). Recognition in terra incognita. In Proceedings of the European conference on computer vision (ECCV) (pp. 456–473). Springer.
Beery, S., Wu, G., Rathod, V., Votel, R., & Huang, J. (2019). Context R-CNN: Long term temporal context for per-camera object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 13075–13085). IEEE.
Bellavia, F., & Colombo, C. (2020). Is there anything new to say about SIFT matching? International Journal of Computer Vision, 128, 1847–1866.
Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.
Bogucki, R., Cygan, M., Khan, C. B., Klimek, M., Milczek, J. K., & Mucha, M. (2019). Applying deep learning to right whale photo identification. Conservation Biology, 33, 676–684. https://doi.org/10.1111/cobi.13226
Bolger, D. T., Morrison, T. A., Vance, B., Lee, D., & Farid, H. (2012). A computer-assisted system for photographic mark–recapture analysis. Methods in Ecology and Evolution, 3, 813–822. https://doi.org/10.1111/j.2041-210X.2012.00212.x
Bolger, D., Vance, B., Morrison, T., & Farid, H. (2011). Wild ID user guide: Pattern extraction and matching software for computer-assisted photographic mark. Retrieved from https://github.com/ConservationInternational/Wild.ID/
Bouma, S., Pawley, M. D. M., Hupman, K., & Gilman, A. (2019). Individual common dolphin identification via metric embedding learning. arXiv preprint arXiv:1901.03662.
Bradski, G. (2000). The OpenCV library. Dr Dobb's Journal of Software Tools.
Brodrick, P. G., Davies, A. B., & Asner, G. P. (2019). Uncovering ecological patterns with convolutional neural networks. Trends in Ecology & Evolution, 34(8), 734–745. https://doi.org/10.1016/j.tree.2019.03.006
Buehler, P., Carroll, B., Bhatia, A., Gupta, V., & Lee, D. E. (2019). An automated program to find animals and crop photographs for individual recognition. Ecological Informatics, 50, 191–196. https://doi.org/10.1016/j.ecoinf.2019.02.003
Chamaillé-Jammes, S., Valeix, M., Bourgarel, M., Murindagomo, F., & Fritz, H. (2009). Seasonal density estimates of common large herbivores in Hwange National Park, Zimbabwe. African Journal of Ecology, 47, 804–808. https://doi.org/10.1111/j.1365-2028.2009.01077.x
Chen, P., Swarup, P., Wojciech, M. M., Kong, A. W. K., Han, S., Zhang, Z., & Rong, H. (2020). A study on giant panda recognition based on images of a large proportion of captive pandas. Ecology and Evolution, 10(7), 3561–3573. https://doi.org/10.1002/ece3.6152
Christin, S., Hervet, E., & Lecomte, N. (2019). Applications for deep learning in ecology. Methods in Ecology and Evolution, 10, 1632–1644. https://doi.org/10.1111/2041-210X.13256
Clutton-Brock, T., & Sheldon, B. C. (2010). Individuals and populations: The role of long-term, individual-based studies of animals in ecology and evolutionary biology. Trends in Ecology & Evolution, 25, 562–573. https://doi.org/10.1016/j.tree.2010.08.002
Dusmanu, M., Rocco, I., Pajdla, T., Pollefeys, M., Sivic, J., Torii, A., & Sattler, T. (2019). D2-Net: A trainable CNN for joint description and detection of local features. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8092–8101). IEEE.
Estes, R. D. (1991). The behavior guide to African mammals: Including hoofed mammals, carnivores, primates (pp. 509–519). Univ of California Press.
Ferreira, A. C., Silva, L. R., Renna, F., Brandl, H. B., Renoult, J. P., Farine, D. R., Covas, R., & Doutrelant, C. (2020). Deep learning-based methods for individual recognition in small birds. Methods in Ecology and Evolution, 11, 1072–1085. https://doi.org/10.1111/2041-210X.13436
Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486, 75–174. https://doi.org/10.1016/j.physrep.2009.11.002
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE.
Halloran, K. M., Murdoch, J. D., & Becker, M. S. (2015). Applying computer-aided photo-identification to messy datasets: A case study of Thornicroft's giraffe (Giraffa camelopardalis thornicrofti). African Journal of Ecology, 53, 147–155.
Hansen, M. F., Smith, M. L., Smith, L. N., Salter, M. G., Baxter, E. M., Farish, M., & Grieve, B. (2018). Towards on-farm pig face recognition using convolutional neural networks. Computers in Industry, 98, 145–152. https://doi.org/10.1016/j.compind.2018.02.016
Hartog, J., & Reijns, R. (2014). Interactive individual identification system (I3S). Free Software Foundation Inc.
Hayasaka, S. (2016). Explosive percolation in thresholded networks. Physica A: Statistical Mechanics and its Applications, 451, 1–9. https://doi.org/10.1016/j.physa.2016.01.001
Hayes, L. D., & Schradin, C. (2017). Long-term field studies of mammals: What the short-term study cannot tell us. Journal of Mammalogy, 98, 600–602. https://doi.org/10.1093/jmammal/gyx027
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 2961–2969). IEEE.
He, Q., Zhao, Q., Liu, N., Chen, P., Zhang, Z., & Hou, R. (2019). Distinguishing individual red pandas from their faces. In Chinese conference on pattern recognition and computer vision (PRCV) (pp. 714–724). Springer.
Hermans, A., Beyer, L., & Leibe, B. (2017). In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737.
Hoffer, E., & Ailon, N. (2015). Deep metric learning using triplet network. In International workshop on similarity-based pattern recognition (pp. 84–92). Springer.
Körschens, M., Barz, B., & Denzler, J. (2018). Towards automatic identification of elephants in the wild. arXiv preprint arXiv:1812.04418.
Lamba, A., Cassey, P., Segaran, R. R., & Koh, L. P. (2019). Deep learning for environmental conservation. Current Biology, 29, R977–R982. https://doi.org/10.1016/j.cub.2019.08.016
Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980–2988). IEEE.
Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European conference on computer vision (pp. 740–755). Springer.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
Ma, J., Jiang, X., Fan, A., Jiang, J., & Yan, J. (2020). Image matching from handcrafted to deep features: A survey. International Journal of Computer Vision, 129, 23–79.
Moskvyak, O., Maire, F., Armstrong, A. O., Dayoub, F., & Baktashmotlagh, M. (2019). Robust re-identification of manta rays from natural markings by learning pose invariant embeddings. arXiv preprint arXiv:1902.10847.
Moya, Ó., Mansilla, P. L., Madrazo, S., Igual, J. M., Rotger, A., Romano, A., & Tavecchia, G. (2015). APHIS: A new software for photo-matching in ecological studies. Ecological Informatics, 27, 64–70. https://doi.org/10.1016/j.ecoinf.2015.03.003
Muller, Z., Bercovitch, F., Brand, R., Brown, D., Brown, M., Bolger, D., Carter, K., Deacon, F., Doherty, J., Fennessy, J., Fennessy, S., Hussein, A., Lee, D., Marais, A., Strauss, M., Tutchings, A., & Wube, T. (2018). Giraffa camelopardalis (amended version of 2016 assessment). The IUCN Red List of Threatened Species 2018: e.T9194A136266699.
Norouzzadeh, M. S., Morris, D., Beery, S., Joshi, N., Jojic, N., & Clune, J. (2021). A deep active learning system for species identification and counting in camera trap images. Methods in Ecology and Evolution, 12, 150–161. https://doi.org/10.1111/2041-210X.13504
Parham, J., Stewart, C., Crall, J., Rubenstein, D., Holmberg, J., & Berger-Wolf, T. (2018). An animal detection pipeline for identification. In 2018 IEEE winter conference on applications of computer vision (WACV) (pp. 1075–1083). IEEE.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779–788). IEEE.
Renó, V., Dimauro, G., Labate, G., Stella, E., Fanizza, C., Cipriano, G., Carlucci, R., & Maglietta, R. (2019). A SIFT-based software system for the photo-identification of the Risso's dolphin. Ecological Informatics, 50, 95–101. https://doi.org/10.1016/j.ecoinf.2019.01.006
Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences of the United States of America, 105, 1118–1123. https://doi.org/10.1073/pnas.0706851105
Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. (2011). ORB: An efficient alternative to SIFT or SURF. In 2011 international conference on computer vision (pp. 2564–2571). IEEE.
Sadegh Norouzzadeh, M., Morris, D., Beery, S., Joshi, N., Jojic, N., & Clune, J. (2019). A deep active learning system for species identification and counting in camera trap images. arXiv preprint arXiv:1910.09716.
Schneider, S., Taylor, G. W., & Kremer, S. (2018). Deep learning object detection methods for ecological camera trap data. In 2018 15th conference on computer and robot vision (CRV) (pp. 321–328). IEEE.
Schneider, S., Taylor, G. W., & Kremer, S. C. (2020). Similarity learning networks for animal individual re-identification: Beyond the capabilities of a human observer. In Proceedings of the IEEE winter conference on applications of computer vision workshops (pp. 44–52). IEEE.
Schneider, S., Taylor, G. W., Linquist, S., & Kremer, S. C. (2019). Past, present and future approaches using computer vision for animal re-identification from camera trap data. Methods in Ecology and Evolution, 10, 461–470. https://doi.org/10.1111/2041-210X.13133
Schofield, D., Nagrani, A., Zisserman, A., Hayashi, M., Matsuzawa, T., Biro, D., & Carvalho, S. (2019). Chimpanzee face recognition from videos in the wild using deep learning. Science Advances, 5, eaaw0736. https://doi.org/10.1126/sciadv.aaw0736
Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 815–823). IEEE.
Shin, H. C., Roth, H. R., Gao, M., Lu, L., Xu, Z., Nogues, I., Yao, J., Mollura, D., & Summers, R. M. (2016). Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging, 35, 1285–1298. https://doi.org/10.1109/TMI.2016.2528162
Silvy, N. J., Lopez, R. R., & Peterson, M. J. (2005). Wildlife marking techniques. In Techniques for wildlife investigations and management (pp. 339–376). The Wildlife Society.
Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learning. In Advances in neural information processing systems (pp. 4077–4087). MIT Press.
Terry, J. C. D., Roy, H. E., & August, T. A. (2020). Thinking like a naturalist: Enhancing computer vision of citizen science images by harnessing contextual data. Methods in Ecology and Evolution, 11, 303–315. https://doi.org/10.1111/2041-210X.13335
Wang, B., Pourshafeie, A., Zitnik, M., Zhu, J., Bustamante, C. D., Batzoglou, S., & Leskovec, J. (2018). Network enhancement as a general method to denoise weighted biological networks. Nature Communications, 9, 1–8. https://doi.org/10.1038/s41467-018-05469-x
Weinstein, B. G. (2018). A computer vision for animal ecology. Journal of Animal Ecology, 87, 533–545. https://doi.org/10.1111/1365-2656.12780
Willi, M., Pitman, R. T., Cardoso, A. W., Locke, C., Swanson, A., Boyer, A., Veldthuis, M., & Fortson, L. (2019). Identifying animal species in camera trap images using deep learning and citizen science. Methods in Ecology and Evolution, 10, 80–91. https://doi.org/10.1111/2041-210X.13099
Wu, D., Zheng, S. J., Zhang, X. P., Yuan, C. A., Cheng, F., Zhao, Y., Lin, Y. J., Zhao, Z. Q., Jiang, Y. L., & Huang, D. S. (2019). Deep learning-based methods for person re-identification: A comprehensive review. Neurocomputing, 337, 354–371. https://doi.org/10.1016/j.neucom.2019.01.079
Zheng, L., Yang, Y., & Hauptmann, A. G. (2016). Person re-identification: Past, present and future. arXiv preprint arXiv:1610.02984.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section.

How to cite this article: Miele V, Dussert G, Spataro B, Chamaillé-Jammes S, Allainé D, Bonenfant C. Revisiting animal photo-identification using deep metric learning and network analysis. Methods Ecol Evol. 2021;00:1–11. https://doi.org/10.1111/2041-210X.13577