This is a pre-print of the original paper accepted at the Winter Conference on Applications of Computer Vision (WACV) 2023.
Large-Scale Open-Set Classification Protocols for ImageNet
Andres Palechor Annesha Bhoumik Manuel Günther
Department of Informatics, University of Zurich, Andreasstrasse 15, CH-8050 Zurich
https://www.ifi.uzh.ch/en/aiml.html
arXiv:2210.06789v2 [cs.CV] 18 Oct 2022
Abstract

Open-Set Classification (OSC) intends to adapt closed-set classification models to real-world scenarios, where the classifier must correctly label samples of known classes while rejecting previously unseen unknown samples. Only recently has research started to investigate algorithms that are able to handle these unknown samples correctly. Some of these approaches address OSC by including negative samples in the training set that the classifier learns to reject, expecting that these data increase the robustness of the classifier on unknown classes. Most of these approaches are evaluated on small-scale, low-resolution image datasets like MNIST, SVHN or CIFAR, which makes it difficult to assess their applicability to the real world and to compare them with each other. We propose three open-set protocols that provide rich datasets of natural images with different levels of similarity between known and unknown classes. The protocols consist of subsets of ImageNet classes selected to provide training and testing data closer to real-world scenarios. Additionally, we propose a new validation metric that can be employed to assess whether the training of deep learning models addresses both the classification of known samples and the rejection of unknown samples. We use the protocols to compare the performance of two baseline open-set algorithms to the standard SoftMax baseline and find that the algorithms work well on negative samples that have been seen during training, and partially on out-of-distribution detection tasks, but drop performance in the presence of samples from previously unseen unknown classes.

1. Introduction

Automatic classification of objects in images has been an active direction of research for several decades now. The advent of Deep Learning has brought algorithms to a stage where they can handle large amounts of data and produce classification accuracies that were beyond imagination a decade before. Supervised image classification algorithms have achieved tremendous success when it comes to detecting classes from a finite number of known classes, which is commonly known as evaluation under the closed-set assumption. For example, the deep learning algorithms that attempt the classification of ten handwritten digits [16] achieve more than 99% accuracy when presented with a digit, but this ignores the fact that the classifier might be confronted with non-digit images during testing [6]. Even the well-known ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [26] contains 1000 classes during training, and the test set contains samples from exactly these 1000 classes, while the real world contains many more classes, e.g., the WordNet hierarchy [19] currently knows more than 100'000 classes.¹ Training a categorical classifier that can differentiate all these classes is currently not possible – only feature comparison approaches [22] exist – and, hence, we have to deal with samples that we do not know how to classify.

Only recently has research on methods to improve classification in the presence of unknown samples gained more traction. These are samples from previously unseen classes that might occur during deployment of the algorithm in the real world and that the algorithm needs to handle correctly by not assigning them to any of the known classes. Bendale and Boult [2] provided the first algorithm that incorporates the possibility to reject a sample as unknown into a deep network that was trained on a finite set of known classes. Later, other algorithms were developed to improve the detection of unknown samples. Many of these algorithms require training on samples from some of the unknown classes that do not belong to the known classes of interest – commonly, these classes are called known unknowns [18], but since this formulation is more confusing than helpful, we will term these classes the negative classes. For example, Dhamija et al. [6] employed samples from a different dataset, i.e., they trained their system on MNIST as known classes and selected EMNIST letters as negatives.

¹ https://wordnet.princeton.edu
[Figure 1 diagram: the WordNet tree from "Physical Entity" via "Organism", "Animal" and "Artifact" down to superclasses such as Dog, Hunting Dog, Fox, Wolf, Bear, Ungulate, Musteline, Bird, Insect, Fish, Monkey, Fungus, Fruit, Furniture, Computer, Device, Vehicle, Car, Truck, Boat and Food, each marked as known, negative or unknown for protocols P1, P2 and P3.]
Figure 1: Class Sampling in our Open-Set Protocols. We make use of the WordNet hierarchy [19] to define
three protocols of different difficulties. In this figure, we show the superclasses from which we sample the final classes, all of
which are leaf nodes taken from the ILSVRC 2012 dataset. Dashed lines indicate that the lower nodes are descendants, but
they might not be direct children of the upper nodes. Additionally, all nodes have more descendants than those shown in
the figure. The colored bars below a class indicate that its subclasses are sampled for the purposes shown in the top-left of
the figure. For example, all subclasses of “Dog” are used as known classes in protocol P1, while the subclasses of “Hunting
Dog” are partitioned into known and negatives in protocol P2. For protocol P3, several intermediate nodes are partitioned
into known, negative and unknown classes.
Other approaches try to create negative samples by utilizing known classes in different ways, e.g., Ge et al. [8] used a generative model to form negative samples, while Zhou et al. [30] try to utilize internal representations of mixed known samples.

One issue that is inherent in all of these approaches – with only a few exceptions [2, 25] – is that they evaluate only on small-scale datasets with a few known classes, such as the 10 classes of MNIST [16], CIFAR-10 [14], SVHN [21], or mixtures of these. While many algorithms claim that they can handle unknown classes, the number of known classes is low, and it is unclear whether these algorithms can handle more known classes, or more diverse sets of unknown classes. Only lately has a large-scale open-set validation protocol been defined on ImageNet [28], but it only separates unknown samples based on visual² and not semantic similarity.

Another issue of research on open-set classification is that most of the employed evaluation criteria, such as accuracy, macro-F1 or ROC metrics, do not evaluate open-set classification as it would be used in a real-world task. Particularly, the currently employed validation metrics that are used during training of a network do not reflect the target task and, thus, it is unclear whether the selected model is actually the best model for the desired task.

In this paper we, therefore, propose large-scale open-set recognition protocols that can be used to train and test various open-set algorithms – and we showcase the performance of three simple algorithms in this paper. We decided to build our protocols based on the well-known and well-investigated ILSVRC 2012 dataset [26], and we build three evaluation protocols P1, P2 and P3 that provide various difficulties based on the WordNet hierarchy [19], as displayed in Fig. 1. The protocols are publicly available,³ including source code for the baseline implementations and the evaluation, which enables the reproduction of the results presented in this paper. With these new protocols, we hope to foster more comparable and reproducible research in the direction of open-set object classification as well as related topics such as out-of-distribution detection. This allows researchers to test their algorithms on our protocols and directly compare with our results.

The contributions of this paper are as follows:

• We introduce three novel open-set evaluation protocols with different complexities for the ILSVRC 2012 dataset.

• We propose a novel evaluation metric that can be used for validation purposes when training open-set classifiers.

• We train deep networks with three different techniques and report their open-set performances.

• We provide all source code³ for training and evaluation of our models to the research community.

² In fact, Vaze et al. [28] do not specify their criteria to select unknown classes and only mention visual similarity in their supplemental material.
³ https://github.com/AIML-IfI/openset-imagenet
2. Related Work

In open-set classification, a classifier is expected to correctly classify known test samples into their respective classes, and to correctly detect that unknown test samples do not belong to any known class. The study of unknown instances is not new in the literature. For example, novelty detection, which is also known as anomaly detection and has a high overlap with out-of-distribution detection, focuses on identifying test instances that do not belong to training classes. It can be seen as a binary classification problem that determines if an instance belongs to any of the training classes or not, but without exactly deciding which class [4], and includes approaches in supervised, semi-supervised and unsupervised learning [13, 23, 10].

However, all these approaches only consider the classification of samples into known and unknown, leaving the later classification of known samples into their respective classes as a second step. Ideally, these two steps should be incorporated into one method. An easy approach would be to threshold on the maximum class probability of the SoftMax classifier using a confidence threshold, assuming that for an unknown input the probability would be distributed across all the classes and, hence, would be low [17]. Unfortunately, inputs often overlap significantly with known decision regions and tend to get misclassified as a known class with high confidence [6]. It is therefore essential to devise techniques that are more effective than simply thresholding SoftMax probabilities in detecting unknown inputs. Some initial approaches include extensions of 1-class and binary Support Vector Machines (SVMs) as implemented by Scheirer et al. [27] and recognition systems that continuously learn new classes [1, 25].

While the above methods make use only of known samples in order to disassociate unknown samples, other approaches require samples of some negative classes, hoping that these samples generalize to all unseen classes. For example, Dhamija et al. [6] utilize negative samples to train the network to provide low confidence values for all known classes when presented with a sample from an unknown class. Many researchers [8, 29, 20] utilize generative adversarial networks to produce negative samples from the known samples. Zhou et al. [30] combined pairs of known samples to define negatives, both in input space and deeper in the network. Other approaches to open-set recognition are discussed by Geng et al. [9].

One problem that all the above methods share is that they are evaluated on small-scale datasets with low-resolution images and low numbers of classes. Such datasets include MNIST [16], SVHN [21] and CIFAR-10 [14], where oftentimes a few random classes are used as known and the remaining classes as unknown [9]. Sometimes, other datasets serve the roles of unknowns, e.g., when MNIST forms the known classes, EMNIST letters [11] are used as negatives and/or unknowns. Similarly, the known classes are composed of CIFAR-10, while other classes from CIFAR-100 or SVHN are negatives or unknowns [15, 6]. Only a few papers make use of large-scale datasets such as ImageNet, where they either use the classes of ILSVRC 2012 as known and other classes from ImageNet as unknown [2, 28], or random partitions of ImageNet [25, 24].

Oftentimes, evaluation protocols are home-grown and, thus, the comparison across algorithms is very difficult. Additionally, there is no clear distinction regarding the similarities between known, negative and unknown classes, which makes it impossible to judge in which scenarios a method will work, and in which not. Finally, the employed evaluation metrics are most often not designed for open-set classification and, hence, fail to address typical use-cases of open-set recognition.

3. Approach

3.1. ImageNet Protocols

Based on [3], we design three different protocols to create three different artificial open spaces, with an increasing level of similarity in appearance between inputs – and increasing complexity and overlap between features – of known and unknown classes. To allow for the comparison of algorithms that require negative samples for training, we carefully design and include negative classes in our protocols. This also allows us to compare how well these algorithms work on previously seen negative classes and how well they work on previously unseen unknown classes.

In order to define our three protocols, we make use of the WordNet hierarchy that provides us with a tree structure for the 1000 classes of ILSVRC 2012. Particularly, we exploit the robustness Python library [7] to parse the ILSVRC tree. All the classes in ILSVRC are represented as leaf nodes of that graph, and we use descendants of several intermediate nodes to form our known and unknown classes. The definition of the protocols and their open-set partitions is presented in Fig. 1; a more detailed listing of classes can be found in the supplemental material. We design the protocols such that the difficulty levels of closed- and open-set evaluation vary. While protocol P1 is easy for open-set, it is hard for closed-set classification. On the contrary, P3 is easier for closed-set classification and more difficult in the open-set case. Finally, P2 is somewhere in the middle, but small enough to run hyperparameter optimization that can be transferred to P1 and P3.
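As an illustration of this class-selection step, the following minimal sketch collects the ILSVRC leaf classes below chosen WordNet parent nodes and assigns them to the three partitions. It is not the released protocol code: the `children` mapping, the `leaf_descendants` and `build_protocol` helpers, and the concrete configuration are assumptions for illustration (the WordNet IDs in the comment are taken from the class tables in the supplemental material).

```python
from collections import deque

def leaf_descendants(children, wnid):
    """Return all leaf classes (ILSVRC 2012 classes) below a WordNet node.

    `children` maps a WordNet id to the list of its direct children;
    ids without an entry are treated as leaves."""
    leaves, queue = [], deque([wnid])
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        if kids:
            queue.extend(kids)
        else:
            leaves.append(node)
    return leaves

def build_protocol(children, parents_by_role):
    """Assign the leaf descendants of each parent node to known/negative/unknown."""
    assignment = {}
    for role, parents in parents_by_role.items():
        for parent in parents:
            for leaf in leaf_descendants(children, parent):
                assignment.setdefault(leaf, role)  # first assignment wins
    return assignment

# Hypothetical P1-like configuration: dog -> known, fox/bear -> negative, device -> unknown.
# p1 = build_protocol(children, {"known": ["n02084071"],
#                                "negative": ["n02118333", "n02131653"],
#                                "unknown": ["n03183080"]})
```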
Table 1: ImageNet Classes used in the Protocols. This table shows the ImageNet parent classes that were used to create the three protocols. Known and negative classes are used for training the open-set algorithms, while known, negative and unknown classes are used in testing. Given are the numbers of classes: training / validation / test samples.

       Known                               Negative                            Unknown
  P1   All dog classes                     Other 4-legged animal classes       Non-animal classes
       116: 116218 / 29055 / 5800          67: 69680 / 17420 / 3350            166: — / — / 8300
  P2   Half of hunting dog classes         Half of hunting dog classes         Other 4-legged animal classes
       30: 28895 / 7224 / 1500             31: 31794 / 7949 / 1550             55: — / — / 2750
  P3   Mix of common classes including     Mix of common classes including     Mix of common classes including
       animals, plants and objects         animals, plants and objects         animals, plants and objects
       151: 154522 / 38633 / 7550          97: 98202 / 24549 / 4850            164: — / — / 8200
In the first protocol P1, known and unknown classes are semantically quite distant, and also do not share too many visual features. We include all 116 dog classes as known classes – since dogs represent the largest fine-grained intermediate category in ImageNet, which makes closed-set classification difficult – and select 166 non-animal classes as unknowns. P1 can, therefore, be used to test out-of-distribution detection algorithms since knowns and unknowns are not very similar. In the second protocol P2, we only look into the animal classes. Particularly, we use several hunting dog classes as known and other classes of 4-legged animals as unknown. This means that known and unknown classes are still semantically relatively distant, but image features such as fur are shared between known and unknown. This will make it harder for out-of-distribution detection algorithms to perform well. Finally, the third protocol P3 includes ancestors of various different classes, both as known and unknown classes, by making use of the mixed 13 classes defined in the robustness library. Since known and unknown classes come from the same ancestors, it is very unlikely that out-of-distribution detection algorithms will be able to discriminate between them, and real open-set classification methods need to be applied.

To enable algorithms that require negative samples, the negative classes are selected to be semantically similar to the known classes, or at least in-between the known and the unknown. It has been shown that selecting negative samples too far from the known classes does not help in creating better-suited open-set algorithms [6]. Naturally, we can only define semantic similarity based on the WordNet hierarchy, but it is unclear whether these negative classes are also structurally similar to the known classes. Tab. 1 displays a summary of the parent classes used in the protocols, and a detailed list of all classes is presented in the supplemental material.

Finally, we split our data into three partitions, one for training, one for validation and one for testing. The training and validation partitions are taken from the original ILSVRC 2012 training images by randomly splitting off 80% for training and 20% for validation. Since training and validation partitions are composed of known and negative data only, no unknown data is provided here. The test partition is composed of the original ILSVRC validation set containing 50 images per class, and is available for all three groups of data: known, negative and unknown. This assures that during testing no single image is used that the network has seen in any stage of the training.

3.2. Open-Set Classification Algorithms

We select three different techniques to train deep networks. While other algorithms shall be tested in future work, we rely on three simple, very similar and well-known methods. In particular, all three loss functions solely utilize the plain categorical cross-entropy loss J_CCE on top of SoftMax activations (often termed the SoftMax loss) in different settings. Generally, the weighted categorical cross-entropy loss is:

J_{CCE} = -\frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C} w_c \, t_{n,c} \log y_{n,c}    (1)

where N is the number of samples in our dataset (note that we utilize batch processing), t_{n,c} is the target label of the n-th sample for class c, w_c is a class weight for class c, and y_{n,c} is the output probability of class c for sample n using the SoftMax activation:

y_{n,c} = \frac{e^{z_{n,c}}}{\sum_{c'=1}^{C} e^{z_{n,c'}}}    (2)

of the logits z_{n,c}, which are the network outputs.
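The loss in (1) with the activation in (2) is a standard weighted cross-entropy over the logits. A minimal PyTorch sketch is given below; the function name `weighted_cce` and the convention of passing dense target distributions `t` (one-hot or soft) are assumptions of this illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def weighted_cce(logits, targets, class_weights):
    """Weighted categorical cross-entropy of Eq. (1).

    logits:        (N, C) raw network outputs z
    targets:       (N, C) target distributions t (one-hot or soft, e.g. 1/C)
    class_weights: (C,)   per-class weights w_c
    """
    log_y = F.log_softmax(logits, dim=1)              # log of the SoftMax in Eq. (2)
    per_sample = -(class_weights * targets * log_y).sum(dim=1)
    return per_sample.mean()                          # average over the batch (1/N sum)
```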
The three different training approaches differ with respect to the targets t_{n,c} and the weights w_c, and how negative samples are handled. The first approach is the plain SoftMax loss (S) that is trained only on samples from the K known classes. In this case, the number of classes C = K is equal to the number of known classes, and the targets are computed as one-hot encodings:

\forall n, c \in \{1, \ldots, C\}: \quad t_{n,c} = \begin{cases} 1 & c = \tau_n \\ 0 & \text{otherwise} \end{cases}    (3)

where 1 ≤ τ_n ≤ K is the label of sample n. For simplicity, we select the weights for each class to be identical: ∀c: w_c = 1, which is the default behavior when training deep learning models on ImageNet. By thresholding the maximum probability max_c y_{n,c}, cf. Sec. 3.3, this method can be turned into a simple out-of-distribution detection algorithm.

The second approach is often found in object detection models [5], which collect a lot of negative samples from the background of the training images. Similarly, this approach is used in other methods for open-set learning, such as G-OpenMax [8] or PROSER [30].⁴ In this Background (BG) approach, the negative data is added as an additional class, so that we have a total of C = K + 1 classes. Since the number of negative samples is usually higher than the number for known classes, we use class weights to balance them:

\forall c \in \{1, \ldots, C\}: \quad w_c = \frac{N}{C \, N_c}    (4)

where N_c is the number of training samples for class c. Finally, we use one-hot encoded targets t_{n,c} according to (3), including label τ_n = K + 1 for negative samples.

As the third method, we employ the Entropic Open-Set (EOS) loss [6], which is a simple extension of the SoftMax loss. Similar to our first approach, we have one output for each of the known classes: C = K. For known samples, we employ one-hot-encoded target values according to (3), whereas for negative samples we use identical target values:

\forall n, c \in \{1, \ldots, C\}: \quad t_{n,c} = \frac{1}{C}    (5)

Sticking to the implementation of Dhamija et al. [6], we select the class weights to be ∀c: w_c = 1 for all classes including the negative class, and leave the optimization of these values for future research.

⁴ While these methods try to sample better negatives for training, they rely on this additional class for unknown samples.
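To make the differences between the three settings concrete, the following sketch builds the targets t_{n,c} of (3) and (5) and the weights w_c of (4) for a batch of integer labels. The function `make_targets_and_weights` and the convention of marking negative samples with the label -1 are assumptions of this illustration (not the released code); the result can be plugged into the `weighted_cce` sketch above.

```python
import torch

def make_targets_and_weights(labels, K, approach, class_counts=None):
    """Build the targets of Eqs. (3)/(5) and the weights of Eq. (4).

    labels:       (N,) integers in [0, K-1] for known samples, -1 for negative samples
    K:            number of known classes
    approach:     "S", "BG" or "EOS"
    class_counts: (C,) number of training samples per class, required for "BG"
    """
    N = labels.shape[0]
    if approach == "S":                      # plain SoftMax: knowns only, one-hot targets (3)
        C = K
        targets = torch.zeros(N, C)
        targets[torch.arange(N), labels] = 1.0   # assumes no negatives in the batch
        weights = torch.ones(C)
    elif approach == "BG":                   # background class K for negatives, weights from (4)
        C = K + 1
        shifted = labels.clone()
        shifted[labels < 0] = K
        targets = torch.zeros(N, C)
        targets[torch.arange(N), shifted] = 1.0
        total = class_counts.sum().float()
        weights = total / (C * class_counts.float())
    else:                                    # "EOS": uniform 1/C targets (5) for negative samples
        C = K
        targets = torch.full((N, C), 1.0 / C)
        known = labels >= 0
        targets[known] = 0.0
        targets[known, labels[known]] = 1.0
        weights = torch.ones(C)
    return targets, weights
```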
3.3. Evaluation Metrics

Evaluation of open-set classification methods is a trickier business. First, we must differentiate between validation metrics to monitor the training process and testing methods for the final reporting. Second, we need to incorporate both types of algorithms, the ones that provide a separate probability for the unknown class and those that do not.

The final evaluation on the test set should differentiate between the behavior of known and unknown classes, and at the same time include the accuracy of the known classes. Many evaluation techniques proposed in the literature do not follow these requirements. For example, computing the area under the ROC curve (AUROC) will only consider the binary classification task – known or unknown – but does not tell us how well the classifier performs on the known classes. Another metric that is often applied is the macro-F1 metric [2] that balances precision and recall for a K+1-fold binary classification task. This metric has many properties that are counter-intuitive in the open-set classification task. First, a different threshold is computed for each of the classes, so it is possible that the same sample can be classified both as one or more known classes and as unknown. These thresholds are even optimized on the test set, and often only the maximum F1-value is reported. Second, the method requires defining a particular probability of being unknown, which is not provided by two of our three networks. Finally, the metric does not distinguish between known and unknown classes, but just treats all classes identically, although the consequences of classifying an unknown sample as known are different from those of misclassifying a known sample.

The evaluation metric that follows our intuition best is the Open-Set Classification Rate (OSCR), which handles known and unknown samples separately [6]. Based on a single probability threshold θ, we compute both the Correct Classification Rate (CCR) and the False Positive Rate (FPR):

CCR(\theta) = \frac{\left|\{x_n \mid \tau_n \le K \wedge \arg\max_{1 \le c \le K} y_{n,c} = \tau_n \wedge y_{n,c} > \theta\}\right|}{|N_K|}, \qquad FPR(\theta) = \frac{\left|\{x_n \mid \tau_n > K \wedge \max_{1 \le c \le K} y_{n,c} > \theta\}\right|}{|N_U|}    (6)

where N_K and N_U are the total numbers of known and unknown test samples, while τ_n ≤ K indicates a known sample and τ_n > K refers to an unknown test sample. By varying the threshold θ between 0 and 1, we can plot the OSCR curve [6]. A closer look at (6) reveals that the maximum is only taken over the known classes, purposefully leaving out the probability of the unknown class in the BG approach.⁵ Finally, this definition differs from [6] in that we use a > sign for both FPR and CCR when comparing to θ, which is critical when SoftMax probabilities of unknown samples reach 1 up to the numerical limits.

Note that the computation of the Correct Classification Rate – which is highly related to the classification accuracy on the known classes – has the issue that it might be biased when known classes have different amounts of samples. Since the number of test samples in our known classes is always balanced, our evaluation is not affected by this bias, so we leave the adaptation of that metric to unbalanced datasets as future work. Furthermore, the metric just averages over all samples, telling us nothing about the different behavior of different classes – it might be possible that one known class is always classified correctly while another class never is. For a better inspection of these cases, open-set adaptations of confusion matrices need to be developed in the future.

⁵ A low probability for the unknown class does not indicate a high probability for any of the known classes. Therefore, the unknown class probability does not add any useful information.

3.4. Validation Metrics

For the validation of SoftMax-based systems, often classification accuracy is used as the metric. In open-set classification, this is not sufficient since we need to balance between accuracy on the known classes and on the negative class. While using (weighted) accuracy might work well for the BG approach, networks trained with standard SoftMax and EOS do not provide a probability for the unknown class and, hence, accuracy cannot be applied for validation here. Instead, we want to make use of the SoftMax scores to evaluate our system. Since the final goal is to find a threshold θ such that the known samples are differentiated from unknown samples, we propose to compute the following confidence metric for validation:

\gamma^+ = \frac{1}{N_K} \sum_{n=1}^{N_K} y_{n,\tau_n}, \qquad \gamma^- = \frac{1}{N_N} \sum_{n=1}^{N_N} \left(1 - \max_{1 \le c \le K} y_{n,c} + \frac{1}{K}\,\delta_{C,K}\right), \qquad \gamma = \frac{\gamma^+ + \gamma^-}{2}    (7)

For known samples, γ+ simply averages the SoftMax score for the correct class, while for negative samples, γ− computes the average deviation from the minimum possible SoftMax score, which is 0 in case of the BG class (where C = K + 1), and 1/K if no additional background class is available (C = K). When summing over all known and negative samples, we can see how well our classifier has learned to separate known from negative samples. The maximum γ score is 1 when all known samples are classified as the correct class with probability 1, and all negative samples are classified as any known class with probability 0 or 1/K. When looking at γ+ and γ− individually, we can also detect if the training focuses on one part more than on the other.

4. Experiments

Considering that our goal is not to achieve the highest closed-set accuracy but to analyze the performance of open-set algorithms in our protocols, we use a ResNet-50 model [12] as it achieves low classification errors on ImageNet, trains rapidly, and is commonly used in the image classification task. We add one fully-connected layer with C nodes. For each protocol, we train models using the three loss functions SoftMax (S), SoftMax with Background class (BG) and Entropic Open-Set (EOS). Each network is trained for 120 epochs using the Adam optimizer with a learning rate of 10^{-3} and default beta values of 0.9 and 0.999. Additionally, we use standard data preprocessing, i.e., first, the smaller dimension of the training images is resized to 256 pixels, and then a random crop of 224 pixels is selected. Finally, we augment the data using a random horizontal flip with a probability of 0.5.

Fig. 2 shows OSCR curves for the three methods on our three protocols using logarithmic FPR axes – for linear FPR axes, please refer to the supplemental material. We plot the test set performance for both the negative and the unknown test samples. Hence, we can see how the methods work with unknown samples of classes⁶ that have or have not been seen during training. We can observe that all three classifiers in every protocol reach similar CCR values in the closed-set case (FPR = 1); in some cases, EOS or BG even outperform the baseline. This is good news since, oftentimes, open-set classifiers trade their open-set capabilities for reduced closed-set accuracy. In the supplemental material, we also provide a table with detailed CCR values for particular selected FPRs, and γ+ and γ− values computed on the test set.

Regarding the performance on negative samples of the test set, we can see that BG and EOS outperform the SoftMax (S) baseline, indicating that the classifiers learn to discard negatives. Generally, EOS seems to be better than BG in this task. Particularly, in P1 EOS reaches a high CCR at FPR = 10^{-2}, showing that the classifier can easily reject the negative samples, which is to be expected since negative samples are semantically and structurally far from the known classes.

When evaluating the unknown samples of the test set that belong to classes that have not been seen during training, BG and EOS classifiers drop performance, and compared to the gains on the validation set, the behavior is almost similar to the SoftMax baseline. Especially when looking into P2 and P3, training with the negative samples does not clearly improve the open-set classifiers.

⁶ Remember that the known and negative sets are split into training and test samples so that we never evaluate with samples seen during training.
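For reference, the training configuration described at the beginning of this section can be written down with standard PyTorch/torchvision components as in the following sketch. It is not the released training code: loading of the protocol files is omitted, and replacing the final fully-connected layer of the ResNet-50 is only one way to realize the C-node output layer.

```python
import torch
from torchvision import models, transforms

def build_model_and_optimizer(C):
    """ResNet-50 with a C-way output layer and the Adam settings from Sec. 4."""
    model = models.resnet50()
    model.fc = torch.nn.Linear(model.fc.in_features, C)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
    return model, optimizer

# Preprocessing / augmentation as described: resize the smaller edge to 256,
# take a random 224x224 crop, and flip horizontally with probability 0.5.
train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])
```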
Figure 2: Open-Set Classification Rate Curves. OSCR curves are shown for test data of each protocol. The
top row is calculated using negative test samples, while the bottom row uses unknown test samples. Curves that do not
extend to low FPR values indicate that the threshold in (6) is maximized at θ = 1.
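The quantities plotted here can be computed directly from the SoftMax scores according to (6) and (7). The following NumPy sketch is an illustration under two assumptions that are not prescribed by the paper: negative/unknown samples carry the label -1, and the first K columns of the score matrix correspond to the known classes (the BG approach has one extra column).

```python
import numpy as np

def oscr_point(scores, labels, theta, K):
    """CCR and FPR of Eq. (6) at a single threshold theta.

    scores: (N, C) SoftMax probabilities; the first K columns are the known classes
    labels: (N,) true labels in [0, K-1] for known samples, -1 for negative/unknown samples
    """
    known = labels >= 0
    kscores = scores[:, :K]                        # the max in (6) only runs over known classes
    max_score = kscores.max(axis=1)
    prediction = kscores.argmax(axis=1)
    ccr = np.mean((prediction[known] == labels[known]) & (max_score[known] > theta))
    fpr = np.mean(max_score[~known] > theta)
    return ccr, fpr

def confidence_metric(scores, labels, K):
    """Joint validation confidence gamma of Eq. (7) over known and negative samples."""
    C = scores.shape[1]
    known = labels >= 0
    gamma_plus = np.mean(scores[known, labels[known]])
    delta = 1.0 / K if C == K else 0.0             # (1/K) * Kronecker delta_{C,K}
    gamma_minus = np.mean(1.0 - scores[~known, :K].max(axis=1) + delta)
    return 0.5 * (gamma_plus + gamma_minus)
```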
However, in P1 EOS still outperforms S and BG for higher FPR, indicating that the classifier learned to discard unknown samples up to some degree. This shows that the easy task of rejecting samples very far from the known classes can benefit from EOS training with negative samples, i.e., this open-set method is good for out-of-distribution detection, but not for the more general task of open-set classification.

5. Discussion

After we have seen that the methods perform well on negative and not so well on unknown data, let us analyze the results. First, we show how our novel validation metrics can be used to identify gaps and inconsistencies during training of the open-set classifiers BG and EOS. Fig. 3 shows the confidence progress across the training epochs. During the first epochs, the confidence of the known samples (γ+, left in Fig. 3) is low since the SoftMax activations produce low values for all classes. As the training progresses, the models learn to classify known samples, increasing the correct SoftMax activation of the target class. Similarly, because of low activation values, the confidence of negative samples (γ−, right) is close to 1 at the beginning of the training. Note that EOS keeps the low activations during training, learning to respond only to known classes, particularly in P1, where values are close to 1 during all epochs. On the other hand, BG provides lower confidences for negative samples (γ−). This indicates that the class balancing technique in (4) might have been too drastic and that higher weights for negative samples might improve results of the BG method. Similarly, employing lower weights for the EOS classifier might improve the confidence scores for known samples at the cost of lower confidence for negatives. Finally, from an open-set perspective, our confidence metric provides insightful information about the model training; so far, we have used it to explain the model performance, but together with more parameter tuning, the joint γ metric can be used as a criterion for early stopping, as shown in the supplemental material.

Figure 3: Confidence Propagation. Confidence values according to (7) are shown across training epochs of S, BG and EOS classifiers. On the left, we show the confidence of the known samples (γ+), while on the right the confidence of negative samples (γ−) is displayed.

We also analyze the SoftMax scores according to (2) of known and unknown classes in every protocol. For samples from known classes we use the SoftMax score of the correct class, while for unknown samples we take the maximum SoftMax score of any known class. This enables us to make a direct comparison between different approaches in Fig. 4. When looking at the score distributions of the known samples, we can see that many samples are correctly classified with a high probability, while numerous samples provide almost 0 probability for the correct class. This indicates that a more detailed analysis, possibly via a confusion matrix, is required to further look into the details of these errors, but this goes beyond the aim of this paper.

Figure 4: Histograms of SoftMax Scores. We evaluate SoftMax probability scores for all three methods and all three protocols. For known samples, we present histograms of the SoftMax score of the target class. For unknown samples, we plot the maximum SoftMax score of any known class. For S and EOS, the minimum possible value of the latter is 1/K, which explains the gaps on the left-hand side.

More interestingly, the distributions of scores for unknown classes differ dramatically between approaches and protocols. For P1, EOS is able to suppress high scores almost completely, whereas both S and BG still have the majority of the cases providing high probabilities of belonging to a known class. For P2 and, particularly, P3 a lot of unknown samples get classified as a known class with very high probability, throughout the evaluated methods. Interestingly, the plain SoftMax (S) method has relatively high probability scores for unknown samples, especially in P2 and P3 where known and unknown classes are semantically similar.

6. Conclusion

In this work, we propose three novel evaluation protocols for open-set image classification that rely on the ILSVRC 2012 dataset and allow an extensive evaluation of open-set algorithms. The data is entirely composed of natural images and designed to have various levels of similarity between its partitions. Additionally, we carefully select the WordNet parent classes that allow us to include a larger number of known, negative and unknown classes. In contrast to previous work, the class partitions are carefully designed, and we move away from implementing mixes of several datasets (where rejecting unknown samples could be relatively easy) and the random selection of known and unknown classes inside a dataset. This allows us to differentiate between methods that work well in out-of-distribution detection, and those that really perform open-set classification. A more detailed comparison of the protocols is provided in the supplemental material.

We evaluate the performance of three classifiers in every protocol using OSCR curves and our proposed confidence validation metric. Our experiments show that the two open-set algorithms can reject negative samples, where samples of the same classes have been seen during training, but face a performance degradation in the presence of unknown data from previously unseen classes. For a more straightforward scenario such as P1, it is advantageous to use negative samples during EOS training. While this result agrees with [6], the performance of BG and EOS in P2 and P3 shows that these methods are not ready to be employed in the real world, and more parameter tuning is required to improve performance. Furthermore, making better use of or augmenting the negative classes also poses a challenge for further research in open-set methods.

Providing different conclusions for the three different protocols reflects the need for evaluation methods in scenarios designed with different difficulty levels, which we provide within this paper. Looking ahead, with the novel open-set classification protocols on ImageNet we aim to establish comparable and standard evaluation methods for open-set algorithms in scenarios closer to the real world. Instead of using low-resolution images and randomly selected samples from CIFAR, MNIST or SVHN, we expect that our open-set protocols will establish benchmarks and promote reproducible research in open-set classification. In future work, we will investigate and optimize more different open-set algorithms and report their performances on our protocols.

References

[1] Abhijit Bendale and Terrance Boult. Towards open world recognition. In Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2015.
[2] Abhijit Bendale and Terrance E. Boult. Towards open set deep networks. In Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016.
[3] Annesha Bhoumik. Open-set classification on ImageNet. Master's thesis, University of Zurich, 2021.
[4] Paul Bodesheim, Alexander Freytag, Erik Rodner, and Joachim Denzler. Local novelty detection in multi-class recognition problems. In Winter Conference on Applications of Computer Vision (WACV), 2015.
[5] Akshay Dhamija, Manuel Günther, Jonathan Ventura, and Terrance E. Boult. The overlooked elephant of object detection: Open set. In Winter Conference on Applications of Computer Vision (WACV), 2020.
[6] Akshay Raj Dhamija, Manuel Günther, and Terrance E. Boult. Reducing network agnostophobia. In Advances in Neural Information Processing Systems (NeurIPS), 2018.
[7] Logan Engstrom, Andrew Ilyas, Shibani Santurkar, and Dimitris Tsipras. Robustness (Python library), 2019.
[8] Zongyuan Ge, Sergey Demyanov, and Rahil Garnavi. Generative OpenMax for multi-class open set classification. In British Machine Vision Conference (BMVC), 2017.
[9] Chuanxing Geng, Sheng-Jun Huang, and Songcan Chen. Recent advances in open set recognition: A survey. Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 43(10):3614–3631, 2021.
[10] Izhak Golan and Ran El-Yaniv. Deep anomaly detection using geometric transformations. In Advances in Neural Information Processing Systems (NeurIPS), 2018.
[11] Patrick Grother and Kayee Hanaoka. NIST special database 19 handprinted forms and characters 2nd edition. Technical report, National Institute of Standards and Technology (NIST), 2016.
[12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[13] Heinrich Jiang, Been Kim, Melody Guan, and Maya Gupta. To trust or not to trust a classifier. In Advances in Neural Information Processing Systems (NeurIPS), 2018.
[14] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
[15] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems (NIPS), 2017.
[16] Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. The MNIST database of handwritten digits, 1998.
[17] Ofer Matan, R. K. Kiang, C. E. Stenard, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel, and Yann Le Cun. Handwritten character recognition using neural network architectures. In USPS Advanced Technology Conference, 1990.
[18] Dimity Miller, Lachlan Nicholson, Feras Dayoub, and Niko Sünderhauf. Dropout sampling for robust object detection in open-set conditions. In International Conference on Robotics and Automation (ICRA). IEEE, 2018.
[19] George A. Miller. WordNet: An electronic lexical database. MIT Press, 1998.
[20] Lawrence Neal, Matthew Olson, Xiaoli Fern, Weng-Keen Wong, and Fuxin Li. Open set learning with counterfactual images. In European Conference on Computer Vision (ECCV), 2018.
[21] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In Advances in Neural Information Processing Systems (NIPS) Workshop, 2011.
[22] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (ICML), 2021.
[23] Jie Ren, Peter J. Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark Depristo, Joshua Dillon, and Balaji Lakshminarayanan. Likelihood ratios for out-of-distribution detection. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
[24] Ryne Roady, Tyler L. Hayes, Ronald Kemker, Ayesha Gonzales, and Christopher Kanan. Are open set classification methods effective on large-scale datasets? PLOS ONE, 15(9), 2020.
[25] Ethan M. Rudd, Lalit P. Jain, Walter J. Scheirer, and Terrance E. Boult. The extreme value machine. Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2017.
[26] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3), 2015.
[27] Walter J. Scheirer, Anderson de Rezende Rocha, Archana Sapkota, and Terrance E. Boult. Towards open set recognition. Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 35(7), 2013.
[28] Sagar Vaze, Kai Han, Andrea Vedaldi, and Andrew Zisserman. Open-set recognition: A good closed-set classifier is all you need? In International Conference on Learning Representations (ICLR), 2022.
[29] Yang Yu, Wei-Yang Qu, Nan Li, and Zimin Guo. Open-category classification by adversarial sample generation. In International Joint Conference on Artificial Intelligence (IJCAI), 2017.
[30] Da-Wei Zhou, Han-Jia Ye, and De-Chuan Zhan. Learning placeholders for open-set recognition. In Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
Figure 5: Linear OSCR Curves. OSCR curves in linear scale for negative data (top row) and unknown data (bottom row) of the test set for each protocol.
Supplemental Material
7. Linear OSCR Curves
Depending on the application, different false positive rates are of importance. In the main paper, we focus more on the low FPR regions since these are of more importance when a human needs to look into the positive cases and, hence, large numbers of false positives involve labor cost. When false positives are not of utmost importance, e.g., when they are handled by an automated system in low-security applications, the focus will be more on having a high correct classification rate. Therefore, in Fig. 5 we provide the OSCR plots according to Fig. 2 of the main paper, but this time with a linear FPR axis. These plots highlight even more that the Entropic Open-Set (EOS) and the SoftMax with Background class (BG) approaches fail when known and unknown classes are very similar, as provided in P2 and P3.
8. Results Sorted by Protocol
Fig. 6 displays the same results as shown in Figure 2 of the main paper, but here we compare the results of
the different protocols. This shows that the definition of our protocols follows our intuition. We can see that
protocol P1 is easy in the open-set case, but difficult in closed-set operation, especially when looking at the performance of BG and EOS, which are designed for open-set classification. On the contrary, protocol P3 achieves higher closed-set recognition performance (at FPR = 1) while dropping much faster for lower FPR values. Protocol P2 is difficult both in closed- and open-set evaluation, but its open-set performance does not drop as drastically as in P3, especially considering the unknown test data in the second row. Due to its relatively small size, P2 is optimal for hyperparameter optimization, and hyperparameters should transfer to the other two protocols – unless the algorithm is particularly sensitive to its hyperparameters. We leave hyperparameter optimization to future work.
9. Detailed Results
In Tab. 2 we include detailed numbers of the values seen in figures 2 and 3 of the main paper. We also include
the cases where we use our confidence metric to select the best epoch of training.
Figure 6: Comparison of Protocols. OSCR curves are split by algorithm to allow a comparison across protocols. Results are shown for negative data (top row) and unknown data (bottom row).

When comparing Tab. 2(a) with Tab. 2(b), it can be seen that selecting the best epoch based on the confidence score provides better CCR values for BG in protocols P1 and P2, and better or comparable numbers for EOS across all protocols. The advantage for S is not as pronounced, since S does not take negative samples into consideration and, thus, the confidence for unknowns γ− gets very low in Tab. 2(a). When using our validation metric, training stops early and does not provide a good overall, i.e., closed-set, performance. Therefore, we propose to use our novel evaluation metric for training open-set classifiers, but not for closed-set classifiers such as S.
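As a sketch of how the joint γ metric can drive such epoch selection, one can simply track the best validation γ across training epochs. The function names and the callables `train_one_epoch` and `validate` below are placeholders assumed for illustration; they are not part of the released code.

```python
def select_best_epoch(train_one_epoch, validate, num_epochs=120):
    """Track the epoch with the highest validation gamma (Eq. 7) for model selection."""
    best_gamma, best_epoch = -1.0, None
    for epoch in range(1, num_epochs + 1):
        train_one_epoch(epoch)
        gamma = validate(epoch)                 # returns the joint gamma on the validation set
        if gamma > best_gamma:
            best_gamma, best_epoch = gamma, epoch   # e.g., also save a checkpoint here
    return best_epoch, best_gamma
```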
10. Detailed List of Classes
For reproducibility, we also provide the list of classes (ImageNet identifier and class name) that we used in our
three protocols in Tables 3, 4 and 5. For known and negative classes, we used the samples of the original training
partition for training and validation, and the samples from the validation partition for testing – since the test set
labels of ILSVRC2012 are still not publicly available.

Table 2: Classification Results. Correct Classification Rates (CCR) are provided at some False Positive Rates (FPR) in the test set. The best CCR values are in bold. γ+ is calculated using known and γ− using unknown classes.
(a) Using Last Epoch

            Epoch   γ+      γ−      CCR at FPR of:
                                    1e-3    1e-2    1e-1    1
  P1 - S    120     0.665   0.470   0.077   0.326   0.549   0.678
  P1 - BG   120     0.650   0.520   0.146   0.342   0.519   0.663
  P1 - EOS  120     0.663   0.846   0.195   0.409   0.667   0.684
  P2 - S    120     0.665   0.341   —       0.189   0.443   0.683
  P2 - BG   120     0.598   0.480   —       0.111   0.339   0.616
  P2 - EOS  120     0.592   0.708   —       0.233   0.476   0.664
  P3 - S    120     0.767   0.240   —       —       0.540   0.777
  P3 - BG   120     0.753   0.344   —       0.241   0.543   0.764
  P3 - EOS  120     0.684   0.687   —       0.298   0.568   0.763

(b) Using Best Epoch

            Epoch   γ+      γ−      CCR at FPR of:
                                    1e-3    1e-2    1e-1    1
  P1 - S    20      0.584   0.906   0.270   0.543   0.664   0.667
  P1 - BG   99      0.660   0.628   0.167   0.420   0.572   0.673
  P1 - EOS  105     0.672   0.877   0.283   0.535   0.683   0.694
  P2 - S    39      0.603   0.528   0.031   0.134   0.384   0.659
  P2 - BG   113     0.638   0.491   —       0.156   0.368   0.658
  P2 - EOS  101     0.599   0.694   —       0.223   0.473   0.666
  P3 - S    15      0.677   0.527   0.083   0.239   0.509   0.745
  P3 - BG   115     0.747   0.379   —       0.243   0.539   0.761
  P3 - EOS  114     0.696   0.688   —       0.266   0.585   0.772

Table 3: Protocol 1 Classes: List of super-classes and leaf nodes of Protocol P1.
Known Negative Unknown
Id Name Id Name Id Name
n02084071 dog n02118333 fox n07555863 food
n02085620 Chihuahua n02119022 red fox n07684084 French loaf
n02085782 Japanese spaniel n02119789 kit fox n07693725 bagel
n02085936 Maltese dog n02120079 Arctic fox n07695742 pretzel
n02086079 Pekinese n02120505 grey fox n07714571 head cabbage
n02086240 Shih-Tzu n02115335 wild dog n07714990 broccoli
n02086646 Blenheim spaniel n02115641 dingo n07715103 cauliflower
n02086910 papillon n02115913 dhole n07716358 zucchini
n02087046 toy terrier n02116738 African hunting dog n07716906 spaghetti squash
n02087394 Rhodesian ridgeback n02114100 wolf n07717410 acorn squash
n02088094 Afghan hound n02114367 timber wolf n07717556 butternut squash
n02088238 basset n02114548 white wolf n07718472 cucumber
n02088364 beagle n02114712 red wolf n07718747 artichoke
n02088466 bloodhound n02114855 coyote n07720875 bell pepper
n02088632 bluetick n02120997 feline n07730033 cardoon
n02089078 black-and-tan coonho n02125311 cougar n07734744 mushroom
n02089867 Walker hound n02127052 lynx n07745940 strawberry
n02089973 English foxhound n02128385 leopard n07747607 orange
n02090379 redbone n02128757 snow leopard n07749582 lemon
n02090622 borzoi n02128925 jaguar n07753113 fig
n02090721 Irish wolfhound n02129165 lion n07753275 pineapple
n02091244 Ibizan hound n02129604 tiger n07753592 banana
n02091467 Norwegian elkhound n02130308 cheetah n07754684 jackfruit
n02091635 otterhound n02131653 bear n07760859 custard apple
n02091831 Saluki n02132136 brown bear n07768694 pomegranate
n02092002 Scottish deerhound n02133161 American black bear n03791235 motor vehicle
n02092339 Weimaraner n02134084 ice bear n02701002 ambulance
n02093256 Staffordshire bullte n02134418 sloth bear n02704792 amphibian
n02093428 American Staffordshi n02441326 musteline mammal n02814533 beach wagon
n02093647 Bedlington terrier n02441942 weasel n02930766 cab
n02093754 Border terrier n02442845 mink n03100240 convertible
n02093859 Kerry blue terrier n02443114 polecat n03345487 fire engine
n02093991 Irish terrier n02443484 black-footed ferret n03417042 garbage truck
n02094114 Norfolk terrier n02444819 otter n03444034 go-kart
n02094258 Norwich terrier n02445715 skunk n03594945 jeep
n02094433 Yorkshire terrier n02447366 badger n03670208 limousine
n02095314 wire-haired fox terr n02370806 ungulate n03770679 minivan
n02095570 Lakeland terrier n02389026 sorrel n03777568 Model T
n02095889 Sealyham terrier n02391049 zebra n03785016 moped
n02096051 Airedale n02395406 hog n03796401 moving van
n02096177 cairn n02396427 wild boar n03930630 pickup
n02096294 Australian terrier n02397096 warthog n03977966 police van
n02096437 Dandie Dinmont n02398521 hippopotamus n04037443 racer
n02096585 Boston bull n02403003 ox n04252225 snowplow
n02097047 miniature schnauzer n02408429 water buffalo n04285008 sports car
n02097130 giant schnauzer n02410509 bison n04461696 tow truck
n02097209 standard schnauzer n02412080 ram n04467665 trailer truck
n02097298 Scotch terrier n02415577 bighorn n03183080 device
n02097474 Tibetan terrier n02417914 ibex n02666196 abacus
n02097658 silky terrier n02422106 hartebeest n02672831 accordion
n02098105 soft-coated wheaten n02422699 impala n02676566 acoustic guitar
n02098286 West Highland white n02423022 gazelle n02708093 analog clock
n02098413 Lhasa n02437312 Arabian camel n02749479 assault rifle
n02099267 flat-coated retrieve n02437616 llama n02787622 banjo
n02099429 curly-coated retriev n02469914 primate n02794156 barometer
n02099601 golden retriever n02480495 orangutan n02804610 bassoon
n02099712 Labrador retriever n02480855 gorilla n02841315 binoculars
n02099849 Chesapeake Bay retri n02481823 chimpanzee n02879718 bow
n02100236 German short-haired n02483362 gibbon n02910353 buckle
n02100583 vizsla n02483708 siamang n02948072 candle
n02100735 English setter n02484975 guenon n02950826 cannon
n02100877 Irish setter n02486261 patas n02965783 car mirror
n02101006 Gordon setter n02486410 baboon n02966193 carousel
n02101388 Brittany spaniel n02487347 macaque n02974003 car wheel
n02101556 clumber n02488291 langur n02977058 cash machine
n02102040 English springer n02488702 colobus n02992211 cello
n02102177 Welsh springer spani n02489166 proboscis monkey n03000684 chain saw
n02102318 cocker spaniel n02490219 marmoset n03017168 chime
n02102480 Sussex spaniel n02492035 capuchin n03075370 combination lock
n02102973 Irish water spaniel n02492660 howler monkey n03110669 cornet
n02104029 kuvasz n02493509 titi n03126707 crane
n02104365 schipperke n02493793 spider monkey n03180011 desktop computer
n02105056 groenendael n02494079 squirrel monkey n03196217 digital clock
n02105162 malinois n02497673 Madagascar cat n03197337 digital watch
n02105251 briard n02500267 indri n03208938 disk brake
n02105412 kelpie n03249569 drum
n02105505 komondor n03271574 electric fan
n02105641 Old English sheepdog n03272010 electric guitar
n02105855 Shetland sheepdog n03372029 flute
n02106030 collie n03394916 French horn
n02106166 Border collie n03425413 gas pump
n02106382 Bouvier des Flandres n03447721 gong
n02106550 Rottweiler n03452741 grand piano
n02106662 German shepherd n03467068 guillotine
n02107142 Doberman n03476684 hair slide
n02107312 miniature pinscher n03485407 hand-held computer
n02107574 Greater Swiss Mounta n03492542 hard disc
n02107683 Bernese mountain dog n03494278 harmonica
n02107908 Appenzeller n03495258 harp
n02108000 EntleBucher n03496892 harvester
n02108089 boxer n03532672 hook
n02108422 bull mastiff n03544143 hourglass
n02108551 Tibetan mastiff n03590841 jack-o’-lantern
n02108915 French bulldog n03627232 knot
n02109047 Great Dane n03642806 laptop
n02109525 Saint Bernard n03666591 lighter
n02109961 Eskimo dog n03691459 loudspeaker
n02110063 malamute n03692522 loupe
n02110185 Siberian husky n03706229 magnetic compass
n02110341 dalmatian n03720891 maraca
n02110627 affenpinscher n03721384 marimba
n02110806 basenji n03733131 maypole
n02110958 pug n03759954 microphone
n02111129 Leonberg n03773504 missile
n02111277 Newfoundland n03793489 mouse
n02111500 Great Pyrenees n03794056 mousetrap
n02111889 Samoyed n03803284 muzzle
n02112018 Pomeranian n03804744 nail
n02112137 chow n03814639 neck brace
n02112350 keeshond n03832673 notebook
n02112706 Brabancon griffon n03838899 oboe
n02113023 Pembroke n03840681 ocarina
n02113186 Cardigan n03841143 odometer
n02113624 toy poodle n03843555 oil filter
n02113712 miniature poodle n03854065 organ
n02113799 standard poodle n03868863 oxygen mask
n02113978 Mexican hairless n03874293 paddlewheel
n03874599 padlock
n03884397 panpipe
n03891332 parking meter
n03929660 pick
n03933933 pier
n03944341 pinwheel
n03992509 potter’s wheel
n03995372 power drill
n04008634 projectile
n04009552 projector
n04040759 radiator
n04044716 radio telescope
n04067472 reel
n04074963 remote control
n04086273 revolver
n04090263 rifle
n04118776 rule
n04127249 safety pin
n04141076 sax
n04141975 scale
n04152593 screen
n04153751 screw
n04228054 ski
n04238763 slide rule
n04243546 slot
n04251144 snorkel
n04258138 solar dish
n04265275 space heater
n04275548 spider web
n04286575 spotlight
n04311174 steel drum
n04317175 stethoscope
n04328186 stopwatch
n04330267 stove
n04332243 strainer
n04355338 sundial
n04355933 sunglass
n04356056 sunglasses
n04372370 switch
n04376876 syringe
n04428191 thresher
n04456115 torch
n04485082 tripod
n04487394 trombone
n04505470 typewriter keyboard
n04515003 upright
n04525305 vending machine
n04536866 violin
n04548280 wall clock
n04579432 whistle
n04592741 wing
n06359193 web site
End of table Protocol 1

Table 4: Protocol 2 Classes: List of super-classes and leaf nodes of Protocol P2.
Known Negative Unknown
Id Name Id Name Id Name
n02087122 hunting dog n02087122 hunting dog n02085374 toy dog
n02087394 Rhodesian ridgeback n02096051 Airedale n02085620 Chihuahua
n02088094 Afghan hound n02096177 cairn n02085782 Japanese spaniel
n02088238 basset n02096294 Australian terrier n02085936 Maltese dog
n02088364 beagle n02096437 Dandie Dinmont n02086079 Pekinese
n02088466 bloodhound n02096585 Boston bull n02086240 Shih-Tzu
n02088632 bluetick n02097047 miniature schnauzer n02086646 Blenheim spaniel
n02089078 black-and-tan coonho n02097130 giant schnauzer n02086910 papillon
n02089867 Walker hound n02097209 standard schnauzer n02087046 toy terrier
n02089973 English foxhound n02097298 Scotch terrier n02118333 fox
n02090379 redbone n02097474 Tibetan terrier n02119022 red fox
n02090622 borzoi n02097658 silky terrier n02119789 kit fox
n02090721 Irish wolfhound n02098105 soft-coated wheaten n02120079 Arctic fox
n02091244 Ibizan hound n02098286 West Highland white n02120505 grey fox
n02091467 Norwegian elkhound n02098413 Lhasa n02115335 wild dog
n02091635 otterhound n02099267 flat-coated retrieve n02115641 dingo
n02091831 Saluki n02099429 curly-coated retriev n02115913 dhole
n02092002 Scottish deerhound n02099601 golden retriever n02116738 African hunting dog
n02092339 Weimaraner n02099712 Labrador retriever n02114100 wolf
n02093256 Staffordshire bullte n02099849 Chesapeake Bay retri n02114367 timber wolf
n02093428 American Staffordshi n02100236 German short-haired n02114548 white wolf
n02093647 Bedlington terrier n02100583 vizsla n02114712 red wolf
n02093754 Border terrier n02100735 English setter n02114855 coyote
n02093859 Kerry blue terrier n02100877 Irish setter n02120997 feline
n02093991 Irish terrier n02101006 Gordon setter n02125311 cougar
n02094114 Norfolk terrier n02101388 Brittany spaniel n02127052 lynx
n02094258 Norwich terrier n02101556 clumber n02128385 leopard
n02094433 Yorkshire terrier n02102040 English springer n02128757 snow leopard
n02095314 wire-haired fox terr n02102177 Welsh springer spani n02128925 jaguar
n02095570 Lakeland terrier n02102318 cocker spaniel n02129165 lion
n02095889 Sealyham terrier n02102480 Sussex spaniel n02129604 tiger
n02102973 Irish water spaniel n02130308 cheetah
n02131653 bear
n02132136 brown bear
n02133161 American black bear
n02134084 ice bear
n02134418 sloth bear
n02441326 musteline mammal
n02441942 weasel
n02442845 mink
n02443114 polecat
n02443484 black-footed ferret
n02444819 otter
n02445715 skunk
n02447366 badger
n02370806 ungulate
n02389026 sorrel
n02391049 zebra
n02395406 hog
n02396427 wild boar
n02397096 warthog
n02398521 hippopotamus
n02403003 ox
n02408429 water buffalo
n02410509 bison
n02412080 ram
n02415577 bighorn
n02417914 ibex
n02422106 hartebeest
n02422699 impala
n02423022 gazelle
n02437312 Arabian camel
n02437616 llama
End of table Protocol 2