PointGuard: Provably Robust 3D Point Cloud Classification

                                                                            Hongbin Liu*        Jinyuan Jia*    Neil Zhenqiang Gong
                                                                                                  Duke University
                                                                               {hongbin.liu, jinyuan.jia, neil.gong}@duke.edu
                                                                               * The first two authors made equal contributions.

                                                                       Abstract

3D point cloud classification has many safety-critical applications such as autonomous driving and robotic grasping. However, several studies showed that it is vulnerable to adversarial attacks. In particular, an attacker can make a classifier predict an incorrect label for a 3D point cloud via carefully modifying, adding, and/or deleting a small number of its points. Randomized smoothing is a state-of-the-art technique to build certifiably robust 2D image classifiers. However, when applied to 3D point cloud classification, randomized smoothing can only certify robustness against adversarially modified points.

In this work, we propose PointGuard, the first defense that has provable robustness guarantees against adversarially modified, added, and/or deleted points. Specifically, given a 3D point cloud and an arbitrary point cloud classifier, our PointGuard first creates multiple subsampled point clouds, each of which contains a random subset of the points in the original point cloud; then our PointGuard predicts the label of the original point cloud as the majority vote among the labels of the subsampled point clouds predicted by the point cloud classifier. Our first major theoretical contribution is that we show PointGuard provably predicts the same label for a 3D point cloud when the number of adversarially modified, added, and/or deleted points is bounded. Our second major theoretical contribution is that we prove the tightness of our derived bound when no assumptions on the point cloud classifier are made. Moreover, we design an efficient algorithm to compute our certified robustness guarantees. We also empirically evaluate PointGuard on the ModelNet40 and ScanNet benchmark datasets.

Figure 1: A point cloud with label "plane" that is correctly predicted as "plane" before attack. Top: point modification attack (modify 10 points; prediction: radio). Middle: point addition attack (add 10 points; prediction: radio). Bottom: point deletion attack (delete 274 points; prediction: tent). Red points are added and yellow points are deleted.

1. Introduction

3D point cloud, which comprises a set of 3D points, is a crucial data structure in modelling a 3D shape or object. In recent years, we have witnessed an increasing interest in 3D point cloud classification [23, 17, 24, 30] because it has many applications, such as robotic grasping [28], autonomous driving [2, 36], etc. However, multiple recent studies [34, 31, 38, 39, 35, 20] showed that 3D point cloud classifiers are vulnerable to adversarial attacks. In particular, given a 3D point cloud, an attacker can carefully modify, add, and/or delete a small number of points such that a 3D point cloud classifier predicts an incorrect label for it. We can categorize these attacks into four types based on the capability of an attacker: point modification, addition, deletion, and perturbation attacks. In a point modification/addition/deletion attack [34, 31, 38, 35], an attacker can only modify/add/delete points in a 3D point cloud. In a point perturbation attack, an attacker can apply any combination of the three operations, i.e., modification, addition, and deletion, to a 3D point cloud. Figure 1 illustrates the point modification, addition, and deletion attacks. These adversarial attacks pose severe security concerns to point cloud classification in safety-critical applications.

Several empirical defenses [18, 40, 35, 6] have been proposed to mitigate the attacks. Roughly speaking, these defenses aim to detect the attacks or train more robust point cloud classifiers. For instance, Zhou et al. [40] proposed a defense called DUP-Net, whose key step is to detect outlier points and discard them before classifying a point cloud.
These defenses, however, lack provable robustness guarantees and are often broken by more advanced attacks. For instance, Ma et al. [20] proposed a joint gradient based attack and showed that it can achieve high attack success rates even if DUP-Net [40] is deployed.

Therefore, it is urgent to study certified defenses that have provable robustness guarantees. We say a point cloud classifier is provably robust if it certifiably predicts the same label for a point cloud when the number of modified, added, and/or deleted points is no larger than a threshold. Randomized smoothing [4] is a state-of-the-art technique for building certifiably robust 2D image classifiers.

Our derived certified perturbation size for a point cloud is the solution to an optimization problem, which relies on the point cloud's label probabilities. However, it is challenging to compute the exact label probabilities in practice since doing so requires predicting the labels for an exponential number of subsampled point clouds. In particular, computing the exact label probabilities for a point cloud requires predicting the labels for \binom{n}{k} subsampled point clouds if the point cloud contains n points. To address the challenge, we develop a Monte-Carlo algorithm to estimate the lower and upper bounds of the label probabilities with probabilistic guarantees via predicting labels for N subsampled point clouds.
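To get a sense of the scale, the following snippet (an illustration we add here, using the subsampling size k = 16 and the 2,048-point ModelNet40 clouds from Section 4.1) counts the subsampled point clouds an exact computation would have to classify:

```python
# Illustration only: the number of distinct k-point subsamples of an n-point
# cloud, i.e., how many classifier queries an exact computation would need.
from math import comb

n, k = 2048, 16                 # defaults used later in the paper (Section 4.1)
print(f"{comb(n, k):.3e}")      # on the order of 1e+39 subsampled point clouds
```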
2. Background and Related Work

3D point cloud classification: A 3D point cloud is an unordered set of points sampled from the surface of a 3D object or shape. We use T = {O_i | i = 1, 2, ..., n} to denote a 3D point cloud, where each point O_i is a vector that contains the (x, y, z) coordinates and possibly some other features, e.g., colours. In 3D point cloud classification, given a point cloud T as input, a classifier f predicts a label y ∈ {1, 2, ..., c} for it. For instance, the label could represent the type of 3D object from which the point cloud T is sampled. Formally, we have y = f(T). Many deep learning classifiers (e.g., [23, 24, 17, 30]) have been proposed for 3D point cloud classification. For instance, Qi et al. [23] proposed PointNet, which can directly consume 3D point clouds. Roughly speaking, PointNet first applies input and feature transformations to the input points, and then aggregates point features by max pooling. One important characteristic of PointNet is permutation invariance: given a 3D point cloud, the predicted label does not rely on the order of the points in the point cloud.
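To make the permutation-invariance property concrete, the following minimal sketch (ours, added for exposition; it is not the PointNet implementation and omits the input and feature transformation networks) applies a shared per-point MLP and aggregates by max pooling, so the prediction cannot depend on point order:

```python
# Minimal permutation-invariant classifier: shared per-point MLP + max pooling.
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    def __init__(self, in_dim=3, num_classes=40):
        super().__init__()
        # The same MLP is applied to every point independently.
        self.point_mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        self.head = nn.Linear(256, num_classes)

    def forward(self, points):             # points: (batch, n, in_dim)
        feats = self.point_mlp(points)     # (batch, n, 256) per-point features
        global_feat, _ = feats.max(dim=1)  # max over points -> order independent
        return self.head(global_feat)      # (batch, num_classes) logits

# Reordering the points does not change the prediction:
model = TinyPointNet()
x = torch.randn(1, 1024, 3)
perm = torch.randperm(1024)
assert torch.allclose(model(x), model(x[:, perm, :]))
```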
Adversarial attacks to 3D point cloud classification: Multiple recent works [34, 31, 38, 35, 27, 8, 37, 39, 12] showed that 3D point cloud classification is vulnerable to (physically feasible) adversarial attacks. Roughly speaking, given a 3D point cloud, these attacks aim to make a 3D point cloud classifier misclassify it via carefully modifying, adding, and/or deleting some points from it. Xiang et al. [34] proposed point perturbation and addition attacks. For instance, they showed that PointNet [23] can be fooled by adding a limited number of synthesized point clusters with meaningful shapes such as balls to a point cloud. Yang et al. [35] explored point modification, addition, and deletion attacks. In particular, their point modification attack is inspired by gradient-guided attack methods, which were designed to attack 2D image classification. Their point addition and deletion attacks aim to add or remove the critical points, which can be identified by their label-dependent importance scores obtained by computing the gradient of a classifier's output with respect to the input. Wicker et al. [31] proposed a point deletion attack which also leveraged critical points. Specifically, they developed an algorithm to identify critical points in a random and iterative manner. Ma et al. [20] proposed a joint gradient based attack and showed that the proposed attack can break an empirical defense [40] on multiple 3D point cloud classifiers.

Existing empirical defenses: Several empirical defenses [18, 40, 35, 6, 32, 26] have been proposed to defend against adversarial attacks. Roughly speaking, these defenses aim to detect the attacks or train more robust point cloud classifiers. For instance, Zhou et al. [40] proposed DUP-Net, whose key idea is to detect and discard outlier points before classifying a point cloud. Dong et al. [6] designed a new self-robust 3D point recognition network. In particular, the network first extracts local features from the input point cloud and then uses a self-attention mechanism to aggregate these local features, which could ignore adversarial local features. Liu et al. [18] generalized adversarial training [7] to build more robust point cloud classifiers. However, these empirical defenses lack certified robustness guarantees and are often broken by advanced adaptive attacks.

Existing certified defenses: To the best of our knowledge, there are no certified defenses against adversarial attacks for 3D point cloud classification. We note, however, that many certified defenses against adversarial attacks have been proposed for 2D image classification. Among these defenses, randomized smoothing [1, 19, 13, 16, 4, 25, 22, 14, 15, 11, 29] is state-of-the-art because it is scalable to large-scale neural networks and applicable to arbitrary classifiers. Roughly speaking, randomized smoothing adds random noise (e.g., Gaussian noise) to an input image before classifying it, and it provably predicts the same label for the image when the adversarial perturbation added to the image is bounded under certain metrics, e.g., the ℓ2 norm. We can leverage randomized smoothing to certify robustness of point cloud classification via adding random noise to each point of a point cloud. However, randomized smoothing requires that the size of the input (e.g., the number of points in a point cloud) remain unaltered under adversarial attacks. As a result, randomized smoothing can only certify robustness against point modification attacks, leaving certified defenses against the other three types of attacks untouched.

We note that Jia et al. [9] analyzed the intrinsic certified robustness of bagging against data poisoning attacks. Both Jia et al. and our work use random sampling. However, our work has several key differences from Jia et al. First, we solve a different problem. More specifically, we aim to derive the certified robustness guarantees of 3D point cloud classification against adversarial attacks, while they aim to defend against data poisoning attacks. Second, we use sampling without replacement while Jia et al. adopted sampling with replacement, which results in significant technical differences in the derivation of the certified robustness guarantees (please refer to Supplemental Material for details). Third, we only need to train a single 3D point cloud classifier, while Jia et al. requires training multiple base classifiers.

3. Our PointGuard

In this section, we first describe how to build our PointGuard from an arbitrary 3D point cloud classifier. Then, we derive the certified robustness guarantee of our PointGuard and show its tightness. Finally, we develop an algorithm to compute our certified robustness guarantee in practice.
Figure 2: An example to illustrate the robustness of PointGuard. A point cloud classifier misclassifies the adversarial point cloud (label: plane; prediction: radio), where the red points are adversarially perturbed points. Three subsampled point clouds are created, and two of them (the top and middle ones) do not contain adversarially perturbed points and are predicted as "plane", while the third is predicted as "radio". Our PointGuard predicts the correct label for the adversarial point cloud after majority vote.
3.1. Building our PointGuard

Recall that an attacker's goal is to modify, add, and/or delete a small number of points in a 3D point cloud T such that it is misclassified by a point cloud classifier. Suppose we create multiple subsampled point clouds from T, each of which includes k points subsampled from T uniformly at random without replacement. Our intuition is that, when the number of adversarially modified/added/deleted points is bounded, the majority of the subsampled point clouds do not include any adversarially modified/added/deleted points, and thus the majority vote among their labels predicted by the point cloud classifier may still correctly predict the label of the original point cloud T. Figure 2 provides an example to illustrate the intuition.

We design our PointGuard based on such majority-vote intuition. Next, we present a probabilistic view of the majority-vote intuition, which enables us to derive the certified robustness guarantees of PointGuard. Formally, we use S_k(T) to denote a random subsampled point cloud with k points from a point cloud T. Given an arbitrary point cloud classifier f, we use it to predict the label of the subsampled point cloud S_k(T). Since the subsampled point cloud S_k(T) is random, the predicted label f(S_k(T)) is also random. We use p_i = Pr(f(S_k(T)) = i), which we call label probability, to denote the probability that the predicted label is i, where i ∈ {1, 2, ..., c}. Our PointGuard predicts the label with the largest label probability for the point cloud T. For simplicity, we use g to denote our PointGuard. Then, we have the following:

    g(T) = argmax_{i ∈ {1,2,...,c}} p_i.   (1)
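In practice the label probabilities are estimated by sampling rather than enumerated (Section 3.3 formalizes this). The following sketch (ours, not the authors' released code; the base classifier, k, and the number of samples are placeholders) shows the resulting majority-vote prediction:

```python
# A Monte-Carlo sketch of PointGuard's prediction in Equation (1): subsample k
# points without replacement, classify each subsample, and take the majority vote.
import numpy as np

def pointguard_predict(points, base_classifier, k=16, num_samples=1000, seed=0):
    """points: (n, d) array; base_classifier maps a (k, d) array to a label."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    counts = {}
    for _ in range(num_samples):
        idx = rng.choice(n, size=k, replace=False)   # sampling without replacement
        label = base_classifier(points[idx])
        counts[label] = counts.get(label, 0) + 1
    # g(T): the label whose (estimated) label probability is largest.
    return max(counts, key=counts.get)
```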

3.2. Deriving the Certified Perturbation Size

Certified perturbation size: In an adversarial attack, an attacker perturbs, i.e., modifies, adds, and/or deletes, some points in a point cloud T. We use T* to denote the perturbed point cloud. Given a point cloud T and its perturbed version T*, we define the perturbation size η(T, T*) = max{|T|, |T*|} − |T ∩ T*|, where |·| denotes the number of points in a point cloud. Intuitively, given T and T*, the perturbation size η(T, T*) indicates the minimum number of modified, added, and/or deleted points that are required to turn T into T*. Given the point cloud T and an arbitrary positive integer r, we define the following set:

    Γ(T, r) = {T* | η(T, T*) ≤ r}.   (2)

Intuitively, Γ(T, r) denotes the set of perturbed point clouds that can be obtained by perturbing at most r points in T.

Our goal is to find a maximum r* such that our PointGuard provably predicts the same label for all T* ∈ Γ(T, r*). Formally, we have:

    r* = argmax_r r   s.t.   g(T) = g(T*), ∀T* ∈ Γ(T, r).   (3)

We call r* the certified perturbation size. Note that the certified perturbation size may be different for different point clouds.

Overview of our derivation: Next, we provide an overview of our proof to derive the certified perturbation size of our PointGuard for a point cloud T. The detailed proof is shown in the Supplemental Material. Our derivation is inspired by previous work [10, 9]. In particular, the key idea is to divide the space into different regions based on the Neyman-Pearson Lemma [21]. However, due to the difference in sampling methods, our space divisions are significantly different from previous work [9]. For simplicity, we define random variables W = S_k(T) and Z = S_k(T*), which represent the random subsampled point clouds from T and T*, respectively. Given these two random variables, we denote p_i = Pr(f(W) = i) and p*_i = Pr(f(Z) = i), where i ∈ {1, 2, ..., c}. Moreover, we denote y = g(T) = argmax_{i ∈ {1,2,...,c}} p_i. Our goal is to find the maximum r* such that y = g(T*) = argmax_{i ∈ {1,2,...,c}} p*_i (i.e., p*_y > max_{i≠y} p*_i) for all T* ∈ Γ(T, r*).

The major challenge in deriving the certified perturbation size r* is to compute p*_i. Specifically, the challenge stems from the complexity of the point cloud classifier f and from predicting labels for the \binom{t}{k} subsampled point clouds S_k(T*), where t is the number of points in T*. To overcome the challenge, we propose to derive a lower bound of p*_y and an upper bound of max_{i≠y} p*_i. Moreover, based on the Neyman-Pearson Lemma [21], we derive the lower/upper bounds as the probabilities that the random variable Z is in certain regions of its domain space, which can be efficiently computed for any given r. Then, we find the certified perturbation size r* as the maximum r such that the lower bound of p*_y is larger than the upper bound of max_{i≠y} p*_i.
Next, we discuss how we derive the lower/upper bounds. Suppose we have a lower bound \underline{p_y} of p_y and an upper bound \overline{p_e} of the second largest label probability p_e for the original point cloud T. Formally, we have the following:

    p_y ≥ \underline{p_y} ≥ \overline{p_e} ≥ p_e = max_{i≠y} p_i,   (4)

where \underline{p} and \overline{p} denote the lower and upper bounds of p, respectively. Note that \underline{p_y} and \overline{p_e} can be estimated using the unperturbed point cloud T. In Section 3.3, we propose an algorithm to estimate them. Based on the fact that p_y and p_i (i ≠ y) should be integer multiples of 1/\binom{n}{k}, where n is the number of points in T, we have the following:

    p'_y ≜ ⌈\underline{p_y} · \binom{n}{k}⌉ / \binom{n}{k} ≤ Pr(f(W) = y),   (5)
    p'_i ≜ ⌊\overline{p_i} · \binom{n}{k}⌋ / \binom{n}{k} ≥ Pr(f(W) = i), ∀i ≠ y.   (6)

Given these probability bounds p'_y and p'_i (i ≠ y), we derive a lower bound of p*_y and an upper bound of p*_i (i ≠ y) via the Neyman-Pearson Lemma [21]. We use Φ to denote the joint space of W and Z, where each element in the space is a 3D point cloud with k points subsampled from T or T*. We denote by E the set of intersection points between T and T*, i.e., E = T ∩ T*. Then, we can divide Φ into three disjoint regions: ∆_T, ∆_E, and ∆_{T*}. In particular, ∆_E consists of the subsampled point clouds that can be obtained by subsampling k points from E; and ∆_T (or ∆_{T*}) consists of the subsampled point clouds that are subsampled from T (or T*) but do not belong to ∆_E.

We assume p'_y > Pr(W ∈ ∆_T). Note that we can make this assumption because our goal is to find a sufficient condition. Then, we can find a region ∆_y ⊆ ∆_E such that Pr(W ∈ ∆_y) = p'_y − Pr(W ∈ ∆_T). We can find the region because p'_y − Pr(W ∈ ∆_T) is an integer multiple of 1/\binom{n}{k}. Similarly, we can assume p'_i < Pr(W ∈ ∆_E) since our goal is to find a sufficient condition. Then, for each i ≠ y, we can find a region ∆_i ⊆ ∆_E such that Pr(W ∈ ∆_i) = p'_i based on the fact that p'_i is an integer multiple of 1/\binom{n}{k}. Finally, we derive the following bounds based on the Neyman-Pearson Lemma [21]:

    p*_y ≥ \underline{p*_y} = Pr(Z ∈ ∆_y),   (7)
    p*_i ≤ \overline{p*_i} = Pr(Z ∈ ∆_i ∪ ∆_{T*}), ∀i ≠ y,   (8)

where Pr(Z ∈ ∆_y) and Pr(Z ∈ ∆_i ∪ ∆_{T*}) represent the probabilities that Z is in the corresponding regions, which can be efficiently computed via the probability mass function of Z. Then, our certified perturbation size r* is the maximum r such that \underline{p*_y} > max_{i≠y} \overline{p*_i}. Formally, we have the following theorem:

Theorem 1 (Certified Perturbation Size). Suppose we have an arbitrary point cloud classifier f, a 3D point cloud T, and a subsampling size k. Suppose further that y, e, \underline{p_y} ∈ [0, 1], and \overline{p_e} ∈ [0, 1] satisfy Equation (4). Then, our PointGuard g guarantees that g(T*) = y, ∀T* ∈ Γ(T, r*), where r* is the solution to the following optimization problem:

    r* = argmax_r r
    s.t.   max_{n−r ≤ t ≤ n+r} ( \binom{t}{k} / \binom{n}{k} − 2 · \binom{max(n,t)−r}{k} / \binom{n}{k} ) + 1 − p'_y + p'_e < 0,   (9)

where p'_y and p'_e are respectively defined in Equations (5) and (6), n is the number of points in T, and t is the number of points in T*, which ranges from n − r to n + r when the perturbation size is r.

Proof. See Section A in Supplementary Material.

Note that our Theorem 1 can be applied to any of the four types of adversarial attacks to point cloud classification, i.e., point perturbation, modification, addition, and deletion attacks. Moreover, for point modification, addition, and deletion attacks, we can further simplify the constraint in Equation (9) as there is a simple relationship between t, n, and r. Specifically, we have the following corollaries.

Corollary 1 (Point Modification Attacks). Suppose an attacker only modifies existing points in a 3D point cloud, i.e., we have t = n. Then, the constraint in Equation (9) reduces to 1 − \binom{n−r}{k} / \binom{n}{k} − (p'_y − p'_e) / 2 < 0.

Corollary 2 (Point Addition Attacks). Suppose an attacker only adds new points to a 3D point cloud, i.e., we have t = n + r. Then, the constraint in Equation (9) reduces to \binom{n+r}{k} / \binom{n}{k} − 1 − p'_y + p'_e < 0.

Corollary 3 (Point Deletion Attacks). Suppose an attacker only deletes existing points from a 3D point cloud, i.e., we have t = n − r. Then, the constraint in Equation (9) reduces to −\binom{n−r}{k} / \binom{n}{k} + 1 − p'_y + p'_e < 0.

Next, we show that our derived certified perturbation size is tight, i.e., it is impossible to derive a certified perturbation size larger than ours if no assumptions on the point cloud classifier f are made.

Theorem 2 (Tightness of Certified Perturbation Size). Suppose we have p'_y + p'_e ≤ 1 and p'_y + Σ_{i≠y} p'_i ≥ 1. Then, for any r > r*, there exist a point cloud classifier f* which satisfies Equation (4) and an adversarial point cloud T* such that g(T*) ≠ y or there exist ties.

Proof. See Section B in Supplementary Material.

Algorithm 1: Prediction & Certification
    Input: f, T, k, N, and α.
    Output: Predicted label and certified perturbation size.
    M_1, M_2, ..., M_N ← RandomSubsample(T, k)
    counts[i] ← Σ_{j=1}^{N} I(f(M_j) = i), i = 1, 2, ..., c
    y, e ← top two indices in counts
    \underline{p_y}, \overline{p_e} ← ProbBoundEstimation(counts, α)
    if \underline{p_y} > \overline{p_e} then
        r* ← BinarySearch(|T|, k, \underline{p_y}, \overline{p_e})
    else
        y, r* ← ABSTAIN, ABSTAIN
    end if
    return y, r*

3.3. Computing the Certified Perturbation Size

Given an arbitrary point cloud classifier f and a 3D point cloud T, computing the certified perturbation size r* requires solving the optimization problem in Equation (9), which involves the label probability lower bound \underline{p_y} and upper bound \overline{p_e}. We develop a Monte-Carlo algorithm to estimate these label probability bounds, with which we solve the optimization problem efficiently via binary search.

Estimating label probability lower and upper bounds: We first create N random subsampled point clouds from T, each of which contains k points. For simplicity, we use M_1, M_2, ..., M_N to denote them. Then, we use the point cloud classifier f to predict a label for each subsampled point cloud M_j, where j = 1, 2, ..., N. We use N_i to denote the number of subsampled point clouds whose predicted labels are i, i.e., N_i = Σ_{j=1}^{N} I(f(M_j) = i), where I is an indicator function. We predict the label with the largest N_i as the label of the original point cloud T, i.e., y = g(T) = argmax_{i ∈ {1,2,...,c}} N_i. Moreover, based on the definition of the label probability p_i, we know N_i follows a binomial distribution with parameters N and p_i, i.e., N_i ∼ Binomial(N, p_i). Therefore, we can use SimuEM [10], which is based on the Clopper-Pearson [3] method, to estimate a lower or upper bound of p_i using N_i and N. In particular, we have:

    \underline{p_y} = Beta(α/c; N_y, N − N_y + 1),   (10)
    \overline{p_i} = Beta(1 − α/c; N_i, N − N_i + 1), ∀i ≠ y,   (11)

where 1 − α is the confidence level for simultaneously estimating the c label probability bounds and Beta(τ; μ, ν) is the τ-th quantile of the Beta distribution with shape parameters μ and ν. Both max_{i≠y} \overline{p_i} and 1 − \underline{p_y} are upper bounds of p_e. We use the smaller one as \overline{p_e}, i.e., we have \overline{p_e} = min(max_{i≠y} \overline{p_i}, 1 − \underline{p_y}), which gives a tighter \overline{p_e}.

Solving the optimization problem: Given the estimated label probability bounds, we can use binary search to solve the optimization problem in Equation (9) to find the certified perturbation size r*.

Complete algorithm: Algorithm 1 shows the complete process of our prediction and certification for a 3D point cloud T, which outputs our PointGuard's predicted label y and certified perturbation size r* for T. The function RandomSubsample creates N subsampled point clouds from T. The function ProbBoundEstimation estimates the label probability lower and upper bounds \underline{p_y} and \overline{p_e} with confidence level 1 − α based on Equations (10) and (11). The function BinarySearch solves the optimization problem in Equation (9) to obtain r* using binary search.
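For concreteness, here is a sketch (ours, not the authors' released implementation; SciPy is assumed to be available) of the two helpers Algorithm 1 relies on: ProbBoundEstimation, which instantiates Equations (10) and (11), and BinarySearch, which looks for the largest r whose Equation (9) constraint is satisfied:

```python
# Sketch of ProbBoundEstimation (Eqs. (10)-(11)) and BinarySearch (Eq. (9)).
from math import comb, ceil, floor
from scipy.stats import beta

def prob_bound_estimation(counts, alpha):
    """counts: list of N_i per class; returns (y, lower bound of p_y, upper bound of p_e)."""
    N, c = sum(counts), len(counts)
    y = max(range(c), key=lambda i: counts[i])
    p_y_lo = beta.ppf(alpha / c, counts[y], N - counts[y] + 1)          # Eq. (10)
    others = [beta.ppf(1 - alpha / c, counts[i], N - counts[i] + 1)     # Eq. (11)
              for i in range(c) if i != y and counts[i] > 0]            # skip never-predicted classes
    p_e_up = 1.0 - p_y_lo
    if others:
        p_e_up = min(max(others), p_e_up)   # the tighter of the two upper bounds
    return y, p_y_lo, p_e_up

def eq9_constraint(n, k, r, py_prime, pe_prime):
    """Left-hand side of the Equation (9) constraint; certification requires < 0."""
    denom = comb(n, k)
    worst = max(comb(t, k) / denom - 2 * comb(max(n, t) - r, k) / denom
                for t in range(n - r, n + r + 1))
    return worst + 1 - py_prime + pe_prime

def binary_search(n, k, p_y_lo, p_e_up):
    """Largest r in [0, n] satisfying Eq. (9), assuming monotonicity in r; -1 if none."""
    denom = comb(n, k)
    py_prime = ceil(p_y_lo * denom) / denom     # Eq. (5): round up to a multiple of 1/C(n, k)
    pe_prime = floor(p_e_up * denom) / denom    # Eq. (6): round down
    lo, hi, r_star = 0, n, -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if eq9_constraint(n, k, mid, py_prime, pe_prime) < 0:
            r_star, lo = mid, mid + 1
        else:
            hi = mid - 1
    return r_star
```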
3.4. Training the Classifier with Subsampling

Our PointGuard is built upon a point cloud classifier f. In particular, in the standard training process, f is trained on the original point clouds. Our PointGuard uses f to predict the labels for the subsampled point clouds. Since the subsampled point clouds have a different distribution from the original point clouds, f has a low accuracy on the subsampled point clouds. As a result, PointGuard has suboptimal robustness. To address the issue, we propose to train the point cloud classifier f on subsampled point clouds instead of the original point clouds. Specifically, given a batch of point clouds from the training dataset, we first create a subsampled point cloud for each point cloud in the batch, and then we use the batch of subsampled point clouds to update f. Our experimental results show that training with subsampling significantly improves the robustness of PointGuard.
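The training step can be sketched as follows (our illustration, assuming a PyTorch classifier that consumes (batch, k, d) point tensors; helper names such as subsample_batch are ours):

```python
# Train the base classifier on subsampled point clouds: every cloud in a batch is
# reduced to k randomly chosen points (without replacement) before the update.
import torch

def subsample_batch(points, k=16):
    """points: (B, n, d) tensor -> (B, k, d) tensor of random k-point subsamples."""
    B, n, d = points.shape
    idx = torch.stack([torch.randperm(n)[:k] for _ in range(B)])        # (B, k)
    return torch.gather(points, 1, idx.unsqueeze(-1).expand(-1, -1, d))

def train_epoch(model, loader, optimizer, k=16):
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for points, labels in loader:                   # points: (B, n, d), labels: (B,)
        optimizer.zero_grad()
        logits = model(subsample_batch(points, k))  # classify the subsampled clouds
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
```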
4. Experiments

4.1. Experimental Setup

Datasets and models: We evaluate PointGuard on the ModelNet40 [33] and ScanNet [5] datasets. In particular, the ModelNet40 dataset contains 12,311 3D CAD models, each of which is a point cloud comprising 2,048 three-dimensional points and belongs to one of the 40 common object categories. The dataset is split into 9,843 training point clouds and 2,468 testing point clouds. ScanNet is an RGB-D video dataset which contains 2.5M views from 1,513 scenes. Following Li et al. [17], we extract 6,263 training point clouds and 2,061 testing point clouds from ScanNet, each of which belongs to one of the 16 object categories and has 2,048 six-dimensional points. We adopt PointNet [23] and DGCNN [30] as the point cloud classifiers for ModelNet40 and ScanNet, respectively. We use publicly available implementations for both PointNet (https://github.com/charlesq34/pointnet) and DGCNN (https://github.com/WangYueFt/dgcnn).

Figure 3: Comparing different methods under different attacks on ModelNet40; the results on ScanNet are in Supplemental Material. Panels: (a) point perturbation attacks, (b) point modification attacks, (c) point addition attacks, (d) point deletion attacks. Each panel plots certified/empirical accuracy against the number of perturbed/modified/added/deleted points r for the undefended classifier and PointGuard; panel (b) also includes randomized smoothing (Cohen et al.).
Compared methods: We compare our PointGuard with the following methods:

Undefended classifier. We call the standard point cloud classifier the undefended classifier; e.g., the undefended classifiers are PointNet and DGCNN on the two datasets in our experiments, respectively.

Randomized smoothing (Cohen et al.) [4]. Randomized smoothing adds isotropic Gaussian noise with mean 0 and standard deviation σ to an image before using a classifier to classify it. Randomized smoothing provably predicts the same label for the image when the ℓ2-norm of the adversarial perturbation added to the image is less than a threshold, which is called the certified radius. We can generalize randomized smoothing to certify robustness against point modification attacks. In particular, we can add Gaussian noise to each dimension of each point in a point cloud before using a point cloud classifier to classify it, and randomized smoothing provably predicts the same label for the point cloud when the ℓ2-norm of the adversarial perturbation is less than the certified radius. Note that our certified perturbation size is the number of points that are perturbed by an attacker. Therefore, we transform the certified radius into a certified perturbation size as follows. Suppose the points in the point clouds lie in the space Θ, and the ℓ2-norm distance between two arbitrary points in the space Θ is no larger than λ, i.e., max_{θ1,θ2} ||θ1 − θ2||_2 ≤ λ. For instance, λ = 2√3 for ModelNet40 and λ = √15 for ScanNet. Then, we can employ the relationship between the ℓ0-norm and the ℓ2-norm to derive the certified perturbation size based on the certified radius returned by randomized smoothing. Specifically, given λ and the ℓ2-norm certified radius δ for a point cloud, the certified perturbation size can be computed as ⌊δ²/λ²⌋.
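The conversion is a one-liner; the following illustration (ours, with an arbitrary example radius) uses the λ = 2√3 bound stated above for ModelNet40:

```python
# Convert an l2 certified radius delta from randomized smoothing into a certified
# perturbation size via floor(delta^2 / lambda^2).
import math

def radius_to_perturbation_size(delta, lam):
    return math.floor(delta ** 2 / lam ** 2)

print(radius_to_perturbation_size(delta=5.0, lam=2 * math.sqrt(3)))   # -> 2
```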
Evaluation metric: PointGuard and randomized smoothing provide certified robustness guarantees, while the undefended classifiers provide empirical robustness. Therefore, we use certified accuracy, which has been widely used to measure the certified robustness of a machine learning classifier against adversarial perturbations, to evaluate PointGuard and randomized smoothing; and we use empirical accuracy under an empirical attack to evaluate the undefended classifiers. In particular, the certified accuracy at r perturbed points is the fraction of the testing point clouds whose labels are correctly predicted and whose certified perturbation sizes are no smaller than r. Formally, given a testing set of point clouds T = {(T_o, l_o)}_{o=1}^{m}, the certified accuracy at r perturbed points is defined as follows:

    CA_r = (1/m) · Σ_{o=1}^{m} I(l_o = y_o) · I(r*_o ≥ r),   (12)

where I is an indicator function, l_o is the true label of the testing point cloud T_o, and y_o and r*_o respectively are the predicted label and the certified perturbation size returned by PointGuard or randomized smoothing for T_o.

We use an empirical attack to calculate the empirical accuracy of an undefended classifier. Specifically, we first use the empirical attack to perturb each point cloud in a testing set, and then we use the undefended classifier to classify the perturbed point clouds and compute the accuracy (called empirical accuracy). We adopt the empirical attacks proposed by Xiang et al. [34] for the point perturbation, modification, and addition attacks. We use the empirical attack proposed by Wicker et al. [31] for the point deletion attacks.

We note that the certified accuracy at r is a lower bound of the accuracy that PointGuard or randomized smoothing can achieve no matter how an attacker modifies, adds, and/or deletes at most r points for each point cloud in the testing set, while the empirical accuracy under an empirical attack is an upper bound of the accuracy that an undefended classifier can achieve under attacks.
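Computing Equation (12) from per-cloud results is straightforward; a small illustration (ours, with made-up values) follows:

```python
# Certified accuracy CA_r: fraction of testing clouds that are correctly predicted
# and whose certified perturbation size is at least r.
def certified_accuracy(true_labels, pred_labels, cert_sizes, r):
    hits = sum(1 for l, y, rs in zip(true_labels, pred_labels, cert_sizes)
               if l == y and rs >= r)
    return hits / len(true_labels)

# e.g., certified_accuracy([3, 1, 7], [3, 1, 2], [50, 10, 0], r=20) == 1/3
```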

Parameter setting: Our PointGuard has three parameters: k, 1 − α, and N. Unless otherwise mentioned, we adopt the following default parameters: α = 0.0001, N = 10,000, and k = 16 for both ModelNet40 and ScanNet. By default, we train the point cloud classifiers with subsampling. We adopt σ = 0.5 for randomized smoothing so that its certified accuracy without attack is similar to that of PointGuard. By default, we consider the point perturbation attacks, as they are the strongest among the four types of attacks.
Figure 4: (a) Training the point cloud classifier with vs. without subsampling; (b), (c), and (d) show the impact of k, α, and N, respectively (k ∈ {8, 16, 128}, α ∈ {0.1, 0.01, 0.001, 0.0001}, N ∈ {1,000, 5,000, 10,000}). Each panel plots certified accuracy against the number of perturbed points r. The dataset is ModelNet40. The results on ScanNet are in Supplemental Material.

4.2. Experimental Results

Comparing PointGuard with other methods under different attacks: Figure 3 (or Figure 5 in Supplemental Material) compares PointGuard with other methods under the four types of attacks on ModelNet40 (or ScanNet), where an undefended classifier is measured by its empirical accuracy, while PointGuard and randomized smoothing are measured by certified accuracy. We observe that the certified accuracy of PointGuard is slightly lower than the empirical accuracy of an undefended classifier when there are no attacks, i.e., r = 0. However, for point perturbation, modification, and addition attacks, the empirical accuracy of an undefended classifier quickly drops to 0 as r increases, while the certified accuracy of PointGuard remains high. For point deletion attacks, the empirical accuracy of an undefended classifier may be higher than the certified accuracy of PointGuard. This indicates that the existing empirical point deletion attacks are not strong enough.

Our PointGuard substantially outperforms randomized smoothing for point modification attacks in terms of certified accuracy. Randomized smoothing adds additive noise to a point cloud, while our PointGuard subsamples a point cloud. Our experimental results show that subsampling outperforms additive noise for building provably robust point cloud classification systems. We also observe that the empirical accuracy of the undefended classifier is close to the certified accuracy of randomized smoothing, indicating that the empirical point modification attacks are strong.

We also compare PointGuard with an empirical defense (i.e., DUP-Net [40]) to measure the gaps between certified accuracy and empirical accuracy. According to [40], on ModelNet40, the empirical accuracy of DUP-Net under a point deletion attack [37] is 76.1%, 67.7%, and 57.7% when the attacker deletes 50, 100, and 150 points, respectively. Under the same setting, PointGuard achieves certified accuracy of 73.4%, 64.3%, and 53.5%, respectively. We observe that the gaps between the empirical accuracy (an upper bound of accuracy) of DUP-Net and the certified accuracy (a lower bound of accuracy) of PointGuard are small.

Training with vs. without subsampling: Figure 4a (or Figure 6a in Supplemental Material) compares the certified accuracy of PointGuard when the point cloud classifier is trained with or without subsampling on ModelNet40 (or ScanNet). Our experimental results demonstrate that training with subsampling can substantially improve the certified accuracy of PointGuard. The reason is that the point cloud classifier trained with subsampling can more accurately classify the subsampled point clouds.

Impact of k, α, and N: Figures 4b, 4c, and 4d (or Figures 6b, 6c, and 6d in Supplemental Material) show the impact of k, α, and N on the certified accuracy of our PointGuard on ModelNet40 (or ScanNet), respectively. Based on the experimental results, we make the following observations. First, k measures a tradeoff between accuracy without attacks (i.e., r = 0) and robustness. In particular, a smaller k gives a smaller certified accuracy without attacks, but the certified accuracy drops to 0 more slowly as the number of perturbed points r increases. The reason is that the perturbed points are less likely to be subsampled when k is smaller. Second, the certified accuracy increases as α or N increases. The reason is that a larger α or N leads to tighter lower and upper label probability bounds, which in turn lead to larger certified perturbation sizes. We also note that the certified accuracy is insensitive to α and N once they are large enough.

5. Conclusion

In this work, we propose PointGuard, the first provably robust 3D point cloud classification system against various adversarial attacks. We show that PointGuard provably predicts the same label for a testing 3D point cloud when the number of adversarially perturbed points is bounded. Moreover, we prove our bound is tight. We empirically demonstrate the effectiveness of PointGuard on the ModelNet40 and ScanNet benchmark datasets. An interesting future direction is to further improve PointGuard by leveraging the knowledge of the point cloud classifier.

References

[1] Xiaoyu Cao and Neil Zhenqiang Gong. Mitigating evasion attacks to deep neural networks via region-based classification. In Proceedings of the 33rd Annual Computer Security Applications Conference, pages 278–287, 2017.
[2] Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, and Tian Xia. Multi-view 3D object detection network for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1907–1915, 2017.
[3] Charles J. Clopper and Egon S. Pearson. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26(4):404–413, 1934.
[4] Jeremy M. Cohen, Elan Rosenfeld, and J. Zico Kolter. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918, 2019.
[5] Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5828–5839, 2017.
[6] Xiaoyi Dong, Dongdong Chen, Hang Zhou, Gang Hua, Weiming Zhang, and Nenghai Yu. Self-robust 3D point recognition via gather-vector guidance. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11513–11521. IEEE, 2020.
[7] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
[8] Abdullah Hamdi, Sara Rojas, Ali Thabet, and Bernard Ghanem. AdvPC: Transferable adversarial perturbations on 3D point clouds. In European Conference on Computer Vision, pages 241–257. Springer, 2020.
[9] Jinyuan Jia, Xiaoyu Cao, and Neil Zhenqiang Gong. Intrinsic certified robustness of bagging against data poisoning attacks. In AAAI, 2020.
[10] Jinyuan Jia, Xiaoyu Cao, Binghui Wang, and Neil Zhenqiang Gong. Certified robustness for top-k predictions against adversarial perturbations via randomized smoothing. In International Conference on Learning Representations, 2019.
[11] Jinyuan Jia, Binghui Wang, Xiaoyu Cao, and Neil Zhenqiang Gong. Certified robustness of community detection against adversarial structural perturbation via randomized smoothing. In Proceedings of The Web Conference 2020, pages 2718–2724, 2020.
[12] Jaeyeon Kim, Binh-Son Hua, Duc Thanh Nguyen, and Sai-Kit Yeung. Minimal adversarial examples for deep learning on 3D point clouds. arXiv preprint arXiv:2008.12066, 2020.
[13] Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy (SP), pages 656–672. IEEE, 2019.
[14] Guang-He Lee, Yang Yuan, Shiyu Chang, and Tommi Jaakkola. Tight certificates of adversarial robustness for randomly smoothed classifiers. In Advances in Neural Information Processing Systems, pages 4910–4921, 2019.
[15] Alexander Levine and Soheil Feizi. Robustness certificates for sparse adversarial attacks by randomized ablation. In AAAI, pages 4585–4593, 2020.
[16] Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Certified adversarial robustness with additive noise. In Advances in Neural Information Processing Systems, pages 9464–9474, 2019.
[17] Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. PointCNN: Convolution on X-transformed points. In Advances in Neural Information Processing Systems, pages 820–830, 2018.
[18] Daniel Liu, Ronald Yu, and Hao Su. Extending adversarial attacks and defenses to deep 3D point cloud classifiers. In 2019 IEEE International Conference on Image Processing (ICIP), pages 2279–2283. IEEE, 2019.
[19] Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random self-ensemble. In Proceedings of the European Conference on Computer Vision (ECCV), pages 369–385, 2018.
[20] Chengcheng Ma, Weiliang Meng, Baoyuan Wu, Shibiao Xu, and Xiaopeng Zhang. Efficient joint gradient based attack against SOR defense for 3D point cloud classification. In Proceedings of the 28th ACM International Conference on Multimedia, pages 1819–1827, 2020.
[21] Jerzy Neyman and Egon Sharpe Pearson. IX. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 231(694-706):289–337, 1933.
[22] Rafael Pinot, Laurent Meunier, Alexandre Araujo, Hisashi Kashima, Florian Yger, Cédric Gouy-Pailler, and Jamal Atif. Theoretical evidence for adversarial robustness through randomization. In Advances in Neural Information Processing Systems, pages 11838–11848, 2019.
[23] Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 652–660, 2017.
[24] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems, pages 5099–5108, 2017.
[25] Hadi Salman, Jerry Li, Ilya Razenshteyn, Pengchuan Zhang, Huan Zhang, Sebastien Bubeck, and Greg Yang. Provably robust deep learning via adversarially trained smoothed classifiers. In Advances in Neural Information Processing Systems, pages 11292–11303, 2019.
[26] Jiachen Sun, Karl Koenig, Yulong Cao, Qi Alfred Chen, and Zhuoqing Mao. On the adversarial robustness of 3D point cloud classification. 2021.
[27] Tzungyu Tsai, Kaichen Yang, Tsung-Yi Ho, and Yier Jin. Robust adversarial objects against deep learning models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 954–962, 2020.
[28] Jacob Varley, Chad DeChant, Adam Richardson, Joaquín Ruales, and Peter Allen. Shape completion enabled robotic grasping. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2442–2447. IEEE, 2017.
[29] Binghui Wang, Xiaoyu Cao, Jinyuan Jia, Neil Zhenqiang Gong, et al. On certifying robustness against backdoor attacks via randomized smoothing. In CVPR 2020 Workshop on Adversarial Machine Learning in Computer Vision, 2020.
[30] Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG), 38(5):1–12, 2019.
[31] Matthew Wicker and Marta Kwiatkowska. Robustness of 3D deep learning in an adversarial setting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 11767–11775, 2019.
[32] Ziyi Wu, Yueqi Duan, He Wang, Qingnan Fan, and Leonidas Guibas. IF-Defense: 3D adversarial point cloud defense via implicit function based restoration. 2021.
[33] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1912–1920, 2015.
[34] Chong Xiang, Charles R. Qi, and Bo Li. Generating 3D adversarial point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9136–9144, 2019.
[35] Jiancheng Yang, Qiang Zhang, Rongyao Fang, Bingbing Ni, Jinxian Liu, and Qi Tian. Adversarial attack and defense on point sets. arXiv preprint arXiv:1902.10899, 2019.
[36] Xiangyu Yue, Bichen Wu, Sanjit A. Seshia, Kurt Keutzer, and Alberto L. Sangiovanni-Vincentelli. A LiDAR point cloud generator: from a virtual world to autonomous driving. In Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, pages 458–464, 2018.
[37] Yue Zhao, Yuwei Wu, Caihua Chen, and Andrew Lim. On isometry robustness of deep 3D point cloud models under adversarial attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1201–1210, 2020.
[38] Tianhang Zheng, Changyou Chen, Junsong Yuan, Bo Li, and Kui Ren. PointCloud saliency maps. In Proceedings of the IEEE International Conference on Computer Vision, pages 1598–1606, 2019.
[39] Hang Zhou, Dongdong Chen, Jing Liao, Kejiang Chen, Xiaoyi Dong, Kunlin Liu, Weiming Zhang, Gang Hua, and Nenghai Yu. LG-GAN: Label guided adversarial network for flexible targeted attack of point cloud based deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10356–10365, 2020.
[40] Hang Zhou, Kejiang Chen, Weiming Zhang, Han Fang, Wenbo Zhou, and Nenghai Yu. DUP-Net: Denoiser and upsampler network for 3D adversarial point clouds defense. In Proceedings of the IEEE International Conference on Computer Vision, pages 1961–1970, 2019.

A. Proof of Theorem 1

Given a point cloud T and its perturbed version T*, we define the following two random variables:

    W = S_k(T),  Z = S_k(T*),    (13)

where W and Z represent the random 3D point clouds with k points subsampled from T and T*, respectively, uniformly at random without replacement. We use Φ to denote the joint space of W and Z, where each element is a 3D point cloud with k points subsampled from T or T*. We denote by E the set of intersection points between T and T*, i.e., E = T ∩ T*.

Before proving our theorem, we first describe a variant of the Neyman-Pearson Lemma [21] that will be used in our proof. The variant is from [9].

Lemma 1 (Neyman-Pearson Lemma). Suppose W and Z are two random variables in the space Φ with probability distributions σ_w and σ_z, respectively. Let H: Φ → {0, 1} be a random or deterministic function. Then, we have the following:

• If Q_1 = {φ ∈ Φ : σ_w(φ) > ζ · σ_z(φ)} and Q_2 = {φ ∈ Φ : σ_w(φ) = ζ · σ_z(φ)} for some ζ > 0, let Q = Q_1 ∪ Q_3, where Q_3 ⊆ Q_2. If we have Pr(H(W) = 1) ≥ Pr(W ∈ Q), then Pr(H(Z) = 1) ≥ Pr(Z ∈ Q).

• If Q_1 = {φ ∈ Φ : σ_w(φ) < ζ · σ_z(φ)} and Q_2 = {φ ∈ Φ : σ_w(φ) = ζ · σ_z(φ)} for some ζ > 0, let Q = Q_1 ∪ Q_3, where Q_3 ⊆ Q_2. If we have Pr(H(W) = 1) ≤ Pr(W ∈ Q), then Pr(H(Z) = 1) ≤ Pr(Z ∈ Q).

Proof. Please refer to [9].

Next, we formally prove our Theorem 1. Our proof is inspired by previous work [10, 9]. Roughly speaking, the idea is to derive the label probability lower and upper bounds by computing the probability of the random variables falling in certain regions crafted via the variant of the Neyman-Pearson Lemma. However, due to the difference in sampling methods, our space divisions are significantly different from previous work [9]. Recall that we denote p_i = Pr(f(W) = i) and p*_i = Pr(f(Z) = i), where i ∈ {1, 2, ..., c}. We denote y = argmax_{i ∈ {1, 2, ..., c}} p_i. Our goal is to find the maximum r* such that y = argmax_{i ∈ {1, 2, ..., c}} p*_i, i.e., p*_y > p*_e = max_{i≠y} p*_i, for all T* ∈ Γ(T, r*). Our key step is to derive a lower bound of p*_y and an upper bound of p*_e = max_{i≠y} p*_i via Lemma 1. Given these probability bounds, we can find the maximum r* such that the lower bound of p*_y is larger than the upper bound of p*_e.

Dividing the space Φ: We first divide the space Φ into three regions, which are as follows:

    Δ_T  = {φ ∈ Φ | φ ⊆ T,  φ ⊈ E},     (14)
    Δ_T* = {φ ∈ Φ | φ ⊆ T*, φ ⊈ E},     (15)
    Δ_E  = {φ ∈ Φ | φ ⊆ E},              (16)

where Δ_E consists of the subsampled point clouds that can be obtained by subsampling k points from E; and Δ_T (or Δ_T*) consists of the subsampled point clouds that are subsampled from T (or T*) but do not belong to Δ_E. Since W and Z, respectively, represent the random 3D point clouds with k points subsampled from T and T* uniformly at random without replacement, we have the following probability mass functions:

    Pr(W = φ) = 1/\binom{n}{k} if φ ∈ Δ_T ∪ Δ_E, and 0 otherwise,      (17)
    Pr(Z = φ) = 1/\binom{t}{k} if φ ∈ Δ_T* ∪ Δ_E, and 0 otherwise,     (18)

where t is the number of points in T* (i.e., t = |T*|). We use s to denote the number of intersection points between T and T*, i.e., s = |E| = |T ∩ T*|. Then, the size of Δ_E is \binom{s}{k}, i.e., |Δ_E| = \binom{s}{k}. Given the size of Δ_E, we have the following probabilities:

    Pr(W ∈ Δ_E)  = \binom{s}{k} / \binom{n}{k},        (19)
    Pr(W ∈ Δ_T)  = 1 − \binom{s}{k} / \binom{n}{k},    (20)
    Pr(W ∈ Δ_T*) = 0,                                   (21)
    Pr(Z ∈ Δ_E)  = \binom{s}{k} / \binom{t}{k},        (22)
    Pr(Z ∈ Δ_T*) = 1 − \binom{s}{k} / \binom{t}{k},    (23)
    Pr(Z ∈ Δ_T)  = 0.                                   (24)

We have Pr(W ∈ Δ_E) = \binom{s}{k}/\binom{n}{k} because Pr(W ∈ Δ_E) = |Δ_E| / |Δ_T ∪ Δ_E| = \binom{s}{k}/\binom{n}{k}. Since Pr(W ∈ Δ_T) + Pr(W ∈ Δ_E) = 1, we have Pr(W ∈ Δ_T) = 1 − \binom{s}{k}/\binom{n}{k}. We have Pr(W ∈ Δ_T*) = 0 because W is subsampled from T, which does not contain any points from T* \ E. Similarly, we can compute the probabilities of the random variable Z in those regions.

Based on the fact that p_y and p_i (i ≠ y) should be integer multiples of 1/\binom{n}{k}, we derive the following bounds:

    p'_y ≜ ⌈p_y · \binom{n}{k}⌉ / \binom{n}{k} ≤ Pr(f(W) = y),              (25)
    p'_i ≜ ⌊p_i · \binom{n}{k}⌋ / \binom{n}{k} ≥ Pr(f(W) = i), ∀i ≠ y.      (26)
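As a quick numerical sanity check of Equations (19)–(24), the snippet below (Python; the sizes n, t, s, and k are illustrative rather than taken from the paper's experiments) evaluates the region probabilities when r = 10 out of n = 1024 points are modified, so that t = n and s = n − r:

```python
from math import comb

# Illustrative sizes: |T| = n, |T*| = t, |T ∩ T*| = s, subsample size k.
n, t, s, k = 1024, 1024, 1014, 16      # e.g., 10 of 1024 points modified

pW_E = comb(s, k) / comb(n, k)         # Eq. (19): Pr(W in Δ_E)
pW_T = 1 - pW_E                        # Eq. (20): Pr(W in Δ_T)
pZ_E = comb(s, k) / comb(t, k)         # Eq. (22): Pr(Z in Δ_E)
pZ_Tstar = 1 - pZ_E                    # Eq. (23): Pr(Z in Δ_T*)

print(f"Pr(W in Δ_E) = {pW_E:.3f}")    # ≈ 0.85 for these sizes
print(f"Pr(Z in Δ_E) = {pZ_E:.3f}")    # identical here because t = n
```

In other words, a 16-point subsample lands entirely inside the shared set E roughly 85% of the time in this example, which is the combinatorial fact that the bounds in Equations (25)–(26) and the rest of the proof build on.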
Deriving a lower bound of p*_y: We define a binary function H_y(φ) = I(f(φ) = y), where φ ∈ Φ and I is an indicator function. Then, we have the following based on the definitions of the random variable Z and the function H_y:

    p*_y = Pr(f(Z) = y) = Pr(H_y(Z) = 1).    (27)

Our idea is to find a region such that we can apply Lemma 1 to derive a lower bound of Pr(H_y(Z) = 1). We assume p'_y − (1 − \binom{s}{k}/\binom{n}{k}) ≥ 0. We can make this assumption because we only need to find a sufficient condition. Then, we can find a region Δ_y ⊆ Δ_E satisfying the following:

    Pr(W ∈ Δ_y)                                          (28)
      = p'_y − Pr(W ∈ Δ_T)                               (29)
      = p'_y − (1 − \binom{s}{k}/\binom{n}{k}).          (30)

We can find the region Δ_y because p'_y is an integer multiple of 1/\binom{n}{k}. Given the region Δ_y, we define the following region:

    A = Δ_T ∪ Δ_y.    (31)

Then, based on Equation (25), we have:

    Pr(f(W) = y) ≥ p'_y = Pr(W ∈ A).    (32)

We can establish the following based on the definition of W:

    Pr(H_y(W) = 1) = Pr(f(W) = y) ≥ Pr(W ∈ A).    (33)

Furthermore, we have Pr(W = φ) > ζ · Pr(Z = φ) if and only if φ ∈ Δ_T, and Pr(W = φ) = ζ · Pr(Z = φ) if φ ∈ Δ_y, where ζ = \binom{t}{k}/\binom{n}{k}. Therefore, based on the definition of A in Equation (31) and the condition in Equation (33), we obtain the following by applying Lemma 1:

    Pr(H_y(Z) = 1) ≥ Pr(Z ∈ A).    (34)

Since we have p*_y = Pr(H_y(Z) = 1), Pr(Z ∈ A) is a lower bound of p*_y and can be computed as follows:

    Pr(Z ∈ A)                                                  (35)
      = Pr(Z ∈ Δ_T) + Pr(Z ∈ Δ_y)                              (36)
      = Pr(Z ∈ Δ_y)                                            (37)
      = Pr(W ∈ Δ_y)/ζ                                          (38)
      = (1/ζ) · (p'_y − (1 − \binom{s}{k}/\binom{n}{k})).      (39)

We have Equation (37) from (36) because Pr(Z ∈ Δ_T) = 0, Equation (38) from (37) because Pr(W = φ) = ζ · Pr(Z = φ) for φ ∈ Δ_y, and the last equation from Equations (28)–(30).

Deriving an upper bound of max_{i≠y} p*_i: We leverage the second part of Lemma 1 to derive an upper bound of max_{i≠y} p*_i. We assume Pr(W ∈ Δ_E) > p'_i, ∀i ∈ {1, 2, ..., c} \ {y}. We can make this assumption because we aim to derive a sufficient condition. For every i ∈ {1, 2, ..., c} \ {y}, we can find a region Δ_i ⊆ Δ_E such that we have the following:

    Pr(W ∈ Δ_i) = p'_i.    (40)

We can find the region because p'_i is an integer multiple of 1/\binom{n}{k}. Given the region Δ_i, we define the following region:

    B_i = Δ_i ∪ Δ_T*.    (41)

For every i ∈ {1, 2, ..., c} \ {y}, we define a function H_i(φ) = I(f(φ) = i), where φ ∈ Φ. Then, based on Equation (26) and the definition of the random variable W, we have:

    Pr(H_i(W) = 1) = Pr(f(W) = i) ≤ p'_i = Pr(W ∈ B_i).    (42)

We note that Pr(W = φ) < ζ · Pr(Z = φ) if and only if φ ∈ Δ_T*, and Pr(W = φ) = ζ · Pr(Z = φ) if φ ∈ Δ_i, where ζ = \binom{t}{k}/\binom{n}{k}. Based on the definition of the random variable Z, Equation (42), and Lemma 1, we have the following:

    Pr(H_i(Z) = 1) ≤ Pr(Z ∈ B_i).    (43)

Since we have p*_i = Pr(f(Z) = i) = Pr(H_i(Z) = 1), Pr(Z ∈ B_i) is an upper bound of p*_i and can be computed as follows:

    Pr(Z ∈ B_i)                                                (44)
      = Pr(Z ∈ Δ_i) + Pr(Z ∈ Δ_T*)                             (45)
      = Pr(Z ∈ Δ_i) + 1 − \binom{s}{k}/\binom{t}{k}            (46)
      = Pr(W ∈ Δ_i)/ζ + 1 − \binom{s}{k}/\binom{t}{k}          (47)
      = (1/ζ) · p'_i + 1 − \binom{s}{k}/\binom{t}{k}.          (48)

By considering all possible i in the set {1, 2, ..., c} \ {y}, we have:

    max_{i≠y} p*_i                                             (49)
      ≤ max_{i≠y} Pr(Z ∈ B_i)                                  (50)
      = (1/ζ) · max_{i≠y} p'_i + 1 − \binom{s}{k}/\binom{t}{k} (51)
      ≤ (1/ζ) · p'_e + 1 − \binom{s}{k}/\binom{t}{k},          (52)

where p'_e ≥ max_{i≠y} p'_i.
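The two bounds just derived are straightforward to evaluate; the sketch below (Python; the function name, variable names, and concrete values are illustrative) computes the lower bound of Equation (39) and the upper bound of Equation (52) for given sizes n, t, s, k and given discretized label probability bounds p'_y and p'_e:

```python
from math import comb

def label_probability_bounds(n, t, s, k, p_y, p_e):
    """Eq. (39) and (52): lower bound on the perturbed top-label probability
    p*_y and upper bound on max_{i!=y} p*_i, with n = |T|, t = |T*|,
    s = |T ∩ T*|, and ζ = C(t, k) / C(n, k)."""
    zeta = comb(t, k) / comb(n, k)
    lower_y = (p_y - (1 - comb(s, k) / comb(n, k))) / zeta   # Eq. (39)
    upper_e = p_e / zeta + 1 - comb(s, k) / comb(t, k)       # Eq. (52)
    return lower_y, upper_e

# Illustrative check: n = 1024 points, 10 points modified (t = 1024, s = 1014),
# k = 16, and assumed bounds p'_y = 0.9, p'_e = 0.05.
lo, up = label_probability_bounds(1024, 1024, 1014, 16, 0.9, 0.05)
print(lo > up)   # True: the prediction is certified for this particular t
```

Comparing the two bounds for every admissible t is exactly what the derivation of the certified perturbation size below formalizes.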
Deriving the certified perturbation size: To reach our goal Pr(f(Z) = y) > max_{i≠y} Pr(f(Z) = i), it is sufficient to have the following:

    (1/ζ) · (p'_y − (1 − \binom{s}{k}/\binom{n}{k})) > (1/ζ) · p'_e + 1 − \binom{s}{k}/\binom{t}{k}    (53)
    ⟺ \binom{t}{k}/\binom{n}{k} − 2 · \binom{s}{k}/\binom{n}{k} + 1 − p'_y + p'_e < 0.                 (54)

Since Equation (54) should be satisfied for all possible perturbed point clouds T* (i.e., n − r ≤ t ≤ n + r), we have the following sufficient condition:

    max_{n−r ≤ t ≤ n+r}  \binom{t}{k}/\binom{n}{k} − 2 · \binom{s}{k}/\binom{n}{k} + 1 − p'_y + p'_e < 0.    (55)

When the above Equation (55) is satisfied, we have p'_y − (1 − \binom{s}{k}/\binom{n}{k}) ≥ 0 and Pr(W ∈ Δ_E) = \binom{s}{k}/\binom{n}{k} ≥ p'_i, ∀i ∈ {1, 2, ..., c} \ {y}, which are the conditions that we rely on to construct the regions Δ_y and Δ_i (i ≠ y). The certified perturbation size r* is the maximum r that satisfies the above sufficient condition. Note that s = max(n, t) − r. Then, our certified perturbation size r* can be derived by solving the following optimization problem:

    r* = argmax_r r
    s.t.  max_{n−r ≤ t ≤ n+r}  \binom{t}{k}/\binom{n}{k} − 2 · \binom{max(n,t)−r}{k}/\binom{n}{k} + 1 − p'_y + p'_e < 0.    (56)

B. Proof of Theorem 2

Similar to previous work [4, 10, 9], we show the tightness of our bounds by constructing a counterexample. In particular, when r > r*, we show that we can construct a point cloud T* and a point cloud classifier f* which satisfies Equation (4) such that the label y is not predicted by our PointGuard or there exist ties. Since r* is the maximum value that satisfies Equation (56), there exists a point cloud T* satisfying the following when r > r*:

    \binom{t}{k}/\binom{n}{k} − 2 · \binom{max(n,t)−r}{k}/\binom{n}{k} + 1 − p'_y + p'_e ≥ 0                                            (57)
    ⟺ \binom{t}{k}/\binom{n}{k} − \binom{max(n,t)−r}{k}/\binom{n}{k} + p'_e ≥ p'_y − (1 − \binom{max(n,t)−r}{k}/\binom{n}{k})          (58)
    ⟺ p'_e/ζ + 1 − \binom{max(n,t)−r}{k}/\binom{t}{k} ≥ (1/ζ) · (p'_y − (1 − \binom{max(n,t)−r}{k}/\binom{n}{k})),                     (59)

where t is the number of points in T* and ζ = \binom{t}{k}/\binom{n}{k}. Let Δ_e ⊆ Δ_E be the region that satisfies the following:

    Δ_e ∩ Δ_y = ∅  and  Pr(W ∈ Δ_e) = p'_e.    (60)

Note that we can find the region Δ_e because we have p'_y + p'_e ≤ 1. We let B_e = Δ_T* ∪ Δ_e. Then, we can divide the region Φ \ (A ∪ B_e) into c − 2 regions, and we use B_i to denote each region, where i ∈ {1, 2, ..., c} \ {y, e}. In particular, each region B_i satisfies Pr(W ∈ B_i) ≤ p'_i. We can find these regions since p'_y + Σ_{i≠y} p'_i ≥ 1. Then, we can construct the following point cloud classifier f*:

    f*(φ) = y if φ ∈ A,  and  f*(φ) = i if φ ∈ B_i.    (61)

Note that the point cloud classifier f* is well-defined in the space Φ. We have the following probabilities for our constructed point cloud classifier f*:

    Pr(f*(W) = y) = Pr(W ∈ A)   = p'_y,    (62)
    Pr(f*(W) = e) = Pr(W ∈ B_e) = p'_e,    (63)
    Pr(f*(W) = i) = Pr(W ∈ B_i) ≤ p'_i,    (64)

where i ∈ {1, 2, ..., c} \ {y, e}. Note that the point cloud classifier f* is consistent with Equation (4). Moreover, we have the following:

    Pr(f*(Z) = e)                                                            (65)
      = Pr(Z ∈ B_e)                                                          (66)
      = p'_e/ζ + 1 − \binom{max(n,t)−r}{k}/\binom{t}{k}                      (67)
      ≥ (1/ζ) · (p'_y − (1 − \binom{max(n,t)−r}{k}/\binom{n}{k}))            (68)
      = Pr(Z ∈ A)                                                            (69)
      = Pr(f*(Z) = y),                                                       (70)

where ζ = \binom{t}{k}/\binom{n}{k}. Note that we derive Equation (68) from (67) based on Equation (59). Therefore, for every r > r*, there exists a point cloud classifier f* which satisfies Equation (4) and a point cloud T* such that g(T*) ≠ y or there exist ties.

C. Acknowledgements

We thank the anonymous reviewers for constructive reviews and comments. This work was partially supported by NSF grant No. 1937786.
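As a closing illustration, the optimization problem in Equation (56) reduces to a small combinatorial search: increase r and check the constraint for every admissible t, stopping at the first failure. A minimal sketch (Python; it assumes the discretized bounds p'_y and p'_e are already available, and all names and values are illustrative rather than the authors' implementation):

```python
from math import comb

def certified_perturbation_size(n, k, p_y, p_e):
    """Largest r for which the sufficient condition in Eq. (55)/(56) holds,
    found by linear search. p_y and p_e are the discretized bounds p'_y and
    p'_e; n = |T|; k = subsample size."""
    r_star = 0
    for r in range(1, n):                      # candidate perturbation sizes
        certified = True
        for t in range(n - r, n + r + 1):      # all admissible |T*|
            s = max(n, t) - r                  # points shared by T and T*
            if t < k:                          # degenerate: cannot subsample T*
                certified = False
                break
            lhs = (comb(t, k) - 2 * comb(s, k)) / comb(n, k) + 1 - p_y + p_e
            if lhs >= 0:                       # Eq. (54) violated for this t
                certified = False
                break
        if not certified:
            break
        r_star = r
    return r_star

# Example with illustrative values: n = 1024, k = 16, p'_y = 0.9, p'_e = 0.05.
print(certified_perturbation_size(1024, 16, 0.9, 0.05))
```

The search exploits the fact that the left-hand side of Equation (55) is non-decreasing in r, so the first failing r ends the loop.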
