Download notes as Jupyter notebook

Introduction

As we seek to deploy machine learning systems not only in virtual domains, but also in the real world, it becomes critical that we examine not only whether the systems merely work “most of the time”, but whether they are truly robust and reliable. Although many notions of robustness and reliability exist, one particular topic in this area that has attracted a great deal of interest in recent years is that of adversarial robustness: can we develop classifiers that are robust to (test time) perturbations of their inputs, made by an adversary intent on fooling the classifier? This is of course a very specific notion of robustness in general, but one that seems to bring to the forefront many of the deficiencies facing modern machine learning systems, especially those based on deep learning.

This tutorial seeks to provide a broad, hands-on introduction to the topic of adversarial robustness in deep learning. The goal is to combine a mathematical presentation with illustrative code examples that highlight some of the key methods and challenges in this setting. With this goal in mind, the tutorial is provided as a static web page, but all the sections are also downloadable as Jupyter Notebooks; this lets you try out and build upon the ideas presented here.

Although we try to touch on most of the high-level ideas that have been driving research in this area, it is certain that we will also omit some highly relevant work. If you feel that some work deserves to be mentioned in the context we are discussing, feel free to get in touch and let us know. Our hope is that this resource can serve as a starting point for people just getting involved in the area, as well as a launching pad of links and resources for those who want to pursue the ideas more deeply.

Logistics

Although this tutorial is intended to be read mainly as a static page, since, as mentioned above, you can also download the notebooks for each section, we briefly mention the requirements for running the full examples shown here. Specifically, all examples here use Python 3.7 (though they should be compatible with 3.6), and make use of the libraries used throughout this page: PyTorch, torchvision, pillow (PIL), matplotlib, numpy, and (optionally) cvxpy.

Ideally, the precise version numbers should not matter, and the approach should be compatible with, e.g., PyTorch 0.4.x (but not earlier), other versions of pillow, etc. We installed all this software starting from a fresh install of Anaconda (which includes all of the needed libraries other than PyTorch and cvxpy), and used the conda install <package> or pip install <package> commands to install all the relevant software. We also provide a docker container capable of running all the notebooks in our github repository. You will only need cvxpy for some of the optimization-related approaches we discuss, and most of the tutorial can be done without this library, but PyTorch is used quite heavily throughout.

Many of the simpler examples are quite fast to compute, and so we just run them on a CPU. For the more time-intensive operations, however (especially the various forms of adversarial training), it is necessary to train the systems on a GPU to have any hope of being computationally efficient. Thus, in order to run these later examples, you will also need CUDA installed with the above version of PyTorch.

Required background

This document assumes some degree of familiarity with basic deep learning, e.g., the basics of optimization, gradient descent, deep networks, etc. (to the degree typically covered in an early graduate-level course on machine learning), plus some basic familiarity with PyTorch. If you do not have either of these, then a good resource to start with would be:

Diving right in

One of the beautiful things about deep learning is just how easy it is to jump right in and start seeing some actual results on real data. With this mindset, let’s start off by constructing our very first adversarial example. We will introduce a very small amount of mathematical notation here, which will be substantially expanded upon shortly, and the actual technique we use here is not the ultimate one that we will use, but it is fairly close in spirit, and actually captures most of the basic components that we will see later.

To start off, let’s use the (pre-trained) ResNet50 model within PyTorch to classify this picture of a pig.

'Show Pig' licensed under CC BY 2.0

The normal strategy for image classification in PyTorch is to first transform the image (to approximately zero-mean, unit variance) using the torchvision.transforms module. However, since we’d like to make perturbations in the original (unnormalized) image space, we’ll take a slightly different approach and actually build the transformations into the PyTorch layers, so that we can directly feed the image in. First, let’s just load an image and resize it to 224x224, which is the default size that most ImageNet images (and hence the pre-trained classifiers) take as input.

from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

# read the image, resize to 224 and convert to PyTorch Tensor
pig_img = Image.open("pig.jpg")
preprocess = transforms.Compose([
   transforms.Resize(224),
   transforms.ToTensor(),
])
pig_tensor = preprocess(pig_img)[None,:,:,:]

# plot image (note that numpy uses HWC whereas PyTorch uses CHW, so we need to convert)
plt.imshow(pig_tensor[0].numpy().transpose(1,2,0))

Now let’s load the pre-trained ResNet50 model and apply it to the image, after normalizing it (the odd indexing here is just used to comply with PyTorch standards, namely that all inputs to modules should be of the form batch_size x num_channels x height x width).

import torch
import torch.nn as nn
from torchvision.models import resnet50

# simple Module to normalize an image
class Normalize(nn.Module):
    def __init__(self, mean, std):
        super(Normalize, self).__init__()
        self.mean = torch.Tensor(mean)
        self.std = torch.Tensor(std)
    def forward(self, x):
        return (x - self.mean.type_as(x)[None,:,None,None]) / self.std.type_as(x)[None,:,None,None]

# values are standard normalization for ImageNet images,
# from https://github.com/pytorch/examples/blob/master/imagenet/main.py
norm = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

# load pre-trained ResNet50, and put into evaluation mode (necessary to e.g. turn off batchnorm)
model = resnet50(pretrained=True)
model.eval();
# form predictions
pred = model(norm(pig_tensor))

pred now contains a 1000 dimensional vector of the class logits for the 1000 ImageNet classes (i.e., if you wanted to convert this to a probability vector, you would apply the softmax operator to this vector). To find the highest likelihood class, we simply take the index of the maximum value in this vector, and we can look this up in a list of ImageNet classes to find the corresponding label.

import json
with open("imagenet_class_index.json") as f:
    imagenet_classes = {int(i):x[1] for i,x in json.load(f).items()}
print(imagenet_classes[pred.max(dim=1)[1].item()])
hog

Looking good! (Note that ImageNet has a class corresponding to both “hog” and “pig”, so this is the correct label.) We should also note that this is the first pig image we tried here, so it didn’t take any tweaking to get a result like this … modern image classifiers are pretty awesome.

Some introductory notation

Now let’s try to fool this classifier into thinking this image of a pig is something else. To explain this process, we’re going to introduce a bit more notation. Specifically, we’ll define the model, or hypothesis function, $h_\theta : \mathcal{X} \rightarrow \mathbb{R}^k$ as the mapping from input space (in the above example this would be a three dimensional tensor) to output space, which is a $k$-dimensional vector, where $k$ is the number of classes being predicted; note that, as in our model above, the output corresponds to the logit space, so these are real-valued numbers that can be positive or negative. The $\theta$ vector represents all the parameters defining this model (i.e., all the convolutional filters, fully-connected layer weight matrices, biases, etc.); the $\theta$ parameters are what we typically optimize over when we train a neural network. And finally, note that this $h_\theta$ corresponds precisely to the model object in the Python code above.

Second, we define a loss function $\ell : \mathbb{R}^k \times \mathbb{Z}_+ \rightarrow \mathbb{R}_+$ as a mapping from the model predictions and true labels to a non-negative number. The semantics of this loss function are that the first argument is the model output (logits, which can be positive or negative), and the second argument is the index of the true class (that is, a number from 1 to $k$ denoting the index of the true label). Thus, the notation

$$\ell(h_\theta(x), y)$$

for $x \in \mathcal{X}$ the input and $y \in \mathbb{Z}$ the true class, denotes the loss that the classifier achieves in its predictions on $x$, assuming the true class is $y$. By far the most common form of loss used in deep learning is the cross entropy loss (also sometimes called the softmax loss), defined as

$$\ell(h_\theta(x), y) = \log \left( \sum_{j=1}^{k} \exp(h_\theta(x)_j) \right) - h_\theta(x)_y$$

where $h_\theta(x)_j$ denotes the $j$th element of the vector $h_\theta(x)$.

Aside: For those who are unfamiliar with the notation above, note that the form of this loss function comes from the typical softmax activation. Define the softmax operator $\sigma : \mathbb{R}^k \rightarrow \mathbb{R}^k$ applied to a vector

$$\sigma(z)_i = \frac{\exp(z_i)}{\sum_{j=1}^{k}\exp(z_j)}$$

to be a mapping from the class logits returned by $h_\theta$ to a probability distribution. Then the typical goal of training a network is to maximize the probability of the true class label. Since probabilities themselves get vanishingly small, it is more common to maximize the log of the probability of the true class label, which is given by

$$\log \sigma(h_\theta(x))_y = \log \left(\frac{\exp(h_\theta(x)_y)}{\sum_{j=1}^{k}\exp(h_\theta(x)_j)}\right) = h_\theta(x)_y - \log \left(\sum_{j=1}^{k}\exp(h_\theta(x)_j)\right).$$

Since the convention is that we want to minimize loss (rather than maximize probability), we use the negation of this quantity as our loss function. We can evaluate this loss in PyTorch using the following command.

# 341 is the class index corresponding to "hog"
print(nn.CrossEntropyLoss()(model(norm(pig_tensor)),torch.LongTensor([341])).item())
0.0038814544677734375

A loss of 0.0039 is pretty small: by the conventions above, this corresponds to an $\exp(-0.0039) \approx 0.996$ probability that the classifier believes this to be a pig.
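
As a quick sanity check on the cross entropy formula above, here is a minimal sketch (reusing the model, norm, and pig_tensor objects defined earlier; the manual_loss name is our own) that computes the same quantity directly from the logits as log-sum-exp minus the true-class logit.

# compute the cross entropy loss manually from the logits
logits = model(norm(pig_tensor))
manual_loss = torch.logsumexp(logits, dim=1) - logits[0, 341]
print(manual_loss.item())   # should match the nn.CrossEntropyLoss value above (~0.0039)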

Creating an adversarial example

So how do we manipulate this image to make the classifier believe it is something else? To answer this, note that a common approach to training a classifier is to optimize the parameters $\theta$, so as to minimize the average loss over some training set $\{x_i \in \mathcal{X}, y_i \in \mathbb{Z}\}$, $i=1,\ldots,m$, which we write as the optimization problem

$$\min_\theta \frac{1}{m}\sum_{i=1}^m \ell(h_\theta(x_i), y_i)$$

which we typically solve by (stochastic) gradient descent. That is, for some minibatch $\mathcal{B} \subseteq \{1,\ldots,m\}$, we compute the gradient of our loss with respect to the parameters $\theta$, and make a small adjustment to $\theta$ in this negative direction

$$\theta := \theta - \frac{\alpha}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \nabla_\theta \ell(h_\theta(x_i), y_i)$$

where $\alpha$ is some step size, and we repeat this process for different minibatches covering the entire training set, until the parameters converge.

The key term of interest here is the gradient $\nabla_\theta \ell(h_\theta(x_i), y_i)$, which computes how a small adjustment to each of the parameters $\theta$ will affect the loss function. For deep neural networks, this gradient is computed efficiently via backpropagation. However, the nice thing about automatic differentiation (the mathematical technique that underlies backpropagation) is that we aren’t just limited to differentiating the loss with respect to $\theta$; we can just as easily compute the gradient of the loss with respect to the input $x_i$ itself. This quantity will tell us how small changes to the image itself affect the loss function.
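
To make this concrete, here is a minimal sketch (again reusing the objects defined above; the temporary tensor x is our own) of computing the gradient of the loss with respect to the input image rather than the parameters; the only change is that we mark the input tensor itself as requiring a gradient.

# gradient of the loss with respect to the input image itself
x = pig_tensor.clone().detach().requires_grad_(True)
loss = nn.CrossEntropyLoss()(model(norm(x)), torch.LongTensor([341]))
loss.backward()
print(x.grad.shape)   # same shape as the image: torch.Size([1, 3, 224, 224])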

This is exactly what we’re going to do to form an adversarial example. But instead of adjusting the image to minimize the loss, as we did when optimizing over the network parameters, we’re going to adjust the image to maximize the loss. That is, we want to solve the optimization problem

$$\max_{\hat{x}} \ell(h_\theta(\hat{x}), y)$$

where $\hat{x}$ denotes our adversarial example that is attempting to maximize the loss. Of course, we cannot just optimize arbitrarily over $\hat{x}$ (there do exist, after all, some images that are not pigs, and if we change the image entirely, say to a dog, then it’s not particularly impressive that we can “fool” the classifier into thinking it’s not a pig). So we instead need to ensure that $\hat{x}$ is close to our original input $x$. By convention, we typically do this by optimizing over the perturbation to $x$, which we will denote $\delta$, and then optimizing over $\delta$

$$\max_{\delta \in \Delta} \ell(h_\theta(x + \delta), y)$$

where $\Delta$ represents an allowable set of perturbations. Characterizing the “correct” set of allowable perturbations is actually quite difficult: in theory, we would like $\Delta$ to capture anything that humans visually feel to be the “same” as the original input $x$. This can include anything ranging from adding slight amounts of noise, to rotating, translating, scaling, or performing some 3D transformation on the underlying model, or even completely changing the image in the “non-pig” locations. Needless to say, it is not possible to give a mathematically rigorous definition of all the perturbations that should be allowed, but the philosophy behind adversarial examples is that we can consider some subset of the possible space of allowed perturbations, such that by any “reasonable” definition, the actual semantic content of the image could not change under this perturbation.

A common perturbation set to use, though by no means the only reasonable choice, is the $\ell_\infty$ ball, defined by the set

$$\Delta = \{\delta : \|\delta\|_\infty \leq \epsilon\}$$

where the $\ell_\infty$ norm of a vector $z$ is defined as

$$\|z\|_\infty = \max_i |z_i|$$

i.e., we allow the perturbation to have magnitude between $[-\epsilon, \epsilon]$ in each of its components (it is actually slightly more complex than this, because we also need to ensure that $x + \delta$ is bounded between $[0,1]$ so that it is still a valid image). We’ll return later to debate whether or not it is reasonable to consider the $\ell_\infty$ ball, or norm balls in general, as perturbation sets. But all we will say for now is that the advantage of the $\ell_\infty$ ball is that for small $\epsilon$ it creates perturbations which add such a small component to each pixel in the image that they are visually indistinguishable from the original image, and thus provide a “necessary-but-definitely-not-sufficient” condition for us to consider a classifier robust to perturbations. And the reality of deep networks is that they can very easily be fooled by manipulations exactly of this type.
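
Concretely, projecting onto this perturbation set is just elementwise clipping. Here is a minimal sketch (the helper name project_linf is our own, not something used later in the tutorial) that also enforces the $[0,1]$ image constraint mentioned above.

import torch

# project delta onto the l_inf ball of radius epsilon around x, keeping x + delta a valid image
def project_linf(delta, x, epsilon):
    delta = torch.clamp(delta, -epsilon, epsilon)     # enforce ||delta||_inf <= epsilon
    return torch.clamp(x + delta, 0, 1) - x           # enforce x + delta in [0,1]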

Ok, enough discussion. Let’s see what this looks like in practice. The following example uses PyTorch’s SGD optimizer to adjust our perturbation to the input to maximize the loss. Despite the name, as there is no notion of a training set or minibatches here, this is not actually stochastic gradient descent, but just gradient descent; and since we follow each step with a projection back onto the $\ell_\infty$ ball (done by simply clipping the values that exceed $\epsilon$ magnitude to $\pm \epsilon$), this is actually a procedure known as projected gradient descent (PGD). We’ll shortly consider slightly more complex versions (where we’ll need to do things manually instead of using PyTorch’s optimization class), but we’ll keep things simple for now.

import torch.optim as optim
epsilon = 2./255

delta = torch.zeros_like(pig_tensor, requires_grad=True)
opt = optim.SGD([delta], lr=1e-1)

# gradient descent on the negated loss (i.e., ascent on the loss), projecting delta after each step
for t in range(30):
    pred = model(norm(pig_tensor + delta))
    loss = -nn.CrossEntropyLoss()(pred, torch.LongTensor([341]))
    if t % 5 == 0:
        print(t, loss.item())
    
    opt.zero_grad()
    loss.backward()
    opt.step()
    delta.data.clamp_(-epsilon, epsilon)
    
print("True class probability:", nn.Softmax(dim=1)(pred)[0,341].item())
0 -0.0038814544677734375
5 -0.00693511962890625
10 -0.015821456909179688
15 -0.08086681365966797
20 -12.229072570800781
25 -14.300384521484375
True class probability: 1.4027455108589493e-06

After 30 gradient steps, ResNet50 thinks that this image has less than a $10^{-5}$ chance of being a pig. (Note: we should also clip $x + \delta$ to be in $[0,1]$, but this already holds for any $\delta$ within the above bound, so we don’t need to do it explicitly here.) Instead, it turns out that the classifier is quite sure the image is a wombat, as we can see from the following code, which computes the maximum class and its probability.

max_class = pred.max(dim=1)[1].item()
print("Predicted class: ", imagenet_classes[max_class])
print("Predicted probability:", nn.Softmax(dim=1)(pred)[0,max_class].item())
Predicted class:  wombat
Predicted probability: 0.9997960925102234

So what does this wombat-pig look like? Very much like our original pig, unfortunately.

plt.imshow((pig_tensor + delta)[0].detach().numpy().transpose(1,2,0))

Here is the delta we added to the image, heavily zoomed in by a factor of 50 because it would otherwise be impossible to see.

plt.imshow((50*delta+0.5)[0].detach().numpy().transpose(1,2,0))

So essentially, by adding a tiny multiple of this random-looking noise, we’re able to create an image that looks identical to our original image, yet is classified very incorrectly. Of course, to do all of this more properly, we should quantize our noise to the allowable level of the image (i.e., to within steps of 1/255), but technicalities like this are easy to overcome, and we can indeed create valid images that are impossible for the human eye to distinguish from our original image, yet which the classifier misclassifies.
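
For completeness, the quantization step itself is a one-liner; a minimal sketch, assuming we simply round the perturbed image to the nearest valid 8-bit pixel value (the quantized name is our own):

# quantize the perturbed image to steps of 1/255
quantized = torch.round((pig_tensor + delta) * 255) / 255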

Targeted attacks

Ok, you might say. This is impressive, but a wombat actually isn’t that different from a pig, so maybe the problem isn’t that bad. But it turns out that the same technique can be used to make the image classified as virtually any class we desire. This is known as a “targeted attack”, and the only difference is that instead of trying to just maximize the loss of the correct class, we maximize the loss of the correct class while also minimizing the loss of the target class. That is, we solve the optimization problem

$$\max_{\delta \in \Delta} \left(\ell(h_\theta(x + \delta), y) - \ell(h_\theta(x + \delta), y_{\mathrm{target}})\right) \equiv \max_{\delta \in \Delta} \left(h_\theta(x + \delta)_{y_{\mathrm{target}}} - h_\theta(x + \delta)_y\right)$$

where the expression simplifies because the $\log \left(\sum_{j=1}^k \exp(h_\theta(x)_j)\right)$ terms from each loss cancel, and all that remains is the linear terms. Here is how this looks in code. Note that we tuned the step size a bit to make it work in this case, but we’ll shortly consider slightly different scaling methods for projected gradient descent where this isn’t needed.
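
Incidentally, because the log-sum-exp terms cancel, the difference of the two cross entropy losses used in the loop below could equivalently be written directly in terms of the raw logits; a minimal alternative sketch, assuming pred is the output of a forward pass as in the loop:

# equivalent objective in terms of logits: minimize (true-class logit - target-class logit)
loss = pred[0, 341] - pred[0, 404]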

delta = torch.zeros_like(pig_tensor, requires_grad=True)
opt = optim.SGD([delta], lr=5e-3)

for t in range(100):
    pred = model(norm(pig_tensor + delta))
    loss = (-nn.CrossEntropyLoss()(pred, torch.LongTensor([341])) + 
            nn.CrossEntropyLoss()(pred, torch.LongTensor([404])))
    if t % 10 == 0:
        print(t, loss.item())
    
    opt.zero_grad()
    loss.backward()
    opt.step()
    delta.data.clamp_(-epsilon, epsilon)
0 24.00604820251465
10 -0.1628284454345703
20 -8.026773452758789
30 -15.677117347717285
40 -20.60370635986328
50 -24.99606704711914
60 -31.009849548339844
70 -34.80946350097656
80 -37.928680419921875
90 -40.32395553588867
max_class = pred.max(dim=1)[1].item()
print("Predicted class: ", imagenet_classes[max_class])
print("Predicted probability:", nn.Softmax(dim=1)(pred)[0,max_class].item())
Predicted class:  airliner
Predicted probability: 0.9679961204528809

As before, here’s our airliner-pig, looking a terrifying lot like a normal pig (the target class of 404 from the code is indeed an airliner, so our targeted attack is working).

plt.imshow((pig_tensor + delta)[0].detach().numpy().transpose(1,2,0))

And here is our airliner noise.

plt.imshow((50*delta+0.5)[0].detach().numpy().transpose(1,2,0))

The conclusion, of course, is that with adversarial attacks and deep learning, you can make pigs fly.

We’ll talk about the practical challenges that these attacks raise shortly, but the ease of such attacks raises an obvious question: can we train deep learning classifiers that are somehow resistant to such attacks? The short answer to this question is “yes”, but we (as a field) are a long way from truly making such training practical, or achieving nearly the performance that we get with “standard” deep learning methods. This tutorial will cover both the attack and the defense sides in much more detail, and hopefully by the end of it you will have a sense of the current state of the art, as well as the directions where we still need to make substantial progress.

A brief (incomplete) history of adversarial robustness

  • Origins (robust optimization)
  • Support vector machines
  • Adversarial classification (e.g. Domingos 2004)
  • Distinctions between different types of robustness (test time, train time, etc.)
  • Szegedy et al., 2013, Goodfellow et al., 2014
  • Many proposed defense methods
  • Many proposed attack methods
  • Exact verification methods
  • Convex upper bound methods
  • Recent trends

Adversarial robustness and training

Let’s now consider, a bit more formally, the challenge of attacking deep learning classifiers (here meaning, constructing adversarial examples for the classifier), and the challenge of training or somehow modifying existing classifiers in a manner that makes them more resistant to such attacks.

Brief review: risk, training, and testing sets

To begin, let’s consider more formally the traditional notion of risk as it is used in machine learning. The risk of a classifier is its expected loss under the true distribution of samples, i.e.

$$R(h_\theta) = \mathbf{E}_{(x,y) \sim \mathcal{D}}[\ell(h_\theta(x), y)]$$

where $\mathcal{D}$ denotes the true distribution over samples. In practice, of course, we do not know the underlying distribution of the actual data, so we approximate this quantity by considering a finite set of samples drawn i.i.d. from $\mathcal{D}$,

$$D = \{(x_i, y_i) \sim \mathcal{D}\}, \quad i = 1, \ldots, m$$

and we then consider the empirical risk

$$\hat{R}(h_\theta, D) = \frac{1}{|D|}\sum_{(x,y) \in D} \ell(h_\theta(x), y).$$

As mentioned above, the traditional process of training a machine learning algorithm is that of finding parameters that minimize the empirical risk on some training set denoted $D_{\mathrm{train}}$ (or possibly some regularized version of this objective)

$$\min_\theta \hat{R}(h_\theta, D_{\mathrm{train}}).$$

Of course, once the parameters $\theta$ have been chosen based upon the training set $D_{\mathrm{train}}$, this data set can no longer give us an unbiased estimate of the risk of the resulting classifier, so frequently an alternative data set $D_{\mathrm{test}}$ (also containing points sampled i.i.d. from the true underlying distribution $\mathcal{D}$) is used, and we use $\hat{R}(h_\theta, D_{\mathrm{test}})$ as a proxy to estimate the true risk $R(h_{\theta})$.

Adversarial risk

As an alternative to the traditional risk, we can also consider an adversarial risk. This is like the traditional risk, except that instead of suffering the loss on each sample point, $\ell(h_\theta(x), y)$, we suffer the worst case loss in some region around the sample point, that is

$$R_{\mathrm{adv}}(h_\theta) = \mathbf{E}_{(x,y) \sim \mathcal{D}}\left[\max_{\delta \in \Delta(x)} \ell(h_\theta(x + \delta), y)\right]$$

where for generality, we explicitly allow the perturbation region $\Delta(x)$ to depend on the sample point itself; using the examples from the previous section, this would be necessary in order to ensure that perturbations respect the ultimate image bounds, but it can also potentially encode a great deal of semantic information about which sorts of perturbations would be allowed for each image.

There is also, naturally, the empirical analog of the adversarial risk, which looks exactly like what we considered in the previous sections

$$\hat{R}_{\mathrm{adv}}(h_\theta, D) = \frac{1}{|D|}\sum_{(x,y) \in D} \max_{\delta \in \Delta(x)} \ell(h_\theta(x + \delta), y).$$

This quantity essentially measures the worst-case empirical loss of the classifier, if we are able to adversarially manipulate every input in the data set within its allowable set $\Delta(x)$.
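
To make the empirical adversarial risk concrete, here is a schematic sketch; attack is a stand-in for any method that (approximately) solves the inner maximization, such as the PGD procedure above, and loader is a hypothetical iterator over (X, y) batches of an already-normalized dataset.

import torch.nn as nn

# empirical adversarial risk over a dataset (schematic; attack and loader are placeholders)
def empirical_adv_risk(model, loader, attack, epsilon):
    total_loss, n = 0.0, 0
    for X, y in loader:
        delta = attack(model, X, y, epsilon)                      # approximate inner max
        total_loss += nn.CrossEntropyLoss(reduction='sum')(model(X + delta), y).item()
        n += X.shape[0]
    return total_loss / n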

Why might we prefer to use the adversarial risk instead of the traditional risk? If we are truly operating in an adversarial environment, where an adversary is capable of manipulating the input with full knowledge of the classifier, then this would provide a more accurate estimate of the expected performance of the classifier. This may seem unlikely in practice, but several classification tasks (especially those relating to computer security), such as spam classification, malware detection, network intrusion detection, etc., are genuinely adversarial, where attackers have a direct incentive to fool a classifier. Or even if we don’t expect the environment to always be adversarial, some applications of machine learning seem to be high-stakes enough that we would like to understand the “worst case” performance of the classifier, even if this is an unlikely event; this sort of logic underlies the interest in adversarial examples in domains like autonomous driving, where for instance there has been work looking at ways that stop signs could be manipulated to intentionally fool a classifier.

However, there is also a reasonable case to be made that we might prefer the empirical adversarial risk over the traditional empirical risk, even if we ultimately want to minimize the traditional risk. The reason for this is that it is very difficult to actually draw samples i.i.d. from the true underlying distribution. Rather, any procedure we use to collect data is an empirical attempt at accessing the true underlying distribution, and may ignore certain dimensions, especially if these seem “obvious” to humans. This is hopefully apparent even in the previous image classification example. There have been many recent claims that algorithms have “surpassed human performance” on image classification, using classifiers like the one we saw as an example. But, as the above example shows, algorithms are nowhere near human performance if they cannot even recognize that an image that looks exactly the same, by any visual definition, as the original image, in fact belongs to the same class. Some may argue that these cases “shouldn’t count” because they were specifically designed to fool the algorithm in question, and may not correspond to an image that would ever be seen in practice, but much simpler perturbations such as translations and rotations can also serve as adversarial examples.

The fundamental problem is that when claims are made of “human level” performance by ML systems, they really mean “human level on data generated exactly by the sampling mechanism used in this experiment.” But humans don’t do well just on one sampling distribution; humans are amazingly resilient to changes in their environment. So when people are told that a machine learning algorithm “surpasses human performance” (especially when coupled, as it frequently is, with claims that the associated deep learning algorithm “works like the human brain”), it often leads to the implicit assumption that the algorithm will also be similarly resilient. But it is not; deep learning methods are incredibly brittle, and adversarial examples lay this fact bare, in a remarkably obvious and intuitive manner. Put another way, can’t we at least agree to cool it on the “human level” and “works like the human brain” talk for systems that are as confident that the first image is a pig as they are that the second image is an airliner?

f,ax = plt.subplots(1,2, figsize=(10,5))
ax[0].imshow((pig_tensor)[0].detach().numpy().transpose(1,2,0))
ax[1].imshow((pig_tensor + delta)[0].detach().numpy().transpose(1,2,0))

Training adversarially robust classifiers

With this motivation in mind, let’s now consider the task of training a classifier that is robust to adversarial attacks (or equivalently, one that minimizes the empirical adversarial risk). Analogously to the case of traditional training, this can be written as the optimization problem

$$\min_\theta \hat{R}_{\mathrm{adv}}(h_\theta, D_{\mathrm{train}}) \equiv \min_\theta \frac{1}{|D_{\mathrm{train}}|}\sum_{(x,y) \in D_{\mathrm{train}}} \max_{\delta \in \Delta(x)} \ell(h_\theta(x + \delta), y).$$

We will refer to this as the min-max or robust optimization formulation of adversarial training, and we will return to it many times during the course of this tutorial.

As with traditional training, the way we would solve this optimization problem in practice is by stochastic gradient descent over $\theta$. That is, we would repeatedly choose a minibatch $B \subseteq D_{\mathrm{train}}$, and update $\theta$ according to its gradient

$$\theta := \theta - \frac{\alpha}{|B|}\sum_{(x,y) \in B} \nabla_\theta \max_{\delta \in \Delta(x)} \ell(h_\theta(x + \delta), y).$$

But how do we compute the gradient of the inner term now, given that the inner function itself contains a maximization problem? The answer is fortunately quite simple in practice, and is given by Danskin’s theorem. For the purposes of our discussion, it states that the gradient of the inner function involving the maximization term is simply given by the gradient of the function evaluated at the maximum. In other words, letting $\delta^\star$ denote the optimum of the inner optimization problem

$$\delta^\star = \mathrm{argmax}_{\delta \in \Delta(x)} \ell(h_\theta(x + \delta), y)$$

the gradient we require is simply given by

$$\nabla_\theta \max_{\delta \in \Delta(x)} \ell(h_\theta(x + \delta), y) = \nabla_\theta \ell(h_\theta(x + \delta^\star), y)$$

(where on the right hand side, we treat $\delta^\star$ just as a fixed quantity, i.e., we don’t worry about its dependence on $\theta$). This may seem “obvious”, but it is actually quite a subtle point, and it is not trivial to show that it holds (after all, the obtained value of $\delta^\star$ depends on $\theta$, so it is not clear why we can treat it as independent of $\theta$ when taking the gradient). We won’t prove Danskin’s theorem here, and will simply note that this property of course makes our lives much easier.

Given this framework, there is a nice interplay between the challenge of finding an adversarial example and the process of training a robust classifier. Specifically, the process of gradient descent on the empirical adversarial risk would look something like the following:

  1. For each $x, y \in B$, solve the inner maximization problem (i.e., compute an adversarial example)
  2. Compute the gradient of the empirical adversarial risk at these adversarial examples, and update $\theta$

In other words, we repeatedly compute adversarial examples, and then update the classifier based not on the original data points, but on these adversarial examples. This procedure has become known as “adversarial training” in the deep learning literature, and (if done properly, more on this shortly) it is one of the most effective empirical methods we have for training adversarially robust models, though a few caveats are worth mentioning.
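
Schematically, one epoch of adversarial training might look like the following sketch; pgd_attack is a hypothetical stand-in for whatever method is used to (approximately) solve the inner maximization, loader is a hypothetical batch iterator, and this is not the exact code we will use later in the tutorial.

import torch.nn as nn

# one epoch of adversarial training (schematic)
def adversarial_training_epoch(model, loader, opt, epsilon, pgd_attack):
    for X, y in loader:
        delta = pgd_attack(model, X, y, epsilon)     # step 1: compute adversarial perturbations
        loss = nn.CrossEntropyLoss()(model(X + delta), y)
        opt.zero_grad()
        loss.backward()                              # step 2: gradient at the adversarial points
        opt.step()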

First, we should note that we are virtually never actually performing gradient descent on the true empirical adversarial risk, precisely because we typically cannot solve the inner maximization problem optimally. Specifically, the inner maximization, if done via gradient descent as we did above, is a non-convex optimization problem, where we are only able, at best, to find a local optimum when using techniques such as gradient descent. And since Danskin’s theorem, for instance, only holds in theory when the inner maximization problem is solved exactly, this would seemingly pose a problem for such approaches. In practice, however, it typically is the case that if the inner optimization problem is solved “well enough”, then the strategy performs well. It is, though, quite dependent on just how well this inner optimization problem is indeed solved: if only a poor approximate strategy is used to solve the inner maximization, then there is a good chance that a slightly more exhaustive inner optimization strategy will prove an effective attack. This is why the best current strategies are ones that explicitly solve the inner optimization problem (even approximately) as well as possible, making it as difficult as possible (though not impossible) for a subsequent strategy to simply “out optimize” the trained robustness.

Second, although in theory one could take just the worst-case perturbation as the point at which to compute the gradient, in practice this can cause oscillations in the training process, and it is often better to incorporate multiple perturbations with different random initializations, and potentially also a gradient based upon the initial point with no perturbation.

Finally, we should note that some robust training methodologies (specifically, those based upon upper bounds on the inner maximization problem) effectively do not require iteratively finding an adversarial point and then optimizing; instead, they produce a closed form bound on the inner maximization that can be computed non-iteratively. We will discuss these approaches a great deal more in the subsequent sections.

Final comments

Before moving on, we want to make one additional comment about the value of the robust optimization formulation of adversarial robustness. It is important to emphasize that every adversarial attack and defense is a method for approximately solving the inner maximization and/or the outer minimization problem, respectively. Even papers that do not express themselves this way are attempting to solve these problems (albeit with some potential differences, e.g., directly considering a different loss such as the 0/1 loss instead of the cross entropy loss).

In our opinion (and this particular paragraph should very much be regarded as the opinion of Zico and Aleksander), one notable challenge with the field is that many papers present an attack or defense in terms of the method it uses, rather than the problem (meaning, the optimization problem) it solves. This is how we get many different names for many different strategies that all consider fairly minor variants of the above optimization, such as considering different norm bounds for the $\Delta(x)$ term, using different optimization procedures to solve the inner maximization problem, or using seemingly very extravagant techniques to defend against attacks that often don’t seem to clearly relate to the optimization formulation at all. While it’s certainly possible that one such method could prove more effective than the best known strategies we have, the history of such more heuristic attack and defense strategies has not been good.

With all of this in mind, the agenda for the next chapters of this tutorial should hopefully be clear. In Chapter 2, we will first take a bit of a digression to show how all these issues play out in the case of linear models; perhaps not surprisingly, in the linear case, the inner maximization problem we discussed can be solved exactly (or very closely upper bounded), and we can make very strong statements about the performance of these models in adversarial settings. Next, in Chapter 3, we will return to the world of deep networks and look at the inner maximization problem, focusing on the three general classes of approaches that can be applied: 1) lower bounds (i.e., constructing the adversarial example), 2) exact solutions (via combinatorial optimization), and 3) upper bounds (usually with some more tractable strategy). In Chapter 4, we then turn to the problem of training adversarially robust models, which typically involves either adversarial training using the lower bounds, or “certified” robust training involving the upper bounds (adversarial training using the exact combinatorial solutions has not yet proven feasible). Finally, Chapter 5 returns to some of the bigger picture questions from this chapter, and more: here we discuss the value of adversarial robustness beyond the typical “security” justifications; instead, we consider adversarial robustness in the context of regularization, generalization, and the meaningfulness of the learned representations.