[Paper Notes] Content-based Unrestricted Adversarial Attack

Figure 2: The Adversarial Content Attack pipeline. Image Latent Mapping first maps the image into the latent space. Adversarial Latent Optimization then generates the adversarial example, which finally fools the target classification model.

3.1 Image Latent Mapping

For diffusion models, the simplest image mapping is the inverse of DDIM sampling. Given the conditional embedding $\mathcal{C}=\psi(\mathcal{P})$ of the prompt $\mathcal{P}$, the underlying ordinary differential equation can be reversed in the limit of small step sizes:

$$z_{t+1}=\sqrt{\frac{\alpha_{t+1}}{\alpha_t}}z_t+\sqrt{\alpha_{t+1}}\left(\sqrt{\frac{1}{\alpha_{t+1}}-1}-\sqrt{\frac{1}{\alpha_t}-1}\right)\cdot\epsilon_\theta(z_t,t,\mathcal{C}) \tag{2}$$

where $z_0$ is the given real image. The prompt describing the image is usually generated automatically by an image captioning model such as BLIP v2.
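
A single step of Eq. 2 can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: `ddim_move` is a hypothetical helper, the latent is a short list of floats, and the noise prediction `eps` is passed in as a fixed vector instead of coming from a real $\epsilon_\theta$ network. With the same `eps` in both directions the step is exactly reversible; with a real network the prediction at $z_{t+1}$ differs slightly from the one at $z_t$, which is why the inversion only holds for small steps.

```python
import math

def ddim_move(z, alpha_from, alpha_to, eps):
    """Deterministic DDIM update from noise level alpha_from to alpha_to.

    With alpha_from = alpha_t and alpha_to = alpha_{t+1} this is Eq. 2 (inversion);
    swapping the two levels gives the usual denoising direction.
    """
    scale = math.sqrt(alpha_to / alpha_from)
    shift = math.sqrt(alpha_to) * (
        math.sqrt(1.0 / alpha_to - 1.0) - math.sqrt(1.0 / alpha_from - 1.0)
    )
    return [scale * zi + shift * ei for zi, ei in zip(z, eps)]

# Toy values (assumptions for illustration only).
z_t = [0.5, -1.0, 2.0]
eps = [0.1, 0.2, -0.3]          # stands in for eps_theta(z_t, t, C)
alpha_t, alpha_next = 0.9, 0.8  # cumulative alphas at steps t and t+1

z_next = ddim_move(z_t, alpha_t, alpha_next, eps)     # inversion step (Eq. 2)
z_back = ddim_move(z_next, alpha_next, alpha_t, eps)  # denoising step with the same eps
```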

Given the guidance scale $w$ and the null-text embedding $\varnothing=\psi(\text{""})$, the classifier-free guidance prediction is:

$$\tilde{\epsilon}_\theta(z_t,t,\mathcal{C},\varnothing)=w\cdot\epsilon_\theta(z_t,t,\mathcal{C})+(1-w)\cdot\epsilon_\theta(z_t,t,\varnothing) \tag{3}$$

Stable Diffusion uses $w=7.5$. Because the noise used for denoising is predicted by $\epsilon_\theta$, every step carries a small error. Over many denoising steps these errors accumulate, break the Gaussian distribution of the noise, and produce unrealistic visual results.
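
The guidance rule of Eq. 3 is a per-element linear combination of two noise predictions. The sketch below uses plain lists in place of tensors, and `cfg_noise` is a hypothetical name chosen for this example. Setting `w = 1` removes the unconditional term entirely.

```python
def cfg_noise(eps_cond, eps_uncond, w=7.5):
    """Classifier-free guidance (Eq. 3): w * eps(C) + (1 - w) * eps(null)."""
    return [w * c + (1.0 - w) * u for c, u in zip(eps_cond, eps_uncond)]

# Toy predictions (assumptions for illustration only).
eps_cond = [1.0, -0.5]
eps_uncond = [0.0, 0.5]

guided = cfg_noise(eps_cond, eps_uncond, w=7.5)
cond_only = cfg_noise(eps_cond, eps_uncond, w=1.0)
```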

To reduce the accumulated error, the null-text embedding $\varnothing_t$ is optimized separately for each step $t$. First, DDIM inversion with $w=1$ produces a sequence of latents $\{z_0^*,\cdots,z_T^*\}$, where $z_0^*=z_0$. Then, for the timesteps $\{T,\cdots,1\}$, with $w=7.5$ and the initialization $\bar{z}_T=z_T^*$, the following objective is optimized for $N$ iterations:

$$\min_{\varnothing_t}\left\|z_{t-1}^*-z_{t-1}(\bar{z}_t,t,\mathcal{C},\varnothing_t)\right\|_2^2 \tag{4}$$

$$z_{t-1}(\bar{z}_t,t,\mathcal{C},\varnothing_t)=\sqrt{\frac{\alpha_{t-1}}{\alpha_t}}\bar{z}_t+\sqrt{\alpha_{t-1}}\left(\sqrt{\frac{1}{\alpha_{t-1}}-1}-\sqrt{\frac{1}{\alpha_t}-1}\right)\cdot\tilde{\epsilon}_\theta(\bar{z}_t,t,\mathcal{C},\varnothing_t) \tag{5}$$

At the end of each step, $\bar{z}_{t-1}$ is updated to $z_{t-1}(\bar{z}_t,t,\mathcal{C},\varnothing_t)$. The result is a latent-space representation of the original image that consists of the noise $\bar{z}_T$, the null-text embeddings $\varnothing_t$, and the text embedding $\mathcal{C}=\psi(\mathcal{P})$.
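
The two-stage procedure can be illustrated on a scalar toy problem. Everything below is an assumption made for the sketch: the latent and the embeddings are single floats, `toy_eps` is a hypothetical linear stand-in for $\epsilon_\theta$, the five-entry `alphas` schedule is invented, and the gradient is written out by hand because the toy predictor is linear. A real implementation would use a text-conditioned U-Net and automatic differentiation.

```python
import math

def ddim_move(z, a_from, a_to, eps):
    """Deterministic DDIM update between two noise levels (the form of Eq. 2 and Eq. 5)."""
    return (math.sqrt(a_to / a_from) * z
            + math.sqrt(a_to) * (math.sqrt(1 / a_to - 1) - math.sqrt(1 / a_from - 1)) * eps)

def toy_eps(z, emb):
    """Hypothetical stand-in for eps_theta: linear in the latent and the embedding."""
    return 0.3 * z + emb

def guided_eps(z, cond, null, w):
    """Classifier-free guidance (Eq. 3)."""
    return w * toy_eps(z, cond) + (1 - w) * toy_eps(z, null)

alphas = [1.0, 0.9, 0.7, 0.5, 0.3]  # toy alpha_0..alpha_T, so T = 4
T = len(alphas) - 1
cond, w, n_iters, lr = 0.8, 7.5, 50, 0.1

# Stage 1: DDIM inversion with w = 1 yields the pivot latents z*_0..z*_T.
z_star = [1.5]
for t in range(T):
    eps = guided_eps(z_star[t], cond, 0.0, 1.0)
    z_star.append(ddim_move(z_star[t], alphas[t], alphas[t + 1], eps))

# Stage 2: for t = T..1, fit one null embedding per step (Eq. 4) with w = 7.5.
z_bar, nulls = z_star[T], {}
for t in range(T, 0, -1):
    a_t, a_prev = alphas[t], alphas[t - 1]
    # d z_{t-1} / d null: the update is linear in eps, and d eps / d null = 1 - w here.
    k = math.sqrt(a_prev) * (math.sqrt(1 / a_prev - 1) - math.sqrt(1 / a_t - 1)) * (1 - w)
    null = 0.0
    for _ in range(n_iters):
        pred = ddim_move(z_bar, a_t, a_prev, guided_eps(z_bar, cond, null, w))
        null -= lr * 2 * (pred - z_star[t - 1]) * k  # gradient step on the squared error
    nulls[t] = null
    # End of step t: update z_bar_{t-1} with the optimized null embedding.
    z_bar = ddim_move(z_bar, a_t, a_prev, guided_eps(z_bar, cond, null, w))

reconstruction_error = abs(z_bar - z_star[0])
```

In this toy setting each step has one unknown and one target, so the optimized null embeddings let the guided trajectory land back on $z_0$ almost exactly.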

3.2 Adversarial Latent Optimization

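This section proposes an optimization method on the latents that maximizes attack performance on unrestricted adversarial examples. In the latent space obtained by Image Latent Mapping, the null-text embeddings $\varnothing_t$ ensure the quality of the reconstructed image, while the conditional embedding $\mathcal{C}$ preserves its semantic information. Optimizing both embeddings at once is impractical. Since the noise $\bar{z}_T$ largely represents the image information in the latent space, $\bar{z}_T$ is chosen as the optimization target. Two challenges remain: the gradient computation is complex, and the values can overflow their valid range.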

Starting from the latents produced by Image Latent Mapping, the denoising process of Eq. 5 is defined as $\Omega(\cdot)$, which consists of $T$ iterations:

$$\Omega(z_T,T,\mathcal{C},\{\varnothing_t\}_{t=1}^T)=z_0(z_1(\cdots(z_{T-1},T-1,\mathcal{C},\varnothing_{T-1}),\cdots,1,\mathcal{C},\varnothing_1),0,\mathcal{C},\varnothing_0) \tag{6}$$

The reconstruction can therefore be written as $\bar{z}_0=\Omega(z_T,T,\mathcal{C},\{\varnothing_t\})$, and the adversarial objective becomes:

$$\max_\delta \mathcal{L}(\mathcal{F}_\theta(\bar{z}_0),y),\quad \text{s.t. } \|\delta\|_\infty\leq\kappa,\ \bar{z}_0=\Omega(z_T+\delta,T,\mathcal{C},\{\varnothing_t\}) \tag{7}$$

where $\bar{z}_0$ is the natural image and $\delta$ is the adversarial perturbation in the latent space.
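
As a rough sketch of how $\Omega$ and the constraint on $\delta$ fit together, the code below chains the update of Eq. 5 for $t = T,\dots,1$ and feeds it a clipped perturbation. All names and values are assumptions for illustration: latents are scalars, `toy_eps` is a hypothetical linear noise predictor, the null embeddings are indexed by the step $t$ at which they are used, and `clip_linf` enforces the bound $\|\delta\|_\infty\leq\kappa$ for a single coordinate.

```python
import math

def ddim_move(z, a_from, a_to, eps):
    """Deterministic DDIM update between two noise levels (the form of Eq. 5)."""
    return (math.sqrt(a_to / a_from) * z
            + math.sqrt(a_to) * (math.sqrt(1 / a_to - 1) - math.sqrt(1 / a_from - 1)) * eps)

def toy_eps(z, emb):
    """Hypothetical stand-in for eps_theta."""
    return 0.3 * z + emb

def omega(z_T, alphas, cond, nulls, w=7.5):
    """Chain the denoising update for t = T..1 and return the reconstruction z_bar_0."""
    z = z_T
    for t in range(len(alphas) - 1, 0, -1):
        eps = w * toy_eps(z, cond) + (1 - w) * toy_eps(z, nulls[t])  # Eq. 3
        z = ddim_move(z, alphas[t], alphas[t - 1], eps)
    return z

def clip_linf(delta, kappa):
    """Keep one coordinate of delta inside [-kappa, kappa]."""
    return max(-kappa, min(kappa, delta))

alphas = [1.0, 0.9, 0.7, 0.5, 0.3]          # toy schedule, T = 4
nulls = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}    # placeholder null embeddings
z_T, kappa = 0.4, 0.1

clean = omega(z_T, alphas, 0.8, nulls)
delta = clip_linf(0.25, kappa)               # a too-large proposal gets clipped to kappa
adversarial = omega(z_T + delta, alphas, 0.8, nulls)
```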

The loss function has two parts:

Cross-entropy loss $\mathcal{L}_{ce}$, which guides the adversarial example to mislead the classifier;
Mean squared error loss $\mathcal{L}_{mse}$, which keeps the adversarial example as close as possible to the real clean sample in $l_2$ distance.

The complete loss function $\mathcal{L}$ is therefore:

$$\mathcal{L}(\mathcal{F}_\theta(\bar{z}_0),y,z_0)=\mathcal{L}_{ce}(\mathcal{F}_\theta(\bar{z}_0),y)-\beta\cdot\mathcal{L}_{mse}(\bar{z}_0,z_0) \tag{8}$$

In the paper $\beta=0.1$. The loss aims to maximize the cross-entropy loss while minimizing the $l_2$ distance to the clean sample. As for the consistency between $z_0$ and $\bar{z}_0$, the assumption is that a small $\delta$ (that is, $\|\delta\|_\infty\leq\kappa$) does not change it, so the key is to find the $\delta$ that produces the largest classification loss.
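
The combined loss above can be written down directly. The sketch below is a plain-Python illustration on lists of floats; `cross_entropy`, `mse`, and `attack_objective` are hypothetical helper names, and the logits would normally come from the target classifier $\mathcal{F}_\theta$. Since the attacker maximizes this quantity, moving away from the clean sample lowers the objective by $\beta$ times the squared distance.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]

def mse(a, b):
    """Mean squared error between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def attack_objective(logits, label, z_adv, z_clean, beta=0.1):
    """L_ce - beta * L_mse, the quantity the attacker maximizes."""
    return cross_entropy(logits, label) - beta * mse(z_adv, z_clean)

# Same classifier output, two different distances to the clean sample (toy values).
near = attack_objective([2.0, 0.5], 0, [0.1, 0.2], [0.1, 0.2])
far = attack_objective([2.0, 0.5], 0, [1.1, 1.2], [0.1, 0.2])
```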

As in conventional adversarial attacks, a gradient-based technique estimates the perturbation $\delta$ via $\delta\simeq\eta\nabla_{z_T}\mathcal{L}(\mathcal{F}_\theta(\bar{z}_0),y)$, where $\eta$ is the magnitude of the perturbation along the gradient direction. Expanding $\nabla_{z_T}\mathcal{L}(\mathcal{F}_\theta(\bar{z}_0),y)$ with the chain rule gives the following product of derivatives:

$$\nabla_{z_T}\mathcal{L}(\mathcal{F}_\theta(\bar{z}_0),y)=\frac{\partial\mathcal{L}}{\partial\bar{z}_0}\cdot\frac{\partial\bar{z}_0}{\partial z_1}\cdot\frac{\partial z_1}{\partial z_2}\cdots\frac{\partial z_{T-1}}{\partial z_T} \tag{9}$$

Skip Gradient
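Although every term is differentiable, building the full computation graph for this expression is infeasible.

$\frac{\partial\mathcal{L}}{\partial\bar{z}_0}$ is the derivative of the classifier loss with respect to the reconstructed image $\bar{z}_0$ and provides the adversarial gradient direction.
$\frac{\partial z_t}{\partial z_{t+1}}$: each of these derivatives requires one backpropagation pass.
A complete denoising process accumulates $T$ of these passes.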

The paper proposes Skip Gradient to estimate $\frac{\partial\mathcal{L}}{\partial\bar{z}_0}\cdot\frac{\partial\bar{z}_0}{\partial z_1}\cdot\frac{\partial z_1}{\partial z_2}\cdots\frac{\partial z_{T-1}}{\partial z_T}$. The denoising process aims to remove the Gaussian noise added during DDIM sampling, and DDIM uses the reparameterization trick to sample in closed form at any step $t$:

$$z_t=\sqrt{\alpha_t}z_0+\sqrt{1-\alpha_t}\,\varepsilon,\quad \varepsilon\sim\mathcal{N}(0,I) \tag{10}$$

Rearranging Eq. 10 gives $z_0=\frac{1}{\sqrt{\alpha_t}}z_t-\sqrt{\frac{1-\alpha_t}{\alpha_t}}\varepsilon$, and hence $\frac{\partial z_0}{\partial z_t}=\frac{1}{\sqrt{\alpha_t}}$. In Stable Diffusion the timestep $t$ is at most 1000, so $\lim_{t\rightarrow 1000}\frac{\partial z_0}{\partial z_t}=\lim_{t\rightarrow 1000}\frac{1}{\sqrt{\alpha_t}}\approx 14.58$. In short, $\frac{\partial z_0}{\partial z_t}$ can be treated as a constant $\rho$, and Eq. 9 becomes $\nabla_{z_T}\mathcal{L}(\mathcal{F}_\theta(\bar{z}_0),y)=\rho\frac{\partial\mathcal{L}}{\partial\bar{z}_0}$. Skip Gradient thus approximates the gradient of the denoising process while reducing compute and memory requirements.
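
The size of $\rho$ can be checked numerically. The sketch below assumes the scaled-linear beta schedule commonly used with Stable Diffusion v1 (betas from 0.00085 to 0.012 over 1000 training steps); these numbers are an assumption of this example, not something stated in the notes. Under that assumption, $1/\sqrt{\alpha_t}$ at the last timestep lands in the same range as the 14.58 quoted above. `skip_gradient` is a hypothetical helper showing that the approximation reduces to scaling the classifier gradient by a constant.

```python
import math

def scaled_linear_alphas_cumprod(beta_start=0.00085, beta_end=0.012, steps=1000):
    """Cumulative product of (1 - beta_t) for a scaled-linear beta schedule (assumed values)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    prod, out = 1.0, []
    for i in range(steps):
        beta = (lo + (hi - lo) * i / (steps - 1)) ** 2
        prod *= 1.0 - beta
        out.append(prod)
    return out

def skip_gradient(grad_wrt_z0, rho):
    """Approximate the gradient w.r.t. z_T by rho * dL/dz_bar_0, skipping the denoising chain."""
    return [rho * g for g in grad_wrt_z0]

alphas_cumprod = scaled_linear_alphas_cumprod()
rho = 1.0 / math.sqrt(alphas_cumprod[-1])   # dz_0/dz_t at the last timestep

approx_grad = skip_gradient([0.5, -1.0], rho)
```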

Differentiable Boundary Processing
The diffusion model does not strictly constrain the value range of $\bar{z}_0$, so modifying $z_T$ can push values outside the valid range. Differentiable boundary processing $\varrho(\cdot)$ is introduced for this reason. $\varrho(\cdot)$ compresses values outside $[0,1]$ back into $[0,1]$:

$$\varrho(x)=\begin{cases}\tanh(1000x)/10000 & x<0 \\ x & 0\leq x\leq 1 \\ \tanh(1000(x-1))/10001 & x>1\end{cases} \tag{11}$$
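
A scalar transcription of Eq. 11 is sketched below. The branch constants are copied verbatim from the formula in these notes, and `boundary` is a hypothetical name; a real implementation would apply the same piecewise rule element-wise to an image tensor so that gradients can flow through it.

```python
import math

def boundary(x):
    """Piecewise boundary function of Eq. 11 for a single value."""
    if x < 0:
        return math.tanh(1000 * x) / 10000
    if x > 1:
        return math.tanh(1000 * (x - 1)) / 10001
    return x  # values already inside [0, 1] pass through unchanged

inside = boundary(0.5)
below = boundary(-1.0)   # far below the range: tanh saturates at -1
above = boundary(2.0)    # far above the range: tanh saturates at +1
```
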
Next, define $\Pi_\kappa$ as the projection of the adversarial perturbation $\delta$ onto the $\kappa$-ball. With momentum $g$, the adversarial latent is optimized as:

$$g_k\leftarrow \mu\cdot g_{k-1}+\frac{\nabla_{z_T}\mathcal{L}(\mathcal{F}_\theta(\varrho(\bar{z}_0)),y)}{\left\|\nabla_{z_T}\mathcal{L}(\mathcal{F}_\theta(\varrho(\bar{z}_0)),y)\right\|_1} \tag{12}$$

$$\delta_k\leftarrow\Pi_\kappa(\delta_{k-1}+\eta\cdot\text{sign}(g_k)) \tag{13}$$
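
One iteration of Eq. 12 and Eq. 13 can be sketched as follows. The function name `momentum_step` and the values chosen for `mu`, `eta`, and `kappa` are assumptions for illustration, the gradient is passed in as a plain list, and the projection $\Pi_\kappa$ is implemented as element-wise clipping to $[-\kappa,\kappa]$, which is the projection onto an $l_\infty$ ball.

```python
def momentum_step(delta, g_prev, grad, mu=1.0, eta=0.04, kappa=0.05):
    """One momentum update (Eq. 12) followed by a projected sign step (Eq. 13)."""
    l1 = sum(abs(v) for v in grad) or 1.0           # guard against an all-zero gradient
    g = [mu * gp + gv / l1 for gp, gv in zip(g_prev, grad)]
    sign = [(v > 0) - (v < 0) for v in g]           # -1, 0 or +1 per coordinate
    stepped = [d + eta * s for d, s in zip(delta, sign)]
    projected = [max(-kappa, min(kappa, v)) for v in stepped]
    return projected, g

# Two iterations with the same toy gradient (assumed values).
grad = [3.0, -1.0]
delta, g = momentum_step([0.0, 0.0], [0.0, 0.0], grad)
delta2, g2 = momentum_step(delta, g, grad)
```

After the second iteration the raw step would reach 0.08 in magnitude, so the projection clips both coordinates back to the bound of 0.05.
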
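In summary, Adversarial Latent Optimization uses Skip Gradient to obtain the gradient of the denoising process, applies differentiable boundary processing to keep the adversarial example within its value range, and optimizes iteratively along the gradient. Combined with Image Latent Mapping, the full Adversarial Content Attack procedure is given in Algorithm 1.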
