Definition of optimal witness: start by recalling that an entanglement witness $W^{\rm opt}$ is said to be optimal if there is no other witness $W'$ which detects a strict superset of the entangled states detected by $W^{\rm opt}$. That is, there is no $W'$ such that $\langle\rho,W^{\rm opt}\rangle<0\implies \langle\rho,W'\rangle<0$ for all $\rho$, and such that there is some $\sigma$ with $\langle\sigma,W'\rangle<0\le \langle\sigma,W^{\rm opt}\rangle$.
Note that I'll use the shorthand notation $\langle A,B\rangle\equiv \operatorname{tr}(A^\dagger B)$ throughout this answer, as I think it better highlights some of the geometric ideas involved in these calculations.
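As a concrete illustration of this notation, here is a minimal NumPy sketch. The choice of the two-qubit swap operator as the witness is my own example, not something taken from the question: its expectation value on a product state $|a,b\rangle$ is $|\langle a|b\rangle|^2\ge0$, while it equals $-1$ on the singlet. The helper names `hs` and `projector` are hypothetical, and the random sampling of product states is only a sanity check, not a proof.

```python
import numpy as np

def hs(A, B):
    """Hilbert-Schmidt inner product <A, B> = tr(A^dagger B)."""
    return np.trace(A.conj().T @ B)

def projector(psi):
    """Rank-one projector |psi><psi| for a normalised vector psi."""
    psi = np.asarray(psi, dtype=complex)
    return np.outer(psi, psi.conj())

# Two-qubit swap operator in the basis |00>, |01>, |10>, |11>.
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

# Singlet (|01> - |10>)/sqrt(2): an entangled state detected by SWAP.
singlet = projector(np.array([0, 1, -1, 0]) / np.sqrt(2))
singlet_value = hs(singlet, SWAP).real

rng = np.random.default_rng(0)

def random_qubit():
    """Random normalised single-qubit vector."""
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)

# Expectation values on random product states |a>|b>; each is |<a|b>|^2.
product_values = [
    hs(projector(np.kron(random_qubit(), random_qubit())), SWAP).real
    for _ in range(1000)
]
min_product_value = min(product_values)

print(singlet_value, min_product_value)
```

A negative value of `hs(rho, W)` is what "the witness detects $\rho$" means in the rest of this answer.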
Prove that if $W-P$ is a witness then $W$ isn't optimal
Subtracting a PSD operator can't do worse —
Let $W$ be an entanglement witness, and let $P\ge0$ be a PSD operator such that $W-P$ is still an entanglement witness.
It's easy to see that $W-P$ is at least as strong a witness as $W$: since $\langle P,\rho\rangle\ge0$ for every state $\rho$, $\langle W,\rho\rangle<0$ implies $\langle W-P,\rho\rangle<0$. We want to prove that $W-P$ is strictly stronger than $W$, that is, that there is some entangled state $\eta^e$ such that $\langle W,\eta^e\rangle\ge0$ but $\langle W-P,\eta^e\rangle<0$.
Proof idea — One way to prove this is to show that there is always some state $\eta$ such that $\langle W,\eta\rangle=0$ and $\langle P,\eta\rangle>0$. If you can find this state, then it immediately follows that $W-P$ detects it while $W$ doesn't, hence $W$ isn't optimal.
Prove existence of an entangled state needing $P$ to be witnessed —
Note that unless $P=0$, there must be some state $\eta$ such that $\langle P,\eta\rangle>0$. Let's then consider the expectation value of $W$ on this state, and distinguish between the following three possibilities:
- If $\langle W,\eta\rangle=0$, then $\langle W-P,\eta\rangle<0$, and thus $\eta$ is entangled, and detected as such by $W-P$ but not by $W$. Thus $W-P$ is stronger than $W$, and $W$ isn't optimal.
- If $\langle W,\eta\rangle<0$, let $\eta^+$ be some state such that $\langle W,\eta^+\rangle>0$ (which, again, must exist if $W\neq0$). But then, there must be some $p\in(0,1)$ such that
$$\bar\eta = p \eta+(1-p)\eta^+$$
satisfies $\langle W,\bar\eta\rangle=0$. Furthermore, because $P\ge0$ and $p>0$, we still have $\langle P,\bar\eta\rangle\ge p\langle P,\eta\rangle>0$, hence again $W$ isn't optimal.
- If $\langle W,\eta\rangle>0$, let $\eta^-$ be some (necessarily entangled) state such that $\langle W,\eta^-\rangle<0$. Then as above, there must be some $p\in(0,1)$ such that
$$\bar\eta = p\eta + (1-p)\eta^-$$
gives $\langle W,\bar\eta\rangle=0$, and still $\langle P,\bar\eta\rangle>0$, hence again the conclusion.
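The mixing argument in the last two cases can be checked numerically. The sketch below uses a toy example of my own choosing (not from the question): $W=\mathrm{SWAP}+P$ with $P=|00\rangle\langle00|$, so that $W-P=\mathrm{SWAP}$ is still a witness. Starting from $\eta=|00\rangle\langle00|$, which falls in the third case, and $\eta^-$ the singlet, linearity gives $p=\langle W,\eta^-\rangle/(\langle W,\eta^-\rangle-\langle W,\eta\rangle)$. The helper names `hs` and `projector` are hypothetical.

```python
import numpy as np

def hs(A, B):
    """Hilbert-Schmidt inner product <A, B> = tr(A^dagger B)."""
    return np.trace(A.conj().T @ B)

def projector(psi):
    """Rank-one projector |psi><psi| for a normalised vector psi."""
    psi = np.asarray(psi, dtype=complex)
    return np.outer(psi, psi.conj())

SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

ket00 = np.array([1, 0, 0, 0])
P = projector(ket00)   # PSD operator to be subtracted
W = SWAP + P           # toy witness; W - P = SWAP is still a witness

eta = projector(ket00)                                      # <P, eta> > 0 and <W, eta> > 0
eta_minus = projector(np.array([0, 1, -1, 0]) / np.sqrt(2))  # singlet, <W, eta_minus> < 0

w_eta = hs(W, eta).real
w_minus = hs(W, eta_minus).real

# Solve p * w_eta + (1 - p) * w_minus = 0 for the mixing weight p.
p = w_minus / (w_minus - w_eta)
eta_bar = p * eta + (1 - p) * eta_minus

w_bar = hs(W, eta_bar).real        # sits exactly on the boundary of W
p_bar = hs(P, eta_bar).real        # still strictly positive
wp_bar = hs(W - P, eta_bar).real   # negative: detected by W - P only

print(p, w_bar, p_bar, wp_bar)
```

In this example $\bar\eta$ is not detected by $W$ but is detected by $W-P$, which is exactly the state the proof asks for.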
For more related results you might want to have a look at (Lewenstein et al. 2000, quant-ph/0005014). For example, they show in Lemma 2 that an entanglement witness $W_2$ is finer than $W_1$ iff there's $P\ge0$ with $\operatorname{tr}(P)=1$ and $0\le \epsilon<1$ such that $W_1=(1-\epsilon)W_2+\epsilon P$. Soon after, they show that a witness $W$ is optimal iff for all $P\ge0$ and $\epsilon>0$ the operator $(1+\epsilon)W-\epsilon P$ isn't a witness.
$W$ is optimal if no $P\ge0$ can be subtracted from it
This is an intermediate step that will help prove the next result. It's essentially a rewording of what is proved in Lemma 2 and Theorem 1 of https://arxiv.org/abs/quant-ph/0005014.
They show in Lemma 2 that $W_2$ is finer than $W_1$ iff there's $P\ge0$ with $\operatorname{tr}(P)=1$ and $\epsilon\in[0,1)$ such that $W_1=(1-\epsilon)W_2+\epsilon P$. But observing that any $W$ is a witness iff $\alpha W$ is a witness for any $\alpha>0$, this is the same as saying that $W_2$ is finer than $W_1$ iff it's a positive multiple of $W_1-P$ for some $P\ge0$ with $\operatorname{tr}(P)\in[0,1)$.
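To spell out the rescaling step, solve the relation from Lemma 2 for $W_2$ (the symbol $P'$ is just a label I'm introducing here for the rescaled operator):

$$W_2=\frac{1}{1-\epsilon}\left(W_1-\epsilon P\right)=\frac{1}{1-\epsilon}\left(W_1-P'\right),\qquad P'\equiv\epsilon P\ge0,\quad \operatorname{tr}(P')=\epsilon\in[0,1).$$

The prefactor $1/(1-\epsilon)$ is positive, so it doesn't change which states are detected.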
So in summary, they're saying that all witnesses finer than $W_1$ must have (modulo positive multiples) the form $W_1-P$ for some $P\ge0$.
In turn, this means that if $W-P$ isn't a witness for any nonzero $P\ge0$, then $W$ must be optimal.
Let's try to more directly show that if $W'$ is a finer entanglement witness than $W$, then $\alpha W'=W-P$ for some $\alpha>0$ and some $P\ge0$.
This amounts to proving that any such $W'$ is such that $W-\alpha W'\ge0$ for some $\alpha>0$, that is, that there is $\alpha>0$ such that
$$\langle W,\rho\rangle\ge \alpha \langle W',\rho\rangle$$
for all states $\rho$. To show this, let's consider a few different cases:
- If $\langle W,\rho\rangle=0$, then $\langle W',\rho\rangle\le0$, and thus the inequality is satisfied. This follows from Lemma 1(i) in the paper.
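- If $\langle W,\rho\rangle<0$, then $\langle W',\rho\rangle\le\langle W,\rho\rangle$ by Lemma 1(ii) in the paper, which gives the inequality for $\alpha=1$. For $\alpha=1/\lambda<1$, with $\lambda$ as in the next case, one needs the slightly stronger bound $\langle W',\rho\rangle\le\lambda\langle W,\rho\rangle$. This holds when $\lambda$ is the infimum of $\langle W',\rho\rangle/\langle W,\rho\rangle$ over the states detected by $W$, which, as I understand it, is how the paper defines $\lambda$.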
- If $\langle W,\rho\rangle>0$ then $\langle W',\rho\rangle\le\lambda\langle W,\rho\rangle$ for some fixed $\lambda\ge1$ that only depends on $W$ and $W'$. This is Lemma 1(iii) in the paper.
We thus have the result with $\alpha=1/\lambda$.
Optimality vs product states such that $\langle e, f|W|e,f\rangle=0$
The other point is again discussed in https://arxiv.org/abs/quant-ph/0005014. Define
$$P_W\equiv \{|e,f\rangle : \,\, \langle W,\mathbb{P}_e\otimes\mathbb{P}_f\rangle=0\}.$$
Here $\mathbb{P}_e\equiv|e\rangle\!\langle e|$ denotes the projector onto $|e\rangle$. They then show in the paper that:
(Lemma 3) If $P\ge0$ is such that $PP_W\neq0$, then $W-P$ isn't a witness. In other words, if there's $|e,f\rangle\in P_W$ such that $P|e,f\rangle\neq0$, then $W-P$ isn't a witness. This is immediate: for such a state we have $\langle W,\mathbb{P}_e\otimes \mathbb{P}_f\rangle=0$ and $\langle P,\mathbb{P}_e\otimes \mathbb{P}_f\rangle>0$, and thus $\langle W-P,\mathbb{P}_e\otimes\mathbb{P}_f\rangle<0$ on a product state, which is impossible for a witness.
(Corollary 2) If $P_W$ spans the whole space, then $W$ is optimal.
Using the above results this is now also easy to see: if $P_W$ spans the space, then no nonzero $P\ge0$ can annihilate every element of $P_W$, so by Lemma 3 $W-P$ isn't a witness for any nonzero $P\ge0$. Thus $W$ must be optimal.
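As a sanity check of the spanning criterion, here is a small NumPy sketch for an example of my own choosing, the two-qubit swap operator. For this $W$ we have $\langle e,f|W|e,f\rangle=|\langle e|f\rangle|^2$, so $P_W$ consists of the product vectors $|e,f\rangle$ with $\langle e|f\rangle=0$. The code picks four such vectors (the particular selection is an assumption made for illustration) and checks that they are linearly independent.

```python
import numpy as np

SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

zero = np.array([1, 0], dtype=complex)
one = np.array([0, 1], dtype=complex)
plus = (zero + one) / np.sqrt(2)
minus = (zero - one) / np.sqrt(2)
plus_i = (zero + 1j * one) / np.sqrt(2)
minus_i = (zero - 1j * one) / np.sqrt(2)

# Pairs (e, f) of orthogonal single-qubit states, so that |e,f> is in P_W.
pairs = [(zero, one), (one, zero), (plus, minus), (plus_i, minus_i)]
vectors = [np.kron(e, f) for e, f in pairs]

# Each expectation value <e,f|W|e,f> should vanish (np.vdot conjugates
# its first argument).
expectations = [np.vdot(v, SWAP @ v).real for v in vectors]

# Rank of the matrix whose rows are the product vectors: 4 means they
# span the full two-qubit space.
rank = np.linalg.matrix_rank(np.array(vectors))

print(expectations, rank)
```

If the rank equals the dimension of the space, as it should here, Corollary 2 applies and the witness is optimal.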