Great question. Here are some details to complement the answer by @tristan-nemoz and to explain why the diffusion operator works in the first place.
As you correctly stated, the job of the diffusion operator is to "reflect" the probability amplitudes about the mean. So, in the case of two qubits ($n = 2$), this is how Grover's algorithm will go:
- Place qubits in equal superposition:
$$ 
\begin{aligned}
|\psi_1\rangle &= \frac{1}{\sqrt{2^n}} \sum_{i = 0}^{2^n - 1} |i\rangle 
\\
\\
&= \frac{1}{2}\left(|00\rangle + |01\rangle + |10\rangle + |11\rangle\right) 
\end{aligned}
$$
If we graph the probability amplitudes for each state, it will look like this:

- Apply the oracle $U_f$ to "flip" the sign of the probability amplitude of the marked element (since you mentioned you know how this works and how to construct the circuit, I won't go into details). For argument's sake, let's assume the marked state is $|01\rangle$:
$$ |\psi_2\rangle = \frac{1}{2}\left(|00\rangle - |01\rangle + |10\rangle + |11\rangle\right) $$

- "Flip" all probability amplitudes $\alpha_i$ about the mean. So, for this, we need to compute the mean $\mu$, which is easily done by adding all probability amplitudes and dividing by the total number of elements ($2^n$):
$$ 
\begin{aligned}
\mu &= \frac{1}{2^n}\sum_{i = 0}^{2^n - 1} \alpha_i
\\
\\
&= \frac{1}{4}\left(\frac{1}{2} - \frac{1}{2} + \frac{1}{2} + \frac{1}{2}\right) = \frac{1}{4}
\end{aligned}
$$
Notice that the distance between the current probability amplitudes $\alpha_i$ and the mean $\mu$ is simply $\alpha_i - \mu$, which pictorially looks like this:

It is important to note that in the figure above we haven't changed anything with respect to step 2; we're just redrawing the amplitudes starting from the mean.
Now, after the reflection about the mean, the graph will look like this instead:

Each probability amplitude is still offset from the mean by $\alpha_i - \mu$, but now in the opposite direction. So the question is, what is the new value of each probability amplitude (i.e., what is the distance from the $0$ line to the new $\alpha_i$)? In general, that is the distance from $0$ to the mean line, $\mu$, minus the offset of the old amplitude from the mean line, $\alpha_i - \mu$, so we have:
$$
\begin{aligned}
\alpha_i^{(new)} &= \mu - (\alpha_i^{(old)} - \mu)
\\
\\
\alpha_i^{(new)} &= 2\mu - \alpha_i^{(old)},
\end{aligned}
$$
where $\alpha_i^{(old)}$ and $\alpha_i^{(new)}$ are the values of each probability amplitude before and after the reflection about the mean, respectively.
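
As a quick sanity check of the formula above, here is a minimal sketch that applies $2\mu - \alpha_i$ to the two-qubit amplitudes from step 2. The function name `reflect_about_mean` is just an illustrative helper, and exact fractions are used to avoid floating-point noise:

```python
from fractions import Fraction


def reflect_about_mean(amplitudes):
    """Return 2*mu - alpha_i for every amplitude alpha_i."""
    mu = sum(amplitudes) / len(amplitudes)
    return [2 * mu - a for a in amplitudes]


half = Fraction(1, 2)
# Amplitudes of |00>, |01>, |10>, |11> after the oracle marks |01>.
psi_2 = [half, -half, half, half]
psi_3 = reflect_about_mean(psi_2)
print([str(a) for a in psi_3])
```

With $\mu = 1/4$, the marked amplitude becomes $2 \cdot \tfrac{1}{4} + \tfrac{1}{2} = 1$ and every other amplitude becomes $2 \cdot \tfrac{1}{4} - \tfrac{1}{2} = 0$, so a single iteration is enough for $n = 2$.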

So, the column vector with all new probability amplitudes is given by:
$$ \begin{bmatrix}\alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{2^n-1} \end{bmatrix}^{(new)} = 2\mu \begin{bmatrix}1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} - \begin{bmatrix}\alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{2^n-1} \end{bmatrix}^{(old)}.$$
Note that one can compute the mean of a column vector by multiplying it from the left by an "all ones" row vector and dividing by the number of entries:
$$\mu = \frac{1}{2^n}\begin{bmatrix}1 & 1 & \dots & 1 \end{bmatrix} \begin{bmatrix}\alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{2^n-1} \end{bmatrix} = \frac{1}{2^n}(\alpha_0 + \alpha_1 + \dots + \alpha_{2^n-1})$$
Substituting this expression for the mean into every entry of the new amplitude vector turns the $2\mu$ term into an "all ones" matrix acting on the old vector:
$$ 
\begin{aligned}
\begin{bmatrix}\alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{2^n-1} \end{bmatrix}^{(new)} = 2\frac{1}{2^n}\begin{bmatrix} 1 & 1 & \dots & 1 \\ 1 & 1 & \dots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \dots & 1  \end{bmatrix}\begin{bmatrix}\alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{2^n-1} \end{bmatrix}^{(old)} - \begin{bmatrix}\alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{2^n-1} \end{bmatrix}^{(old)}.
\end{aligned}
$$
Then, we can factor out the old probability amplitude vector to find the unitary $U_s$ that produces the new probability amplitude vector (it is the difference of matrices inside the parentheses):
$$ 
\begin{aligned}
\begin{bmatrix}\alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{2^n-1} \end{bmatrix}^{(new)} = \left(\underbrace{\frac{2}{2^n}\begin{bmatrix} 1 & 1 & \dots & 1 \\ 1 & 1 & \dots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \dots & 1  \end{bmatrix}}_{2| s \rangle \langle s |} - \underbrace{\begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1  \end{bmatrix}}_I \right)\begin{bmatrix}\alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{2^n-1} \end{bmatrix}^{(old)}.
\end{aligned}
$$
where $|s\rangle$ is the equal superposition state and $I$ the identity matrix. So we have:
$$ U_s = 2| s \rangle \langle s | - I .$$
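
To see that this matrix really does the reflection about the mean, here is a small NumPy sketch for $n = 2$ (the variable names are my own). It builds $2|s\rangle\langle s| - I$, applies it to the amplitudes from step 2, and compares the result with $2\mu - \alpha_i$:

```python
import numpy as np

n = 2
N = 2 ** n

s = np.full(N, 1 / np.sqrt(N))        # equal superposition |s>
U_s = 2 * np.outer(s, s) - np.eye(N)  # 2|s><s| - I

alpha_old = np.array([0.5, -0.5, 0.5, 0.5])  # amplitudes after the oracle
mu = alpha_old.mean()
alpha_new = U_s @ alpha_old

# The matrix form agrees with the entry-wise formula 2*mu - alpha_i.
matches_formula = np.allclose(alpha_new, 2 * mu - alpha_old)
# U_s is real and symmetric, so unitarity reduces to U_s @ U_s.T == I.
is_unitary = np.allclose(U_s @ U_s.T, np.eye(N))
print(matches_formula, is_unitary)
```

The unitarity check matters because only unitary matrices can be implemented as quantum gates.
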
To generate the equal superposition state, one takes the all-zeros state and applies a Hadamard to each qubit: $| s \rangle = H^{\otimes n}| 0 \rangle^{\otimes n}$.
So, replacing $|s\rangle$ (and $\langle s |$) in the operator expression, you get:
$$ U_s = 2 \, H |0\rangle  \langle 0| H - I,$$
where I have omitted the $^{\otimes n}$ superscript for the $H$ gates (and all-zeros state) for convenience.
Now, since $HH = I$, we can sandwich the identity in the expression above without changing the result:
$$
\begin{aligned}
U_s &= 2 \, H |0\rangle  \langle 0| H - H \, I \, H
\\
\\
U_s &= H \left ( 2 |0\rangle  \langle 0| - I \right) H
\end{aligned}
$$
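
If you want to convince yourself of this factorization numerically, the following sketch (two qubits, NumPy, names chosen for illustration) compares both sides:

```python
import numpy as np

N = 4
H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
H = np.kron(H1, H1)  # Hadamard on both qubits

zero = np.zeros(N)
zero[0] = 1.0        # |00>
s = H @ zero         # |s> = H|00>

lhs = 2 * np.outer(s, s) - np.eye(N)                   # 2|s><s| - I
rhs = H @ (2 * np.outer(zero, zero) - np.eye(N)) @ H   # H (2|0><0| - I) H
same = np.allclose(lhs, rhs)
print(same)
```
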
Next, consider the matrix representation of the expression in between the $H$ gates above:
$$
\begin{aligned}
2 |0\rangle  \langle 0| - I = & \, \begin{bmatrix} 2 & 0 & \dots & 0 \\ 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 0 \end{bmatrix} - \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{bmatrix}
\\
\\
2 |0\rangle  \langle 0| - I = -1 & \, \begin{bmatrix} -1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{bmatrix} 
\end{aligned}
$$
This last matrix corresponds to a gate that inverts the phase of the all-zeros state, and the pre-factor of $-1$ is simply a global phase we can ignore. To construct this gate, note that a multi-controlled $Z$ gate ($\text{MC}Z$) inverts the phase of the all-ones state. So you can first flip all qubits with $X$ gates (mapping all-zeros to all-ones), apply the $\text{MC}Z$ gate across all qubits, and then flip the qubits back with another layer of $X$ gates. Keeping the global phase explicit:
$$2 |0\rangle  \langle 0| - I = - X \, \text{MC}Z \, X .$$
Replacing in $U_s$, you get the following sequence of gates:
$$U_s = - H \, X \, \text{MC}Z \, X \, H,$$
where, again, each gate is applied to all qubits. Dropping the global phase of $-1$, this expression is exactly the circuit you have in your diagram.
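
As a final check, here is a sketch of the gate sequence as plain matrices for $n = 2$, where the multi-controlled $Z$ is just a $CZ$. It assumes the marked state $|01\rangle$ used throughout this answer, and the variable names are illustrative:

```python
import numpy as np

H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X1 = np.array([[0, 1], [1, 0]])
H = np.kron(H1, H1)                  # H on both qubits
X = np.kron(X1, X1)                  # X on both qubits
CZ = np.diag([1.0, 1.0, 1.0, -1.0])  # MCZ for two qubits

circuit = H @ X @ CZ @ X @ H         # diffusion circuit as a matrix

s = np.full(4, 0.5)                  # equal superposition |s>
U_s = 2 * np.outer(s, s) - np.eye(4)
# The circuit equals U_s up to a global phase of -1.
equal_up_to_phase = np.allclose(circuit, -U_s)

oracle = np.diag([1.0, -1.0, 1.0, 1.0])  # flips the sign of |01>
final = circuit @ oracle @ s             # one Grover iteration
probs = np.abs(final) ** 2
print(equal_up_to_phase, probs)
```

The final amplitudes pick up the overall sign of $-1$, but the measurement probabilities are unaffected: all of the probability ends up on $|01\rangle$.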