As I understand it, the sidebands are introduced via phase modulation before the beam enters any cavities or the interferometer. Is this usually correct?
Yes. Figure 1 of this paper, for example, shows the key components with the modulator in front of everything else.
Which frequency is kept on resonance with the cavities? Carrier or sidebands? How does a cavity in resonance with the carrier affect the sidebands and vice versa?
Just to be clear, there are three main frequencies of interest here: the carrier frequency $\omega_0 \approx 282\,\mathrm{THz}$, and the two modulation frequencies introduced by the modulator, $\omega_{\mathrm{m}1} \approx 5\,\mathrm{MHz}$ and $\omega_{\mathrm{m}2} \approx 45\,\mathrm{MHz}$, which put sidebands at $\omega_0 \pm \omega_{\mathrm{m}1}$ and $\omega_0 \pm \omega_{\mathrm{m}2}$. It's also worth noting that noise and other oscillatory disturbances (most importantly, gravitational waves) will also produce sidebands, but not at constant frequencies that can be used to tune cavities, so we disregard them for this purpose. This paper says:
- the length of the power recycling cavity is chosen to be resonant for all frequency components simultaneously
- the length of the signal recycling cavity is set such that the carrier and the $\omega_{\mathrm{m}2}$ sidebands are simultaneously resonant, while the $\omega_{\mathrm{m}1}$ sidebands are not
- the carrier field is resonant in both the arm cavities and the power recycling cavity
- the length of the Fabry-Pérot arm cavities and modulation frequencies are conditioned, so that the carrier is resonant, whereas the RF sidebands are very near the exact point in between.
The Michelson interferometer (composed of the Fabry-Pérot cavities plus their distances to the beamsplitter) is kept very close to, but not precisely at, a dark fringe, so that a small amount of carrier light comes out of the "dark port". It's not actually clear to me how this is achieved; I believe the Fabry-Pérot cavities are kept at resonance, while their distances to the beamsplitter are adjusted to set the offset from the dark fringe.
So, to summarize, I believe this table describes Advanced LIGO:
\begin{array} {l|ccc}
& \omega_0 & \omega_{\mathrm{m}1} & \omega_{\mathrm{m}2} \\
\hline
\text{Power recycling} & \text{resonant} & \text{resonant} & \text{resonant} \\
\text{Fabry-Pérot} & \text{resonant} & \text{not resonant} & \text{not resonant} \\
\text{Michelson} & \text{nearly dark} & \text{not resonant} & \text{not resonant} \\
\text{Signal recycling} & \text{resonant} & \text{not resonant} & \text{resonant}
\end{array}
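To make "resonant" and "not resonant" concrete, here is a minimal sketch for an idealized two-mirror cavity whose carrier is already on resonance: a sideband at $\omega_0 \pm \omega_{\mathrm{m}}$ is then resonant when the modulation frequency is an integer multiple of the free spectral range $c/2L$, and sits exactly between resonances at a half-integer multiple. The function names and the 10 m toy length are my own assumptions for illustration, not LIGO parameters. The rounded frequencies quoted above are too coarse to check the real detector this way, and the coupled cavities in the actual instrument modify the simple condition.

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s


def free_spectral_range(length_m):
    """Free spectral range c / (2 L) in Hz of an idealized two-mirror cavity."""
    return C_LIGHT / (2.0 * length_m)


def resonance_offset(mod_freq_hz, length_m):
    """Distance of a sideband from the nearest resonance, in units of one FSR.

    Assumes the carrier itself is exactly on resonance.
    0.0 -> the sideband is resonant together with the carrier
    0.5 -> the sideband sits exactly halfway between two resonances
    """
    ratio = mod_freq_hz / free_spectral_range(length_m)
    frac = ratio % 1.0
    return min(frac, 1.0 - frac)


# Toy example (hypothetical 10 m cavity, not a LIGO length):
toy_length = 10.0
fsr = free_spectral_range(toy_length)
for multiple in (3.0, 2.5):
    # An integer multiple of the FSR is resonant; a half-integer one is not.
    print(multiple, resonance_offset(multiple * fsr, toy_length))
```

In this picture, each row of the table amounts to choosing a length and modulation frequency so that the offset is near 0 (resonant) or near 0.5 (as far from resonance as possible).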
Some sources imply that the signal recycling cavity is tuned to the sideband(s) and the power recycling cavity is tuned to the carrier frequency but I'm not sure if I understood that correctly. Wouldn't the power recycling cavity tuned to the carrier frequency lead to a relative loss of the sidebands which are required for detection?
Evidently, those sources are incorrect; see the answer to the previous question. Your intuition, on the other hand, is correct: all available power should be recycled.
Why is dark port detection preferred when shot noise scales with 1/(Laser Power)?
Excellent question. You've almost answered this yourself, because you wrote that it scales according to the "Laser Power". While I suspect that you meant the power of the light actually hitting the photodetector, it turns out that the strain noise really does scale with the laser power fed into the front of LIGO, so dark or bright output doesn't matter (at least approximately).
It's true that the shot noise in the amount of light arriving at the photodetector scales with the power in the "local oscillator" (which is the amount of light that's allowed to exit the "dark port" because the Michelson is only held close to a dark fringe, rather than precisely on it). Specifically, the noise on the photodetector scales as
\begin{equation}
N_{\text{pd}} \propto \sqrt{P_{\text{local oscillator}}}.
\end{equation}
However, the quantity we actually care about is the noise in the strain that we deduce from that light, and to calculate this, there is a transfer function that has its own scaling that changes things. This is a little simplified, but basically, if $P_{\mathrm{pd}}$ is the optical power on the photodetector, and strain is $h$, then they are related by
\begin{equation}
h = \frac{P_{\mathrm{pd}}} {C},
\end{equation}
where $C$ is this transfer function that relates the optical power to the strain that actually affects the interferometer. The derivation is not obvious (I think I've seen it, but I can't remember it, and I actually don't know of a reference for it), but it turns out that we have
\begin{equation}
C \propto \sqrt{P_{\text{laser power}}\, P_{\text{local oscillator}}}.
\end{equation}
Therefore, we get the strain noise going like
\begin{equation}
N_h = \frac{N_{\text{pd}}} {C} \propto \sqrt{\frac{P_{\text{local oscillator}}} {P_{\text{laser power}}\, P_{\text{local oscillator}}}}
= 1/\sqrt{P_{\text{laser power}}}.
\end{equation}
So the amount of light actually being let out of the detector scales out of the result, and only the input power matters. Of course, there are additional factors that can help. For example, the laser power should actually be multiplied by the gain of the power-recycling cavity (which is ~38), and $N_h$ should also be divided by the gain from build up in the Fabry-Pérots (~270). But in terms of scaling, the amount of power they choose to allow out of the "dark" port doesn't affect the result; it's all about the power they pump into the instrument. These details are covered (though not really explained in depth) in this paper.
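The cancellation can be checked numerically with a small sketch. It just encodes the two proportionalities above with every constant of proportionality set to 1 (an assumption made purely for illustration), so the numbers are in arbitrary units and only the scaling is meaningful. The function name `strain_noise` is mine.

```python
import math


def strain_noise(p_laser, p_lo):
    """Strain noise N_h = N_pd / C in arbitrary units.

    p_laser: input laser power
    p_lo:    local-oscillator power leaving the "dark" port
    All proportionality constants are set to 1 for illustration.
    """
    n_pd = math.sqrt(p_lo)         # shot noise on the photodetector
    c = math.sqrt(p_laser * p_lo)  # power-to-strain transfer function
    return n_pd / c


# Changing the local-oscillator power leaves the strain noise unchanged...
print(strain_noise(100.0, 0.01), strain_noise(100.0, 1.0))
# ...while changing the input laser power changes it.
print(strain_noise(100.0, 0.01), strain_noise(400.0, 0.01))
```

Quadrupling `p_laser` halves the result, which is the $1/\sqrt{P_{\text{laser power}}}$ scaling from the last equation.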
What is done to prevent the active stabilization techniques from cancelling out a "real" GW-signal?
The two main ideas are to ensure that the stabilization forces only occur at frequencies outside of the sensitive detection range as much as possible, and to apply those forces to degrees of freedom other than the crucial differential arm length. So basically, as much as possible, there is no control system affecting the differential arm length at frequencies between $10\,\mathrm{Hz}$ and many thousands of Hz. There are apparently some unavoidable cross couplings from the Michelson and signal-recycling degrees of freedom, but because those couplings are known and the error signals from the other control loops are known, their effects can be subtracted from the differential arm length using feedforward filters. This paper has some more detail and references.
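As a toy illustration of the feedforward idea only: if an auxiliary channel leaks into the differential arm length with a known coupling, that contribution can be subtracted without touching anything uncorrelated with the auxiliary channel. The sketch below uses a single frequency-independent coupling gain, which is a strong simplification of the frequency-dependent filters a real detector would need. All names, amplitudes, and frequencies here are hypothetical.

```python
import math


def fit_coupling(aux, darm):
    """Least-squares estimate of k in darm ~ k * aux + (uncorrelated rest)."""
    num = sum(a * d for a, d in zip(aux, darm))
    den = sum(a * a for a in aux)
    return num / den


def subtract_feedforward(aux, darm, k):
    """Remove the known auxiliary contribution k * aux from the darm channel."""
    return [d - k * a for a, d in zip(aux, darm)]


# Hypothetical data: a small "signal" at 100 Hz plus leakage from an
# auxiliary degree of freedom oscillating at 30 Hz, sampled at 1 kHz for 1 s.
n, rate = 1000, 1000.0
signal = [1e-3 * math.sin(2 * math.pi * 100.0 * i / rate) for i in range(n)]
aux = [math.sin(2 * math.pi * 30.0 * i / rate) for i in range(n)]
darm = [s + 0.3 * a for s, a in zip(signal, aux)]  # 0.3 is a made-up coupling

k = fit_coupling(aux, darm)
cleaned = subtract_feedforward(aux, darm, k)
residual = max(abs(c - s) for c, s in zip(cleaned, signal))
print(k, residual)
```

Because the toy signal is uncorrelated with the auxiliary channel, the subtraction removes the leakage while leaving the signal in place, which is the property that matters for not cancelling a real gravitational wave.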