Formally, a mode in quantum optics is a solution of the electromagnetic wave equation that can be populated with photons. This is, admittedly, quite abstract and not obviously connected to how the term is used in different sub-fields. I think the best way to learn it is from examples.
Transversal modes
The simplest solution to Maxwell's equations is a plane wave
$\vec{E}_0 e^{i(\vec{k} \cdot \vec{r} - \omega t)}$. One can consider it a mode. It's just not of any practical relevance, because it extends over all space and time – far beyond the experimental setup. But the nice thing about Maxwell's equations is that they are linear (as long as you stay away from
nonlinear susceptibilities). Therefore, a linear combination of several solutions is also a solution. Prominent examples are the emission pattern of a dipole antenna or Gaussian beams. The latter is typically used to describe the
mode of a cavity, for example in a laser. You have probably also heard of higher-order transversal modes in this context. All these Hermite-Gaussian, Laguerre-Gaussian, Ince-Gaussian etc. modes are just different solutions to Maxwell's equations. A nice example of how they show up when the length of a cavity is scanned can be found
here.
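To make "just different solutions" a bit more concrete, here is a minimal numerical sketch. It assumes the standard 1D Hermite-Gaussian profile at the beam waist, $u_n(x) \propto H_n(\sqrt{2}x/w)\,e^{-x^2/w^2}$, and checks that profiles of different order are orthogonal, which is what allows an arbitrary transverse field to be decomposed into them. The waist `w`, the integration grid and all function names are my own illustrative choices.

```python
import math

def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

def hg_profile(n, x, w=1.0):
    """Normalized 1D Hermite-Gaussian amplitude at the waist (waist radius w)."""
    norm = (2.0 / math.pi) ** 0.25 / math.sqrt(2.0 ** n * math.factorial(n) * w)
    return norm * hermite(n, math.sqrt(2.0) * x / w) * math.exp(-(x / w) ** 2)

def overlap(m, n, w=1.0, half_width=8.0, steps=4000):
    """Overlap integral of u_m and u_n, approximated by a midpoint sum."""
    dx = 2.0 * half_width * w / steps
    total = 0.0
    for i in range(steps):
        x = -half_width * w + (i + 0.5) * dx
        total += hg_profile(m, x, w) * hg_profile(n, x, w) * dx
    return total

if __name__ == "__main__":
    for m, n in [(0, 0), (0, 2), (1, 3), (3, 3)]:
        print(m, n, overlap(m, n))
```

Equal orders give an overlap close to 1, different orders give an overlap close to 0, up to the accuracy of the simple numerical integration.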
Longitudinal modes
It is similar for the longitudinal modes of a cavity. They correspond to the frequencies
$\omega$ which are resonant with the cavity. Of course, a mode can also be defined without a resonator. Take, for example, a laser pulse with a Gaussian temporal envelope. In the frequency domain, it is simply a linear combination of waves within a frequency band, with Gaussian weights for its constituents. This is just one example of how to express the temporal shape of a wave in terms of single-frequency waves by
Fourier transform. If you take the transversal and longitudinal modes together and add the polarization degree of freedom¹, you can express any valid electromagnetic field as a "mode".
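The two ideas above, resonant frequencies and Gaussian weights, can be combined in a small sketch. It assumes an idealized linear two-mirror cavity in vacuum, for which the resonances are $\nu_q = q\,c/(2L)$ (Gouy phase and mirror phase shifts are ignored), and a Gaussian spectral envelope. The cavity length, centre frequency and bandwidth in the usage part are made-up illustrative numbers, not taken from any experiment.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def free_spectral_range(length_m):
    """Spacing of longitudinal modes for an ideal two-mirror vacuum cavity."""
    return C / (2.0 * length_m)

def longitudinal_modes(length_m, center_hz, fwhm_hz, span=1.5):
    """Return (q, frequency, Gaussian weight) for all resonances that lie
    within +/- span * FWHM of the centre frequency."""
    fsr = free_spectral_range(length_m)
    sigma = fwhm_hz / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # FWHM -> std dev
    q_min = math.ceil((center_hz - span * fwhm_hz) / fsr)
    q_max = math.floor((center_hz + span * fwhm_hz) / fsr)
    modes = []
    for q in range(q_min, q_max + 1):
        nu = q * fsr
        weight = math.exp(-0.5 * ((nu - center_hz) / sigma) ** 2)
        modes.append((q, nu, weight))
    return modes

if __name__ == "__main__":
    # Hypothetical numbers: 15 cm cavity, ~1 um light, 5 GHz wide envelope.
    print("FSR [Hz]:", free_spectral_range(0.15))
    for q, nu, weight in longitudinal_modes(0.15, 3.0e14, 5.0e9):
        print(q, nu, round(weight, 4))
```

For a 15 cm cavity the mode spacing comes out at about 1 GHz, so a 5 GHz wide envelope covers only a handful of longitudinal modes with appreciable weight.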
Coherence
So far you might think that you can combine anything into a solution, and therefore the whole universe could be described by a single mode. To understand how anything can be multi-mode, we need to understand the principle of
coherence. If the electric field at two different points in spacetime has a known phase relation, we say the field is coherent between them. So, if we knew what was going on in the whole universe (the dream of every physicist), everything would be coherent. Unfortunately, even our most coherent lasers don't have a coherence time significantly longer than
$1\,\text{s}$ (see
here why), i.e. their phase can't be predicted more than
$1\,\text{s}$ into the future.
Transversally, light can be incoherent if it is emitted by several independent sources.
This video from Ben Bartlett shows how an extended light source in the center emits light. Because not all points in the source emit in phase, we see a complex interference pattern (speckle) with domains of constructive or destructive interference. The size of these domains is given by the transversal coherence length, i.e. by the length over which the phase can be predicted in the transversal direction. In the beginning, the video shows the simulation on timescales of the optical cycle. From 0:09 to 0:17 it runs faster, on timescales of the coherence time of the source. One can see how the different regions of the source change their relative phase and therefore the domains of constructive/destructive interference shift. Finally, it speeds up to timescales much longer than the coherence time. This is the impression we have from incoherent light sources: just a homogeneous, constant intensity.
Quantitatively, the coherence time of a light source is related to its spectrum by the
Wiener–Khinchin theorem. The transversal coherence length can be calculated from its transversal size by the
van Cittert–Zernike theorem. Basically, they both simply describe Fourier relations.
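As a rough sketch of what these Fourier relations give in numbers: the coherence time is an order-one prefactor divided by the spectral width, and the transversal coherence length scales as wavelength times distance over source size. The prefactors below assume the definition $\tau_c = \int |g^{(1)}(\tau)|^2\,d\tau$ and, for the spatial part, a uniform circular incoherent source observed in the far field, where the coherence first drops to zero at a separation of $1.22\,\lambda z/D$. Other definitions change the prefactors; the function names and example values are my own.

```python
import math

# tau_c = factor / delta_nu, with delta_nu the full width at half maximum of
# the power spectrum and tau_c defined as the integral of |g1(tau)|^2.
LINESHAPE_FACTOR = {
    "rectangular": 1.0,
    "lorentzian": 1.0 / math.pi,
    "gaussian": math.sqrt(2.0 * math.log(2.0) / math.pi),
}

def coherence_time(fwhm_hz, lineshape="gaussian"):
    """Coherence time in seconds for a given spectral width and lineshape."""
    return LINESHAPE_FACTOR[lineshape] / fwhm_hz

def transverse_coherence_length(wavelength_m, distance_m, source_diameter_m):
    """Separation at which the coherence of light from a uniform, circular,
    spatially incoherent source first vanishes (far-field estimate)."""
    return 1.22 * wavelength_m * distance_m / source_diameter_m

if __name__ == "__main__":
    for shape in LINESHAPE_FACTOR:
        print(shape, coherence_time(2.0e9, shape))  # illustrative 2 GHz width
    # Hypothetical source: 1 mm diameter, 500 nm light, observed 1 m away.
    print(transverse_coherence_length(500e-9, 1.0, 1e-3))
```

The point of listing several lineshapes is that a coherence time quoted for a given bandwidth is only meaningful up to such a prefactor, unless the lineshape and the definition are stated.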
Modes within the turmoil
Even in these chaotic/thermal/incoherent light sources, people define modes. A mode is basically the volume of the 6-dimensional phase space (position vector
$\otimes$ momentum vector) within which the light is coherent. Most of the time, this boils down to coherence time and transversal coherence length, as in
R. Dändliker – The concept of modes in optics and photonics (2000).
This is important for example for the experiment performed by
R. Hanbury Brown & R. Q. Twiss (1956), because to observe photon bunching, both detectors must look at the same mode. In
this paper, photon bunching of a thermal light source is measured. To ensure spatial coherence, they pick up some light with a
single-mode fiber². They then filter the light spectrally to a bandwidth of
$2\,\text{GHz}$ and can therefore detect photon bunching on a timescale of
$\tau_c = 375\,\text{ps}$.
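Why bunching shows up only within one mode can be illustrated with a classical toy model (my own sketch, not the analysis used in the paper). Within a single coherence volume, the field of a chaotic source is the sum of many waves with independent random phases. Its intensity then fluctuates from one coherence time to the next, and the normalized second moment $\langle I^2\rangle/\langle I\rangle^2$, the classical analogue of $g^{(2)}(0)$, approaches 2. For $N$ unit-amplitude emitters the exact value is $2 - 1/N$. Averaging over many modes or over times much longer than $\tau_c$ washes these fluctuations out.

```python
import cmath
import math
import random

def chaotic_intensity(n_emitters, rng):
    """Intensity of n_emitters unit-amplitude waves with independent,
    uniformly distributed random phases (one snapshot of one mode)."""
    field = sum(cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
                for _ in range(n_emitters))
    return abs(field) ** 2

def g2_zero(n_emitters=30, n_samples=20000, seed=1):
    """Estimate <I^2> / <I>^2 from independent snapshots of the intensity."""
    rng = random.Random(seed)
    intensities = [chaotic_intensity(n_emitters, rng) for _ in range(n_samples)]
    mean_i = sum(intensities) / n_samples
    mean_i2 = sum(i * i for i in intensities) / n_samples
    return mean_i2 / mean_i ** 2

if __name__ == "__main__":
    print("many emitters:", g2_zero())
    print("single emitter (constant intensity):", g2_zero(n_emitters=1, n_samples=100))
```

With many emitters the estimate comes out close to 2 (bunching), while a single emitter has constant intensity and gives 1.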
Last but not least, modes have some interesting thermodynamic properties. With passive linear optics, you can't increase the average number of photons in a mode.
There is no magical lens that combines the light from several modes into one. This is the reason why many nonlinear optical phenomena, which require
a high number of photons within one mode, were experimentally demonstrated only after the invention of the laser, such as
the AC Stark shift or
second-harmonic generation.
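To get a feeling for how few photons per mode a thermal source offers, one can evaluate the Bose-Einstein occupation $\bar{n} = 1/(e^{h\nu/k_B T} - 1)$. This is a sketch under the assumption that the source behaves like a black body; the wavelength and temperature below are illustrative values for a hot, Sun-like emitter in the visible.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
K_B = 1.380649e-23   # Boltzmann constant, J/K
C = 299_792_458.0    # speed of light in vacuum, m/s

def thermal_photons_per_mode(wavelength_m, temperature_k):
    """Mean photon number per mode of black-body radiation (Bose-Einstein)."""
    x = H * C / (wavelength_m * K_B * temperature_k)  # h*nu / (k_B*T)
    return 1.0 / math.expm1(x)

if __name__ == "__main__":
    # Illustrative: 500 nm light from a 5800 K black body.
    print(thermal_photons_per_mode(500e-9, 5800.0))
```

For these values the occupation is well below one photon per mode, and since passive linear optics cannot raise it, no amount of focusing or collecting gets a thermal source into the regime of many photons per mode.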
1: I'm neglecting polarization as a "7th dimension" of the position-momentum space here. For the connection between polarization and coherence, see this answer on the cross-spectral density.
2: It is called single-mode because its core size and maximum divergence (given by the refractive-index step between core and cladding) are chosen such that only one mode can propagate within the core. Higher-order transversal modes would have too high a divergence.