To simplify the matters and reduce notational clutter, let's assume we have a 1d chain with nearest-neighbour hopping/interaction, described by the Hamiltonian,
$$
H = t \sum_{\langle i,j \rangle} a_i^\dagger a_j
+ \Big( V \sum_{\langle i,j \rangle} a_i a_j + \text{c.c.} \Big) ~.
$$
To systematically obtain the long-wavelength (low-energy) continuum limit of this Hamiltonian, we should first perform a Fourier transformation to momentum representation, using
$$
\begin{align*}
a_k &= \frac{1}{\sqrt{N}} \sum_{l} e^{i k x_l} a_l \\
a_l &= \frac{1}{\sqrt{N}} \sum_{k} e^{i k x_l} a_k \\
\delta(k) &= \frac{1}{N} \sum_{l} e^{i k x_l} \\
\delta(x) &= \frac{1}{N} \sum_{k} e^{i k x} ~,
\end{align*}
$$
where $l$ is the lattice index, $k$ is the momentum, $N$ is the number of lattice sites, and we have used the ‘natural’ units where $ \hbar = 1 $.
The first term of the Hamiltonian can be rewritten as
$$
\sum_{\langle i,j \rangle} a_i^\dagger a_j
= \sum_{i, n = \pm 1} a_i^\dagger a_{i + n} ~,
$$
because of our assumption of nearest-neighbour hopping on a 1d chain. This can be indeed generalized with little effort to any kind of lattice in any dimension ($n$ becomes a vector then). This paves the way for the Fourier transform:
$$
\begin{align*}
\sum_{\langle i,j \rangle} a_i^\dagger a_j
&= \sum_{i, n = \pm 1} a_i^\dagger a_{i + n} \\
&= \frac{1}{N} \sum_{i,n} \sum_{k, q}
e^{i (q - k) x_i} e^{i q n a} \, a_k^\dagger a_q \\
&= \sum_{k, q, n = \pm 1} \delta(q - k) \, e^{i q n a} \, a_k^\dagger a_q \\
&= \sum_{k} ( e^{i k a} + e^{-i k a} ) \, a_k^\dagger a_k
= \sum_{k} 2 \cos(k a)\, a_k^\dagger a_k ~,
\end{align*}
$$
where we have used the Fourier transform relations above, and $ x_{i + n} \equiv x_i + n a $ with $a$ denoting the lattice spacing.
Similarly, one can show that
$$
\sum_{\langle i,j \rangle} a_i a_j = \sum_{k} 2 \cos(k a)\, a_k a_{-k} ~.
$$
Therefore, the Hamiltonian in momentum representation reads
$$
H = 2t \sum_{k} \cos(k a)\, a_k^\dagger a_k + \Big( 2V \sum_{k} \cos(k a)\, a_k a_{-k} + \text{c.c.} \Big) ~.
$$
Now we can take the long-wavelength limit by assuming that the characteristic wavelength of the excitations, $\lambda$, is much larger than the lattice spacing, $a$; ie., $ \lambda \gg a $ or $ k a \ll 1 $ which allows an expansion of the dispersions in terms of $ k a $:
$$
\cos(ka) = 1 - \frac{1}{2} (k a)^2 + \mathcal{O}\left( (k a)^4 \right) ~.
$$
Neglecting higher-order terms, we end up with an effective long-wavelength Hamiltonian,
$$
\begin{align*}
H_{\lambda \gg a} &\approx 2t \sum_{k} a_k^\dagger a_k - t \sum_{k} (k a)^2 \, a_k^\dagger a_k \\
&+ \Big( 2V \sum_{k} a_k a_{-k} - V \sum_{k} (k a)^2 \, a_k a_{-k} + \text{c.c.} \Big) ~.
\end{align*}
$$
Such neglects of higher-orders can be substantiated via RG arguments; consult eg., Kopietz et al. “Introduction to the Functional Renormalization Group” (2013) [wcat].
It is straightforward to show, via a Fourier transform back to position representation, that
$$
\begin{align*}
\sum_{k} a_k^\dagger a_k &= \sum_{l} a_l^\dagger a_l \\
\sum_{k} (k a)^2 a_k^\dagger a_k &=
a^2 \sum_{l} \frac{\partial}{\partial x_l} a_l^\dagger \,
\frac{\partial}{\partial x_l} a_l \\
\sum_{k} a_k a_{-k} &= \sum_{l} a_l a_l \\
\sum_{k} (k a)^2 a_k a_{-k} &=
a^2 \sum_{l} \frac{\partial}{\partial x_l} a_l \,
\frac{\partial}{\partial x_l} a_l ~.
\end{align*}
$$
Hence,
$$
H_{\lambda \gg a} \approx
2 t \sum_l a_l^\dagger a_l
- t a^2 \sum_l \partial_{x_l} a_l^\dagger \partial_{x_l} a_l
+ \Big( 2 V \sum_l a_l a_l
- V a^2 \sum_l \partial_{x_l} a_l \partial_{x_l} a_l + \text{c.c.} \Big) ~.
$$
or better, using a suggestive notation,
$$
\begin{align*}
H_{\lambda \gg a} &\approx
2 t \sum_l \frac{\Delta x}{a} \, \varphi^\dagger(x_l) \varphi(x_l)
- t \sum_l \frac{\Delta x}{a} \, \partial_{x_l} \varphi^\dagger(x_l) \partial_{x_l} \varphi(x_l) \\
&+ \Big( 2 V \sum_l \frac{\Delta x}{a} \, \varphi(x_l) \varphi(x_l)
- V \sum_l \frac{\Delta x}{a} \, \partial_{x_l} \varphi(x_l) \partial_{x_l} \varphi(x_l) + \text{c.c.} \Big) ~,
\end{align*}
$$
where $ \Delta x \equiv a $, $ \varphi(x_l) := a_l $ and we have introduced the dimensionless coordinates, $ x \mapsto \frac{x}{a} $. Now we can take the continuum limit ($ a \rightarrow 0 $), so that
\begin{align*}
H_{\lambda \gg a} &\approx
\int {\mathrm{d} x} \Big\{ 2 t \, \varphi^\dagger(x) \varphi(x)
- t \, \partial_x \varphi^\dagger(x) \partial_x \varphi(x)
+ \Big( 2 V \, \varphi(x) \varphi(x)
- V \, \partial_x \varphi(x) \partial_x \varphi(x) + \text{c.c.} \Big) \Big\} ~.
\end{align*}
An analogous procedure can be performed in the functional-integral/action formalism, but the steps are the same.
This is essentially a “gradient expansion”. Notice that since the Hamiltonian is quadratic, there is no need for a Hubbard-Stratonovich transformation.