The particular values of the scaling exponent $\nu$ and the prefactor $A_\xi$ depend on the lattice and the dimension you use for the calculation, but the form $(p - p_c)^{-\nu}$ holds in a wide variety of geometries. For ease of calculation, I'll show how it works on the Bethe lattice, and how it's symmetrical about the critical point.
First, look at the Bethe lattice

Notice that it has no loops, so there is a unique path from any site $i$ to any other site $j$.
The probability that any two sites $i$ and $j$ are connected is equal to the probability that each is occupied ($p\times p = p^2$), times the probability that every site on the path connecting them is occupied, which is $p^{d(i,j)}$, where $d(i,j)$ is the number of sites between sites $i$ and $j$.
Let's say that site $i$ is the origin and $j$ is some point $r$ sites away. Then there are $r-1$ sites between them, and the probability they're connected is $c(r) = p^2\times p^{r-1} = p^{r+1}$. We need to multiply this by the number of points that are $r$ sites away from the origin, which, for the lattice in the figure, is $3\times 2^{r-1}$.
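The shell count can be checked by growing the tree outward one step at a time. This is only a minimal sketch: the function name `shell_sizes` and the parameter `z` (the number of neighbours per site, 3 for the lattice discussed here) are my own labels for illustration.

```python
def shell_sizes(z, r_max):
    """Return [N(1), ..., N(r_max)]: the number of sites exactly r steps from the origin."""
    sizes = []
    frontier = z              # the origin has z neighbours
    for _ in range(r_max):
        sizes.append(frontier)
        frontier *= z - 1     # each site has z - 1 neighbours pointing away from the origin
    return sizes


print(shell_sizes(3, 5))  # [3, 6, 12, 24, 48]
```

For `z = 3` this reproduces $3\times 2^{r-1}$ for each $r$.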
Thus, we have the correlation function given by $$g(r) = 3\times2^{r-1}p^{r+1}$$
We can rewrite this as $$g(r) = \frac{3}{2^2}\left(2p\right)^{r+1} \sim e^{\left(r+1\right)\ln \left(2p\right)}$$
Clearly, the correlation shrinks to zero when $2p < 1$, and becomes infinite when $2p > 1$, so we have $p_c = 1/2$.
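If you want to see this numerically, here is a small sketch that evaluates $g(r)$ on either side of $p = 1/2$. The function name `g` simply mirrors the formula above, and the sample values of $p$ and $r$ are arbitrary choices of mine.

```python
def g(r, p):
    """g(r) = 3 * 2**(r-1) * p**(r+1) for the Bethe lattice with 3 neighbours per site."""
    return 3 * 2 ** (r - 1) * p ** (r + 1)


for p in (0.4, 0.5, 0.6):
    # Below 1/2 the values decay with r, above 1/2 they grow, at 1/2 they stay flat.
    print(p, [g(r, p) for r in (1, 10, 50)])
```

At $p = 1/2$ the value stays at $3/4$ for every $r$, which is the marginal case separating decay from growth.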
Now, since $2p = p/p_c$, we can say $$g(r) \sim e^{r\ln \left(p/p_c\right)}$$ which resembles the form in your question. We just need to introduce a minus sign to make the correspondence complete.
The correlation length is defined to be the distance at which the probability of a site being connected to the origin falls to a level $1/e$. For $p < p_c$ this is given by $$\xi = -\frac{1}{\ln (p/p_c)} = \frac{1}{\ln (p_c/p)}$$ which for $p$ close to $p_c$ can be expanded using the Taylor series for $\ln(1+x)$, a la
$$\ln (p_c/p) = \ln \left[1 + \left(p_c - p\right)/p\right]\approx \frac{p_c - p}{p}$$ Thus, we finally have $$\xi \approx \frac{p}{p_c - p} \propto \left(p_c - p\right)^{-1}$$ so $\nu = 1$ on the Bethe lattice.
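To get a feel for how good the expansion is, the sketch below compares $\xi = 1/\ln(p_c/p)$ with the approximation $p/(p_c - p)$ as $p$ approaches $p_c = 1/2$ from below. The function names `xi_exact` and `xi_approx` and the sample values of $p$ are my own choices for illustration.

```python
import math

P_C = 0.5  # critical probability for the Bethe lattice with 3 neighbours per site


def xi_exact(p, p_c=P_C):
    """Correlation length xi = 1 / ln(p_c / p), valid for 0 < p < p_c."""
    return 1.0 / math.log(p_c / p)


def xi_approx(p, p_c=P_C):
    """Leading-order form xi ~ p / (p_c - p), valid for p close to p_c."""
    return p / (p_c - p)


for p in (0.3, 0.45, 0.49, 0.499):
    exact, approx = xi_exact(p), xi_approx(p)
    rel_err = abs(exact - approx) / exact
    print(f"p={p}: exact={exact:.3f}, approx={approx:.3f}, relative error={rel_err:.4f}")
```

The relative error shrinks as $p \to p_c$, which is all the power-law statement claims: it describes the leading behaviour near the critical point, not the whole range of $p$.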
Note that each site in the Bethe lattice used here branches out to 3 other sites. If instead each site branched to $z$ sites, we'd just replace every 3 and 2 in the above argument with $z$ and $z-1$, respectively, and our result would be general (and you'd find $p_c = \left(z-1\right)^{-1}$).
Now, the correlation length as defined above goes to infinity for $p > p_c$. However, if you focus on the correlation length of sites that are in finite clusters (i.e. ignore the infinite cluster), you get the same form for the correlation length. As you go above the critical point, the probability of being in a large finite cluster gets smaller and smaller, and a site is likely to be either in the infinite cluster or in a very small finite cluster. The average finite cluster size therefore decreases until $p=1$, where it becomes zero (everything is in the infinite cluster). This is not a quantitative argument, but it demonstrates the bi-directionality of the correlation length scaling.