There are multiple ways to justify this identification.
- The first one is to recognize that in thermodynamics the Helmholtz free energy is the right thermodynamic potential to figure out spontaneous evolution of your thermodynamic system (in virtue of the second principle of thermodynamics) at fixed (N,V,T). In statistical mechanics, this is expressed by the canonical ensemble and the corresponding statistics associated to it. If you were to have a mesoscale-observable $X$ about which you want to know the probability distribution for $X$ to have value $x$, then you will compute
\begin{equation}
p(x|N,V,T) = \frac{\sum_{\{ i; \:X(i) = x\}} e^{-\beta E_i}}{e^{-\beta F(N,V,T)}}
\end{equation}
now, the sum at the numerator can be thought as a partial partition function (with for instance an additional potential that biases all the micro states $i$ so that $X(i) =x$). We can then define some kind of partial free energy $F(x|N,V,T)$ such that
\begin{equation}
p(x|N,V,T) = \frac{e^{-\beta F(x|N,V,T)}}{e^{-\beta F(N,V,T)}}
\end{equation}
We then see that the most probable value of the meso-observable $X$ to be observed is the one that will minimize the "partial" free energy at fixed (N,V,T); thus the logarithm of the partition function plays a role equivalent to that of the Helmholtz free energy in usual thermodynamics when the second principle is here replaced by "most probable value".
- The second one is more direct and more formal. It consists in noticing that by definition
\begin{equation}
\langle E \rangle = \sum_i E_i \frac{e^{-\beta E_i}}{Z(N,V,T)}
\end{equation}
and then write the tautology $E_i = -k_b T \ln e^{-\beta E_i} = -k_b T \ln \left( e^{-\beta E_i} \frac{Z(N,V,T)}{Z(N,V,T)} \right) = -k_B T \ln p(i|N,V,T) -k_B T \ln Z(N,V,T)$.
Upon replacing in the above formula we get:
\begin{eqnarray}
\langle E \rangle &=& -k_B T \sum_i p(i|N,V,T) [\ln p(i|N,V,T)+ \ln Z(N,V,T)]
\end{eqnarray}
Now, defining the entropy as being the Shannon entropy of the distribution i.e
\begin{equation}
S(N,V,T) = -k_B \sum_i p(i|N,V,T) \ln p(i|N,V,T)
\end{equation}
we get
\begin{equation}
-k_B T \ln Z(N,V,T) = \langle E \rangle - TS(N,V,T)
\end{equation}
By comparison with the definition of the Helmholtz free energy, it comes naturally that $F = -k_BT \ln Z$.
- The third and last one for this answer consists in using the role of generating function of the cumulants played by the logarithm of the partition function (that I discussed quite in detail here) to retrieve partial derivatives which bear a thermodynamical significance (like internal energy, chemical potential and os forth) and then again make a strong parallel with the Helmholtz free energy.