Summary
Since this turned into a huge answer, here's the bottom line up front: your starting point, $\Omega=\frac{N!}{n_1! n_2! \cdots n_k!}$, is already an approximation to the true number of states. To get the exact $\Omega$, you should sum over all possible configurations of occupation numbers, $\Omega = \sum_{n_1, n_2, \cdots, n_k}\frac{N!}{n_1! n_2!\cdots n_k!}$. In the limit of large $N$, that sum is dominated by the configuration with all $n_i/N=1/k$, in which case it becomes a good approximation to keep just the one term you wrote down, take its log, apply Stirling's formula, and use $n_i/N = p_i$.
Setup
This answer is going to start off saying a bunch of things that you already know; that's not because I don't think you know them, but so that I can phrase them in my own language. I think the issue is a little subtle, so I want to lay out my perspective carefully. I am going to use units where $k_B=1$.
I haven't watched the lecture, but here is what I am assuming you are describing. You have $N$ distinguishable particles, and there are $k$ states, each of which has the same energy (so you can say we are working in the microcanonical ensemble where the energy of the system is fixed). Then we need to count how many ways we can distribute the $N$ particles among the $k$ states. Given the occupation numbers $\{n_1, n_2, \cdots, n_k\}$ (meaning, $n_i$ particles in state $i$, with $1\leq i \leq k$), we can count the number of ways $\Omega$ to arrange the particles consistent with those occupation numbers. That leads to the result you wrote down
$$
\Omega = \frac{N!}{\prod_{i=1}^k n_i!}
$$
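(Not from the lecture, just a sanity check you can run yourself: the Python sketch below counts arrangements by brute force for small, made-up numbers and compares against the multinomial formula.)

```python
from math import factorial
from itertools import product
from collections import Counter

def multinomial(N, occupations):
    """N! / (n_1! n_2! ... n_k!) for a given set of occupation numbers."""
    out = factorial(N)
    for n in occupations:
        out //= factorial(n)
    return out

# Toy example: N = 4 distinguishable particles, k = 3 states, occupations (2, 1, 1)
N, k, target = 4, 3, (2, 1, 1)
brute = sum(
    1
    for assignment in product(range(k), repeat=N)                 # each particle picks a state
    if tuple(Counter(assignment)[i] for i in range(k)) == target  # keep only matching occupations
)
print(brute, multinomial(N, target))  # both print 12
```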
As others have said, there are multiple definitions of entropy, which can make things complicated. I am pretty opinionated (possibly overly so) and my point of view is that the best way to understand entropy is through the Shannon (or Gibbs) entropy. This says, given a discrete set of $k$ states labeled by $i$, and a probability distribution over those states $p_i$, then the entropy is
$$
S = - \sum_{i=1}^k p_i \log p_i
$$
(As an aside, defining entropy for a continuous set of states is complicated). The normal "Boltzmann entropy" $S = \log \Omega$ is a special case that applies in the situation where each state has uniform probability. Then $p_i=1/k$, and
$$
S = - \sum_i \frac{1}{k} \log \frac{1}{k} = \log k
$$
where in this case $k$ is the number of microstates.
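In code, the Shannon entropy and its uniform special case look like this (a minimal sketch; natural logs, so $k_B=1$):

```python
import math

def shannon_entropy(p):
    """S = -sum_i p_i log p_i, with the convention 0 log 0 = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

k = 5
print(shannon_entropy([1 / k] * k), math.log(k))  # uniform distribution reproduces log k
```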
Exact calculation of entropy
Now we start to make our way back to your problem, but I'm going to approach it a different way, first. We will build to an exact calculation of the entropy in steps.
First, note that a set of occupation numbers $\{n_1, n_2, \cdots, n_k\}$, with $n_1+n_2+\cdots+n_k=N$, can be thought of as a random draw from a probability distribution over $k$ states. In other words, we can assign probabilities $\{p_1, p_2, \cdots, p_k\}$ (with $\sum_{i=1}^k p_i = 1$) to the $k$ states. Then if we randomly assign $N$ particles to the $k$ states, we will get a histogram $\{n_1, n_2, \cdots, n_k\}$. It is not the case in general that $n_i/N=p_i$. For example, if $k=2$, you can imagine assigning each particle to one of the two states by flipping a coin. If $N=100$, you don't expect to get exactly 50 heads and 50 tails; you might get 54 heads and 46 tails, for instance. But as $N$ gets larger, the ratio $n_i/N$ will approach $p_i$ by the law of large numbers. Where we are headed, intuitively, is that when you used Stirling's approximation you implicitly used the law of large numbers in this sense.
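Here is a small simulation of that statement (purely illustrative): assign each of $N$ particles to one of $k=2$ states with a fair coin and watch $n_1/N$ drift toward $p_1=1/2$ as $N$ grows.

```python
import random

random.seed(0)
for N in (100, 10_000, 1_000_000):
    n1 = sum(random.random() < 0.5 for _ in range(N))  # number of particles landing in state 1
    print(N, n1 / N)  # not exactly 0.5 for finite N, but it approaches 0.5 as N grows
```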
Second, in thermodynamics we are interested in the probability distribution that maximizes the entropy. Here I'm going to introduce the special case $k=2$, so that we can do a few analytical calculations easily.
Now let's start with $N=1$ particle. Suppose the probability of state $1$ is $p$, so the probability of state $2$ is $1-p$. Then the entropy is
$$
S(p) = -p \log p - (1-p) \log(1-p)
$$
You can find the maximum of $S(p)$ by solving $dS/dp=0$; it is an easy calculation, and the solution is $p=1/2$, meaning the two microstates are equally likely. (With $k$ states we would find $p_i=1/k$.) Plugging $p=1/2$ into $S(p)$, we find the entropy is $S=\log 2$, or $S=\log k$ for $k$ states, consistent with the Boltzmann entropy. So we see the Boltzmann entropy applies after we have maximized the entropy over a set of probability distributions.
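If you'd rather not do the calculus, a crude numerical maximization (just a sketch) shows the same thing: the peak is at $p=1/2$ with value $\log 2$.

```python
import math

def S(p):
    """Two-state Shannon entropy S(p) = -p log p - (1-p) log(1-p)."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

grid = [i / 1000 for i in range(1, 1000)]  # p in (0, 1), endpoints excluded
p_star = max(grid, key=S)
print(p_star, S(p_star), math.log(2))      # 0.5, ~0.6931, ~0.6931
```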
Third, you might have noticed there's a factor of $N$ between $S=\log k$ and the $S = N \log k$ that you wrote in your answer. That's because so far we have only looked at $1$ particle. Generalizing to $N$ non-interacting distinguishable particles (for $k=2$ states) is straightforward. Particle $1$ has probability $p_1$ to be in state $1$ and $1-p_1$ to be in state $2$; particle $2$ has probability $p_2$ to be in state $1$ and $1-p_2$ to be in state $2$, and so on. Since the particles are independent, the $N$-particle probability distribution is just the product of the $N$ single-particle distributions. So the probability that particle $1$ is in state $i_1$ (which could be $1$ or $2$), particle $2$ is in state $i_2$, etc., is
$$
p(i_1, i_2, \cdots, i_N) =
\left(p_{1} \delta_{i_1, 1} + (1-p_1) \delta_{i_1, 2}\right) \left(p_{2} \delta_{i_2, 1} + (1-p_2) \delta_{i_2, 2}\right)
\cdots \left(p_{N}\delta_{i_N, 1}+ (1-p_N) \delta_{i_N, 2}\right)
$$
where $\delta_{a,b}=1$ if $a=b$ and $0$ otherwise.
So the overall entropy as a function of $p_1, p_2, \cdots, p_N$ is
$$
S(p_1, p_2, \cdots, p_N) = \sum_{i=1}^N \left(-p_i\log p_i - (1-p_i) \log(1-p_i)\right)
$$
You can again maximize this as a function of $p_1, \cdots, p_N$, and find that the maximum entropy distribution has $p_1=p_2=\cdots=p_N=1/2$. In other words, each particle has a $1/2$ probability to be in either state $1$ or state $2$. However, the configuration of the whole system involves specifying the state of *each* particle, which means there are $2^N$ total states. Since each of these states is equally likely (you can either say that's an assumption of thermodynamics, or you can say that we argued this is the maximum entropy distribution), the probability of each state is $2^{-N}$. That means the overall entropy of the maximum entropy distribution is
$$
S = N \log 2
$$
You can calculate this a number of ways. First, you can take the log of the number of microstates $2^N$. Second, you can evaluate $-\sum_i p_i \log p_i$ with $2^N$ states each with probability $2^{-N}$ (ie, evaluating $S(1/2, 1/2, \cdots, 1/2)$). Third, you can use the fact that entropy is additive, meaning that if $p(x,y)=p(x)p(y)$, then $S[p(x,y)]=S[p(x)]+S[p(y)]$, so the entropy for $N$ non-interacting particles is just $N$ times the entropy of $1$ particle, or $N \log 2$.
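For a small $N$ you can evaluate all three routes directly; the sketch below (with $N=10$, chosen arbitrarily) just confirms they agree.

```python
import math

N = 10
S1 = math.log(2 ** N)                                          # log of the number of microstates
S2 = -sum(2 ** -N * math.log(2 ** -N) for _ in range(2 ** N))  # Shannon entropy of 2^N equal probabilities
S3 = N * math.log(2)                                           # additivity: N times the 1-particle entropy
print(S1, S2, S3)                                              # all equal N log 2 ≈ 6.93
```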
The main reason I went through this in such explicit detail is to point out that there are states in the above distribution that don't have $n_i/N=p_i=1/2$ (where $n_i$ is the occupation number; remember, since $k=2$ we have $n_1$ and $n_2$ with $n_1+n_2=N$). For example, there is a non-zero probability that all $N$ particles are in state $1$ and zero are in state $2$. (In the notation above, this corresponds to $i_1=i_2=\cdots=i_N=1$.) The probability is very small for large $N$ -- specifically $2^{-N}$. But nevertheless, those states exist in the ensemble. In the next section, we'll see that the approximation at the core of your question amounts to ignoring those states, and that at large $N$ they can safely be ignored.
Approximate calculation of entropy at large $N$
So, now, finally, let's come back to the calculation you did for $\Omega$. Let's again fix $k=2$ so we can simplify the algebra. Let's suppose that we are looking at the maximum entropy distribution, with $p=1/2$. You argued that
$$
\Omega = \frac{N!}{n_1! (N-n_1)!} = \binom{N}{n_1}
$$
is the number of states. In fact, this is only the number of states consistent with one particular set of occupation numbers $\{n_1, N-n_1\}$. In general, the true value of $\Omega$ involves summing over all the ways we can arrange the particles
$$
\Omega = \sum_{n_1=0}^N \binom{N}{n_1} = 2^N
$$
where the second equality follows from the sum of the binomial coefficients. Physically it's easy to understand this result: given that we have $N$ distinguishable particles and $2$ states, we can decide the configuration of each particle independently, so we get $2^N$ possibilities. Since we assume each configuration is equally likely, the entropy reduces to the Boltzmann entropy, and
$$
S = \log \Omega = N \log 2
$$
as we expect. (And as we calculated multiple ways in the previous section.)
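A quick numerical check of the binomial-sum identity and the resulting entropy (illustrative, small $N$):

```python
import math

N = 20
Omega = sum(math.comb(N, n1) for n1 in range(N + 1))  # sum over all occupation numbers n_1
print(Omega, 2 ** N)                                  # both 1048576
print(math.log(Omega), N * math.log(2))               # S = log Omega = N log 2
```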
What happens at large $N$ is that the occupation fractions $n_i/N$ become sharply peaked around the probabilities $p_i=1/k$ (here, $1/2$). So the sum over $n_1$ in $\Omega$ can be approximated by just the term with $n_1/N=n_2/N=1/2$. So
$$
S = \log \Omega = \log \sum_{n_1} \frac{N!}{n_1!(N-n_1)!} \approx \log \frac{N!}{((N/2)!)^2} = \log N! - 2 \log\left(\frac{N}{2}\right)!
$$
Then we can use Stirling's approximation $\log N! \approx N \log N - N$ to evaluate the factorials at large $N$
\begin{eqnarray}
S &\approx& N \log N - N - N \log \frac{N}{2} + N \\
&=& N \log 2
\end{eqnarray}
Since all these approximations become better and better at large $N$, the exact calculation of the entropy of the maximum entropy distribution above agrees, in the large $N$ limit, with the approximate calculation using Stirling's formula plus the limiting value $n_i/N=p_i$.
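You can also watch this convergence numerically; the sketch below compares the exact $\log\Omega = N\log 2$ with the log of the single central term that the approximation keeps (even $N$ assumed, so $N/2$ is an integer).

```python
import math

for N in (10, 100, 1000, 10_000):
    log_sum = N * math.log(2)                     # exact: log of the full sum over n_1
    log_central = math.log(math.comb(N, N // 2))  # the single n_1 = N/2 term kept in the approximation
    print(N, log_sum, log_central, (log_sum - log_central) / log_sum)
# The relative difference shrinks like (log N)/N, so keeping only the central term
# is harmless at large N, and Stirling applied to that term gives exactly N log 2.
```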
Appendix: A different (but related) problem
As an aside not directly related to your question, one point I found surprising while writing this up is that there is a subtle distinction between the entropy of the binomial distribution and the entropy of the microcanonical ensemble for $k=2$ that we looked at above. The ultimate reason is that the binomial distribution is appropriate for identical bosons at zero temperature with a degenerate ground state, whereas above we were looking at distinguishable particles with a fixed energy for the total system.
The binomial distribution tells us the probability of getting $n_1$ successes, given $N$ trials, and that the probability of success is $p$. The distribution is
$$
p(n_1) = \binom{N}{n_1} p^{n_1}(1-p)^{N-n_1}
$$
and using Stirling's approximation, one can show for large $N$ the entropy is approximately
$$
S \approx \frac{1}{2} \log\left(2\pi e N p(1-p)\right)
$$
The maximum entropy distribution has $p=1/2$, in which case
$$
p(n_1) = \binom{N}{n_1} \left(\frac{1}{2}\right)^{n_1 + N - n_1} = \binom{N}{n_1} 2^{-N}
$$
and using Stirling's approximation, the entropy is approximately
$$
S = \frac{1}{2} \log\left(\frac{\pi e N}{2}\right)
$$
This grows like $\log N$, not like $N$ as we found above.
The issue is that the binomial distribution only counts the number of successes, and doesn't distinguish between trials. In other words, the binomial distribution tells you the probability of getting $1$ head out of $100$ trials, not the probability that the first trial is heads.
In the microcanonical ensemble used in the main question, each of the $\Omega=2^N$ states has probability $2^{-N}$. We have $2^N$ states because we can assign each particle to state $1$ or state $2$ independently. This counting is not the same for identical bosons.
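For completeness, here is a numerical check of the $\tfrac{1}{2}\log$ scaling claimed in this appendix (a sketch only): the exact Shannon entropy of the binomial distribution at $p=1/2$ versus the $\tfrac{1}{2} \log\left(\pi e N/2\right)$ estimate.

```python
import math

def binomial_entropy(N, p=0.5):
    """Exact Shannon entropy of the binomial distribution over n_1 = 0..N."""
    probs = [math.comb(N, n1) * p ** n1 * (1 - p) ** (N - n1) for n1 in range(N + 1)]
    return -sum(q * math.log(q) for q in probs if q > 0)

for N in (10, 100, 1000):
    approx = 0.5 * math.log(math.pi * math.e * N / 2)
    print(N, binomial_entropy(N), approx)  # grows like (1/2) log N, not like N
```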