1: At first, I thought the resolution was that storing information in the demon's memory corresponded to an increase in the demon's entropy.
Maxwell's daemon is a thought-experiment concept: a hypothetical device or being that filters gas particles in such a way that the 2nd law is violated. More specifically, in the simplest case Maxwell's daemon lets the appropriate particles through an opening joining two compartments, and thereby achieves a systematic decrease of entropy of the whole system without transferring the same or a higher amount of entropy to the outside environment. If it did have such a side effect, it would be just a refrigerator, and as such it would not violate the 2nd law.
Some people are offended by the idea that such a device may exist and work as advertised, and try to provide all sorts of subtle arguments for why it cannot work. Your use of the word "resolution" points to such an attitude.
However, I think this is a misconceived goal, as the primary argument against it is already known: the 2nd law itself. Any attempt to go deeper will sooner or later collide with the mathematical fact that the underlying mechanics, classical or quantum, allows for violations of the 2nd law. They're just not usually observed in our world, and given some plausible assumptions, they can be shown to be very improbable. However, Brownian motion of several particles can, after a long time, produce what looks like a state with lower entropy. So nobody actually believes a decrease of entropy is absolutely impossible. Instead, many people believe it can happen, but only transiently, as a fluctuation, and that it can't be amplified into a macroscopic decrease of entropy which we could harvest to get useful work.
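To get a feel for how improbable such a fluctuation is, here is a minimal sketch under a simplifying assumption of mine, not part of the argument above: $N$ independent ideal-gas particles, each equally likely to be in either half of a box. The function names are hypothetical. Under that assumption, the probability of finding all particles in one half equals $e^{-\Delta S/k_B}$ with $\Delta S = N k_B \ln 2$.

```python
import math

def prob_all_in_left_half(n_particles: int) -> float:
    """Probability that every particle sits in the left half at once,
    ASSUMING each particle is independently in either half with probability 1/2."""
    return 0.5 ** n_particles

def entropy_drop_over_kB(n_particles: int) -> float:
    """Entropy decrease, in units of k_B, when an ideal gas of n particles
    ends up confined to half its volume at fixed temperature: N ln 2."""
    return n_particles * math.log(2)

for n in (5, 50, 500):
    p = prob_all_in_left_half(n)
    boltzmann_factor = math.exp(-entropy_drop_over_kB(n))
    print(f"N={n}: P(all left) = {p:.3e}, exp(-dS/k_B) = {boltzmann_factor:.3e}")
```

For a handful of particles the fluctuation is easy to observe; for a few hundred it is already astronomically unlikely, which is the sense in which violations are allowed by the mechanics but not seen macroscopically.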
If you store $N$ bits of information, then there are $2^N$ possible configurations, and thus an entropy of $S = N k \ln 2$.
That is actually the information entropy of a probability distribution which assigns equal probability $1/2^N$ to each register state, multiplied by $k_B$, as the information entropy is:
$$
\sum_{i\in \text{states of register}} -p_i \ln p_i = \sum_{i=1}^{2^N} \frac{1}{2^N} \ln 2^N = N\ln2 .
$$
It would also be the value of the Boltzmann entropy of the $N$-bit register, if all $2^N$ register bit states were equally weighted microstates compatible with the same macrostate. But that is not usually the case, and I see no scenario where it would be: usually, different bit states have different energies, and thus they correspond to different macrostates and can't be equally weighted.
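The uniform-distribution value $N\ln 2$ can be checked numerically. This is only a sketch; `information_entropy` is a hypothetical helper name, and the result is in units of $k_B$ (natural-log units).

```python
import math

def information_entropy(probs):
    """Information entropy sum(-p ln p), skipping zero-probability states."""
    return sum(-p * math.log(p) for p in probs if p > 0)

n_bits = 8
n_states = 2 ** n_bits

# Equal probability 1/2^N for every register state.
uniform = [1.0 / n_states] * n_states

print(information_entropy(uniform))  # numerical sum over all 2^N states
print(n_bits * math.log(2))          # closed form N ln 2
```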
The thermodynamic entropy of the bit register may be very different, depending on the actual state of the bits. If one bit in state 0 contributes thermodynamic entropy 0, and one bit in state 1 contributes thermodynamic entropy $\sigma k_B$, then the actual thermodynamic entropy of the $N$-bit register is $m\sigma k_B$, where $m$ is the number of bits in state 1. This may be very different from the information entropy above.
If close to half the bits are 0's and half are 1's, which is quite likely if the bits are set randomly, the thermodynamic entropy of the register is approximately
$$
\frac{N}{2} \sigma k_B.
$$
However, this equals $N k_B \ln 2$ only if $\sigma = 2\ln 2$, and $\sigma$ may be very different from that (usually, it is much higher).
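A small sketch comparing the two quantities for a concrete register, using the simple model above (a bit in state 0 contributes 0, a bit in state 1 contributes $\sigma k_B$). The function names and the value `sigma = 10.0` are made up for illustration only; entropies are in units of $k_B$.

```python
import math

def register_thermo_entropy_over_kB(bits, sigma):
    """Thermodynamic entropy m * sigma of the model register,
    where m is the number of bits in state 1."""
    m = sum(bits)
    return m * sigma

def register_info_entropy_over_kB(n_bits):
    """Information entropy N ln 2 of the uniform distribution over 2^N states."""
    return n_bits * math.log(2)

bits = [0, 1] * 8  # 16 bits, exactly half of them in state 1

# sigma = 2 ln 2 is the special value where both agree; 10.0 is a hypothetical value.
for sigma in (2 * math.log(2), 10.0):
    s_thermo = register_thermo_entropy_over_kB(bits, sigma)
    s_info = register_info_entropy_over_kB(len(bits))
    print(f"sigma={sigma:.3f}: thermodynamic={s_thermo:.3f}, information={s_info:.3f}")
```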
But everyone talks about how deleting information corresponds to an entropy increase in the environment, and since the demon's memory is finite, it must eventually delete info to have space for more. But why is this necessary to save the 2nd law, seeing how storing information is enough?
Actually, the daemon resetting to its original state is necessary in order for us to be sure the device violates the 2nd law. If the daemon just decreases the entropy of the gas and increases its own by the same or a greater amount, that is just a strange refrigerator; it does not violate the 2nd law. To fulfill the role of Maxwell's daemon and violate the 2nd law, the device has to decrease the gas entropy without increasing its own entropy by the same or a higher amount, and without dumping the same or a higher amount of entropy into the environment.
Also, why can't you just imagine having a demon with a memory big enough to store all the N bits of info, making it unnecessary to delete?
We can imagine that, but then the device is just a fridge, not a Maxwell's daemon.
It's like the isothermal expansion of an ideal gas powered by a single thermal reservoir. It can have 100% efficiency, but it is not a violation of the 2nd law, because the process is not cyclic; the gas only expands, it is never compressed back. So it's not an example of a violation of the 2nd law, however strange it seems that we turned 100% of the heat into work.
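The bookkeeping for that expansion can be sketched as follows, assuming a reversible (quasi-static) process, which is my added assumption. The function name and the example numbers are hypothetical. All absorbed heat becomes work, yet the gas ends with higher entropy than it started with, so it has not returned to its initial state.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)

def isothermal_expansion(n_mol, temperature, v_initial, v_final):
    """Reversible isothermal expansion of an ideal gas (assumed quasi-static).

    Returns (work done by gas, heat absorbed, entropy change of gas,
    entropy change of reservoir), in J and J/K.
    """
    work_by_gas = n_mol * R * temperature * math.log(v_final / v_initial)
    heat_absorbed = work_by_gas  # internal energy of an ideal gas is unchanged at fixed T
    ds_gas = heat_absorbed / temperature
    ds_reservoir = -heat_absorbed / temperature
    return work_by_gas, heat_absorbed, ds_gas, ds_reservoir

# Example: 1 mol at 300 K doubling its volume (illustrative numbers).
work, heat, ds_gas, ds_res = isothermal_expansion(1.0, 300.0, 1.0, 2.0)
print(f"W = {work:.1f} J, Q = {heat:.1f} J")
print(f"dS_gas = {ds_gas:.3f} J/K, dS_reservoir = {ds_res:.3f} J/K")
```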
2: I have also been starting to think that there is a subtle difference between the statistical mechanical entropy and the Shannon information entropy. In the case of statistical entropy, we talk about the number of microstates in a given macrostate, and an important postulate for this formulation is that each microstate is equally likely, and that due to the internal dynamics of the system, the microstates change with time, such that the system spends an equal amount of time in each state.
Yes, all microstates with the same energy are assumed to have the same probability. The idea that these probabilities have something to do with the time spent in those microstates is called the quasi-ergodic hypothesis; both the validity of this idea and the need for it in arguments of statistical physics are very problematic. If possible, it's best to avoid relying on it.
But in the case of storing memories/information, the configuration of the N bits doesn't change with time, i.e. there are no internal dynamics to ensure that the configuration of ones and zeros changes.
Indeed! Methodologically, we describe this by treating the register state as a stable macrostate, which changes only during a memory storage/erasure operation, when the bits change. What can change continuously with time are the microstates realizing that bit state.
So for the same reason that the entropy at absolute zero is zero because the particles are "frozen" into a single microstate, shouldn't the statistical entropy corresponding to either storing or deleting info be zero since the ones and zeros are "frozen" into a single configuration?
No, because the fact that the register macrostate is stable does not mean it has zero entropy; it just means it has constant entropy. Changing bits requires energy and an entropy change.