My bottom line up front answer to your question is that renormalization is a complicated subject because there are different approaches to solving the same basic technical problem, each focused on elucidating different aspects of the solution. It is a common situation in field theory that you cannot simultaneously see all the important properties of a system from one approach. For example, the path integral makes Lorentz invariance manifest but unitarity is very hard to see, whereas the Hamiltonian formalism makes unitarity manifest but Lorentz invariance is far from obvious. It is much the same with renormalization; there isn't a single framework in which you can both easily understand conceptually what is going on physically and perform calculations "optimally" in some sense. So my answer to your question is that it is very easy to end up in a conceptual quagmire when studying renormalization, but ultimately everything does make sense if defined and thought of properly.
More detailed answer:
As is common in field theory, there are multiple tools for understanding what is going on; they overlap but serve different purposes.
First, just to define things in my terms: to talk about renormalization we need to talk about both a regularization scheme and a renormalization scheme. Wilsonian renormalization is fundamentally built around the idea of a cutoff regulator. The sliding scale $\mu$ that you introduce in your second bullet point is, in my mind, associated with dimensional regularization, and arises formally for dimensional analysis reasons.
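To make the dimensional analysis point concrete, here is a minimal sketch in $\lambda\phi^4$ conventions (my normalization; the specific theory is just for illustration). In $d = 4 - 2\epsilon$ dimensions the field has mass dimension $[\phi] = 1 - \epsilon$, so keeping the action dimensionless forces the quartic coupling to carry mass dimension $2\epsilon$:

$$ S = \int d^d x \left[ \frac{1}{2}(\partial\phi)^2 - \frac{1}{2} m^2 \phi^2 - \frac{\lambda\, \mu^{2\epsilon}}{4!}\,\phi^4 \right]. $$

Writing the coupling as $\lambda\,\mu^{2\epsilon}$ with $\lambda$ dimensionless is what introduces the arbitrary scale $\mu$; it is bookkeeping for dimensional analysis, not a cutoff on any modes. The statement that physical quantities cannot depend on this arbitrary choice is then what becomes the renormalization group equation in $\mu$.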
With that in mind, and oversimplifying, here are what I consider the main goals of Wilsonian renormalization vs dimensional regularization:
- Wilsonian: The main purpose of Wilsonian renormalization is philosophical. First, Wilsonian renormalization gives us a physical interpretation of what renormalization is all about. We don't know what the UV physics is, but we can parameterize our ignorance by integrating out physics above the scale $\Lambda$ (the cutoff), and we find that we can define effective coupling constants such that no low energy observables depend on $\Lambda$, except through these coupling constants (see the first formula in the sketch after this list). Second, Wilsonian renormalization lets us study the UV completeness of the theory, by asking if it is possible to send $\Lambda\rightarrow\infty$. Conceptually, $\Lambda$ is the cutoff of the theory. It is meant to be an energy scale that separates the "low energy" regime where we think the effective theory works, from the "high energy" regime where we don't know the physics. This $\Lambda$ should be larger than the energy scales we are interested in, so that power counting arguments about divergences apply; $\Lambda$ is the largest scale in the calculation, even though in the end it cancels out after we renormalize.
- Dimensional regularization: The main purpose of dimensional regularization is practical. Dim reg is a nice trick for renormalizing theories in a Lorentz invariant way, and is doubly useful when we talk about gauge theories because it respects gauge invariance. Furthermore, we can choose the sliding scale $\mu$ to be close to the energy scale we are interested in, in order to resum large logs of the form $\log E/\mu$ (see the second formula in the sketch after this list). In other words, we can choose our parameter $\mu$ in a clever way to make the perturbative expansion of QFT converge faster, putting more of the calculation into the "tree" part, which is more intuitive for humans, and less into the "loop" part.
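To put schematic formulas behind both bullets (my conventions; a sketch rather than a derivation): the Wilsonian statement is that the explicit cutoff dependence of a low energy observable is compensated by the cutoff dependence of the couplings,

$$ \Lambda \frac{d}{d\Lambda}\, \mathcal{O}\big(E;\Lambda, g_i(\Lambda)\big) = \left[ \Lambda\frac{\partial}{\partial\Lambda} + \sum_i \beta_i(g)\,\frac{\partial}{\partial g_i} \right]\mathcal{O} = 0, \qquad \beta_i \equiv \Lambda\frac{d g_i}{d\Lambda}, $$

while on the dim reg side, for a coupling with one-loop beta function $\mu\, dg/d\mu = b\,g^2$, solving the flow gives

$$ g(E) = \frac{g(\mu)}{1 - b\, g(\mu) \ln(E/\mu)} = g(\mu) + b\, g(\mu)^2 \ln(E/\mu) + \cdots, $$

so choosing $\mu \sim E$ resums the tower of $\big(b\,g \ln(E/\mu)\big)^n$ terms that fixed-order perturbation theory would otherwise have to build up loop by loop.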
Both approaches are valid for what they are trying to do, but you are right that we shouldn't equate $\Lambda$ and $\mu$. They serve different purposes.
However, the approaches are connected, since they are both ultimately approaches to the same underlying technical issues.
- First, let's bring in that third scale, $M$, which I will think of as the mass of a particle. In the effective field theory framework, we can have a theory at energies above $M$, where that particle is a degree of freedom in the theory, and a theory at energies below $M$, where that particle is not a degree of freedom and $M$ simply contributes to the renormalized low energy coupling constants. We can describe this transition across $M$ in either of the two renormalization pictures described above: as you slide $\Lambda$, or $\mu$, from above $M$ to below $M$, you go from including the particle to not including it in the effective action. You determine how $M$ enters the coupling constants of the effective theory without the massive particle by performing a matching calculation, where you compute the same observable in the theory with the heavy particle and in the effective theory without it, and fix the coupling constants of the low energy theory in terms of those of the high energy theory. These threshold corrections, which tell you how to build the new low energy theory, are where you rigorously see things like the hierarchy problem show up (see the sketch after this list). Of course the hierarchy problem is an aesthetic issue, not a mathematical contradiction, but you can phrase it in a way where arbitrary cutoffs don't appear and you are only talking about physical scales, and IMO this is the clearest way to see the issue.
- Second, at least to one loop, the beta function does not depend on the renormalization scheme. So for the one-loop running, which is actually observable (unlike power law divergences), Wilsonian renormalization with a cutoff and dimensional regularization agree; the $\lambda\phi^4$ example after this list makes this explicit.
- Third, at a philosophical level, the reason that dimensional regularization works, even though Wilsonian renormalization gives a much clearer picture of why we're doing renormalization in the first place, is that the "error" you make in dimensional regularization is to include high energy modes above $\Lambda$; but the point of renormalization is that nothing in the low energy theory depends on what is happening above the cutoff. So while the two approaches are different, they are equivalent from the point of view of the low energy theory, which is what we care about.
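Here is the schematic threshold correction promised in the first bullet above (an order-of-magnitude sketch, with a light scalar $h$ coupled to a heavy state of mass $M$ with coupling $g$; the specific setup is just illustrative). Matching the theories above and below $M$ shifts the low energy mass parameter by something like

$$ \delta m_h^2 \sim \frac{g^2}{16\pi^2}\, M^2. $$

No cutoff appears anywhere; $M$ and $g$ are physical. Yet keeping $m_h \ll M$ requires this matching contribution to cancel against the high energy value of the mass parameter to high precision, which is the hierarchy problem stated purely in terms of physical scales.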
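And for the second bullet, the textbook check is $\lambda\phi^4$ with the interaction normalized as $\lambda\phi^4/4!$: a hard cutoff and dim reg both give the same one-loop running,

$$ \beta_\lambda = \frac{d\lambda}{d\ln\mu} = \frac{3\lambda^2}{16\pi^2} + \mathcal{O}(\lambda^3), $$

because at one loop the beta function is fixed by the coefficient of the logarithm, and the coefficient of $\ln\Lambda$ in the cutoff calculation matches the coefficient of the $1/\epsilon$ pole in dim reg. Scheme dependence only shows up in the finite, non-logarithmic pieces and at higher loop order.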
My thinking on this topic is very much influenced by this paper by Burgess, which I cannot recommend highly enough: https://arxiv.org/abs/hep-th/0701053