The point is that if the dynamics is gauge invariant, i.e. the equations of motion are gauge invariant, then gauge-invariant quantities do not feel those transformations. We are then allowed to treat gauge transformations as a redundancy of our description, one that maps a physical state to the same physical state:
$$
|\psi\rangle\cong |\psi\rangle + G|\psi\rangle
$$
where $G$ stands for the gauge generator.
For this we only need the action to be gauge invariant up to terms proportional to the equations of motion, since this is sufficient to guarantee that the equations of motion are gauge invariant. In other words, if
$$
S[\phi + \delta_G\phi]=S[\phi] + \frac{\delta S[\phi]}{\delta\phi_{\alpha}}F_{\alpha}[\phi]
$$
then $\phi_{\alpha}^{*}$ being a solution of the equations of motion:
$$
\frac{\delta S[\phi]}{\delta\phi_{\alpha}}=0
$$
implies that $\phi_{\alpha}^{*}+\delta_{G}\phi_{\alpha}^{*}$ is also a solution, for any gauge transformation.
Note that since the gauge transformation is local not only in space but also in time, gauge invariance implies that there is a free function of time $f(t)$ that is not fixed by the equations of motion, i.e. for each solution $\phi^{*}_\alpha$ there is a family of solutions of the type
$$\phi_{\alpha}^{*}(t)+f(t) g[\phi_{\alpha}^{*}]$$
where $f(t)$ is an arbitrary function.
This makes it problematic to predict the future from these equations plus initial conditions, since we can use this free function of time to get a whole family of future states associated with a single initial state. Identifying states that are related by a gauge transformation resolves this problem.
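As a minimal illustration (a toy model chosen here, not taken from the discussion above), consider two variables $q_1(t)$, $q_2(t)$ with Lagrangian
$$
L=\tfrac{1}{2}\left(\dot q_1-q_2\right)^2
$$
which is invariant under $\delta q_1=\epsilon(t)$, $\delta q_2=\dot\epsilon(t)$ for an arbitrary function $\epsilon(t)$. The equations of motion,
$$
\frac{d}{dt}\left(\dot q_1-q_2\right)=0 \qquad \dot q_1-q_2=0
$$
only fix the gauge-invariant combination $\dot q_1-q_2$. Choosing $\epsilon$ with $\epsilon(0)=\dot\epsilon(0)=0$ but $\epsilon(t)\neq 0$ at later times gives two solutions with the same initial data and different futures. They describe the same physical state once gauge-related configurations are identified.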
The relation between this and Lorentz invariance is that the representations of the Lorentz group carried by the quantum states and by the local fields are different. The first is unitary and the second is finite-dimensional. The map between these two usually involves some kind of constraints that come from the equations of motion.
For the case of mapping massless states with spin greater than $1/2$ to local fields with the same spin, constraints are not enough: there is a topological obstruction. One way out is to consider a one-to-many map, and deal with the "many" as a redundancy. Another way out is to consider field strengths with higher spin. For the case of spin $1$ we can consider a $2$-form satisfying the constraints:
$$
dF=d*F=0
$$
It turns out that the map between states and fields sets the dimension of the field, and here the field has too large a dimension for renormalizable interactions with matter to be built. This obliges us to consider the one-to-many map, where now the field satisfies a constraint with a redundancy:
$$
d*dA =0 \qquad A\cong A+ d\chi
$$
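As a quick check, written in components and assuming the usual identification $F=dA$ (so $F_{\mu\nu}=\partial_\mu A_\nu-\partial_\nu A_\mu$), the redundancy leaves the field strength, and hence the constraint, unchanged because $d^2=0$:
$$
F_{\mu\nu}\to F_{\mu\nu}+\partial_\mu\partial_\nu\chi-\partial_\nu\partial_\mu\chi=F_{\mu\nu} \qquad d*d(A+d\chi)=d*dA
$$
With this identification $dF=d^2A=0$ holds identically, so only $d*F=0$ remains as an equation of motion.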
The dimension of $A$ is smaller than that of $F$, which allows us to construct a unique renormalizable interaction:
$$
V= \int j^{\mu}A_{\mu}
$$
where $j^{\mu}$ is a conserved current.
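A short check of why the current must be conserved, assuming fields that fall off fast enough for the boundary term to vanish: under $A\cong A+d\chi$,
$$
\delta V=\int j^{\mu}\partial_{\mu}\chi=-\int \chi\,\partial_{\mu}j^{\mu}=0
$$
so $V$ respects the redundancy for arbitrary $\chi$ exactly when $\partial_{\mu}j^{\mu}=0$. For the dimension counting, assuming four spacetime dimensions, natural units, and a fermionic matter current (assumptions not stated above), the mass dimensions are
$$
[A_{\mu}]=1 \qquad [F_{\mu\nu}]=2 \qquad [j^{\mu}]=[\bar\psi\gamma^{\mu}\psi]=3
$$
so $j^{\mu}A_{\mu}$ has dimension $4$ and is marginal, while a direct coupling of $F_{\mu\nu}$ to a fermion bilinear, such as $\bar\psi\sigma^{\mu\nu}\psi F_{\mu\nu}$, has dimension $5$ and is non-renormalizable.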