The requirement that must be satisfied is Hamilton’s principle,
$$
\delta S = \delta\int_{t_1}^{t_2} L \ dt = 0
$$
which states that the variation of the action must vanish. By definition, the variation $\delta$ keeps the endpoints fixed. In Hamiltonian mechanics, we can express Hamilton’s principle by rewriting the Lagrangian in terms of the Hamiltonian, $L = p_i \dot{q}_i - H$ (summation over the repeated index $i$ is implied):
$$
\delta S = \delta\int _{t_1}^{t_2} \left( p_i \dot{q}_i - H\ \right) dt = 0
$$
Any alternative choice of coordinates $p_i, q_i \rightarrow P_i, Q_i$, with a new Hamiltonian (the “Kamiltonian”) $H \rightarrow K$, has to satisfy the same variational equation:
$$
\delta\int_{t_1}^{t_2} \left( P_i \dot{Q}_i - K\ \right) dt = 0
$$
So now the question is: what can we do to the action $S$ that keeps the variation $\delta S$ zero? We can either multiply it by a constant $\lambda$ (which Goldstein discusses) or add a term that changes $S$ only at the endpoints, where there is no variation. A term of that kind is the total time derivative of some function $F$ inside the action integral. Its integral depends only on the endpoint values, so it leaves the variation procedure implied by $\delta$ unaffected:
$$
\int_{t_1}^{t_2}
\frac{dF}{dt}dt = F(p(t_2), q(t_2), t_2) - F(p(t_1), q(t_1), t_1)
$$
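Putting the two options together, the old and new integrands do not have to be equal. As a sketch of the resulting condition (this is the form used in Goldstein's chapter on canonical transformations; the restriction to $\lambda = 1$ is the usual convention, not something derived here), it is enough that they differ by a scale factor and a total time derivative:
$$
\lambda \left( p_i \dot{q}_i - H \right) = P_i \dot{Q}_i - K + \frac{dF}{dt}
$$
With $\lambda = 1$ this is the condition usually taken to define a canonical transformation, and $F$ is called the generating function.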
Note that $F$ is a function of the phase space variables and time; it may be written in terms of the old variables, the new ones, or a mixture of both. As to your third question, consider how useful the generating function would be if it depended only on the old variables or only on the new ones: it could not relate the two sets. When it contains one variable from each set (e.g. $F(q,Q,t)$), you can find relations like those given in Table 9.1 in Goldstein, which allow you to solve for the new coordinates given the old ones. The procedure to find these relations is laid out there as well.
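To make this concrete, here is a sketch of how those relations come out for the mixed case, assuming $\lambda = 1$ and a generating function $F = F_1(q, Q, t)$ (the label $F_1$ follows Goldstein's naming). Requiring the old and new integrands to differ only by $dF_1/dt$ and expanding the total derivative gives
$$
p_i \dot{q}_i - H = P_i \dot{Q}_i - K + \frac{\partial F_1}{\partial t} + \frac{\partial F_1}{\partial q_i}\dot{q}_i + \frac{\partial F_1}{\partial Q_i}\dot{Q}_i
$$
Since $q_i$ and $Q_i$ are treated as independent, the coefficients of $\dot{q}_i$ and $\dot{Q}_i$ must match separately:
$$
p_i = \frac{\partial F_1}{\partial q_i}, \qquad P_i = -\frac{\partial F_1}{\partial Q_i}, \qquad K = H + \frac{\partial F_1}{\partial t}
$$
As a quick check with an illustrative choice, $F_1 = q_i Q_i$ gives $p_i = Q_i$, $P_i = -q_i$ and $K = H$, which simply exchanges the roles of coordinates and momenta up to a sign. A function of $q$ and $t$ alone would have produced no equation involving $P_i$ or $Q_i$ at all, which is why a variable from each set is needed.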