Firstly, we can only say anything about the quantum world insofar as we can make observations of it. For that reason, any statement about underlying mechanisms that cannot be directly observed always relies on how we interpret the observations. So, unless we can find a logically unambiguous way to relate observations to an understanding of the underlying mechanisms, these interpretations (of which there are many) will always remain questionable.
Nevertheless, from what we do observe we can draw some conclusions about the nature of light. The idea of photons came from the observations of black body radiation and the photoelectric effect (among others). Based on these, Planck and Einstein (among others) concluded that light consists of quantized bits of energy. Strictly speaking, they could only make this statement conclusively about the nature of interactions between light and matter. In other words: light is always radiated or absorbed in such quantized bits, which came to be known as photons. The consequence is that, once it has been radiated, light must consist of such photons.
Are photons localized like classical particles? That we cannot say, because whenever they are radiated or absorbed, it is the atoms or molecules from which they are radiated or by which they are absorbed that provide the localization, making these events appear as localized events. So, we can think of photons as having the same wave structure as a classical light wave would have, but carrying only a fixed quantized amount of energy. That is why people often refer to photons as single excitations of the wave.
If photons are not localized, how can they be absorbed at localized points? That is a bit more difficult to explain (it gets more technical). It requires the concept of quantum superposition. Basically, the wave of a single photon can be represented as a superposition of localized wavelets, each representing the complete quantized energy of the photon, but multiplied by a complex coefficient representing the probability amplitude for that wavelet. Only the one wavelet that is aligned with the absorbing atom or molecule will be absorbed, with a probability given by the magnitude squared of its coefficient. So the notion of collapse is not necessary.
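To make the superposition picture a little more concrete, it can be written out as follows. This is only a sketch: the symbols $|\psi\rangle$, $|w_j\rangle$ and $c_j$ are notation introduced here for illustration, and the wavelets are assumed to form a discrete, orthonormal set.

```latex
% Single-photon state as a superposition of localized wavelets |w_j>,
% each carrying the full photon energy (assumed orthonormal here).
\begin{align}
  |\psi\rangle &= \sum_j c_j \, |w_j\rangle ,
  & \sum_j |c_j|^2 &= 1 , \\
  % Probability that the photon is absorbed by an atom aligned with wavelet k:
  P_k &= |\langle w_k | \psi \rangle|^2 = |c_k|^2 .
\end{align}
```

The normalization condition says that the single photon is certain to be found in one of the wavelets, while $P_k$ is the probability that it is the one aligned with the absorbing atom or molecule.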
How many photons are there in an optical wave? That is easier to answer. If you measure the energy or power (energy per unit time) in the optical wave and divide it by the energy per photon you get the average number of photons (or number per unit time) in the wave. However, due to the concept of superposition, we cannot in general fix the number of photons in an optical field as an exact number.
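As a rough illustration of that division, here is a minimal Python sketch. It assumes a monochromatic wave, so that every photon has the same energy hc/λ; the function names and the example values (1 mW at 633 nm) are chosen here for illustration only.

```python
# Physical constants (exact SI values).
PLANCK_H = 6.62607015e-34  # Planck constant in J*s
LIGHT_C = 299792458.0      # speed of light in vacuum in m/s


def photon_energy(wavelength_m):
    """Energy of one photon, E = h*c/lambda, in joules."""
    return PLANCK_H * LIGHT_C / wavelength_m


def mean_photon_number(energy_j, wavelength_m):
    """Average number of photons in a wave carrying energy_j joules."""
    return energy_j / photon_energy(wavelength_m)


def mean_photon_rate(power_w, wavelength_m):
    """Average number of photons per second in a beam of power_w watts."""
    return power_w / photon_energy(wavelength_m)


if __name__ == "__main__":
    # Assumed example values: a 1 mW beam at a wavelength of 633 nm.
    rate = mean_photon_rate(1e-3, 633e-9)
    print(f"about {rate:.2e} photons per second on average")
```

For these example values the result is of the order of 10^15 photons per second. Note that this is only the average: as said above, the exact number of photons in the field is in general not fixed.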
Hope this makes it a bit clearer.