You are quite correct that the conduction electrons in a metal are delocalised. They are approximately described by functions called Bloch waves. In principle each electron is delocalised over the whole piece of metal, though in practice it is delocalised only over the distance between scattering events (the mean free path), which I think is generally about $10$ to $100\,\textrm{nm}$, though I don't have firm figures for this and it will depend on the metal and the temperature. But photons are also delocalised, so what we get is an interaction between two delocalised objects that changes the state of both.
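As a rough sanity check on the $10$ to $100\,\textrm{nm}$ figure, here is a minimal Drude-model sketch for copper at room temperature. The material values (resistivity, conduction electron density, Fermi velocity) are approximate textbook numbers that I am assuming, not values from the question, and the Drude model itself is only a crude approximation, so treat the result as an order-of-magnitude estimate.

```python
# Rough Drude-model estimate of the electron mean free path in copper.
# All material values below are approximate room-temperature textbook
# numbers (assumptions), so the result is an order-of-magnitude estimate.

ELECTRON_MASS = 9.109e-31      # kg
ELEMENTARY_CHARGE = 1.602e-19  # C

COPPER_RESISTIVITY = 1.68e-8       # ohm * m (assumed, ~20 C)
COPPER_ELECTRON_DENSITY = 8.47e28  # conduction electrons per m^3 (assumed)
COPPER_FERMI_VELOCITY = 1.57e6     # m/s (assumed)


def drude_relaxation_time(resistivity, electron_density):
    """Mean time between scattering events: tau = m / (n e^2 rho)."""
    return ELECTRON_MASS / (electron_density * ELEMENTARY_CHARGE**2 * resistivity)


def mean_free_path(fermi_velocity, relaxation_time):
    """Distance travelled at the Fermi velocity between scattering events."""
    return fermi_velocity * relaxation_time


tau = drude_relaxation_time(COPPER_RESISTIVITY, COPPER_ELECTRON_DENSITY)
path_nm = mean_free_path(COPPER_FERMI_VELOCITY, tau) * 1e9  # metres -> nm

print(f"relaxation time ~ {tau:.2e} s, mean free path ~ {path_nm:.0f} nm")
```

With these inputs the estimate comes out at a few tens of nanometres, which sits inside the range quoted above. A different metal or a lower temperature would change the resistivity and hence the answer.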
I think the problem arises from your use of the word "collide", because the interaction is not like a macroscopic collision, e.g. between two billiard balls. Neither the electron nor the photon is a point particle. Instead, the oscillating electric field of the photon interacts with the electron and the two become entangled. In this state we no longer have two distinct particles, as the wavefunction describing the pair cannot be factored into an electron part and a photon part.
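To make "cannot be factored" a bit more concrete, here is a schematic sketch in notation of my own choosing (the labels $|e_i\rangle$, $|\gamma_i\rangle$ and the coefficients $c_i$ are illustrative, not the result of any particular calculation). Before the interaction the combined state is a simple product of an electron state and a photon state:

$$|\Psi_\textrm{before}\rangle = |e\rangle \otimes |\gamma\rangle$$

During the interaction it becomes a superposition of such products:

$$|\Psi(t)\rangle = \sum_i c_i(t)\, |e_i\rangle \otimes |\gamma_i\rangle$$

Here I am assuming the $|e_i\rangle$ are mutually orthogonal electron states and the $|\gamma_i\rangle$ are mutually orthogonal states of the light field (one of which can be the state with no photon at all). Whenever more than one $c_i$ is nonzero, there is no single pair of states $|e'\rangle$, $|\gamma'\rangle$ for which $|\Psi(t)\rangle = |e'\rangle \otimes |\gamma'\rangle$, and that is what it means for the two to be entangled.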
The combined wavefunction is time dependent and can evolve in different ways. One possibility is that it evolves back into the original photon and electron states, in which case there is no net interaction between the photon and the electron. Another possibility is Compton scattering, where the state evolves back into a photon and an electron but with changed energies. A third possibility is that it evolves into a state describing a free electron and no photon. This third outcome is what produces the photoelectron.
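The energy bookkeeping for the last two outcomes can be sketched with the standard textbook relations. The symbols are ones I am introducing here: $h\nu$ and $h\nu'$ are the photon energies before and after, $E_e$ and $E_e'$ are the electron energies before and after, $\phi$ is the work function of the metal, and $E_{k,\textrm{max}}$ is the maximum kinetic energy of the emitted electron.

For Compton scattering, energy is shared out again between the photon and the electron:

$$h\nu + E_e = h\nu' + E_e'$$

For photoemission the photon disappears entirely and its energy goes into freeing the electron:

$$E_{k,\textrm{max}} = h\nu - \phi$$

The second relation is Einstein's photoelectric equation. It gives the maximum kinetic energy because electrons starting below the Fermi level need more than $\phi$ to escape, so they emerge with less.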
I don't think there is a useful analogy to be made with a water wave as it isn't obvious what the equivalent of the entangled state would be for a water wave.