Yes, FPGAs are excellent for implementing massively parallel designs.
Many people have put 8 or more CPUs on an FPGA -- it's not merely "in principle".
Check out the floorplan image in the article
"A 24 Processors System on Chip FPGA Design with Network on Chip"
by Zhoukun WANG and Omar HAMMAMI.
That floorplan makes it obvious that that particular FPGA is packed nearly full.
The 24 CPU cores -- each one a 32-bit MicroBlaze CPU with 32 KByte total of local instruction and data memory -- fill up roughly half the FPGA (around the perimeter).
The routing between the CPU cores and the 4 independent external buses fills up nearly all the rest of the FPGA.
(Each external bus is 64 data bits wide plus some control signals, and each leads to an independent DDR2 memory module.)
(This particular IC also includes two PowerPC 405 CPU hard cores in addition to the FPGA fabric -- the authors apparently didn't use them.)
As other people here have pointed out, dividing "number of gates in an FPGA" by "number of gates in a CPU" is overly optimistic.
In this case, the 142,128 LUTs on a Xilinx Virtex-4 FX140 FPGA divided by the roughly 1000 LUTs required for a minimum-size MicroBlaze gives (optimistically) 142 CPUs per chip.
So are you disappointed that apparently "only" 24 CPUs fit in that FPGA fabric (not counting the two PowerPC 405 hard cores outside the FPGA fabric on that IC)?
A 1-million-gate FPGA divided by a 50k-gate CPU gives (optimistically) 20 CPUs per chip.
I think you will be lucky to squeeze even 4 CPUs onto that FPGA.
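To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It takes the ratio of achieved CPUs to "optimistic" CPUs from the Virtex-4 FX140 example and applies it to the 1-million-gate case. Carrying that ratio over from one FPGA and CPU to another is purely my assumption for illustration (routing, memory, and bus overhead vary a lot between designs), and the function and variable names are made up for this example.

```python
def optimistic_cpu_count(fpga_units: int, cpu_units: int) -> int:
    """Naive estimate: total FPGA resources divided by resources per CPU."""
    return fpga_units // cpu_units


# Virtex-4 FX140 example: 142,128 LUTs, about 1000 LUTs per minimal MicroBlaze.
virtex4_optimistic = optimistic_cpu_count(142_128, 1_000)  # 142

# What the 24-processor design actually fit on that part.
achieved = 24
achieved_fraction = achieved / virtex4_optimistic

# 1-million-gate FPGA with a 50k-gate CPU.
gate_optimistic = optimistic_cpu_count(1_000_000, 50_000)  # 20

# ASSUMPTION: the same achieved/optimistic fraction carries over.
scaled_estimate = gate_optimistic * achieved_fraction

print(f"Virtex-4 optimistic estimate: {virtex4_optimistic} CPUs, achieved: {achieved}")
print(f"Achieved fraction: {achieved_fraction:.0%}")
print(f"1M-gate optimistic estimate: {gate_optimistic} CPUs")
print(f"Scaled by the same fraction: about {scaled_estimate:.1f} CPUs")
```

Under that assumption the scaled estimate comes out between 3 and 4 CPUs, which is consistent with the "lucky to squeeze even 4" guess above.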
"It is amazing what you can squeeze
onto these parts if you design the
machine architecture carefully to
exploit FPGA resources. In contrast,
there was a very interesting article
in a recent EE Times by a fellow from
VAutomation doing virtual 6502's in
VHDL, then synthesizing them down into
arbitrary FPGA architectures.
Although the 6502 design used only
about 4000 "ASIC gates" it didn't
quite fit in a XC4010, a so-called
"10,000 gate" FPGA. That a dual-issue
32-bit RISC should fit, and a 4 MHz
6502 does not, states a great deal
about VHDL synthesis vs. manual
placement, about legacy architectures
vs. custom ones, and maybe even
something about CISC vs. RISC..."
-- Jan Gray
The Wikipedia article "soft processor" has more information on packing multiple CPUs on a single FPGA.