7

Let's say I have a 1 million gate FPGA. I've found a few processors on OpenCores which only require 50k gates. So would it be possible to make, say, an 8 core processor with such an FPGA? Or are there limits to FPGAs when it comes to implementing massively parallel things?

Sorry if it sounds newbish, but I am a newb to FPGAs.

Thomas O

6 Answers

8

You're basically asking two questions:

-- "Are FPGAs as powerful as the number of gates they have?"

I'd actually say that the answer is "no". The size tells you how "big" they are, but not how "powerful" they are, because:

  1. "powerful" depends heavily on your end application.
  2. "powerful" is typically associated more with the technology node than with the die's size (a small device from the current generation can be the same size as one from an older generation -- are they equally "powerful"?)

-- Paraphrasing: "Can I divide the total number of "gates" in the device by the number of "gates" each core requires and get the number of cores I could fit in the device?"

The answer to this is also "no". This is because FPGAs are routing-limited, which means that:

  1. Even though there may be enough logic resources on the chip, the software won't be able to route that many cores, particularly if there's connectivity overhead.
  2. The overall design is unlikely to meet the timing performance of an individual core.

Another problem is how to handle off-chip I/O -- will there be enough bandwidth and enough I/O pins to supply data to, and read data from, that many cores?

(BTW, we're trying to start an SE site dedicated to FPGAs... consider supporting it... http://area51.stackexchange.com/proposals/20632/programmable-logic-and-fpga-design?referrer=YmxhQ2OJUo-FAaI1gMp5oQ2)

Saar Drimer
5

Yes, in principle. However, if you want the cores to talk to each other, you may need quite a lot more logic for arbitration, bus switching, etc. Running buses around the FPGA can also use up a lot of gates and connection capacity just for routing.

mikeselectricstuff
4

You'll find gate counts are rough estimates at best, rather than simple facts about the FPGAs. Looking more closely, the advertised gate counts include things like using portions of block RAM for logic. The logic you synthesize is also not reduced to a single standard gate type, as if you were building with nothing but 2-input NAND gates: the logic blocks are a fair bit more complex, typically featuring a LUT with 4 or more inputs for each register.

So the real question is how efficiently the synthesis software can map your specific design to your specific chip. You probably can do an 8 core processor easily - if the cores, and just as importantly their interconnects, suit the chip well enough. It's quite possible a design fails for lack of signal routing rather than lack of logic.

Yann Vernier
2

In addition to on-die peripherals (which can make quite a large difference in "power" with things like Block RAM, embedded multipliers, etc), and clock speed/"timing closure", another limiting factor of FPGAs is quite often the pin count.

Sure, you can put 8 cores into an FPGA, but then you have to get those 8 cores to talk to the outside world. To make matters worse, once you get beyond a few hundred pins you have to use a BGA package, which is much more difficult to design a PCB for.

ajs410
2

Yes, FPGAs are excellent for implementing massively parallel things. Many people have put 8 or more CPUs on an FPGA -- it's not merely "in principle".

Check out the floorplan image in the article "A 24 Processors System on Chip FPGA Design with Network on Chip" by Zhoukun WANG and Omar HAMMAMI.

That floorplan makes it pretty obvious that that particular FPGA is pretty much packed full of stuff. The 24 CPU cores -- each one a 32 bit MicroBlaze CPU with 32 KByte total of local instruction and data memory -- fill up roughly half the FPGA (around the perimeter). The routing between the CPU cores and the 4 independent external buses pretty much fills up all the rest of the FPGA. (The external buses are each 64 data bits wide plus some control signals, each one leading to an independent DDR2 memory module).
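To put rough totals on the figures in that paragraph, here is a minimal Python sketch. It uses only the numbers quoted above (24 cores, 32 KByte per core, 4 buses of 64 data bits). The control signals are left out because their count isn't given here, so the real pin total is higher.

```python
# Figures quoted above for the 24-core MicroBlaze design.
CORES = 24
LOCAL_MEM_KBYTE_PER_CORE = 32   # local instruction + data memory per core
EXTERNAL_BUSES = 4
DATA_BITS_PER_BUS = 64          # data lines only; control signals not counted

# Total on-chip memory dedicated to the cores' local storage.
total_local_mem_kbyte = CORES * LOCAL_MEM_KBYTE_PER_CORE

# Lower bound on the I/O pins needed for the external memory buses.
total_data_pins = EXTERNAL_BUSES * DATA_BITS_PER_BUS

print(total_local_mem_kbyte, "KByte of local memory")    # 768 KByte of local memory
print(total_data_pins, "data pins, before control signals")  # 256 data pins, before control signals
```

So the data lines alone already take 256 pins, which lines up with the pin-count concern raised in another answer.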

(This particular IC also includes two PowerPC 405 CPU hard cores in addition to the FPGA fabric -- Zhoukun and Omar apparently didn't bother using them).

As other people here have pointed out, dividing "number of gates in an FPGA" by "number of gates in a CPU" is overly optimistic. In this case, 142,128 LUTs on a Xilinx Virtex-4 FX140 FPGA divided by about 1000 LUTs required for a minimum-size MicroBlaze gives (optimistically) 142 CPUs per chip. So are you disappointed that apparently "only" 24 CPUs fit in that FPGA fabric (not counting the two PowerPC 405 hard cores outside the FPGA fabric on that IC)?

A 1 million gate FPGA divided by a 50k gate CPU gives (optimistically) 20 CPUs per chip. I think you will be lucky to squeeze even 4 CPUs onto that FPGA.
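The arithmetic behind those estimates can be sketched in a few lines of Python. Treat it as a rough illustration only: `fit_ratio` comes from the single Virtex-4 data point quoted in this answer, and carrying it over to a different FPGA and a different core is an assumption, not a measured result.

```python
def optimistic_cores(total_resources: int, resources_per_core: int) -> int:
    """Naive estimate: total logic divided by logic per core, rounded down."""
    return total_resources // resources_per_core

# Figures quoted above for the Virtex-4 FX140 MicroBlaze design.
v4_optimistic = optimistic_cores(142_128, 1_000)  # naive LUT-based estimate
v4_actual = 24                                    # cores that actually fit
fit_ratio = v4_actual / v4_optimistic             # fraction of the naive estimate achieved

# The asker's case: a 1 million gate FPGA and a 50k gate core.
asker_optimistic = optimistic_cores(1_000_000, 50_000)

# Assumption: the same fit ratio carries over, which is far from guaranteed.
asker_scaled = asker_optimistic * fit_ratio

print(v4_optimistic, round(fit_ratio, 2), asker_optimistic, round(asker_scaled, 1))
# prints: 142 0.17 20 3.4
```

Under that assumption the naive figure of 20 shrinks to somewhere between 3 and 4 cores, which is in the same range as the "lucky to squeeze even 4" guess.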

"It is amazing what you can squeeze onto these parts if you design the machine architecture carefully to exploit FPGA resources. In contrast, there was a very interesting article in a recent EE Times by a fellow from VAutomation doing virtual 6502's in VHDL, then synthesizing them down into arbitrary FPGA architectures. Although the 6502 design used only about 4000 "ASIC gates" it didn't quite fit in a XC4010, a so-called "10,000 gate" FPGA. That a dual-issue 32-bit RISC should fit, and a 4 MHz 6502 does not, states a great deal about VHDL synthesis vs. manual placement, about legacy architectures vs. custom ones, and maybe even something about CISC vs. RISC..." -- Jan Gray

The Wikipedia article "soft processor" has more information on packing multiple CPUs on a single FPGA.

davidcary
1

Yes, sort of.

In addition to the differences in on-die peripherals (RAM, IO buffers, etc.), you also need to consider the fact that different FPGAs are rated for different clock speeds.

You may have two 500k gate FPGAs, but if one has a maximum clock of 50 MHz, and the other goes up to 1 GHz, one is clearly more powerful than the other.

Now, it's generally not that simple in the real world, as FPGAs are limited more by propagation delays than by pure clock speed. Still, different devices have faster or slower logic units, which changes how complex your logic can be before you have to add synchronous buffering (pipeline registers) or run into timing problems such as metastability.
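As a rough sketch of that relationship, the usable clock rate is set by the slowest register-to-register path. The Python below uses made-up numbers (10 logic levels, 1.0 ns or 0.5 ns per level) purely for illustration, and it ignores routing delay, setup time, and clock skew, all of which matter on a real device.

```python
def max_clock_mhz(logic_levels: int, delay_per_level_ns: float) -> float:
    """Highest clock (MHz) at which a register-to-register path settles in one cycle."""
    critical_path_ns = logic_levels * delay_per_level_ns
    return 1000.0 / critical_path_ns  # a 1 ns period corresponds to 1000 MHz

# Hypothetical numbers, for illustration only.
slow_fabric = max_clock_mhz(logic_levels=10, delay_per_level_ns=1.0)
fast_fabric = max_clock_mhz(logic_levels=10, delay_per_level_ns=0.5)

# Adding one pipeline register in the middle halves the logic depth per cycle.
slow_pipelined = max_clock_mhz(logic_levels=5, delay_per_level_ns=1.0)

print(slow_fabric, fast_fabric, slow_pipelined)  # 100.0 200.0 200.0
```

In this toy model the slower fabric only matches the faster one by splitting the logic across two clock cycles, which costs extra registers and a cycle of latency.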

Connor Wolf