Booth multipliers require fewer logic resources because they operate on more than one bit at a time. The idea is to break the multiplier input into small groups of bits (in the simplest case, 2 bits) and then perform a slightly more creative operation on each group. A plain multiplier just uses AND gates to generate the partial products, then uses an adder tree to combine them. A Booth multiplier handles multiple bits at the same time, reducing the size of the adder tree. One bit gives you K*0 and K*1, which can be computed with an AND gate. Two bits give you K*0, K*1, K*2, and K*3. You get K*2 for free (a bit shift). K*3 could be built as K + K*2, but that requires an additional adder. However, K*3 = K*4 - K*1, which just requires building a signed adder tree, and that basically just means extending the sign bits.
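To make the 2-bit (radix-4) case concrete, here is a behavioral sketch in Python (not gate-level; the function name and structure are mine). Each overlapping 3-bit window of the multiplier recodes to a digit in {-2, -1, 0, 1, 2}, so every partial product is just K shifted and/or negated, with no hard multiples:

```python
def booth_radix4(multiplicand, multiplier, nbits=8):
    """Multiply two nbits-wide two's-complement integers using
    radix-4 Booth recoding. Each digit comes from bits (i+1, i)
    plus the bit below the window, and is one of -2..2, so the
    partial products need only shifts and negation."""
    def to_signed(x):
        x &= (1 << nbits) - 1
        return x - (1 << nbits) if x & (1 << (nbits - 1)) else x

    K = to_signed(multiplicand)
    m = to_signed(multiplier)
    result = 0
    prev = 0  # implicit zero appended below the LSB
    for i in range(0, nbits, 2):
        b0 = (m >> i) & 1        # Python's >> sign-extends, which
        b1 = (m >> (i + 1)) & 1  # matches hardware sign extension
        digit = b0 + prev - 2 * b1   # radix-4 Booth digit, -2..2
        result += (digit * K) << i   # partial product at weight 2^i
        prev = b1
    return result
```

Note there are only nbits/2 partial products instead of nbits, which is where the adder-tree savings come from.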
This technique can be extended to more than 2 bits at a time. Three bits (radix 8) requires K*0, K*1, K*2, K*3, K*4, K*5, K*6, and K*7. This reduces to K*0, K*1, K*2, K*3, K*4, K*8-K*3, K*8-K*2, and K*8-K*1. However, this now requires one extra adder to precompute K*3 = K*2 + K*1.
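The same sketch extends to radix 8 (again a behavioral Python illustration of my own, not a hardware model): the digits land in {-4, ..., 4}, and 3K is the single "hard" multiple that has to be precomputed with an adder.

```python
def booth_radix8(multiplicand, multiplier, nbits=8):
    """Multiply two nbits-wide two's-complement integers using
    radix-8 Booth recoding. Digits are in -4..4; all multiples
    are shifts/negations of K except 3K, which is precomputed."""
    def to_signed(x):
        x &= (1 << nbits) - 1
        return x - (1 << nbits) if x & (1 << (nbits - 1)) else x

    K = to_signed(multiplicand)
    m = to_signed(multiplier)
    K3 = 3 * K  # the one precomputed hard multiple (K*2 + K*1)
    result = 0
    prev = 0  # implicit zero below the LSB
    for i in range(0, nbits, 3):
        # radix-8 Booth digit from bits (i+2, i+1, i) plus prev
        d = ((m >> i) & 1) + 2 * ((m >> (i + 1)) & 1) \
            - 4 * ((m >> (i + 2)) & 1) + prev
        pp = K3 if d == 3 else -K3 if d == -3 else d * K
        result += pp << i
        prev = (m >> (i + 2)) & 1
    return result
```

Now there are only about nbits/3 partial products, at the cost of that one precomputation adder.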
I actually have a Python script running around somewhere to generate a radix-8 signed Booth multiplier in Verilog. If you run it for a 64x64 multiplier, it generates around 2300 lines of Verilog, which is just about all full adders.
Edit: Never mind, Booth-encoded multipliers are rather different from the Booth multiplication algorithm. The above refers to how a Booth-encoded multiplier works for implementation in digital logic. Interestingly, I have not found a Wikipedia article on the topic. Booth encoding redirects to the algorithm page, but the page has no reference to Booth encoding at all, though Booth encoding is likely derived from the algorithm in some way. Very strange.
Anyway, the page you linked to contains a description of how the algorithm works mathematically. Any contiguous string of set bits can be replaced by one addition and one subtraction, no matter how long the string. The addition is performed at one end of the bit string, and the subtraction is performed at the other. The decision is made not by looking at the LSB alone, but by looking at the two LSBs. If they are the same (00 or 11), no operation is performed. If they are different (10 or 01), an addition is performed, either with the multiplicand or with its two's complement, the latter amounting to a subtraction. Another way to think about it is that instead of just adding one, you add two and subtract one. The added two then carries out of all of the adjacent set bit positions until you hit a cleared bit. As for signed multiplication, the algorithm is already set up to perform a signed multiply. The example on the page, 3 * -4 = -12, requires no additional steps. However, with a VLSI implementation of a signed, Booth-encoded multiplier, you do need to be careful with sign extension to get everything working correctly.
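The original algorithm as described above can be sketched in a few lines of Python (my own wording of it, scanning one bit at a time with the bit below it): a 01 pair ends a run of set bits and triggers an add, a 10 pair starts a run and triggers a subtract, and 00/11 do nothing, so each run costs exactly one add and one subtract.

```python
def booth_multiply(multiplicand, multiplier, nbits=8):
    """Booth's original algorithm: scan the multiplier one bit at
    a time, comparing each bit with the bit below it (an implicit
    0 below the LSB). 01 -> add, 10 -> subtract, 00/11 -> nothing.
    A run of consecutive set bits costs one add plus one subtract."""
    def to_signed(x):
        x &= (1 << nbits) - 1
        return x - (1 << nbits) if x & (1 << (nbits - 1)) else x

    K = to_signed(multiplicand)
    m = to_signed(multiplier)
    result = 0
    prev = 0
    for i in range(nbits):
        bit = (m >> i) & 1
        if bit == 0 and prev == 1:    # run of 1s ended: add K * 2^i
            result += K << i
        elif bit == 1 and prev == 0:  # run of 1s began: subtract K * 2^i
            result -= K << i
        prev = bit
    return result
```

Note that for a negative multiplier the top run of 1s extends through the sign bit and is simply never "closed", which is exactly why the algorithm handles signed operands with no extra steps: booth_multiply(3, -4) gives -12, matching the page's example.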