In this case, the difference is that dd is constrained to reading and writing 4096 bytes at a time, because you passed bs=4096. The likely effect is that dd will be much slower than cp. Try a larger block size (10M, 50M?).
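A quick way to check this is to time the same read at several block sizes. The sketch below is only an illustration: it uses a scratch file in place of the real device (the file and the sizes are assumptions), and it relies on GNU and BSD dd printing their throughput on the last line of the statistics.

```shell
#!/bin/sh
# Time reading the same data at several block sizes.
# SRC is a scratch file standing in for the real device (an assumption);
# point if= at the actual device to measure real hardware.
SRC=$(mktemp)
dd if=/dev/zero of="$SRC" bs=1048576 count=16 2>/dev/null   # 16 MiB of test data

for bs in 4096 65536 1048576 4194304; do
    printf 'bs=%s: ' "$bs"
    # dd writes its statistics to stderr; keep only the last line of them
    dd if="$SRC" of=/dev/null bs="$bs" 2>&1 | tail -n 1
done

rm -f "$SRC"
```

On a small cached file the differences will be modest; on a real disk, read a sample large enough that caching does not hide the effect.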
The buffer size best suited to the devices involved might be different from the one cp (or cat) uses, and you can't easily control cp's buffering. dd is most useful when:
- you have very large devices to copy, so that experimenting to determine the best block size is worthwhile.
- you have to copy only part of a disk. You can specify count to limit how many blocks are copied.
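- you want to resume an interrupted copy. You can't do so with cp, but you can try with dd, by using the skip option (to pass over blocks already read from the input) and the seek option (to pass over blocks already written to the output).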
- you want to pipe it to the standard input of something (admittedly, cat will work here too):
dd if=/dev/sda bs=10M | ssh host dd of=/dev/sdb
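The count, skip and seek points above can be tried safely on scratch files before touching a real disk. This is a minimal sketch, not a recovery procedure: the two temporary files stand in for the source and target devices, and the block size and block counts are made-up values for illustration.

```shell
#!/bin/sh
# Scratch files stand in for the source and target devices (an assumption).
SRC=$(mktemp)
DST=$(mktemp)
BS=4096                                    # block size in bytes
dd if=/dev/urandom of="$SRC" bs="$BS" count=100 2>/dev/null   # test data

# Partial copy: count= stops dd after the first 40 blocks.
dd if="$SRC" of="$DST" bs="$BS" count=40 2>/dev/null

# Resume: work out how many whole blocks already arrived...
DONE=$(( $(wc -c < "$DST") / BS ))

# ...then skip= that many input blocks and seek= past that many output blocks.
# conv=notrunc makes it explicit that the existing output must be kept.
dd if="$SRC" of="$DST" bs="$BS" skip="$DONE" seek="$DONE" conv=notrunc 2>/dev/null

# Verify that the resumed copy matches the source.
if cmp -s "$SRC" "$DST"; then RESULT=identical; else RESULT=different; fi
echo "resumed copy is $RESULT"
rm -f "$SRC" "$DST"
```

With real block devices the target's size says nothing about progress, so the number of completed blocks has to come from dd's own "records out" report (or a log of it), and the same bs must be used when resuming so that the block counts line up.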
dd's usefulness is discussed in detail in this Unix and Linux post:
dd vs cat — is dd still relevant these days?