Therefore, does the diffusion equation not require the random jump lengths at random times but only a randomness in the direction of motion?
Yes, a discrete time random walk on a discrete lattice, with randomness only in the direction of motion, is sufficient to obtain a discrete approximation to the continuous diffusion process, and this approximation indeed converges in distribution to the continuous diffusion process in the limit where $a, \tau \to 0$ while $a^2 = 2D\tau$.
The simplest rigorous way (that I know of) to prove this is just to solve both processes (i.e. find the distribution of the particle's position $X(t)$ for all $t > 0$, starting at $X(0) = 0$, for each process) and observe that the solution for the discrete lattice random walk converges in distribution, under appropriate scaling, to the solution for the continuous diffusion process.
Specifically, let $X_1(t)$ denote the position of the particle under the one-dimensional discrete lattice random walk, with spatial step size $a$ and time step $\tau$, at time $n\tau \le t < (n+1)\tau$, i.e. after $n$ random steps of $\pm a$. Then the distribution of $X_1(t)$ is a scaled and translated binomial distribution: $$\frac{X_1(n\tau) + na}{2a} = \frac{X_1(n\tau)}{2a} + \frac n2 \sim B\left(n, \frac12\right). \tag{1}$$
Meanwhile, let $X_2(t)$ denote the position of the particle under the continuous diffusion process on one-dimensional Euclidean space with diffusion constant $D$ at time $t > 0$. Then $X_2(t)$ is normally distributed with mean $X_2(0) = 0$ and variance $\sigma^2 = 2Dt$: $$X_2(t) \sim \mathcal N(0, 2Dt). \tag{2}$$
(I'll leave verifying those solutions as an exercise. The one for the discrete random walk is easy enough to derive from first principles, while the continuous diffusion case amounts to deriving the fundamental solution of the heat equation in one-dimensional Euclidean space, which is a common enough exercise in the study of parabolic PDEs.)
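As a quick sanity check on the scaling $a^2 = 2D\tau$ (just a sketch, using only the variance $n/4$ of a $B(n, \frac12)$ variable, which I'll call $K$ here so that $X_1(n\tau) = a(2K - n)$ by $(1)$), the two solutions already have the same variance at every lattice time $t = n\tau$:

$$\operatorname{Var}[X_1(n\tau)] = \operatorname{Var}[a(2K - n)] = 4a^2 \cdot \frac n4 = na^2 = \frac t\tau \cdot 2D\tau = 2Dt = \operatorname{Var}[X_2(t)].$$

Matching variances is of course not enough for convergence in distribution by itself, but it shows why this particular relation between $a$ and $\tau$ is the right one.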
Now, by rescaling space by $1/\sigma = 1/\sqrt{2Dt}$, we can transform $(2)$ into $$\frac{X_2(t)}{\sqrt{2Dt}} \sim \mathcal N(0, 1).$$
Meanwhile, for $(1)$, we can use the De Moivre–Laplace theorem, which says that if $X \sim B(n, p)$ and $n \to \infty$, then the distribution of $X_{\text{norm}} = (X-np)/\sqrt{np(1-p)}$, i.e. $X$ scaled and translated to normalize its mean to $0$ and variance to $1$, approaches a standard normal distribution: $$X_{\text{norm}} = \frac{X-np}{\sqrt{np(1-p)}} \overset{\mathcal D}{\to} \mathcal N(0,1),$$ where $\overset{\mathcal D}{\to}$ denotes convergence in distribution. Applying this theorem to $(1)$ (specifically, plugging in $p = \frac12$ and $X = \frac{X_1(n\tau)}{2a} + \frac n2$), we see that, as $n \to \infty$, $$\frac{X-\frac12n}{\frac12\sqrt{n}} = \frac{X_1(n\tau)}{a\sqrt{n}} \overset{\mathcal D}{\to} \mathcal N(0,1).$$
If we now substitute in $n = t/\tau$ and $a = \sqrt{2D\tau}$, we can see that, as $\tau \to 0$ (and thus $n \to \infty$ and $a \to 0$), $$\frac{X_1(t)}{a\sqrt{t/\tau}} = \frac{X_1(t)}{\sqrt{2Dt}} \overset{\mathcal D}{\to} \mathcal N(0,1).$$ Thus, in particular, we see that as $\tau \to 0$, $X_1(t) \overset{\mathcal D}{\to} X_2(t)$.
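If you'd like to see this convergence numerically, here is a minimal Python sketch (standard library only). It evaluates the exact binomial CDF of $X_1(t)$ from $(1)$ and the normal CDF of $X_2(t)$ from $(2)$, and reports the largest difference over a grid of points. The function names, the grid and the parameter values $t = 1$, $D = \frac12$ are my own arbitrary choices for illustration.

```python
import math

def walk_cdf(x, t, D, tau):
    """Exact P(X_1(t) <= x) for the lattice walk with step a = sqrt(2*D*tau)."""
    a = math.sqrt(2 * D * tau)
    n = math.floor(t / tau + 1e-9)  # completed steps; epsilon guards float round-off
    # X_1 = a*(2K - n) with K ~ B(n, 1/2), so X_1 <= x  <=>  K <= (x/a + n)/2
    k_max = min(n, math.floor((x / a + n) / 2 + 1e-9))
    if k_max < 0:
        return 0.0
    return sum(math.comb(n, k) for k in range(k_max + 1)) / 2 ** n

def diffusion_cdf(x, t, D):
    """P(X_2(t) <= x) for X_2(t) ~ N(0, 2*D*t)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(4 * D * t)))

def max_gap(t, D, tau, xs):
    """Largest CDF difference between the two processes over the points xs."""
    return max(abs(walk_cdf(x, t, D, tau) - diffusion_cdf(x, t, D)) for x in xs)

if __name__ == "__main__":
    xs = [0.25 * i for i in range(-12, 13)]  # grid of test points, includes x = 0
    for tau in (0.1, 0.01, 0.001):
        print(tau, max_gap(1.0, 0.5, tau, xs))
```

The gap should shrink roughly like $1/\sqrt n$ as $\tau$ decreases, which is the size of the individual jumps in the binomial CDF.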
P.S. Let me briefly mention multidimensional diffusion and random walks, since other answers have touched upon them.
In general, the same discrete approximation works just fine in more than one dimension, and as long as the transition kernel (i.e. the distribution of the particle's position after one time step) is sufficiently symmetric, it will converge to an isotropic diffusion process with no drift. (A sufficient, but not necessary, symmetry condition is that the kernel is invariant under inversion along any lattice axis and under the exchange of any two axes.)
In particular, both of the "four nearest neighbors" and "eight nearest neighbors" kernels mentioned in James's answer are sufficiently symmetric, and will converge to isotropic diffusion in two-dimensional Euclidean space. However, the effective diffusion coefficients will be different for the same lattice step size $a$ and time step $\tau$ depending on the kernel used.
Specifically, the "four nearest neighbors" kernel (or, more generally, the "$2k$ nearest neighbors" kernel on a $k$-dimensional lattice) gives an effective diffusion coefficient of $D = \frac{a^2}{2k\tau}$. An intuitive way to see this is to note that each step of the "$2k$ nearest neighbors" random walk is equivalent to the following two steps:
- Choose one of the $k$ lattice axes at random.
- Move the particle by $\pm a$ along the chosen axis.
Thus, in effect, the particle undergoes a one-dimensional random walk along each of the $k$ axes, but only moves along a given axis with probability $1/k$ on each time step. It's therefore not surprising that the resulting diffusion coefficient ends up being only $1/k$ times what one would get from a random walk in one dimension.
(FWIW, these random walks along each axis, while not independent, are uncorrelated, which is sufficient to yield isotropic diffusion in the limit.)
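The two-step description above is easy to simulate. The following is only a rough Monte Carlo sketch (standard library, fixed seed). It estimates the per-axis diffusion coefficient from the variance of the final position, using $\operatorname{Var} = 2Dt$ along each axis, and also the correlation between the first two axes. The function names and the walker and step counts are arbitrary choices of mine.

```python
import random

def walk_2k(k, n_steps, a, rng):
    """Final position after n_steps of the 2k-nearest-neighbors walk."""
    pos = [0.0] * k
    for _ in range(n_steps):
        axis = rng.randrange(k)           # step 1: pick one of the k axes
        pos[axis] += rng.choice((-a, a))  # step 2: move by +-a along it
    return pos

def estimate(k, n_steps, a, tau, n_walkers, seed=0):
    """Return (per-axis D estimates, sample correlation between axes 0 and 1)."""
    rng = random.Random(seed)
    t = n_steps * tau
    sum_sq = [0.0] * k
    sum_01 = 0.0
    for _ in range(n_walkers):
        pos = walk_2k(k, n_steps, a, rng)
        for i in range(k):
            sum_sq[i] += pos[i] ** 2
        if k > 1:
            sum_01 += pos[0] * pos[1]
    var = [s / n_walkers for s in sum_sq]  # the mean is 0 by symmetry
    d_est = [v / (2 * t) for v in var]     # per axis: Var = 2*D*t
    corr = sum_01 / n_walkers / (var[0] * var[1]) ** 0.5 if k > 1 else 0.0
    return d_est, corr

if __name__ == "__main__":
    for k in (1, 2, 3):
        d_est, corr = estimate(k, n_steps=100, a=1.0, tau=1.0, n_walkers=4000)
        print(k, d_est, corr, "expected D =", 1.0 / (2 * k))
```

Up to sampling noise, every axis should give $D \approx \frac{a^2}{2k\tau}$, and the correlation between axes should be close to zero.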
Meanwhile, the "eight nearest neighbors" random walk can be seen as a variation of the "four nearest neighbors" random walk, except that on each time step we first flip a coin to choose whether to move the particle by distance $a$ orthogonally or by $\sqrt2 a$ diagonally. (Alas, this does not generalize neatly to higher dimensions.) Thus, the mean squared step distance (which is what determines the diffusion coefficient) for this random walk is not $a^2$ but rather $\frac{a^2+(\sqrt2a)^2}{2} = \frac32a^2$, and the resulting effective diffusion coefficient is $\frac32$ times what it would be for four nearest neighbors (or $\frac34$ times what it would be for one-dimensional diffusion with the same step size), i.e. $D = \frac{3a^2}{8\tau}$.
(Alternatively, you can view the "eight nearest neighbors" random walk as flipping a coin on each time step to decide whether to move the particle by $\pm a$ along just one randomly chosen axis or along both of them independently. Since in this view the particle moves by $\pm a$ along each axis with probability $\frac34$ per time step, we obtain the same result as above.)
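These coefficients can also be checked without any simulation. For a kernel with zero mean step and isotropic covariance, the per-step covariance is $E[\Delta x_i \Delta x_j] = 2D\tau\,\delta_{ij}$, so $D$ can be read straight off the second moments. Here is a small sketch along those lines, assuming every step in the kernel is equally likely. The helper names are hypothetical, chosen only for this example.

```python
import math
from itertools import product

def effective_diffusion(steps, tau):
    """D for a walk whose step is drawn uniformly from `steps` once per tau.

    Requires a zero mean step and covariance E[dx_i dx_j] = 2*D*tau*delta_ij.
    """
    m, dim = len(steps), len(steps[0])
    mean = [sum(s[i] for s in steps) / m for i in range(dim)]
    cov = [[sum(s[i] * s[j] for s in steps) / m for j in range(dim)]
           for i in range(dim)]
    if any(abs(mu) > 1e-12 for mu in mean):
        raise ValueError("kernel has drift")
    for i in range(dim):
        for j in range(dim):
            target = cov[0][0] if i == j else 0.0
            if not math.isclose(cov[i][j], target, abs_tol=1e-12):
                raise ValueError("kernel is not isotropic to second order")
    return cov[0][0] / (2 * tau)

def four_neighbors(a):
    """Orthogonal steps of length a on the square lattice."""
    return [(a, 0.0), (-a, 0.0), (0.0, a), (0.0, -a)]

def eight_neighbors(a):
    """Orthogonal and diagonal steps, i.e. everything in {-a, 0, a}^2 but (0, 0)."""
    return [(dx, dy) for dx, dy in product((-a, 0.0, a), repeat=2)
            if (dx, dy) != (0.0, 0.0)]

if __name__ == "__main__":
    a, tau = 1.0, 1.0
    print("four :", effective_diffusion(four_neighbors(a), tau))   # a^2 / (4 tau)
    print("eight:", effective_diffusion(eight_neighbors(a), tau))  # 3 a^2 / (8 tau)
```

The isotropy check only looks at second moments, which is all that matters for the diffusion limit. A kernel like a pure one-axis walk embedded in two dimensions is rejected by it.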