This article is a useful start.
https://en.wikipedia.org/wiki/Matrix_calculus
So throughout the discussions with the kind folks here I think that I've sorted out my confusion, although there wasn't any particular answer that did it. Here I'll analyze my confusion over this notation in case anyone comes to this thread with similar confusions.
The most straightforward answer (IMO) to the question "how do I interpret $\frac{\partial}{\partial(\partial_\mu\phi)}$?" is to interpret it as four distinct operators, one for each value of $\mu$. I.e.
$$
\frac{\partial L}{\partial(\partial_\mu\phi)} = [\frac{\partial L}{\partial(\partial_0\phi)}, \frac{\partial L}{\partial(\partial_1\phi)}, \frac{\partial L}{\partial(\partial_2\phi)}, \frac{\partial L}{\partial(\partial_3\phi)}]
$$
This is essentially what I suggested in my question, but it wasn't obvious to me that it was necessarily correct. The reason I didn't immediately jump to this conclusion when I saw this is that we often (usually?) talk about four-vectors (using the $\mu$ notation $\partial_\mu$) as composite objects, not as elements of composite objects. For example, whether it's strictly speaking correct or not, I have often seen people refer to $\partial_\mu$ as the covariant derivative, not the $\mu$th element of the covariant derivative. Similarly, I see the notation $x_\mu = [t,x,y,z]$ and not $x_\mu = [t,x,y,z]_\mu$, which would more strongly reinforce the idea that $x_\mu$ is referring to a single but arbitrary element of the collection $[t,x,y,z]$. All of these things encouraged me to think of the term $x_\mu$ as the entire collection, not as an arbitrary element of said collection. In fact these seem like incompatible interpretations to me. For example, if I interpret these objects as arbitrary elements of said collections, then the term $x_\nu g_{\nu\mu}$ (which I have directly seen in work before) is perfectly well-defined, while if I interpret $x_\nu$ as a vector and $g_{\nu\mu}$ as a matrix, then it clearly isn't.
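To make the component-wise reading concrete, here is the standard kinetic term written out in full. I'm assuming a flat metric with signature $(+,-,-,-)$, which is my choice for the illustration; the opposite convention flips the overall sign.
$$
L = \frac{1}{2}\partial^\mu\phi\partial_\mu\phi = \frac{1}{2}\left[(\partial_0\phi)^2 - (\partial_1\phi)^2 - (\partial_2\phi)^2 - (\partial_3\phi)^2\right]
$$
Treating the four quantities $\partial_0\phi, \dots, \partial_3\phi$ as independent variables and differentiating with respect to each in turn gives
$$
\frac{\partial L}{\partial(\partial_\mu\phi)} = [\partial_0\phi, -\partial_1\phi, -\partial_2\phi, -\partial_3\phi]
$$
which, under that metric, is just the list of components of $\partial^\mu\phi$.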
However, there does seem to be another consistent way to work with these derivatives, one which treats $\partial_\mu\phi$ as an entire object, like $x$, for the derivative to be taken with respect to. Sean Carroll briefly describes how to work with this in his book "Spacetime and Geometry". It seems a bit tricky though. His guidelines, to be careful, are to make sure that the expression being differentiated is written with its indices in the same positions as the index on the quantity the derivative is taken with respect to, and to use a different index label for the two as well. For example,
$$
\begin{aligned}
\frac{\partial}{\partial(\partial_\xi \phi)} \left(\frac{1}{2}\partial^\mu\phi\partial_\mu\phi\right) &= \frac{\partial}{\partial(\partial_\xi\phi)} \left(\frac{1}{2}g^{\nu\mu}\partial_\nu\phi\partial_\mu\phi\right) \\
&= \frac{1}{2}g^{\nu\mu}\left(\frac{\partial(\partial_\nu\phi)}{\partial(\partial_\xi\phi)}\partial_\mu\phi + \partial_\nu\phi \frac{\partial(\partial_\mu\phi)}{\partial(\partial_\xi\phi)}\right) \\
&= \frac{1}{2}g^{\nu\mu}\left(\delta^\xi_\nu\partial_\mu\phi + \partial_\nu\phi \delta^\xi_\mu\right) = \frac{1}{2}\left(g^{\xi\mu}\partial_\mu\phi + g^{\nu\xi}\partial_\nu\phi\right) = g^{\xi\nu}\partial_\nu\phi = \partial^\xi\phi
\end{aligned}
$$
It seems to me like there might be cases in which this sort of approach breaks down, such as if I ever had a Lagrangian (or other thing that I wanted to take the derivative of) like $L = \nabla^2\phi$. Maybe dealing with that's trivial and I just don't see it, but at this point I'd return to my previous interpretation, as while less compact, it seems a little more trustworthy.
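For the kinetic term at least, the two readings can be checked against each other numerically. The sketch below is only an illustration under my own assumptions: a flat metric with signature $(+,-,-,-)$, made-up sample values for the four components $\partial_\mu\phi$ at a single point, and function names that I invented for this post. It differentiates $L = \frac{1}{2}g^{\nu\mu}\partial_\nu\phi\partial_\mu\phi$ one component at a time by finite differences and compares the result with $g^{\xi\nu}\partial_\nu\phi$.

```python
# Assumption: Minkowski metric with signature (+, -, -, -).
# For this diagonal metric the inverse g^{mu nu} has the same entries.
G_INV = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [0.0, 0.0, 0.0, -1.0],
]


def lagrangian(dphi):
    """L = 1/2 g^{nu mu} d_nu(phi) d_mu(phi); dphi[mu] stands for d_mu(phi)."""
    return 0.5 * sum(
        G_INV[nu][mu] * dphi[nu] * dphi[mu]
        for nu in range(4)
        for mu in range(4)
    )


def componentwise_derivative(func, dphi, step=1e-6):
    """Central-difference estimate of [dL/d(d_0 phi), ..., dL/d(d_3 phi)].

    Each d_mu(phi) is treated as an independent variable, which is the
    "four distinct operators" reading.
    """
    result = []
    for xi in range(4):
        up = list(dphi)
        down = list(dphi)
        up[xi] += step
        down[xi] -= step
        result.append((func(up) - func(down)) / (2 * step))
    return result


def raised_gradient(dphi):
    """g^{xi nu} d_nu(phi), i.e. the components with the index raised."""
    return [
        sum(G_INV[xi][nu] * dphi[nu] for nu in range(4))
        for xi in range(4)
    ]


# Hypothetical values of d_mu(phi) at one spacetime point.
sample = [0.7, -1.3, 2.1, 0.4]

numeric = componentwise_derivative(lagrangian, sample)
analytic = raised_gradient(sample)

print("component-wise:", numeric)
print("raised index:  ", analytic)
```

The two printed lists should agree up to floating-point rounding. This only confirms that the readings coincide for this particular quadratic Lagrangian; it says nothing about Lagrangians containing second derivatives such as $\nabla^2\phi$.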
Sorry if I seem a bit thick as I try to grapple with this notation but I've had an unusually hard time working with it and this result was not obvious to me. Let me know if I've still got something wrong in here. Thanks all.