I read two enlightening papers by Professor Bruce Sherwood that answer many of the questions mentioned above; I'll link to them at the end. Here are the main points:
1. What is the work-energy theorem?
There are some ambiguities in defining this theorem:
- Does it apply to compound systems?
- What about non-rigid bodies?
- Does it include internal work and internal energy?
I realized that the lack of a precise definition of this theorem leads to its misuse. My textbook (Halliday and Resnick) states the "work-kinetic energy theorem" this way:
change in the kinetic energy of a particle = net work done on the particle
A point particle has no internal structure, so its only energy is kinetic; it therefore makes sense to say that all the work done on a particle goes into its kinetic energy. In fact, this theorem can be derived directly from Newton's second law:
$$ W_{net} = \int_{\vec r_1}^{\vec r_2} \vec F_{net}\cdot d\vec r = \int_{\vec r_1}^{\vec r_2} m\frac{d\vec v}{dt}\cdot d\vec r = m\int_{\vec v_1}^{\vec v_2} \frac{d\vec r}{dt}\cdot d\vec v = m\int_{\vec v_1}^{\vec v_2} \vec v \cdot d\vec v = \Delta\left(\frac{1}{2}mv^2\right) $$
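As a sanity check of this derivation, here is a short numerical sketch (the force field, mass, and initial conditions are hypothetical, chosen only for illustration). It integrates a particle's motion with velocity Verlet, accumulates $\int \vec F\cdot d\vec r$ along the particle's actual path, and compares the result with $\Delta(\frac{1}{2}mv^2)$. The two should agree up to a small integration error, even though the chosen force is non-conservative.

```python
def force(x, y):
    """Hypothetical position-dependent force F(r) in newtons.

    A spring-like pull toward the origin plus a swirling term that makes the
    field non-conservative; any force that depends only on position would do.
    """
    k, c = 4.0, 1.5
    return (-k * x + c * y, -k * y - c * x)


def work_and_delta_k(m=2.0, r0=(1.0, 0.0), v0=(0.0, 0.5), t_total=1.0, dt=1e-4):
    """Integrate m dv/dt = F(r) with velocity Verlet and return (W_net, Delta K)."""
    x, y = r0
    vx, vy = v0
    fx, fy = force(x, y)
    work = 0.0
    for _ in range(int(round(t_total / dt))):
        # Advance the position using the current acceleration F/m
        nx = x + vx * dt + 0.5 * (fx / m) * dt * dt
        ny = y + vy * dt + 0.5 * (fy / m) * dt * dt
        nfx, nfy = force(nx, ny)
        # Trapezoidal estimate of F . dr along the particle's actual path
        work += 0.5 * ((fx + nfx) * (nx - x) + (fy + nfy) * (ny - y))
        # Advance the velocity with the average of the old and new accelerations
        vx += 0.5 * (fx + nfx) / m * dt
        vy += 0.5 * (fy + nfy) / m * dt
        x, y, fx, fy = nx, ny, nfx, nfy
    k_initial = 0.5 * m * (v0[0] ** 2 + v0[1] ** 2)
    k_final = 0.5 * m * (vx ** 2 + vy ** 2)
    return work, k_final - k_initial


W_net, delta_K = work_and_delta_k()
print(f"W_net   = {W_net:.6f} J")
print(f"delta K = {delta_K:.6f} J")
```

For a single particle there is only one point of application, so the path used for the work integral is unambiguous.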
But what about a system of many particles? Newton's second law shows that for a system of particles:
$$ \sum_i \vec{F}_{external,i} = M \vec{a}_{com} $$
Here $M$ is the total mass of the system. Only external forces appear, because internal forces come in action-reaction pairs (Newton's third law) and cancel in the sum. Taking the dot product with $d\vec r_{com}$ and integrating along the path of the center of mass gives:
$$ W_{\text{net, external, along center of mass}} = \Delta K_{\text{center of mass}} ~~~~~~~~~~(1)$$
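Written out, the integration step mirrors the single-particle derivation, except that every external force is integrated along $d\vec r_{com}$ rather than along the path of its own point of application:

$$ \int \Big(\sum_i \vec{F}_{external,i}\Big)\cdot d\vec r_{com} = M\int \frac{d\vec v_{com}}{dt}\cdot d\vec r_{com} = M\int_{\vec v_{com,1}}^{\vec v_{com,2}} \vec v_{com}\cdot d\vec v_{com} = \Delta\Big(\frac{1}{2}Mv_{com}^2\Big) $$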
In this formula, each force is integrated along the path of the center of mass, not along the actual path of that force's point of application. So, in general, the left-hand side is not the actual work done on the system, and the right-hand side is not the total kinetic energy of the system; it is just $\frac{1}{2}Mv_{\text{com}}^2$. In other words, this is not an energy equation, which is why the left-hand side is called "pseudowork" or "center-of-mass work". It does, however, hold for every system and body, whether compound or particle-like, rigid or non-rigid. For a rigid body, the total kinetic energy splits into $\frac{1}{2}Mv_{\text{com}}^2$ plus the rotational kinetic energy about the center of mass, so $\Delta K_{\text{center of mass}} = \Delta K_{\text{translational}}$.
When the system is a single particle, however, formula (1) becomes a genuine energy equation (the left-hand side is the real work done and the right-hand side is the real change in kinetic energy), as explained above. This is because the center of mass of a particle coincides with the point of application of every force acting on it.
If we want the real work done on a system (i.e. the real transfer of energy into or out of the system by means other than heat, which is energy transfer due to a temperature difference), we have to integrate each force along the path of its own point of application. But we can't set this equal to $\Delta K_{system}$, because part of that energy may be stored as potential, thermal, or other internal energy.
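To see the difference between pseudowork and real work numerically, here is a small sketch with hypothetical parameters: two equal blocks joined by a spring on a frictionless line, with a constant external push on one block only. The pseudowork (push times COM displacement) should match $\Delta K_{com}$, while the real work (push times displacement of the pushed block) should match the total kinetic energy plus the energy stored in the spring. The velocity Verlet integrator is my own choice for the sketch, not something taken from Sherwood's papers.

```python
def simulate_two_blocks(m=1.0, k=50.0, rest_length=1.0, push=10.0,
                        t_total=1.0, dt=1e-4):
    """Two equal blocks joined by a spring, sliding on a frictionless line.

    A constant external force `push` acts on block 1 only; the spring force is
    internal. Both blocks start at rest with the spring relaxed, so each
    "change" returned below is just the final value.
    """
    x1, x2 = 0.0, rest_length
    v1 = v2 = 0.0
    x1_start = x1
    com_start = 0.5 * (x1 + x2)

    def accelerations(p1, p2):
        stretch = p2 - p1 - rest_length
        f1 = push + k * stretch  # external push plus spring force on block 1
        f2 = -k * stretch        # equal and opposite spring force on block 2
        return f1 / m, f2 / m

    a1, a2 = accelerations(x1, x2)
    for _ in range(int(round(t_total / dt))):
        # Velocity Verlet step for both blocks
        x1 += v1 * dt + 0.5 * a1 * dt * dt
        x2 += v2 * dt + 0.5 * a2 * dt * dt
        n1, n2 = accelerations(x1, x2)
        v1 += 0.5 * (a1 + n1) * dt
        v2 += 0.5 * (a2 + n2) * dt
        a1, a2 = n1, n2

    # Real work: force times displacement of its own point of application
    real_work = push * (x1 - x1_start)
    # Pseudowork: force times displacement of the center of mass
    pseudowork = push * (0.5 * (x1 + x2) - com_start)
    dk_com = 0.5 * (2 * m) * (0.5 * (v1 + v2)) ** 2
    dk_total = 0.5 * m * (v1 ** 2 + v2 ** 2)
    du_spring = 0.5 * k * (x2 - x1 - rest_length) ** 2
    return real_work, pseudowork, dk_com, dk_total, du_spring


W_real, W_pseudo, dK_com, dK_total, dU_spring = simulate_two_blocks()
print(f"pseudowork = {W_pseudo:.4f} J, delta K_com = {dK_com:.4f} J")
print(f"real work  = {W_real:.4f} J, delta K_total + delta U = {dK_total + dU_spring:.4f} J")
```

The real work exceeds $\Delta K_{com}$ by the energy that went into the system's internal motion (the spring's potential energy plus the kinetic energy of the blocks relative to their center of mass), which is exactly what the CM equation cannot see.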
So, calling the famous formula $W = \Delta K$ the "work-energy theorem" is somewhat misleading, because it suggests that the equation deals with real energy and real work, which is only true for particles. For compound systems, as Professor Sherwood suggests, it is better to call it the "CM equation", because it relates only center-of-mass quantities.
2. First law of thermodynamics
The version of the "work-energy theorem" that deals with real work and real changes in energy is the first law of thermodynamics (here positive $W$ means energy added to the system):
$$ W + Q = \Delta E $$
This time $W$ is the real work. When there is no temperature difference, as in the sliding box problem, no energy is transferred to or from the system as heat, so we can omit the $Q$ term (strictly, the warmed box can transfer heat to the floor, but we neglect that here). We end up with the same equation my textbook gives ($\Delta E_{internal}$ denotes changes in forms of internal energy other than thermal energy):
$$W_{external} = \Delta E_{mechanical} + \Delta E_{thermal} + \Delta E_{internal}$$
Note that thermal energy is accounted for in the term $\Delta E_{thermal}$, so $\Delta K$ in $\Delta E_{mechanical}$ is the change in purely mechanical (macroscopic) kinetic energy, and we need not worry about whether "thermal energy" is modeled as "kinetic energy of atoms and molecules". Now, for the original sliding box problem, taking the box as our system on a horizontal floor (so its potential energy doesn't change) and neglecting other changes in internal energy, we can write:
$$ W_{external} = \Delta K + \Delta E_{thermal,box} $$
$$ \Longrightarrow \Delta E_{thermal,box} = W_{external} - \Delta K $$
where $W_{external} = W_{friction}$ is the real work done by friction on the box (gravity and the normal force are perpendicular to the motion of a box sliding on a horizontal floor, so they do no work). For simplicity, I take the frictional force $f$ to be constant. If the point of application of this frictional force moves a distance $d_{friction}$, then the real work done by friction is $W_{friction} = -fd_{friction}$, where the negative sign appears because $f$ points opposite to the motion of its point of application. So we can write:
$$ \Delta E_{thermal,box} = -fd_{friction} - \Delta K ~~~~~~~~~~(2) $$
Taking the box as a rigid body, we can write the center-of-mass equation:
$$ W_{\text{net, external, along COM}} = W_{\text{friction, along COM}} = \Delta K_{com} $$
$$ \Longrightarrow -fd_{com} = \Delta K_{com} ~~~~~~~~~~(3)$$
where $d_{com}$ is the distance traveled by the center of mass of the box. Since the box is taken to be rigid (and it slides without rotating), $\Delta K_{com} = \Delta K$, where $\Delta K$ is the total change in kinetic energy of the system. Again, $\Delta K$ here denotes changes in purely mechanical kinetic energy and does not include thermal energy. Finally, combining equations (2) and (3):
$$ \Delta E_{thermal,box} = f\cdot(d_{com} - d_{friction}) $$
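The bookkeeping behind this result can be sketched in a few lines of Python, assuming constant friction, a rigid non-rotating box, and no heat flow to the floor. The function name `box_thermal_energy` and the numbers are hypothetical, chosen only for illustration.

```python
def box_thermal_energy(f, d_com, d_friction):
    """Change in the box's thermal energy (J) for a constant friction force f (N).

    Assumes a rigid box that slides without rotating and loses no heat to the
    floor, so Delta K = Delta K_com.
    """
    delta_k = -f * d_com          # CM equation (3): -f d_com = Delta K_com
    w_friction = -f * d_friction  # real work done by friction on the box
    return w_friction - delta_k   # equation (2): Delta E_thermal = W_friction - Delta K


print(box_thermal_energy(10.0, 2.0, 0.5))  # 15.0
print(box_thermal_energy(10.0, 2.0, 2.0))  # 0.0
```

The second call corresponds to $d_{friction} = d_{com}$, where the opposing force's point of application moves with the center of mass and the box does not warm up.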
So, if the effective distance moved by the point(s) of application of the frictional force equals the distance traveled by the center of mass, the box won't heat up. Imagine that instead of friction, someone stops a box traveling in deep space by pushing against it with his hand (without the hand slipping on the box's surface). The point of application of this opposing force (where the hand touches the box) moves the same distance as the COM until the box stops, so the box's thermal energy does not change.
If we take friction to be the force required to break the cold welds that form between two surfaces, then $d_{friction}$ is the total distance these welded spots move before they break, because they are the points of application of the frictional force. So the box heats up because $d_{friction} < d_{com}$. The second paper linked below explains this nicely. Finally, if we had $d_{friction} = d_{com}$, the box wouldn't heat up, but the thermal energy of the floor would increase by $fd_{com}$, where $d_{com}$ is the distance the COM of the box moves.
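If we additionally assume that the box and floor together form an isolated system (no sound, no heat exchange with the surroundings, no other internal-energy changes), energy conservation also gives the floor's share. Using $\Delta K = -fd_{com}$ from equation (3):

$$ \Delta E_{thermal,floor} = -\Delta K - \Delta E_{thermal,box} = fd_{com} - f\cdot(d_{com} - d_{friction}) = fd_{friction} $$

So, under this assumption, the box and floor together gain $fd_{com}$ of thermal energy, however it is split between them; $d_{friction} = d_{com}$ puts all of it in the floor.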
Some models of friction suggest that the heating is caused by vibrations of the cold-welded spots after they tear, which cannot happen in perfectly rigid bodies. Since I assumed the box is rigid so that $\Delta K_{com} = \Delta K_{total}$, these results are only an approximation. What matters, though, is that this clears up the confusion around the "work-energy theorem" and how it applies to friction.
Papers by Professor Bruce Sherwood:
And his blog post on "pseudowork".