When using a quantum channel to transmit classical information, we consider an ensemble $\mathcal{E} = \{(\rho_x, p(x))\}$ of states $\rho_x$ labelled by symbols $x$ from a finite alphabet $\Sigma$, each occurring with probability $p(x)$. From the ensemble alone we can already compute quantities such as the Holevo quantity $\chi$, the entropy, or relative entropies. If we further specify a communication protocol (Alice communicates $x$ to Bob by transmitting $\rho_x$ through a channel $\mathcal{N}_{A\rightarrow B}$, drawing $x$ with probability $p_X(x)$), additional quantities such as the channel capacity and the Holevo information become available.
Now define the ensemble with respect to a continuous probability density $p(x)$ ($\Sigma$ is no longer finite). The classical-quantum state associated with $\mathcal{E}$ becomes $$ \sigma_{XB} = \int_\Sigma dx\, p(x)\, |x\rangle \langle x | \otimes \rho_B^x, \tag{1} $$ and the average state of the ensemble is $\rho = \text{Tr}_X (\sigma_{XB}) = \int_\Sigma dx\, p(x)\, \rho_B^x$. The entropy $H(\rho)$ remains well-defined, and the corresponding expression for the Holevo quantity, \begin{align} \chi(\mathcal{E}) &:= H(\rho) - \int_\Sigma dx\, p(x)\, H( \rho_B^x), \tag{2} \end{align} doesn't seem wrong in any obvious way. On the other hand, Shannon entropy does not survive the substitution — the differential entropy of a density can be negative, and the discrete entropy of any binning diverges as the bins shrink — and $\chi$ implicitly describes a scenario in which a continuous variable $x$ will be measured.
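To see that Eq. (2) behaves sensibly in the continuous limit, here is a small numerical sketch (my own illustrative example, not from any reference): take pure qubit states $|\psi_x\rangle = \cos(x/2)|0\rangle + \sin(x/2)|1\rangle$ with $x$ uniform on $[0,\pi)$, discretize the integral into $N$ bins, and watch $\chi$ converge as $N$ grows.

```python
import numpy as np

def von_neumann_entropy(rho):
    """Entropy in bits; eigenvalues below tolerance are dropped."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def holevo_chi(states, probs):
    """Discretization of Eq. (2): H(avg state) - sum_x p(x) H(rho_x)."""
    rho_avg = sum(p * r for p, r in zip(probs, states))
    return von_neumann_entropy(rho_avg) - sum(
        p * von_neumann_entropy(r) for p, r in zip(probs, states))

def ensemble(N):
    """N-bin Riemann sum of the ensemble |psi_x>, x uniform on [0, pi)."""
    xs = (np.arange(N) + 0.5) * np.pi / N
    states = [np.outer(v, v) for v in
              (np.array([np.cos(x / 2), np.sin(x / 2)]) for x in xs)]
    probs = np.full(N, 1.0 / N)   # p(x) dx for each bin
    return states, probs

for N in (8, 64, 512):
    s, p = ensemble(N)
    print(N, holevo_chi(s, p))   # converges as the discretization refines
```

Since the states are pure, $\chi$ here is just the entropy of the average state, which stays finite in the limit — consistent with $\chi$ being a well-behaved quantity for continuous ensembles.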
But some other quantities seem sketchy. For example, the conditional min-entropy seems well-defined, but its interpretation in terms of optimal measurements feels off. Adapting the operational interpretation that the conditional min-entropy quantifies the optimal state-identification probability gives something like \begin{equation} 2^{-H_{\min}(X|B)} = \max_{\{ \Lambda_B^x\}} \int_\Sigma dx\, p(x)\, \text{Tr}(\Lambda_B^x \rho_B^x), \tag{3} \end{equation} where the maximization is over all POVMs associating each element of $\Sigma$ with a positive operator on $B$. I do not have a good feel for whether this is a reasonable approach; I am again concerned that Bob would be extracting a continuously-valued variable on his end in some way that would blow up the mutual information between his measurement outcome and $X$ to infinity.
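A quick numerical check of this worry (a sketch under my own assumed ensemble, the same family $|\psi_x\rangle = \cos(x/2)|0\rangle + \sin(x/2)|1\rangle$ with $x$ uniform on $[0,\pi)$): discretize $\Sigma$ into $N$ bins and evaluate the guessing probability in Eq. (3) under the pretty good measurement, a standard lower bound on the optimum. It vanishes as $N$ grows, since for $N$ equiprobable pure states in dimension $d$ any POVM satisfies $\sum_x \frac{1}{N}\text{Tr}(\Lambda_x\rho_x) \le d/N$ — suggesting the continuous-alphabet version of Eq. (3) should be read as a density over outcomes rather than a probability.

```python
import numpy as np

def pgm_guess_prob(states, probs):
    """Guessing probability sum_x p(x) Tr(E_x rho_x) under the pretty
    good measurement E_x = p(x) sigma^{-1/2} rho_x sigma^{-1/2}."""
    sigma = sum(p * r for p, r in zip(probs, states))
    # sigma^{-1/2} via eigendecomposition (pseudo-inverse on the support)
    w, V = np.linalg.eigh(sigma)
    inv_sqrt = V @ np.diag([1/np.sqrt(x) if x > 1e-12 else 0.0
                            for x in w]) @ V.T
    return sum(p * np.trace(p * inv_sqrt @ r @ inv_sqrt @ r).real
               for p, r in zip(probs, states))

def ensemble(N):
    """N-bin discretization of |psi_x>, x uniform on [0, pi)."""
    xs = (np.arange(N) + 0.5) * np.pi / N
    states = [np.outer(v, v) for v in
              (np.array([np.cos(x / 2), np.sin(x / 2)]) for x in xs)]
    return states, np.full(N, 1.0 / N)

for N in (2, 8, 64, 512):
    s, p = ensemble(N)
    print(N, pgm_guess_prob(s, p))   # decreases with N, bounded by 2/N
```

So the naive right-hand side of Eq. (3) tends to zero (and $H_{\min}$ to infinity) in the continuum limit, which is the same pathology as the diverging mutual information.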
Question
Which information-theoretic quantities retain their operational meaning when we replace a discrete distribution with a continuous one? References are appreciated.