There's no essential difference. Information-theoretically, any loss that is a negative log-likelihood (NLL) is a cross entropy between the empirical distribution defined by the training set and the probability distribution defined by the model. Even the usual mean squared error (MSE) can be interpreted as the cross entropy between the empirical distribution and a Gaussian model with fixed variance (see the worked equation below).
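As a quick sketch of the MSE claim (assuming a Gaussian model $q(y \mid x) = \mathcal{N}(y;\, \hat y(x), \sigma^2)$ with fixed $\sigma$, where $\hat y_i$ denotes the model's prediction for the $i$-th example):

$$
-\log L \;=\; -\sum_{i=1}^N \log \mathcal{N}\!\left(y_i;\, \hat y_i, \sigma^2\right)
\;=\; \frac{1}{2\sigma^2}\sum_{i=1}^N \left(y_i - \hat y_i\right)^2 \;+\; N\log\!\left(\sigma\sqrt{2\pi}\right),
$$

so minimizing the NLL over the predictions $\hat y_i$ is the same as minimizing the sum of squared errors; the second term is a constant that does not depend on the predictions.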
For discrete probability distributions $p$ and $q$ with the same support $\mathcal{X}$, this means $H(p,q) = -\sum_{x\in\mathcal{X}} p(x)\,\log q(x)$. (Eq. 1)
Binary cross-entropy is therefore just a special case of the above equation for discrete distributions. For the $i$-th example, the empirical distribution of the training set places probability $k_i$ on the positive class and $1-k_i$ on the negative class, where $k_i$ is the ground-truth label ($1$ for a positive case, $0$ for a negative case), while the model's distribution places probability $p_i$ on the positive class and $1-p_i$ on the negative class, where $p_i$ is the predicted probability of the positive class. Plugging these two distributions into Eq. 1 gives the per-example term shown below. We can also derive the binary cross-entropy loss from the maximum likelihood principle applied to a Bernoulli distribution (a Binomial with a single trial per example), which shows the relation between the two views.
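Concretely, with $\mathcal{X} = \{\text{positive}, \text{negative}\}$, $p = (k_i,\, 1-k_i)$ and $q = (p_i,\, 1-p_i)$, Eq. 1 reduces to

$$
H(p, q) \;=\; -\bigl[\,k_i \log p_i + (1-k_i)\log(1-p_i)\,\bigr],
$$

which is exactly the per-example binary cross-entropy term.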
Since the examples of this Bernoulli data-generating process are independent, the likelihood function is the product of the probabilities of the individual examples, $L = \prod_{i=1}^N p_i^{k_i}(1-p_i)^{1-k_i}$. To simplify computations and prevent numerical underflow, it is common to work with the logarithm of the likelihood, known as the log-likelihood: $\log L = \sum_{i=1}^N \bigl[k_i\log(p_i) + (1-k_i)\log(1-p_i)\bigr]$. Finally, maximizing the log-likelihood for a fixed $N$ is equivalent to minimizing the negative log-likelihood (NLL), which gives your binary cross-entropy loss $-\sum_{i=1}^N \bigl[k_i\log(p_i) + (1-k_i)\log(1-p_i)\bigr]$, a measure of the discrepancy between the predicted probabilities and the true labels. (Dividing by $N$ to report the mean, as most libraries do, does not change the minimizer.)
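As a minimal numerical sanity check (a sketch only; the toy arrays and the optional scikit-learn comparison are my own assumptions, not part of the derivation), the Bernoulli NLL and the binary cross-entropy sum give the same number:

```python
import numpy as np

# Hypothetical toy data: ground-truth labels k_i and predicted positive-class probabilities p_i
k = np.array([1, 0, 1, 1, 0], dtype=float)
p = np.array([0.9, 0.2, 0.7, 0.6, 0.1], dtype=float)

# Negative log-likelihood of independent Bernoulli trials: -log prod_i p_i^{k_i} (1-p_i)^{1-k_i}
nll = -np.sum(np.log(p**k * (1 - p)**(1 - k)))

# Binary cross-entropy written directly as the sum in the answer
bce = -np.sum(k * np.log(p) + (1 - k) * np.log(1 - p))

print(nll, bce)              # identical up to floating-point error
assert np.isclose(nll, bce)

# Optional cross-check against scikit-learn, which reports the mean over the N examples:
# from sklearn.metrics import log_loss
# assert np.isclose(log_loss(k, p), bce / len(k))
```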