89

I really can't understand what Leonard Susskind means when he says in the video Leonard Susskind on The World As Hologram that information is indestructible.

Is the information that is lost through the increase of entropy really recoverable?

He himself said that entropy is hidden information. Although hidden information has measurable effects, I think information lost in an irreversible process cannot be retrieved. However, Susskind's claim is quite the opposite. How does one understand the loss of information in an entropy-increasing process, and its connection to the statement “information is indestructible”?

Black hole physics can be used in answers, but, as he proposes a general law of physics, I would prefer an answer not involving black holes.

MarianD
  • 2,089
HDE
  • 2,977

8 Answers

76

How is the claim "information is indestructible" compatible with "information is lost in entropy"?

Let's make things as specific and as simple as possible. Let's forget about quantum physics and unitary dynamics, let's toy with utterly simple reversible cellular automata.

Consider a spacetime consisting of a square lattice of cells with a trinary (3-valued) field defined on it. The values are color-coded such that cells can be yellow, orange, and red. The 'field equations' consist of a set of allowed colorings for each 2x2 block of cells:

[Figure: the 27 allowed 2x2 colorings]

A total of 27 local color patterns are allowed. These are defined such that when three of the four squares are colored, the color of the fourth cell is uniquely defined. (Check this!)
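The "Check this!" can be done mechanically. The actual 27 patterns are only given in the figure, so the sketch below assumes a stand-in rule: colors are encoded as 0 (yellow), 1 (orange), 2 (red), and a 2x2 block is allowed when its four values sum to a multiple of 3. This rule, the encoding, and the names `allowed` and `fourth_is_unique` are illustrative assumptions, not the answer's own field equations.

```python
from itertools import product

# Assumed encoding (not from the original figure): 0 = yellow, 1 = orange, 2 = red.
def allowed(block):
    """Stand-in 'field equation': a 2x2 block is allowed if its values sum to 0 mod 3."""
    return sum(block) % 3 == 0

# Enumerate all 3**4 = 81 colorings of a 2x2 block and keep the allowed ones.
ALLOWED_BLOCKS = [b for b in product(range(3), repeat=4) if allowed(b)]

def fourth_is_unique(blocks):
    """Check that any three cells of a block determine the remaining cell."""
    for missing in range(4):
        seen = {}
        for b in blocks:
            key = b[:missing] + b[missing + 1:]
            if key in seen:
                return False  # two allowed blocks agree on three cells
            seen[key] = b[missing]
        if len(seen) != 27:
            return False  # some triple of colors has no allowed completion
    return True

print(len(ALLOWED_BLOCKS), fourth_is_unique(ALLOWED_BLOCKS))
```

Any other rule with the same "three cells fix the fourth" property would pass the same check.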

The field equations don't contain a 'direction of evolution'. So how to define a timelike direction? Suppose that when looking "North" or "West" along the lattice directions, you hit a horizon beyond which an infinite sea of yellow squares stretches:

[Figure: lattice with a sea of yellow cells beyond the North and West horizons]

"North" and "West" we label as 'light rays from the past'. These two rays of cells constitute the 'snapshot' of the universe taken from the spacetime point defined by the intersection of the two rays. Given this 'snapshot', and using the field equations (the allowed 2x2 colorings), we can start reconstructing the past:

[Figure: the first reconstructed cell of the past]

Here, the rule applied to color the cell follows from the square at the bottom of the center column in the overview of the 27 allowed 2x2 squares. This is the only 2x2 pattern out of the 27 that fits the given colors at the right, the bottom, and the bottom-right of the cell being colored. Identifying this 2x2 pattern as uniquely fitting the cell colors provided, the top-left color becomes fixed.

Continuing like this, we obtain the full past of the universe up to any point we desire:

[Figure: the reconstructed past]
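The reconstruction procedure can be sketched in a few lines. As before, the real rule is only in the figure, so this assumes the stand-in rule "the four cells of every 2x2 block sum to 0 mod 3" with 0 = yellow, 1 = orange, 2 = red. The function name `reconstruct_past` and the argument layout are hypothetical.

```python
def reconstruct_past(west_ray, north_ray, corner):
    """Fill in the quadrant north-west of the observation point.

    west_ray:  colors along the bottom row, listed west to east (corner excluded)
    north_ray: colors along the right column, listed north to south (corner excluded)
    corner:    color of the cell where the two rays meet
    Assumed rule (not the original one): each 2x2 block sums to 0 mod 3.
    """
    n = len(west_ray) + 1
    assert len(north_ray) == n - 1
    grid = [[None] * n for _ in range(n)]
    grid[n - 1] = list(west_ray) + [corner]          # bottom row = west ray
    for i, color in enumerate(north_ray):            # right column = north ray
        grid[i][n - 1] = color
    # Each unknown cell is the top-left of a block whose right, bottom and
    # bottom-right neighbours are already known, so its color is forced.
    for i in range(n - 2, -1, -1):
        for j in range(n - 2, -1, -1):
            known = grid[i][j + 1] + grid[i + 1][j] + grid[i + 1][j + 1]
            grid[i][j] = (-known) % 3
    return grid

for row in reconstruct_past(west_ray=[0, 1], north_ray=[0, 2], corner=1):
    print(row)
```

Cells beyond the horizon are all yellow (0), which is consistent with this rule because an all-yellow block sums to 0.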

We notice that we constructed the full past knowing the colorings of 'light ray cells' in the 'snapshot' that, excluding the uniform sea beyond the horizons, count no more than 25 cells. We identify this count as the entropy (number of trits) as observed from the point where the two light rays meet. Notice that at later times the entropy is larger: the second law of thermodynamics is honored by this simple model.

Now we reverse the dynamics, and an interesting thing happens: knowing only 9 color values of light rays to the future (again excluding the uniform sea beyond the horizon):

[Figure: the 9 known cells on the light rays to the future]

We can reconstruct the full future:

[Figure: the reconstructed future]

We refer to these 9 trits that define the full evolution of this cellular automaton universe as the 'information content' of the universe. Obviously, the 25 trits of entropy do contain the 9 trits of information. This information is present but 'hidden' in the entropy trits. The entropy in this model will keep growing. The 9 trits of information remain constant and hidden in (but recoverable from) an ever larger number of entropy trits.

Note that none of the observations made depend on the details of the 'field equations'. In fact, any set of allowed 2x2 colorings that uniquely define the color of the remaining cell given the colors of three cells, will produce the same observations.

Many more observations can be made based on this toy model. One obvious feature is that the model does not sport a 'big bang' but rather a 'big bounce'. Furthermore, the information content (9 trits in the above example) defining this universe is significantly smaller than the later entropy (which grows without bound). This is a direct consequence of a 'past horizon' being present in the model. Also, despite the 'field equations' in this model being fully reversible, the 'snapshot' taken allows you to reconstruct the full past, but not the future. This 'arrow of time' can be circumvented by reconstructing the past beyond the 'big bounce', where past and future change roles and a new snapshot can be derived from the reconstruction. This snapshot is future oriented and allows you to construct the future beyond the original snapshot.

These observations, however, go well beyond the questions asked.

Johannes
  • 19,343
23

I don't know in which context Susskind mentioned this, but he probably meant that time evolution is unitary. That means, among other things, that it's reversible, i.e. no information can ever get lost, because you can essentially, starting from any time (time-like slice), run time backwards (theoretically) and compute what happened earlier.

If black hole evolution was indeed perfectly thermal, it would violate unitarity and information would be lost indeed. Susskind, I believe, thinks that this is not the case.

WIMP
  • 2,685
13

I'll forewarn that I'm no string theorist and Susskind's work is not therefore fully wonted to me (and likely I couldn't understand it if it were) so I do not fully know the context (of the supposed quote that entropy is hidden information).

But what he maybe means by "hidden" information is one or both of two things: the first theoretical, the second practical:

  1. The Kolmogorov complexity $K(\Omega)$ for a given system $\Omega$ (more precisely: the complexity of the system's unambiguous description) is in general not computable. $K(\Omega)$ is related to the concept of Shannon entropy $S_{Sha}(\Omega)$ (see footnote);
  2. Both a system's Kolmogorov complexity and Shannon entropy are masked from macroscopic observations by statistical correlations between microscopic components of the system: for thermodynamic systems, the measurable entropy $S_{exp}(\Omega)$ (which is usually the Boltzmann entropy) equals the true Shannon entropy $S_{Sha}(\Omega)$ plus any mutual information $M(\Omega)$ (a logarithmic measure of statistical correlation) between the system's components: $S_{exp}(\Omega)= S_{Sha}(\Omega) + M(\Omega)$

Hopefully the following explanations will show you why these ideas of "hidden" are in no way related to being "destroyed" or even "unrecoverable".

A system's Kolmogorov complexity is the size (wontedly measured in bits) of the smallest possible description of the system's state. Or, as user @Johannes wonderfully put it: it's the minimum number of yes / no questions one would have to have answered to uniquely specify the system. Even if you can unambiguously and perfectly describe a system's state, there is in general no algorithm to decide whether a more compressed description can be equivalent. See the discussion of the Uncomputability theorem for a Kolmogorov complexity on Wikipedia for example. So in this sense, the true entropy of a thing is hidden from an observer, even though the thing and a perfect description of it are fully observable by them.

So much for the hiddenness of the entropy (quantity of information). But what of the information itself? Uncomputability of Kolmogorov complexity bears on this question too: given that the amount of entropy describing a system state is uncomputable, there is in general no way of telling whether that system's state has been reversibly encoded into the state of an augmented system if our original system merges with other systems: otherwise put in words more applicable to black holes: there is no algorithm that can tell whether our original system's state is encoded in the state of some other system that swallows the first one up.

For a discussion on the second point, i.e. how the experimentally measured entropy and the Kolmogorov complexity differ, please see my answer here. I also discuss there why information might not be destroyed in certain simple situations, to wit: if the relevant laws of physics are reversible, then

The World has to remember in some way how to get back to any state it has evolved from (the mapping between system states at different times is one-to-one and onto).

This is a more general way of putting the unitary evolution description given in other answers.

Afterword: Charles Bennett in his paper "The thermodynamics of computation-a review" puts forward the intriguing and satisfying theory that the reason that physical chemists can't come up with a failsafe algorithm for calculating entropies of the molecules they deal with is precisely this uncomputability theorem (note that this does not rule out algorithms for certain specific cases, so the theorem can't prove that's why physical chemists can't calculate entropies, but it's highly plausible in the same sense that one could say that one reason why debugging software is a hard problem is Turing's undecidability of the halting problem theorem).

Footnote: Shannon entropy is a concept more readily applicable to systems which are thought of as belonging to a stochastic process when one has a detailed statistical description of the process. In contrast Kolmogorov complexity applies more to "descriptions" and one must define the language of the description to fully define $K(\Omega)$. Exactly how they are related (or even if either is relevant) in questions such as those addressed in the black hole information paradox is a question whose answer probably awaits further work beyond physics community "views" (as put in another answer) about whether or not information outlives the underlying matter and energy thrown into a black hole.

Another footnote (26th July 13): See also the Wikipedia page on the Berry Paradox, and a wonderful talk by Gregory Chaitin called "The Berry Paradox" and given at a Physics - Computer Science Colloquium at the University of New Mexico. The Berry Paradox introduces (albeit incompletely, but in everyday words) the beginnings of the ideas underlying Kolmogorov Complexity and indeed led Chaitin to his independent discovery of the Kolmogorov Complexity, even though the unformalised Berry Paradox is actually ambiguous. The talk also gives some poignant little examples of dealing personally with Kurt Gödel.

Edit 2nd August 2013 Answers to Prathyush's questions:

I could not understand the connection between thermodynamic entropy and kolmogorov complexity, Please can you comment on that. Esp the part "So in this sense, the true entropy of a thing is hidden from an observer, even though the thing and a perfect description of it are fully observable by them. " If you know the exact state of the system, then in physics entropy is zero, whether we can simplify the description does not come into picture

First let's try to deal with

If you know the exact state of the system, then in physics entropy is zero, whether we can simplify the description does not come into picture

Actually, whether or not there is possible simplification is central to the present problem. Suppose our description of our system $\Omega$ is $N_\Omega$ bits long. Moreover, suppose we have worked very hard to get the shortest full description we can, so we hope that $N_\Omega$ is somewhere near the Kolmogorov complexity $K(\Omega) < N_\Omega$. Along comes another "swallower" system $\Sigma$, which we study very carefully until we have what we believe is a full description of $\Sigma$, which is $N_\Sigma$ bits long. Again, we believe that $N_\Sigma$ is near $\Sigma$'s Kolmogorov complexity $K(\Sigma) < N_\Sigma$. The swallower $\Sigma$ absorbs system $\Omega$ - so the two systems merge following some physical process. Now we study our merged system very carefully, and find that somehow we can get a full description whose length $N_{\Omega \cup \Sigma}$ is much shorter than $N_\Omega + N_\Sigma$ bits. Can we say that the merging process has been irreversible, in the sense that if we ran time backwards, the original, separated $\Omega$ and $\Sigma$ would not re-emerge? The point is we cannot, even if $N_{\Omega \cup \Sigma} \ll N_\Omega + N_\Sigma$. Why? Because we can never be sure that we truly did find the shortest possible descriptions of $\Omega$ and $\Sigma$. There is no way of telling whether $K(\Omega) = N_\Omega, K(\Sigma) = N_\Sigma$.

Ultimately what is being driven at here is the question of whether time evolutions in physics are one-to-one functions, i.e. given an ending state for a system, does this always unambiguously imply a unique beginning state? Our great central problem here is, forgive some floridity of speech, that we do not know how Nature encodes the states of her systems. Figuratively speaking, the coding scheme and codebook are what physicists make their business to work out. Kolmogorov Complexity, or related concepts, are presumed to be relevant here because it is assumed that if one truly knows how Nature works, then one knows what the maximally compressed (in the information theoretic sense) configuration space for a given system is and thus the shortest possible description of a system's state is a number that names which of the points in the configuration space a particular system is at. If the number of possible points in the ending configuration space - the ending Kolmogorov complexity (modulo an additive constant) - is less than the number of possible points in the beginning space, then we can say in general the process destroys information because two or more beginning states map to an ending state. Finding hidden order in seemingly random behaviour is a difficult problem: that fact makes cryptography work. Seemingly random sequences can be generated from exquisitely simple laws: witness Blum Blum Shub or Mersenne Twisters. We might observe seemingly random or otherwise fine structure in something and assume we have to have a hugely complicated theory to describe it, whereas Nature might be using a metaphorical Mersenne twister all along and summing up exquisite structure in a few bits in Her codebook!

Now let's try to deal with:

I could not understand the connection between thermodynamic entropy and kolmogorov complexity, Please can you comment on that.

One interpretation of the thermodynamic entropy is that it is an approximation to the "information content" of the system, or the number of bits needed to wholly specify a system given only its macroscopic properties. Actually your comment "I could not understand the connection between thermodynamic entropy and kolmogorov complexity" is a very good answer to this whole question! - we don't in general know the link between the two and that thwarts efforts to know just how much information it really takes to encode a system's state unambiguously.

But the concepts are linked in some cases. The classic example here is the Boltzmann $H$-entropy for a gas made up of statistically independent particles:

$H = -\sum_i p_i \log_2 p_i$

where $p_i$ is the probability that a particle is in state number $i$. The above expression is in bits per particle (here I've just rescaled units so that the Boltzmann constant $k_B = 1/\log_e 2 = \log_2 e$, which turns $-k_B\sum_i p_i \log_e p_i$ into the base-2 expression above).

If indeed the particles' occupations of the states are truly random and statistically independent, then it can be shown through the Shannon Noiseless Coding Theorem that the number of bits needed to encode the states of a large number $N$ of them is precisely $H$ bits per particle. This is the minimum number of bits in the sense that if one tries to construct a code that assigns $H-\epsilon$ bits per particle then, as $N\rightarrow\infty$ the probability of coding failure approaches unity, for any $\epsilon > 0$. Conversely, if we are willing to assign $H+\epsilon$, then there always exists a code such that the probability of wholly unambiguous coding approaches unity as $N\rightarrow\infty$ for any $\epsilon > 0$. So, in this special case, the Boltzmann entropy equals the Kolmogorov complexity as $N\rightarrow\infty$: we have to choose $H+\epsilon$ bits per particle, plus a constant overhead to describe how the coding works in the language we are working with. This overhead spread over all the particles approaches nought bits per particle as $N\rightarrow\infty$.
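As a concrete illustration of the bits-per-particle figure, here is a minimal sketch that evaluates $H$ for a given occupation distribution. The function name `entropy_bits` and the example distributions are illustrative choices, not part of the original answer.

```python
from math import log2

def entropy_bits(probs):
    """Shannon / Boltzmann H-entropy in bits: H = -sum_i p_i log2 p_i.

    States with p_i = 0 contribute nothing (the limit p log p -> 0).
    """
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(p * log2(p) for p in probs if p > 0)

# Four equally likely states need 2 bits per particle;
# a skewed distribution over the same four states needs fewer on average.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.5, 0.25, 0.125, 0.125]
print(entropy_bits(uniform), entropy_bits(skewed))
```

For $N$ independent particles the coding theorem then says roughly $N \cdot H$ bits suffice, and fewer do not, as $N\rightarrow\infty$.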

When a thermodynamic system is at "equilibrium" and the particle state occupations statistically independent, we can plug the Boltzmann probability distribution

$p_i = \mathcal{Z}^{-1} e^{-\beta E_i}$

into the $H$ and show that it gives the same as the Clausius entropy $S_{exp}$ derived from experimental macrostates.
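The substitution can be written out explicitly. This is a sketch in nats (natural logarithm) rather than bits, and $\langle E \rangle = \sum_i p_i E_i$ for the mean energy per particle is notation introduced here, not in the text above:

```latex
\begin{aligned}
H &= -\sum_i p_i \ln p_i
   = -\sum_i p_i \left(-\beta E_i - \ln \mathcal{Z}\right) \\
  &= \beta \sum_i p_i E_i + \ln \mathcal{Z} \sum_i p_i
   = \beta \langle E \rangle + \ln \mathcal{Z}
\end{aligned}
```

With $\beta = 1/(k_B T)$ and the free energy $F = -k_B T \ln \mathcal{Z}$, this reads $k_B H = (\langle E \rangle - F)/T$, which is the familiar thermodynamic relation between entropy, internal energy and free energy.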

If there is correlation between particle occupations, similar comments in principle apply to the Gibbs entropy, if the joint state probability distributions are known for all the particles. However, the joint probability distributions are in general impossible to find, at least from macroscopic measurements. (See the paper Gibbs vs Boltzmann Entropy by E. T. Jaynes, as well as many other works by him on this subject.) Moreover, user Nathaniel of Physics Stack Exchange has an excellent PhD thesis as well as several papers which may be of interest. The difficulty of measuring the Gibbs entropy is yet another difficulty with this whole problem. I also gave another answer summarizing this problem.

A final way to link KC to other concepts of entropy: you can, if you like, use the notion of KC to define what we mean by "random" and "statistically independent". Motivated by the Shannon Noiseless Coding theorem, we can even use it to define probabilities. A sequence of variables is random if there is no model (no description) that can be used to describe their values other than to name their values. The degree of "randomness" in a random variable can be thought of like this: you can find a model that describes the sequence of variables somewhat - but it is only approximate. A shorter description of a random sequence is to define a model and its boundary conditions, then to code that model and conditions as well as the discrepancies between the observed variables and the model. If the model is better than guessing, this will be a pithier description than simply naming the values in full. Variables are "statistically independent" if there is no description, even in principle, that can model how the value of some variables affects the others and thus the pithiest description of the sequence is to name all the separate variables in full. This is what correlation functions between random variables (rvs) do, for example: the knowledge of the value of $X$ can be used to reduce the variance of a second correlated variable $Y$ through a linear model involving the correlation coefficient (I mean, reduce the variance in the conditional probability distribution). Finally, we can turn the Shannon Noiseless Coding Theorem on its head and use it to define probabilities through the KC: the probability that discrete rv $X$ equals $x$ is $p$ if the following holds. Take a sequence of rvs and for each one record the truth value "$X=x$" or "$X\neq x$", then find the pithiest possible description of this truth value sequence (we shall need an "oracle" because of the uncomputability of KC) and its length in bits and bits per sequence member. The probability $p$ is then the number such that $-p\log_2 p - (1-p)\log_2 (1-p)$ equals this bits per sequence member, as the sequence length $\rightarrow\infty$ (taking the limit both improves the statistical estimates and spreads the fixed length overhead in describing the coding scheme over many sequence members, so that this overhead does not contribute to the bits per sequence member). This approach gets around some of the philosophical minefield that arises in even defining randomness and probability - see the Stanford Encyclopedia of Philosophy entry "Chance Versus Randomness" for some flavor of this.

Lastly:

If you know the exact state of the system, then in physics entropy is zero

Here our problems are the subtle distinctions (1) between an instance of an ensemble of systems, all assumed to be members of the same random process or "population" and the ensemble itself, (2) Information and Thermodynamic entropies and (3) unconditional and conditional information theoretic entropies.

When I said that "the true entropy of a thing is hidden from an observer, even though the thing and a perfect description of it are fully observable by them" well of course the information theoretic Shannon entropy, conditioned on the observer's full knowledge of the system is nought. By contrast, the thermodynamic entropy will be the same as it is for everybody. By yet another contrast, the information theoretic entropy for another observer who does not have full knowledge is nonzero. What I was driving at in this instance is the Kolmogorov Complexity, or the number of yes/no questions needed to specify a system from the same underlying statistical population, because this quantity, if it can be calculated before and after a physical process, is what one can use to tell whether the process has been reversible (in the sense of being a one-to-one function of system configuration).

I hope that these reflexions help you, Prathyush, on your quest to understand the indestructibility, or otherwise, of information in physics.

Selene Routley
  • 90,184
5

Like many people have said here, he's probably talking about unitarity. Susskind is echoing the general view among physicists. I don't think we have (yet) a concrete way to even precisely formulate the principle, let alone any kind of proof. But based on unitarity in quantum mechanics and (for what it's worth) physical intuition about gravity, it seems like the sensible thing would be for information content to be conserved.

A simple illustration of this principle would be the no-cloning theorem. The way I see it, it says that you can't destroy the information in the register (the qubit into which you want to copy some information) in a way consistent with unitary evolution. If you managed to do it, then you should be able to invert the unitary evolution and generate the information from the register which you're supposed to have destroyed.

As for hidden information, think of it as being temporarily hidden. When some information is inside the black hole, you can't access that information and the black hole has a corresponding entropy. When the black hole evaporates away, there is nothing left to contain the entropy, so the information must have been sent out somehow and it's now un-hidden (or so it's believed, as of today). Again, I don't think there's a concrete calculation to establish this definitively -- mainly since we don't have a good handle on quantum gravity.

Siva
  • 6,204
4

tl;dr- Information's indestructibility is more of an ideal scientists seek than a law of nature. It's the ideal because physical transforms can, at best, discriminate between as many states before the transform as after it; anything more is impossible while anything less looks like an opportunity for improvement.


Trivially, we know that we can't gain information without measurement/observation. So, the very best a physical model can ever do is conserve information.

For example, if 10 bits of information are known about a physical system and no more information is gained (e.g., through measurement), then it's strictly impossible under any hypothetical type of physics to ever have more than 10 bits of information about the physical system after any transform, e.g. after moving forward or backward in time.

By contrast, it's easy to lose information. In fact, if someone simply forgets what they know about physics, then even ideal Newtonian systems lose 100% of information as no prediction of their evolution can be made. (Worth noting that information is a property of a model and not the universe itself, so different observers can perceive different information leaks.)

So the ideal is perfect information preservation. Whenever we fail to preserve information, we can't be sure that our models are complete. The assertion that information is indestructible is then basically the idealistic demand that the laws of physics reach that theoretical optimum.

As an ideal, it's worth noting that it's not necessarily a practical truth. We can construct hypothetical laws of physics that would practically not preserve information; if any of those happen to be the case, then the claim that information's indestructible would continue to be unrealized.

Regardless, systems that appear to lose information are glaring targets for scientists for two big reasons:

  1. Any sort of prediction that can be made based on the "lost" information constitutes a novel discovery.

  2. Most of the current laws of physics purport to conserve information, so they're ready tools to attack the lossy system with.

The black hole stuff is an example of the second point. If black holes appear to leak information whereas current theories don't, then that seems like a prime opportunity to attack black hole models with other theories and see what falls out of it.

Nat
  • 4,720
3

A bunch of people answered but with very complicated things. So I am going to answer with some far more understandable and fun things...

Firstly, Susskind thinks, like many people, that physical laws are reversible, and therefore it stands to reason that information cannot be lost; otherwise you wouldn't be able to reverse things.

And when he says that information is not lost, he means in theory, regarding the whole universe with a god-like state of knowledge, not to any particular person.

Then there is the question of exactly what you mean by entropy. Entropy is information in the system that you don't know. For example in a bathtub of water conventional observations might include the temperature, pressure, and volume, but there are countless bits of information encoded in the states of all the water molecules, their motions and vibrational modes. This is unknown information and much is not even observable in practice; all we know is things about the energy distribution. The number of bits of entropy would be the number of bits of additional information above what you know already, that you would need to laboriously catalog in order to fully describe the system at one instant.

Let's consider one mode of information loss: erasing computer data. Whenever a bit is flipped in a computer memory, that information is overwritten and conventionally we consider it to be lost.

However, physically flipping these bits generates heat in the circuitry, and that cascade of atomic-scale events involves the dissipation of that bit of information into thermal vibration modes. In fact $E = kT \ln 2$ is the minimum amount of energy that must be dissipated when a bit is erased at temperature $T$ (Landauer's principle).
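To get a feel for the scale of $kT \ln 2$, here is a small sketch that evaluates it. The temperature of 300 K is an assumed room-temperature value chosen for illustration, and the function name `bit_erasure_energy` is hypothetical.

```python
from math import log

BOLTZMANN_K = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def bit_erasure_energy(temperature_kelvin):
    """Energy scale k*T*ln(2), in joules, associated with one bit at temperature T."""
    return BOLTZMANN_K * temperature_kelvin * log(2)

# Assumed illustrative temperature: 300 K (roughly room temperature).
energy_per_bit = bit_erasure_energy(300.0)
print(f"{energy_per_bit:.3e} J per bit")
```

This comes to roughly $3 \times 10^{-21}$ joules per bit, many orders of magnitude below what ordinary memory circuitry dissipates per operation.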

So you ask, can this information be recovered from the environment so we can know the value of the bit? The answer is no in this specific case because the heat from the bit has almost infinite dimensions to dissipate into, and so there's no real practical way to gather that back together, but it doesn't mean that the information is destroyed, just that it is no longer accessible to us, and therefore becomes unknown information, which we know is there, and is quantifiable, and so we call it entropy.

Now let me show you a way in which entropy can actually be reduced. Let's suppose you have a box into which you throw computer cables, like USB cables or power cords. Maybe you initially lay them on top of each other in an ordered way. But then a year later you come to that box and all the cords are tangled up in a big hairball. The initial ordered state has low entropy. You put the cables in on top of each other in some order, so supposedly you should know some information about the arrangement of the contents of the box, even if you don't know all the specifics. Now over time, people might poke around in the box looking for one cable or another, stirring around the contents, pushing things aside and vibrating the contents in various ways. This is disordered unknown environmental information that is being added to the box contents. It's a random bunch of forces on various cables over time, and you are not making any note of that information. So the entropy of the system (hidden information from the external random perturbations) is being increased.

In the end you have a whole bunch of cables that are knotted together in various ways, instead of being independent and simply organized. The information encoded in all those knots and tangles came from the random environmental information that was added. This is the increase in entropy.

So then not being happy with this situation, you decide to organize them. But in practice what that means is that you have to undo all the knots by perceptually following each cable through the system and becoming cognizant of the information that was added, in order to unthread all the tangles and separate them again. So this process of sorting that you do is lowering the entropy of the system because you are exhaustively cataloging (and rapidly forgetting) exactly how the hidden information was encoded in the cable tangles. But also note that this process required energy and time on your part. And the information that was encoded in the cable tangles went into your brain, and then was forgotten, and dissipated as thermal energy.

But the weird thing is that entropy is related to your state of knowledge. So that means you and I can potentially ascribe different entropy to the same system depending on what we know in advance.

For example if I receive a million bits of information, I can calculate the frequency of the 1s and 0s and other statistics, and that gives me some information, but then the rest I consider to be hidden and therefore I can put a large entropy number on it. But someone else could have a certain amount of information about the fact that the bits are some encoded message, so to them the entropy is lower because the extra information constrains the bit pattern into a structured space smaller than $2^N$.

In the same way if someone had somehow noted how each interaction with the box of cables over time had affected them, then at the end the entropy would be low from the viewpoint of that person even though the cables would still be tangled. It's just that that person who watched how they got tangled didn't allow the information to become hidden, and in theory doesn't need to actually analyze the cables at the end in order to understand them, they could mechanically untangle them like a robot with zero or low levels of perception.

Robotbugs
  • 365
0

My understanding was always that this was a result of time evolution preserving measure in state space. So we have a space of states $\mathcal{P}$ with measure $\mu$ and there is an ensemble of states in $\mathcal{P}$ distributed according to some other measure $\nu$. We also have a dynamical system describing time evolution $f:\mathcal{P} \times \mathbb{R} \to \mathcal{P}$ where $f(p,t)$ is the state where a particle initially in state $p$ ends up after a time $t$. The crucial property of $f$ is that it preserves measure in the sense that if a small region of phase space has some phase-space volume $V$, then any time later it will have the same phase space volume $V$.

$\DeclareMathOperator{\Tr}{Tr}$Now let's look at the classical mechanics of $N$ particles in $d$ dimensions. The measure $\mu$ is given by $d \mu = d^{dN}xd^{dN}p$. The function $f(p,t)$ is determined by Hamilton's equations. We have an ensemble of states distributed according to some measure $\nu$. Usually we talk about the phase space density $\rho$ given by $d\nu = \rho d\mu$. Then the entropy is defined by $S=-\rho(p) \log \rho(p) d\mu$.

Now let's consider the time evolution of the entropy. We have $S(t)=-\int \rho(p,t) \log \rho(p,t) d\mu$. Thus we must find the time evolution of $\rho$. We have $\rho(p,t) = \frac{\rho(f^{-1}(p),0)}{\det \partial_p f(p,t)}$. But Liouville's theorem says the determinant in the denominator must be one, so $\rho(p,t) = \rho(f^{-1}(p),0)$. Now $S(t) = -\int \rho(f^{-1}(p),0) \log \rho(f^{-1}(p),0) d\mu$. Again by Liouville's theorem, we can do the change of variables $f^{-1}(p) \to p$ to get $S(t)=-\int \rho(p,0) \log \rho(p,0) d\mu = S(0)$, so the entropy must be a constant.

Another case to look at is quantum mechanics. Here the phase space $\mathcal{P}$ is the space of wave functions and $\mu$ is the measure on this space (it is more complicated for infinite dimensional Hilbert spaces). The function $f$ is given by $| \psi(0) \rangle \to U(t,0)|\psi(0) \rangle$, where $U$ is the (unitary) time evolution operator. We have a distribution of states given by $\nu$ and the density matrix describing this collection of states is given by $\rho = \int |\psi \rangle \langle \psi| d \nu$. The entropy is then defined as $S = -\Tr(\rho \log \rho).$

Now let's consider the time evolution of the entropy. We have $S(t) = -\Tr(\rho(t) \log \rho(t))$. Thus we must find the time evolution of $\rho$. We have $\rho(t) = \int |\psi \rangle \langle \psi| d \nu_t$, where the subscript $t$ denotes we are talking about the distribution at the time $t$. Now since $\nu_t$ is the pushforward of $\nu_0$ under $f(\cdot, t)$, we have that $\rho(t) = \int U(t,0) |\psi \rangle \langle \psi| U^\dagger(t,0) d\nu_0=U(t,0) \int |\psi \rangle \langle \psi| d\nu_0 U^\dagger(t,0) = U(t,0) \rho(0) U^\dagger(t,0) $. Now since $\log (U(t,0) \rho(0) U^\dagger(t,0)) = U(t,0)\log (\rho(0) )U^\dagger(t,0)$, and by cyclicity of trace, we have $S(t) = -\Tr(\rho(t) \log \rho(t)) = -\Tr(\rho(0) \log \rho(0)) =S(0)$, so the entropy is constant.
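As a quick numerical sanity check of the quantum argument, here is a sketch for a single qubit in plain Python. The helper names, the angles, and the initial density matrix are all illustrative choices of mine; the 2x2 eigenvalue formula assumes $\Tr \rho = 1$.

```python
import cmath
import math

def matmul(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(x):
    """Conjugate transpose of a 2x2 matrix."""
    return [[x[j][i].conjugate() for j in range(2)] for i in range(2)]

def entropy_2x2(rho):
    """Von Neumann entropy -Tr(rho ln rho) of a 2x2 density matrix with trace 1."""
    (a, b), (c, d) = rho
    det = (a * d - b * c).real
    # Eigenvalues of a trace-1 Hermitian 2x2 matrix are (1 +/- sqrt(1 - 4 det)) / 2.
    disc = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    eigenvalues = [(1.0 + disc) / 2.0, (1.0 - disc) / 2.0]
    return -sum(x * math.log(x) for x in eigenvalues if x > 1e-15)

def evolve(rho, u):
    """Apply rho -> U rho U^dagger."""
    return matmul(matmul(u, rho), dagger(u))

theta, phi = 0.7, 1.3  # arbitrary illustrative angles
u = [[math.cos(theta), -math.sin(theta) * cmath.exp(1j * phi)],
     [math.sin(theta) * cmath.exp(-1j * phi), math.cos(theta)]]
rho0 = [[0.8, 0.0], [0.0, 0.2]]  # a mixed state chosen for illustration
rho_t = evolve(rho0, u)
print(entropy_2x2(rho0), entropy_2x2(rho_t))  # the two values agree up to rounding
```

The unitary scrambles the matrix elements of $\rho$ but leaves its eigenvalues alone, which is exactly why the trace formula gives the same entropy before and after.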

Notice here that it wasn't sufficient for the dynamics to be reversible. The damped harmonic oscillator is reversible, but its entropy decreases (it gives entropy to its surroundings, assuming its initial energy is much larger than $kT$). The dynamics really need to preserve volume in state space.
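The volume statement can be checked numerically. Below is a rough sketch, assuming unit mass and the equations $\dot x = p$, $\dot p = -\omega^2 x - \gamma p$ (my parametrization, not taken from anything above). Because the system is linear, evolving the two basis vectors of phase space gives the Jacobian of the flow directly, and its determinant is the factor by which phase-space area is scaled.

```python
import math

def flow_determinant(gamma, omega, t, steps=10000):
    """Jacobian determinant of the flow of x' = p, p' = -omega^2 x - gamma p.

    Integrates the two phase-space basis vectors with classical RK4;
    for a linear system their images are the columns of the flow map.
    """
    def deriv(state):
        x, p = state
        return (p, -omega**2 * x - gamma * p)

    def rk4(state, h):
        k1 = deriv(state)
        k2 = deriv((state[0] + 0.5 * h * k1[0], state[1] + 0.5 * h * k1[1]))
        k3 = deriv((state[0] + 0.5 * h * k2[0], state[1] + 0.5 * h * k2[1]))
        k4 = deriv((state[0] + h * k3[0], state[1] + h * k3[1]))
        return (state[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
                state[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)

    h = t / steps
    e1, e2 = (1.0, 0.0), (0.0, 1.0)
    for _ in range(steps):
        e1, e2 = rk4(e1, h), rk4(e2, h)
    return e1[0] * e2[1] - e2[0] * e1[1]

print(flow_determinant(0.0, 1.0, 5.0))  # undamped: area is preserved
print(flow_determinant(0.3, 1.0, 5.0), math.exp(-0.3 * 5.0))  # damped: area shrinks
```

With $\gamma = 0$ the determinant stays at one, as Liouville's theorem requires. With $\gamma > 0$ it decays like $e^{-\gamma t}$, so the map is still invertible, yet the entropy of an ensemble defined as above drops by $\gamma t$.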

Brian Moths
  • 11,134
-6

This will be a philosophical answer, but a useful one:

The logic is actually quite simple:

There is no discontinuity. There is cause and effect.

ANY current state of affairs is an 'effect' that resulted from an infinite number of causes. It is also itself a cause of subsequent effects.

In short, just like matter, information also transforms, changes into different states through cause and effect mechanics.

So what we call 'chaos' or 'entropy', or any other seemingly incomprehensible and untrackable state of existence, is also a state that results from an infinite number of causes leading to effects.

That we are not able to track, distinguish, calculate, comprehend, explain such states of existence does not mean that they are outside the cause-effect mechanic and other mechanics that make existence.

So any chaotic, entropic state should be theoretically traceable to earlier states. It should come into being through cause-effect mechanics that could be observed and calculated if you had the means, and it should naturally be linked to any earlier state of information, including the state in which the entropy, chaos, or 'destroyed information' had not yet come into being and the earlier information we were observing was still there as it was.

Conservation of information, if you will. Information is also subject to the cause-effect mechanics that is inviolable anywhere in existence. (That some cases seem to 'violate' cause-and-effect relationships, like some quantum physics experiments, does not mean that they violate the mechanic with regard to existence in general, let alone the universe.)

If you look at black holes and the explanation that Susskind and others have given, there is no exception: information is protected, conserved, and linked in one way or another.

Therefore it is indestructible: you should be able to reconstruct any information that led to the CURRENT state of information by analyzing the current state and deconstructing it. That includes anything falling into a black hole and merging into the singularity.