(Throughout this post I will be using the convention and
In my post on Shannon entropy I introduced the notion of information
entropy with respect to discrete random variables. There we derived, albeit in
a hand-wavy fashion, the entropy formula:
I hinted at the fact that is maximized when the underlying random variable
is distributed uniformly. We saw this in the coin-toss example by seeing how
any deviation from a fair-coin reduced the entropy.
My aim in this post is to prove (at least in the discrete case) that is
indeed maximized when is distributed uniformly.
As an aside, one thing I neglected to mention in my original post on
entropy is how is computed when . By convention, when .
This agrees with our intuition since the case is analogous to the case
in that the event will almost-surely not occur. And recall that
. So can be seen as the symmetric case to ,
making the above convention not far-fetched.
You can actually show using methods of indeterminate forms that . So that if we interpret the product as a limit the above equality ceases to be a convention but instead a fact.
First we need to quickly review a standard log inequality.
Where equality holds if and only if .
The proof follows directly from the definition,
Where the inequality follows from the fact that
If it follows trivially that . We want to show, however,
that is the only root.
For this we define the auxiliary function
Suppose that has two roots
Assume, without loss of generality, that .
According to Rolle’s theorem, since is continuous on and
differentiable on , there must exist a point such
However, the only root of is .
Therefore, is the only root of .
Next we want to show that is an upper bound for .
By the change of base formula I can write
Suppose for now that for each .
Suppose for some . Recall from the preliminary discussion above
that by convention. Then from step above simply
which is precisely the inequality that is required to be shown in the last step.
The inequality therefore still holds.
Finally, as discussed earlier, equality is reached if and only if
That is, when for each .
That is, when is uniformly distributed.