All Articles

Maximizing Entropy


(Throughout this post I will be using the convention logloge\log \equiv \log_e and lglog2\lg \equiv \log_2).

In my post on Shannon entropy I introduced the notion of information entropy with respect to discrete random variables. There we derived, albeit in a hand-wavy fashion, the entropy formula:

H(X)=H(p1,,pn)=pilgpiH(X) = H(p_1,\ldots,p_n) = -\sum p_i \cdot \lg p_i

I hinted at the fact that HH is maximized when the underlying random variable XX is distributed uniformly. We saw this in the coin-toss example by seeing how any deviation from a fair-coin reduced the entropy.

My aim in this post is to prove (at least in the discrete case) that HH is indeed maximized when XX is distributed uniformly.

As an aside, one thing I neglected to mention in my original post on entropy is how HH is computed when p=0p = 0. By convention, plgp0-p\cdot \lg p \equiv 0 when p=0p = 0.

This agrees with our intuition since the case p=0p = 0 is analogous to the case p=1p = 1 in that the event will almost-surely not occur. And recall that 1lg1=0-1\cdot \lg 1 = 0. So p=0p = 0 can be seen as the symmetric case to p=1p = 1, making the above convention not far-fetched.

You can actually show using methods of indeterminate forms that plgp0 as p0p\cdot \lg p \to 0 \text{ as } p \to 0. So that if we interpret the product as a limit p0p\to 0 the above equality ceases to be a convention but instead a fact.

First we need to quickly review a standard log inequality.

logxx1x>0\log x \leq x - 1 \enspace \forall x >0

Where equality holds if and only if x=1x = 1.

Step 1

The proof follows directly from the definition,

logx=1x1t.\log x = \int_1^x \frac{1}{t} \enspace .

If x1x\geq 1 then

logx=1x1t1x1=x1.\log x = \int_1^x \frac{1}{t} \leq \int_1^x 1 = x - 1 \enspace .

If 0<x<10 < x < 1 then

logx=1x1t=x11tx11=(1x)=x1.\log x = \int_1^x \frac{1}{t} = -\int_x^1 \frac{1}{t} \leq -\int_x^1 1 = -(1-x) = x - 1 \enspace .

Where the inequality follows from the fact that

t[x,1]1/t11/t1.t\in [x,1] \Rightarrow 1/t \geq 1 \Rightarrow -1/t \leq -1 \enspace .

If x=1x = 1 it follows trivially that logx=x1\log x = x - 1. We want to show, however, that x=1x = 1 is the only root.

For this we define the auxiliary function

f(x)=logxx+1f(x) = \log x - x + 1

Suppose that ff has two roots

x=1x01\begin{aligned} x &= 1 \\ x_0 &\neq 1 \end{aligned}

Assume, without loss of generality, that x0<1x_0 < 1.

According to Rolle’s theorem, since ff is continuous on [x0,1][x_0,1] and differentiable on (x0,x)(x_0,x), there must exist a point ξ(x0,1)\xi \in (x_0,1) such that

f(ξ)=1ξ1=0.f'(\xi) = \frac{1}{\xi} - 1 = 0 \enspace .

However, the only root of 1/ξ1=01/\xi -1 = 0 is ξ=1\xi = 1.

But 1(x0,1)1 \notin (x_0,1).


Therefore, x=1x = 1 is the only root of ff.

Next we want to show that lgn\lg n is an upper bound for HH.

Step 2

By the change of base formula I can write

lg(n)=log(n)log(2)\lg(n) = \frac{\log(n)}{\log(2)}

Suppose for now that pj>0p_j > 0 for each j=1,,nj = 1,\ldots, n.


H(X)lgn=H(p1,,pn)lgn=pjlgpjlgn=pjlogpjlog2logpjlog2=1log2pj(logpj+logn)sincepj=1=1log2pj(logpjn)=1log2pj(log1pjn)1log2pj(1pjn1)from the inequality in step 1=1log2(1npj)=1log2(n1n1)=1log2(11)=0Hlgn\begin{aligned} H(X) - \lg n &= H(p_1,\ldots,p_n) - \lg n \\ &= -\sum p_j \cdot \lg p_j - \lg n \\ &= -\sum p_j \cdot \frac{\log p_j }{\log 2} - \frac{\log p_j }{\log 2} \\ &= -\frac{1}{\log 2}\cdot \sum p_j \cdot\left(\log p_j + \log n \right) \qquad \small{\text{since} \sum p_j = 1} \\ &= -\frac{1}{\log 2}\cdot \sum p_j \cdot\left(\log p_j \cdot n \right) \qquad \star \\ &= \frac{1}{\log 2} \cdot \sum p_j \cdot\left(\log \frac{1}{p_j \cdot n} \right) \\ &\leq \frac{1}{\log 2} \cdot \sum p_j \cdot\left(\frac{1}{p_j \cdot n} - 1 \right) \qquad \small{\text{from the inequality in step 1}} \\ &= \frac{1}{\log 2} \cdot \sum \left(\frac{1}{n} - p_j \right) \\ &= \frac{1}{\log 2}\cdot\left(n\cdot\frac{1}{n} - 1\right) \\ &= \frac{1}{\log 2}\cdot\left(1 - 1\right) \\ &= 0 \\ &\Rightarrow H \leq \lg n \\ &\square \end{aligned}

Suppose pj=0p_j = 0 for some jj. Recall from the preliminary discussion above that plgp0-p\cdot\lg p \equiv 0 by convention. Then from step \star above simply note that

pj=0pjlgp=0<1n0=1npjp_j = 0 \Rightarrow -p_j\cdot \lg p = 0 < \frac{1}{n} - 0 = \frac{1}{n} - p_j \enspace

which is precisely the inequality that is required to be shown in the last step. The inequality therefore still holds.

Finally, as discussed earlier, equality is reached if and only if

1pjn=1j.\frac{1}{p_j\cdot n} = 1 \quad \forall j \enspace .

That is, when pj=1np_j = \frac{1}{n} for each jj.

That is, when XX is uniformly distributed.


Published 12 Jul 2018