Cross Entropy
In machine learning, cross-entropy is a very popular loss function. Here is its mathematical definition:
$$ H(P^*, P) = - \sum_{i} P^*(i) \log P(i) $$
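As a quick numerical illustration (the class probabilities below are made-up numbers, not from any real model), the formula can be computed directly with NumPy:

```python
import numpy as np

# Made-up 3-class example: p_true is the real distribution (one-hot here),
# p_model is the model's predicted probabilities.
p_true = np.array([0.0, 1.0, 0.0])
p_model = np.array([0.1, 0.7, 0.2])

# H(P*, P) = -sum_i P*(i) * log P(i)
cross_entropy = -np.sum(p_true * np.log(p_model))
print(cross_entropy)  # ≈ 0.357, i.e. -log(0.7)
```

With a one-hot true distribution, this reduces to the negative log of the probability the model assigns to the correct class.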
I have been using cross entropy for decades without truly understanding this loss function. Recently, I watched a few YouTube videos and want to share my current understanding of it.
Essentially, cross entropy measures how different two probability distributions are.
Let’s imagine we want to build a classifier that, given an image, outputs a probability for each class. We denote our model’s distribution as $P(x)$ and the real distribution as $P^*(x)$.
We can use KL divergence to measure the difference between these two distributions.
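For reference, the KL divergence of the model distribution $P$ from the true distribution $P^*$ is defined as:

$$ D_{KL}(P^* \| P) = \sum_{i} P^*(i) \log \frac{P^*(i)}{P(i)} $$

Intuitively, it is the average extra "surprise" we incur by using $P$ in place of the true distribution $P^*$.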
Once we understand the concept of KL divergence, a little algebra transforms it into the cross-entropy formula above, as shown below.
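Here is a sketch of that algebra, splitting the log of the ratio into a difference of logs:

$$ D_{KL}(P^* \| P) = \sum_{i} P^*(i) \log P^*(i) - \sum_{i} P^*(i) \log P(i) = -H(P^*) + H(P^*, P) $$

The first term, the entropy $H(P^*)$ of the true distribution, does not depend on our model at all, so minimizing the KL divergence with respect to the model is equivalent to minimizing the cross entropy $H(P^*, P)$. That is why we can train with cross entropy directly.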
Here are the YouTube videos that I found pretty useful: