Tivadar Danka
@tivadardanka.bsky.social
I make math accessible for everyone. Mathematician with an INTJ personality. Chaotic good. Writing https://thepalindrome.org
Computationally speaking, decomposing models into graphs is the idea that fuels backpropagation.

My neural networks series is dedicated to explaining exactly what's going on in the picture above.

Here's everything you need to know: thepalindrome.org/p/introduct...
Introduction to Computational Graphs
Neural Networks From Scratch, Part I
thepalindrome.org
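The graph image attached to the post didn't survive the scrape; here is a minimal, hypothetical sketch of the idea in Python, decomposing f(x) = sin(x²) into graph nodes so the chain rule can run backward through them. All names here are illustrative, not from the original post.

```python
import math

# Minimal computational-graph sketch (illustrative, not the post's code):
# each node stores the local derivatives toward its parents, and
# backpropagation multiplies them along the graph via the chain rule.

class Node:
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value              # forward-pass result
        self.parents = parents          # upstream nodes
        self.local_grads = local_grads  # d(this)/d(parent) for each parent
        self.grad = 0.0                 # accumulated derivative of the output

    def backward(self, upstream=1.0):
        self.grad += upstream
        for parent, local in zip(self.parents, self.local_grads):
            parent.backward(upstream * local)

# Decompose f(x) = sin(x^2) into two nodes: square, then sine.
x = Node(2.0)
sq = Node(x.value ** 2, parents=(x,), local_grads=(2 * x.value,))
f = Node(math.sin(sq.value), parents=(sq,), local_grads=(math.cos(sq.value),))

f.backward()   # chain rule: df/dx = cos(x^2) * 2x
print(x.grad)  # ≈ -2.615 at x = 2
```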
November 24, 2025 at 1:33 PM
If you liked this post, you will love The Palindrome, my weekly newsletter on Mathematics and Machine Learning.

Join 35,000+ curious readers here: thepalindrome.org/
November 23, 2025 at 1:32 PM
Is the Bayesian viewpoint better than the frequentist one?

No. It's just different. In certain situations, frequentist estimates are perfectly sufficient. In others, Bayesian methods have the advantage. Use the right tool for the task, and don't worry about the rest.
November 23, 2025 at 1:32 PM
To sum up: as a mathematical concept, probability is independent of interpretation. The question of frequentist vs. Bayesian comes up when we are building probabilistic models from data.
November 23, 2025 at 1:32 PM
After this, we get a concrete formula for the posterior density.

(The symbol ∝ reads as “proportional to”, and we write this instead of equality because of the omitted denominator.)
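The formula image from the post didn't survive the scrape; reconstructed in my notation (θ for the probability of heads, X for the observed tosses), it reads:

```latex
p(\theta \mid X) \propto P(X \mid \theta)\, p(\theta)
```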
November 23, 2025 at 1:32 PM
Back to our coin-tossing example. Given the probability of heads, the likelihood can be computed using simple combinatorics.
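Concretely, for k heads observed in n tosses, the likelihood is the binomial probability (notation mine):

```latex
P(X \mid \theta) = \binom{n}{k}\, \theta^{k} (1 - \theta)^{n-k}
```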
November 23, 2025 at 1:32 PM
Bad news: the evidence can be impossible to evaluate. Good news: we don’t have to! We find the parameter estimate by maximizing the posterior, and as the evidence doesn’t depend on the parameter at all, we can simply omit it.
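A sketch of this step in the same notation:

```latex
\hat{\theta}
  = \arg\max_{\theta} \frac{P(X \mid \theta)\, p(\theta)}{P(X)}
  = \arg\max_{\theta} P(X \mid \theta)\, p(\theta)
```

The last equality drops the evidence P(X) precisely because it is constant in θ.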
November 23, 2025 at 1:32 PM
In plain English,

• the likelihood describes the probability of the observation given the model parameter,
• the prior describes our assumptions about the parameter before the observation,
• and the evidence is the total probability of our observation.
November 23, 2025 at 1:32 PM
Don't worry if this seems complex! We'll unravel it term by term.

There are three terms on the right side: the likelihood, the prior, and the evidence.
November 23, 2025 at 1:32 PM
The Bayes formula connects the prior and the likelihood to the posterior.
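Reconstructed in formula form (the original image is lost; θ is the parameter, X the observation):

```latex
\underbrace{p(\theta \mid X)}_{\text{posterior}}
  = \frac{\overbrace{P(X \mid \theta)}^{\text{likelihood}}
          \; \overbrace{p(\theta)}^{\text{prior}}}
         {\underbrace{P(X)}_{\text{evidence}}}
```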
November 23, 2025 at 1:32 PM
What we want is to incorporate the experimental observations into our estimation, and this is expressed in terms of conditional probabilities.

This is called posterior estimation.
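In symbols, the object we are after is the conditional density of the parameter given the observations (notation mine):

```latex
p(\theta \mid X)
```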
November 23, 2025 at 1:32 PM
Our prior assumption about the probability is called, well, the prior.

For instance, if we know absolutely nothing about our coin, we assume the prior to be uniform.
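For the coin, "knowing nothing" translates to the uniform density on [0, 1]:

```latex
p(\theta) = 1, \qquad \theta \in [0, 1]
```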
November 23, 2025 at 1:32 PM
In Bayesian statistics, we treat our probability-to-be-estimated as a random variable. Thus, we are working with probability distributions or densities.

Yes, I know. The probability of probability. It’s kind of an Inception-moment, but you’ll get used to it.
November 23, 2025 at 1:32 PM
Let's stick to our coin-tossing example to show how this works in practice. Regardless of the actual probabilities, 90 heads from 100 tosses is a possible outcome in (almost) every case.

Is the coin biased, or were we just lucky? How can we tell?
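A quick sanity check of that claim (my own computation, not from the thread): even a perfectly fair coin produces 90 heads out of 100 with positive, though tiny, probability.

```python
from math import comb

# Probability that a fair coin gives exactly 90 heads in 100 tosses.
n, k = 100, 90
p_fair = comb(n, k) * 0.5 ** n
print(f"{p_fair:.2e}")  # ≈ 1.37e-17: possible, but astronomically unlikely
```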
November 23, 2025 at 1:32 PM
Conditional probabilities allow us to update our probabilistic model in light of new information. The update rule is called the Bayes formula, hence the terminology "Bayesian statistics".

Again, this is a mathematically provable fact, not an interpretation.
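For two events A and B with P(B) > 0, the formula reads:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```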
November 23, 2025 at 1:32 PM
With conditional probabilities, we can quantify our intuition about the relation between rain and the clouds in the sky.
November 23, 2025 at 1:32 PM
In probabilistic models, observing certain events can influence our beliefs about others. For instance, if the sky is clear, the probability of rain goes down. If it’s cloudy, the same probability goes up.

This is expressed in terms of conditional probabilities.
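Unpacking that with the definition of conditional probability, using the rain example:

```latex
P(\text{rain} \mid \text{cloudy})
  = \frac{P(\text{rain} \cap \text{cloudy})}{P(\text{cloudy})}
```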
November 23, 2025 at 1:32 PM
On the other hand, the Bayesian school argues that such estimations are wrong, because probabilities are not absolute, but a measure of our current beliefs.

This is way too abstract, so let's elaborate.
November 23, 2025 at 1:32 PM
Frequentists leverage this to build probabilistic models. For example, if we toss a coin n times and heads come up exactly k times, then the probability of heads is estimated to be k/n.
November 23, 2025 at 1:32 PM
As the number of observations grows, the relative frequency will converge to the true probability.

This is not an interpretation of probability. This is a mathematically provable fact, independent of interpretations. (A special case of the famous Law of Large Numbers.)
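A minimal simulation of this convergence (my own sketch; the true heads probability of 0.7 is an arbitrary illustrative choice):

```python
import random

random.seed(42)
true_p = 0.7          # assumed "true" probability of heads (illustrative)
heads, tosses = 0, 0

for checkpoint in (10, 100, 1_000, 10_000, 100_000):
    while tosses < checkpoint:
        heads += random.random() < true_p  # True counts as 1
        tosses += 1
    print(f"n = {tosses:>6}: relative frequency = {heads / tosses:.4f}")

# The printed frequencies settle near 0.7 as n grows,
# just as the Law of Large Numbers predicts.
```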
November 23, 2025 at 1:32 PM
Suppose that we repeatedly perform a single experiment, counting the number of occurrences of the possible events. Say we toss a coin and count the number of times it turns up heads.

The ratio of heads to tosses is called "the relative frequency of heads".
November 23, 2025 at 1:32 PM
Now comes the part that has been fueling debates for decades.

How can we assign probabilities? There are (at least) two schools of thought, constantly in conflict with each other.

Let's start with the frequentist school.
November 23, 2025 at 1:32 PM
Note that at this point, there is no frequentist or Bayesian interpretation yet!

Probability is a well-defined mathematical object. This concept is separate from how probabilities are assigned.
November 23, 2025 at 1:32 PM
2. Throwing darts. Suppose that we are throwing darts at a large wall in front of us, which is our event space. (We'll always hit the wall.)

If we throw the dart randomly, the probability of hitting a certain shape is proportional to the shape's area.
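In formula form, for a shape S drawn on the wall:

```latex
P(\text{hitting } S) = \frac{\operatorname{area}(S)}{\operatorname{area}(\text{wall})}
```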
November 23, 2025 at 1:32 PM