
The Statistical Foundations of Machine Learning

Bruce Desmarais
April 18, 2025

Why your stats background is more useful than you might think


If you learned statistics in, say, economics, political science, psychology, or another discipline outside computer science, you may have encountered statistics and machine learning as two distinct toolkits—each with its own goals, language, and techniques.

In traditional statistics, the emphasis is often on inference: estimating parameters, testing hypotheses, quantifying uncertainty, and uncovering causal relationships. Machine learning, on the other hand, often seems focused on prediction, algorithmic performance, and computational efficiency.

It’s not uncommon for people in my machine learning workshops to come in thinking that statistics and machine learning are fundamentally different—philosophically, methodologically, or both. But the truth is: statistics and machine learning are deeply intertwined. In fact, statistical thinking provides the foundation for much of what we call machine learning today.


Let’s walk through two key ideas that illustrate this connection:

  1. How regression lies at the shared core of both fields.
  2. Why statistical and probabilistic thinking is essential for understanding machine learning models.

Regression: Common Ground

Regression is a statistical workhorse—and also the gateway into machine learning.

In a stats course, you probably learned linear regression as a tool for modeling relationships between variables, interpreting coefficients, and maybe even testing causal hypotheses. You worried about things like confounding, heteroskedasticity, and whether your model met key assumptions.

In machine learning, you might use the same model, but the focus is different: instead of interpreting the estimated parameters, you’re optimizing predictive performance. The algorithm might make no claims about causality, but it aims to minimize prediction error on new, unseen data.
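The contrast is easy to see side by side. Here is a minimal sketch, assuming simulated data and the statsmodels and scikit-learn libraries (an illustration, not the workflow from any particular study): the same linear model is fit once to read coefficients and standard errors, and once to score predictions on held-out data.

```python
# A minimal sketch contrasting two uses of the same linear model.
# The data, variable count, and libraries here are illustrative assumptions.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                               # three simulated predictors
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=500)    # simulated outcome

# Statistical framing: estimate coefficients and quantify uncertainty.
ols = sm.OLS(y, sm.add_constant(X)).fit()
print(ols.summary())                                        # coefficients, SEs, p-values

# Machine-learning framing: judge the model by error on unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```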

A Real-World Example

In a project I co-authored for the American Journal of Political Science, we analyzed how contributions from political action committees (PACs) affected U.S. House election outcomes. The dependent variable was the percentage of the vote won by the challenger.

We began with a traditional ordinary least squares (OLS) regression model. But we hit a snag: the dataset included about 2,000 candidates—but over 25,000 active PACs. Many candidates received contributions from hundreds of PACs, but there wasn’t nearly enough data to estimate an effect for each one.

Enter Lasso regression—a regularized regression method, widely used in machine learning, that shrinks some coefficients and drops others entirely based on their contribution to predictive performance. Applying Lasso filtered out the PAC indicators that didn’t meaningfully improve prediction, so we adjusted only for those that did.
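For readers who want to see the mechanics, here is a minimal sketch with scikit-learn on simulated data (not the paper’s data or code): a Lasso fit with many candidate predictors, most of which end up with coefficients of exactly zero.

```python
# A minimal Lasso sketch on simulated data: many candidate predictors,
# few that matter. Illustrative only; not the original study's setup.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n_obs, n_predictors = 300, 1000                 # far more candidate predictors than observations
X = rng.normal(size=(n_obs, n_predictors))
true_coefs = np.zeros(n_predictors)
true_coefs[:5] = [3.0, -2.0, 1.5, -1.0, 0.5]    # only five predictors truly matter
y = X @ true_coefs + rng.normal(size=n_obs)

# LassoCV chooses the penalty strength by cross-validation, then shrinks
# uninformative coefficients all the way to zero.
lasso = LassoCV(cv=5).fit(X, y)
kept = np.flatnonzero(lasso.coef_)
print(f"kept {kept.size} of {n_predictors} predictors:", kept[:10])
```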

Same structure as linear regression. Different goal. Different estimation strategy. Same statistical roots.

Why Machine Learning Needs Statistics

Many machine learning models are, at their core, statistical models.

Take logistic regression: a staple in applied statistics for binary outcomes, and also one of the most widely used classification tools in machine learning. It works well because:

  • It models probabilities.
  • It has a well-defined likelihood function that can be globally optimized.
  • It’s compatible with regularization techniques like Lasso or Ridge.
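A minimal sketch of those three points, assuming scikit-learn and simulated data: the fitted model outputs probabilities, is estimated by maximizing a likelihood, and accepts an L1 (Lasso-style) or L2 (Ridge-style) penalty through a single argument.

```python
# A minimal logistic regression sketch; simulated data, scikit-learn assumed.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))
# Binary outcome generated from a logistic model.
p = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - 2.0 * X[:, 2])))
y = rng.binomial(1, p)

# penalty="l1" gives Lasso-style shrinkage; penalty="l2" gives Ridge-style.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, y)
print("coefficients:", clf.coef_)                   # shrunken coefficient estimates
print("P(y = 1):", clf.predict_proba(X[:3])[:, 1])  # modeled probabilities
```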

Or consider Naive Bayes, a favorite for text classification. It’s just Bayes’ Theorem in action—calculating class probabilities based on observed features.
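To make that concrete, here is a minimal text-classification sketch, again assuming scikit-learn and a tiny made-up corpus: word counts serve as features, and Bayes’ Theorem (with a conditional-independence assumption across words) yields class probabilities.

```python
# A minimal Naive Bayes text-classification sketch on a made-up toy corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["budget vote passes house", "senate debates tax bill",
        "team wins championship game", "star player injured in match"]
labels = ["politics", "politics", "sports", "sports"]

# Represent each document as a vector of word counts.
vectorizer = CountVectorizer().fit(docs)
counts = vectorizer.transform(docs)

# MultinomialNB applies Bayes' Theorem to the word counts, assuming
# words are conditionally independent given the class.
nb = MultinomialNB().fit(counts, labels)
print(nb.predict_proba(vectorizer.transform(["house vote on tax"])))
```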

Zooming out further, we see that machine learning leans on several core statistical concepts:

  • The bias-variance tradeoff explains why complex models can overfit and fail to generalize.
  • Cross-validation provides a principled way to estimate out-of-sample accuracy.
  • Loss functions like squared error or log loss derive from assumptions about data distributions.
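Cross-validation is perhaps the easiest of these to see in code. A minimal sketch, assuming scikit-learn and simulated data: the same model is fit on several train/test splits, and the held-out scores estimate how it would perform on genuinely new data.

```python
# A minimal k-fold cross-validation sketch on simulated data (scikit-learn assumed).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=200)

# Five folds: fit on four-fifths of the data, score on the held-out fifth,
# and rotate so every observation is held out exactly once.
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5,
                         scoring="neg_mean_squared_error")
print("estimated out-of-sample MSE:", -scores.mean())
```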

Even when machine learning feels like a different animal, its bones are statistical.

Bridging the Gap

So, what does this mean for your training or practice?

It means your background in statistics is not just helpful—it’s essential. Understanding things like assumptions, bias, variance, and estimation gives you a powerful edge as a machine learning practitioner.

At the same time, machine learning can push your statistical skills further—helping you tackle high-dimensional problems, reframe classical tools for predictive goals, and think more critically about model performance in the real world.

In short: machine learning isn’t a rejection of statistics—it’s a modern extension of it.
