Physics priors are a free lunch (when you season them right)

There's a school of thought that says modern neural networks have so much capacity that you might as well let them learn the physics from data. Just give them more data, the argument goes, and the conservation laws will fall out of the loss naturally. I've spent the last two years arguing the opposite, and I have the experiments to back it up.

The short version: if you know that mass is conserved, or that a quantity must be non-negative, or that a field must be divergence-free — tell the network. The right place to tell it is in the loss, not in a post-hoc projection. Done correctly, this is one of the closest things to a free lunch I've encountered in deep learning.

The setup

Most of my work is on tomographic reconstruction: given a stack of 2D projections, recover the 3D structure. It's an inverse problem, and like many inverse problems, it's ill-posed: many different 3D volumes are consistent with the same projections.
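To make the ill-posedness concrete, here is a toy sketch one dimension down: a 2D image "projected" onto its row sums and column sums. The `projections` helper and the two 2x2 images are made up for illustration and have nothing to do with a real tomography pipeline.

```python
def projections(image):
    """Return (row sums, column sums): two orthogonal 'views' of a 2D image."""
    row_sums = [sum(row) for row in image]
    col_sums = [sum(col) for col in zip(*image)]
    return row_sums, col_sums


# Two clearly different images...
a = [[1, 0],
     [0, 1]]
b = [[0, 1],
     [1, 0]]

# ...that no amount of staring at these two views can tell apart.
print(projections(a))
print(projections(b))
print(projections(a) == projections(b))
```

Both images produce identical views, so the data alone cannot pick between them. Something extra, whether a regularizer, a learned prior, or a physical constraint, has to break the tie.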

The classical answer is to add a regularizer that prefers "nice" solutions. The deep-learning answer is usually to learn a prior from data. Both work; both have failure modes. Physics priors give you a third option that composes with either.

Conservation laws aren't constraints on the answer — they're constraints on the space of plausible answers. Encoding them shrinks the search space without throwing away anything you'd ever want to keep.

Three ways to enforce a prior

Roughly, you can enforce a physical constraint at three places in the pipeline:

  1. In the architecture — e.g. a softplus output to guarantee positivity.
  2. In the loss — penalize violations during training.
  3. In a post-processing step — project the prediction onto the feasible set after the fact.
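The three options above can be sketched for the simplest constraint in the list, non-negativity. This is a minimal illustration in plain Python, not code from any real model; the function names and the example vector `raw` are hypothetical.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z)); never negative."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# 1. Architecture: squash raw outputs so negativity is impossible by construction.
def architectural(raw_outputs):
    return [softplus(z) for z in raw_outputs]


# 2. Loss: leave the outputs free, but charge for every negative entry.
#    This term would be added to the data-fit loss during training.
def negativity_penalty(pred):
    return sum(min(p, 0.0) ** 2 for p in pred)


# 3. Post-processing: project onto the feasible set after the fact.
#    For the non-negative orthant, the Euclidean projection is a clip at zero.
def project_nonnegative(pred):
    return [max(p, 0.0) for p in pred]


raw = [-2.0, 0.5, 3.0]
print(architectural(raw))
print(negativity_penalty(raw))
print(project_nonnegative(raw))
```

Only the penalty in option 2 feeds back into training. Option 1 changes what the network can express, and option 3 never touches the weights at all.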

I've tried all three on the same problem. The architectural approach is the cleanest but the least flexible: it only works for constraints simple enough to build into a layer, like positivity. The post-hoc projection always works, but you pay an accuracy cost, because the network never learns to produce outputs that project cleanly.

The loss is where the magic happens.

Why the loss wins

When you penalize a constraint violation in the loss, gradient descent doesn't just learn to satisfy the constraint — it learns features that make satisfying the constraint easy. That's a subtle but enormous distinction. The network reorganizes its internal representations to keep conservation laws cheap.

In the atom-probe paper, switching from a post-hoc mass-balance projection to a soft mass-balance penalty in the loss cut our reconstruction error by ~18% at the same number of projections. No new data, no new architecture, no extra inference cost.
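For readers who want to see the mechanics of a soft mass-balance penalty, here is a deliberately tiny sketch. It is not the setup from the paper and does not reproduce the 18% figure: the "network" is just a free vector `x`, and the target values, `true_mass`, learning rate, and step count are all assumptions chosen to keep the example small.

```python
def fit(target, total_mass, weight, lr=0.1, steps=500):
    """Gradient descent on: mean squared error + weight * (sum(x) - total_mass)^2."""
    n = len(target)
    x = [0.0] * n
    for _ in range(steps):
        violation = sum(x) - total_mass
        # Gradient of the data term plus gradient of the mass-balance penalty.
        grad = [2.0 * (xi - ti) / n + 2.0 * weight * violation
                for xi, ti in zip(x, target)]
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x


noisy_target = [1.2, 2.1, 3.3]  # sums to 6.6, but the mass is known to be 6.0
true_mass = 6.0

for weight in (0.0, 1.0):
    x = fit(noisy_target, true_mass, weight)
    print(weight, abs(sum(x) - true_mass))
```

With the penalty switched off, the fit copies the noisy target and misses the known mass by about 0.6. With `weight=1.0`, the miss shrinks to about 0.06 while the fit stays close to the data. The toy only shows how the objective trades the two terms off; it says nothing about the representation-learning effect described above, which needs a real network to show up.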

The one case where it doesn't work

Soft penalties have one well-known pathology: if the penalty weight is too high, you optimize for the constraint at the expense of the data fit. The network learns to be perfectly conservative and perfectly wrong.

The fix is unromantic: schedule the penalty weight. Start small, ramp up. We use a cosine schedule that hits its peak around 70% of training. It's not principled — it's just what works.
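A schedule of that shape might look like the sketch below. The function name, the default `peak_weight`, the start at exactly zero, and holding the weight at its peak after the 70% mark are all assumptions; the post only says the ramp is cosine-shaped and peaks around 70% of training.

```python
import math


def penalty_weight(step, total_steps, peak_weight=1.0, peak_frac=0.7):
    """Half-cosine ramp from 0 to peak_weight, reaching the peak at peak_frac of training."""
    ramp_steps = peak_frac * total_steps
    if step >= ramp_steps:
        # Assumption: hold the peak for the remainder of training.
        return peak_weight
    progress = step / ramp_steps  # runs from 0 to 1 over the ramp
    return peak_weight * 0.5 * (1.0 - math.cos(math.pi * progress))


for step in (0, 175, 350, 525, 700, 1000):
    print(step, round(penalty_weight(step, total_steps=1000), 3))
```

In a training loop, the returned value would multiply the constraint penalty before it is added to the data-fit loss, so early steps are driven almost entirely by the data and the constraint only starts to bite once the fit is reasonable.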

When to reach for this

Physics priors are most useful when:

If those three things are true, encoding the prior in the loss is almost always worth a try. It's a small change to your training code and a potentially large win at inference.

— Cedric