Open vs Closed Models on a Reasoning Test
The research question
How does a freely downloadable open model compare to a closed commercial model on the same reasoning problems?
Abstract
I gave an open model and a closed model the same set of 20 logic and word problems and scored their answers. The closed model scored higher, but the gap between the two was smaller than I expected.
Background
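Open models can be downloaded and run by anyone, so users do not have to depend on a single provider's service. This matters for AI sovereignty. I wanted to measure how large the capability gap between open and closed models is on reasoning tasks.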
What I did
I built a set of 20 logic and word problems with known answers and ran each model on the full set three times.
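A minimal sketch of how answers from repeated runs might be scored is shown below. It assumes exact-match grading against the known answers after light normalization (lowercasing and collapsing whitespace), and that a model's score is the mean accuracy over its runs. The function names `normalize`, `run_accuracy`, and `score_runs`, the normalization rule, and the toy inputs are all hypothetical. They are not the grading code or data used in this study.

```python
from statistics import mean


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers still match (assumed grading rule)."""
    return " ".join(answer.lower().split())


def run_accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of problems answered correctly in one run, using exact match after normalization."""
    if len(predictions) != len(gold):
        raise ValueError("each run must answer every problem")
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predictions, gold))
    return correct / len(gold)


def score_runs(runs: list[list[str]], gold: list[str]) -> tuple[float, list[float]]:
    """Return the mean accuracy across runs plus the per-run accuracies."""
    per_run = [run_accuracy(r, gold) for r in runs]
    return mean(per_run), per_run


if __name__ == "__main__":
    # Toy example: 4 hypothetical problems and 3 hypothetical runs, not the study's data.
    gold = ["7", "blue", "Tuesday", "12"]
    runs = [
        ["7", "Blue", "tuesday", "10"],
        ["7", "red", "Tuesday", "12"],
        ["7", "blue", "Tuesday", "12"],
    ]
    avg, per_run = score_runs(runs, gold)
    print(per_run, round(avg, 3))  # prints [0.75, 0.75, 1.0] 0.833
```

Reporting the per-run accuracies alongside the mean shows whether a model's score is stable across its three runs. A majority vote over the runs for each problem would be an alternative aggregation, which this sketch does not implement.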
What I found
The closed model answered more problems correctly, especially the multi-step ones, but the open model still solved a clear majority of the problems.
What's next
Next, I would test whether more careful prompting narrows the gap further.
Takeaway
Closed models still lead on hard reasoning, but open models are capable enough to take seriously.