An experiment, led by the University of Salford’s Professor Richard Whittle and University College London’s Dr James Ransom, ran six leading AI (artificial intelligence) models as a synthetic electorate of 42,265 voters on the day of the Makerfield by-election.
Four of the six called the seat for Reform; in reality, Andy Burnham won it for Labour with a majority of 9,231.
On the day Makerfield went to the polls, researchers at the University of Salford and University College London asked six of the most capable AI models to vote in the by-election, one simulated ballot at a time, across a synthetic electorate of 42,265 voters built from census data.
Four of the six predicted that Reform UK’s Robert Kenyon would take the seat. The result declared overnight was a Labour victory for Andy Burnham on 54.8%, a majority of 9,231, with Reform on 34.5%.
The most notable finding was not whether the models were right (they mostly were not), but how far they disagreed. Given the same briefing, the same instruction and the same voters, the six returned Reform vote shares ranging from 38.5% to 80.5%, a spread of 42 points. One, GPT-5.4-mini, came closest to the real outcome, placing Burnham on 60.7% against an actual 54.8%. Others predicted a Reform landslide, with Reform on as much as 80.5%.
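The arithmetic behind those figures can be checked in a few lines. The following is a minimal sketch that uses only the percentages quoted in this article; the variable names are illustrative and are not taken from the study itself.

```python
# Vote shares quoted in the article, in percent.
reform_low, reform_high = 38.5, 80.5          # range of Reform shares across the six models
burnham_predicted, burnham_actual = 60.7, 54.8  # GPT-5.4-mini's Labour share vs the declared result
reform_actual = 34.5                           # Reform's declared share

# Spread between the lowest and highest simulated Reform share.
spread = round(reform_high - reform_low, 1)

# How far the closest model was from Burnham's real share.
closest_model_error = round(burnham_predicted - burnham_actual, 1)

# Labour's real lead over Reform, in percentage points.
actual_lead = round(burnham_actual - reform_actual, 1)

# Gap between the lowest simulated Reform share and Reform's real share.
lowest_reform_overshoot = round(reform_low - reform_actual, 1)

print(f"Spread of Reform shares: {spread} points")
print(f"Closest model's error on Burnham: {closest_model_error} points")
print(f"Actual Labour lead over Reform: {actual_lead} points")
print(f"Lowest simulated Reform share minus actual: {lowest_reform_overshoot} points")
```

On these numbers, even the lowest simulated Reform share sat above the 34.5% the party actually received.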
Every model reproduced familiar demographic patterns, with older voters and men more likely to back Reform. The predicted levels of support, however, were often far from reality, and most models missed what decided the contest, namely Burnham’s personal standing, the tactical consolidation of anti-Reform voters behind Labour, and the new Restore Britain party, which took 6.8%. The models compressed a 14-name ballot to two or three contenders, writing off the Greens, Liberal Democrats and Conservatives, each of which lost its deposit.
Professor Richard Whittle described the exercise less as a forecasting tool than as a caution for anyone tempted to use these systems to read public opinion. He said: “The headline for anyone thinking of using these models to gauge public opinion is a simple one. Which model you ask matters more than how you ask it.
“We gave six systems an identical electorate and an identical briefing, and they produced everything from a large Labour win to a Reform supermajority.”
“What the models are good at is the texture: the age gap and the gender gap turn up every time. What they are poor at is the politics, the things that swing a by-election: a strong local candidate, tactical voting, a new party entering the field. A synthetic electorate built from demographics alone votes its stereotypes, and Makerfield did not,” continued University College London’s Dr James Ransom, lead author of the study.
The researchers note that one model performed well. “GPT-5.4-mini came close to the real result.
“But you only learn which model was right after the votes are counted and, on the day, you would have had no principled way of choosing between them. A one-in-three chance of calling the winner is not a method anyone would want to brief a campaign on,” concluded Professor Whittle.






