Neural networks nearly ready to sit A-level maths

I recently saw a video about a model called Minerva, developed by a couple of Google teams, that appears to be much better at the kind of mathematical reasoning required for A-level maths than I’d have expected: Is Google’s New AI As Smart As A Human? 🤖 (YouTube). See also the blog post, Minerva: Solving Quantitative Reasoning Problems with Language Models (Google AI Blog), and the sample results in the Minerva Explorer. It’s surprising because it doesn’t use the reasoning frameworks provided by theorem provers, nor does it even delegate the necessary calculations to a maths library like NumPy; it has just learnt the patterns in how people write solutions to these kinds of problems.

I haven’t read much of the paper yet, but I’m very curious about the limits of this approach. For example, how well does it generalise to question styles that were not present in the training set? How many significant figures can the numbers in the questions have before its success rate plummets? Interestingly, the paper says that the largest model (540 billion parameters), which achieved the best performance, is undertrained, so it would also be interesting to know how much better it would be if it were fully trained. More importantly, I’d like to see whether comparable results will ever be achievable with much smaller models, because the financial cost of training such massive models puts this approach out of reach for most organisations, and the environmental cost is troubling too.
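As a rough illustration of the significant-figures question, here is a minimal sketch of how one might measure it. Everything in it is an assumption for illustration: `solve` is a hypothetical callable standing in for whatever model is being queried (nothing here talks to Minerva), the questions are plain integer multiplications, and "significant figures" is approximated by the number of digits in each operand.

```python
import random
import re
from typing import Callable, Dict, Tuple


def make_problem(digits: int, rng: random.Random) -> Tuple[str, int]:
    """Build a multiplication question whose operands each have `digits` digits."""
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    a, b = rng.randint(lo, hi), rng.randint(lo, hi)
    return f"What is {a} * {b}?", a * b


def accuracy_by_digits(
    solve: Callable[[str], str],  # hypothetical: maps question text to answer text
    max_digits: int = 6,
    trials: int = 50,
    seed: int = 0,
) -> Dict[int, float]:
    """Return the fraction of exactly correct answers for each operand length."""
    rng = random.Random(seed)
    results: Dict[int, float] = {}
    for digits in range(1, max_digits + 1):
        correct = 0
        for _ in range(trials):
            question, answer = make_problem(digits, rng)
            if solve(question).strip() == str(answer):
                correct += 1
        results[digits] = correct / trials
    return results


def exact_solver(question: str) -> str:
    """Stand-in solver that parses the operands and multiplies them exactly."""
    a, b = (int(x) for x in re.findall(r"\d+", question))
    return str(a * b)


def limited_solver(question: str) -> str:
    """Toy stand-in that only copes with operands of up to three digits."""
    operands = re.findall(r"\d+", question)
    if any(len(x) > 3 for x in operands):
        return "0"  # deliberately wrong: products of positive integers are never 0
    return exact_solver(question)


if __name__ == "__main__":
    for name, solver in [("exact", exact_solver), ("limited", limited_solver)]:
        print(name, accuracy_by_digits(solver))
```

Swapping a real model call in for `solve` would give an accuracy-versus-digits curve; the point at which it drops off is the limit I’m asking about. Exact string matching is a strict scoring choice, and a real evaluation would probably need to normalise the model’s output first.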

This is actually a very interesting approach to a more generalised model, based on language processing rather than pure theorem validation. While your concerns seem valid, might it one day prove to be an effective technique to apply in conjunction with other mathematical models?

Yeah, I think there’s a lot of interest, across all applications, in figuring out how to incorporate our existing knowledge about things like how to reason into deep learning, so that the training process doesn’t have to attempt to completely reinvent the wheel.
