For many in the field of Artificial Intelligence (AI), the holy grail is the development of human-like Artificial General Intelligence, or AGI. This is postulated to be the level of AI at which a machine exhibits human intelligence: not just algorithmic, heuristic or complex pattern-matching, but the ability to reason, think, analyze and create (even feel passion, love, despair?).
If an AI system can do all that at the level of humans collectively, then we bestow upon it the title of “AGI” – Artificial General Intelligence. Lo, we have created a machine clone of the collective intelligence of the human race! (Note that we are not talking here about one human expert in one field, who may not know much outside his area of expertise, but about the ability of humans to share, use and advance the knowledge obtained by everyone.)
The problem is that we need an objective test to decide whether we have actually reached AGI – something like the Turing Test, proposed many decades ago to decide whether we are interacting with a computer or a human. We need to subject today’s systems at the forefront of AI – ChatGPT from OpenAI, Gemini from Google, Grok from xAI and others – to a rigorous exam to gauge their advancement toward the holy grail of AGI. And this is what researchers at the forefront of this field have tried to do. They devise problems in different fields, both scientific and artistic, to test the problem-solving ability of a system in new situations, where it simply cannot look up solutions on the Internet or blindly match complex patterns against the documented corpus of human knowledge. There are problem sets, such as GPQA (Graduate-Level Google-Proof Q&A), MMLU (Massive Multitask Language Understanding) and HLE (Humanity’s Last Exam), that give AI systems a score – a 100% score is taken to mean that the system has reached AGI.
Depending on whom you believe, we are now beginning to see scores of more than 90%. AI systems today are so good that it is becoming very difficult for individual experts to set them a challenging exam.
AI and Math
Organizations such as Epoch AI are devising math problem sets – ones that human experts find challenging – to test the mathematical abilities of AI. First the systems were tested on SAT exam questions; they easily solved those. Then they were challenged with questions from the International Mathematical Olympiad (IMO). These questions are solvable by only a few experts in mathematics, and even the gold medalists score at a sub-50% level. The AI systems now greatly exceed these levels.
My friend Ravi Vakil, a mathematics professor at Stanford, is currently President of the American Mathematical Society (AMS). Under his leadership, a subgroup at the AMS, the Advisory Group on AI in Mathematics, is being reinvigorated.
He and a select group of mathematicians, who recently met in Berkeley, are now busy devising harder and harder exams for AI. The pace at which AI is becoming able to solve these problems is staggering. An optimist at heart, I made a wager with Ravi that AI will be solving problems like the Riemann Hypothesis within the next five years. He doesn’t believe that will happen. If it does, he will owe me lunch!
ASI – Artificial Super Intelligence
Much of the AI community now believes that the improvement in AI is not going to stop at AGI. The next step is superhuman intelligence, or ASI: systems so vastly superior to humans that no human can compete. It will be like a human grandmaster playing chess against a superhuman chess engine – the human will lose almost 100% of the time. At that point we will no longer be able to give AI meaningful exams. If you want to test the superintelligence of a chess-playing ASI system, the problems will have to be devised by an ASI itself. The same is becoming true for other areas of human learning and cognition.
Grok 4
This week xAI, a company launched by Elon Musk less than two years ago, released the latest version of its AI system, called Grok 4. The announcement of its release was accompanied by a demonstration of its prowess, and it was impressive indeed. Grok 4 is being tested on Humanity’s Last Exam (HLE), the best set of questions that humans can formulate to test an AI system. After that, presumably, no human – nor all humans collectively – will be able to devise imaginative enough questions to test AI systems, which will have reached ASI. Grok 4, it was claimed at the release, can now outperform the best Ph.D.s in every field! Just think about what that implies – through a collaborating team of domain-expert AI agents, it can do better than the best Ph.D. in physics, math, biology, architecture, psychology, humanism, ethics and so on. This could just be hyperbole accompanying the release, but the developers certainly seemed confident that it is inevitable, and coming soon. In my opinion it doesn’t matter whether an AI system beats a Ph.D. – even becoming a knowledgeable multi-domain expert is very impressive.
Please watch this video from Peter Diamandis, a well-known and respected futurist:
In the next year or so (again, this is a claim from the developers), Grok 4 is expected to go beyond simply reflecting known knowledge – it will advance to creating new science: new cosmological models, new insights into biology, genetics, disease, evolution, quantum computing, encryption, space exploration and the like. It will devise new strategies for advancing science, waging war, and advancing social, business and governmental roles for human peace and prosperity – that, at least, is the hope. ASI will hopefully remain a tool in human hands, and it is humankind’s dream that our collective good nature will prevail, especially in an era of the unimaginable prosperity made possible by ASI. We will train AI as we train our own next generation of kids: with values, virtue and collaborative goals for the betterment of humanity.
Grok 4’s release, and its claimed ability to solve complex problems at a high level of expertise, is indeed a watershed moment for AI. Tesla has already announced that it will integrate Grok 4 into its cars, as a voice assistant for intelligent mobility needs beyond Full Self-Driving and as an intelligent companion while “driving”.
I feel exhilarated, if a bit wary, to be a witness to this coming new Wave!

In essence, Modern Monetary Theory (MMT) says that government deficits are not necessarily bad. Since our government has the authority to “print” money, it cannot go bankrupt. In this way the government is not like a household, which must balance its income with its expenditures. We shouldn’t even label the gap between taxes and public spending a “deficit”. To the economy at large – American households, small businesses, NGOs, corporations and so on – the extra spending by government is a surplus. So the deficit is really a surplus when viewed from the point of view of the actual economy.
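The accounting identity behind this claim can be shown with a few lines of arithmetic. Here is a minimal sketch; the figures are invented purely for illustration:

```python
# Toy sectoral-balances arithmetic (all figures hypothetical, in billions).
# Whatever the government spends beyond what it taxes back is, by definition,
# income retained by the non-government sector (households, businesses, etc.).
government_spending = 5_000
taxes_collected = 4_200

government_deficit = government_spending - taxes_collected      # the "deficit"
non_government_surplus = government_spending - taxes_collected  # the same gap, seen from the other side

print(government_deficit, non_government_surplus)  # 800 800
```

The point of the sketch is simply that the two quantities are the same number: one sector’s deficit is, dollar for dollar, the other sector’s surplus.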

The second mathematical curiosity is the magic square. A magic square is an arrangement of the numbers from 1 to n² in an n×n matrix, with each number occurring exactly once, such that the sum of the entries of any row, any column, or either main diagonal is the same. In the figure to the right we have a 4×4 magic square, with the numbers from 1 to 16 arranged so that each row, column and diagonal adds up to 34.
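The defining property is easy to check by machine. Below is a minimal Python sketch that verifies a classic 4×4 magic square (Dürer’s arrangement is used here as a stand-in, since the figure itself is not reproduced in this text) against the magic constant n(n² + 1)/2 = 34:

```python
# Verify the magic-square property for a classic 4x4 example
# (Durer's square, a stand-in for the figure mentioned in the text).
square = [
    [16,  3,  2, 13],
    [ 5, 10, 11,  8],
    [ 9,  6,  7, 12],
    [ 4, 15, 14,  1],
]
n = len(square)
magic_constant = n * (n * n + 1) // 2  # 34 for n = 4

rows = [sum(row) for row in square]
cols = [sum(square[r][c] for r in range(n)) for c in range(n)]
diagonals = [
    sum(square[i][i] for i in range(n)),          # main diagonal
    sum(square[i][n - 1 - i] for i in range(n)),  # anti-diagonal
]

assert all(s == magic_constant for s in rows + cols + diagonals)
print(magic_constant)  # prints 34
```

The magic constant follows because all n² entries sum to n²(n² + 1)/2, and the n rows split that total equally.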