AI expert asks Grok 3, other models to draw pelican riding bicycle. See results

AI expert Andrej Karpathy, one of the founding members of OpenAI along with Elon Musk, conducted tests on the latter’s newly launched Grok 3. Sharing a detailed review of the results, Karpathy noted that the new model looks “quite encouraging indeed”.

Andrej Karpathy conducted various tests on Grok 3, the new AI model launched by Elon Musk’s xAI. (karpathy.ai)

Here is a list of the tests Karpathy conducted.

Pelican on a bicycle

Karpathy asked Grok to generate a scalable vector graphic (SVG) showing a pelican riding a bicycle. SVG is a web-friendly file format that uses mathematical formulas to store images.
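
To give a sense of what such a file looks like, here is a minimal sketch in Python that writes a tiny SVG by hand. The shapes, coordinates and the function name `simple_bicycle_svg` are illustrative assumptions; this is not Karpathy's prompt or any model's output.

```python
import xml.etree.ElementTree as ET


def simple_bicycle_svg() -> str:
    """Return a tiny SVG document: two wheels joined by a single frame line."""
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 100">',
        # Each shape is described by numbers (centre, radius, end points),
        # not by pixels, which is why the image scales cleanly.
        '<circle cx="50" cy="70" r="25" fill="none" stroke="black"/>',
        '<circle cx="150" cy="70" r="25" fill="none" stroke="black"/>',
        '<line x1="50" y1="70" x2="150" y2="70" stroke="black"/>',
        '</svg>',
    ]
    return "\n".join(parts)


svg_text = simple_bicycle_svg()
# SVG is XML, so a standard XML parser accepts it only if it is well formed.
root = ET.fromstring(svg_text)
```

Every coordinate has to be chosen as text, without seeing the picture, which is the difficulty the test is meant to expose.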

He marked Grok 3 as a “fail” on this test and said the AI model’s results show that “pelicans are quite good but still a bit broken”. Karpathy said Claude’s results in the test are the best, but he suspects that is because Claude likely specifically targeted SVG capability during training.

Results of the 'Draw an SVG of a pelican riding a bicycle' from various AI models. (X/@karpathy)

Sharing why the test is important, Karpathy said it stresses the LLMs’ ability to lay out many elements on a 2D grid, which is very difficult because LLMs can’t see like people do. “So it’s arranging things in the dark, in text,” he said.

Sense of humour

He concluded that Grok 3’s sense of humour has not improved over its predecessor Grok 2. “This is a common LLM issue with humour capability and general mode collapse. Famously, for example, 90% of 1,008 outputs asking ChatGPT for a joke were repetitions of the same 25 jokes,” Karpathy noted.

“Even when prompted in more detail away from simple pun territory (for example: give me a standup), I’m not sure that it is state of the art humor. Example generated joke: ‘Why did the chicken join a band? Because it had the drumsticks and wanted to be a cluck-star!’ In quick testing, thinking did not help, possibly it made it a bit worse,” he said.

Ethics

Karpathy said Grok 3 seems to be “a bit too overly sensitive to ‘complex ethical issues’”. Sharing an example, he said, “Generated a one-page essay basically refusing to answer whether it might be ethically justifiable to misgender someone if it meant saving one million people from dying.”

Random ‘gotcha’ moments

He said that Musk’s new model knows there are three ‘r’ in ‘strawberry’ but told him that there are only three ‘l’ in ‘lollapalooza’ (the word has four). However, he noted that turning on the ‘Thinking’ mode fixes this.
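
For reference, both counts can be checked with ordinary string operations. The snippet below is a small sketch; the helper name `count_letter` is made up for illustration.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


r_in_strawberry = count_letter("strawberry", "r")
l_in_lollapalooza = count_letter("lollapalooza", "l")
print(r_in_strawberry, l_in_lollapalooza)
```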

He also noted that the model answered that 9.11 is greater than 9.9, an issue common with other LLMs too. This issue was also solved in the ‘Thinking’ mode.
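
The comparison itself is easy to verify. The sketch below also shows one plausible source of the confusion, offered here as an assumption and not as something Karpathy claimed: read as version numbers, where each dot-separated part is a whole number, 9.11 does come after 9.9.

```python
from decimal import Decimal

# As decimal numbers: 9.11 is smaller than 9.90.
as_numbers = Decimal("9.11") > Decimal("9.9")


def version_parts(text: str) -> tuple[int, ...]:
    """Split '9.11' into (9, 11), treating each part as a whole number."""
    return tuple(int(part) for part in text.split("."))


# As version-style labels: (9, 11) sorts after (9, 9).
as_versions = version_parts("9.11") > version_parts("9.9")

print(as_numbers, as_versions)
```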

Other tests done on Grok 3

According to Karpathy, Grok 3 was unable to solve his ‘emoji mystery’ question, where he gave a smiling face with an attached message hidden inside Unicode variation selectors.
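
As a rough illustration of how text can ride along invisibly behind an emoji, the sketch below maps each byte of a message to one of Unicode's 256 variation selectors (U+FE00 to U+FE0F and U+E0100 to U+E01EF). The byte-to-selector mapping and the function names are assumptions made for this example; the article does not say which exact scheme Karpathy used.

```python
from typing import Optional


def byte_to_selector(b: int) -> str:
    """Map a byte (0-255) to one of the 256 variation selector code points."""
    if b < 16:
        return chr(0xFE00 + b)          # VS1..VS16
    return chr(0xE0100 + (b - 16))      # VS17..VS256


def selector_to_byte(ch: str) -> Optional[int]:
    """Inverse mapping; returns None for ordinary characters."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None


def hide(base: str, message: str) -> str:
    """Append the message, one selector per UTF-8 byte, after the base text."""
    return base + "".join(byte_to_selector(b) for b in message.encode("utf-8"))


def reveal(text: str) -> str:
    """Collect every variation selector in the text and decode the bytes."""
    found = (selector_to_byte(ch) for ch in text)
    return bytes(b for b in found if b is not None).decode("utf-8")


hidden = hide("😀", "hello")
print(len(hidden), reveal(hidden))
```

Most renderers draw the result as a lone smiling face, so a model has to notice the extra code points before it can even begin decoding them.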

Grok 3, like OpenAI’s o1 pro, was unable to generate three “tricky” tic tac toe boards. Karpathy said Grok 3 generated “nonsense boards/texts” in response to the question but was able to solve a few tic tac toe boards he gave it.
