Why GPT-5 performed worse than its competitors

27.08.2025 4 minutes Author: Lady Liberty

Large language models are now used for a wide variety of tasks, including geolocation. Testing shows that results vary significantly from one system to another. Some models respond quickly, others analyze details more accurately, and some combine both advantages.

GPT-5 failed to cope with geolocation tasks

We ran 500 geolocation tests, comparing LLMs from different companies against each other, as well as Google Lens, the main tool for finding the location of a photo.

At the time, ChatGPT o4-mini-high was the clear winner, while Google Lens outperformed most other models. Just two months later, after new versions of these AI tools were released, we ran the test again, this time including Google’s “AI Mode,” GPT-5, GPT-5 Thinking, and Grok 4.

The initial test used 25 photos, ranging from cities to remote rural areas. The images included landscapes both with and without recognizable features such as roads, signs, mountains, or architecture, and were collected from all continents.

Five test photos were excluded from the updated test because they had appeared in a previous article, compromising the integrity of the results.

Responses from all 24 models were scored on a scale of 0 to 10, where 10 meant a precise and specific identification (such as a neighborhood, trail, or landmark) and 0 meant no attempt at location at all.
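For readers who want to reproduce this kind of comparison, the scoring described above could be tallied roughly as follows. This is only a minimal sketch: the function name rank_models, the model labels, and the numbers are placeholders invented for illustration, not the scores from the test, and averaging per model is an assumption about how the results might be compared.

```python
from statistics import mean


def rank_models(scores):
    """Return (model, mean score) pairs sorted from best to worst.

    `scores` maps a model name to its list of per-photo scores,
    each on the 0-10 scale described in the article.
    """
    for model, values in scores.items():
        if not values:
            raise ValueError(f"no scores recorded for {model}")
        if any(not 0 <= value <= 10 for value in values):
            raise ValueError(f"score outside the 0-10 scale for {model}")

    averages = {model: mean(values) for model, values in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Placeholder values only; these are NOT results from the test.
example_scores = {
    "model_a": [10, 7, 0, 4],
    "model_b": [8, 8, 6, 2],
}

ranking = rank_models(example_scores)
for model, average in ranking:
    print(f"{model}: {average:.2f}")
```

A mean hides how widely a model's answers swing between photos, so a fuller comparison would also look at the individual scores.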

Google’s AI Mode proved to be the most powerful geolocation tool overall.

Grok 4 produced both better and worse responses than Grok 3, but on average had slightly higher scores. However, it was still less accurate than older versions of Gemini and GPT.

GPT-5, even in Thinking and Pro modes, was a significant step down from the capabilities demonstrated by o4-mini-high. In one example, a city street with skyscrapers in the background, o4-mini-high correctly identified the street, while GPT-5 in Thinking mode named the wrong country.

Despite providing faster responses, GPT-5 seemed to sacrifice accuracy. Other users also reported a surprising number of errors and a general sense of frustration with the new model.

GPT-5 and its “Thinking” mode were tested via a Plus subscription, which costs about the same as access to o4-mini-high did before its end of support. The five most complex test images were also processed via GPT-5 Pro. But even Pro, with its premium price of 200 euros per month, was unable to geolocate the photos more accurately than o4-mini-high.

Beach, hotel and Ferris wheel

The discrepancy between Google and GPT models became even more apparent in Test 25 – a photo of a seaside hotel in Noordwijk, Netherlands, with a Ferris wheel towering just beyond the dunes.

Test 25: Photo of Noordwijk beach in the Netherlands.

In the previous test, most of the older models, including versions of GPT, Claude, Gemini, and Grok, correctly identified the country as the Netherlands but were unable to find the town. Many picked up on the Ferris wheel but pointed instead to the seaside town of Scheveningen, which also has a Ferris wheel, albeit on a pier rather than among sand dunes.

However, the newest models, GPT-5 Pro and GPT-5 Thinking, were even less accurate, identifying a beach in France, a completely different country.

Unfortunately for open-source researchers, after the release of GPT-5, OpenAI removed the option to choose older models, such as o4-mini-high. After a wave of negative feedback, OpenAI reinstated GPT-4o as the default model for paid subscribers. However, the most powerful geolocation models discovered during testing remain unavailable.

On the other hand, Google’s AI Mode was the first and so far only model to correctly identify Noordwijk as the location in Test 25.

Although AI Mode is based on Gemini 2.5, it outperformed Gemini 2.5 Pro Deep Research in these tests. Described by Google as “the most powerful AI-powered search with more advanced thinking and multimodality,” AI Mode geolocated test images with greater accuracy than any GPT model, including our previous winner, o4-mini-high.

AI Mode is currently only available in India, the United Kingdom, and the United States.

Most models have been known to hallucinate at some point. Users should not rely solely on the answers provided by LLMs. Even the best options, including Google’s AI Mode, can sometimes give inaccurate location predictions.

The difference in the models’ capabilities compared to just two months ago shows how quickly the industry is evolving. However, OpenAI’s recent changes also suggest that progress is not guaranteed, and that AI’s ability to geolocate may stall or even deteriorate over time.

Information taken from open sources: Bellingcat
