Has AI learned to accurately determine the location in a photo?

27.08.2025 10 minutes Author: Lady Liberty

Modern AI language models are increasingly being used for tasks that a few years ago seemed like a purely human prerogative. One such area is image geolocation. Research shows that ChatGPT and other LLMs are gradually approaching the level of tools like Google Lens, and in some cases even surpassing them in terms of accuracy in determining the location of the shooting. This opens up new opportunities for investigative journalists, OSINT analysts, and anyone who works with the verification of photo and video materials.

How accurate have modern models become?

An ambiguous city street, a freshly mowed field, and a parked armored car were among the photo examples we chose to test geolocation-based large language models (LLMs) from OpenAI, Google, Anthropic, Mistral, and xAI.

To assess how LLMs from OpenAI, Google, Anthropic, Mistral, and xAI compare today, we ran 500 geolocation tests, with 20 models analyzing the same set of 25 images each.

We selected 25 of our own travel photos of varying difficulty for geolocation, none of which have been previously published online.

Our analysis included older and “deep-dive” versions of the models to track how their geolocation capabilities have evolved over time. We also included Google Lens to compare whether LLM offers a real improvement over traditional reverse image search. While reverse image search tools work differently than LLM, they remain one of the most effective ways to narrow down an image’s location when you’re starting from scratch.

Test

We used 25 of our own travel photos to test a range of natural landscapes, both rural and urban, with and without landmarks such as buildings, mountains, signs or roads. These images were sourced from every continent, including Antarctica.

The vast majority of these have not been reproduced here, as we intend to continue to use them to evaluate new models as they are released. Publishing them here would compromise the integrity of future tests.

Each LLM was given a photo that had not been published online and did not contain metadata. All models were then asked the same question: “Where was this photo taken?” next to the image. If the LLM asked for more information, the answer was the same: “No supporting information. Use only this photo.”

We tested the following models:

This was not an exhaustive review of all available models, partly due to the speed at which new models and versions are currently being released. For example, we did not evaluate DeepSeek, as it currently only extracts text from images. Note that in ChatGPT, regardless of the model you choose, the “deep search” feature currently runs on the o4-mini version.

Gemini models were released in “preview” and “experimental” formats, as well as in legacy versions such as “03-25” and “05-06”. To make comparisons easier, we have grouped these variants by their respective base models, such as “Gemini 2.5 Pro”.

We also compared each test to the top 10 results of Google Lens’ “visual comparison” feature to assess the difficulty of the tests and the usefulness of linear law methods (LLM) in solving them.

We rated all responses on a scale of 0 to 10, where 10 means an accurate and specific identification, such as an area, trail, or landmark, and 0 means no attempt to determine the location at all.

And the Winner is…

ChatGPT outperformed Google Lens. In our ChatGPT tests, the o3, o4-mini, and o4-mini-high were the only models that outperformed Google Lens in determining the correct location, although not by a large margin. All other models were less effective when it came to geolocating our test photos.

We evaluated 20 models based on 25 photos, giving each one a score from 0 (red) to 10 (dark green) for the accuracy of the image geolocation.
The test “snowy highway” depicted a road near Takayama, Japan.

Gemini 2.5 Pro’s answer was unhelpful. It mentioned Japan, as well as Europe, North and South America, and Asia. The answer read:

“Without clear, recognizable landmarks, distinctive signs in plain language, or unique architectural styles, it is very difficult to pinpoint the exact country or specific location.”

Instead, o3 identified both the architectural style and the signs, replying:

“Best guess: Snow-capped mountainous area of ​​central Honshu, Japan – somewhere in the Nagano/Toyama area. (Japanese-style houses, kanji on the billboard, and typical expressway barriers give this away.)”

Field on the Swiss Plateau

This photo was taken near Zurich. It didn’t show any easily recognizable features, except for the mountains in the distance. A reverse image search using Google Lens didn’t immediately lead to Zurich. Without any context, manually determining where this photo was taken could take some time. So how’s LLM doing?

The test “field hills” depicted a view of a field near Zurich

Gemini 2.5 Pro said that the photo depicted landscapes common in many parts of the world, and that it was impossible to narrow down its search without additional context.

ChatGPT, on the other hand, passed this test with flying colors. o4-mini identified the “Jura foothills in northern Switzerland,” while o4-mini-high placed the scene “between Zurich and the Jura Mountains.”

These responses were in stark contrast to Grok Deep Research’s responses, which, despite the mountains visible, confidently stated that the photo was taken in the Netherlands. This conclusion appeared to be based on the Dutch account name used, “Foeke Postma,” with the model assuming that the photo must have been taken there, calling it a “reasonable and well-supported conclusion.”

A downtown Singapore alley full of visual clues

This photo of a narrow alley on Circular Road in Singapore drew a wide range of responses from law professors and Google Lens, with ratings ranging from 3 (neighboring country) to 10 (correct location).

Dark alley test, photo of an alley in Singapore

The test provided a good example of how LLMs can outperform Google Lens by focusing on small details in a photo to determine the exact location. Those who answered correctly referred to the inscription on the mailbox on the left in the foreground, which indicated the exact address.

While Google Lens returned results from all over Singapore and Malaysia, part of the ChatGPT o4-mini’s answer read: “It looks like a classic Singaporean arcade – in fact, if you look at the mailboxes on the left, all you can see is the label ‘[correct address]’.”

Some other models spotted the mailbox but were unable to read the address visible in the image, mistakenly assuming it pointed to other locations. Gemini 2.5 Flash responded: “The design of the mailboxes on the left, especially the letter ‘G’ for Geylang, clearly points to Singapore.” Another Gemini model, the 2.5 Pro, spotted a mailbox but instead focused on what it interpreted as Thai writing on a storefront, confidently replying, “The visual evidence strongly suggests that the photo was taken in an alley in Thailand, likely Bangkok.”

Costa Rican Coast

One of the most challenging geolocation tests we gave the models was a photo taken from Playa Longosta on Costa Rica’s Pacific coast near Tamarindo.

The test “beach forest” was shown in Playa Longosta, Costa Rica.

Gemini and Claude performed the worst on this task, with most models either giving up or giving incorrect answers. Claude 3.7 Sonnet correctly identified Costa Rica, but hedged other locations such as Southeast Asia. Grok was the only model to correctly guess the exact location, while several ChatGPT models (Deep Research, o3 and o4-minis) guessed within 160 km of the beach.

Armored vehicle on the streets of Beirut

This photo was taken on the streets of Beirut and contains several details useful for geolocation, including the emblem on the side of the armored personnel carrier and a partially visible Lebanese flag in the background.

The tests of the “street soldier” depicted an armored personnel carrier on the streets of Beirut

Surprisingly, most models struggled with this test: the Claude 4 Opus, advertised as a “powerful, large model for heavy-duty tasks,” was thought to be “somewhere in Europe” due to its “European-style street furniture and building design,” while Gemini and Grok were only able to narrow the location down to Lebanon. Half of the ChatGPT models answered Beirut. Only two models, both ChatGPT, referenced the flag.

So have LLMs finally mastered geolocation?

A Master of Laws (LLM) can certainly help researchers uncover details that Google Lens or they themselves might miss.

One of the obvious advantages of LLMs is their ability to search in multiple languages. They also seem to be good at using small clues like vegetation, architectural styles, or signage. In one test, a photo of a man in a life jacket against a mountain range was correctly located because the model identified part of the company name on his jacket and linked it to the nearest boat tour operator.

For tourist areas and scenic landscapes, Google Lens still outperformed most models. When Google Lens was shown a photo of Lake Schluchsee in the Black Forest, Germany, it came up with that as the top result, while ChatGPT was the only LLM that correctly identified the name of the lake. Instead, in urban settings, LLMs were successful in cross-referencing fine details, while Google Lens tended to fixate on larger, similar structures, such as buildings or Ferris wheels, which are found in many other places.

A heat map showing how each model performed across all 25 tests

Advanced Reasoning Modes

One would assume that enabling the “deep search” or “extended thinking” features would result in higher scores. However, on average, Claude and ChatGPT performed worse. Only one Grok model, DeeperSearch, and one Gemini model, Gemini Deep Research, showed improvement. For example, ChatGPT Deep Research was shown a photo of a coastline and took almost 13 minutes to produce an answer that was about 50 km north of the correct location. Meanwhile, o4-mini-high responded in just 39 seconds and gave an answer 15 km closer.

Overall, Gemini was more cautious than ChatGPT, but Claude was the most cautious of all. Claude’s “extended thinking” mode made Sonnet even more conservative than the standard version. In some cases, the regular model risked making assumptions, albeit limited to probabilistic terms, while with “expanded thinking” enabled for the same test, it either refused to make assumptions or offered only vague answers at the region level.

LLMs continue to hallucinate

All models gave completely wrong answers at some point. ChatGPT tended to be more confident than Gemini, often resulting in better answers, but also in more hallucinations.

The risk of hallucinations increased when the landscape was temporary or changed over time. For example, in one test, a photo of a beach showed a large hotel and a temporary Ferris wheel (installed in 2024 and dismantled in the winter). Many models consistently pointed to another, more frequently photographed beach with a similar attraction, despite the obvious differences.

Final tips

Your account and query history can skew your results. In one case, when analyzing a photo taken at Coral Pink Sand Dunes State Park in Utah, ChatGPT o4-mini referenced previous conversations with the account owner: “The user mentioned Durango and Colorado earlier, so I suspect he may have posted a photo from a previous trip.”

Similarly, Grok appears to have used the user’s Twitter profile and past tweets, even without explicit cues to do so.

Video comprehension also remains limited. Most forensics professionals cannot search or view video content, which cuts off a rich source of location data. They also have trouble with coordinates, often returning inaccurate or simply incorrect answers.

After all, LLMs are not magic bullets. They still cause hallucinations, and when a photo lacks detail, its geolocation will still be difficult to determine. However, unlike our controlled tests, real-world investigations typically require additional context. While Google Lens only accepts keywords, LLMs can be provided with much richer information, making them more adaptive.

There is no doubt that given the pace of their development, LLMs will continue to play an increasingly important role in open source research. And as new models emerge, we will continue to test them.

Information was taken from open sources Bellingcat

Subscribe
Notify of
0 Коментарі
Oldest
Newest Most Voted
Found an error?
If you find an error, take a screenshot and send it to the bot.