Large Language Models are Geographically Biased

February 5, 2024

Rohin Manvi, Samar Khanna, Marshall Burke, David Lobell, Stefano Ermon

Computer Science

Computation and Language

Artificial Intelligence

Computers and Society

Machine Learning

Large Language Models (LLMs) inherently carry the biases contained in their training corpora, which can lead to the perpetuation of societal harm. As the impact of these foundation models grows, understanding and evaluating their biases becomes crucial to achieving fairness and accuracy. We propose to study what LLMs know about the world we live in through the lens of geography. This approach is particularly powerful as there is ground truth for the numerous aspects of human life that are meaningfully projected onto geographic space such as culture, race, language, politics, and religion. We show various problematic geographic biases, which we define as systemic errors in geospatial predictions. Initially, we demonstrate that LLMs are capable of making accurate zero-shot geospatial predictions in the form of ratings that show strong monotonic correlation with ground truth (Spearman's $\rho$ of up to 0.89). We then show that LLMs exhibit common biases across a range of objective and subjective topics. In particular, LLMs are clearly biased against locations with lower socioeconomic conditions (e.g. most of Africa) on a variety of sensitive subjective topics such as attractiveness, morality, and intelligence (Spearman's $\rho$ of up to 0.70). Finally, we introduce a bias score to quantify this and find that there is significant variation in the magnitude of bias across existing LLMs.
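The abstract reports agreement between model ratings and ground truth as Spearman's $\rho$, which is the Pearson correlation computed on ranks. The sketch below is a minimal illustration of that statistic only, not the authors' evaluation code. The function names and the sample numbers are hypothetical, and the abstract's bias score is not defined here, so it is not reproduced.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: model ratings for five locations and a made-up
# ground-truth indicator for the same locations.
ratings = [2.0, 3.5, 5.0, 7.5, 9.0]
truth = [10, 40, 30, 80, 95]
rho = spearman_rho(ratings, truth)
```

Because only the ordering matters, $\rho$ measures how monotonic the relationship is rather than how linear. That suits ratings on an arbitrary scale.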

Geographic and Geopolitical Biases of Language Models

December 20, 2022

94% Match

Fahim Faisal, Antonios Anastasopoulos

Computation and Language

Pretrained language models (PLMs) often fail to fairly represent target users from certain world regions because of the under-representation of those regions in training datasets. With recent PLMs trained on enormous data sources, quantifying their potential biases is difficult, due to their black-box nature and the sheer scale of the data sources. In this work, we devise an approach to study the geographic bias (and knowledge) present in PLMs, proposing a Geographic-Represen...


Evaluation of Geographical Distortions in Language Models: A Crucial Step Towards Equitable Representations

April 26, 2024

93% Match

Rémy Decoupes, Roberto Interdonato, Mathieu Roche, ... , Sarah Valentin

Computation and Language

Language models now constitute essential tools for improving efficiency for many professional tasks such as writing, coding, or learning. For this reason, it is imperative to identify inherent biases. In the field of Natural Language Processing, five sources of bias are well-identified: data, annotation, representation, models, and research design. This study focuses on biases related to geographical knowledge. We explore the connection between geography and language models b...


GeoLLM: Extracting Geospatial Knowledge from Large Language Models

October 10, 2023

92% Match

Rohin Manvi, Samar Khanna, Gengchen Mai, Marshall Burke, ... , Stefano Ermon

Computation and Language

Machine Learning

The application of machine learning (ML) in a range of geospatial tasks is increasingly common but often relies on globally available covariates such as satellite imagery that can either be expensive or lack predictive power. Here we explore the question of whether the vast amounts of knowledge found in Internet language corpora, now compressed within large language models (LLMs), can be leveraged for geospatial prediction tasks. We first demonstrate that LLMs embed remarkabl...


On the Scaling Laws of Geographical Representation in Language Models

February 29, 2024

92% Match

Nathan Godey, Éric de la Clergerie, Benoît Sagot

Computation and Language

Artificial Intelligence

Language models have long been shown to embed geographical information in their hidden representations. This line of work has recently been revisited by extending this result to Large Language Models (LLMs). In this paper, we propose to fill the gap between well-established and recent literature by observing how geographical knowledge evolves when scaling language models. We show that geographical knowledge is observable even for tiny models, and that it scales consistently a...


Are Large Language Models Geospatially Knowledgeable?

October 9, 2023

92% Match

Prabin Bhandari, Antonios Anastasopoulos, Dieter Pfoser

Computation and Language

Despite the impressive performance of Large Language Models (LLM) for various natural language processing tasks, little is known about their comprehension of geographic data and related ability to facilitate informed geospatial decision-making. This paper investigates the extent of geospatial knowledge, awareness, and reasoning abilities encoded within such pretrained LLMs. With a focus on autoregressive language models, we devise experimental approaches related to (i) probin...


Distortions in Judged Spatial Relations in Large Language Models: The Dawn of Natural Language Geographic Data?

January 8, 2024

91% Match

Nir Fulman, Abdulkadir Memduhoğlu, Alexander Zipf

Computation and Language

We present a benchmark for assessing the capability of Large Language Models (LLMs) to discern intercardinal directions between geographic locations and apply it to three prominent LLMs: GPT-3.5, GPT-4, and Llama-2. This benchmark specifically evaluates whether LLMs exhibit a hierarchical spatial bias similar to humans, where judgments about individual locations' spatial relationships are influenced by the perceived relationships of the larger groups that contain them. To inv...


HERB: Measuring Hierarchical Regional Bias in Pre-trained Language Models

November 5, 2022

91% Match

Yizhi Li, Ge Zhang, Bohao Yang, Chenghua Lin, Shi Wang, ... , Jie Fu

Computation and Language

Fairness has become a trending topic in natural language processing (NLP), which addresses biases targeting certain social groups such as genders and religions. However, regional bias in language models (LMs), a long-standing global discrimination problem, still remains unexplored. This paper bridges the gap by analysing the regional bias learned by the pre-trained language models that are broadly used in NLP tasks. In addition to verifying the existence of regional bias in L...


A Comprehensive Survey of Bias in LLMs: Current Landscape and Future Directions

September 24, 2024

91% Match

Rajesh Ranjan, Shailja Gupta, Surya Narayan Singh

Computation and Language

Artificial Intelligence

Computers and Society

Human-Computer Interaction

Large Language Models (LLMs) have revolutionized various applications in natural language processing (NLP) by providing unprecedented text generation, translation, and comprehension capabilities. However, their widespread deployment has brought to light significant concerns regarding biases embedded within these models. This paper presents a comprehensive survey of biases in LLMs, aiming to provide an extensive review of the types, sources, impacts, and mitigation strategies r...


This Land is {Your, My} Land: Evaluating Geopolitical Biases in Language Models

May 24, 2023

91% Match

Bryan Li, Chris Callison-Burch

Computation and Language

Do the Spratly Islands belong to China, the Philippines, or Vietnam? A pretrained large language model (LLM) may answer differently if asked in the languages of each claimant country: Chinese, Tagalog, or Vietnamese. This contrasts with a multilingual human, who would likely answer consistently. In this work, we show that LLMs recall geopolitical knowledge inconsistently across languages -- a phenomenon we term geopolitical bias. As a targeted case study, we consider territor...


Large Language Models are Biased Because They Are Large Language Models

June 19, 2024

90% Match

Philip Resnik

Computation and Language

Artificial Intelligence

This paper's primary goal is to provoke thoughtful discussion about the relationship between bias and fundamental properties of large language models. We do this by seeking to convince the reader that harmful biases are an inevitable consequence arising from the design of any large language model as LLMs are currently formulated. To the extent that this is true, it suggests that the problem of harmful bias cannot be properly addressed without a serious reconsideration of AI d...
