Presentation Schedule



Revisiting Psycholinguistic Norms: Comparing Human and GPT-/DeepSeek-Derived Ratings on Concreteness, Imageability, Familiarity, Valence, and Arousal of 25,000+ Two-Character Chinese Words (94040)

Session Information:

Friday, 11 July 2025 15:45
Session: ECE Poster Session
Room: SOAS, Brunei Suite (Ground Floor)
Presentation Type: Poster Presentation

All presentation times are UTC + 1 (Europe/London)

In typical psycholinguistic norming studies, participants rate individual words on lexical variables (e.g., concreteness, valence). These ratings allow researchers to select stimuli that control or manipulate lexical variables (Tse et al., 2021) and to examine how these variables influence performance in lexical processing tasks, such as word recognition (Tse & Yap, 2018). However, collecting human rating data is time-consuming and labor-intensive. Recently, Large Language Models (LLMs) (e.g., GPT-4o) have been employed to approximate human ratings using conversational probes (e.g., Martínez et al., 2025; Trott, 2024). Extending this approach, our study investigated the relationship between human ratings (Chan & Tse, 2024) and ratings derived from two LLMs (GPT-4o-Turbo, DeepSeek-R1-FW) for the concreteness, imageability, familiarity, valence, and arousal of more than 25,000 two-character Chinese words. Among GPT, DeepSeek, and human ratings, valence yielded the strongest intercorrelation (mean = .82), followed by concreteness (.69), arousal (.64), imageability (.63), and familiarity (.58). Across these five variables, GPT and DeepSeek ratings correlated similarly with human ratings (both mean = .65), which was lower than the correlation between GPT and DeepSeek ratings themselves (.71). We further examined how these ratings predict lexical decision and naming performance (Tse et al., 2017, 2023), while controlling for orthographic, phonological, and semantic factors (Tse et al., 2023). Results indicate that although LLM-derived valence and familiarity ratings aligned with human ratings in predicting lexical decision and naming performance, the predictions diverged for concreteness, imageability, and arousal. These findings suggest caution when replacing human ratings with LLM-derived values in norming lexical variables.

Authors:
Xi Cheng, The Chinese University of Hong Kong, Hong Kong
Xi Huang, The Chinese University of Hong Kong, Hong Kong
Yuen-Lai Chan, Lingnan University, Hong Kong
Chi-Shing Tse, The Chinese University of Hong Kong, Hong Kong


About the Presenter(s)
Professor Chi-Shing Tse is a University Professor/Principal Lecturer at The Chinese University of Hong Kong in Hong Kong

Connect on LinkedIn
https://www.linkedin.com/in/chi-shing-tse-44b7451b6/

See this presentation on the full schedule: Friday Schedule




Posted by James Alexander Gordon

Last updated: 2023-02-23 23:45:00