As a junior researcher, I often find myself under great pressure: accelerating my pace, chasing short-term results, and staying busy with quick outputs 😵💫, while forgetting (or failing) to leave time to think about which research questions are truly essential. Working without thinking leads to confusion ("劳"而不思则罔). To pull myself out of this exhausting, life-on-the-run state, I have decided to write an end-of-year reflection every year, summarizing my research progress and lessons from the past twelve months, so as to help myself learn and grow 😄.
For my first blog post, I’d like to look back on Value Compass, our project started in mid-2023 to align generative models, e.g., LLMs, with human values from an interdisciplinary perspective. The year 2023 was a turning point: the emergence of ChatGPT reshaped NLP, and many NLPers felt that “NLP no longer existed.” Personally, I had gotten used to such direction shifts before: from Chinese word segmentation with statistical ML methods as an undergraduate, to literary text generation with neural networks during my PhD, as part of one of the early groups studying generative models. After joining MS, when asked to connect NLG with RAI, we quickly shaped a research direction we named Ethical NLG and, in a time before the explosion of research papers, published seven papers within a year and a half, covering NLG toxicity, bias, diversity, misinformation, and more, and built a demo system called IncluWriter.
Fig.1: The Value Compass Project.
However, when shifting our work toward value alignment, a deeply interdisciplinary field and a core direction within the Societal AI area we proposed, I spent a lot of time thinking about what it really means and how to pursue it. My answer is simple💡: it already lies in the very idea of Societal AI.
While we officially define Societal AI as “ensuring that AI serves as a driver of progress while mitigating risks and unintended consequences,” I personally prefer to think of it as "AI in society". In the post-ChatGPT era, AI has shifted from task-specific to general-purpose, from application-centric to human-centric, and from in-domain effects to broad societal impact. This means AI is transforming our societies, economies, and daily lives with unprecedented depth, breadth, and speed. Looking ahead, we may enter a hybrid society where humans and AI coexist, cooperate, and co-evolve, together reshaping our norms, values, and culture.
Under this assumption, the question becomes: how can value alignment ensure that the emergence of future human–AI hybrid collectives maximizes, rather than undermines, the well-being of humanity as a whole? Starting from this fundamental requirement, we naturally arrive at three research questions, which I summarize as Measurement–Alignment–Prescribe (MAP).
Based on this MAP roadmap, we launched the Value Compass project and, together with colleagues, professors, interns, and students, have made several milestone advances over the past two and a half years.
Following the MAP picture, we have primarily pioneered three research directions: (i) dynamic evaluation of AI values, (ii) alignment algorithms grounded in social science theories, and (iii) studies of LLMs’ values.
We evaluate the value orientations of generative models and examine how their internal values influence their behaviours. "Values" have always been a human-centric concept: back in 2023, beyond value questionnaires designed for humans, there was little prior work to draw on. From the perspective of the science of evaluation, we built a series of frameworks and methods for evaluating LLM values from scratch.
Rather than simply administering psychology questionnaires to LLMs, which raises problems such as validity and data contamination, we went beyond constructing static benchmarks and developed the DeNEVIL algorithm, bringing the dynamic evaluation paradigm to LLM value assessment for the first time: it automatically generates tailored test questions to adaptively probe LLMs' value boundaries. Integrating this approach with well-established psychometric frameworks, e.g., Item Response Theory (IRT), we then proposed [GETA](https://arxiv.org/pdf/2406.14230?), a self-evolving evaluation scheme in which question difficulty automatically increases as LLMs improve.
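The IRT idea behind such adaptive, self-evolving testing can be sketched in a few lines. Below is a generic two-parameter logistic (2PL) illustration with toy difficulties and a simple gradient-based ability estimate; it is only a conceptual sketch, not GETA's actual implementation:

```python
import math

def p_aligned(theta, difficulty, discrimination=1.0):
    """2PL IRT: probability that a model with value 'conformity'
    (ability) theta responds in the aligned way to an item of the
    given difficulty."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

def estimate_theta(responses, lr=0.1, steps=200):
    """Maximum-likelihood ability estimate via gradient ascent.
    `responses` is a list of (difficulty, outcome) pairs, outcome in {0, 1}."""
    theta = 0.0
    for _ in range(steps):
        grad = sum(y - p_aligned(theta, b) for b, y in responses)
        theta += lr * grad
    return theta

def next_item(item_difficulties, theta):
    """Adaptive selection: under 2PL with equal discrimination, the most
    informative next probe has difficulty closest to the current theta."""
    return min(item_difficulties, key=lambda b: abs(b - theta))

# Toy example: three probe outcomes (difficulty, aligned?) for one LLM.
responses = [(-1.0, 1), (0.0, 1), (1.0, 0)]
theta = estimate_theta(responses)                       # ability around 0.8
probe = next_item([-2.0, -1.0, 0.0, 1.0, 2.0], theta)  # harder probe chosen
```

As the model's estimated ability grows, `next_item` keeps selecting harder probes, which is the core of a self-evolving, adaptive evaluation loop.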

Fig 2: GETA, A self-evolving value evaluation framework grounded in Psychometrics.
At the same time, given the rapid iteration of models and the fact that many countries are now developing their own LLMs, we further proposed the AdAEM framework to capture the cultural and temporal differences in the values expressed by different LLMs. Beyond test questions, we also built CLAVE, a robust and generalizable value evaluator that combines the adaptability of fine-tuned small LLMs with the robustness of proprietary large LLMs to identify the value tendencies expressed in arbitrary LLM responses.
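One way such a dual-model evaluator could be wired is as a two-stage pipeline: one stage distills value-relevant concepts from a response, and another maps those concepts to a value label. The sketch below is purely illustrative; the function names and the toy lexicon stages are hypothetical stand-ins, not CLAVE's actual design:

```python
from typing import Callable, List

def two_stage_judge(response: str,
                    extract_concepts: Callable[[str], List[str]],
                    classify_value: Callable[[List[str]], str]) -> str:
    """Generic two-stage value judging: stage one could be a large
    proprietary LLM prompt, stage two a small fine-tuned classifier.
    Both are injected as callables so the sketch stays model-agnostic."""
    concepts = extract_concepts(response)
    return classify_value(concepts)

# Hypothetical stub stages, for illustration only.
def toy_extract(response: str) -> List[str]:
    lexicon = {"share": "benevolence", "win": "achievement"}
    return [v for k, v in lexicon.items() if k in response.lower()]

def toy_classify(concepts: List[str]) -> str:
    return concepts[0] if concepts else "neutral"

label = two_stage_judge("I would share the reward with my team.",
                        toy_extract, toy_classify)  # -> "benevolence"
```

The appeal of this decomposition is that the expensive, robust model handles open-ended interpretation while the cheap, adaptable model handles the final decision, which mirrors the small-plus-large combination described above.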