Liheng Chen
I am an undergraduate student researcher at the University of Hong Kong (HKU). My research interests include Parameter-Efficient Fine-Tuning (PEFT) and conditional non-autoregressive text generation. I am also interested in 📷photography and ⛰️hiking.
Love is the one thing we're capable of perceiving that transcends dimensions of time and space
",
which does not match the baseurl
("
") configured in _config.yml
.
baseurl
in _config.yml
to "
".
Zhiyong Wu*, Zhenyu Wu*, Fangzhi Xu*, Yian Wang*, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, Yu Qiao (* equal contribution)
Preprint 2024 arXiv
Existing efforts in building GUI agents heavily rely on the availability of robust commercial Vision-Language Models (VLMs) such as GPT-4o and GeminiProVision. Practitioners are often reluctant to use open-source VLMs due to their significant performance lag compared to their closed-source counterparts, particularly in GUI grounding and Out-Of-Distribution (OOD) scenarios. To facilitate future research in this area, we developed OS-Atlas, a foundational GUI action model that excels at GUI grounding and OOD agentic tasks through innovations in both data and modeling. We have invested significant engineering effort in developing an open-source toolkit for synthesizing GUI grounding data across multiple platforms, including Windows, Linux, macOS, Android, and the web. Leveraging this toolkit, we are releasing the largest open-source cross-platform GUI grounding corpus to date, which contains over 13 million GUI elements. This dataset, combined with innovations in model training, provides a solid foundation for OS-Atlas to understand GUI screenshots and generalize to unseen interfaces. Through extensive evaluation across six benchmarks spanning three different platforms (mobile, desktop, and web), OS-Atlas demonstrates significant performance improvements over previous state-of-the-art models. Our evaluation also uncovers valuable insights into continuously improving and scaling the agentic capabilities of open-source VLMs.
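For readers unfamiliar with GUI grounding data, here is a minimal sketch of what one cross-platform grounding record could look like; the field names and format are illustrative assumptions, not the released OS-Atlas schema.

```python
# Hypothetical illustration of a GUI grounding training record: an instruction
# referring to an on-screen element, paired with the element's bounding box on
# the screenshot. Field names are assumptions, not the released OS-Atlas schema.
from dataclasses import dataclass

@dataclass
class GroundingExample:
    screenshot_path: str          # rendered screen image
    platform: str                 # e.g. "windows", "android", "web"
    instruction: str              # referring expression for the target element
    bbox: tuple[float, float, float, float]  # normalized (x1, y1, x2, y2)

example = GroundingExample(
    screenshot_path="screens/settings.png",
    platform="android",
    instruction="Tap the Wi-Fi toggle",
    bbox=(0.82, 0.14, 0.95, 0.18),
)
print(example)
```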
Sheng Wang*, Liheng Chen*, Pengan Chen, Jingwei Dong, Boyang Xue, Jiyue Jiang, Lingpeng Kong, Chuan Wu (* equal contribution)
Preprint 2024 arXiv
The rapid scaling of large language models necessitates more lightweight finetuning methods to reduce the explosive GPU memory overhead when numerous customized models are served simultaneously. Targeting more parameter-efficient low-rank adaptation (LoRA), parameter sharing presents a promising solution. Empirically, our research into high-level sharing principles highlights the indispensable role of differentiation in reversing the detrimental effects of pure sharing. Guided by this finding, we propose Mixture of Shards (MoS), incorporating both inter-layer and intra-layer sharing schemes, and integrating four nearly cost-free differentiation strategies, namely subset selection, pair dissociation, vector sharding, and shard privatization. Briefly, it selects a designated number of shards from global pools with a Mixture-of-Experts (MoE)-like routing mechanism before sequentially concatenating them into low-rank matrices. Hence, it retains all the advantages of LoRA while offering enhanced parameter efficiency, and effectively circumvents the drawbacks of peer parameter-sharing methods. Our empirical experiments demonstrate approximately 8x parameter savings in a standard LoRA setting. The ablation study confirms the significance of each component. Our insights into parameter sharing and the MoS method may illuminate the future development of more parameter-efficient finetuning methods.
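As a rough illustration of the shard-routing idea described above, the sketch below assembles a LoRA-style low-rank factor by picking top-k shards from a shared global pool; the shapes, pool size, and routing function are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of MoE-like routing over a global pool of shards, concatenated
# into a LoRA-style low-rank factor. Shapes, pool size, and the routing function
# are illustrative assumptions, not the MoS implementation.
import torch
import torch.nn as nn

class SharedShardPool(nn.Module):
    def __init__(self, num_shards: int, shard_dim: int, hidden_dim: int):
        super().__init__()
        # Global pool shared across layers; each shard is a slice of a low-rank factor.
        self.pool = nn.Parameter(torch.randn(num_shards, hidden_dim, shard_dim) * 0.02)

    def forward(self, router_logits: torch.Tensor, k: int) -> torch.Tensor:
        # Pick top-k shards per layer and concatenate them along the rank dimension.
        top = torch.topk(router_logits, k).indices
        return torch.cat([self.pool[i] for i in top], dim=-1)  # (hidden_dim, k * shard_dim)

pool = SharedShardPool(num_shards=32, shard_dim=2, hidden_dim=768)
router_logits = torch.randn(32)   # per-layer routing scores (learned in practice)
A = pool(router_logits, k=4)      # assembled low-rank factor, rank = 8
print(A.shape)                    # torch.Size([768, 8])
```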
Jiacheng Ye*, Shansan Gong*, Liheng Chen*, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong (* equal contribution)
Annual Conference on Neural Information Processing Systems 2024 NeurIPS 2024
Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models. In this work, we propose Diffusion-of-Thought (DoT), a novel approach that integrates diffusion models with Chain-of-Thought, a well-established technique for improving the reasoning ability of autoregressive language models. In contrast to autoregressive language models that make decisions in a left-to-right, token-by-token manner, DoT allows reasoning steps to diffuse over time through a diffusion language model and offers greater flexibility in trading off computation for reasoning performance. Our experimental results demonstrate the effectiveness of DoT in multi-digit multiplication, boolean logic, and grade school math problems, with a small diffusion model outperforming a much larger autoregressive model in both efficiency and accuracy. In addition, DoT showcases promising self-correction abilities and benefits from existing reasoning-enhancing techniques like self-consistency decoding. Our findings contribute to the understanding and development of reasoning with diffusion language models.
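To make the "reasoning steps diffuse over time" idea concrete, the toy sketch below runs a simplified reverse-diffusion loop over a sequence of thought embeddings, with the number of steps acting as the compute/accuracy knob; the denoiser is an untrained placeholder, and the schedule and shapes are assumptions, not the DoT training setup.

```python
# Toy sketch: the whole chain of thought is a sequence of embeddings refined from
# noise over T reverse-diffusion steps, so T trades compute for quality. The
# denoiser here is an untrained placeholder, not a trained diffusion language model.
import torch
import torch.nn as nn

seq_len, dim, T = 16, 64, 20                     # illustrative sizes
denoiser = nn.TransformerEncoder(                # placeholder x0-predictor
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True), num_layers=2
)
alphas = torch.linspace(0.99, 0.90, T)           # toy noise schedule
alpha_bars = torch.cumprod(alphas, dim=0)

x = torch.randn(1, seq_len, dim)                 # start the thoughts from pure noise
for t in reversed(range(T)):
    x0_hat = denoiser(x)                         # predict the clean thought sequence
    if t > 0:                                    # re-noise to the previous step
        noise = torch.randn_like(x)
        x = alpha_bars[t - 1].sqrt() * x0_hat + (1 - alpha_bars[t - 1]).sqrt() * noise
    else:
        x = x0_hat
# x would then be mapped back to reasoning tokens by the model's embedding head.
print(x.shape)
```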
Jiyue Jiang, Pengan Chen, Liheng Chen, Sheng Wang, Qinghang Bao, Lingpeng Kong, Yu Li, Chuan Wu
Preprint 2024 arXiv
The rapid evolution of large language models (LLMs) has transformed the competitive landscape in natural language processing (NLP), particularly for English and other data-rich languages. However, underrepresented languages like Cantonese, spoken by over 85 million people, face significant development gaps, which is particularly concerning given the economic significance of the Guangdong-Hong Kong-Macau Greater Bay Area and the substantial Cantonese-speaking populations in places like Singapore and North America. Despite its wide use, Cantonese has scant representation in NLP research, especially compared to other languages from similarly developed regions. To bridge these gaps, we outline current Cantonese NLP methods and introduce new benchmarks designed to evaluate LLM performance in factual generation, mathematical logic, complex reasoning, and general knowledge in Cantonese, which aim to advance open-source Cantonese LLM technology. We also propose future research directions and recommended models to enhance Cantonese LLM development.
Sheng Wang, Boyang Xue, Jiacheng Ye, Jiyue Jiang, Liheng Chen, Lingpeng Kong, Chuan Wu
Annual Meeting of the Association for Computational Linguistics 2024 ACL 2024
With the rapid scaling of large language models (LLMs), serving numerous LoRAs concurrently has become increasingly impractical, leading to unaffordable costs and necessitating more parameter-efficient finetuning methods. In this work, we introduce Partially Rotation enhanced Low-Rank Adaptation (PRoLoRA), an intra-layer sharing mechanism comprising four essential components: broadcast reduction, rotation enhancement, partially-sharing refinement, and a rectified initialization strategy. As a superset of LoRA, PRoLoRA retains its advantages and effectively circumvents the drawbacks of peer parameter-sharing methods with superior model capacity, practical feasibility, and broad applicability. Empirical experiments demonstrate the remarkably higher parameter efficiency of PRoLoRA in both specific parameter budget and performance target scenarios, and its scalability to larger LLMs. Notably, with only half as many trainable parameters, PRoLoRA still outperforms LoRA on multiple instruction tuning datasets. Subsequently, an ablation study is conducted to validate the necessity of individual components and highlight the superiority of PRoLoRA over three potential variants. We hope that its conspicuously higher parameter efficiency can establish PRoLoRA as a resource-friendly alternative to LoRA.
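A rough sketch of the broadcast-and-rotation intuition: a small shared chunk is tiled to fill most of a LoRA factor, each replica is differentiated by a circular shift, and a small slice stays unshared. The exact rotation scheme, shapes, and rectified initialization here are simplified assumptions, not the paper's implementation.

```python
# Minimal sketch of broadcast reduction plus rotation enhancement for one LoRA
# factor; the rotation (a circular shift) and all sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ProLoRAFactor(nn.Module):
    def __init__(self, in_dim=768, rank=8, shared_chunk=2, unshared=2):
        super().__init__()
        self.replicas = (rank - unshared) // shared_chunk
        self.chunk = nn.Parameter(torch.randn(in_dim, shared_chunk) * 0.02)   # shared
        self.unshared = nn.Parameter(torch.randn(in_dim, unshared) * 0.02)    # private

    def forward(self) -> torch.Tensor:
        # Broadcast the shared chunk and rotate each replica along the input dimension.
        copies = [torch.roll(self.chunk, shifts=i * 7, dims=0) for i in range(self.replicas)]
        return torch.cat(copies + [self.unshared], dim=-1)   # (in_dim, rank)

A = ProLoRAFactor()()
print(A.shape)   # torch.Size([768, 8]) built from only (2 + 2) * 768 trainable params
```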
Sheng Wang*, Liheng Chen*, Jiyue Jiang, Boyang Xue, Lingpeng Kong, Chuan Wu (* equal contribution)
Annual Meeting of the Association for Computational Linguistics 2024 ACL 2024
With their remarkable capabilities, large language models (LLMs) have emerged as essential elements in numerous NLP applications, while parameter-efficient finetuning, especially LoRA, has gained popularity as a lightweight approach for model customization. Meanwhile, various dropout methods, initially designed for full finetuning with all the parameters updated, alleviate overfitting associated with excessive parameter redundancy. Hence, a possible contradiction arises between the negligible trainable parameters of LoRA and the effectiveness of previous dropout methods, which has been largely overlooked. To fill this gap, we first confirm that parameter-efficient LoRA is also overfitting-prone. We then revisit transformer-specific dropout methods, and establish their equivalence and distinctions mathematically and empirically. Building upon this comparative analysis, we introduce a unified framework for a comprehensive investigation, which instantiates these methods based on dropping position, structural pattern, and compensation measure. Through this framework, we reveal their new preferences and comparative performance when only limited parameters are trainable. This framework also allows us to amalgamate the most favorable aspects into a novel dropout method named HiddenKey. Extensive experiments verify the remarkable superiority and sufficiency of HiddenKey across multiple models and tasks, which highlights it as the preferred approach for high-performance and parameter-efficient finetuning of LLMs.
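The three axes of the unified framework (dropping position, structural pattern, compensation measure) can be pictured with the small sketch below; the column-wise pattern and bidirectional-KL compensation shown here are illustrative instantiations of those axes, not necessarily the exact combination HiddenKey adopts.

```python
# Hedged sketch of two of the framework's axes: a structural dropout pattern
# applied to hidden states, and a compensation term that aligns two perturbed
# forward passes. Both choices are illustrative, not the paper's final recipe.
import torch
import torch.nn.functional as F

def column_dropout(hidden: torch.Tensor, p: float) -> torch.Tensor:
    # Structural pattern: drop entire feature columns of the hidden states.
    mask = (torch.rand(hidden.shape[-1], device=hidden.device) > p).float()
    return hidden * mask / (1.0 - p)

def kl_compensation(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    # Compensation measure: encourage two dropout-perturbed passes to agree.
    p, q = F.log_softmax(logits_a, -1), F.log_softmax(logits_b, -1)
    return 0.5 * (F.kl_div(p, q, log_target=True, reduction="batchmean")
                  + F.kl_div(q, p, log_target=True, reduction="batchmean"))

h = torch.randn(4, 16, 768)                       # batch of hidden states
logits1, logits2 = torch.randn(4, 10), torch.randn(4, 10)
print(column_dropout(h, p=0.1).shape, kl_compensation(logits1, logits2).item())
```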
Jiyue Jiang, Liheng Chen, Sheng Wang, Lingpeng Kong, Yu Li, Chuan Wu
Preprint 2024 arXiv
Existing dialogue data augmentation (DA) techniques predominantly focus on augmenting utterance-level dialogues, which makes it difficult to take dialogue contextual information into account. The advent of large language models (LLMs) has simplified the implementation of multi-turn dialogues. However, due to the absence of professional understanding and knowledge, it remains challenging to deliver satisfactory performance in low-resource domains such as psychotherapy dialogue. DA involves creating new training or prompting data based on the existing data, which helps the model better understand and generate psychotherapy-related responses. In this paper, we aim to address the issue of multi-turn dialogue data augmentation for boosted performance in the psychotherapy domain. We propose a knowledge-driven progressive thought prompting method to guide an LLM to generate multi-turn psychotherapy-related dialogue. This method integrates a progressive thought generator, a psychotherapy knowledge generator, and a multi-turn dialogue generator. The thought generated by the progressive thought generator serves as a prompt to prevent the generated dialogue from having significant semantic deviations, while the psychotherapy knowledge generator produces psychotherapy knowledge to serve as the dialogue history for the LLM, guiding the dialogue generator to create multi-turn psychotherapy-related dialogue. To ensure the precision of psychotherapy-related multi-turn dialogue generation by the LLM, a meticulous professional evaluation is required. Extensive experiments conducted on three psychotherapy-related datasets verify the effectiveness of the proposed method.
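The three-stage pipeline can be pictured as chained prompts, as in the minimal sketch below; llm() is a placeholder stub and the prompt wording is an assumption, not the paper's prompts.

```python
# Minimal sketch of the three-stage pipeline: progressive thought -> psychotherapy
# knowledge -> multi-turn dialogue. llm() is a stand-in for a real chat model.
def llm(prompt: str) -> str:
    # Placeholder: swap in an actual chat-completion call; here we just echo a stub.
    return f"[model output for: {prompt[:40]}...]"

def generate_session(seed_topic: str, turns: int = 4) -> list[str]:
    thought = llm(f"Write a progressive counselling plan for a client concerned about: {seed_topic}")
    knowledge = llm(f"List psychotherapy knowledge relevant to this plan:\n{thought}")
    dialogue: list[str] = []
    for _ in range(turns):
        history = "\n".join(dialogue)
        # The plan keeps the dialogue on-topic; the knowledge acts as grounding history.
        dialogue.append(llm(
            f"Plan:\n{thought}\n\nKnowledge:\n{knowledge}\n\nDialogue so far:\n{history}\n"
            "Continue with the next client/therapist exchange."
        ))
    return dialogue

print(generate_session("exam anxiety", turns=2))
```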