publications
Publications by category in reverse chronological order.
2025
- [NAACL, Oral] Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages. Max Zuo*, Francisco Piedrahita Velez*, Xiaochen Li, and 2 more authors. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Apr 2025.
Recent works have explored using language models for planning problems. One approach examines translating natural language descriptions of planning tasks into structured planning languages, such as the planning domain definition language (PDDL). Existing evaluation methods struggle to ensure semantic correctness and rely on simple or unrealistic datasets. To bridge this gap, we introduce Planetarium, a benchmark designed to evaluate language models’ ability to generate PDDL code from natural language descriptions of planning tasks. Planetarium features a novel PDDL equivalence algorithm that flexibly evaluates the correctness of generated PDDL against ground truth, along with a dataset of 145,918 text-to-PDDL pairs across 73 unique state combinations with varying levels of difficulty. Finally, we evaluate several API-access and open-weight language models that reveal this task’s complexity. For example, 96.1% of the PDDL problem descriptions generated by GPT-4o are syntactically parseable, 94.4% are solvable, but only 24.8% are semantically correct, highlighting the need for a more rigorous benchmark for this problem.
@inproceedings{zuo-etal-2025-planetarium,
  title = {Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages},
  author = {Zuo, Max and Velez, Francisco Piedrahita and Li, Xiaochen and Littman, Michael and Bach, Stephen},
  booktitle = {Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)},
  month = apr,
  year = {2025},
  address = {Albuquerque, New Mexico},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2025.naacl-long.560/},
  doi = {10.18653/v1/2025.naacl-long.560},
  pages = {11223--11240},
}
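The abstract describes a three-tier evaluation funnel: a generated PDDL problem can be syntactically parseable and even solvable by a planner while still describing the wrong task. The sketch below illustrates that funnel; `parses`, `is_solvable`, and `is_equivalent` are hypothetical placeholders standing in for a real PDDL parser, a planner invocation, and the paper's equivalence algorithm, not the benchmark's actual API.

```python
# Hypothetical sketch of a three-tier text-to-PDDL evaluation funnel.
# All three predicates are placeholders, not Planetarium's real interface.

def parses(pddl: str) -> bool:
    """Placeholder: True if the PDDL problem is syntactically valid."""
    raise NotImplementedError

def is_solvable(pddl: str) -> bool:
    """Placeholder: True if a planner finds a plan for the problem."""
    raise NotImplementedError

def is_equivalent(pddl: str, ground_truth: str) -> bool:
    """Placeholder: True if the problem is semantically identical to the
    ground-truth problem (the paper's equivalence check)."""
    raise NotImplementedError

def evaluate(generated: list[str], ground_truths: list[str]) -> dict[str, float]:
    """Score generated problems on each tier; later tiers imply earlier ones."""
    n = len(generated)
    parseable = [p for p in generated if parses(p)]
    solvable = [p for p in parseable if is_solvable(p)]
    correct = [
        p for p, gt in zip(generated, ground_truths)
        if parses(p) and is_solvable(p) and is_equivalent(p, gt)
    ]
    return {
        "parseable": len(parseable) / n,  # e.g. 96.1% for GPT-4o
        "solvable": len(solvable) / n,    # e.g. 94.4%
        "correct": len(correct) / n,      # e.g. 24.8%
    }
```

The wide gap between the last two rates is the paper's headline finding: parser-level and planner-level checks badly overestimate semantic correctness.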
2024
- [EMNLP Findings] Preference Tuning For Toxicity Mitigation Generalizes Across Languages. Xiaochen Li, Zheng Xin Yong, and Stephen Bach. In Findings of the Association for Computational Linguistics: EMNLP 2024, Nov 2024.
Detoxifying multilingual Large Language Models (LLMs) has become crucial due to their increasing global use. In this work, we explore zero-shot cross-lingual generalization of preference tuning in detoxifying LLMs. Unlike previous studies that show limited cross-lingual generalization for other safety tasks, we demonstrate that Direct Preference Optimization (DPO) training with only English data can significantly reduce toxicity in multilingual open-ended generations. For example, the probability of mGPT-1.3B generating toxic continuations drops from 46.8% to 3.9% across 17 different languages after training. Our results also extend to other multilingual LLMs, such as BLOOM, Llama3, and Aya-23. Using mechanistic interpretability tools like causal intervention and activation analysis, we identified the dual multilinguality property of MLP layers in LLMs, which explains the cross-lingual generalization of DPO. Finally, we show that bilingual sentence retrieval can predict the cross-lingual transferability of DPO preference tuning.
@inproceedings{li-etal-2024-preference,
  title = {Preference Tuning For Toxicity Mitigation Generalizes Across Languages},
  author = {Li, Xiaochen and Yong, Zheng Xin and Bach, Stephen},
  editor = {Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2024},
  month = nov,
  year = {2024},
  address = {Miami, Florida, USA},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2024.findings-emnlp.784/},
  doi = {10.18653/v1/2024.findings-emnlp.784},
  pages = {13422--13440},
}
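For readers unfamiliar with DPO, the sketch below shows the standard Direct Preference Optimization objective (Rafailov et al.) applied to this setting, where the "chosen" continuation is non-toxic and the "rejected" one is toxic. This is the generic loss, not the authors' training code; the variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective: push the policy to prefer the non-toxic
    (chosen) continuation over the toxic (rejected) one, measured relative
    to a frozen reference model. Inputs are per-sequence summed log-probs."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # -log(sigmoid(x)) computed stably via logsigmoid.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```

In the paper's setting, the preference pairs are English-only, and the resulting drop in toxic generations transfers zero-shot to the model's other languages.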
- [CoRL Workshop] Structured Exploration in Reinforcement Learning by Hypothesizing Linear Temporal Logic Formulas. Yichen Wei, Xiaochen Li, Jason Xinyu Liu, and 5 more authors. In 2nd CoRL Workshop on Learning Effective Abstractions for Planning, Nov 2024.
Exploration in vast domains is a core challenge in reinforcement learning (RL). Existing methods commonly explore by adding noise to the learning process, but they do not scale to complex, long-horizon problems. Goal-based exploration is a promising alternative, but it requires useful goals. We propose an approach that structures an agent’s exploration by constraining the goal space to tasks that can be expressed using a particular formal language: linear temporal logic (LTL). Our agent proposes LTL expressions that it conjectures to be achievable and desirable for maximizing its learning progress in the environment. Upon proposing an LTL expression, the agent uses a combination of planning and goal-conditioned RL to solve the task described by that LTL. The result is a structured exploration process that learns about the environment by hypothesizing various logical and sequential compositions of atomic goals. We demonstrate that our algorithm outperforms baselines in two challenging sparse-reward problems.
@inproceedings{wei2024structured,
  title = {Structured Exploration in Reinforcement Learning by Hypothesizing Linear Temporal Logic Formulas},
  author = {Wei, Yichen and Li, Xiaochen and Liu, Jason Xinyu and Shah, Naman and Quartey, Benedict and Konidaris, George and Tellex, Stefanie and Bagaria, Akhil},
  booktitle = {2nd CoRL Workshop on Learning Effective Abstractions for Planning},
  year = {2024},
  url = {https://openreview.net/forum?id=e8NpNkNrgH},
}
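To make "hypothesizing logical and sequential compositions of atomic goals" concrete, here is a minimal sketch of one way an agent could enumerate LTL goal conjectures from atomic propositions. The atoms, the formula templates, and the uniform sampling step are all illustrative assumptions, not the paper's proposal mechanism (which scores conjectures by expected learning progress).

```python
import itertools
import random

# Hypothetical atomic goal propositions grounded in the environment.
ATOMS = ["reach_key", "open_door", "reach_goal"]

def conjecture_ltl(atoms: list[str]) -> list[str]:
    """Enumerate simple LTL goal conjectures: eventually-reach single atoms,
    plus sequential compositions ("achieve a, then b") via nesting F(a & F b)."""
    formulas = [f"F {a}" for a in atoms]  # F = "eventually"
    for a, b in itertools.permutations(atoms, 2):
        formulas.append(f"F ({a} & F {b})")  # a, then later b
    return formulas

# The paper's agent would pick the conjecture expected to maximize learning
# progress; uniform sampling here is only a stand-in for that scoring step.
candidates = conjecture_ltl(ATOMS)
print(f"Proposed exploration task: {random.choice(candidates)}")
```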
2023
- [ICML] Abstract-to-Executable Trajectory Translation for One-Shot Task Generalization. Stone Tao, Xiaochen Li, Tongzhou Mu, and 3 more authors. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA, Jul 2023.
Training long-horizon robotic policies in complex physical environments is essential for many applications, such as robotic manipulation. However, learning a policy that can generalize to unseen tasks is challenging. In this work, we propose to achieve one-shot task generalization by decoupling plan generation and plan execution. Specifically, our method solves complex long-horizon tasks in three steps: build a paired abstract environment by simplifying geometry and physics, generate abstract trajectories, and solve the original task by an abstract-to-executable trajectory translator. In the abstract environment, complex dynamics such as physical manipulation are removed, making abstract trajectories easier to generate. However, this introduces a large domain gap between abstract trajectories and the actual executed trajectories as abstract trajectories lack low-level details and are not aligned frame-to-frame with the executed trajectory. In a manner reminiscent of language translation, our approach leverages a seq-to-seq model to overcome the large domain gap between the abstract and executable trajectories, enabling the low-level policy to follow the abstract trajectory. Experimental results on various unseen long-horizon tasks with different robot embodiments demonstrate the practicability of our methods to achieve one-shot task generalization. Videos and more details can be found on the project page: https://trajectorytranslation.github.io/.
@inproceedings{10.5555/3618408.3619819,
  author = {Tao, Stone and Li, Xiaochen and Mu, Tongzhou and Huang, Zhiao and Qin, Yuzhe and Su, Hao},
  title = {Abstract-to-executable trajectory translation for one-shot task generalization},
  year = {2023},
  publisher = {JMLR.org},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  articleno = {1411},
  numpages = {33},
  location = {Honolulu, Hawaii, USA},
  series = {ICML'23},
}
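The abstract's key architectural idea is that the low-level policy treats the abstract trajectory like a source sentence in machine translation, soft-aligning to it rather than following it frame-by-frame. The sketch below illustrates one way to express that with cross-attention; all dimensions, layer choices, and names are placeholder assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TrajectoryTranslationPolicy(nn.Module):
    """Illustrative seq-to-seq-style policy: the current observation queries
    the (unaligned) abstract trajectory via cross-attention to pick the next
    low-level action. Sizes and structure are placeholders."""

    def __init__(self, obs_dim=32, abs_dim=16, d_model=64, action_dim=8):
        super().__init__()
        self.obs_proj = nn.Linear(obs_dim, d_model)
        self.abs_proj = nn.Linear(abs_dim, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.action_head = nn.Linear(d_model, action_dim)

    def forward(self, obs, abstract_traj):
        # obs: (B, obs_dim); abstract_traj: (B, T_abs, abs_dim)
        q = self.obs_proj(obs).unsqueeze(1)    # current state as the query
        kv = self.abs_proj(abstract_traj)      # abstract waypoints as keys/values
        ctx, _ = self.cross_attn(q, kv, kv)    # soft-align to relevant waypoints
        return self.action_head(ctx.squeeze(1))  # next low-level action

policy = TrajectoryTranslationPolicy()
action = policy(torch.randn(2, 32), torch.randn(2, 10, 16))
print(action.shape)  # torch.Size([2, 8])
```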
- [IEEE TPDS] Critique of: “A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery” by SCC Team From UC San Diego. Arunav Gupta, John Ge, John Li, and 10 more authors. IEEE Transactions on Parallel & Distributed Systems, Jun 2023.
Bayesian networks (BNs) have become popular in recent years to describe natural phenomena in situations where causal linkages are important to understand. In order to get around the inherent non-tractability of learning BNs, Srivastava et al. propose a Markov blanket discovery-based approach to learning in their paper titled “A Parallel Framework for Constraint-based Bayesian Network Learning via Markov Blanket Discovery.” We are able to reproduce both the strong and weak scaling experiments from the paper up to 128 cores, and verify communication cost scaling for all three algorithms in the paper. We also introduce methodological improvements to weak scaling that show the paper’s findings are unique to the methodology and not the datasets used. Slight variations in performance were observed due to differences in datasets, core count, and job scheduling.
@article{10.1109/TPDS.2022.3217284,
  author = {Gupta, Arunav and Ge, John and Li, John and Kong, Zihao and He, Kaiwen and Mikhailov, Matthew and Chin, Bryan and Li, Xiaochen and Apodaca, Max and Rodriguez, Paul and Tatineni, Mahidhar and Thomas, Mary and Bhatt, Santosh},
  title = {Critique of: “A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery” by SCC Team From UC San Diego},
  year = {2023},
  issue_date = {June 2023},
  publisher = {IEEE Press},
  volume = {34},
  number = {6},
  issn = {1045-9219},
  url = {https://doi.org/10.1109/TPDS.2022.3217284},
  doi = {10.1109/TPDS.2022.3217284},
  journal = {IEEE Transactions on Parallel \& Distributed Systems},
  month = jun,
  pages = {1727--1730},
  numpages = {4},
}
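Since the critique centers on weak-scaling methodology, a quick reminder of the standard definition may help: in weak scaling the problem size grows proportionally with core count, so ideal runtime stays flat and efficiency is the ratio of base runtime to measured runtime. The timings in the sketch below are made-up placeholders, not measurements from the paper.

```python
# Standard weak-scaling efficiency: problem size grows with core count,
# so ideal runtime is constant. All runtimes here are hypothetical.

def weak_scaling_efficiency(t_base: float, t_p: float) -> float:
    """Efficiency at p cores = T(1 core, base size) / T(p cores, p x size)."""
    return t_base / t_p

# Hypothetical runtimes (seconds) at 1..128 cores on proportionally larger inputs.
runtimes = {1: 100.0, 16: 108.0, 64: 121.0, 128: 135.0}
for cores, t in runtimes.items():
    print(f"{cores:>4} cores: efficiency = {weak_scaling_efficiency(runtimes[1], t):.2f}")
```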
2022
- [IEEE TPDS] Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From University of California San Diego. Xiaochen Li, Maximilian Apodaca, Arunav Gupta, and 8 more authors. IEEE Transactions on Parallel & Distributed Systems, Sep 2022.
In this article, we describe our efforts to reproduce results reported in the SC19 article by Hidayetoğlu et al., titled “MemXCT: Memory-Centric X-ray CT Reconstruction with Massive Parallelization”. MemXCT’s single-device performance, parallelized via OpenMP and MPI, was characterized using AMD Zen2 CPU cores and NVIDIA V100 GPU devices running on the Microsoft Azure cloud. We were able to reproduce most of the results, and exceed the performance of larger inputs, on an AMD EPYC HBv2 cluster. We were also able to reproduce the strong scaling trends for optimized CPU and GPU versions. Slight variations in performance of the CPU version were observed due to differences in the underlying hardware, input size, and number of available nodes. Digital artifacts from these experiments are available at: 10.5281/zenodo.5598108
@article{9618831,
  author = {Li, Xiaochen and Apodaca, Maximilian and Gupta, Arunav and Kong, Zihao and Pan, Hongyi and Zhou, Hongyu and Thomas, Mary and Kandes, Martin and Li, Zhaoyi and Tatineni, Mahidhar and Carroll, Lewis},
  title = {Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From University of California San Diego},
  journal = {IEEE Transactions on Parallel \& Distributed Systems},
  year = {2022},
  volume = {33},
  number = {9},
  issn = {1558-2183},
  pages = {2043--2046},
  keywords = {Graphics processing units; Bandwidth; Optimization; Performance evaluation; Random access memory; Hardware; Codes},
  doi = {10.1109/TPDS.2021.3128840},
  url = {https://doi.ieeecomputersociety.org/10.1109/TPDS.2021.3128840},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
  month = sep,
}
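This critique's reproduction focuses on strong scaling: a fixed problem size run on a growing number of cores, where speedup and efficiency quantify how far results fall short of ideal linear scaling. The sketch below shows the standard calculation; all runtimes are illustrative placeholders, not results from the critique.

```python
# Standard strong-scaling analysis: fixed problem size, growing core count.
# All runtimes are hypothetical, not measurements from the paper.

def speedup(t1: float, tp: float) -> float:
    """Speedup at p cores relative to the single-core runtime."""
    return t1 / tp

def efficiency(t1: float, tp: float, p: int) -> float:
    """Parallel efficiency: speedup divided by core count (1.0 is ideal)."""
    return speedup(t1, tp) / p

runtimes = {1: 480.0, 4: 130.0, 16: 38.0, 64: 13.0}  # hypothetical seconds
for p, tp in runtimes.items():
    print(f"{p:>3} cores: speedup {speedup(runtimes[1], tp):5.1f}x, "
          f"efficiency {efficiency(runtimes[1], tp, p):.2f}")
```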