Greetings👋 I'm a Ph.D. Candidate in Information Science at Indiana University Bloomington. I'm now actively seeking a tenure-track assistant professorship for a start date in Fall 2025.
My research focuses on understanding cultural and sociotechnical dynamics as reflected in language and using this knowledge to proactively develop AI applications aimed at fostering a more informed and equitable society. To achieve my research goals, I integrate and extend methods from language techniques, machine learning, and statistics. My research is highly interdisciplinary, grounded in natural language processing, computational humanities, and biomedical informatics.
My representative work includes:
- 👶 making scientific narratives accessible at a high-school reading level for individuals with lower literacy,
- 🎭 developing strong and usable countermeasures against content fingerprinting for whistleblowers,
- ⚖️ ensuring LLMs operate fairly across users' demographic and socio-economic backgrounds,
- 🍺 facilitating early identification of alcohol use disorder among underrepresented groups,
- 👬 attributing early works in disputes between Lu Xun and Zhou Zuoren through statistical evidence, and
- 🧬 developing a principled measure of scientific novelty that aligns with human judgment.
News
- (Oct 24, 2024) Our new preprints are out!
  - In Science Out of Its Ivory Tower: Improving Accessibility with Reinforcement Learning, we simplify scholarly abstracts from a postgraduate 🧑‍🎓 to a high school 🧑‍🏫 reading level without sacrificing faithfulness or quality. All of this is done by RL-tuning Gemma-2B.
  - In Are Large Language Models Ready for Travel Planning?, we observed that LLM generations are generally non-discriminative across gender and racial groups; however, hallucinations are more frequently associated with African American and gender minority groups🤔.
- (Oct 15, 2024) I presented Language Modeling: From Teaching to Incentivizing at IU Computational Linguistics Discussion Group.
- (Sep 28, 2024) I presented Science Out of Its Ivory Tower: Improving Accessibility with Reinforcement Learning at the ILS Doctoral Research Forum 2024 and won 🥇.
- (Aug 7, 2024) Our new preprint Simplifying Scholarly Abstracts for Accessible Digital Libraries is now online. We introduced a novel corpus designed to simplify scholarly abstracts and demonstrated that mainstream LLMs perform just fine with straightforward supervised fine-tuning. Although the improvement doesn't yet make the content fully understandable for a middle-school audience, these models provide a strong baseline for further enhancement.
- (May 1, 2024) I joined Dr. Jing Su's team as an intern to work on the GAIPA project (Graph Artificial Intelligence for Precision Identification of Alcohol Use Disorder) at the Department of Biostatistics and Health Data Science, Indiana University School of Medicine. Let's push the precision identification of alcohol use disorder to the next level💪.
- (Apr 30, 2024) I passed my proposal defense for my PhD dissertation, titled Defending Against Authorship Attribution Attacks With Large Language Models. I am now actively seeking a tenure-track assistant professorship. Let me know if your department needs an NLP guy who can research and teach🤠.
- (Apr 15, 2024) I presented the manuscript titled A Content-Based Novelty Measure for Scholarly Publications: A Proof of Concept at iConference 2024.
- (Mar 13, 2024) I delivered a presentation titled AI4Library at Nankai University, covering the fundamentals of LLMs, their capabilities, surrounding hype, and their applications within librarianship.
- (Jan 31, 2024) I joined the Center for Antique Book Conservation and Restoration Research at Wuhan University (武汉大学古籍保护暨文献修复研究中心) as a Research Affiliate. I will be working with Dr. Xincai Wang to improve the discoverability of historical collections using Retrieval-Augmented Generation, empowered by state-of-the-art LLMs.
- (Jan 29, 2024) I joined Digital Humanities Quarterly as Data Analytics Editor.
- (Jan 18, 2024) My invited column Defending Against Authorship Identification Attacks has been published by the Montreal AI Ethics Institute. Check out the article here.
- (Jan 8, 2024) The manuscript for NovEval, A Content-Based Novelty Measure for Scholarly Publications: A Proof of Concept, is up. I will be presenting it at iConference 2024.
- (Oct 7, 2023) The NovEval (pronounced "Nawv-Ee-val") demo is now online! This GPT-2-based model is designed to evaluate scientific novelty automatically, and its assessments have been shown to align with human evaluation. It's still in beta, and I would love to hear your feedback😉!
- (Oct 5, 2023) I successfully completed my qualifying defense🎉. The committee consisted of Dr. Allen Riddell (Chair), Dr. Xiaozhong Liu, and Dr. Staša Milojević (Minor Advisor).
- (Oct 4, 2023) Our new papers are available on arXiv. Check them out!
- (May 19, 2023) I delivered a lightning talk introducing our jargon-busting AI at LEADING Forum 2022. You can access our poster titled Science Out of the Ivory Tower: Scientific Abstract Simplification for Everyone here.
- (May 10, 2023) I uploaded a LaTeX poster template to Overleaf. The template, based on the Gemini theme, is minimal and modern, and features Indiana University's official color palette. You can find the template at this link, or simply search for "iu poster" in the Overleaf gallery.
- (Apr 14, 2023) I gave a lightning talk and presented a poster titled The Many Voices of the Detached: Revisiting the Disputed Writings of Lu Xun and Zhou Zuoren at the IDAH HASTAC Symposium 2023.
- (Apr 13, 2023) I gave a guest lecture, Authorship Attribution: An Introduction, in Allen's Digital Humanities class.
- (Jan 9, 2023) I will be working with Allen on Task Three of IARPA HIATUS (Human Interpretable Attribution of Text Using Underlying Structure).