Haining Wang

Greetings 👋 I'm an incoming postdoctoral researcher at the Indiana University School of Medicine. My research focuses on understanding the cultural and sociotechnical dynamics reflected in language and real-world data, and using this knowledge to develop AI applications that enhance societal well-being. My work is grounded in natural language processing, biomedical informatics, and computational humanities/social sciences.

My representative work includes:

💊 Phenotyping with effective, robust, and secure use of medication records through simple chat with LLMs,
🧑‍🔬 Making scientific narratives accessible at a high-school reading level for individuals with lower literacy,
🎭 Developing strong and usable countermeasures against content fingerprinting attacks for whistleblowers,
👬 Attributing early works in disputes between Lu Xun and Zhou Zuoren through statistical evidence, and
🧬 Developing a principled measure of scientific novelty that aligns with human judgment.

What's New

(Jul 8, 2025) Our new preprint Fairness Evaluation of Large Language Models in Academic Library Reference Services is out! TL;DR: Current LLMs show a promising degree of readiness to support equitable and contextually appropriate communication in academic library reference services.
(May 20, 2025) I presented Thinking, Fast and Slow: Knowledge Extraction to Facilitate Phenotyping Using Drug Records in Real-World Data at the Midwest Biopharmaceutical Statistics Workshop 2025. Check out our DualReasoning, which enables effective, robust, and secure use of medication records through simple chat with LLMs.
(May 9, 2025) I successfully defended my PhD dissertation, Defending Against Authorship Attribution Attacks with Large Language Models. I am deeply grateful to my committee members: Dr. Allen Riddell (Chair), Dr. John Walsh, Dr. Kahyun Choi, Dr. Staša Milojević, and Dr. Xiaozhong Liu. I also extend my sincere thanks to Dr. Sandra Kübler for her guidance. Check out the slides and manuscript.
(Mar 20, 2025) I presented Improving Scholarship Accessibility With Reinforcement Learning at iConference, hosted at Indiana University Bloomington. Our jargon-busting AI paper has become a Best Paper finalist—give it a read!
(Feb 12, 2025) I presented a poster titled Thinking, Fast and Slow: DualReasoning Enhances Clinical Knowledge Extraction from Large Language Models at the Regenstrief Healthcare AI Conference.
- Our DualReasoning approach shows great promise in incorporating long-text medication information for diabetes phenotyping, outperforming alternative methods.
- It stands out as the only privacy-preserving method🛡, because it requires only a set of drug names💊 rather than patient-specific drug use🤒.
(Dec 27, 2024) Our DSH paper on solving authorship disputes of Lu Xun and Zhou Zuoren's early works is out—give it a read!
(Dec 18, 2024) I presented Simplifying Scholarly Abstracts for Accessible Digital Libraries Using Language Models at JCDL 2024.
(Oct 24, 2024) Our new preprints are out!
- In Science Out of Its Ivory Tower: Improving Accessibility with Reinforcement Learning, we simplify scholarly abstracts from a postgraduate 🧑‍🎓 to a high school 🧑‍🏫 reading level without sacrificing faithfulness or quality. All of this is done by RL-tuning a Gemma-2B.
- In Are Large Language Models Ready for Travel Planning?, we observed that LLM generations are generally non-discriminative across gender and racial groups; however, hallucinations are more frequently associated with African American and gender minority groups🤔.
(Oct 15, 2024) I presented Language Modeling: From Teaching to Incentivizing at IU Computational Linguistics Discussion Group.
(Sep 28, 2024) I presented Science Out of Its Ivory Tower: Improving Accessibility with Reinforcement Learning at the ILS Doctoral Research Forum 2024 and won 🥇.
(Aug 7, 2024) Our new preprint Simplifying Scholarly Abstracts for Accessible Digital Libraries is now online. We introduced a novel corpus designed to simplify scholarly abstracts and demonstrated that mainstream LLMs perform just fine with straightforward supervised fine-tuning. Although the improvement doesn't yet make the content fully understandable for a middle-school audience, these models provide a strong baseline for further enhancement.
(May 1, 2024) I joined Dr. Jing Su's team as an intern to work on the GAIPA project (Graph Artificial Intelligence for Precision Identification of Alcohol Use Disorder) at the Department of Biostatistics and Health Data Science, Indiana University School of Medicine. Let's push the precision identification of alcohol use disorder to the next level💪.
(Apr 30, 2024) I passed my proposal defense for my PhD dissertation, titled Defending Against Authorship Attribution Attacks With Large Language Models. I am now actively seeking a tenure-track assistant professorship. Let me know if your department needs an NLP guy who can research and teach🤠.
(Apr 15, 2024) I presented the manuscript titled A Content-Based Novelty Measure for Scholarly Publications: A Proof of Concept at iConference 2024.
(Mar 13, 2024) I delivered a presentation titled AI4Library at Nankai University, covering the fundamentals of LLMs, their capabilities, surrounding hype, and their applications within librarianship.
(Jan 31, 2024) I joined the Center for Antique Book Conservation and Restoration Research at Wuhan University (武汉大学古籍保护暨文献修复研究中心) as a Research Affiliate. I will be working with Dr. Xincai Wang to improve the discoverability of historical collections using Retrieval-Augmented Generation, empowered by state-of-the-art LLMs.
(Jan 29, 2024) I joined Digital Humanities Quarterly as Data Analytics Editor.
(Jan 18, 2024) My invited column Defending Against Authorship Identification Attacks has been published by the Montreal AI Ethics Institute. Check out the article here.
(Jan 8, 2024) The manuscript for NovEval, A Content-Based Novelty Measure for Scholarly Publications: A Proof of Concept, is up. I will be presenting it at iConference 2024.
(Oct 7, 2023) NovEval (pronounced "Nawv-Ee-val") demo is now online! This GPT-2-based model is designed to evaluate scientific novelty automatically, and its assessments have been shown to align with human evaluation. It's still in beta, and I would love to hear your feedback😉!
(Oct 5, 2023) I successfully completed my qualifying defense🎉. The committee consisted of Dr. Allen Riddell (Chair), Dr. Xiaozhong Liu, and Dr. Staša Milojević (Minor Advisor).
(Oct 4, 2023) Our new papers are available on arXiv. Check them out!
- Defending Against Authorship Identification Attacks,
- Enhancing Representation Generalization in Authorship Identification, and
- The Many Voices of Duying: Revisiting the Disputed Essays Between Lu Xun and Zhou Zuoren
(May 19, 2023) I delivered a lightning talk introducing our jargon-busting AI at LEADING Forum 2022. You can access our poster titled Science Out of the Ivory Tower: Scientific Abstract Simplification for Everyone here.
(May 10, 2023) I uploaded a LaTeX poster template to Overleaf. The template, based on the Gemini theme, is minimal and modern, and features Indiana University's official color palette. You can find the template at this link, or simply search for "iu poster" in the Overleaf gallery.
(Apr 14, 2023) I gave a lightning talk and presented a poster titled The Many Voices of the Detached: Revisiting the Disputed Writings of Lu Xun and Zhou Zuoren at the IDAH HASTAC Symposium 2023.
(Apr 13, 2023) I gave a guest lecture on Authorship Attribution: An Introduction at Allen's Digital Humanities.
(Jan 9, 2023) I will be working with Allen on IARPA HIATUS (Human Interpretable Attribution of Text Using Underlying Structure) Task Three.