I am a Ph.D. student at Penn State, pursuing interdisciplinary research in Data Science and AI with a focus on NLP. As part of the College of IST, I conduct research at the PIKE Research Lab under the guidance of Dr. Dongwon Lee. I focus on applying NLP and machine learning techniques to combat dis/misinformation across languages, contexts, and distribution gaps. At Penn State, I serve as an NSF LinDiv Fellow, conducting transdisciplinary research to advance human-AI language interaction globally. My mission is to develop solutions that mitigate evolving security threats and risks and that broaden access to human language technology for all. Beyond research, I stay active through dance, fitness, martial arts, and community service.
Download my Resumé.
Ph.D. in Informatics, 2025 (Expected)
The Pennsylvania State University
MPH con. in Epidemiology, 2020
St. George's University
M.Sc in Computer Information Systems con. Health Informatics, 2014
Boston University
B.Sc in Information Technology, 2010
St. George's University
This research from Penn State and KiNiT benchmarks the effectiveness of 10 authorship obfuscation (AO) techniques against 37 machine-generated text (MGT) detection methods across 11 languages, totaling 4,070 evaluations. It reveals that all AO methods can evade detection in every language, with homoglyph attacks proving particularly effective. This underscores the need for improved multilingual MGT detection strategies.
This research project is a collaboration between Penn State and MIT Lincoln Lab. Our study demonstrates the dual capacity of LLMs with respect to disinformation: they can be misused offensively to produce it and used defensively to detect it, without requiring additional training.
This research from Penn State and KiNiT introduces MULTITuDE, a novel multilingual dataset for detecting machine-generated text. Comprising over 74,000 authentic and artificially generated texts in 11 languages from 8 models, MULTITuDE benchmarks text generation capabilities in non-English languages and multilingual detection performance. The dataset addresses current gaps in the systematic analysis and evaluation of machine text generation and detection across multiple languages.
Book an appointment or contact me using the information below