Fighting Fire with Fire: The Dual Role of LLMs in Crafting and Detecting Elusive Disinformation
Jason Lucas, Adaku Uchendu, Michiharu Yamashita, Jooyoung Lee, Shaurya Rohatgi, Dongwon Lee
December 2023
Abstract
Recent ubiquity and disruptive impacts of large language models (LLMs) have raised concerns about their potential to be misused (i.e., generating large-scale harmful and misleading content). To combat this emerging risk of LLMs, we propose a novel “Fighting Fire with Fire” (F3) strategy that harnesses modern LLMs’ generative and emergent reasoning capabilities to counter human-written and LLM-generated disinformation. First, we leverage GPT-3.5-turbo to synthesize authentic and deceptive LLM-generated content through paraphrase-based and perturbation-based prefix-style prompts, respectively. Second, we apply zero-shot in-context semantic reasoning techniques with cloze-style prompts to discern genuine from deceptive posts and news articles. In our extensive experiments, we observe GPT-3.5-turbo’s zero-shot superiority on both in-distribution and out-of-distribution datasets, where it consistently achieves 68-72% accuracy, unlike the decline observed in previous customized and fine-tuned disinformation detectors. Our codebase and dataset are available at https://github.com/mickeymst/F3.
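To make the two-stage pipeline concrete, below is a minimal Python sketch of the F3 workflow, assuming the openai client (>= 1.0) and gpt-3.5-turbo. The prompt wording, function names, and sample article are illustrative assumptions, not the exact prompts used in the paper; those are available in the linked repository.

```python
"""Illustrative sketch of the F3 workflow: LLM-based generation of
authentic vs. deceptive content, then zero-shot cloze-style detection."""
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def chat(prompt: str, temperature: float = 0.7) -> str:
    """Single-turn call to gpt-3.5-turbo."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=temperature,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()


# Stage 1 (generation): a paraphrase-based prefix prompt yields "authentic"
# LLM-generated text, while a perturbation-based prefix prompt injects
# falsehoods to yield LLM-generated disinformation.
def paraphrase(article: str) -> str:
    return chat(
        "Paraphrase the following news article while preserving all of its "
        f"facts:\n\n{article}"
    )


def perturb(article: str) -> str:
    return chat(
        "Rewrite the following news article, subtly altering key facts "
        f"(names, numbers, outcomes) so that it becomes misleading:\n\n{article}"
    )


# Stage 2 (detection): a zero-shot cloze-style prompt asks the model to fill
# in a veracity label and justify it (in-context semantic reasoning).
def detect(text: str) -> str:
    prompt = (
        "Read the passage and complete the statement.\n\n"
        f"Passage: {text}\n\n"
        'Complete: "The passage above is ___ (real/fake) because ___."'
    )
    return chat(prompt, temperature=0.0)


if __name__ == "__main__":
    sample = "City officials announced the new bridge will open next spring."
    fake = perturb(sample)
    print("Perturbed:", fake)
    print("Verdict:", detect(fake))
```

Generation uses a nonzero temperature for varied outputs, while detection uses temperature 0 so the cloze completion is deterministic; the free-text verdict would still need to be parsed into a binary label for evaluation.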
Publication
In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Ph.D. Student in Informatics
My research interests include low-resource multilingual NLP, linguistics, adversarial machine learning, and mis/disinformation generation and detection. My Ph.D. thesis applies artificial intelligence to cybersecurity and social good, with a focus on low-resource multilingual natural language processing. More specifically, I develop NLP techniques to promote cybersecurity, combat mis/disinformation, and make AI accessible for non-English languages and underserved populations. This involves creating novel models and techniques for tasks such as multilingual and cross-lingual text classification, machine translation, text generation, and adversarial attacks in limited-training-data settings. My goal is to democratize state-of-the-art AI capabilities by extending them beyond high-resource languages like English into the long tail of lower-resourced languages worldwide. By developing robust approaches to learning from scarce linguistic data, this research aims to open promising directions where AI delivers dual benefits: strengthening security and integrity while advancing social welfare across diverse global locales.