Tortured Phrases and Plagiarism

Ben Neely · May 20, 2022

plagiarism AI GPT-3 research integrity paper mills tortured phrases

Last summer, Cabanac et al. published a paper describing how AI-powered text generation can produce phrases and word patterns that read much like human writing, except for certain “tortured phrases”: obvious and often humorous mistranslations of established terms. One example they give is “counterfeit consciousness” in place of “artificial intelligence”. The concept came up again in a recent lecture by Elisabeth Bik (@MicrobiomDigest) at the Lorentz Center Workshop on Proteomics and Machine Learning, where it was put even more simply: running text through word translations can generate plagiarism that is undetectable.
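Because a tortured phrase swaps a fixed, well-known term for a fixed oddity, one simple way to flag such phrases is to look them up in a list of known substitutions. The sketch below is a hypothetical illustration, not the method Cabanac et al. used; the `TORTURED_PHRASES` table is assumed to be hand-curated and holds only the single example quoted from their paper, where a practical list would need many more entries.

```python
import re

# Hypothetical, hand-curated table: tortured phrase -> the established term it replaces.
# Only the example quoted above is included; a usable list would be much longer.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
}


def find_tortured_phrases(text: str) -> list[tuple[str, str]]:
    """Return (phrase found, expected term) pairs, matching whole words case-insensitively."""
    hits = []
    for phrase, expected in TORTURED_PHRASES.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append((phrase, expected))
    return hits


sample = "Counterfeit consciousness is helping researchers study the innate immune system."
print(find_tortured_phrases(sample))
# -> [('counterfeit consciousness', 'artificial intelligence')]
```

A lookup like this only catches phrases someone has already spotted, which is why a novel substitution can slip through.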

Since learning about this, I had wanted to try it myself. Starting with the three sentences below, I tried to create tortured phrases through language translation.

The use of artificial intelligence is helping researchers to better understand the complexities of the innate immune system. Proteomics is providing insights into the proteins that make up the innate immune system and how they interact with each other. This knowledge is helping to develop new therapies for diseases that involve the innate immune system.

First I used Google Translate and tried many language chains (English-Japanese-English, English-Polish-Welsh-English, etc.), but each time the text made it back to English, it was word-for-word identical to the original.

Next I tried SYSTRAN Translate, and the English-Norwegian-Spanish-English chain finally yielded “congenetial immune system”, with everything else unchanged.

Playing around with SYSTRAN, I could see how translation chains such as English-Bengali-English-Catalan-English yielded markedly different wording, but at some point the resulting text moved past tortured phrases into gibberish.
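The translation chains above behave like function composition: each step feeds its output to the next, and any step that picks a near-synonym on the way back changes the final wording. The sketch below assumes a generic `Translator` function type and uses hypothetical toy word-swap tables (`to_pivot`, `from_pivot`) in place of a real service; it does not call Google Translate or SYSTRAN, and real machine translation works on whole sentences rather than word by word.

```python
from functools import reduce
from typing import Callable

# A translator is any function mapping text in one language to text in another.
Translator = Callable[[str], str]


def round_trip(text: str, chain: list[Translator]) -> str:
    """Apply each translation step in order, e.g. en->no, no->es, es->en."""
    return reduce(lambda acc, step: step(acc), chain, text)


def word_swap(table: dict[str, str]) -> Translator:
    """Toy translator: replace each known word and leave unknown words unchanged."""
    def translate(text: str) -> str:
        return " ".join(table.get(word, word) for word in text.split())
    return translate


# Hypothetical toy tables standing in for real translation services.
# The return table maps the pivot word to a near-synonym instead of the
# original word, which is how meaning drifts over a round trip.
to_pivot = word_swap({"innate": "medfødt", "immune": "immun"})
from_pivot = word_swap({"medfødt": "congenital", "immun": "immune"})

print(round_trip("the innate immune system", [to_pivot, from_pivot]))
# -> the congenital immune system
```

Adding more steps to `chain` compounds the drift, which matches the observation that longer chains eventually produce gibberish rather than a single odd phrase.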

Though my attempts to produce tortured phrases through translation weren’t completely successful, I can see how this approach could easily be used to bulk-translate published papers into “new” papers, performed at scale by so-called “paper mills”. These papers can in turn be submitted to journals, making this an easy way to inflate productivity and impact in the flawed system we have built. Of course, an even easier route than translating already-written work is to use text prompts and AI-powered text generation, as Cabanac et al. discussed. For instance, the test sentences above were created in the OpenAI GPT-3 Playground with the prompt “Artificial intelligence is helping proteomics solve innate immunity.”, and they sounded pretty human to me.