A troubling new way to evade plagiarism detection software. (And how to tell if it’s been used.)

Ann Rogerson

Recently, at the end of a tutorial, a student asked Ann Rogerson a question she’d never heard before: Was it okay to use paraphrasing tools to write up assignments? Rogerson, a senior lecturer in the faculty of business at the University of Wollongong in Australia, was stumped — she’d never heard of these tools before.

It turns out, the student had learned of the tool from another student. For an assignment, the student had taken wording from a journal article and run it through a free online tool that automatically paraphrases text, so it evades plagiarism detection software.

Immediately, Rogerson remembered wording from a previous student submission that had always bugged her — in an assignment about employee performance reviews, the student had written awkward phrases such as “constructive employee execution” and “worker execution audits.” A lightbulb went off for Rogerson.

She immediately went to her computer, looked up the tools on Google, and easily found one. She typed in “employee performance reviews,” and the tool spit out “representative execution surveys.”

I had my answer about what the student in the previous session had done.

It was a troubling realization, she said:

I was shocked by both the ease of use and the poor quality of output.  I was also disappointed that here was something else trying to undermine the student learning process while challenging academic integrity.

In a recent paper in the International Journal of Educational Integrity, Rogerson and her co-author Grace McCarthy, also at Wollongong, describe what they’ve learned about these online paraphrasing tools, and the dangers they pose. (For tips on how to recognize when someone has used the tools, see the sidebar below.)

Although the paper focuses on the use of the tools by students, we asked Rogerson if the tools have entered into academic publishing; “anything is possible,” she told us:

…however I have no experience or evidence whether professional academics are using the tools for their scholarly publishing.  I have not observed any in the journal articles, book chapters and conference papers I have reviewed to date.  As the tools are freely available, there is nothing to stop anyone using them.  Some of their use may be masked by post-editing of the output to remove or address the errors that they generate.  Post-edited paraphrase tool work would be more difficult to identify, particularly for those outside of a discipline/research area and unfamiliar with specific terminology, formulae or indicators.

As we’ve seen more retractions stemming from third-party manuscript companies which share material between different groups of authors, we asked if it’s possible these services are employing online paraphrasing tools:

It is certainly in the realms of possibility for some less reputable manuscript editing companies to be using the tools or the algorithms behind them particularly where they offer very quick turnaround times and multiple disclaimers…It comes down to motivation at the end of the day – if editing companies are more concerned about making money through manuscript editing they will look for more ‘cost efficient’ ways to produce their content.  They are less concerned about academic rigour and more concerned about making money quickly.  Genuine editing services offered by qualified editors takes time.

Miguel Roig at St. John’s University, who has written extensively about the problem of plagiarism, told us he is concerned about paraphrasing tools — which he suspects at least one of his students may have used in the past:

…I believe that, as with spell-checkers, [where] the lack of effort in looking up a correct spelling of words likely reduces the probability of retaining the correct spelling of words, the similar lack of mental effort required to come up with a good summary/paraphrase will similarly retard the acquisition of this most important of writing skills in inexperienced writers. After all, like many other skills, the ability to produce good paraphrases or summaries of others’ work takes practice with considerable mental effort. There are no short-cuts; you have to learn by doing.

Roig — who is also a member of the board of our parent non-profit organization — added that he wouldn’t be surprised if some third-party editing companies had used paraphrasing tools to some extent.

Debora Weber-Wulff, another plagiarism expert based at the University of Applied Sciences HTW Berlin in Germany, noted that this newer crop of tools shows how difficult it is to quickly determine if a paper has been plagiarized. (Indeed, after Rogerson and McCarthy ran a section of their own work through two paraphrasing tools, the plagiarism detection software Turnitin recognized only a 50% and a 30% match to the original.) According to Weber-Wulff:

This is why dreams of finding the perfect plagiarism detection system are doomed to failure. The “other side”, if you will, will come up with better disguises. We have to find other ways of teaching and enforcing good academic practice.

Weber-Wulff added that she is confident paraphrasing tools have already worked their way into the academic literature:

Any time you see a paper published (probably in a predatory journal) where the sentences do not make sense, you can guess that something like this is happening…The tools are pitched to the individual, but of course can be used by [third-party] editing companies. I get all sorts of offers of such services all the time and they do use a variety of tools.

Rogerson is now trying to raise awareness of the problem for the next generation of professional authors:

I openly discuss their existence in class demonstrating how poor the tools actually are along with my encouraging student questions about originality, citations and acknowledgements when preparing for assignments.  Confronting the issue is important therefore I work with students on how to learn and develop paraphrasing skills without using an online tool.  Just because some online tools can easily and correctly convert temperature, distance and currency does not mean that Internet based text tools can be relied upon the same way.  The only way of bringing it to light is to talk and write about it.

SIDEBAR: How to identify text modified by a paraphrasing tool

Over time, Rogerson has developed some clues that tip her off when text has been tweaked by paraphrasing tools. For instance, her suspicions rise when she sees “inappropriate terminology related to the subject context”:

…for example the tools in the experiment changed ‘plagiarism’ to ‘copyright infringement’, the other used ‘counterfeiting’…The change in terminology and meaning is a clue as the tools work around synonyms rather than semantic meaning.

And if Rogerson sees a citation near the inappropriate terminology, she’ll go to the original text and do a short test-run:

…I will source the citation and check the terminology in the original.  Taking a few sentences or the abstract and running it through a tool can give insight whether a tool may have been involved in the alteration of terms and definitions.

Some other clues: A paper includes unusual strings of words (“word salads”) where the phrases don’t make sense, or relies only on older references, without anything related to more current thinking:

…it may be perfectly acceptable and expected to have some old references in some papers referring to seminal works, or when updating concepts or historical retrospectives.  However if there is a reliance on older publications without any reference to current thinking or research (particularly where the references are not highly cited/regarded) it may be worth doing a little more digging.

To illustrate, we put the above quote into one of the many paraphrasing tools we found online. Here’s how it would read:

it might be splendidly worthy and anticipated that would have some old references in a few papers alluding to original works, or when refreshing ideas or authentic reviews. Nonetheless if there is a dependence on more seasoned distributions with no reference to momentum thinking or research (especially where the references are not exceedingly refered to/respected) it might be worth doing somewhat more burrowing.
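The drop in match percentage that Rogerson and McCarthy observed can be illustrated with a rough sketch. The code below is not how Turnitin or any other detection product works; it is a simple word n-gram overlap measure, with hypothetical function names, applied to the quote and the machine paraphrase shown above. It shows why synonym swapping hurts exact-phrase matching: many individual words survive, but far fewer runs of three consecutive words do.

```python
import re

def word_ngrams(text, n):
    """Return the set of lowercase word n-grams found in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_fraction(original, suspect, n):
    """Fraction of the suspect text's n-grams that also occur in the original.

    Hypothetical helper for illustration only; real detection software
    uses its own (unpublished here) matching methods.
    """
    suspect_grams = word_ngrams(suspect, n)
    if not suspect_grams:
        return 0.0
    return len(suspect_grams & word_ngrams(original, n)) / len(suspect_grams)

# Rogerson's quote, as given in the sidebar.
original = (
    "it may be perfectly acceptable and expected to have some old references "
    "in some papers referring to seminal works, or when updating concepts or "
    "historical retrospectives. However if there is a reliance on older "
    "publications without any reference to current thinking or research "
    "(particularly where the references are not highly cited/regarded) it "
    "may be worth doing a little more digging."
)

# The same quote after a pass through an online paraphrasing tool.
paraphrased = (
    "it might be splendidly worthy and anticipated that would have some old "
    "references in a few papers alluding to original works, or when "
    "refreshing ideas or authentic reviews. Nonetheless if there is a "
    "dependence on more seasoned distributions with no reference to momentum "
    "thinking or research (especially where the references are not "
    "exceedingly refered to/respected) it might be worth doing somewhat more "
    "burrowing."
)

for n in (1, 2, 3):
    fraction = shared_fraction(original, paraphrased, n)
    print(f"{n}-word sequences shared with the original: {fraction:.0%}")
```

The surviving fragments tend to be short stretches the tool left alone, such as “have some old references.” That is consistent with Rogerson’s advice to look at the terminology itself rather than rely on a similarity score.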

24 thoughts on “A troubling new way to evade plagiarism detection software. (And how to tell if it’s been used.)”

  1. It staggers me that a student would be so brainless and amoral that they would ask their tutor if they could use such a tool.

    1. Must disagree on this one. Students are there precisely to learn. Mostly the focus on plagiarism is on copying the words of others. There has to be special emphasis on the “ideas and concepts”. A lecture I give specifically highlights paraphrasing. The fact that the student asked the question means that the student did not have the information. Our failing, not the student’s.

    2. I would imagine that in that situation, the student in question knows that somebody else uses it and disagrees with the use for obvious reasons, but doesn’t want to rat the other person out. So they pretend to be stupid, take the ridicule, and inform the teacher of the existence of said tools.

  2. I would think the “word salad” that is produced by the new paraphrasing software creates the rebuttable presumption that it has been used. Instructors should make known in their syllabi that they routinely use plagiarism detection software like Turnitin.com and prohibit student use of paraphrasing software. The instructor should also promise that s/he will require the student to explain “word salad” and will grade the student’s production accordingly.

    I have no idea what to do about the researchers or publishers who pollute the literature . . . except to disbelieve their crap and avoid citing it.

    1. And your very own word salad…..

      I would assume the “word salad” go off at a tangent is take place by the precedent-setting examination software creates the rebuttal presumption stray it has been used. Instructors be compelled express regrets music pretension in their syllabi zigzag they routinely conformable to plagiarism detection software like Turnitin.com and interdict aficionado use of paraphrasing software. The drill essential aside from self-reliance that s/he courage seek from the devotee to purify “word salad” and will grade the student’s production accordingly. I essay speck credence what to attain fro the researchers or publishers who pollute the literature . . . repudiate to brand their drivel and avoid citing it.

  3. Many studies document widespread cheating, including plagiarism, among American university students, even at the most prestigious schools. These problems will continue as long as they are allowed, meaning that harsh sanctions will be needed to stem the tide. In fairness to the students, the significance and potentially lifelong consequences of academic cheating should be taught repeatedly, so there can be no doubt of their understanding.

    1. The problem is there is no incentive for anyone to stop the cheating. The schools make money from cheaters, the faculty are too busy to do the paperwork (especially against many students) and the students get degrees.

    2. Unfortunately the consequences are becoming less daunting. Twenty and thirty years ago, I could count on administration backing me as the professor in dealing with plagiarism and cheating. Today, I can count on administration backing the student even when there is clear evidence of academic misconduct. When retention and graduation rates become a key factor for rating universities, maintaining a quality education and academic integrity takes a back seat. The goal is happy students who stay to graduate and then donate to their alma mater. This is especially a problem with online universities (and I speak from direct experience).

  4. Another option: Require students to both write a paper and give a presentation. Only feasible for small class sizes, of course.

  5. I expect that plagiarism tools will improve faster than detection methods. See http://www.scienceworldreport.com/articles/58849/20170426/google-neural-machine-technology-added.htm for how AI-based tools are improving translation techniques. As spell-checking tools reduced the value of investing in human-based spelling skills, and as computers reduced the value of human-based computing skills, I think a similar thing will happen as use of AI influences many fields.

    or said another way:
    I expect that written falsification apparatuses will enhance speedier than discovery techniques. See http://www.scienceworldreport.com/articles/58849/20170426/google-neural-machine-innovation added.htm for how AI-based devices are enhancing interpretation strategies. As spell-checking instruments decreased the benefit of putting resources into human-based spelling aptitudes, and as PCs diminished the estimation of human-based processing abilities, I think a comparable thing will occur as utilization of AI impacts many fields. [i guess we’re not there yet]

    1. So-called “AI” will not solve all the problems of the world. I prefer to use the traditional name “statistics” when speaking about machine learning and “probabilities” when dealing with neural networks. There are certain classes of problem that lend themselves to “good enough” statistical solutions. Plagiarism detection is not one of them. False positives can destroy a student-teacher relationship or a career; false negatives let people off who have cheated. We need to invest more in education and prevention, and not foist off the responsibility for determining plagiarism to a software tool.

  6. This just shows you that if you want to write a decent medical school entrance essay, there is no substitute for paying decent enough money to a bottom up writing service you find advertising on Craigslist for one.

  7. Google Translate will produce the same kind of gibberish. Use it back and forth between languages and you have your obfuscator. And the quality of that device may sadly actually improve with time.
    Which reminds me of the story of the high school student who handed in an essay for the German class. His mother swore that she saw him write it. Problem was that it was written in Dutch, not in Deutsch (which is German for ‘German’).

    1. Students have done that at my university. It can actually be harder to catch them this way. When I get an international student who I know to have horrible English hand in a paper with good English and an advanced writing style, I automatically Google a few lines to try to see where they copied it from. However, this would likely come back with English as bad as I would expect.
      In addition, even if one suspected a student did this, proving it would be very hard.

  8. I don’t see this as making much of a difference. Articles consisting of extensive passages of word salad are unlikely to pass muster in any real peer review. This means they will end up in fourth rate predatory journals, where plagiarised content already sails through.

    What it could do is make things even harder for non-English speaking scientists who use awkward sentence constructs or improper synonyms for words because they don’t understand the difference in context. Their papers could now be rejected not only because of the poor language, but because of accusations of “paraphrasing tool” usage.

  9. This horrifies me as a former academic and current academic publisher.
    Nevertheless, something about this “paraphrase tool” and detection of its use bothers me a bit. I have increasingly struggled with the notions of plagiarism and paraphrasing as I push papers through the usual detection tools and get results that have paragraphs like rainbows – each sentence a different color, but no sentence without tint.
    If you “paraphrase” something, whether manually or by machine, by just using some/any synonym, are you actually adding anything at all to the content, or to the scholarly discussion of the original work? Why is it so important that you generate novel text rather than quote the well-explained, well-written original?
    I can’t say I’ve got a good answer to that. I quail at the thought of a long paper that is just strung-together quotes of other people’s work (although quotation and citation would be an improvement over the current mix-n-match sentences without citation). However, if I say “this suggests a cellular mechanism to detect increasing concentration gradients of X” rather than quoting “…this implies a cellular system that senses a gradient of increasing concentration of X” – am I … accomplishing anything as a scholar or am I just a non-automated paraphrase engine?
    In the hard sciences, a lot of what I encounter in “plagiarism detection” is the attempt to create a novel phrase for a very common statement of knowledge/method. The novelty needed is not in phrasing of the language but in phrasing a biological question empirically so it can add to the knowledge of the system.
    In Humanities, there is more use of large block-quotes – because the scholarly contribution consists of a new interpretation of a text, a context, or a body of work.

    You can’t copyright an idea, only the expression of an idea…but that does leave us with this strange gymnastic attempt to bend-around language in order to say “this idea was great and I want to add to it.”

    1. “Why is it so important that you generate novel text rather than quote the well-explained, well-written original?”

      Absolutely, paraphrasing does not generate novel text!

  10. “Colorless green ideas sleep furiously,” to quote Chomsky.
    frightening. thanks. sadly applying these rules is going to be time consuming
    how about a student exercise, where each student is given 3 of their classmates’ submissions to check?

  11. Another sneaky method is to provide essays/papers in pdf format with custom character sets displaying a readable plagiarized text while the actual characters are gibberish like “*x2tg2154g”. Automatic detection will find nothing unless you manually copy individual sentences.

  12. I have read several essays this year that I would bet are the product of such tools – they were well structured overall, as if a “real” essay had been the raw material; the sentences were grammatically correct and used a sophisticated, but very awkward, vocabulary; and they were almost impossible to read. Turnitin was no help.
