Nobody wants plagiarism on their blog or online publication, but it can be tough to prevent and harder to catch. With the ease and wealth of information out there, it's tempting for some to plagiarize, and a misunderstanding of what plagiarism is can mean false positives (and false negatives) during your editorial review. To make this process as painless as possible, here are four ways to catch and to prevent plagiarism.
Use Google Instead
Rather than relying on plagiarism detection services, which not only cost money to use extensively but also aren't as reliable as we need, use Google and other search engines. Not only is it free every single time (and it probably searches more sites than those services), but it's also a lot easier to check for attribution, to check if the text even needs attribution, or to see if the text should remain as is (such as a direct quote or a definition). It's also easier to check for those other forms of plagiarism, such as taking another's idea and passing it off as one's own.
Encourage Writers to Create Original Stories
If all your publication is doing is rehashing the news and stories of others, then you risk more plagiarism than you may think. As we said in our introduction to plagiarism article, just because the text doesn't match anywhere else online doesn't mean that it's not plagiarism. If you're writing about a hot topic, and simply reciting the analysis of others, that is plagiarism unless the ideas are properly cited. To avoid this problem (and to avoid looking like you need to piggyback on everyone else's news stories to build an audience), encourage your writers to find their own news stories, or to come up with their own angles and analysis of current news. It may feel like you need to get content out there as soon as possible, but doing that doesn't mean anything if it's just the reinvention of someone else's content and ideas in the first place.
Trust Your Writers
If you make it known that you're going to screen every article that comes through, only to send it back because one sentence happens to match another somewhere online (or it includes a phone number or a book title, both of which these services will catch and mark as plagiarism), then you risk scaring away good writers who do good work but are afraid of being accused of plagiarism. You will then be stuck with the writers who will game the plagiarism-catching services to make sure the content passes, or you will get writers who write so poorly that their work isn't anywhere online (it's so bad that no one else would take it). Also understand how easy it is to game the plagiarism software. All one needs to do is change every third or fourth word and it passes. If you trust your writers to do the right thing, then you'll get the writers that are worth trusting. Of course, if you suspect something, use Google.
Also understand that having a sentence or two in one article match another's content somewhere out there isn't going to hurt your search engine rankings and isn't going to get you blacklisted. Your site isn't going to make anyone mad by doing that. Relax, and worry about providing awesome content to your readers instead of pleasing the search engines. Search engines don't read your articles or buy your products anyway.
Set a Policy and Make Your Writers Aware of It
It doesn't help if only you know what plagiarism is and your writers don't. This will only lead to misunderstandings. If you don't yet have a policy on plagiarism, set one and let your writers know what this policy is and what counts as plagiarism. If you do have one, then make sure it is something everyone understands and is held accountable for when they join the team. Not holding people to the policy is just as bad as, if not worse than, not having one at all.
Plagiarism is the cardinal sin of online writing and publishing. No one wants it to happen on their website because it ruins credibility, quality, and search engine rankings. It's understood to be a huge ethical problem. However, with the Internet, it's easier than ever to plagiarize, and just as difficult as ever to catch it or to stop it. Here's what every blogger or online publication needs to know about plagiarism, and what constitutes plagiarism.
According to Wikipedia, plagiarism is "the mere copying of text, but also the presentation of another's ideas as one's own, regardless of the specific words or constructs used to express that idea". Meaning, in order for text to be considered plagiarized, it needs to be a copy or close copy of the text AND lack attribution to the original author or source.
Yes, I copied and pasted that definition verbatim from Wikipedia. But it's not plagiarism, as I attributed the definition, placed the definition in quotes, and provided a hyperlink to the very web page I pulled the definition from. A mere word-for-word copy is NOT plagiarism, I repeat, it is NOT plagiarism. It only counts if it is not properly attributed and the author is trying to pass off the words and/or ideas as their own. There are many times when a word-for-word copy would be perfectly appropriate, or even preferable, such as a definition (especially a long or technical one), a direct quote, or a set of statistics.
Why are Plagiarism Detection Services Bad?
This distinction is important to remember because many plagiarism detection services can only detect blatant word-for-word copies of text and don't take those nuances into account when looking at a block of text. Services like Copyscape and Plagium would say that most of the above paragraph is plagiarism, despite the fact that I attributed the definition, quoted the definition (showing that I didn't write those words and that I am 'quoting' someone else), and provided a link to the exact web page where I found the definition.
Word-for-word matching also doesn't account for another form of plagiarism: taking another's idea without attribution. I can take someone's public policy idea, change around enough words to pass these services, and then write about the idea as if I came up with it all on my own. It's also possible to pass these services by changing every third or fourth word, so when using plagiarism detection services, it's important to exercise human judgement and intuition when evaluating an article. It's also important to let your writers know that passing these services isn't enough, and that they ought to know the difference as well.
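To see why swapping every third or fourth word can defeat word-for-word matching, here is a minimal Python sketch. It assumes a detector that compares runs of four consecutive words (often called "shingles"); that is an illustrative assumption, not a description of how Copyscape, Plagium, or any other service actually works. The function names and the sample sentence are made up for the example.

```python
def shingles(text, n=4):
    """Return the set of n-word runs (shingles) in text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(candidate, source, n=4):
    """Fraction of the candidate's shingles that also appear in the source."""
    cand = shingles(candidate, n)
    if not cand:
        return 0.0
    return len(cand & shingles(source, n)) / len(cand)


def swap_every_third_word(text, replacement="X"):
    """Crude stand-in for a writer replacing every third word with a synonym."""
    words = text.split()
    return " ".join(replacement if i % 3 == 2 else w for i, w in enumerate(words))


source = "the quick brown fox jumps over the lazy dog near the quiet river bank today"
copied = source
disguised = swap_every_third_word(source)

print(overlap(copied, source))     # a straight copy: every 4-word run matches
print(overlap(disguised, source))  # after the swaps: no 4-word run survives
```

The disguised text still carries the same idea in the same order, yet an exact-match comparison like this one scores it as entirely original. That gap is what human judgement has to cover.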
In our next post, we'll offer a few ways to catch plagiarism. In the meantime, you need to exercise your judgement by knowing what needs to be credited and what doesn't. Knowing this will make easier to
What Needs, and Doesn't Need, Credit or Attribution
Here, then, is a brief list from the Purdue Online Writing Lab of what needs to be credited or documented:
Words or ideas presented in a magazine, book, newspaper, song, TV program, movie, Web page, computer program, letter, advertisement, or any other medium
Information you gain through interviewing or conversing with another person, face to face, over the phone, or in writing
When you copy the exact words or a unique phrase
When you reprint any diagrams, illustrations, charts, pictures, or other visual materials
When you reuse or repost any electronically-available media, including images, audio, video, or other media
Things that don't need documentation or credit, also taken from the Purdue Online Writing Lab's page on plagiarism, include:
Writing your own lived experiences, your own observations and insights, your own thoughts, and your own conclusions about a subject
When you are writing up your own results obtained through lab or field experiments
When you use your own artwork, digital photographs, video, audio, etc.
When you are using "common knowledge," things like folklore, common sense observations, myths, urban legends, and historical events (but not historical documents)
When you are using generally-accepted facts, e.g., pollution is bad for the environment, including facts that are accepted within particular discourse communities, e.g., in the field of composition studies, "writing is a process" is a generally-accepted fact.
These days, it's so easy for anyone to get up in the morning and decide that day to start their own online magazine or publication. The tools are out there, making it simple, fast, and cheap to do so. However, just because anyone can start one and do anything he or she wants doesn't mean that an online publication should do anything it wants. Here are four things online magazines SHOULD NOT do because they are bad practice:
Copy and Paste Entire Articles Without Credit - This is the cardinal sin of publishing. Copying and pasting without attribution is a huge no-no. Sometimes, even copying and pasting with attribution can be dangerous. This is very tempting for online publications to do, especially if there are pressures to publish every day or to make deadlines. However, those goals can be met if online magazines plan ahead for their content, or take on enough writers to cover breaking news appropriately. If you are reprinting someone else's work, make it clear that it is a reprint and not an original work of the magazine.
Vague Sourcing - This is when a quote or a piece of information is purposely sourced in an unclear or general way. For example, vague sourcing would be saying "according to a business magazine," or "industrial magazines have said." Vague sourcing is bad practice because it doesn't help the reader, and it doesn't lend credibility to your magazine, since this type of sourcing makes it look like you're making things up or don't know where you're getting your information. You may be doing this vague sourcing for SEO purposes ('business magazine' or 'industrial magazine' are terms you'd like to rank for), but it looks awful from a reporting standpoint.
Covering Trending Topics Instead of Your Niche - It seems like a good idea to write an article about the latest iPhone for the sake of a few more hits, but if you're an online publication covering environmental news, do your readers care? Probably not, unless you discuss the iPhone and its environmental impact, or the sustainability policy of Apple. Then, it's okay because the topic has been tailored to your audience. But, if you're only covering a topic because you want to jump on a wave and to score some extra web traffic, you're only making your online magazine look bad. How many of those wanting to know about the latest iPhone are also going to care about the latest in the solar industry or how companies are implementing energy management? Probably not many. So, that spike in traffic may not last or be sustainable. How many of your current readers care about the new iPhone? Maybe a lot, but they are expecting environmental news from you, and are expecting to learn about the iPhone elsewhere.
Disguise the Identity of Your Writers - I used to write for an online publication that didn't want me to go by my professional byline. My professional name would bring up all the work that I've done for other clients in search engine results. The publication only wanted their articles to come up, so they wanted me to go by a different name to ensure that happened. The problem? It's shady. At least under my professional name, my identity and credentials can be verified. With the unique name, you won't find a LinkedIn profile or a professional website. You won't find any evidence that this person exists beyond those articles for the one publication. That's fishy. Why wouldn't a publication want to acknowledge the accomplishments of their writers, or let their writers add the articles from this publication to their portfolio? Sounds a little greedy, as if the publication doesn't have much regard for their writers.
Doing any of these four things may seem like a good idea because it benefits the publication, but consider that your readers are your customers, and that you ought to do things that benefit your readers. If something that benefits the publication creates a negative reader experience, like vague sourcing or the inability of your readers to verify the identity or the credentials of your writers, then it is bad practice and shouldn't be done. Without your readers, you wouldn't be much of an online publication.
Copyscape is a popular plagiarism detection service that many folks use to see if their content is being stolen, as well as to see if prepared content has been plagiarized from other sources. Many are happy with Copyscape and the service it provides, presuming that it does a good job of catching plagiarism and content fraud. However, I hate the darn thing, and more professional writers ought to share in my enmity. Copyscape does not do as good a job as people think it's doing.
My rage is due to the fact that a few days ago I was falsely accused of plagiarism by a potential client, because of the Copyscape results he received for my article. In our conversation, he never specified what it had flagged; he just said that "chunks" of it were copied. Since I didn't know what it caught, I had no idea how to defend myself. I guessed that Copyscape caught the survey statistics I mentioned, and offered that as the explanation, but he didn't like that. He said this whole thing was unprofessional and that he didn't want to take the risk of working with me. Obviously, I did not get the gig, and I did not appreciate the quick and harsh accusation.
Worried about the potential damage this could do to my career and credibility, I ran the article through Copyscape myself to see what it flagged. It flagged TWO sentences, out of this 400-word article. To boot, these two sentences were meant to be a technical definition, something that you'd want to have verbatim to ensure accuracy. He also didn't see that I had included several hyperlinks throughout the article, including a hyperlink to the web page I got these two sentences from, since technical difficulties forced me to send him a text-only version instead of the actual document that included the hyperlinks (in my experience, one can't hyperlink in chat boxes). If he had been able to see the hyperlinks, he would have seen that I had hyperlinked this definition to the web page I got it from. I explained the technical difficulty to the client twice, but it didn't seem to matter. All that mattered was that some words matched some other words somewhere else online, and he came to the conclusion that the whole article was copied and that I'm not to be trusted.
Copyscape had also listed 20 results of copied content, except it was 20 different sites that had these same two sentences, so really it was one result instead of 20. Copyscape also didn't catch the survey statistics, which I actually did pull verbatim from the website. I don't think the client really perused these results, because he would have seen that they were a false positive.
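The "one result instead of 20" point can be checked mechanically by grouping hits by the passage that matched, rather than counting sites. The sketch below assumes a report reduced to (URL, matched text) pairs; that format, the example URLs, and the sample passage are all hypothetical and are not Copyscape's actual output.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so identical passages compare equal."""
    return " ".join(text.lower().split())


def group_by_passage(matches):
    """Group (url, matched_text) pairs by the passage that was matched."""
    groups = defaultdict(list)
    for url, matched_text in matches:
        groups[normalize(matched_text)].append(url)
    return dict(groups)


# Hypothetical report: 20 sites that all contain the same two sentences.
passage = "A widget is a small device. It is used in many gadgets."
matches = [(f"https://example.com/site{i}", passage) for i in range(20)]

groups = group_by_passage(matches)
print(len(matches), "hits,", len(groups), "distinct passage(s)")
```

An editor reading the grouped view sees a single repeated passage, which is a much easier thing to check for attribution than a list of 20 sites.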
And I am not the only one. A writer based in El Paso, Texas, who asked to remain anonymous, shared her story with me. Anonymous wrote a piece on gambling addiction, and the editor sent it back to her saying there was plagiarism. The results from Copyscape revealed a few phrases and a hotline from a web page as the plagiarism. Her editor now wants her to rework the piece or write something entirely different. She could rework the piece, but Anonymous fears that the editor won't trust that the rest of her work is original.
I've proceeded to run a few more of my articles (ones that are published and live on the web) through the system, with mixed results. It caught some in their entirety. Others, it only caught sentences and statistics, and not the whole article. There was one article where it didn't catch anything at all, leading me to believe that Copyscape isn't as reliable as people are hoping and thinking it is.
According to Wikipedia, plagiarism is "the mere copying of text, but also the presentation of another's ideas as one's own, regardless of the specific words or constructs used to express that idea". Meaning, in order for text to be considered plagiarized, it needs to be a copy or close copy of the text AND lack attribution to the original author or source. Yes, I copied that definition verbatim from Wikipedia, but it's not plagiarism, as I attributed the definition, placed the definition in quotes, and provided a hyperlink to the very web page I pulled the definition from. And lovely, lovely Copyscape flagged this paragraph as plagiarism, despite my extra efforts.
Attribution for online content is different from print content like an academic paper. It's not as if endnotes or footnotes really look great on a blog or web page. I think that proper online attribution means a hyperlink and/or a statement of the source, with quotation marks if the words are exact words. Since hyperlinks help in Google rankings, I don't think anyone would challenge
In contrast, many so-called plagiarism detection services, LIKE COPYSCAPE, can only detect blatant word-for-word copies of text. A mere word-for-word copy is NOT plagiarism, I repeat, it is NOT plagiarism. It only counts if it is not properly attributed. There are many times when a word-for-word copy would be perfectly appropriate, like a definition, a direct quote, or a set of statistics.
Here, then, is a brief list from the Purdue Online Writing Lab of what needs to be credited or documented:
Words or ideas presented in a magazine, book, newspaper, song, TV program, movie, Web page, computer program, letter, advertisement, or any other medium
Information you gain through interviewing or conversing with another person, face to face, over the phone, or in writing
When you copy the exact words or a unique phrase (which means that a word-for-word copy is okay, as long as it is attributed)
When you reprint any diagrams, illustrations, charts, pictures, or other visual materials
When you reuse or repost any electronically-available media, including images, audio, video, or other media
There are, of course, certain things that do not need documentation or credit, which is important to note because services like Copyscape just look at the text, but don't look at how the text is used, what the text says, or if the text comes with the proper attributions. Things that don't need documentation or credit, also taken from the Purdue Online Writing Lab's page on plagiarism, include:
Writing your own lived experiences, your own observations and insights, your own thoughts, and your own conclusions about a subject
When you are writing up your own results obtained through lab or field experiments
When you use your own artwork, digital photographs, video, audio, etc.
When you are using "common knowledge," things like folklore, common sense observations, myths, urban legends, and historical events (but not historical documents)
When you are using generally-accepted facts, e.g., pollution is bad for the environment, including facts that are accepted within particular discourse communities, e.g., in the field of composition studies, "writing is a process" is a generally-accepted fact.
I suspect that Anonymous and I aren't the only ones who've been wrongly accused of such an unethical deed. This one incident wouldn't be a big deal, except that as a professional writer, an accusation of plagiarism could have widespread and career-damaging consequences, whether the accusation is true or not. After all, a man cleared from death row after 20 years in prison doesn't suddenly have the ordeal over and done with. That sort of thing remains with you long after the whole thing, just like an "act" of plagiarism.
Writers who've been dealt an injustice because of faulty Copyscape results need to come forward with their stories, to show that they are not alone and that this is a problem. Those wanting our content need to understand what plagiarism really is, and realize that Copyscape shouldn't be taken as foolproof and absolute.
In Part II, I will complete a full statistical analysis of Copyscape, running all of my online articles through the system and summarizing the results. I have hundreds of articles live on the web, so the results should be valid. In Part III, I will offer alternatives to catching plagiarism and content fraud.