Detecting Plagiarism in Generative AI

Learn how to detect plagiarism with generative AI.

Introduction

Generative AI is a helpful tool for a variety of tasks, from creating cooking recipes to debugging code and even creating an amazing Dungeons and Dragons scenario! But we need to be aware that generative AI can produce plagiarized content. Wait a second… this sounds a lot like the Detecting Hallucinations in Generative AI article. Well, it is.

As in the Detecting Hallucinations in Generative AI article, we must be on the watch for plagiarism when using generative AI. Plagiarism is the practice of taking someone else’s work or ideas and passing them off as one’s own. Where generative AI is concerned, plagiarism can be broken into three categories: using generative AI to detect plagiarism, generative AI plagiarizing content, and detecting AI-generated content.

Detecting Plagiarism with ChatGPT

One benefit of generative AI is that it can act as a checker to determine whether content is plagiarized. For example, you can submit text to ChatGPT and it will be able to detect some level of plagiarism. Let’s see how this works. We will use the first paragraph of Moby Dick and see if ChatGPT can detect that it is plagiarized.

Prompt:

Can you detect if something is plagiarized?

Response:

I can certainly help you identify potential plagiarism by comparing a given text with publicly available sources up to my knowledge cutoff date in September 2021. However, I am not a dedicated plagiarism detection tool, and my capabilities are limited compared to specialized plagiarism detection software and services.

To detect plagiarism effectively, you should consider using dedicated plagiarism detection tools and services like Turnitin, Copyscape, Grammarly, or specialized software provided by educational institutions or publishers. These tools are designed to compare a given text with a vast database of academic papers, articles, websites, and other sources to identify potential instances of plagiarism.

If you suspect plagiarism in a text and want to check it, you can copy and paste the text into one of these dedicated tools or services to get a comprehensive plagiarism report. Keep in mind that plagiarism detection tools have varying levels of sensitivity and accuracy, and it's essential to use them in conjunction with your judgment and understanding of plagiarism guidelines.

Prompt:

"""
Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people’s hats off—then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me.
"""

Response:

The passage you've provided is an excerpt from the novel "Moby-Dick" by Herman Melville. This novel was published in 1851 and is in the public domain, which means it can be freely used and quoted without the need for attribution or concerns about plagiarism. If you have any specific questions or need further information about this passage or the book, please feel free to ask.

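As expected, ChatGPT correctly identified that the paragraph comes from Moby Dick. Note that public-domain status is a copyright matter: presenting the passage as your own work without attribution would still be plagiarism.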

Remember that this version of ChatGPT only has knowledge of data up to September 2021. Because of this limitation, dedicated tools are also worth using to check for plagiarism. Additionally, ChatGPT can only recognize text that appeared in its training data. It is possible that ChatGPT, or any generative AI, is unfamiliar with the source of your text.
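As a rough illustration of comparing a text against a known source directly, the sketch below measures how many five-word sequences ("shingles") in a candidate text also appear in a source text. This is a deliberately simple stand-in and not how any particular plagiarism service works; the function names `word_shingles` and `shingle_overlap`, the shingle size of five, and the sample sentences are all assumptions made for this example.

```python
import re


def word_shingles(text: str, size: int = 5) -> set:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    # Lowercase and keep only word-like tokens so punctuation is ignored.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def shingle_overlap(candidate: str, source: str, size: int = 5) -> float:
    """Fraction of the candidate's shingles that also occur in the source."""
    candidate_shingles = word_shingles(candidate, size)
    if not candidate_shingles:
        return 0.0  # Candidate is shorter than one shingle.
    source_shingles = word_shingles(source, size)
    return len(candidate_shingles & source_shingles) / len(candidate_shingles)


# Hypothetical sample data: an excerpt of the known source and two candidates.
source = (
    "Call me Ishmael. Some years ago, never mind how long precisely, "
    "having little or no money in my purse, and nothing particular to "
    "interest me on shore, I thought I would sail about a little and "
    "see the watery part of the world."
)
copied = "Having little or no money in my purse, I thought I would sail about a little."
original = "I had no savings left, so I decided to spend a few months working on a ship."

print(f"copied:   {shingle_overlap(copied, source):.2f}")
print(f"original: {shingle_overlap(original, source):.2f}")
```

A high score only says that long word sequences are shared with this one source. A low score does not prove originality, since the text could be copied from a source you did not compare against, or paraphrased.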

Detecting when ChatGPT Plagiarizes

The next consideration is detecting when ChatGPT generates content that is plagiarized. Since generative AI creates content derived from both prompts and training data, its output can contain plagiarized content. Let’s ask ChatGPT if it plagiarizes and see the output.

Prompt:

Does generative AI plagiarize?

Response:

Generative AI, like the language model you are currently interacting with, does not inherently plagiarize. Instead, generative AI generates text based on the patterns and information it has learned from the training data it was provided. It can't consciously copy or reproduce specific texts with the intent to plagiarize.

However, the use of generative AI can potentially lead to situations where its generated content resembles existing copyrighted material. In such cases, it's the responsibility of the user to ensure that the content generated by AI complies with copyright and plagiarism guidelines.

To avoid unintentional plagiarism when using generative AI:

Attribute Sources: If you use information from specific sources when generating content with AI, it's important to properly attribute those sources and cite them in your work.

Review and Edit: Always review and edit the content generated by AI to ensure it meets your requirements and doesn't unintentionally resemble existing material too closely.

Avoid Direct Copying: Do not use AI to generate verbatim copies of copyrighted material or other people's work without permission.

Understand Copyright: Familiarize yourself with copyright laws and plagiarism guidelines in your field to ensure you're using AI-generated content responsibly and ethically.

Generative AI can be a valuable tool for content creation, but it's crucial to use it responsibly and in compliance with legal and ethical standards to avoid any potential issues related to plagiarism.

Generative AI may not purposefully, or even explicitly, plagiarize content, but its output can still closely resemble existing work. As we use generative AI, we must be deliberate in making sure that we author our own words and use generative AI as a tool and not a solution.
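One way to act on the "Review and Edit" advice is to compare an AI-generated draft with a source you suspect it echoes and look for long runs of identical wording. The sketch below uses Python's standard `difflib.SequenceMatcher` for this. The helper name `longest_shared_passage` and the sample draft are hypothetical, and the check only works when you already have the candidate source text on hand.

```python
import re
from difflib import SequenceMatcher


def longest_shared_passage(draft: str, source: str) -> str:
    """Return the longest run of consecutive words found in both texts."""
    # Compare word tokens rather than characters, ignoring case and punctuation.
    draft_words = re.findall(r"[a-z0-9']+", draft.lower())
    source_words = re.findall(r"[a-z0-9']+", source.lower())
    matcher = SequenceMatcher(None, draft_words, source_words, autojunk=False)
    match = matcher.find_longest_match(0, len(draft_words), 0, len(source_words))
    return " ".join(draft_words[match.a:match.a + match.size])


# Hypothetical sample data: a draft that reuses wording from a known source.
source = (
    "Whenever it is a damp, drizzly November in my soul; whenever I find "
    "myself growing grim about the mouth"
)
draft = (
    "The narrator says it is a damp, drizzly November in my soul "
    "whenever he feels low."
)

shared = longest_shared_passage(draft, source)
print(shared)
print(len(shared.split()), "consecutive words shared")
```

A long shared run is a signal to quote and attribute the passage or to rewrite it in your own words. How many shared words count as "too close" is a judgment call that depends on your field's guidelines.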

Detecting AI-Generated Content

The last consideration for generative AI and plagiarism is detecting AI-generated content. Submitting AI-generated content as your own is considered plagiarism because the content is technically not yours. There is an ongoing legal debate about who owns AI-generated content, but at this time ownership does not go to the person who used prompts to generate it.

There are many tools, such as GPTZero, Copyleaks, and Writer, that can be used to detect AI-generated content. Let’s put one of them to the test!

Prompt:

You are an experienced software engineer. What are some of the best programming practices in Python?

Response:

Certainly! Python is known for its readability and ease of use, and there are several best programming practices that can help you write clean and maintainable Python code:

PEP 8 Style Guide: Follow the PEP 8 style guide, which is the official style guide for Python. It covers naming conventions, code layout, and other recommendations for writing clean and readable code.

Use Descriptive Names: Choose meaningful and descriptive names for variables, functions, classes, and modules. This makes your code self-explanatory and easier to understand.

Whitespace and Indentation: Python uses whitespace and indentation to define code blocks. Use consistent and proper indentation (typically four spaces) to maintain readability.

Comments and Docstrings: Include comments to explain complex logic or non-obvious decisions. Additionally, use docstrings to provide documentation for functions, classes, and modules.

Modularization: Break your code into smaller, reusable modules or functions. This promotes code reusability and makes it easier to maintain and test.

Avoid Global Variables: Minimize the use of global variables, as they can make your code harder to understand and lead to unexpected side effects. Use function parameters and return values instead.

Exception Handling: Handle exceptions gracefully using try...except blocks. Be specific about the exceptions you catch and handle, rather than using a general except clause.

List Comprehensions: Use list comprehensions for concise and readable code when working with lists or other iterable data structures.

Use Built-in Functions and Libraries: Python has a rich standard library. Utilize built-in functions and libraries to avoid reinventing the wheel and to write more efficient code.

Avoid Hardcoding: Avoid hardcoding values that may change. Use constants or configuration files to store such values.

Virtual Environments: Use virtual environments (e.g., venv or virtualenv) to isolate project dependencies and prevent conflicts between packages.

Version Control: Use a version control system (e.g., Git) to track changes in your code. This helps you collaborate with others and maintain a history of your codebase.

Unit Testing: Write unit tests for your code using Python's built-in unittest, doctest, or external libraries like pytest. This ensures that your code functions correctly and can be easily refactored.

PEP 20 (The Zen of Python): Familiarize yourself with the Zen of Python (accessible via import this). It provides guiding principles for writing Pythonic code.

Avoid Inefficient Loops: Try to avoid nested loops with high time complexity. Python provides many built-in tools like list comprehensions, map, filter, and reduce that can make your code more efficient.

Use Context Managers: Utilize context managers (with statements) when working with external resources like files, to ensure proper resource cleanup.

Type Hinting: Use type hinting, introduced in PEP 484, to improve code readability and help IDEs and linters catch potential type-related errors.

Continuous Integration: Set up a continuous integration (CI) system to automate testing and deployment, ensuring that your code remains robust and functional.

Code Reviews: Conduct code reviews with your team. A fresh set of eyes can catch issues, offer suggestions, and improve the overall code quality.

Adhering to these best programming practices will help you write clean, maintainable, and Pythonic code that is easy to understand and work with.

Image of GPTZero detecting plagiarism

Although not perfect, GPTZero correctly determined that most of the sentences were AI-generated.

Conclusion

When it comes to generative AI, plagiarism is a serious matter. First, we can use generative AI to detect plagiarism. Second, we must be mindful that generative AI can produce plagiarized content. Lastly, trying to pass off generated content as your own is considered plagiarism. There are tools that can detect generated content, so be careful!

If you’re interested in reading more about how Generative AI can be applied in your daily life, please check out our AI Catalog of articles!

Author

Codecademy Team

