DOI: 10.1145/2023607.2023677
research-article

The impact of text pre-processing to determine the similarity in students assignments

Published: 16 June 2011

ABSTRACT

This paper examines the problem of plagiarism in student assignments. We focus on pre-processing techniques for assignments written in Slovak: removing stop words, replacing synonyms, lemmatization, and using a readability index. The main goal is to find out whether an original student assignment and a plagiarized version of it can be distinguished on the basis of their readability. Based on the results of further experiments, we determine which combinations of pre-processing techniques and similarity methods are most suitable for detecting similarity between student assignments as accurately as possible, and, for each technique, the extent to which it detects the categorised types of plagiarism.
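As a rough illustration of the pre-processing steps named in the abstract, the following Python sketch applies stop-word removal, lemmatization, and synonym replacement before comparing two short texts. The word lists, the function names, and the use of Jaccard similarity are assumptions made for illustration only; the abstract does not state which linguistic resources or similarity methods the paper actually uses.

```python
import re

# Hypothetical toy resources (assumptions); a real system would load
# full Slovak stop-word, lemma, and synonym dictionaries.
STOP_WORDS = {"a", "je", "na", "v", "sa", "si", "to"}
LEMMAS = {"študenti": "študent", "kúpil": "kúpiť", "kúpili": "kúpiť"}
SYNONYMS = {"auto": "automobil"}  # variant -> canonical form


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def preprocess(text):
    """Tokenize, drop stop words, lemmatize, then unify synonyms."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    tokens = [LEMMAS.get(t, t) for t in tokens]
    return [SYNONYMS.get(t, t) for t in tokens]


def jaccard(a, b):
    """Jaccard similarity of two token lists, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


original = "Študent si kúpil auto."
suspect = "Študenti kúpili automobil."

raw_score = jaccard(tokenize(original), tokenize(suspect))
clean_score = jaccard(preprocess(original), preprocess(suspect))
print(raw_score, clean_score)
```

In this toy pair the raw token sets share no words, while the pre-processed token lists are identical, which is the kind of effect the pre-processing is meant to expose. The readability index is left out here, since a Slovak-specific formula would have to be assumed.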


Published in

CompSysTech '11: Proceedings of the 12th International Conference on Computer Systems and Technologies
June 2011
688 pages
ISBN: 9781450309172
DOI: 10.1145/2023607

Copyright © 2011 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 241 of 492 submissions, 49%