Detecting copy-and-pasted code

I came across this interesting little program that detects code that appears to have been copied and pasted from one place to another. It costs about $20, is free to try on Linux and Windows, and works with any text.
It doesn't seem like a difficult program to write; it sounds like it might be similar to the rsync algorithm. The differences would be that you chunk the input into windows of n lines instead of blocks of bytes, and that you store all the checksums in a hash table to find duplicates. Normalizing whitespace and removing comments before hashing might also improve the results.
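
Here is a rough sketch of that idea in Python, to show how little code it takes. It is only my guess at the approach, not how the program actually works. The names `normalize` and `find_duplicates` are made up for illustration, it hashes each n-line window from scratch with SHA-256 rather than using an rsync-style rolling checksum, and it skips comment removal because that depends on the language being scanned.

```python
import hashlib
from collections import defaultdict


def normalize(line):
    # Collapse runs of whitespace and trim the ends, so that
    # re-indented copies still hash to the same value.
    return " ".join(line.split())


def find_duplicates(text, n=3):
    """Return sorted lists of 1-based starting line numbers for every
    n-line window that occurs more than once in text."""
    # Keep the original line numbers, but drop blank lines.
    numbered = [(i, normalize(raw))
                for i, raw in enumerate(text.splitlines(), 1)]
    numbered = [(i, line) for i, line in numbered if line]

    # Hash table: checksum of a window -> line numbers where it starts.
    seen = defaultdict(list)
    for k in range(len(numbered) - n + 1):
        window = numbered[k:k + n]
        joined = "\n".join(line for _, line in window)
        digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
        seen[digest].append(window[0][0])

    return sorted(starts for starts in seen.values() if len(starts) > 1)
```

Calling `find_duplicates(open("some_file.py").read(), n=5)` would list the places where the same five non-blank lines show up more than once. Overlapping windows mean a long pasted block is reported several times, so a real tool would want to merge adjacent hits.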
