Detecting Copy-and-Pasted Code
I came across an interesting little program that detects code that appears to have been copied and pasted from one place to another. It costs about $20, is free to try on Linux and Windows, and works with any text.
It doesn't seem like a difficult program to write; it sounds like it might be similar to the rsync algorithm. The differences would be that you chunk the input into blocks of n lines instead of bytes, and you store all the checksums in a hash table so that chunks with the same checksum can be found as duplicates. Cleaning up whitespace and removing comments before hashing might also improve the results.
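Here is a rough sketch of that idea in Python. It is my guess at an approach, not how the program actually works. It hashes every window of n normalized lines with a plain SHA-1 rather than rsync's rolling checksum, and the names `normalize` and `find_duplicates`, the default of n=3, and the `#` comment prefix are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def normalize(line, comment_prefix="#"):
    """Drop a trailing comment and collapse whitespace.

    Naive on purpose: a comment prefix inside a string literal is also cut.
    """
    code = line.split(comment_prefix, 1)[0]
    return " ".join(code.split())


def find_duplicates(files, n=3, comment_prefix="#"):
    """Find n-line chunks that occur more than once.

    files: dict of name -> text. Returns a list of location lists, where each
    location is (name, 1-based line number where the chunk starts).
    """
    table = defaultdict(list)  # checksum -> [(name, start_line), ...]
    for name, text in files.items():
        # Keep original line numbers alongside the normalized lines.
        lines = [
            (number, normalize(raw, comment_prefix))
            for number, raw in enumerate(text.splitlines(), start=1)
        ]
        # Blank and comment-only lines are ignored when forming chunks.
        lines = [(number, code) for number, code in lines if code]
        for i in range(len(lines) - n + 1):
            window = lines[i:i + n]
            chunk = "\n".join(code for _, code in window)
            digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
            table[digest].append((name, window[0][0]))
    return [locations for locations in table.values() if len(locations) > 1]


a = "x = 1\ny = 2\nz = x + y\nprint(z)\n"
b = "# pasted\nx  =  1\ny = 2   # two\nz = x + y\nprint('done')\n"
print(find_duplicates({"a.py": a, "b.py": b}, n=3))
# prints [[('a.py', 1), ('b.py', 2)]]
```

Because the windows overlap, a pasted region longer than n lines is reported once per window; a real tool would presumably merge adjacent hits into a single range.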