mirror of
https://github.com/Xeeynamo/sotn-decomp.git
synced 2025-02-17 11:37:37 +00:00
This adds an equivalent of find_duplicates.py to tools/dups. It runs in about 2 minutes on my machine, so it should be significantly faster on the CI. The algorithm isn't exactly the same, so the report is a little different. Here's an example: https://gist.github.com/sozud/503fd3b3014668e6644fb2dfae51d5e5

This works by grouping all the functions into clusters, basically:

```
if levenshtein_similarity > threshold:
    cluster.append(current_function)
```

Memoization gives a small speedup by avoiding recomputing the Levenshtein distance for the same pairs over and over again.

This is still a brute-force algorithm. I did some research and there are a lot of similar problems, but I didn't find anything that seemed like a good fit. I think this is probably fast enough to last for a while.
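For readers who want to see the clustering idea in more detail, here is a minimal Python sketch. It is not the code in tools/dups. The names `levenshtein`, `similarity`, and `cluster_functions` are hypothetical, and so are the choices to represent each function as a tuple of opcode strings and to compare a candidate only against the first member of each cluster.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def levenshtein(a: tuple, b: tuple) -> int:
    """Edit distance between two sequences, using a two-row DP table."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: tuple, b: tuple) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    # Sort the pair so (a, b) and (b, a) share one cache entry.
    lo, hi = sorted((a, b))
    return 1.0 - levenshtein(lo, hi) / longest


def cluster_functions(functions: dict, threshold: float) -> list:
    """Greedy brute-force clustering of function names.

    Assumption: each function joins the first cluster whose first member
    is similar enough; otherwise it starts a new cluster.
    """
    clusters = []
    for name, ops in functions.items():
        for cluster in clusters:
            representative = functions[cluster[0]]
            if similarity(ops, representative) > threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

In this sketch the cache is keyed on the opcode sequences themselves, so two byte-identical functions compared against the same representative reuse one computed distance. The overall cost is still quadratic in the worst case, which matches the brute-force nature described above.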