mirror of
https://github.com/Xeeynamo/sotn-decomp.git
synced 2024-11-26 22:40:33 +00:00
4588d94071
This adds an equivalent of find_duplicates.py to tools/dups. It runs in about 2 minutes on my machine, so it should be significantly faster on the CI. The algorithm isn't exactly the same, so the report is a little different. Here's an example: https://gist.github.com/sozud/503fd3b3014668e6644fb2dfae51d5e5

This works by grouping all the functions into clusters, basically:

```
if levenshtein_similarity > threshold
    cluster.append(current_function)
```

Memoization gives a little speedup by avoiding recomputing the Levenshtein distance for the same pairs over and over again.

This is still a brute-force algorithm. I did some research and there are a lot of similar problems, but I didn't find anything that seemed like a good fit. I think this is probably fast enough to last for a while.

| Name |
|---|
| src |
| .gitignore |
| Cargo.lock |
| Cargo.toml |
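
The clustering idea described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in tools/dups: the `Function` struct, the `cluster` function, the byte-sequence representation of a function body, and the choice to compare each function only against the first member of a cluster are all assumptions made for the example. The cache shows one way memoization could avoid recomputing the Levenshtein distance for pairs of bodies that have already been compared.

```rust
use std::collections::HashMap;

/// Hypothetical representation of a function: a name plus a sequence of
/// opcode bytes. The real tool may normalize instructions differently.
struct Function {
    name: String,
    ops: Vec<u8>,
}

/// Levenshtein distance computed with the standard two-row dynamic program.
fn levenshtein(a: &[u8], b: &[u8]) -> usize {
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let substitute = prev[j] + cost;
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Similarity in [0, 1]: 1.0 means identical, 0.0 means nothing in common.
fn similarity(a: &[u8], b: &[u8]) -> f64 {
    let longest = a.len().max(b.len());
    if longest == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / longest as f64
}

/// Brute-force clustering: a function joins the first cluster whose
/// representative (its first member) is similar enough, otherwise it
/// starts a new cluster. Returns the function names in each cluster.
fn cluster(functions: &[Function], threshold: f64) -> Vec<Vec<String>> {
    // Memoization cache keyed by an ordered pair of bodies (assumed scheme).
    let mut cache: HashMap<(Vec<u8>, Vec<u8>), f64> = HashMap::new();
    let mut clusters: Vec<Vec<usize>> = Vec::new();

    for (idx, f) in functions.iter().enumerate() {
        let mut placed = false;
        for c in clusters.iter_mut() {
            let rep = &functions[c[0]];
            // The distance is symmetric, so order the key to share entries.
            let key = if rep.ops <= f.ops {
                (rep.ops.clone(), f.ops.clone())
            } else {
                (f.ops.clone(), rep.ops.clone())
            };
            let sim = *cache
                .entry(key)
                .or_insert_with(|| similarity(&rep.ops, &f.ops));
            if sim > threshold {
                c.push(idx);
                placed = true;
                break;
            }
        }
        if !placed {
            clusters.push(vec![idx]);
        }
    }

    clusters
        .into_iter()
        .map(|c| c.into_iter().map(|i| functions[i].name.clone()).collect())
        .collect()
}
```

Calling `cluster(&functions, 0.7)` on a slice of `Function` values returns one list of names per cluster. Comparing against a single representative keeps the sketch short; comparing against every member of a cluster would be stricter but slower.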