Fix Jaro and Jaro-Winkler when the length is one

Danny Guo 2019-12-12 21:35:56 -05:00 committed by Danny Guo
parent c4cdd9c35d
commit 605c81c9b9
2 changed files with 60 additions and 2 deletions
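
The guard clauses this commit changes can be sketched in isolation. The function below is a simplified, hypothetical stand-in (`jaro_early_return` is not part of strsim): it works on `&str` rather than the generic iterators the crate uses, and it returns `None` whenever the full Jaro computation would still be needed.

```rust
/// Hypothetical helper, not part of strsim: mirrors the guard clauses at the
/// top of `generic_jaro`, but for plain string slices.
/// Returns `Some(score)` when the lengths (or a single-element comparison)
/// already decide the result, and `None` otherwise.
fn jaro_early_return(a: &str, b: &str) -> Option<f64> {
    let a_len = a.chars().count();
    let b_len = b.chars().count();

    if a_len == 0 && b_len == 0 {
        // Two empty inputs are treated as identical.
        Some(1.0)
    } else if a_len == 0 || b_len == 0 {
        // Exactly one input is empty, so nothing can match.
        Some(0.0)
    } else if a_len == 1 && b_len == 1 {
        // Single-element inputs are compared directly. Before this fix,
        // this case always produced 0.0, even for equal inputs.
        Some(if a == b { 1.0 } else { 0.0 })
    } else {
        // Longer inputs fall through to the full algorithm.
        None
    }
}
```

Under these assumptions, `jaro_early_return("a", "a")` yields `Some(1.0)` and `jaro_early_return("a", "b")` yields `Some(0.0)`, matching the tests added in this commit.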


@ -1,8 +1,15 @@
# Change Log
This project attempts to adhere to [Semantic Versioning](http://semver.org).
## [Unreleased]
+### Fixed
+- Fix Jaro and Jaro-Winkler when the arguments have lengths of 1 and are equal.
+Previously, the functions would erroneously return 0 instead of 1. Thanks to
+[@vvrably](https://github.com/vvrably) for pointing out the issue.
## [0.9.2] - (2019-05-09)
### Changed
@ -25,14 +32,19 @@ This project attempts to adhere to [Semantic Versioning](http://semver.org).
- Generic distance functions (thanks [@lovasoa](https://github.com/lovasoa))
## [0.8.0] - (2018-08-19)
### Added
- Normalized versions of Levenshtein and Damerau-Levenshtein (thanks [@gentoid](https://github.com/gentoid))
## [0.7.0] - (2018-01-17)
### Changed
- Faster Levenshtein implementation (thanks [@wdv4758h](https://github.com/wdv4758h))
### Removed
- Remove the "against_vec" functions. They are one-liners now, so they don't
seem to add enough value to justify making the API larger. I didn't find
anybody using them when I skimmed through a GitHub search. If you do use them,
@ -42,83 +54,117 @@ let distances = strings.iter().map(|a| jaro(target, a)).collect();
```
## [0.6.0] - (2016-12-26)
### Added
- Add optimal string alignment distance
### Fixed
- Fix Damerau-Levenshtein implementation (previous implementation was actually
optimal string alignment; see this [Damerau-Levenshtein explanation])
## [0.5.2] - (2016-11-21)
### Changed
- Remove Cargo generated documentation in favor of a [docs.rs] link
## [0.5.1] - (2016-08-23)
### Added
- Add Cargo generated documentation
### Fixed
- Fix panic when Jaro or Jaro-Winkler are given strings both with a length of
one
## [0.5.0] - (2016-08-11)
### Changed
- Make Hamming faster (thanks @IBUzPE9) when the two strings have the same
length but slower when they have different lengths
## [0.4.1] - (2016-04-18)
### Added
- Add Vagrant setup for development
- Add AppVeyor configuration for Windows CI
### Fixed
- Fix metrics when given strings with multibyte characters (thanks @WanzenBug)
## [0.4.0] - (2015-06-10)
### Added
- For each metric, add a function that takes a vector of strings and returns a
vector of results (thanks @ovarene)
## [0.3.0] - (2015-04-30)
### Changed
- Remove usage of unstable Rust features
## [0.2.5] - (2015-04-24)
### Fixed
- Remove unnecessary `Float` import from doc tests
## [0.2.4] - (2015-04-15)
### Fixed
- Remove unused `core` feature flag
## [0.2.3] - (2015-04-01)
### Fixed
- Remove now unnecessary `Float` import
## [0.2.2] - (2015-03-29)
### Fixed
- Remove usage of `char_at` (marked as unstable)
## [0.2.1] - (2015-02-20)
### Fixed
- Update bit vector import to match Rust update
## [0.2.0] - (2015-02-19)
### Added
- Implement Damerau-Levenshtein
- Add tests in docs
## [0.1.1] - (2015-02-10)
### Added
- Configure Travis for CI
- Add rustdoc comments
### Fixed
- Limit Jaro-Winkler return value to a maximum of 1.0
- Fix float comparisons in tests
## [0.1.0] - (2015-02-09)
### Added
- Implement Hamming, Jaro, Jaro-Winkler, and Levenshtein
[Unreleased]: https://github.com/dguo/strsim-rs/compare/0.9.2...HEAD


@ -72,9 +72,11 @@ pub fn generic_jaro<'a, 'b, Iter1, Iter2, Elem1, Elem2>(a: &'a Iter1, b: &'b Ite
// The check for lengths of one here is to prevent integer overflow when
// calculating the search range.
if a_len == 0 && b_len == 0 {
-return 1.0
-} else if a_len == 0 || b_len == 0 || (a_len == 1 && b_len == 1) {
+return 1.0;
+} else if a_len == 0 || b_len == 0 {
return 0.0;
+} else if a_len == 1 && b_len == 1 {
+return if a.into_iter().eq(b.into_iter()) { 1.0 } else { 0.0 };
}
let search_range = (max(a_len, b_len) / 2) - 1;
@ -491,6 +493,11 @@ mod tests {
assert_eq!(0.0, jaro("a", "b"));
}
+#[test]
+fn jaro_same_one_character() {
+assert_eq!(1.0, jaro("a", "a"));
+}
#[test]
fn generic_jaro_diff() {
assert_eq!(0.0, generic_jaro(&[1, 2], &[3, 4]));
@ -561,6 +568,11 @@ mod tests {
assert_eq!(0.0, jaro_winkler("a", "b"));
}
+#[test]
+fn jaro_winkler_same_one_character() {
+assert_eq!(1.0, jaro_winkler("a", "a"));
+}
#[test]
fn jaro_winkler_diff_no_transposition() {
assert!((0.840 - jaro_winkler("dwayne", "duane")).abs() < 0.001);
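
The comment in `generic_jaro` says the length-one check prevents integer overflow when calculating the search range. A minimal sketch of why, assuming the lengths are `usize` values: for two inputs of length one, `max(a_len, b_len) / 2` is `0`, so subtracting `1` would underflow. The helper name `search_range` is hypothetical and not part of the crate; it uses `checked_sub` so the failing case shows up as `None` instead of a panic or a wrapped value.

```rust
use std::cmp::max;

/// Hypothetical helper, not part of strsim: computes the Jaro search range,
/// `(max(a_len, b_len) / 2) - 1`, and returns `None` if the subtraction
/// would underflow `usize`.
fn search_range(a_len: usize, b_len: usize) -> Option<usize> {
    (max(a_len, b_len) / 2).checked_sub(1)
}
```

With this sketch, `search_range(1, 1)` is `None`, while `search_range(6, 5)` (the lengths of "dwayne" and "duane") is `Some(2)`. That is why the length-one case must return before the search range is computed.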