Comparing String Differences in C++


In some scenarios we need to check whether two strings differ only in a single numeric field, for example:


varies_in_single_number_field('foo7bar', 'foo123bar')
# Returns True, because 7 != 123 and only one numeric field differs.

In Python, the difflib library can be used to compare strings, which helps determine whether two strings differ only in a numeric field.
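To make the notion of a "numeric field" concrete, here is a minimal sketch that splits a string into alternating runs of digits and non-digits. The helper name split_fields is hypothetical and is used only for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration: splits s into maximal runs of
// digits and maximal runs of non-digits, preserving their order.
std::vector<std::string> split_fields(const std::string &s) {
    std::vector<std::string> fields;
    for (std::size_t i = 0; i < s.size();) {
        // Cast to unsigned char before calling std::isdigit to avoid
        // undefined behavior for negative char values.
        const bool digit = std::isdigit(static_cast<unsigned char>(s[i])) != 0;
        std::size_t j = i;
        while (j < s.size() &&
               (std::isdigit(static_cast<unsigned char>(s[j])) != 0) == digit) {
            ++j;
        }
        fields.push_back(s.substr(i, j - i));  // one complete run
        i = j;
    }
    return fields;
}
```

Under this sketch, "foo7bar" splits into "foo", "7", "bar" and "foo123bar" into "foo", "123", "bar", so the two strings differ only in their middle, numeric field.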

2. Solution

Below is a C++ implementation of the varies_in_single_number_field function:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <cctype>

// Returns true if every character of s is a decimal digit.
// An empty string counts as numeric, so a number field may be absent on one side.
bool is_numeric(const std::string &s) {
    for(std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
        // Cast to unsigned char: passing a negative char to std::isdigit is undefined behavior.
        if(!std::isdigit(static_cast<unsigned char>(*it))) {
            return false;
        }
    }
    return true;
}

bool varies_in_single_number_field(std::string s1, std::string s2) {

    if(s1 == s2) {
        return false;
    }

    // Make s1 the longer string so that s2.length() bounds both scans.
    if(s1.length() < s2.length()) {
        s1.swap(s2);
    }

    // Length of the common prefix.
    size_t prefix = 0;
    while(prefix < s2.length() && s1[prefix] == s2[prefix]) {
        ++prefix;
    }

    // Length of the common suffix, limited so that it cannot overlap the prefix.
    size_t suffix = 0;
    while(suffix < s2.length() - prefix &&
          s1[s1.length() - 1 - suffix] == s2[s2.length() - 1 - suffix]) {
        ++suffix;
    }

    // The strings differ in a single number field only if what remains
    // in the middle of both strings consists solely of digits.
    return is_numeric(s1.substr(prefix, s1.length() - prefix - suffix)) &&
           is_numeric(s2.substr(prefix, s2.length() - prefix - suffix));

}

int main() {
    std::cout << std::boolalpha << varies_in_single_number_field("foo7bar00", "foo123bar00") << std::endl;
    std::cout << std::boolalpha << varies_in_single_number_field("foo7bar00", "foo123bar01") << std::endl;
    std::cout << std::boolalpha << varies_in_single_number_field("foobar00", "foo123bar00") << std::endl;
    std::cout << std::boolalpha << varies_in_single_number_field("foobar00", "foobar00") << std::endl;
    std::cout << std::boolalpha << varies_in_single_number_field("7aaa", "aaa") << std::endl;
    std::cout << std::boolalpha << varies_in_single_number_field("aaa7", "aaa") << std::endl;
    std::cout << std::boolalpha << varies_in_single_number_field("aaa", "7aaa") << std::endl;
    std::cout << std::boolalpha << varies_in_single_number_field("aaa", "aaa7") << std::endl;
}

The function works as follows:

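  1. First, compare the lengths of the two strings; if they differ, swap them so that the longer string ends up in s1.
  2. Then scan the two strings from left to right and from right to left until the first mismatching character is found in each direction.
  3. Finally, check whether the differing section between those two positions consists only of digits; if so, return true, otherwise return false.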
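The two scans in step 2 can also be expressed with std::mismatch from the standard library. The following is a minimal sketch under that assumption, not the implementation above; the helper name differing_middle is hypothetical. It returns the section of each string that remains once the common prefix and common suffix are stripped.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper for illustration: returns the parts of a and b that
// remain after removing their common prefix and common suffix.
std::pair<std::string, std::string> differing_middle(const std::string &a,
                                                     const std::string &b) {
    const std::size_t shorter = std::min(a.size(), b.size());

    // Left-to-right scan: length of the common prefix.
    const std::size_t prefix = static_cast<std::size_t>(
        std::mismatch(a.begin(), a.begin() + shorter, b.begin()).first - a.begin());

    // Right-to-left scan: length of the common suffix, bounded so that
    // it cannot overlap the prefix.
    const std::size_t limit = shorter - prefix;
    const std::size_t suffix = static_cast<std::size_t>(
        std::mismatch(a.rbegin(), a.rbegin() + limit, b.rbegin()).first - a.rbegin());

    return std::make_pair(a.substr(prefix, a.size() - prefix - suffix),
                          b.substr(prefix, b.size() - prefix - suffix));
}
```

For "foo7bar00" and "foo123bar00" this yields the pair "7" and "123"; step 3 then only needs to verify that both parts contain nothing but digits.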

Here are some test cases for the function:

varies_in_single_number_field("foo7bar00", "foo123bar00")
# True

varies_in_single_number_field("foo7bar00", "foo123bar01")
# False

varies_in_single_number_field("foobar00", "foo123bar00")
# True

varies_in_single_number_field("foobar00", "foobar00")
# False

varies_in_single_number_field("7aaa", "aaa")
# True

varies_in_single_number_field("aaa7", "aaa")
# True

varies_in_single_number_field("aaa", "7aaa")
# True

varies_in_single_number_field("aaa", "aaa7")