Stanford CS106B 2022 Assignment 7: The Great Stanford Hash-Off, Explained


 Part Three: Implementing Linear Probing

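The idea behind linear probing is fairly simple. I recommend reading the source of HashFunction.h first, so you understand how calling an instance of the hash class with a value of type T returns an int. The details are annotated in the source below.

The C++ features involved are: class templates (generics); the explicit keyword, which forbids implicit conversions; std::function, which lets a function be stored and passed around as a value; and the function-call operator, which lets an instance be called like a function. Here, calling the instance with a value of type T returns its hash value, and internally the stored callback does the actual computation.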

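//A class template. Its definition uses the placeholder T, which can stand for any data type; concrete classes are created from the template as needed.
//For example, instantiating it as HashFunction<int> gives a class that handles integers, and HashFunction<string> gives one that handles strings.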
template <typename T> class HashFunction {
public:
    /**
     * Constructs a new HashFunction for the given number of slots. Each hash
     * function constructed this way will be initialized randomly.
     *
     * The second argument is a random seed. Setting this value is
     * useful if you'd like to have your hash function behave consistently
     * across runs of the program.
     */
    //explicit forbids implicit conversions; this parameterized constructor is how a hash function is explicitly configured
    explicit HashFunction(int numSlots, int randomSeed = 0);

    /**
     * Constructs a new HashFunction. This HashFunction cannot be used because
     * it won't have been initialized with a number of buckets, and trying to
     * use it will cause a runtime error.
     *
     * You shouldn't directly use this constructor; it's only here so that
     * you can declare variables of type HashFunction and initialize them
     * later.
     */
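    //The default constructor lets you declare a HashFunction now and initialize it later; using it before then reports an error (see the implementation)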
    HashFunction();

    /**
     * Constructs a hash function that specifically uses the underlying raw
     * hash code as its hash function. This is useful if you want to guarantee
     * predictable values for your hash function when testing.
     */
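    //A static member function can be called through the class name, without creating an object of the class
    //HashFunction: the return type, meaning that wrap returns a HashFunction object
    /*
wrap lets the caller supply a custom hash function in place of the default one. This is very
useful when testing and debugging, because it gives you a predictable hash function.
The supplied function must match a specific signature: it takes a const T & parameter
(T is the element type stored in the hash table) and returns an integer.
That function is then used to map elements to slots in the hash table.
    */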
    static HashFunction wrap(int numSlots,
                             std::function<int (const T &)> hashFn);
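/*
 * std::function<int (const T &)> denotes a callable object that takes a const T & parameter
 * and returns an int. T is a generic type here and can be any data type,
 * such as an integer or a string. The template argument of std::function specifies the
 * callable's signature, that is, its parameter types and return type.
*/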
    /**
     * Returns the number of slots this hash function is designed to operate
     * over.
     */
    int numSlots() const;

    /**
     * Applies the hash function to the specified argument. The syntax for
     * using this function is
     *
     *     hashFn(argument)
     *
     * That is, you'll treat the variable of type HashFunction as though it's
     * an honest-to-goodness function rather than a variable of some type.
     */
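    //Declares a function-call operator, operator(), so that the object can be called like a function
    /*
int: the return type of the function-call operator; it returns an integer, the hash value.

operator(): the name of the function-call operator; it is what makes the object callable like a function.

(const T& argument): the parameter list. It takes a parameter of type T (usually an element of the hash table) by const reference, so the argument is not modified during the call.
    Once a hash function object has been set up, calling it with a T goes through this method and returns the hash value.
*/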
    int operator() (const T& argument) const;

private:
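    std::function<int(const T&)> callback;//a std::function object that stores the function used to compute hashes (either the default or a user-supplied one)
    int mNumSlots;//the number of slots in the hash table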

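    /*
static_assert(stanfordcpplib::collections::IsHashable<T>::value, ...);
This is a static_assert, which checks at compile time whether a condition is true. Specifically,
it checks whether the type T meets the requirements of stanfordcpplib::collections::IsHashable.
If it does not, compilation fails and the error message is displayed.

stanfordcpplib::collections::IsHashable is a type trait, a template metaprogramming technique. It checks whether a type supports hashing, that is, whether it can be hashed.

Because static_assert is evaluated at compile time, it guarantees that only hashable types
can be used as the template parameter T of HashFunction.

The message "Oops! You've tried to make a HashFunction for a type that isn't hashable. ..."
is a friendly compile-time error that tells developers they tried to create a HashFunction
for a non-hashable type and points them to further details.
    */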
    static_assert(stanfordcpplib::collections::IsHashable<T>::value,
                  "Oops! You've tried to make a HashFunction for a type that isn't hashable. "
                  "Double-click this error message for more details.");

    /*
     * Hello CS106 students! If you got directed to this line of code in a compiler error,
     * it probably means that you tried making a HashFunction<T> with a custom struct or
     * class type.
     *
     * In order to have a HashFunction<T> for a type T, the type T needs to have a hashCode
     * function defined and be capable of being compared using the == operator. If you were
     * directed here, one of those two conditions wasn't met.
     *
     * There are two ways to fix this. The first option would simply be to not use your custom
     * type in conjunction with HashFunction<T>. This is probably the easiest option.
     *
     * The second way to fix this is to explicitly define a hashCode() and operator== function
     * for your type. To do so, first define hashCode as follows:
     *
     *     !!! Use the stanfordcpplib::collections::hashCode function to help compute the hash value !!!
     *
     *     int hashCode(const YourCustomType& obj) {
     *         return stanfordcpplib::collections::hashCode(obj.data1, obj.data2, ..., obj.dataN);
     *     }
     *
     * where data1, data2, ... dataN are the data members of your type. For example, if you had
     * a custom type
     *
     *     struct MyType {
     *         int myInt;
     *         string myString;
     *     };
     *
     * you would define the function
     *
     *     int hashCode(const MyType& obj) {
     *         return stanfordcpplib::collections::hashCode(obj.myInt, obj.myString);
     *     }
     *
     * Second, define operator== as follows:
     *
     *     bool operator== (const YourCustomType& lhs, const YourCustomType& rhs) {
     *         return lhs.data1 == rhs.data1 &&
     *                lhs.data2 == rhs.data2 &&
     *                         ...
     *                lhs.dataN == rhs.dataN;
     *     }
     *
     * Using the MyType example from above, we'd write
     *
     *     bool operator== (const MyType& lhs, const MyType& rhs) {
     *         return lhs.myInt == rhs.myInt && lhs.myString == rhs.myString;
     *     }
     *
     * Hope this helps!
     */
};

namespace hashfunction_detail {
    std::function<int(int)> tabulationHashFunction(int seed);
}

/* * * * * Implementation Below This Point * * * * */
template <typename T>
HashFunction<T>::HashFunction(int numSlots, int seed) {
    if (numSlots <= 0) {
        error("HashFunction<T>::wrap(): numSlots must be positive.");
    }

    auto scrambler = hashfunction_detail::tabulationHashFunction(seed);
    mNumSlots = numSlots;
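    /*
callback is assigned a lambda that takes one parameter, const T& key, where key is the input to hash.

[scrambler, numSlots]: the lambda's capture list. It tells the lambda to capture two variables,
scrambler and numSlots, which can then be used directly inside the lambda without being passed explicitly.
const T& key: the lambda's parameter list. The lambda takes a const reference parameter, key,
which is the input whose hash value is computed.
    */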
    callback = [scrambler, numSlots](const T& key) {
        return (scrambler(hashCode(key)) & 0x7FFFFFF) % numSlots;
    };
}

template <typename T>
HashFunction<T> HashFunction<T>::wrap(int numSlots,
                                      std::function<int (const T&)> hashFn) {
    if (numSlots <= 0) {
        error("HashFunction<T>::wrap(): numSlots must be positive.");
    }

    HashFunction result;
    result.callback = [hashFn, numSlots] (const T& key) {
        return (0x7FFFFFFF & hashFn(key)) % numSlots;
    };
    result.mNumSlots = numSlots;

    return result;
}

template <typename T> int HashFunction<T>::numSlots() const {
    return mNumSlots;
}

/* Default constructor sets up a hash function that always reports an error. */
template <typename T> HashFunction<T>::HashFunction() {
    //a lambda expression that reports an error whenever it is called
    callback = [](const T&) -> int {
        error("Attempted to use an uninitialized HashFunction object.");
    };
    mNumSlots = 0;
}

/* Call operator forwards to the callback. */
template <typename T> int HashFunction<T>::operator()(const T& arg) const {
    return callback(arg);
}

#endif
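To make the "object that can be called like a function" idea concrete, here is a minimal sketch of the same pattern outside the Stanford library. MiniHash and lengthHash are hypothetical names I made up for illustration; this is not the real HashFunction<T>, it only imitates the shape of wrap and operator() under the assumption that the raw hash is reduced with a mask and a modulo, as in the source above.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for HashFunction<T>: it stores a callback and
// exposes it through the function-call operator.
template <typename T>
class MiniHash {
public:
    // Imitates the idea of wrap(): take a raw hash function and reduce
    // its result into the range [0, numSlots).
    static MiniHash wrap(int numSlots, std::function<int(const T&)> rawHash) {
        assert(numSlots > 0);
        MiniHash result;
        result.slots = numSlots;
        result.callback = [rawHash, numSlots](const T& key) {
            // Clearing the sign bit keeps the remainder non-negative.
            return (rawHash(key) & 0x7FFFFFFF) % numSlots;
        };
        return result;
    }

    int numSlots() const { return slots; }

    // Function-call operator: lets callers write hashFn(key).
    int operator()(const T& key) const { return callback(key); }

private:
    MiniHash() = default;
    std::function<int(const T&)> callback;
    int slots = 0;
};

// A deliberately simple, predictable raw hash: the length of the string.
inline int lengthHash(const std::string& s) {
    return static_cast<int>(s.size());
}
```

With auto hashFn = MiniHash<std::string>::wrap(5, lengthHash);, the call hashFn("abcdefg") goes through operator(), which forwards to the stored lambda and yields 7 % 5.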

Implementing the linear probing table is fairly straightforward. Here is my implementation for reference.

LinearProbingHashTable::LinearProbingHashTable(HashFunction<string> hashFn) {
    // One slot per hash bucket; every slot starts out EMPTY.
    allocatedSize = hashFn.numSlots();
    logicalSize = 0; // redundant if the header already initializes it, but harmless
    elems = new Slot[allocatedSize];
    for (int i = 0; i < allocatedSize; i++) {
        elems[i].type = SlotType::EMPTY;
    }
    this->hashFn = hashFn;
}

LinearProbingHashTable::~LinearProbingHashTable() {
    delete[] elems;
}

// const on a member function declares that it does not modify the object's state
bool LinearProbingHashTable::isEmpty() const {
    return logicalSize == 0;
}

int LinearProbingHashTable::size() const {
    return logicalSize;
}

bool LinearProbingHashTable::contains(const std::string& key) const {
    // Start at the key's home slot and probe forward, wrapping around at the end.
    //   EMPTY:     the key cannot be further along the chain, so return false.
    //   FILLED:    compare with the key; return true on a match, otherwise keep probing.
    //   TOMBSTONE: do nothing and keep probing.
    int pos = hashFn(key);
    // Probe at most allocatedSize slots, so a table with no EMPTY slot still terminates.
    for (int count = 0; count < allocatedSize; count++) {
        if (elems[pos].type == SlotType::EMPTY) {
            return false;
        }
        if (elems[pos].type == SlotType::FILLED && elems[pos].value == key) {
            return true;
        }
        pos = (pos + 1) % allocatedSize;
    }
    return false;
}

bool LinearProbingHashTable::insert(const std::string& key) {
    // Inserts the given element and returns whether it was actually inserted.
    // If the element is already present, the table is left unchanged and false is returned.
    // If there is no room left (every slot is FILLED), false is returned as well.
    if (contains(key) || logicalSize == allocatedSize) {
        return false;
    }
    // logicalSize < allocatedSize guarantees an EMPTY or TOMBSTONE slot, so this loop ends.
    int pos = hashFn(key);
    while (elems[pos].type == SlotType::FILLED) {
        pos = (pos + 1) % allocatedSize;
    }
    elems[pos].value = key;
    elems[pos].type = SlotType::FILLED;
    logicalSize++;
    return true;
}


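bool LinearProbingHashTable::remove(const string& elem) {
    int pos = hashFn(elem);
    // Same scan as contains(), done once: stop at EMPTY, skip TOMBSTONE.
    for (int count = 0; count < allocatedSize; count++) {
        if (elems[pos].type == SlotType::EMPTY) {
            return false;
        }
        // Only FILLED slots count: a TOMBSTONE still holds its old value.
        if (elems[pos].type == SlotType::FILLED && elems[pos].value == elem) {
            elems[pos].type = SlotType::TOMBSTONE;
            logicalSize--;
            return true;
        }
        pos = (pos + 1) % allocatedSize;
    }
    return false;
}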
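To see why remove marks a slot as TOMBSTONE instead of EMPTY, here is a small self-contained sketch that does not depend on the assignment's starter code. TinyProbingTable, ProbeSlot, and SlotState are hypothetical names for illustration, and the hash function is passed in as a plain std::function (an assumption made so the example can force collisions).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class SlotState { Empty, Filled, Tombstone };

struct ProbeSlot {
    std::string value;
    SlotState state = SlotState::Empty;
};

// Hypothetical miniature linear probing table, for illustration only.
class TinyProbingTable {
public:
    TinyProbingTable(int numSlots, std::function<int(const std::string&)> hash)
        : hashFn(std::move(hash)) {
        assert(numSlots > 0);
        slots.resize(numSlots);
    }

    bool contains(const std::string& key) const { return find(key) != -1; }

    bool insert(const std::string& key) {
        if (count == capacity() || contains(key)) return false;
        int pos = home(key);
        // Reuse the first slot that is not Filled (Empty or Tombstone).
        while (slots[pos].state == SlotState::Filled) pos = (pos + 1) % capacity();
        slots[pos].value = key;
        slots[pos].state = SlotState::Filled;
        ++count;
        return true;
    }

    bool remove(const std::string& key) {
        int pos = find(key);
        if (pos == -1) return false;
        // Marking the slot Empty here would cut the probe chain in two.
        slots[pos].state = SlotState::Tombstone;
        --count;
        return true;
    }

private:
    int capacity() const { return static_cast<int>(slots.size()); }
    int home(const std::string& key) const {
        return (hashFn(key) & 0x7FFFFFFF) % capacity();
    }

    // Returns the index of the Filled slot holding key, or -1 if absent.
    int find(const std::string& key) const {
        int pos = home(key);
        for (int probes = 0; probes < capacity(); ++probes) {
            if (slots[pos].state == SlotState::Empty) return -1;
            if (slots[pos].state == SlotState::Filled && slots[pos].value == key) return pos;
            pos = (pos + 1) % capacity();
        }
        return -1;
    }

    std::vector<ProbeSlot> slots;
    std::function<int(const std::string&)> hashFn;
    int count = 0;
};
```

If every key hashes to slot 0 and you insert "a" and then "b", removing "a" leaves a tombstone in slot 0. A later contains("b") skips over that tombstone and still finds "b" in slot 1; had the slot been reset to Empty, the search would have stopped at slot 0 and wrongly reported "b" as missing.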

Part Five: Robin Hood Hashing 

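I could not follow this part from the handout's description alone, but the lecture slides make the implementation clear, since they lay out the approach step by step. When implementing removal, be sure to handle the boundary case, that is, wrapping around when the scan reaches the end of the array.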

RobinHoodHashTable::RobinHoodHashTable(HashFunction<string> hashFn) {
    // One slot per hash bucket; a distance of EMPTY_SLOT marks an unused slot.
    allocatedSize = hashFn.numSlots();
    logicalSize = 0; // redundant if the header already initializes it, but harmless
    elems = new Slot[allocatedSize];
    for (int i = 0; i < allocatedSize; i++) {
        elems[i].distance = EMPTY_SLOT;
    }
    this->hashFn = hashFn;
}

RobinHoodHashTable::~RobinHoodHashTable() {
    delete[] elems;
}

int RobinHoodHashTable::size() const {
    return logicalSize;
}

bool RobinHoodHashTable::isEmpty() const {
    return logicalSize == 0;
}

bool RobinHoodHashTable::insert(const string& elem) {
    if (contains(elem) || logicalSize == allocatedSize) {
        return false;
    }
    int pos = hashFn(elem);
    int distance = 0;    // how far the carried element is from its home slot
    string value = elem; // the element currently being carried
    while (true) {
        if (elems[pos].distance == EMPTY_SLOT) {
            elems[pos].value = value;
            elems[pos].distance = distance;
            logicalSize++;
            return true;
        }
        if (elems[pos].distance < distance) {
            // The resident is closer to home than the carried element:
            // swap them and carry the displaced resident onward. Ties keep the resident.
            string tempvalue = elems[pos].value;
            int tempdistance = elems[pos].distance;
            elems[pos].value = value;
            elems[pos].distance = distance;
            value = tempvalue;
            distance = tempdistance;
        }
        pos = (pos + 1) % allocatedSize;
        distance++;
    }
}

bool RobinHoodHashTable::contains(const string& elem) const {
    int pos = hashFn(elem);
    int distance = 0;
    while (true) {
        if (elems[pos].distance == EMPTY_SLOT) {
            return false;
        }
        if (elems[pos].value == elem) {
            return true;
        }
        // If elem were present, it would have displaced this closer-to-home resident.
        // This check also ends the loop when the table is full.
        if (distance > elems[pos].distance) {
            return false;
        }
        pos = (pos + 1) % allocatedSize;
        distance++;
    }
}

bool RobinHoodHashTable::remove(const string& elem) {
    if (!contains(elem)) {
        return false;
    }
    // contains() succeeded, so every slot from the home slot up to elem is occupied.
    int pos = hashFn(elem);
    while (elems[pos].value != elem) {
        pos = (pos + 1) % allocatedSize;
    }
    // Backward-shift deletion: pull each following element back by one slot until
    // we reach an empty slot or an element already in its home slot (distance 0).
    // Taking next modulo allocatedSize handles the wrap-around at the array's end.
    int next = (pos + 1) % allocatedSize;
    while (elems[next].distance != EMPTY_SLOT && elems[next].distance != 0) {
        elems[pos].value = elems[next].value;
        elems[pos].distance = elems[next].distance - 1;
        pos = next;
        next = (next + 1) % allocatedSize;
    }
    elems[pos].distance = EMPTY_SLOT;
    logicalSize--;
    return true;
}
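The displacement rule on insert and the backward shift on removal are easier to follow with concrete slot contents, so here is a self-contained sketch you can step through by hand. TinyRobinHoodTable, RhSlot, and distanceOf are hypothetical names for illustration, and I assume empty slots are marked with a distance of -1; the assignment's EMPTY_SLOT constant may be defined differently.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature Robin Hood table, for illustration only.
class TinyRobinHoodTable {
public:
    TinyRobinHoodTable(int numSlots, std::function<int(const std::string&)> hash)
        : hashFn(std::move(hash)) {
        assert(numSlots > 0);
        slots.resize(numSlots);
    }

    bool contains(const std::string& key) const { return find(key) != -1; }

    // Distance of key from its home slot, or -1 if key is absent.
    int distanceOf(const std::string& key) const {
        int pos = find(key);
        return pos == -1 ? -1 : slots[pos].distance;
    }

    bool insert(std::string key) {
        if (count == capacity() || find(key) != -1) return false;
        int pos = home(key);
        int dist = 0;
        while (slots[pos].distance != kEmpty) {
            if (slots[pos].distance < dist) {
                // The resident is closer to home: it gives up its slot.
                std::swap(slots[pos].value, key);
                std::swap(slots[pos].distance, dist);
            }
            pos = (pos + 1) % capacity();
            ++dist;
        }
        slots[pos].value = std::move(key);
        slots[pos].distance = dist;
        ++count;
        return true;
    }

    bool remove(const std::string& key) {
        int hole = find(key);
        if (hole == -1) return false;
        int next = (hole + 1) % capacity();
        // Stop at an empty slot (-1) or an element already at home (0).
        while (slots[next].distance > 0) {
            slots[hole].value = std::move(slots[next].value);
            slots[hole].distance = slots[next].distance - 1;
            hole = next;
            next = (next + 1) % capacity();
        }
        slots[hole].distance = kEmpty;
        --count;
        return true;
    }

private:
    static constexpr int kEmpty = -1; // assumed marker for an empty slot
    struct RhSlot {
        std::string value;
        int distance = kEmpty;
    };

    int capacity() const { return static_cast<int>(slots.size()); }
    int home(const std::string& key) const {
        return (hashFn(key) & 0x7FFFFFFF) % capacity();
    }

    // Returns the index of key, or -1 if it is absent.
    int find(const std::string& key) const {
        int pos = home(key);
        for (int dist = 0; dist < capacity(); ++dist) {
            const RhSlot& s = slots[pos];
            if (s.distance == kEmpty || s.distance < dist) return -1;
            if (s.value == key) return pos;
            pos = (pos + 1) % capacity();
        }
        return -1;
    }

    std::vector<RhSlot> slots;
    std::function<int(const std::string&)> hashFn;
    int count = 0;
};
```

For example, with five slots and a hash that maps a key to its first letter minus 'a', inserting "a1", "a2", "b1", "a3" in that order ends with "a3" displacing "b1" from slot 2, because "b1" was only one step from home while "a3" had already travelled two. Removing "a1" afterwards shifts the three remaining keys back by one slot each, so "b1" ends up one step from home again.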