oi~,让我告诉你如何实现哈希表

Ne0inhk

15 Mar 2026 — 19 min read

1.哈希概念

哈希（hash）又称散列，是一种组织数据的方式。从译名来看有散乱排列的意思，其本质是通过哈希函数将关键字key值与存储位置建立映射关系，查找时通过哈希函数计算出key值的位置，实现快速查找。

1.1直接定址法

当关键字key值比较集中时，直接定址法是最简单也是最有效的方法。比如一组数据集中在[ 0 - 99 ]，我们开一组大小为100的数组即可，数据直接作为下标来定位存储位置。再比如一组数据集中在[ a - z]，我们开一个大小为26的数组即可，让数据减去字符‘a'的结果来定位存储位置。
例题：387. 字符串中的第一个唯一字符 - 力扣（LeetCode）

class Solution { public: int firstUniqChar(string s) { int arr[26]={0}; //范围for 遍历 for(auto e: s) { arr[e-'a']++; } for (int i = 0; i < s.size(); i++) { if (arr[s[i] - 'a'] == 1) { return i; } } return -1; } };

1.2哈希冲突

直接定址法适用于数据比较集中的情况，当数据过于分散时这意味这我们将会开一个过大的数组来存储，而这会极大的浪费内存空间，不可取。

所以当面对这种情况的时候，我们会使用哈希函数:h(key)，通过哈希函数对关键字key值在存储空间的映射，来定位key值的存储位置。而这里存在一个问题：不同的值通过哈希函数的映射后的位置可能相同，这就会导致哈希冲突（哈希碰撞）。我们可以通过设计一个好的哈希函数来减少哈希冲突，但是在实际情况下，哈希冲突是不可避免的。

1.3负载因子

假设存储空间的大小为M，已经放入存储空间的数据个数是N，那么负载因子就是：N/M。负载因子也称载荷因子。负载因子越大，哈希冲突概率越大，空间利用率越高。负载因子越小，哈希冲突概率越小，空间利用率越低。

1.4哈希函数

1.4.1除法散列法/除留余数法

顾名思义就是通过取余，其余值就是映射的存储位置。假设存储空间的大小为M，关键字为key，哈希函数h(key)=key%M。

使用除法散列法时，对M有一个要求：要求M不能为2的幂、10的幂。假设M为 2^3，这相当于直接保留了2进制中的后3位，那么只要key值的后3位相同那么就一定冲突，例如：3，11，19，27......，它们二进制的后三位都是011，所以它们同时取余M都等于3。所以当M为2的幂时会导致很高的冲突概率，不可取。同理当M为10的幂时，假设为10^3，这相当于保留了10进制中的后3位，那么只要key值的后3位相同就一定冲突，例如：1001，2001，3001.....，所以也不可取。

综上所述，建议M的取值为不接近2的幂的值（素数）

但在实际操作中也有存在直接去2的幂，10的幂的情况。比如java中实现的哈希表就是直接取的2的幂。当然他不仅仅的取余这么简单，还有其他操作保证了较低的冲突概率。比如M为2^16，先取余得到2进制的后16位，再将key>>16，将取余的结果和右移的结果做异或运算。这样的操作将每个数都参与的运算，使其分布的更加的均匀。（了解，在后续的代码实现中我们还是以取素数为例）

1.4.2乘法散列法（了解）

使用乘法散列法对M没用要求，其大体思路为：关键值key乘上一个常量A(0<A<1)，然后在对其结果取余，得到小数部分。再将M乘上小数部分，再向下取整，得到最终结果。哈希函数为：h(key)= floor( M*( ( key*A ) %1.0) ) ,floor表示向下取整

A的取值范围为0~1，结论表明A的取值为：(根号5 - 1)/2 = 0.6180339887....（黄金分割点）比较好

1.5将关键字转化为整型

在哈希函数里面我们说到了，要通过哈希函数将key值映射到存储位置。不管是那种方法，我们可以看到key值都要参与具体的运算的。但是如果key不为整型时，这么办？

这里我们就需要将key值转化为整型，如果为负数、浮点数等等就强制类型转化为整型。如果为string类型，可以通过将每个字符的ascll码相加，来得到整型。其他的都类似，通过映射的值来转化为整型。

1.6解决哈希冲突

哈希表中使用的哈希函数主要还是除法散列法，但对于哈希冲突，不论选择什么哈希函数都不可避免。所以我们需要知道可以解决哈希冲突的方法，主要分为两种：开放地址法、链地址法。

1.6.1开放地址法

在开放地址法中，所有元素都存于哈希表中，当通过哈希函数映射key的存储位置发生冲突时，按照一定规则去找下一个空白位置进行存储。开放地址法的负载因子一定是小于1的。这里的规则一共有3种：线性探测、二次探测、双重探测（对于开放地址法，不论什么规则都是治标不治本，这里就主要讲讲线性探测，后面我会详细解释为什么说是治标不治本）

线性探测

发生冲突时，从映射位置开始向后遍历，找到第一个空白位置时就存储，当找到表尾就返回到表头继续向后找。

下面演示 {19,30,5,36,13,20,21,12} 等这一组值映射到M=11的表中

h(19) = 8，h(30) = 8，h(5) = 5，h(36) = 3，h(13) = 2，h(20) = 9，h(21) = 10，h(12) = 1

我们可以看到30与19发生冲突，30存储到了9这个位置。随后20与30也发生了冲突，20存储到了10这个位置。21又与20发生冲突，21返回到表头存储到了0这个位置。

这就是开发地址法的弊端，发生了冲突就去占别人的位置。当别人来找自己的位置时发现被占用了，又只能去占用其他人的位置。由此不断的恶性循环，这种现象叫做群集/堆积，会极大的影响插入和查找效率。

而二次探测法和双重探测法，都只是去减小群集效应，并不能真正的解决这个弊端。链地址法没有这个弊端，是相对而言更优的方法，也是我们实际上更常用的方法。

开放地址法的代码实现

哈希表的基本结构

//通过枚举哈希表的节点状态，以便于我们后面更好的管理 enum State { EXIST, EMPTY, DELETE }; template<class K, class V> struct HashData { pair<K, V> _kv; State _state = EMPTY; }; template<class K, class V> class HashTable { private: vector<HashData<K, V>> _tables; size_t _n = 0; // 表中存储数据个数 };

当我们删除节点时，为了不影响后续数据的查找，这里我们枚举哈希表节点的状态。如图，如果这里我们删除30这个值，后续查找时发现20这个值冲突，根据插入规则，我们会去往后面找，当找到空时查找停止，查找20就会失败。我们不用真的去删除30，只需将状态 EXIST修改为 DELETE，并不影响为 EMPTY时查找停止的条件

扩容

这里我们将负载因子控制到0.7左右，当负载因子超过0.7时我们就需要扩容。前面我们提到M的大小建议为素数，但扩容我们一般是扩2倍，扩2倍之后M就不再为素数了，这怎么办呢？这里c++库的解决方法是直接列出素数表

inline unsigned long __stl_next_prime(unsigned long n) { // Note: assumes long is at least 32 bits. static const int __stl_num_primes = 28; static const unsigned long __stl_prime_list[__stl_num_primes] = { 53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317, 196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843, 50331653, 100663319, 201326611, 402653189, 805306457, 1610612741, 3221225473, 4294967291 }; const unsigned long* first = __stl_prime_list; const unsigned long* last = __stl_prime_list + __stl_num_primes; const unsigned long* pos = lower_bound(first, last, n); return pos == last ? *(last - 1) : *pos; }

扩容后，我们需要重新遍历原表，重新计算映射位置，重新插入元数据。因为扩容后，M的大小改变了，通过哈希函数计算出的映射位置也改变了

如果key不能取余的问题

这里我们可以传一个仿函数来实现将key值转化为整型的功能，对于负数、浮点数这种可以直接转化为整型的类型，我们可以直接将其写成一个仿函数并作为缺省值传给模板，这样我们初始化的时候，就可以不用反复写仿函数了。

对于string类型，将每个字符的ascll码相加后返回，鉴于string类作为key就是很常用的情况，所以这里我们可以考虑特化。这样也避免就我们反复的传仿函数。

对于自定义类型，我们就只能自己单独写仿函数，并显示的传仿函数

template<class K> class HashFunc { public: size_t operator()(const K& key) { return (size_t)key; } }; //偏特化（可以单独写成一个仿函数，但是string作为key是常有的情况，为避免反复的传仿函数，写成偏特化） template<> class HashFunc<string> { public: size_t operator()(const string& key) { int ch = 0; for (auto& e : key) { ch += e; ch *= 131; //使用BKDR哈希的思路：每次结果乘131，避免“abcd” “dcba”这种不相同字符，转化为整型却相同的情况 } return ch; } }; template<class K,class V,class Hash = HashFunc<K>> //直接给出缺省值，避免常用的key值，反复传仿函数 class HashTable { public: private: vector<HashData<K, V>> _tables; size_t _n = 0; // 表中存储数据个数 };

完整代码实现

//HashTable.h #include<iostream> #include<vector> using namespace std; //使用开放地址法，实现HashTable //仅实现探测法 inline unsigned long __stl_next_prime(unsigned long n) { // Note: assumes long is at least 32 bits. static const int __stl_num_primes = 28; static const unsigned long __stl_prime_list[__stl_num_primes] = { 53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317, 196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843, 50331653, 100663319, 201326611, 402653189, 805306457, 1610612741, 3221225473, 4294967291 }; const unsigned long* first = __stl_prime_list; const unsigned long* last = __stl_prime_list + __stl_num_primes; const unsigned long* pos = lower_bound(first, last, n); return pos == last ? *(last - 1) : *pos; } //枚举状态 enum STATE { EMPTY, EXIST, DELETE }; template<class K,class V> struct HashDate { pair<K, V> _kv; STATE _state=EMPTY; }; template<class K> class HashFunc { public: size_t operator()(const K& key) { return (size_t)key; } }; //偏特化（可以单独写成一个仿函数，但是string作为key是常有的情况，为避免反复的传仿函数，写成偏特化） template<> class HashFunc<string> { public: size_t operator()(const string& key) { int ch = 0; for (auto& e : key) { ch += e; ch *= 131; //使用BKDR哈希的思路：每次结果乘131，避免“abcd” “dcba”这种不相同字符，转化为整型却相同的情况 } return ch; } }; template<class K,class V,class Hash = HashFunc<K>> //直接给出缺省值，避免常用的key值，反复传仿函数 class HashTable { public: HashTable() :_n(0) , _table(11) {} Hash hash; bool Insert(const pair<K, V>& kv) { //负载因子 控制到0.7左右 if (_n * 10 / _table.size() >= 7) { size_t size = __stl_next_prime(_table.size()); HashTable newHash; newHash._table.resize(size); for (auto& e : _table) { if(e._state==EXIST) newHash.Insert(e._kv); } _table.swap(newHash._table); } //找位置 size_t pos = hash(kv.first) % _table.size(); size_t i = 1; while (_table[pos]._state!= EMPTY)//探测 { pos = (pos + i) % _table.size(); //处理越界情况 i++; } //插入 _table[pos]._kv = kv; _table[pos]._state = EXIST; _n++; return true; } HashDate<K, V>* Find(const K& key) { size_t pos = hash(key) % _table.size(); size_t i = 1; while (_table[pos]._state != EMPTY) { if (_table[pos]._kv.first == key && _table[pos]._state != DELETE) { return &_table[pos]; } pos = (pos + i)% _table.size(); //处理越界情况 i++; } return nullptr; } //删除不是真正的删除，只需要将其状态修改为DELETE即可 bool Erase(const K& key) { HashDate<K, V>* ret = Find(key); if (ret) { ret->_state = DELETE; return true; } else return false; } private: //HashDate包含：pair、节点的状态 vector<HashDate<K, V>> _table; size_t _n;//记录元素个数 }; //test.cpp #include"HashTable.h" //处理key为string的仿函数，但作为常用的key类型，建议直接写成偏特化，避免反复传仿函数 struct StringHashFunc { size_t operator()(const string& key) { int ret = 0; for (auto& e : key) { //采用BKDR哈希思路 ret += e; ret *= 131; } return ret; } }; //日期类(自定义类型要支持 = 符号，用于Find函数) struct Date { bool operator==(Date& key) { return _year == key._year && _month == key._month && _day == key._day; } //重点！！！：这里给缺省值，是为了形成默认构造函数 //template<class K, class V> //struct HashDate //{ // pair<K, V> _kv; // STATE _state = EMPTY; //}; //在头文件的HashDate函数在初始化时，会调用自定义类型的默认构造！如果默认构造不存在就会报错 Date(const int year = 1, const int month = 1, const int day = 1) { _day = day; _month = month; _year = year; } int _day; int _month; int _year; }; //处理key为Date的仿函数 struct DateHashFunc { size_t operator()(const Date& key) { //采用BKDR哈希思路 int ret = 0; ret += key._day; ret *= 131; ret += key._month; ret *= 131; ret += key._year; ret *= 131; return ret; } }; int main() { //测试key为int的情况 HashTable<int,int> hash; for (int i = 0; i < 15; i++) { hash.Insert({ i,i }); } cout << hash.Find(10)<<endl; hash.Erase(10); cout << hash.Find(10)<<endl; //测试key为string的情况 HashTable<string, string,StringHashFunc> hash1; vector<string> arr = { "abc","asd","qwe","fgh" }; for (auto& e : arr) { hash1.Insert({ e,e }); } cout << HashFunc<string>()("abc") <<endl; cout << HashFunc<string>()("asd") << endl; cout << HashFunc<string>()("qwe") << endl; cout << HashFunc<string>()("fgh") << endl; //cout << StringHashFunc()("abc") << endl; //cout << StringHashFunc()("asd") << endl; //cout << StringHashFunc()("qwe") << endl; //cout << StringHashFunc()("fgh") << endl; //测试key为自定义的情况 HashTable<Date, Date, DateHashFunc> hash2; Date r1(2025, 4, 1); Date r2(2025, 1, 4); hash2.Insert({ r1, r1 }); hash2.Insert({ r2,r2 }); }

链地址法

链地址法中，哈希表中的每个节点不再存储某个具体的值，而是存储“链表”。当发生冲突时，不在去寻找，而是直接将其数据“挂”在当前冲突的位置

扩容

链地址法的负载因子我们控制到1左右，超过1就要扩容。而扩容的方法我们也使用上述的素数表。

扩容后，我们也需要将原数据重新映射到新表中。因为扩容后M的大小改变，通过哈希函数映射出的位置也改变了

链地址法代码实现

//HashTable.h #include<iostream> #include<vector> using namespace std; //实现链地址法 inline unsigned long __stl_next_prime(unsigned long n) { // Note: assumes long is at least 32 bits. static const int __stl_num_primes = 28; static const unsigned long __stl_prime_list[__stl_num_primes] = { 53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317, 196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843, 50331653, 100663319, 201326611, 402653189, 805306457, 1610612741, 3221225473, 4294967291 }; const unsigned long* first = __stl_prime_list; const unsigned long* last = __stl_prime_list + __stl_num_primes; const unsigned long* pos = lower_bound(first, last, n); return pos == last ? *(last - 1) : *pos; } template<class K,class V> struct Node { Node(const pair<K,V>& kv) :_kv(kv) ,next(nullptr) {} Node() {} pair<K, V> _kv; Node* next = nullptr; }; //仿函数（将key值转化为正整数） template<class K> struct HashFunc { size_t operator()(const K& key) { return size(key); } }; //偏特化(处理常用key值类型：string) template<> struct HashFunc<string> { size_t operator()(const string& key) { int ret = 0; for (auto& e : key) { //BKDR哈希的思路 ret += e; ret *= 131; } return ret; } }; template<class K,class V,class Hash = HashFunc<K>> class HashTable { public: //typedef Node<K> Node; using Node = Node<K,V>; HashTable() :_table(__stl_next_prime(0)) ,_n(0) {} ~HashTable() { for (int i = 0; i < _table.size(); i++) { Node* cur = _table[i].next; while (cur) { Node* next = cur->next; delete cur; cur = next; } } } Hash hash; bool Insert(const pair<K, V>& kv) { //不允许键值冗余 if (Find(kv.first)) { return false; } //扩容(链地址法的负载因子控制在1左右) //更好的扩容方法（将原节点移下来，避免不断的开空间） if (_n / _table.size() >= 1) { vector<Node> newtable(__stl_next_prime(_table.size() + 1)); for (int i = 0; i < _table.size(); i++) { Node* cur = _table[i].next; while (cur) { Node* next = cur->next; //映射新表位置 size_t pos = hash(cur->_kv.first) % newtable.size(); //头插到哈希 cur->next = newtable[pos].next; newtable[pos].next = cur; cur = next; } } _table.swap(newtable); } //有点拉 //复用Insert函数，再交换,我们不断的创建Node节点，需要显示的写出析构函数，避免内存泄露 if (_n / _table.size() >= 1) { HashTable<K, V,Hash> newHash; newHash._table.resize(__stl_next_prime(_table.size() + 1)); for (int i = 0; i < _table.size(); i++) { Node* cur = _table[i].next; while (cur) { //复用 newHash.Insert(cur->_kv); cur = cur->next; } } _table.swap(newHash._table); } //映射位置 size_t hash0 = hash(kv.first) % _table.size(); Node* cur = _table[hash0].next; Node* newNode = new Node(kv); //需要显示写出构造函数 //头插 _table[hash0].next = newNode; newNode->next = cur; //统计元素个数 _n++; return true; } Node* Find(const K& key) { size_t pos = hash(key) % _table.size(); Node* cur = _table[pos].next; while (cur) { if (cur->_kv.first == key) return cur; cur = cur->next; } return nullptr; } bool Erase(const K& key) { //映射位置 size_t pos = hash(key) % _table.size(); Node* cur = _table[pos].next; Node* per = &_table[pos]; //记录cur的前节点 while (cur) { if (cur->_kv.first == key) { per->next = cur->next; delete cur; return true; } per = cur; cur = cur->next; } return false; } private: vector<Node> _table; int _n = 0; //记录插入元素个数 }; //test.cpp #include"HashTable.h" struct Date { bool operator==(const Date& key) { return _day == key._day && _month == key._month && _year == key._year; } Date(const int year = 2025,const int month = 4,const int day = 2) :_day(day) ,_month(month) ,_year(year) {} int _day; int _month;; int _year; }; struct DateFunc { size_t operator()(const Date& key) { int ret = 0; ret += key._day; ret *= 131; ret += key._month; ret *= 131; ret += key._year; ret *= 131; return ret; } }; int main() { //测试key值为int //HashTable<int, int> Hash0; //Hash0.Insert({ 1,1 }); //Hash0.Insert({ 2,2 }); //Hash0.Insert({ 3,3 }); //Hash0.Insert({ 4,4 }); //cout << Hash0.Find(3) << endl; //cout << Hash0.Erase(3) << endl; //cout << Hash0.Find(3) << endl; //cout << Hash0.Erase(10) << endl; //测试key值为string //HashTable<string, string> Hash0; //Hash0.Insert({ "abc","abc"}); //Hash0.Insert({ "huang","huang"}); //Hash0.Insert({ "yu","yu"}); //Hash0.Insert({ "chi","chi"}); //cout << Hash0.Find("huang") << endl; //cout << Hash0.Erase("huang") << endl; //cout << Hash0.Find("huang") << endl; //cout << Hash0.Erase("asdf") << endl; //测试key值为自定义类型 HashTable<Date, Date, DateFunc> Hash0; Date r1({ 2025,4,2 }); Date r2({ 2025,2,4 }); Hash0.Insert({ r1,r1 }); Hash0.Insert({ r2,r2}); cout << Hash0.Find(r2) << endl; cout << Hash0.Erase(r2) << endl; cout << Hash0.Find(r2) << endl; /*cout << Hash0.Erase({ 2000,1,1 }) << endl;*/ }

以上就是本文的全部内容，如有错误还请大佬斧正。
点个赞再走吧~