Loading W Code...
Master the art of O(1) lookups - Hash functions, collisions, and interview problems.
11
Topics
60
Minutes
C++
Language
| Operation | Average | Worst Case | When Best? |
|---|---|---|---|
| Search | O(1) | O(n) | Low load factor |
| Insert | O(1) | O(n) | Good hash function |
| Delete | O(1) | O(n) | Few collisions |
| Space | O(n) | ||
Real-life example (Indian context): Think of your Aadhar number – a unique 12-digit ID. When you go to a government office, they don't search through millions of records one by one. Instead, they use your Aadhar to directly compute where your record is stored. That's hashing!
Key Points:
- Hash table stores (key, value) pairs
- Hash function converts key → array index
- Enables O(1) average time for insert, search, delete
- Much faster than linear search O(n) or binary search O(log n)
When to use Hashing:
- ✅ Need fast lookups by key
- ✅ Counting frequency of elements
- ✅ Checking if something exists
- ✅ Two Sum type problems
#include <iostream>
#include <unordered_map>
using namespace std;
int main() {
// Creating a hash table (unordered_map in C++)
unordered_map<string, int> studentMarks;
// Insert: O(1) average
studentMarks["Rahul"] = 85;
studentMarks["Priya"] = 92;
studentMarks["Amit"] = 78;
// Lookup: O(1) average
cout << "Priya's marks: " << studentMarks["Priya"] << endl; // 92
// Check if key exists
if (studentMarks.count("Rahul")) {
cout << "Rahul found!" << endl;
}
// Delete: O(1) average
studentMarks.erase("Amit");
return 0;
}h(key) = key % 7Division Method:
h(key) = key % m
where m is the table size (preferably a prime number).
Example:
- Table size m = 7
- Key = 25
- Hash = 25 % 7 = 4 (store at index 4)
Properties of a Good Hash Function:
- Fast - Should compute quickly
- Deterministic - Same key always gives same hash
- Uniform - Distributes keys evenly across table
- Minimizes collisions - Different keys should give different hashes
#include <iostream>
using namespace std;
class SimpleHashTable {
private:
int tableSize;
public:
SimpleHashTable(int size) : tableSize(size) {}
// Division method hash function
int hash(int key) {
return key % tableSize;
}
// For string keys - sum of ASCII values
int hashString(string key) {
int sum = 0;
for (char c : key) {
sum += c; // Add ASCII value
}
return sum % tableSize;
}
};Method 1: Separate Chaining
Each table slot holds a Linked List. Colliding keys are simply added to the list at that slot.
How it works:
1. Insert: Hash the key. Add it to the list at that index.
2. Search: Hash the key. Traverse the linked list at that index to find it.
3. Delete: Hash the key. Find and remove it from the linked list.
Pros & Cons:
- ✅ Simple to implement
- ✅ Table never "fills up"
- ❌ Extra memory for pointers
- ❌ Worst case O(n) if all keys hash to same index (long chain)
#include <iostream>
#include <vector>
#include <list>
using namespace std;
class HashTableChaining {
private:
int size;
vector<list<int>> table;
int hash(int key) { return key % size; }
public:
HashTableChaining(int s) : size(s), table(s) {}
void insert(int key) {
int idx = hash(key);
table[idx].push_back(key);
}
};Types of Probing:
1. Linear Probing:
If slot h(k) is full, try h(k)+1, h(k)+2, h(k)+3...
2. Quadratic Probing:
Try h(k)+1², h(k)+2², h(k)+3²...
3. Double Hashing:
Use second hash: h(k) + i * h₂(k)
Visual (Linear Probing):
Insert 10: hash=3 → slot 3 ✓
Insert 17: hash=3 → slot 3 occupied → try 4 ✓
Insert 24: hash=3 → slot 3,4 occupied → try 5 ✓
Pros:
- ✅ Better cache performance (contiguous memory)
- ✅ No extra pointers needed
Cons:
- ❌ Table can fill up completely
- ❌ Clustering - keys bunch together, slowing search
- ❌ Deletion is tricky (need "deleted" markers)
#include <iostream>
#include <vector>
using namespace std;
class HashTableOpenAddressing {
private:
int size;
vector<int> table;
vector<bool> occupied;
int hash(int key) {
return key % size;
}
public:
HashTableOpenAddressing(int s) : size(s), table(s, -1), occupied(s, false) {}
// Insert with linear probing
void insert(int key) {
int index = hash(key);
int originalIndex = index;
// Linear probing
while (occupied[index]) {
cout << "Collision at " << index << ", probing..." << endl;
index = (index + 1) % size; // Move to next slot
if (index == originalIndex) {
cout << "Table is full!" << endl;
return;
}
}
table[index] = key;
occupied[index] = true;
cout << "Inserted " << key << " at index " << index << endl;
}
// Search with linear probing
int search(int key) {
int index = hash(key);
int originalIndex = index;
while (occupied[index]) {
if (table[index] == key) {
return index; // Found!
}
index = (index + 1) % size;
if (index == originalIndex) break;
}
return -1; // Not found
}
void display() {
cout << "Hash Table: ";
for (int i = 0; i < size; i++) {
if (occupied[i]) {
cout << "[" << i << "]:" << table[i] << " ";
} else {
cout << "[" << i << "]:_ ";
}
}
cout << endl;
}
};
int main() {
HashTableOpenAddressing ht(7);
ht.insert(10); // 10 % 7 = 3
ht.insert(17); // 17 % 7 = 3 → collision → goes to 4
ht.insert(24); // 24 % 7 = 3 → collision → goes to 5
ht.insert(31); // 31 % 7 = 3 → collision → goes to 6
cout << "\n";
ht.display();
cout << "\nSearch 17: index " << ht.search(17) << endl; // 4
return 0;
}``
α = (number of elements) / (table size)
`
Why it matters:
- Low α (< 0.5): Lots of empty space, very fast O(1)
- High α (> 0.7): More collisions, slower performance
- α = 1: Table is completely full!
Threshold Rule:
Keep α ≤ 0.7 (or 0.75) for good performance.
Rehashing:
When α exceeds threshold:
1. Create a new table (usually 2x the size)
2. Re-hash ALL existing elements into new table
3. Delete old table
Example:
`
Table size: 7, Elements: 5
α = 5/7 = 0.71 > 0.7 → TRIGGER REHASH!
New size: 14 (or next prime like 17)
Re-insert all 5 elements with new hash
``
Cost of Rehashing:
- Single rehash is O(n)
- But done rarely, so amortized O(1) per insert
#include <iostream>
#include <vector>
#include <list>
using namespace std;
class HashTableWithRehash {
private:
int size;
int count;
double maxLoadFactor;
vector<list<int>> table;
int hash(int key) {
return key % size;
}
// Rehash to a bigger table
void rehash() {
cout << "\n⚡ REHASHING: Load factor exceeded!" << endl;
cout << "Old size: " << size << ", Elements: " << count << endl;
int oldSize = size;
vector<list<int>> oldTable = table;
// Double the size (or use next prime)
size = size * 2;
table = vector<list<int>>(size);
count = 0;
// Re-insert all elements
for (int i = 0; i < oldSize; i++) {
for (int key : oldTable[i]) {
insert(key); // Re-hash into new table
}
}
cout << "New size: " << size << endl << endl;
}
public:
HashTableWithRehash(int s = 7, double mlf = 0.7)
: size(s), count(0), maxLoadFactor(mlf), table(s) {}
void insert(int key) {
int index = hash(key);
table[index].push_back(key);
count++;
// Check load factor
double loadFactor = (double)count / size;
cout << "Inserted " << key << " | Load Factor: " << loadFactor << endl;
if (loadFactor > maxLoadFactor) {
rehash();
}
}
double getLoadFactor() {
return (double)count / size;
}
};
int main() {
HashTableWithRehash ht(7, 0.7); // Size 7, max load 0.7
// Insert elements until rehash triggers
ht.insert(10);
ht.insert(20);
ht.insert(30);
ht.insert(40);
ht.insert(50); // This might trigger rehash!
ht.insert(60);
return 0;
}unordered_map
- Stores key → value pairs
- Each key is unique
- Use when you need to associate data with keys
unordered_set
- Stores only unique keys
- No associated value
- Use when you just need to check existence
When to use which?
- Counting frequency → unordered_map
- Two Sum → unordered_map (need index)
- Check if seen before → unordered_set
- Remove duplicates → unordered_set
| Feature | unordered_map | unordered_set |
|---|---|---|
| Stores | Key-Value pairs | Only Keys |
| Duplicates | No duplicate keys | No duplicates |
| Access | map[key] | — |
| Insert | map[k] = v | set.insert(k) |
| Check | map.count(k) | set.count(k) |
| Use Case | Frequency, Lookup | Existence Check |
#include <iostream>
#include <unordered_map>
#include <unordered_set>
#include <vector>
using namespace std;
int main() {
// ========== UNORDERED_MAP ==========
cout << "=== unordered_map (HashMap) ===" << endl;
unordered_map<string, int> freq;
// Insert / Update
freq["apple"] = 5;
freq["banana"] = 3;
freq["apple"]++; // Now apple = 6
// Access
cout << "apple count: " << freq["apple"] << endl; // 6
// Check existence (SAFE way)
if (freq.count("mango") == 0) {
cout << "mango not found" << endl;
}
// Iterate
cout << "All items: ";
for (auto& p : freq) {
cout << p.first << ":" << p.second << " ";
}
cout << endl << endl;
// ========== UNORDERED_SET ==========
cout << "=== unordered_set (HashSet) ===" << endl;
unordered_set<int> seen;
// Insert
seen.insert(10);
seen.insert(20);
seen.insert(10); // Duplicate - ignored!
cout << "Size: " << seen.size() << endl; // 2 (not 3!)
// Check existence
if (seen.count(20)) {
cout << "20 exists!" << endl;
}
// Remove duplicates from array using set
vector<int> arr = {1, 2, 2, 3, 3, 3, 4};
unordered_set<int> unique(arr.begin(), arr.end());
cout << "Unique elements: ";
for (int x : unique) cout << x << " ";
cout << endl;
return 0;
}Example:
``
Input: arr = [2, 7, 11, 15], target = 9
Output: [0, 1] (because arr[0] + arr[1] = 2 + 7 = 9)
``
Brute Force: Check all pairs → O(n²) ❌
Hashing Approach: For each number x, check if (target - x) exists in map → O(n) ✅
Algorithm:
1. Create empty hash map
2. For each element x at index i:
- Calculate complement = target - x
- If complement exists in map → Found!
- Else, add x to map with its index
3. Return indices or -1 if not found
#include <iostream>
#include <vector>
#include <unordered_map>
using namespace std;
// Returns indices of two numbers that add up to target
vector<int> twoSum(vector<int>& nums, int target) {
unordered_map<int, int> seen; // value -> index
for (int i = 0; i < nums.size(); i++) {
int complement = target - nums[i];
// Check if complement was seen before
if (seen.count(complement)) {
return {seen[complement], i}; // Found!
}
// Store current number with its index
seen[nums[i]] = i;
}
return {}; // No solution found
}
// Variation: Just check if pair exists (no indices needed)
bool hasTwoSum(vector<int>& nums, int target) {
unordered_set<int> seen;
for (int x : nums) {
if (seen.count(target - x)) {
return true;
}
seen.insert(x);
}
return false;
}
int main() {
vector<int> arr = {2, 7, 11, 15};
int target = 9;
vector<int> result = twoSum(arr, target);
if (!result.empty()) {
cout << "Indices: [" << result[0] << ", " << result[1] << "]" << endl;
cout << "Values: " << arr[result[0]] << " + " << arr[result[1]]
<< " = " << target << endl;
} else {
cout << "No solution found" << endl;
}
return 0;
}
/*
Output:
Indices: [0, 1]
Values: 2 + 7 = 9
*/Example:
``
Input: "aabbccd"
Output: 'd' (first char with frequency 1)
Input: "aabbcc"
Output: '$' or -1 (no non-repeating char)
``
Approach (Two Pass):
1. First pass: Count frequency of each character using hash map
2. Second pass: Find first character with count == 1
#include <iostream>
#include <unordered_map>
#include <string>
using namespace std;
char firstNonRepeating(const string& s) {
// Step 1: Count frequency of each character
unordered_map<char, int> freq;
for (char c : s) {
freq[c]++;
}
// Step 2: Find first character with frequency 1
for (char c : s) {
if (freq[c] == 1) {
return c;
}
}
return '$'; // No non-repeating character
}
// Optimized: Using array instead of map (for lowercase only)
char firstNonRepeatingOptimized(const string& s) {
int freq[26] = {0}; // For 'a' to 'z'
// Count frequencies
for (char c : s) {
freq[c - 'a']++;
}
// Find first with count 1
for (char c : s) {
if (freq[c - 'a'] == 1) {
return c;
}
}
return '$';
}
int main() {
cout << firstNonRepeating("aabbccd") << endl; // d
cout << firstNonRepeating("leetcode") << endl; // l
cout << firstNonRepeating("aabbcc") << endl; // $
cout << firstNonRepeating("z") << endl; // z
return 0;
}Example:
``
Input: arr = [1, 2, 3, -3, 1, 1, 1, 4, 2, -3], K = 3
Output: 7 subarrays sum to 3
``
Key Insight: Use Prefix Sum + HashMap
If prefixSum[i] - prefixSum[j] = K, then sum of elements from j+1 to i equals K!
Algorithm:
1. Keep running prefix sum
2. Store count of each prefix sum in map
3. At each index, check if (currentSum - K) was seen before
4. Initialize map with {0: 1} to handle subarrays starting at index 0
#include <iostream>
#include <vector>
#include <unordered_map>
using namespace std;
int subarraySum(vector<int>& nums, int k) {
unordered_map<int, int> prefixCount;
prefixCount[0] = 1; // Important! Empty prefix has sum 0
int currentSum = 0;
int count = 0;
for (int num : nums) {
currentSum += num; // Running prefix sum
// Check if (currentSum - k) was seen before
// If yes, those many subarrays end here with sum k
int need = currentSum - k;
if (prefixCount.count(need)) {
count += prefixCount[need];
}
// Add current prefix sum to map
prefixCount[currentSum]++;
}
return count;
}
// Returns true if ANY subarray sums to k (simpler version)
bool hasSubarraySum(vector<int>& nums, int k) {
unordered_set<int> prefixSums;
prefixSums.insert(0); // Empty prefix
int currentSum = 0;
for (int num : nums) {
currentSum += num;
if (prefixSums.count(currentSum - k)) {
return true;
}
prefixSums.insert(currentSum);
}
return false;
}
int main() {
vector<int> arr1 = {1, 1, 1};
cout << "Subarrays summing to 2: " << subarraySum(arr1, 2) << endl; // 2
vector<int> arr2 = {1, 2, 3, -3, 1, 1, 1, 4, 2, -3};
cout << "Subarrays summing to 3: " << subarraySum(arr2, 3) << endl; // 7
return 0;
}Example:
``
Input: ["eat", "tea", "tan", "ate", "nat", "bat"]
Output: [["eat","tea","ate"], ["tan","nat"], ["bat"]]
``
Key Insight: All anagrams have the same sorted string!
- "eat" → "aet"
- "tea" → "aet"
- "ate" → "aet"
Approach:
1. For each string, sort it to create a "key"
2. Use this key in a hash map
3. All strings with same key are anagrams
#include <iostream>
#include <vector>
#include <string>
#include <unordered_map>
#include <algorithm>
using namespace std;
vector<vector<string>> groupAnagrams(vector<string>& strs) {
unordered_map<string, vector<string>> groups;
for (string& s : strs) {
// Create key by sorting the string
string key = s;
sort(key.begin(), key.end());
// Add to the group with this key
groups[key].push_back(s);
}
// Collect all groups
vector<vector<string>> result;
for (auto& p : groups) {
result.push_back(p.second);
}
return result;
}
// Faster approach: Use character count as key (O(n*k) vs O(n*k*log(k)))
vector<vector<string>> groupAnagramsFast(vector<string>& strs) {
unordered_map<string, vector<string>> groups;
for (string& s : strs) {
// Create key from character counts
int count[26] = {0};
for (char c : s) {
count[c - 'a']++;
}
// Convert counts to string key
string key;
for (int i = 0; i < 26; i++) {
key += "#" + to_string(count[i]);
}
groups[key].push_back(s);
}
vector<vector<string>> result;
for (auto& p : groups) {
result.push_back(p.second);
}
return result;
}
int main() {
vector<string> words = {"eat", "tea", "tan", "ate", "nat", "bat"};
auto result = groupAnagrams(words);
cout << "Grouped Anagrams:" << endl;
for (auto& group : result) {
cout << "[ ";
for (string& s : group) {
cout << s << " ";
}
cout << "]" << endl;
}
return 0;
}
/*
Output:
[ bat ]
[ tan nat ]
[ eat tea ate ]
*/1. Poor Hash Selection
- Using non-prime table size
- Example: m = 10 with keys ending in 0 → all collide!
2. Ignoring Load Factor
- Not resizing leads to slow performance
- Keep α ≤ 0.7
3. Wrong Modulo in C++
``cpp
// WRONG: -3 % 7 = -3 in C++!
int bad = key % m;
// CORRECT: Always positive
int good = ((key % m) + m) % m;
`
4. Mixing map and unordered_map
- map is BST (ordered, O(log n))
- unordered_map is hash (unordered, O(1))
5. Not Handling Key Absence
`cpp
// DANGEROUS: Creates entry if key doesn't exist!
int val = myMap[key];
// SAFE: Check first
if (myMap.count(key)) {
int val = myMap[key];
}
``
6. Two Sum Mistake
- Checking complement AFTER inserting current element
- Can match an element with itself!
7. Assuming Order
- Elements in unordered containers have NO guaranteed order
- Don't rely on iteration order!
#include <iostream>
#include <unordered_map>
using namespace std;
int main() {
// Mistake 1: Using [] on non-existent key creates it!
unordered_map<string, int> mp;
// BAD: Creates "ghost" entry with value 0
if (mp["ghost"] == 0) {
cout << "This creates the key!" << endl;
}
cout << "Size after bad check: " << mp.size() << endl; // 1!
mp.clear();
// GOOD: Use count() or find()
if (mp.count("ghost") == 0) {
cout << "Key doesn't exist (no side effect)" << endl;
}
cout << "Size after good check: " << mp.size() << endl; // 0
// Mistake 2: Negative modulo
int key = -15;
int m = 7;
cout << "Bad mod: " << key % m << endl; // -1 (wrong!)
cout << "Good mod: " << ((key % m) + m) % m << endl; // 6 (correct!)
// Interview Tip: Always clarify
// - Can there be duplicates?
// - What to return if not found?
// - Range of input values?
return 0;
}Hash Table: Maps keys to values via hash function → O(1) average
Hash Function: h(k) = k % m (m should be prime)
Collision: Resolve with chaining or open addressing
Load Factor: α = n/m, keep ≤ 0.7, rehash when exceeded
unordered_map: Key→Value pairs, use for frequency counting
unordered_set: Only keys, use for existence checking
Two Sum Pattern: Check if (target - x) seen before
Subarray Sum: Prefix sum + hash map