Explanation
- A **Bloom Filter** is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It can produce false positives but never false negatives.
-
Steps
- Hash the element using multiple hash functions.
- Set bits at the positions returned by the hash functions.
- Query: Check if all bits corresponding to the hash values are set. If yes, the element is possibly in the set; otherwise, it’s definitely not.
Time Complexity
- O(k) for insertions and queries, where
kis the number of hash functions.
from bitarray import bitarray
import hashlib
class BloomFilter:
def __init__(self, size, num_hashes):
self.size = size
self.num_hashes = num_hashes
self.bit_array = bitarray(size)
self.bit_array.setall(0)
def _hash(self, item, i):
return int(hashlib.md5(f"{item}{i}".encode()).hexdigest(), 16) % self.size
def insert(self, item):
for i in range(self.num_hashes):
idx = self._hash(item, i)
self.bit_array[idx] = 1
def query(self, item):
return all(self.bit_array[self._hash(item, i)] for i in range(self.num_hashes))
# Example usage
bf = BloomFilter(1000, 3)
bf.insert("apple")
print(bf.query("apple")) # True
print(bf.query("banana")) # False