HyperLogLog API
The HyperLogLog class provides functionality for cardinality estimation and similarity measurements using HyperLogLog sketches. This class supports operations such as updating, merging, and comparing sketches.
C++ Interface
- HyperLogLog(int np)
Constructs a HyperLogLog object with the specified precision.
- Parameters:
np (int): The number of bits used for precision. The sketch size is \(2^{np}\).
- void update(char *seq)
Updates the HyperLogLog sketch with a given sequence.
- Parameters:
seq (char*): A pointer to the sequence used to update the sketch.
- HyperLogLog merge(const HyperLogLog &other) const
Merges the current sketch with another HyperLogLog sketch.
- Parameters:
other (const HyperLogLog&): The other HyperLogLog object to merge with.
- Returns:
HyperLogLog: A new HyperLogLog object resulting from the merge.
- void printSketch()
Prints the content of the HyperLogLog sketch for debugging purposes.
- double distance(const HyperLogLog &h2) const
Computes the distance between two HyperLogLog sketches. The distance is defined as \(1.0 - \text{Jaccard index}\).
- Parameters:
h2 (const HyperLogLog&): The other HyperLogLog object to compare.
- Returns:
double: The distance score.
- double jaccard_index(const HyperLogLog &h2) const
Computes the Jaccard index between two HyperLogLog sketches.
- Parameters:
h2 (const HyperLogLog&): The other HyperLogLog object to compare.
- Returns:
double: The Jaccard index.
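For reference, the quantities that jaccard_index and distance estimate can be computed exactly on small sets. The following Python sketch is not part of the library: the helper names `exact_jaccard` and `exact_distance` are hypothetical, and the sets of short strings merely stand in for whatever items a sketch actually hashes.

```python
def exact_jaccard(a: set, b: set) -> float:
    """Exact Jaccard index |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def exact_distance(a: set, b: set) -> float:
    """Distance as documented for the sketches: 1.0 minus the Jaccard index."""
    return 1.0 - exact_jaccard(a, b)


a = {"ACG", "CGT", "GTA"}
b = {"CGT", "GTA", "TAC"}
print(exact_jaccard(a, b))   # 2 shared items out of 4 distinct -> 0.5
print(exact_distance(a, b))  # 1.0 - 0.5 -> 0.5
```

A sketch-based estimate should approach these exact values as the precision grows, at the cost of a larger sketch.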
Python Interface
- class HyperLogLog(np: int = 10)
Represents the Python interface for the HyperLogLog sketch.
- Parameters:
np (int): The number of bits used for precision. Default is 10, resulting in a sketch size of \(2^{10} = 1024\).
- update(seq: str)
Updates the HyperLogLog sketch with a given sequence.
- Parameters:
seq (str): The sequence used to update the sketch.
- merge(other: HyperLogLog) -> HyperLogLog
Merges the current sketch with another HyperLogLog sketch.
- Parameters:
other (HyperLogLog): The other HyperLogLog object to merge with.
- Returns:
HyperLogLog: A new HyperLogLog object resulting from the merge.
- jaccard(other: HyperLogLog) -> float
Computes the Jaccard index between two HyperLogLog sketches.
- Parameters:
other (HyperLogLog): The other HyperLogLog object to compare.
- Returns:
float: The Jaccard index.
- distance(other: HyperLogLog) -> float
Computes the distance between two HyperLogLog sketches. The distance is defined as \(1.0 - \text{Jaccard index}\).
- Parameters:
other (HyperLogLog): The other HyperLogLog object to compare.
- Returns:
float: The distance score.
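To make the call pattern concrete without assuming how the package is imported, here is a small pure-Python stand-in that mirrors the method names documented above. Everything about its internals is an assumption made for illustration: the class name `ToyHyperLogLog`, the SHA-1 based hashing, treating each `seq` string as a single item, the extra `cardinality` helper, and estimating the Jaccard index by inclusion-exclusion on cardinalities. The real library may differ on any of these points.

```python
import hashlib
import math


class ToyHyperLogLog:
    """Illustrative stand-in with the documented method names; not the library's code."""

    def __init__(self, np: int = 10):
        self.np = np
        self.m = 1 << np                 # 2**np registers
        self.registers = [0] * self.m

    def update(self, seq: str) -> None:
        # Assumption: the whole string counts as one item.
        h = int.from_bytes(hashlib.sha1(seq.encode()).digest()[:8], "big")
        tail_bits = 64 - self.np
        idx = h >> tail_bits                       # top np bits pick a register
        tail = h & ((1 << tail_bits) - 1)          # remaining bits
        rank = tail_bits - tail.bit_length() + 1   # leading zeros in tail, plus 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "ToyHyperLogLog") -> "ToyHyperLogLog":
        # Register-wise maximum gives the sketch of the union; returns a new object.
        if self.np != other.np:
            raise ValueError("sketches must use the same np")
        out = ToyHyperLogLog(self.np)
        out.registers = [max(x, y) for x, y in zip(self.registers, other.registers)]
        return out

    def cardinality(self) -> float:
        # Hypothetical helper, not part of the documented interface.
        # The alpha approximation below is the usual one for m >= 128.
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        est = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * m and zeros:
            est = m * math.log(m / zeros)   # small-range (linear counting) correction
        return est

    def jaccard(self, other: "ToyHyperLogLog") -> float:
        # Assumption: |A ∩ B| is estimated as |A| + |B| - |A ∪ B|.
        union = self.merge(other).cardinality()
        if union == 0:
            return 0.0
        inter = self.cardinality() + other.cardinality() - union
        return min(1.0, max(0.0, inter / union))

    def distance(self, other: "ToyHyperLogLog") -> float:
        return 1.0 - self.jaccard(other)


a, b = ToyHyperLogLog(np=12), ToyHyperLogLog(np=12)
for i in range(3000):
    a.update(f"item-{i}")
for i in range(1500, 4500):
    b.update(f"item-{i}")

print(a.jaccard(b))    # an estimate; the exact value for these sets is 1500 / 4500
print(a.distance(b))   # 1.0 minus the estimate above
```

With np=12 the toy sketch holds 4096 registers. Raising np generally tightens the estimate while doubling the sketch size per extra bit, which is the trade-off the np parameter controls.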