OrderMinHash API
The OrderMinHash class provides functionality for sketching sequences and computing similarities or distances between them. This class is particularly suited for biological sequence analysis.
C++ Interface
-
OrderMinHash()
Constructs an OrderMinHash object with default parameters.
-
void buildSketch(char *seqNew)
Builds the OrderMinHash sketch. If seqNew is NULL, the sketch is rebuilt using the existing data. This is useful when parameters are changed, and a new sketch needs to be created.
- Parameters:
seqNew (char*): A pointer to the sequence to sketch. If NULL, rebuilds the sketch with existing data.
-
double similarity(OrderMinHash &omh2)
Computes the similarity between two OrderMinHash sketches. The score is a proxy for edit distance; it is not a Jaccard similarity estimate.
- Parameters:
omh2 (OrderMinHash&): The other OrderMinHash object to compare.
- Returns:
double: The similarity score.
-
double distance(OrderMinHash &omh2)
Computes the distance between two OrderMinHash sketches. The distance is defined as 1.0 - similarity.
- Parameters:
omh2 (OrderMinHash&): The other OrderMinHash object to compare.
- Returns:
double: The distance score.
-
void setK(int k)
Sets the k-mer size parameter. Default is 21.
- Parameters:
k (int): The k-mer size.
-
void setL(int l)
Sets the l parameter, the number of k-mers grouped into each sketch entry in the OrderMinHash scheme. Typical values are between 2 and 5. Default is 2.
- Parameters:
l (int): The l parameter value.
-
void setM(int m)
Sets the m parameter, the number of entries in the sketch in the OrderMinHash scheme. Default is 500.
- Parameters:
m (int): The m parameter value.
-
void setSeed(uint64_t seedNew)
Sets the seed value for the random generator. Default is 32.
- Parameters:
seedNew (uint64_t): The new seed value.
-
void setReverseComplement(bool isRC)
Specifies whether reverse-complement sequences are taken into account. Default is false.
- Parameters:
isRC (bool): true to consider reverse complements, false otherwise.
-
int getK()
Returns the k-mer size parameter.
- Returns:
int: The k-mer size.
-
int getL()
Returns the l parameter value.
- Returns:
int: The l parameter value.
-
int getM()
Returns the m parameter value.
- Returns:
int: The m parameter value.
-
uint64_t getSeed()
Returns the random generator seed value.
- Returns:
uint64_t: The seed value.
-
bool isReverseComplement()
Indicates whether reverse complement sequences are considered.
- Returns:
bool: true if reverse complements are considered, false otherwise.
Python Interface
- class OrderMinHash
Represents the Python interface for OrderMinHash.
- build_sketch(seqNew: Optional[str])
Builds the OrderMinHash sketch. If seqNew is None, rebuilds the sketch using existing data.
- Parameters:
seqNew (Optional[str]): The sequence to sketch. If None, rebuilds the sketch with existing data.
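The rebuild-on-None behaviour can be pictured with a small stand-in class. This is a hypothetical pure-Python mock, not the library's implementation: `ToySketcher`, its `_seq` field, and the "sketch" made of a plain k-mer set are assumptions used only to illustrate the contract described above.

```python
from typing import Optional, Set


class ToySketcher:
    """Hypothetical stand-in that mimics the build_sketch(None) contract."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self._seq: Optional[str] = None  # last sequence that was sketched
        self.sketch: Set[str] = set()

    def set_k(self, k: int) -> None:
        # Changing a parameter does not rebuild the sketch by itself.
        self.k = k

    def build_sketch(self, seq_new: Optional[str]) -> None:
        if seq_new is not None:
            self._seq = seq_new  # remember the new data
        if self._seq is None:
            raise ValueError("no sequence has been provided yet")
        # Toy "sketch": the set of all k-mers of the stored sequence.
        s = self._seq
        self.sketch = {s[i:i + self.k] for i in range(len(s) - self.k + 1)}


toy = ToySketcher(k=3)
toy.build_sketch("ACGTAC")
three_mers = set(toy.sketch)

toy.set_k(4)
toy.build_sketch(None)  # rebuild from the stored sequence with the new k
four_mers = set(toy.sketch)
```

The point of the pattern is that a parameter change followed by a None call reuses the stored sequence, so the caller does not have to pass it again.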
- similarity(omh2: OrderMinHash) -> float
Computes the similarity between two OrderMinHash sketches.
- Parameters:
omh2 (OrderMinHash): The other OrderMinHash object to compare.
- Returns:
float: The similarity score.
- distance(omh2: OrderMinHash) -> float
Computes the distance between two OrderMinHash sketches, defined as 1.0 - similarity.
- Parameters:
omh2 (OrderMinHash): The other OrderMinHash object to compare.
- Returns:
float: The distance score.
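The relation between the two scores (distance is 1.0 minus similarity, as stated for the C++ interface) can be sketched in a few lines. The functions `toy_similarity` and `toy_distance` are hypothetical, and the similarity used here, the fraction of sketch positions whose entries agree, is an assumption for illustration rather than a statement of how the library computes its score.

```python
from typing import Sequence, Tuple

Entry = Tuple[str, ...]


def toy_similarity(a: Sequence[Entry], b: Sequence[Entry]) -> float:
    """Fraction of positions at which two equal-length sketches agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sketches must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def toy_distance(a: Sequence[Entry], b: Sequence[Entry]) -> float:
    """Distance defined as 1.0 - similarity."""
    return 1.0 - toy_similarity(a, b)


# Two made-up sketches with four entries each; the first two entries agree.
sk1 = [("ACG", "GTA"), ("CGT", "TAC"), ("ACG", "TAC"), ("GTA", "TAC")]
sk2 = [("ACG", "GTA"), ("CGT", "TAC"), ("CGT", "GTA"), ("ACG", "CGT")]

sim = toy_similarity(sk1, sk2)
dist = toy_distance(sk1, sk2)
```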
- set_k(k: int)
Sets the k-mer size parameter. Default is 21.
- Parameters:
k (int): The k-mer size.
- set_l(l: int)
Sets the l parameter, the number of k-mers grouped into each sketch entry in the OrderMinHash scheme. Typical values are between 2 and 5. Default is 2.
- Parameters:
l (int): The l parameter value.
- set_m(m: int)
Sets the m parameter, the number of entries in the sketch in the OrderMinHash scheme. Default is 500.
- Parameters:
m (int): The m parameter value.
- set_seed(seed: int)
Sets the seed value for the random generator. Default is 32.
- Parameters:
seed (int): The new seed value.
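To make the roles of k, l, m and the seed concrete, here is a simplified pure-Python sketch of the general OrderMinHash idea: for each of m seeded hash functions, take the l k-mer occurrences with the smallest hash values and list them in their order of appearance in the sequence. `toy_omh_sketch` and `_hash_occurrence` are hypothetical names, and the use of BLAKE2 as the hash is an assumption; the library's actual hashing and data layout are not described in this section.

```python
import hashlib
from typing import Dict, List, Tuple

Occurrence = Tuple[int, str, int]  # (position, k-mer, occurrence number)


def _hash_occurrence(seed: int, index: int, kmer: str, count: int) -> bytes:
    # One seeded hash per sketch entry index; repeated k-mers hash differently.
    data = f"{seed}:{index}:{kmer}:{count}".encode()
    return hashlib.blake2b(data, digest_size=8).digest()


def toy_omh_sketch(seq: str, k: int, l: int, m: int, seed: int) -> List[Tuple[str, ...]]:
    n = len(seq) - k + 1
    if n < l:
        raise ValueError("sequence too short for the chosen k and l")
    seen: Dict[str, int] = {}
    occurrences: List[Occurrence] = []
    for pos in range(n):
        kmer = seq[pos:pos + k]
        count = seen.get(kmer, 0)
        seen[kmer] = count + 1
        occurrences.append((pos, kmer, count))

    sketch: List[Tuple[str, ...]] = []
    for i in range(m):
        # Pick the l occurrences with the smallest hash for this entry...
        smallest = sorted(
            occurrences,
            key=lambda o: _hash_occurrence(seed, i, o[1], o[2]),
        )[:l]
        # ...then restore their order of appearance in the sequence.
        smallest.sort(key=lambda o: o[0])
        sketch.append(tuple(o[1] for o in smallest))
    return sketch


seq = "ACGTTGCAAG"  # all 3-mers in this sequence are distinct
sketch_a = toy_omh_sketch(seq, k=3, l=2, m=8, seed=32)
sketch_b = toy_omh_sketch(seq, k=3, l=2, m=8, seed=32)
```

With the same sequence, parameters and seed, the two sketches are identical, which is why a fixed seed matters when sketches built at different times are compared.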
- set_reverse_complement(isRC: bool)
Specifies whether reverse-complement sequences are taken into account. Default is False.
- Parameters:
isRC (bool): True to consider reverse complements, False otherwise.
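For readers unfamiliar with the term, the snippet below shows what a reverse complement is and one common convention for strand-independent comparison: replacing each k-mer with the lexicographically smaller of itself and its reverse complement. The helpers `reverse_complement` and `canonical` are hypothetical, and whether the library uses this convention when the option is enabled is an assumption not confirmed by this section.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Smaller of a k-mer and its reverse complement (one common convention)."""
    return min(kmer, reverse_complement(kmer))


rc = reverse_complement("ACGTT")
same_strand = canonical("ACG")
other_strand = canonical(reverse_complement("ACG"))
```

Because both strands map to the same canonical k-mer, a read and its reverse complement would yield the same k-mer content under this convention.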