OrderMinHash API

The OrderMinHash class provides functionality for sketching sequences and computing similarities or distances between them. This class is particularly suited for biological sequence analysis.

C++ Interface

OrderMinHash()

Constructs an OrderMinHash object with default parameters.

void buildSketch(char *seqNew)

Builds the OrderMinHash sketch. If seqNew is NULL, the sketch is rebuilt from the existing data, which is useful after a parameter has been changed.

Parameters:
  • seqNew (char*): A pointer to the sequence to sketch. If NULL, rebuilds the sketch with existing data.

double similarity(OrderMinHash &omh2)

Computes the similarity between two OrderMinHash sketches. The score is a proxy for edit distance; it is not a Jaccard similarity.

Parameters:
  • omh2 (OrderMinHash&): The other OrderMinHash object to compare.

Returns:
  • double: The similarity score.

double distance(OrderMinHash &omh2)

Computes the distance between two OrderMinHash sketches. The distance is defined as 1.0 - similarity.

Parameters:
  • omh2 (OrderMinHash&): The other OrderMinHash object to compare.

Returns:
  • double: The distance score.

void setK(int k)

Sets the k-mer size parameter. Default is 21.

Parameters:
  • k (int): The k-mer size.

void setL(int l)

Sets the l parameter, typically between 2 and 5. Default is 2.

Parameters:
  • l (int): The l parameter value.

void setM(int m)

Sets the m parameter. Default is 500.

Parameters:
  • m (int): The m parameter value.

void setSeed(uint64_t seedNew)

Sets the seed value for the random generator. Default is 32.

Parameters:
  • seedNew (uint64_t): The new seed value.

void setReverseComplement(bool isRC)

Specifies whether reverse complement sequences are considered. Default is false.

Parameters:
  • isRC (bool): true to consider reverse complements, false otherwise.

int getK()

Returns the k-mer size parameter.

Returns:
  • int: The k-mer size.

int getL()

Returns the l parameter value.

Returns:
  • int: The l parameter value.

int getM()

Returns the m parameter value.

Returns:
  • int: The m parameter value.

uint64_t getSeed()

Returns the random generator seed value.

Returns:
  • uint64_t: The seed value.

bool isReverseComplement()

Indicates whether reverse complement sequences are considered.

Returns:
  • bool: true if reverse complements are considered, false otherwise.

Python Interface

class OrderMinHash

Represents the Python interface for OrderMinHash.

build_sketch(seqNew: Optional[str])

Builds the OrderMinHash sketch. If seqNew is None, rebuilds the sketch using existing data.

Parameters:
  • seqNew (Optional[str]): The sequence to sketch. If None, rebuilds the sketch with existing data.
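
The rebuild behavior described above can be illustrated with a small stand-in class. This is only a sketch under stated assumptions: ToyOrderMinHash is a hypothetical name, not the library's class, and its internals (BLAKE2b hashing, a sketch stored as m tuples of l k-mers, setters that do not rebuild on their own) are assumptions made for illustration. Only the method names, the defaults, and the "None means rebuild from existing data" rule come from the text above.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


class ToyOrderMinHash:
    """Hypothetical stand-in for illustration; not the library's class."""

    def __init__(self) -> None:
        # Defaults copied from the documented ones.
        self.k, self.l, self.m, self.seed = 21, 2, 500, 32
        self._seq: Optional[str] = None
        self.sketch: List[Tuple[str, ...]] = []

    def set_k(self, k: int) -> None:
        self.k = k  # assumption: setters do not rebuild the sketch

    def set_m(self, m: int) -> None:
        self.m = m

    def _hash(self, i: int, kmer: str, occ: int) -> int:
        # Deterministic 64-bit hash keyed by the seed and hash-function index.
        data = f"{self.seed}:{i}:{kmer}:{occ}".encode()
        return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

    def build_sketch(self, seqNew: Optional[str]) -> None:
        if seqNew is not None:
            self._seq = seqNew  # remember the data for later rebuilds
        if self._seq is None:
            raise ValueError("no sequence has been provided yet")
        seq = self._seq
        # Tag each k-mer with its position and occurrence number so that
        # repeated k-mers are treated as distinct items.
        counts: Dict[str, int] = {}
        kmers: List[Tuple[int, str, int]] = []
        for pos in range(len(seq) - self.k + 1):
            kmer = seq[pos:pos + self.k]
            occ = counts.get(kmer, 0)
            counts[kmer] = occ + 1
            kmers.append((pos, kmer, occ))
        self.sketch = []
        for i in range(self.m):
            # Take the l k-mers with the smallest hash, then restore
            # their order of appearance in the sequence.
            picked = sorted(kmers, key=lambda t: self._hash(i, t[1], t[2]))[: self.l]
            picked.sort()
            self.sketch.append(tuple(kmer for _, kmer, _ in picked))


omh = ToyOrderMinHash()
omh.set_k(4)
omh.set_m(8)
omh.build_sketch("ACGTTGCAAGGCTTAACCGGATAT")  # sketch new data
omh.set_k(6)                                  # change a parameter...
omh.build_sketch(None)                        # ...then rebuild from the stored sequence
print(len(omh.sketch), len(omh.sketch[0][0]))
```

In this toy version the old sketch stays in place after set_k(6) until build_sketch(None) is called, which is the situation the None argument is meant for.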

similarity(omh2: OrderMinHash) -> float

Computes the similarity between two OrderMinHash sketches.

Parameters:
  • omh2 (OrderMinHash): The other OrderMinHash object to compare.

Returns:
  • float: The similarity score.

distance(omh2: OrderMinHash) -> float

Computes the distance between two OrderMinHash sketches. As in the C++ interface, the distance is defined as 1.0 - similarity.

Parameters:
  • omh2 (OrderMinHash): The other OrderMinHash object to compare.

Returns:
  • float: The distance score.
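
To make the relationship between the two scores concrete, here is a minimal sketch that works on plain lists rather than on OrderMinHash objects. It assumes the similarity is the fraction of sketch entries that match exactly, which is how the OrderMinHash method is usually formulated; the documentation above does not confirm that the library computes it this way. The function names and the example sketches are hypothetical.

```python
from typing import Sequence, Tuple

Sketch = Sequence[Tuple[str, ...]]


def sketch_similarity(a: Sketch, b: Sketch) -> float:
    """Fraction of positions where the two sketches hold the same tuple."""
    if len(a) != len(b) or not a:
        raise ValueError("sketches must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def sketch_distance(a: Sketch, b: Sketch) -> float:
    """Distance as documented: 1.0 - similarity."""
    return 1.0 - sketch_similarity(a, b)


# Made-up sketches with m = 4 entries of l = 2 k-mers each.
a = [("ACG", "CGT"), ("GTA", "TAC"), ("ACG", "TAC"), ("CGT", "GTA")]
b = [("ACG", "CGT"), ("GTA", "TTT"), ("ACG", "TAC"), ("CCC", "GTA")]

print(sketch_similarity(a, b))  # 2 of the 4 entries match: 0.5
print(sketch_distance(a, b))    # 1.0 - 0.5 = 0.5
```

Because a tuple matches only when its k-mers agree in order as well as in content, the score is sensitive to rearrangements in a way a set-based Jaccard estimate is not.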

set_k(k: int)

Sets the k-mer size parameter. Default is 21.

Parameters:
  • k (int): The k-mer size.

set_l(l: int)

Sets the l parameter, typically between 2 and 5. Default is 2.

Parameters:
  • l (int): The l parameter value.

set_m(m: int)

Sets the m parameter. Default is 500.

Parameters:
  • m (int): The m parameter value.

set_seed(seed: int)

Sets the seed value for the random generator. Default is 32.

Parameters:
  • seed (int): The new seed value.

set_reverse_complement(isRC: bool)

Specifies whether reverse complement sequences are considered. Default is False.

Parameters:
  • isRC (bool): True to consider reverse complements, False otherwise.
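
For readers unfamiliar with the term, the following sketch shows what a reverse complement is and one common way of making a sketch strand-independent. The helper names are hypothetical, and the canonical-form strategy is an assumption: the documentation above does not say how the library treats reverse complements internally.

```python
# Translation table for DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Lexicographically smaller of a sequence and its reverse complement."""
    return min(seq, reverse_complement(seq))


print(reverse_complement("AACGT"))  # ACGTT
print(canonical("TTT"))             # AAA
```

Under this convention a read and its reverse complement map to the same canonical string, so either strand would produce the same sketch.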