Kssd API
The Kssd class provides functionality for processing and analyzing k-mer-based sketches, with methods for updating, storing, loading, comparing, and converting them.
C++ Interface
-
kssd_parameter_t(int half_k_, int half_subk_, int drlevel_, string shuffle_file)
Constructs a kssd_parameter_t object with the specified parameters.
- Parameters:
half_k_ (int): Half of the k-mer size used for hashing.
half_subk_ (int): Half of the sub k-mer size for dimensionality reduction.
drlevel_ (int): Dimensionality reduction level.
shuffle_file (string): Path to the file containing the shuffled dimension map.
-
Kssd(kssd_parameter_t params)
Constructs a Kssd object with the specified parameters.
- Parameters:
params (kssd_parameter_t): A parameter object containing configuration for Kssd.
-
void update(const char *seq)
Updates the Kssd object with the provided sequence.
- Parameters:
seq (const char*): A pointer to the sequence to be processed.
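
The parameter descriptions above imply a k-mer length of twice half_k_. As a rough illustration only, the following Python sketch enumerates the overlapping k-mers of a sequence under that assumption. The helper name kmers is hypothetical and is not part of the Kssd API; the sub-k-mer dimensionality reduction and the shuffle map that Kssd applies are not reproduced here.

```python
def kmers(seq, half_k):
    """List every overlapping k-mer of seq, assuming k = 2 * half_k.

    Hypothetical helper for illustration; Kssd's own k-mer selection
    (sub-k-mer filtering, shuffled dimension map) is not modeled.
    """
    k = 2 * half_k
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


sequence = "ACGTACGT"
all_kmers = kmers(sequence, half_k=2)  # k = 4
print(all_kmers)
```

A sequence of length n contributes n - k + 1 overlapping k-mers, so a short k keeps more positions but makes each k-mer less specific.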
-
vector<uint32_t> storeHashes()
Retrieves the stored 32-bit hashes from the Kssd object.
- Returns:
vector<uint32_t>: A vector of 32-bit hashes.
-
vector<uint64_t> storeHashes64()
Retrieves the stored 64-bit hashes from the Kssd object.
- Returns:
vector<uint64_t>: A vector of 64-bit hashes.
-
void loadHashes(std::vector<uint32_t> hashArr)
Loads a list of 32-bit hashes into the Kssd object.
- Parameters:
hashArr (std::vector<uint32_t>): A vector of 32-bit hashes to load.
-
void loadHashes64(std::vector<uint64_t> hashArr)
Loads a list of 64-bit hashes into the Kssd object.
- Parameters:
hashArr (std::vector<uint64_t>): A vector of 64-bit hashes to load.
-
double jaccard(Kssd *kssd)
Computes the Jaccard similarity between this Kssd object and another.
- Parameters:
kssd (Kssd*): A pointer to another Kssd object to compare against.
- Returns:
double: The Jaccard similarity value.
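
For reference, Jaccard similarity over two hash lists can be sketched in plain Python as the size of the intersection divided by the size of the union. This is the textbook definition applied to hash values such as those returned by storeHashes; it is an assumption that Kssd::jaccard computes exactly this quantity, and the function below is a stand-in, not the library's implementation.

```python
def jaccard(hashes_a, hashes_b):
    """Jaccard similarity of two hash collections, treated as sets."""
    set_a, set_b = set(hashes_a), set(hashes_b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty inputs score 0.
        return 0.0
    return len(set_a & set_b) / len(union)


sketch_a = [11, 42, 57, 90]
sketch_b = [42, 57, 73, 90, 101]
similarity = jaccard(sketch_a, sketch_b)  # 3 shared hashes out of 6 distinct
print(similarity)
```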
-
double distance(Kssd *kssd)
Computes the Mash distance between this Kssd object and another.
- Parameters:
kssd (Kssd*): A pointer to another Kssd object to compare against.
- Returns:
double: The Mash distance value.
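
The Mash distance is commonly derived from a Jaccard estimate j and the k-mer length k as D = -(1/k) * ln(2j / (1 + j)). The Python sketch below implements that standard formula as an illustration. Whether Kssd::distance uses exactly this expression, and which k it plugs in, is not stated here, so the function name and the clamping at the boundaries are assumptions.

```python
import math


def mash_distance(jaccard_value, kmer_size):
    """Standard Mash distance from a Jaccard estimate and k-mer length.

    Illustrative only; boundary handling is an assumption of this sketch.
    """
    if jaccard_value <= 0.0:
        return 1.0  # no shared hashes: report the maximum distance
    if jaccard_value >= 1.0:
        return 0.0  # identical sketches
    d = -math.log(2.0 * jaccard_value / (1.0 + jaccard_value)) / kmer_size
    return min(d, 1.0)


dist = mash_distance(0.5, kmer_size=20)
print(dist)
```

With this formula the distance shrinks as the Jaccard value approaches 1, and for a fixed Jaccard value a larger k gives a smaller distance.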
-
void transSketches(vector<KssdLite> &sketches, sketchInfo_t &info, string dictFile, string indexFile, int numThreads)
Transforms KssdLite sketches for indexing purposes.
- Parameters:
sketches (vector<KssdLite>&): A reference to the vector of KssdLite sketches.
info (sketchInfo_t&): Metadata associated with the sketches.
dictFile (string): Path to the dictionary file.
indexFile (string): Path to the index file.
numThreads (int): The number of threads to use.
-
void index_tridist(vector<KssdLite> &sketches, sketchInfo_t &info, string refSketchOut, string outputFile, int kmer_size, double maxDist, int isContainment, int numThreads)
Computes the sketch index using a distance-based method.
- Parameters:
sketches (vector<KssdLite>&): A reference to the vector of KssdLite sketches.
info (sketchInfo_t&): Metadata associated with the sketches.
refSketchOut (string): Path to save the reference sketches.
outputFile (string): Path to the output file.
kmer_size (int): Size of the k-mers.
maxDist (double): Maximum allowed distance.
isContainment (int): Whether to use containment comparison.
numThreads (int): The number of threads to use.
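
The isContainment flag switches the comparison from Jaccard to containment. A common definition of containment of a query in a reference is the fraction of query hashes that also occur in the reference. The Python sketch below shows that definition under the assumption that it matches what index_tridist means by containment; the function name and the choice of denominator are not confirmed by this reference.

```python
def containment(query_hashes, ref_hashes):
    """Fraction of query hashes that also appear in the reference.

    Assumed definition for illustration: |Q & R| / |Q|.
    """
    query, ref = set(query_hashes), set(ref_hashes)
    if not query:
        return 0.0  # convention chosen here for an empty query
    return len(query & ref) / len(query)


score = containment([1, 2, 3, 4], [3, 4, 5, 6, 7, 8])
print(score)
```

Unlike Jaccard, this measure is asymmetric, so a small query fully covered by a large reference scores 1.0 regardless of the reference size.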
-
void saveSketches(vector<KssdLite> &sketches, sketchInfo_t &info, string outputFile)
Saves the KssdLite sketches to a specified file.
- Parameters:
sketches (vector<KssdLite>&): A reference to the vector of KssdLite sketches.
info (sketchInfo_t&): Metadata associated with the sketches.
outputFile (string): Path to the output file.
-
bool isSketchFile(string inputFile)
Checks if the given file has a .sketch extension.
- Parameters:
inputFile (string): The file name to check.
- Returns:
bool: true if the file is a .sketch file, otherwise false.
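
The extension check described above can be approximated in Python as a suffix test. This is a stand-in for illustration: the name is_sketch_file is hypothetical, and it is an assumption that the real function is case-sensitive and looks only at the file name rather than the file contents.

```python
def is_sketch_file(input_file):
    """Return True when the file name ends with '.sketch'."""
    return input_file.endswith(".sketch")


print(is_sketch_file("refs.sketch"))
print(is_sketch_file("refs.fasta"))
```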
-
void printInfos(vector<Sketch::Kssd*> &sketches, string outputFile)
Prints basic information about the sketches to the specified file.
- Parameters:
sketches (vector<Sketch::Kssd*>&): A reference to a vector of pointers to Sketch::Kssd objects.
outputFile (string): Path to the output file.
-
void printSketches(vector<Sketch::Kssd*> &sketches, string outputFile)
Prints detailed information about the sketches, including all stored hashes.
- Parameters:
sketches (vector<Sketch::Kssd*>&): A reference to a vector of pointers to Sketch::Kssd objects.
outputFile (string): Path to the output file.
-
KssdLite toLite() const
Converts the current Kssd object into a lightweight KssdLite representation.
- Returns:
KssdLite: A lightweight representation of the Kssd object.
—
Python Interface
-
Kssd(kssd_parameter_t params)
Constructs a Kssd object with the specified parameters.
- Parameters:
params (kssd_parameter_t): A parameter object containing configuration for Kssd.
- class Kssd(params: kssd_parameter_t)
Constructs a Kssd object in Python.
- Parameters:
params (kssd_parameter_t): Configuration parameters for the Kssd object.
-
void update(const char *seq)
Updates the Kssd object with the provided sequence.
- Parameters:
seq (const char*): A pointer to the sequence to be processed.
- update(seq: str)
Updates the sketch with the given sequence.
- Parameters:
seq (str): The sequence to be processed.
-
double jaccard(Kssd *kssd)
Computes the Jaccard similarity between this Kssd object and another.
- Parameters:
kssd (Kssd*): A pointer to another Kssd object to compare against.
- Returns:
double: The Jaccard similarity value.
- jaccard(other: Kssd) -> float
Computes the Jaccard similarity between this sketch and another.
- Parameters:
other (Kssd): The other Kssd object to compare.
- Returns:
float: The Jaccard similarity value.
-
double distance(Kssd *kssd)
Computes the Mash distance between this Kssd object and another.
- Parameters:
kssd (Kssd*): A pointer to another Kssd object to compare against.
- Returns:
double: The Mash distance value.
- distance(other: Kssd) -> float
Computes the Mash distance between this sketch and another.
- Parameters:
other (Kssd): The other Kssd object to compare.
- Returns:
float: The Mash distance value.
-
vector<uint32_t> storeHashes()
Retrieves the stored 32-bit hashes from the Kssd object.
- Returns:
vector<uint32_t>: A vector of 32-bit hashes.
- store_hashes() -> list[int]
Retrieves the stored 32-bit hashes.
- Returns:
list[int]: A list of 32-bit hashes.
-
KssdLite toLite() const
Converts the current Kssd object into a lightweight KssdLite representation.
- Returns:
KssdLite: A lightweight representation of the Kssd object.
- toLite() -> KssdLite
Converts the current Kssd object to a lightweight KssdLite representation.
- Returns:
KssdLite: A lightweight representation of the Kssd object.
-
kssd_parameter_t(int half_k_, int half_subk_, int drlevel_, string shuffle_file)
Constructs a kssd_parameter_t object with the specified parameters.
- Parameters:
half_k_ (int): Half of the k-mer size used for hashing.
half_subk_ (int): Half of the sub k-mer size for dimensionality reduction.
drlevel_ (int): Dimensionality reduction level.
shuffle_file (string): Path to the file containing the shuffled dimension map.
- class kssd_parameter_t(half_k: int, half_subk: int, drlevel: int, shuffle_file: str)
Represents the parameter set used for configuring the Kssd object.
- Parameters:
half_k (int): Half of the k-mer size used for hashing.
half_subk (int): Half of the sub k-mer size for dimensionality reduction.
drlevel (int): Dimensionality reduction level.
shuffle_file (str): Path to the file containing the shuffled dimension map.
- save_sketches(sketches: list[KssdLite], info: SketchInfo, filename: str)
Saves a list of KssdLite sketches to a specified file.
- Parameters:
sketches (list[KssdLite]): A list of sketches to save.
info (SketchInfo): Metadata associated with the sketches.
filename (str): Path to the file where the sketches will be saved.
- trans_sketches(sketches: list[KssdLite], info: SketchInfo, dict_file: str, index_file: str, num_threads: int)
Transforms KssdLite sketches into a format suitable for indexing.
- Parameters:
sketches (list[KssdLite]): A list of sketches to transform.
info (SketchInfo): Metadata associated with the sketches.
dict_file (str): Path to the dictionary file used for transformation.
index_file (str): Path to the index file used for transformation.
num_threads (int): Number of threads for parallel processing.
- index_dict(sketches: list[KssdLite], info: SketchInfo, ref_sketch_out: str, output_file: str, kmer_size: int, max_dist: float, is_containment: bool, num_threads: int)
Computes the sketch index using a distance-based dictionary approach.
- Parameters:
sketches (list[KssdLite]): A list of sketches to process.
info (SketchInfo): Metadata associated with the sketches.
ref_sketch_out (str): Path to save the reference sketches.
output_file (str): Path to the file where results will be saved.
kmer_size (int): Size of the k-mers used for sketching.
max_dist (float): Maximum allowed distance for comparisons.
is_containment (bool): Whether to use containment comparisons.
num_threads (int): Number of threads for parallel processing.