Easy way to retrieve the groupNames associated with a kmer's color? #92

@mr-eyes

While working on the wrapper, I tried to retrieve the groups associated with a kmer's color, but I couldn't find a direct way to do it.

Here is what I understand so far; please correct me if I'm wrong.

After indexing, we have a kDataFrame that maps each key (hashVal) to a value (kmerOrder). We can then get the color associated with a kmer through `getKmerColumnValue`, e.g. `getKmerColumnValue("color", hashVal)`:

```cpp
T getKmerColumnValue(const string& columnName, uint64_t kmer);
```

or by kmer order:

```cpp
T getKmerColumnValueByOrder(const string& columnName, uint64_t kmerOrder);
```
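To make the two lookup paths described above concrete, here is a minimal toy model in plain Python. `ToyKDataFrame` and its methods are hypothetical stand-ins written for illustration, not the kProcessor API; the sketch only assumes what is described above: a hash maps to an order, and a column such as `"color"` is indexed by that order.

```python
class ToyKDataFrame:
    """Hypothetical stand-in for a kDataFrame (NOT the kProcessor API)."""

    def __init__(self):
        self.hash_to_order = {}       # key: hashed kmer, value: kmer order
        self.columns = {"color": []}  # each column is indexed by kmer order

    def insert(self, kmer_hash, color):
        # Assumes each hash is inserted once; the next order is the current count
        order = len(self.hash_to_order)
        self.hash_to_order[kmer_hash] = order
        self.columns["color"].append(color)

    def get_kmer_column_value_by_order(self, column_name, kmer_order):
        return self.columns[column_name][kmer_order]

    def get_kmer_column_value(self, column_name, kmer_hash):
        # Resolve hash -> order first, then reuse the by-order lookup
        order = self.hash_to_order[kmer_hash]
        return self.get_kmer_column_value_by_order(column_name, order)


kf = ToyKDataFrame()
kf.insert(kmer_hash=1234, color=1)
kf.insert(kmer_hash=5678, color=2)
print(kf.get_kmer_column_value("color", 5678))        # 2
print(kf.get_kmer_column_value_by_order("color", 0))  # 1
```

In this model, both functions end at the same place: a color ID. Nothing in it leads from that ID back to group IDs, which is exactly the missing step asked about.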

Now I have the color. Is there an easy way to get from the color to its group IDs (color -> group_IDs) through the kDataFrame?
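For reference, this is roughly what the missing step would look like once a color -> group-IDs table is available. Everything here is an assumption for illustration: `color_to_group_ids` is a hypothetical table (the very thing being asked about), the names file is assumed to hold one tab-separated `sequence_name<TAB>group_name` pair per line, and group IDs are assumed to be assigned from 0 in order of first appearance. None of this is confirmed kProcessor behavior.

```python
import io


def load_group_names(names_stream):
    """Build a hypothetical {group_id: group_name} map from a names file.

    Assumes "sequence_name<TAB>group_name" lines and IDs assigned from 0
    in order of first appearance (an assumption, not kProcessor's rule).
    """
    name_to_group_id = {}
    group_id_to_name = {}
    for line in names_stream:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        _seq_name, group_name = line.split("\t", 1)
        if group_name not in name_to_group_id:
            group_id = len(name_to_group_id)
            name_to_group_id[group_name] = group_id
            group_id_to_name[group_id] = group_name
    return group_id_to_name


def color_to_group_names(color, color_to_group_ids, group_id_to_name):
    # color_to_group_ids: hypothetical {color: set_of_group_ids} table
    return sorted(group_id_to_name[g] for g in color_to_group_ids[color])


names = io.StringIO("seq1\tgeneA\nseq2\tgeneA\nseq3\tgeneB\n")
group_id_to_name = load_group_names(names)
color_to_group_ids = {1: {0}, 2: {0, 1}}  # made-up example data
print(color_to_group_names(2, color_to_group_ids, group_id_to_name))  # ['geneA', 'geneB']
```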


Here is the corresponding Python code:

```python
import kProcessor as kp

# Index the sequences with k = 21; names_file assigns sequences to named groups
kf_map = kp.kDataFramePHMAP(21)

fasta_file = "seq.fa"
names_file = "seq.fa.names"

kp.index(kf_map, {"kSize": 21}, fasta_file, 1, names_file)

print(f"total size: {kf_map.size()}")
print(f"Column names: {kf_map.getColumnNames()}")

# Walk the kDataFrame and record the color of every hashed kmer
hash_to_color = dict()

it = kf_map.begin()
while it != kf_map.end():
    kmer_hash = it.getHashedKmer()
    kmer_color = kf_map.getKmerColumnValue_int("color", kmer_hash)
    hash_to_color[kmer_hash] = kmer_color
    it.next()

print("kmer to colors")
for _hash, color in hash_to_color.items():
    print(f"hash({_hash}) : color({color})")
```
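Since many kmers usually share the same color, it may help to invert `hash_to_color` so that any color -> groups lookup happens once per color instead of once per kmer. A small sketch in plain Python; the hashes below are made-up sample values, and in the script above you would pass the real `hash_to_color`.

```python
from collections import defaultdict


def group_hashes_by_color(hash_to_color):
    """Invert {kmer_hash: color} into {color: [kmer_hash, ...]}."""
    color_to_hashes = defaultdict(list)
    for kmer_hash, color in hash_to_color.items():
        color_to_hashes[color].append(kmer_hash)
    return dict(color_to_hashes)


sample = {111: 1, 222: 2, 333: 1}  # made-up hashes and colors
print(group_hashes_by_color(sample))  # {1: [111, 333], 2: [222]}
```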

cc @drtamermansour @shokrof

Labels: question (Further information is requested), v2 (kProcessor version 2)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions