Hi, I have the following three questions for CreateNetwork: #265

hyBio · 2024-04-20T04:09:00Z

Hi, I have the following three questions for CreateNetwork:

1. Are the two columns in the motif2gene_mapping.txt file supposed to be motif name \t gene name, or gene name \t gene product name as shown below? This is very confusing to me.
2. Does the first column of motif2gene_mapping.txt need to match the fourth column of the TFBS file, and do I need to adjust it accordingly if I customize the motif names?
3. For a non-model species, how should I get the motifs and their regulated gene sets? Can I just use the motif2gene_mapping.txt from the test data?
Looking forward to your reply, thanks a lot.

Originally posted by @hyBio in #260 (comment)

mohobein · 2024-04-22T12:11:23Z

Hey @hyBio,

thank you for using TOBIAS.

The image you refer to from the wiki does not depict the motif2gene_mapping.txt file, but rather how CreateNetwork operates to build TF binding networks. Basically, you need to tell the tool which TF is expressed by which gene. Then, the tool checks which TFBS identified by BINDetect are associated with the expression of these genes. That way, we see if one TF is influencing the expression of another TF further down the network, because we know which gene expresses it and which TFs bind to the promoter of this TF's gene. The figure your image is taken from depicts this concept, not the format of the --origin file, though I can see why this can be confusing.
So to answer the question about the columns of the motif2gene_mapping.txt file: First column is the TF/motif name, second column is the ID of the gene that encodes it.
Yes, they have to match. Otherwise, the tool cannot find any TFBS associated with your given TFs and genes. If you adjusted the motif names earlier, you have to use the adjusted names in the --origin file as well. The --naming parameter of TOBIAS BINDetect may be of interest to you in this context.
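A quick way to check the matching before running CreateNetwork could look like the sketch below. It assumes both files are tab-separated, that the motif name sits in the first column of the --origin file and in the fourth column of the TFBS file (as described in the question), and the function name `unmatched_motif_names` is made up for this example.

```python
def unmatched_motif_names(origin_lines, tfbs_lines):
    """Return motif names from the origin file that never occur in column 4 of the TFBS file."""
    # Collect every name found in the fourth column of the TFBS lines.
    tfbs_names = set()
    for line in tfbs_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            tfbs_names.add(fields[3])

    # Collect the first column of the origin file, skipping empty lines.
    origin_names = {
        line.split("\t")[0].strip() for line in origin_lines if line.strip()
    }
    return sorted(origin_names - tfbs_names)


if __name__ == "__main__":
    # Made-up example data; gene IDs and coordinates are placeholders.
    origin = ["GATA1\tGENE0001\n", "SOX2\tGENE0002\n"]
    tfbs = ["chr1\t10\t20\tGATA1\t.\t+\n"]
    print(unmatched_motif_names(origin, tfbs))
```

Any name this returns is a TF in the mapping file for which no binding site with exactly that name was found, which usually points to a naming mismatch.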
The motif2gene_mapping.txt file in test_data is for human genes, so it will not work for other species. This issue describes how to create the file from scratch. You can take the genes.gtf file for your organism and keep all lines where gene_name matches one of the TF motif names from your JASPAR file. Each remaining line contains both the gene_name and gene_id, which you can then use to fill the two columns of your --origin file.
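The GTF approach described above could be sketched roughly as follows. This assumes an Ensembl/GENCODE-style GTF whose ninth column carries `gene_id "..."` and `gene_name "..."` attributes, and it matches names case-insensitively, which is a choice made for this example and not something TOBIAS prescribes. The function name `build_origin_lines` and the gene IDs in the demo are placeholders.

```python
import re

# Attribute patterns for Ensembl/GENCODE-style GTF lines (assumed format).
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')
GENE_NAME_RE = re.compile(r'gene_name "([^"]+)"')


def build_origin_lines(gtf_lines, motif_names):
    """Return 'motif_name<TAB>gene_id' rows for genes whose gene_name matches a motif name."""
    # Case-insensitive lookup (assumption); maps upper-cased name to the motif name as given.
    wanted = {name.upper(): name for name in motif_names}
    seen = set()
    rows = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        gene_id = GENE_ID_RE.search(fields[8])
        gene_name = GENE_NAME_RE.search(fields[8])
        if not (gene_id and gene_name):
            continue
        motif = wanted.get(gene_name.group(1).upper())
        if motif is None:
            continue
        pair = (motif, gene_id.group(1))
        # A gene appears on many lines (gene, transcript, exon); keep each pair once.
        if pair not in seen:
            seen.add(pair)
            rows.append(f"{motif}\t{gene_id.group(1)}")
    return rows


if __name__ == "__main__":
    # Made-up GTF lines; the IDs are placeholders, not real annotations.
    example_gtf = [
        "#!genome-build example\n",
        'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "GENE0001"; gene_name "Gata1";\n',
        'chr1\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "GENE0001"; gene_name "Gata1";\n',
        'chr2\tsrc\tgene\t1\t100\t.\t-\t.\tgene_id "GENE0002"; gene_name "Actb";\n',
    ]
    for row in build_origin_lines(example_gtf, ["GATA1"]):
        print(row)
```

Motif names that do not correspond to a single gene_name (for example heterodimer motifs) are not handled here and would need manual curation.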

I hope this clears up your questions. If you are in need of further assistance, let me know!

Best regards,
Moritz

github-actions · 2024-07-16T07:24:48Z

No activity for at least 30 days. Marking issue as stale. Stale issues are closed after one week.

hyBio · 2024-07-16T07:38:29Z

Hi @mohobein,

Thank you very much for your reply. Your advice is very useful for my project, but other work has made me pause exploring TFBS for a while. I may ask you about TOBIAS usage again when I restart this project.

Once again, my sincere thanks!
Yan Hu

github-actions bot added the Stale label Jul 16, 2024

mohobein closed this as completed Jul 16, 2024
