Hi, I have the following three questions for CreateNetwork: #265

Closed
hyBio opened this issue Apr 20, 2024 · 3 comments

@hyBio

hyBio commented Apr 20, 2024

Hi, I have the following three questions about CreateNetwork:
  1. Are the two columns in the motif2gene_mapping.txt file supposed to be motif name \t gene name, or gene name \t gene product name as shown below? I find this very confusing.
    [image]
  2. Does the first column of motif2gene_mapping.txt need to match the fourth column of the TFBS file, and do I need to adjust it accordingly if I customize the motif names?
  3. For a non-model species, how should I get the motifs and their regulated gene sets? Can I just use the motif2gene_mapping.txt from the test data?
    Looking forward to your reply, thanks a lot.

Originally posted by @hyBio in #260 (comment)

@mohobein
Collaborator

Hey @hyBio,

thank you for using TOBIAS.

  1. The image you refer to from the wiki does not depict the motif2gene_mapping.txt file, but rather how CreateNetwork operates to build TF binding networks. Basically, you need to tell the tool which TF is expressed by which gene. Then, the tool checks which TFBS identified by BINDetect are associated with the expression of these genes. That way, we can see whether one TF influences the expression of another TF further down the network, because we know which gene expresses it and which TFs bind to the promoter of that TF's gene. The figure your image is taken from depicts this concept, not the format of the --origin file, though I can see why this can be confusing.
    So to answer the question about the columns of the motif2gene_mapping.txt file: the first column is the TF/motif name, and the second column is the ID of the gene that encodes it.
  2. Yes, they have to match. Otherwise, the tool cannot find any TFBS associated with your given TFs and genes. If you adjusted the motif names beforehand, you have to use the adjusted names in the --origin file as well. Perhaps the --naming parameter of TOBIAS BINDetect is of interest to you in this context.
  3. The motif2gene_mapping.txt file in test_data is for human genes, so it will not work for other species. This issue describes how to create the file from scratch. You can take the genes.gtf file for your organism and keep all lines where gene_name matches one of the TF motif names from your JASPAR file. Each remaining line contains both the gene_name and the gene_id, which you can then use to fill the two columns of your --origin file.
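
The procedure in point 3, together with the name check from point 2, could look roughly like the Python sketch below. It is not part of TOBIAS, and all function names are made up for illustration. It assumes that motif headers look like `>ID NAME`, that the GTF has `gene` records carrying both `gene_id` and `gene_name` attributes, that the TFBS file is tab-separated with the motif name in the fourth column, and that matching gene names case-insensitively is acceptable for your organism.

```python
import re

# Matches GTF attributes of the form: key "value"
_ATTR = re.compile(r'(\w+) "([^"]*)"')


def read_motif_names(jaspar_lines):
    """Collect motif names from header lines, assumed to look like '>ID NAME'."""
    names = []
    for line in jaspar_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if len(fields) >= 2:
                names.append(fields[1])
    return names


def gene_name_to_id(gtf_lines):
    """Map upper-cased gene_name -> gene_id using only 'gene' records."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        attrs = dict(_ATTR.findall(cols[8]))
        if "gene_name" in attrs and "gene_id" in attrs:
            # Keep the first gene_id seen for a given name.
            mapping.setdefault(attrs["gene_name"].upper(), attrs["gene_id"])
    return mapping


def build_origin_rows(motif_names, name_to_id):
    """Return (rows, unmatched); rows are (motif name, gene ID) pairs."""
    rows, unmatched = [], []
    for name in motif_names:
        gene_id = name_to_id.get(name.upper())
        if gene_id is None:
            unmatched.append(name)
        else:
            rows.append((name, gene_id))
    return rows, unmatched


def names_absent_from_tfbs(rows, tfbs_lines):
    """List motif names from rows that never occur in column 4 of the TFBS lines."""
    seen = set()
    for line in tfbs_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 4:
            seen.add(cols[3])
    return [name for name, _ in rows if name not in seen]
```

Each function accepts any iterable of lines, so open file handles for your motif file, genes.gtf and TFBS file can be passed directly. The rows can then be written out as `name<TAB>gene_id`, one per line, to serve as the --origin file. Motifs that end up in `unmatched` (for example names that do not correspond to a single gene_name) need to be mapped by hand, and any names reported by `names_absent_from_tfbs` point to a naming mismatch of the kind described in point 2.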

I hope this clears up your questions. If you are in need of further assistance, let me know!

Best regards,
Moritz


No activity for at least 30 days. Marking issue as stale. Stale issues are closed after one week.

@github-actions github-actions bot added the Stale label Jul 16, 2024
@hyBio
Author

hyBio commented Jul 16, 2024

Hi @mohobein,

Thank you very much for your reply. Your advice is very useful for my project, but other work has made me pause my exploration of TFBS for a while. I may ask you about TOBIAS usage again when I restart this project.

Once again, my sincere thanks!
Yan Hu
