
BINDdetect vs plotHeatmap #247

Open
mbergsland opened this issue Dec 13, 2023 · 3 comments

@mbergsland

Hi
First of all, thanks a lot for such a nice and useful tool!

I have two sets of ATAC-seq samples: control cells and cells overexpressing TF A. I have ChIP-seq peaks for TF A, and these are the regions I investigated for footprints with TOBIAS. Importantly, central motif enrichment analysis of the ChIP-seq peaks showed a centrally enriched motif for TF A, but also a very strong centrally enriched motif for an unrelated TF B. Given this background, my main question for TOBIAS was whether TF A and TF B bind together or mutually exclusively (TF A is not expressed in the control condition).
In the BINDetect volcano plot, TF A and TF B footprints appear on opposite sides (fig1), which I guess would suggest that the two factors do not bind together but rather mutually exclusively. Given the strong central enrichment of both motifs within the examined peaks, I assume that most of the signal comes from the center of these peaks.
The volcano plot (fig1) shows a higher score for TF A in the control sample (sample 204) and for TF B in the sample overexpressing TF A (sample 201). This does not make sense, since TF A is overexpressed in sample 201 and should show stronger footprints there than in control sample 204 (binding of TF A is also confirmed by ChIP-seq). Moreover, in the plotHeatmap output (fig2), the TF A footprint is visible in sample 201 but not in sample 204, and the TF B footprint seems stronger in sample 201. These results seem to contradict the volcano plot: plotHeatmap shows both footprints increased in the same sample, whereas the volcano plot shows them increased in different samples. Why is that? It would be very helpful to understand these data properly.
I also have a question regarding plotAggregate. Fig3 shows a single region with all footprint signals (a BED file containing one region). Does zero refer to the middle of the peak in this case? And what does zero refer to when several regions are analyzed with this function?

@mohobein
Collaborator

Hey @mbergsland,

thank you for your issue. At first glance, this does indeed look unexpected. To help you properly, it would be great to know exactly how you generated these results. Could you please post the commands you used? The input files would be of particular interest.

As for your question about figure 3, you are right that 0 refers to the center of the input regions. The command is meant to visualize the aggregated signal of all provided binding sites around the motif, so the meaning of 0 on the x-axis does not change with the number of input regions. Instead, more signal tracks are aggregated, and their mean is displayed on the y-axis of the plot.
If you want to look at larger individual peaks, a genome viewer such as IGV may be better suited for the task than plotAggregate.
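
A minimal Python sketch of this kind of aggregation, to make the role of 0 concrete: each region's window is anchored at its midpoint (offset 0), and the signal is averaged across regions at every offset, so adding regions changes the averaged values but not the meaning of the x-axis. This is not the PlotAggregate implementation; the in-memory `signal` dict (standing in for a bigwig), the hypothetical `aggregate_around_center` function, its `flank` parameter, and ignoring strand are all assumptions made for illustration.

```python
from statistics import mean


def aggregate_around_center(signal, regions, flank):
    """Average per-base signal in windows centered on each region's midpoint.

    signal:  dict mapping chromosome -> list of per-base values (0-based).
    regions: list of (chrom, start, end) tuples in BED-style half-open coordinates.
    flank:   number of bases to include on each side of the center.

    Returns (offsets, means): offset 0 is the region center, and means[i] is
    the mean signal over all usable regions at offsets[i]. Strand is ignored.
    """
    offsets = list(range(-flank, flank + 1))
    windows = []
    for chrom, start, end in regions:
        center = (start + end) // 2
        track = signal[chrom]
        lo, hi = center - flank, center + flank + 1
        if lo < 0 or hi > len(track):
            continue  # skip regions whose window runs off the chromosome
        windows.append(track[lo:hi])
    if not windows:
        raise ValueError("no region has a complete window")
    # One mean per offset: zip(*windows) yields the values of all regions at that offset.
    means = [mean(values) for values in zip(*windows)]
    return offsets, means


if __name__ == "__main__":
    # Made-up per-base signal on one chromosome.
    signal = {"chr1": [0, 0, 1, 5, 1, 0, 0, 2, 8, 2, 0, 0]}
    regions = [("chr1", 2, 5), ("chr1", 7, 10)]  # centers at positions 3 and 8
    offsets, means = aggregate_around_center(signal, regions, flank=1)
    print(list(zip(offsets, means)))  # [(-1, 1.5), (0, 6.5), (1, 1.5)]
```

With two regions centered at positions 3 and 8, the value at offset 0 is the mean of the two center values (5 and 8), i.e. 6.5; with a single region, offset 0 is simply that region's midpoint.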

Best regards,
the TOBIAS team

@mbergsland
Author

mbergsland commented Dec 20, 2023

Thank you for your response.

I have been using the following commands.
For Tn5 bias correction and generating the bigwig files:

TOBIAS ATACorrect \
    --bam 204_REP1.mLb-clN.sorted.bam \
    --genome /Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa \
    --peaks ChIPSeq_peaks.bed \
    --cores 16 \
    --outdir /Tobias/Footprint204/

TOBIAS ScoreBigwig \
    --signal /Tobias/Footprint204/204_REP1.mLb-clN.sorted_corrected.bw \
    --regions ChIPSeq_peaks.bed \
    --cores 16 \
    --output /Tobias/Footprint204/204_REP1.mLb-clN.sorted_footprints.bw

And for BINDetect and plotHeatmap:

TOBIAS BINDetect \
    --motifs /Tobias/non_redundant_motifs/all_motifs.txt \
    --signals /Tobias/Footprint204/204_REP1.mLb-clN.sorted_footprints.bw /Tobias/Footprint201_Sox21ChIP/201_REP1.mLb-clN.sorted_footprints.bw \
    --genome /Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa \
    --peaks ChIPSeq_peaks.bed \
    --cores 16 \
    --outdir /Tobias/204_vs_201/BINDdetect

TOBIAS PlotHeatmap \
    --TFBF /Tobias/204_vs_201_Sox21ChIP/BINDdetect/TF_B/beds/TF_B_all.bed \
    --TFBS /Tobias/204_vs_201_Sox21ChIP/BINDdetect/TF_B/beds/TF_B_all.bed \
    --signals /Tobias/Footprint204_Sox21ChIP/204_REP1.mLb-clN.sorted_corrected.bw /Tobias/Footprint201_Sox21ChIP/201_REP1.mLb-clN.sorted_corrected.bw \
    --output TFB_heatmap.png \
    --signal_labels nonDOX_204 DOX_201 \
    --share_colorbar \
    --sort_by -1

plotHeatmap for TF A was done in the same way.

Since the motifs of TF A and TF B lie close to each other (both are centrally enriched in the TF A ChIP-seq peaks), could the stronger TF B footprint in sample 201 actually be due to TF A binding in this area (TF A is overexpressed in 201 and not expressed in 204)? However, this still would not explain why the two factors show up on opposite sides of the volcano plot, whereas both footprints appear enhanced in sample 201 with plotHeatmap.
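
One way to probe this hypothesis is to measure how many TF B sites lie within a few base pairs of a TF A site. The sketch below assumes that the BINDetect site files (such as `TF_B_all.bed` from the PlotHeatmap command and its TF A counterpart) hold chromosome, start and end in their first three tab-separated columns; `read_bed_intervals`, `fraction_near` and the `max_gap` threshold are hypothetical names chosen for illustration. The pairwise loop is quadratic, so for genome-wide site sets a dedicated interval tool such as `bedtools window` would be the faster choice.

```python
def read_bed_intervals(lines):
    """Parse BED lines into (chrom, start, end) tuples, keeping the first three columns."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def fraction_near(query, reference, max_gap):
    """Fraction of query intervals within max_gap bp of at least one reference interval."""
    by_chrom = {}
    for chrom, start, end in reference:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in query:
        for ref_start, ref_end in by_chrom.get(chrom, []):
            # Distance between half-open intervals; 0 if they overlap or touch.
            gap = max(ref_start - end, start - ref_end, 0)
            if gap <= max_gap:
                hits += 1
                break
    return hits / len(query) if query else 0.0


if __name__ == "__main__":
    # Made-up sites. With real files, e.g.:
    #   with open("TF_B_all.bed") as fh: tf_b = read_bed_intervals(fh)
    tf_a = read_bed_intervals(["chr1\t100\t110\n", "chr1\t500\t510\n"])
    tf_b = read_bed_intervals(["chr1\t115\t125\n", "chr1\t900\t910\n", "chr2\t100\t110\n"])
    print(fraction_near(tf_b, tf_a, max_gap=20))  # 1 of 3 TF B sites is within 20 bp of TF A
```

A high fraction would be consistent with TF B footprints being influenced by nearby TF A occupancy; a low fraction would argue against it.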

Best,
mbergs

@msbentsen
Member

Hi @mbergsland

Thank you for sending the commands, that's very helpful! I see that all commands were run on the regions in "ChIPSeq_peaks.bed", which contain only the ChIP-seq peaks, correct? I am asking because TOBIAS BINDetect normalizes the input of all conditions towards each other so that all values fit the same distribution. Since the input consists only of TF A binding sites, which might be largely closed in sample 204, the normalization might artificially skew the scores of sample 204 upwards.
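
To illustrate how normalizing on TF A sites alone can shift scores, here is a toy example with made-up numbers. It uses plain quantile normalization as a stand-in for BINDetect's normalization step; whether BINDetect uses exactly this scheme is an assumption here, and `quantile_normalize` is a hypothetical helper.

```python
from statistics import mean


def quantile_normalize(conditions):
    """Quantile-normalize per-site scores across conditions.

    conditions: list of equally long score lists, one per condition, where
    index i refers to the same site in every condition. Each score is replaced
    by the mean, across conditions, of the scores holding the same rank.
    Ties are broken by site order (no tie averaging), to keep the sketch short.
    """
    n = len(conditions[0])
    if any(len(scores) != n for scores in conditions):
        raise ValueError("every condition must score the same sites")
    sorted_scores = [sorted(scores) for scores in conditions]
    # rank_means[k]: mean of the k-th smallest score over all conditions.
    rank_means = [mean(col[k] for col in sorted_scores) for k in range(n)]
    normalized = []
    for scores in conditions:
        # Site indices ordered from lowest to highest score in this condition.
        order = sorted(range(n), key=scores.__getitem__)
        out = [0.0] * n
        for rank, site in enumerate(order):
            out[site] = rank_means[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Made-up footprint scores at the same four TF A sites.
    scores_204 = [0.1, 0.2, 0.3, 0.4]  # control: sites largely closed
    scores_201 = [1.0, 0.5, 2.0, 1.5]  # TF A overexpressed: strong footprints
    norm_204, norm_201 = quantile_normalize([scores_204, scores_201])
    print(mean(scores_201) - mean(scores_204))  # 1.0: raw scores are clearly higher in 201
    print(mean(norm_201) - mean(norm_204))      # 0.0: the global difference is gone
    print(norm_204[1], norm_201[1])             # 0.6 0.3: site 1 now looks stronger in 204
```

Because both samples are forced onto one shared distribution, the genuine global increase in sample 201 disappears, and at individual sites the mostly closed control sample can end up with the higher normalized score.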

Could you try running the commands on all open ATAC-seq peaks instead? You can set --peaks atac_peaks.bed for ATACorrect and --regions atac_peaks.bed for ScoreBigwig. In the final BINDetect step, you can set --peaks atac_peaks.bed --output-peaks chipseq_peaks.bed, which normalizes based on all peaks but builds the volcano plot from the ChIP-seq peaks only. This will likely produce a volcano plot dominated by the TFs enriched within the ChIP-seq peaks, but it might also reveal TFs that are underrepresented.
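
The suggested rerun can be laid out as a small Python driver that prints the TOBIAS commands in order and executes them only when called with `--run`. It uses only the flags mentioned in this thread. The output directory layout, the `build_commands` helper, and the assumption that `atac_peaks.bed` is a single peak set covering open chromatin in both samples are hypothetical choices, so adjust the paths to your setup.

```python
import shlex
import subprocess
import sys
from pathlib import Path

# Inputs taken from this thread where possible. ATAC_PEAKS is assumed to be one
# peak set covering open chromatin in both samples; OUTROOT is hypothetical.
GENOME = "/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa"
MOTIFS = "/Tobias/non_redundant_motifs/all_motifs.txt"
ATAC_PEAKS = "atac_peaks.bed"
CHIP_PEAKS = "chipseq_peaks.bed"
BAMS = ["204_REP1.mLb-clN.sorted.bam", "201_REP1.mLb-clN.sorted.bam"]
OUTROOT = Path("/Tobias/204_vs_201_atac_peaks")
CORES = "16"


def build_commands(bams, outroot):
    """Return the TOBIAS commands as argv lists, in the order they must run."""
    commands, footprint_tracks = [], []
    for bam in bams:
        prefix = Path(bam).name.removesuffix(".bam")
        sample_dir = outroot / prefix
        # Matches the naming seen in this thread: <bam name without .bam>_corrected.bw.
        corrected = sample_dir / f"{prefix}_corrected.bw"
        footprints = sample_dir / f"{prefix}_footprints.bw"
        # Bias correction and footprint scoring on ALL open ATAC-seq peaks.
        commands.append(["TOBIAS", "ATACorrect", "--bam", bam, "--genome", GENOME,
                         "--peaks", ATAC_PEAKS, "--cores", CORES,
                         "--outdir", str(sample_dir)])
        commands.append(["TOBIAS", "ScoreBigwig", "--signal", str(corrected),
                         "--regions", ATAC_PEAKS, "--cores", CORES,
                         "--output", str(footprints)])
        footprint_tracks.append(str(footprints))
    # Normalize on all ATAC-seq peaks, but report results for ChIP-seq peaks only.
    commands.append(["TOBIAS", "BINDetect", "--motifs", MOTIFS,
                     "--signals", *footprint_tracks, "--genome", GENOME,
                     "--peaks", ATAC_PEAKS, "--output-peaks", CHIP_PEAKS,
                     "--cores", CORES, "--outdir", str(outroot / "BINDetect")])
    return commands


if __name__ == "__main__":
    for cmd in build_commands(BAMS, OUTROOT):
        print(shlex.join(cmd))
        if "--run" in sys.argv[1:]:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

Keeping every step on the same ATAC-seq peak set means the footprint tracks cover all open regions, so the BINDetect normalization is no longer restricted to TF A sites.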

I hope this is helpful - otherwise please feel free to update here, thanks!


3 participants