output file #20

Open
Nuturetree opened this issue Apr 23, 2024 · 21 comments

Comments

@Nuturetree

Hi author:
DiffDomain is a useful tool, but something seems off in my results: none of the TADs have p-values. Should the input file be the original (raw) matrix rather than the ICE-normalized matrix?
thanks.

@Nuturetree
Author

Snipaste_2024-04-23_11-33-10

@Tian-Dechao
Owner

Using the input file in .cool format should be sufficient. I recommend trying lower resolutions, such as 40kb. If the issue persists, please provide screenshots of any warning messages from DiffDomain and of the output file displaying all columns.

@Nuturetree
Author

Thank you for your reply. The previous issue has been resolved; it was caused by an incorrect option on our side. However, a new problem has come up: when I use the raw matrix as input, I get the following error:
python ~/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py dvsd multiple J668_mock_0min_Ghjin_D05.matrix J668_Fov7_720min_Ghjin_D05.matrix Ghjin_D05_TAD_region.bed --reso 20000
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 66, in
comp2domins_by_twtest_parallel(0)
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 338, in comp2domins_by_twtest
mat0 = contact_matrix_from_hic(chrn, start, end, reso, fhic0, hicnorm)
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 209, in contact_matrix_from_hic
k=domwin_dict[bin0]
KeyError: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 338, in comp2domins_by_twtest
mat0 = contact_matrix_from_hic(chrn, start, end, reso, fhic0, hicnorm)
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 209, in contact_matrix_from_hic
k=domwin_dict[bin0]
KeyError: 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 76, in
result.append(i.get())
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
KeyError: 1

I suspect this is a problem with the matrix-to-cool or matrix-to-hic conversion, but I am not sure exactly why. So I used hicexplorer to convert the matrix to cool format and reran the calculation, but that also reported an error.

@Nuturetree
Author

python /public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py dvsd multiple J668_mock_0min_Ghjin_D05.cool J668_Fov7_720min_Ghjin_D05.cool Ghjin_D05_TAD_region.bed --reso 20000

@Nuturetree
Author

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 379, in comp2domins_by_twtest
Diffmatnorm = normDiffbyMeanSD(D=Diffmat)
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 272, in normDiffbyMeanSD
b[k] = np.max(val1)
File "<array_function internals>", line 6, in amax
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2755, in amax
keepdims=keepdims, initial=initial, where=where)
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 76, in
result.append(i.get())
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
ValueError: zero-size array to reduction operation maximum which has no identity
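
For readers hitting the same message: this ValueError is what NumPy raises whenever `np.max` receives an empty array. The sketch below only reproduces the message and shows one possible guard; `safe_max` is a hypothetical helper, not part of DiffDomain, and it does not explain why `val1` ends up empty inside `normDiffbyMeanSD`.

```python
import numpy as np

def safe_max(values, default=np.nan):
    """Return the maximum of values, or default when values is empty.

    Hypothetical helper for illustration only; not DiffDomain code.
    """
    values = np.asarray(values)
    if values.size == 0:
        return default
    return np.max(values)

# Reproduce the error message from the traceback with an empty array.
try:
    np.max(np.array([]))
except ValueError as err:
    message = str(err)

print(message)
print(safe_max([]))         # falls back to the default
print(safe_max([1, 3, 2]))  # ordinary maximum
```

Whether returning NaN is the right fallback depends on how the value is used downstream, so treat the default as an assumption.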

@Nuturetree
Author

I'm guessing it is due to the input files having unequal numbers of lines.

@Nuturetree
Author

These are the Hi-C line counts for the same material at different time points.

@Nuturetree
Author

Ghjin_D05_abs.zip

@Nuturetree
Author

Because the two sparse matrices omit bin pairs with zero interactions, they have different numbers of rows, which made it impossible to compare them after generating the cool files. I used a HiC-Pro script to generate a symmetric N*N matrix, converted its lower triangle to three columns ("bin1 bin2 reads"), converted that to cool with hicexplorer, and then compared the two files, but the fourth and fifth columns of the output did not have any values.
Script:
python ~/biosoft/HiC-Pro-master/bin/utils/sparseToDense.py ${out_dir}/${c}.matrix -o ${out_dir}/${c}_Symmetries.matrix
Get lower triangular interactions
import numpy as np
import pandas as pd

def extract_lower_triangle(df):
    # Assume the DataFrame is N x N
    N = df.shape[0]
    # Get the row and column indices of the lower triangle (diagonal included)
    row_indices, col_indices = np.tril_indices(N)

    # Extract the values from the DataFrame using these indices
    values = df.values[row_indices, col_indices]

    # Create a new DataFrame to store these values with proper labelling
    lower_tri_df = pd.DataFrame({
        'Row': row_indices + 1,  # Convert to 1-based indexing
        'Column': col_indices + 1,  # Convert to 1-based indexing
        'Value': values
    })
    return lower_tri_df

Convert to cool
hicConvertFormat -m ${mtx_f1} --bedFileHicpro ${abs_f} --inputFormat hicpro --outputFormat cool -o ${cool_f1} --resolutions 20000
run diffdomain
python ~/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py dvsd multiple ${cool_f1} ${cool_f2} Ghjin_D05_TAD_region.bed --reso 20000

@Nuturetree
Author

I can get results, but the fifth and sixth columns don't have any values. The two cool files, the TAD file, and the results file are in the attachment:
input_result.zip

@Tian-Dechao
Owner

Let's address the issue with this specific usage first. Thank you for providing the example data. The format of Glr19_mock_0min_Ghjin_D05.matrix and Glr19_Fov7_720min_Ghjin_D05.matrix does not meet the requirements of DiffDomain. In DiffDomain's three-column input format, the first two columns give the exact genomic locations (bin ID * reso) of the two bins in a chromatin interaction, similar to the output of the straw function.

For example, the first two lines in the Glr19_mock_0min_Ghjin_D05.matrix should be

20000       20000       57
20000       40000       58  

rather than

1       1       57
1       2       58

Please revise the format of the .matrix files and try this usage again. Kindly let us know if the issue is solved.
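
A minimal sketch of the conversion described above, assuming the input rows are whitespace-separated "bin1 bin2 count" lines with integer bin IDs. The function name `bins_to_coords` is hypothetical and is not part of DiffDomain; the only rule taken from the reply is position = bin ID * reso.

```python
def bins_to_coords(lines, reso):
    """Convert 'bin1 bin2 count' rows to (pos1, pos2, count) tuples.

    Positions are computed as bin ID * reso. Counts are kept as text so
    that integer and normalized (float) values pass through unchanged.
    """
    converted = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        bin1, bin2, count = fields
        converted.append((int(bin1) * reso, int(bin2) * reso, count))
    return converted

# The two example rows from the reply, at 20 kb resolution.
rows = ["1\t1\t57", "1\t2\t58"]
converted = bins_to_coords(rows, reso=20000)
for pos1, pos2, count in converted:
    print(pos1, pos2, count, sep="\t")
```

To convert a whole file, the same function can be fed the open file object, for example `bins_to_coords(open(path), 20000)`, and the tuples written back out tab-separated.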

@Nuturetree
Author

Thank you very much for your answer. I followed your suggestion and used bin_id*reso, but I still get an error:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 66, in
comp2domins_by_twtest_parallel(0)
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 338, in comp2domins_by_twtest
mat0 = contact_matrix_from_hic(chrn, start, end, reso, fhic0, hicnorm)
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 209, in contact_matrix_from_hic
k=domwin_dict[bin0]
KeyError: 20000

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 338, in comp2domins_by_twtest
mat0 = contact_matrix_from_hic(chrn, start, end, reso, fhic0, hicnorm)
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 209, in contact_matrix_from_hic
k=domwin_dict[bin0]
KeyError: 20000
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 76, in
result.append(i.get())
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
KeyError: 20000

@Nuturetree
Author

I really think DiffDomain is a very useful tool, and thanks for your prompt reply!

@Tian-Dechao
Owner

A quick response first. There is a bug in loading three-column files as input. The current version assigns every chromatin interaction in the input file to every TAD, which is wrong. For example, assigning the interaction 20000 20000 49 to the TAD Ghjin_D05 320000 500000 raises KeyError: 20000, because bin 20000 does not belong to the TAD region 320000-500000. A quick fix is on the way and will be uploaded soon.
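
The failure mode described above can be sketched as follows. This is not DiffDomain's actual code: `interactions_in_tad` and `tad_matrix` are hypothetical names, and treating the TAD as the half-open interval [start, end) is an assumption made for the illustration.

```python
def interactions_in_tad(interactions, start, end):
    """Keep interactions whose two bins both lie inside [start, end)."""
    return [
        (pos1, pos2, count)
        for pos1, pos2, count in interactions
        if start <= pos1 < end and start <= pos2 < end
    ]

def tad_matrix(interactions, start, end, reso):
    """Build a dense symmetric contact matrix for one TAD."""
    # Map each bin position inside the TAD to a row/column index.
    index = {pos: i for i, pos in enumerate(range(start, end, reso))}
    n = len(index)
    mat = [[0] * n for _ in range(n)]
    # Filtering first is the key step: looking up a bin outside the TAD
    # (e.g. index[20000]) would raise KeyError: 20000.
    for pos1, pos2, count in interactions_in_tad(interactions, start, end):
        i, j = index[pos1], index[pos2]
        mat[i][j] = count
        mat[j][i] = count
    return mat

interactions = [
    (20000, 20000, 49),    # outside the TAD, must be skipped
    (320000, 340000, 12),
    (340000, 340000, 30),
]
mat = tad_matrix(interactions, start=320000, end=500000, reso=20000)
print(len(mat), mat[0][1], mat[1][1])
```

Dropping the filtering step and looping over all interactions directly reproduces the KeyError, since 20000 is not a key of the TAD's bin index.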

Meanwhile, input files in .hic or .cool/.mcool format are fine, because DiffDomain uses straw for .hic files or fetch for .cool/.mcool files to first extract the subset of chromatin interactions that fall within a given TAD.

@Tian-Dechao
Owner

We have fixed the bug in reading chromatin interactions in the three-column sparse format. Please follow the instructions in Method1 to set up the conda environment and install DiffDomain, then rerun the command.

This version has been tested on macOS.
Code
python3 diffDomain/diffdomain-py3/diffdomains.py dvsd multiple J668_Fov7_720min_Ghjin_D05_reso.matrix J668_mock_0min_Ghjin_D05_reso.matrix Ghjin_D05_TAD_region.bed --reso 20000 --ofile test.tsv

Output
Screenshot 2024-04-28 at 9 46 17 PM

Full results here
test.tsv.zip

@Nuturetree
Author

Thank you for your reply. I would like to ask whether the data are normalized (KR or ICE) when the three-column sparse format is used as input. I noticed that a normalized cool file is generated when cool is used as input, whereas no such file is produced when the three-column sparse format is used.

@Tian-Dechao
Owner

Tian-Dechao commented Apr 29, 2024

The three-column sparse format is used as is; DiffDomain performs no normalization on it. The .hic or .cool/.mcool formats are highly recommended if you want to use normalized Hi-C interactions.

@Nuturetree
Author

Following your suggestion, I tried using the cool files as input, but got an error:
test_input.zip

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain2/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/public/home/xhhuang/biosoft/diffDomain/diffdomain-py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
File "/public/home/xhhuang/biosoft/diffDomain/diffdomain-py3/utils.py", line 385, in comp2domins_by_twtest
Diffmatnorm = normDiffbyMeanSD(D=Diffmat)
File "/public/home/xhhuang/biosoft/diffDomain/diffdomain-py3/utils.py", line 266, in normDiffbyMeanSD
b[k] = np.max(val1)
File "<array_function internals>", line 6, in amax
File "/public/home/xhhuang/miniconda3/envs/diffdomain2/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2755, in amax
keepdims=keepdims, initial=initial, where=where)
File "/public/home/xhhuang/miniconda3/envs/diffdomain2/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/public/home/xhhuang/biosoft/diffDomain/diffdomain-py3/diffdomains.py", line 76, in
result.append(i.get())
File "/public/home/xhhuang/miniconda3/envs/diffdomain2/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
ValueError: zero-size array to reduction operation maximum which has no identity

@Nuturetree
Author

The cool files were generated with hicConvertFormat.
