Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analyzing Chip-seq data on genome graph #22

Open
yihangs opened this issue Oct 2, 2023 · 13 comments
Open

Analyzing Chip-seq data on genome graph #22

yihangs opened this issue Oct 2, 2023 · 13 comments

Comments

@yihangs
Copy link

yihangs commented Oct 2, 2023

The Nature paper says that by using the software graph_peak_caller, we can do ChiP-Seq analysis on genome graphs. However, the original graph_peaker_caller paper says "Graph Peak Caller was developed by extending the methodologies and concepts from MACS2 to directed acyclic graphs (DAGs). The MACS2 algorithm can be divided into five steps... We have adopted each of these steps to work on DAGs", indicating that graph_peak_caller will only work on DAGs. Therefore, I have two questions:

  1. Did you try graph_peak_caller on DAG genome graphs or non-DAG genome graphs? If it is used on non-DAG genome graphs, do you think graph_peak_caller still works well?
  2. If it is used on DAG genome graphs, what kind of methods you used for converting original non-DAG genome graphs to DAGs?

Thanks!

@glennhickey
Copy link
Collaborator

Pinging @cgroza who did this work.

@cgroza
Copy link

cgroza commented Oct 2, 2023

Hi,

Please see

cgroza/personalized_genomes_gbio#1

@yihangs
Copy link
Author

yihangs commented Oct 2, 2023

Thank you so much! I have two follow-up questions:

  1. Did you use a decomposed VCF file of minigraph-cactus graph or Raw VCF? What are their differences?
  2. Did you use any filtering steps on mapped reads? I checked the code call_peaks.nf, and it seems that you didn't use vg filter, but instead used graph_peak_caller count_unique_reads. I don't quite understand this function, is it another way to do filtering?

@cgroza
Copy link

cgroza commented Oct 2, 2023

Hi,

  1. I used the raw VCF. For the meaning of decomposed VCF, see https://github.com/human-pangenomics/hpp_pangenome_resources#vcf-decomposition

  2. We chose not to filter the reads in this case, since many of the SVs have lower mappability with short reads libraries. For your use case, it may make sense to filter the reads. The number of unique reads is a parameter in GraphPeakCaller. Sometimes ChIP-seq will produce two identical reads, which this number excludes.

@yihangs
Copy link
Author

yihangs commented Oct 2, 2023

Thanks! I will try it. One more question:

  1. It seems that vg mod --unfold can also help convert graphs to DAG, Have you tried that? If so, did you meet any issues?

@cgroza
Copy link

cgroza commented Oct 2, 2023 via email

@yihangs
Copy link
Author

yihangs commented Oct 4, 2023

Hi,

I am reproducing the results according to https://github.com/cgroza/personalized_genomes_gbio/issues/1 via the following steps:

  1. download the raw vcf file, hprc-v1.1-mc-grch38.raw.vcf.gz.
  2. run the script python strip-nested.py hprc-v1.1-mc-grch38.raw.vcf.gz > hprc-v1.1-mc-grch38.raw.filter.vcf
  3. bgzip and index hprc-v1.1-mc-grch38.raw.filter.vcf
  4. run vg construct -S -f -r hg38.fa -v hprc-v1.1-mc-grch38.raw.filter.vcf > graph.vg

However, an error occurs in the last step, with error messages: https://github.com/vgteam/vg/issues/4109.

Did you meet this error? According to the error messages, I suspect that it is because vg construct cannot read super large SVs, hence, when you run strip-nested.py, did you provide a threshold on the variable max_len?

Thank you!

@cgroza
Copy link

cgroza commented Oct 5, 2023

Yes I did.

Try 10 kbp.

@yihangs
Copy link
Author

yihangs commented Oct 6, 2023

Hi,

I use 10kbp to filter the vcf file, python strip-nested.py hprc-v1.1-mc-grch38.raw.vcf.gz 10000 > hprc-v1.1-mc-grch38.raw.filter.10kb.vcf

The vg construct step works fine, however, an error occurs when I do the index vg index -x hprc-v1.1-mc-grch38-filter-10kb.xg -g hprc-v1.1-mc-grch38-filter-10kb.gcsa -b ./tmp/ -p hprc-v1.1-mc-grch38-filter-10kb.vg -t 20 with error message:

Building XG index
Saving XG index to hprc-v1.1-mc-grch38-filter-10kb.xg
Generating kmer files...
Building the GCSA2 index...
InputGraph::InputGraph(): 10793035616 kmers in 1 file(s)
InputGraph::read(): Read 10793035616 16-mers from ./tmp//vg-OBo2ha/vg-kmers-tmp-5QqyQ1
InputGraph::readKeys(): 1730614098 unique keys
InputGraph::read(): Read 10793035616 16-mers from ./tmp//vg-OBo2ha/vg-kmers-tmp-5QqyQ1
InputGraph::readFrom(): 8906588953 unique start nodes
InputGraph::read(): Read 10793035616 16-mers from ./tmp//vg-OBo2ha/vg-kmers-tmp-5QqyQ1
PathGraph::PathGraph(): 10793035616 paths with 21586071232 ranks
PathGraph::PathGraph(): 321.658 GB in 1 file(s)
GCSA::GCSA(): Preprocessing: 4328.06 seconds, 394.435 GB
GCSA::GCSA(): Prefix-doubling from path length 16
GCSA::GCSA(): Step 1 (path length 16 -> 32)
PathGraph::prune(): 10793035616 -> 10777627852 paths (1728649963 ranges)
PathGraph::prune(): 700183376 unique, 0 redundant, 10077437423 unsorted, 7053 nondeterministic paths
PathGraph::prune(): 321.198 GB in 1 file(s)
PathGraph::read(): File 0: Read 10777627852 order-16 paths
PathGraphBuilder::write(): Size limit exceeded, construction aborted

Looks like it is still a memory issue. Did you do any special hyper-parameter settings on the index step?

Thanks!

@cgroza
Copy link

cgroza commented Oct 6, 2023

You may have more luck with vg autoindex, which is a built in function that will take a VCF and construct and index a graph for you.

@yihangs
Copy link
Author

yihangs commented Oct 6, 2023

Thanks! Which workflow do you use, giraffe or map? would giraffe index .gbz more memory efficient?

@cgroza
Copy link

cgroza commented Oct 6, 2023 via email

@yihangs
Copy link
Author

yihangs commented Oct 9, 2023

Hi,

I tried autoindex but still get an error when constructing GCSA:

[IndexRegistry]: Constructing GCSA/LCP indexes.
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.49.0-9-ge61a8a475 "Peschici"
Stack trace (most recent call last):
#23   Object "", at 0xffffffffffffffff, in 
#22   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff383818f9, in _start
#21   Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7f2d4bf17b96, in __libc_start_main
      Source "../csu/libc-start.c", line 310, in __libc_start_main [0x7f2d4bf17b96]
#20   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38355bea, in main
      Source "src/main.cpp", line 88, in main [0x55ff38355bea]
#19   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38ac5c97, in vg::subcommand::Subcommand::operator()(int, char**) const
    | Source "src/subcommand/subcommand.cpp", line 75, in operator()
      Source "/usr/include/c++/9/bits/std_function.h", line 688, in operator() [0x55ff38ac5c97]
        685:     {
        686:       if (_M_empty())
        687: 	__throw_bad_function_call();
      > 688:       return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
        689:     }
        690: 
        691: #if __cpp_rtti
#18   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38af3c5c, in main_autoindex(int, char**)
      Source "src/subcommand/autoindex_main.cpp", line 360, in main_autoindex [0x55ff38af3c5c]
#17   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38fc398e, in vg::IndexRegistry::make_indexes(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
      Source "src/index_registry.cpp", line 4253, in make_indexes [0x55ff38fc398e]
#16   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38facc49, in vg::IndexRegistry::execute_recipe(std::pair<std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, unsigned long> const&, vg::IndexingPlan const*, vg::AliasGraph&)
    | Source "src/index_registry.cpp", line 5204, in execute
    | Source "src/index_registry.cpp", line 5369, in operator()
      Source "/usr/include/c++/9/bits/std_function.h", line 688, in execute_recipe [0x55ff38facc49]
        685:     {
        686:       if (_M_empty())
        687: 	__throw_bad_function_call();
      > 688:       return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
        689:     }
        690: 
        691: #if __cpp_rtti
#15   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38fa7759, in std::_Function_handler<std::vector<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > (std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, vg::AliasGraph&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&), vg::VGIndexes::get_vg_index_registry()::{lambda(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, vg::AliasGraph&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#51}>::_M_invoke(std::_Any_data const&, std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*&&, vg::AliasGraph&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
    | Source "/usr/include/c++/9/bits/std_function.h", line 286, in operator()
    |   284:       {
    |   285: 	return (*_Base::_M_get_pointer(__functor))(
    | > 286: 	    std::forward<_ArgTypes>(__args)...);
    |   287:       }
    |   288:     };
      Source "src/index_registry.cpp", line 3651, in _M_invoke [0x55ff38fa7759]
#14   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38fa7397, in vg::VGIndexes::get_vg_index_registry()::{lambda(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#50}::operator()(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) const [clone .isra.0]
      Source "src/index_registry.cpp", line 3601, in operator() [0x55ff38fa7397]
#13   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38f96f25, in vg::execute_in_fork(std::function<void ()> const&)
    | Source "src/index_registry.cpp", line 408, in operator()
      Source "/usr/include/c++/9/bits/std_function.h", line 688, in execute_in_fork [0x55ff38f96f25]
        685:     {
        686:       if (_M_empty())
        687: 	__throw_bad_function_call();
      > 688:       return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
        689:     }
        690: 
        691: #if __cpp_rtti
#12   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38f94c98, in vg::VGIndexes::get_vg_index_registry()::{lambda(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#50}::operator()(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) const::{lambda()#1}::operator()() const
      Source "src/index_registry.cpp", line 3613, in operator() [0x55ff38f94c98]
#11   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff39247b28, in gcsa::GCSA::GCSA(gcsa::InputGraph&, gcsa::ConstructionParameters const&)
      Source "src/gcsa.cpp", line 505, in GCSA [0x55ff39247b28]
#10   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff3927da61, in gcsa::PathGraph::extend(unsigned long)
      Source "src/path_graph.cpp", line 1011, in extend [0x55ff3927da61]
#9    Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff3927cc60, in gcsa::PathGraphBuilder::sort(unsigned long)
      Source "src/path_graph.cpp", line 405, in sort [0x55ff3927cc60]
#8    Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff39279794, in gcsa::PathGraph::read(std::vector<gcsa::PathNode, std::allocator<gcsa::PathNode> >&, std::vector<unsigned int, std::allocator<unsigned int> >&, unsigned long) const
    | Source "src/path_graph.cpp", line 1079, in resize
      Source "/usr/include/c++/7/bits/stl_vector.h", line 692, in read [0x55ff39279794]
        689:       resize(size_type __new_size)
        690:       {
        691: 	if (__new_size > size())
      > 692: 	  _M_default_append(__new_size - size());
        693: 	else if (__new_size < size())
        694: 	  _M_erase_at_end(this->_M_impl._M_start + __new_size);
        695:       }
#7    Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38cca438, in std::vector<unsigned int, std::allocator<unsigned int> >::_M_default_append(unsigned long)
    | Source "/usr/include/c++/9/bits/vector.tcc", line 635, in _M_allocate
    |   633: 	      const size_type __len =
    |   634: 		_M_check_len(__n, "vector::_M_default_append");
    | > 635: 	      pointer __new_start(this->_M_allocate(__len));
    |   636: 	      if _GLIBCXX17_CONSTEXPR (_S_use_relocate())
    |   637: 		{
    | Source "/usr/include/c++/9/bits/stl_vector.h", line 343, in allocate
    |   341:       {
    |   342: 	typedef __gnu_cxx::__alloc_traits<_Tp_alloc_type> _Tr;
    | > 343: 	return __n != 0 ? _Tr::allocate(_M_impl, __n) : pointer();
    |   344:       }
    | Source "/usr/include/c++/9/bits/alloc_traits.h", line 443, in allocate
    |   441:       _GLIBCXX_NODISCARD static pointer
    |   442:       allocate(allocator_type& __a, size_type __n)
    | > 443:       { return __a.allocate(__n); }
    |   444: 
    |   445:       /**
      Source "/usr/include/c++/9/ext/new_allocator.h", line 114, in _M_default_append [0x55ff38cca438]
        111: 	    return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp), __al));
        112: 	  }
        113: #endif
      > 114: 	return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp)));
        115:       }
        116: 
        117:       // __p is not permitted to be a null pointer.
#6    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29", at 0x7f2d4c7decea, in 
#5    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29", at 0x7f2d4c7ea7f4, in __cxa_throw
#4    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29", at 0x7f2d4c7ea570, in std::terminate()
#3    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29", at 0x7f2d4c7ea505, in 
#2    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29", at 0x7f2d4c7df0a8, in 
#1    Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7f2d4bf36800, in abort
      Source "/build/glibc-OTsEL5/glibc-2.27/stdlib/abort.c", line 79, in abort [0x7f2d4bf36800]
#0    Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7f2d4bf34e97, in raise
      Source "../sysdeps/unix/sysv/linux/raise.c", line 51, in raise [0x7f2d4bf34e97]
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Please include this entire error log in your bug report!
━━━━━━━━━━━━━━━━━━━━

But then the program captures this error, does the pruning step and constructs the index again.

warning:[IndexRegistry] Child process 37623 failed with status 34304 representing exit code 134
[IndexRegistry]: Exceeded disk use limit while performing k-mer doubling steps. Rewinding to pruning step with more aggressive pruning to simplify the graph.
[IndexRegistry]: Pruning complex regions of VG to prepare for GCSA indexing with GBWT unfolding.
[IndexRegistry]: Constructing GCSA/LCP indexes.

It runs slow, and haven't finish with three days.

Did you also meet this situation? Or maybe you set some special hyper-parameters for vg autoindex to overcome this error? (one hyper-parameter that might be related is -M --target-mem, I use the default setting:1/2 of available, which is around 500G in my case, I am not sure if I need to increase it)

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants