Commit 260d44c

committed
update readme with python stuff
1 parent a558aa5 commit 260d44c

1 file changed

+35
-32
lines changed

README.md

+35-32
@@ -10,8 +10,7 @@ cargo test --release --target $target \
1010

1111
# bedder (tools)
1212

13-
This is an early release of the library for feedback, especially from rust practitioners. If interested,
14-
read below and then, for example, have a look at [issue 2](https://github.com/quinlan-lab/bedder-rs/issues/2) and the associated [discussion](https://github.com/quinlan-lab/bedder-rs/discussions/3)
13+
This is an early release of the library for feedback, especially from rust practitioners.
1514

1615
## Problem statement
1716

@@ -22,46 +21,50 @@ We want a library (in bedder) that is:
2221
2. fast enough
2322
3. extensible so that we don't need a custom tool for every possible use-case.
2423

25-
## Solution
24+
### Solution
2625

2726
To do this, we provide the machinery to intersect common genomics file formats (and more can be added by implementing a simple trait)
28-
and we allow the user to write custom python snippets that are applied to that intersection.
27+
and we allow the user to write Python functions whose return values are written as columns in the output.
2928
As a silly example, the user may want to count overlaps but only if the start position of the overlapping interval is even; that could be
3029
done with this function:
3130

32-
```Python
33-
len(o for o in intersection.overlapping if o.start % 2 == 0])
31+
```python
32+
def bedder_odd(fragment) -> int:
33+
    """return the number of overlapping b intervals whose start position is even"""
34+
    return sum(1 for b in fragment.b if b.start % 2 == 0)
3435
```
3536

36-
It is common to require certain *constraints* on the intersections like a percent or number of bases of overlap.
37-
We can get those with:
37+
There are several things to note here:
3838

39-
```python
40-
a_mode = PyIntersectionMode.default() # report the full interval like -v in bedtools
41-
b_part = PyIntersectionPart.inverse() # report the part of the b-intervals that do not overlap the a-interval
42-
a_requirements = PyOverlapAmount.fraction(0.5) # require at least 50% of the a_interval to be covered
43-
report = intersection.report(a_mode, None, None, b_part, a_requirements, None)
44-
result = []
45-
for ov in report:
46-
line = [f"{ov.a.chrom}\t{ov.a.start}\t{ov.a.stop}"]
47-
for b in ov.b:
48-
line.append(f"{b.start}\t{b.stop}")
49-
result.append("\t".join(line))
50-
"\n".join(result)
39+
+ The function name must start with `bedder_`; whatever follows that prefix is used as the name on the command line and in the output, e.g. in the VCF INFO field.
40+
+ The function must have a return type annotation (`int`, `str`, `float`, and `bool` are supported).
41+
+ The docstring will be used as the description for VCF output, if appropriate.
42+
+ The function must accept a fragment--that is, a piece of an alignment.
43+
44+
This function, if placed in a file named `example.py`, could be used as:
45+
46+
```bash
47+
bedder -a some.bed -b other.bed -P example.py -c 'py:odd'
5148
```
5249
50+
where `odd` matches the function name above.
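
Before passing a file to `bedder`, it can help to try a function on hand-made data. The sketch below is only an illustration: `FakeInterval`, `FakeFragment`, and `check_rules` are hypothetical stand-ins written for this example and are not part of bedder, and the real fragment object may expose different attributes. It restates `bedder_odd` with the loop variable bound as `b`, checks the naming and return-annotation rules listed above, and derives the short name used after `py:`.

```python
import inspect
from dataclasses import dataclass, field


# Hypothetical stand-ins for the objects bedder passes to the function.
# Only `fragment.b` and `b.start` are taken from the README example.
@dataclass
class FakeInterval:
    chrom: str
    start: int
    stop: int


@dataclass
class FakeFragment:
    a: FakeInterval
    b: list = field(default_factory=list)


def bedder_odd(fragment) -> int:
    """count overlapping b intervals whose start position is even"""
    return sum(1 for b in fragment.b if b.start % 2 == 0)


# Return types the README lists as supported.
SUPPORTED_RETURN_TYPES = (int, str, float, bool)


def check_rules(func) -> str:
    """Check the README rules and return the name that follows `bedder_`."""
    prefix = "bedder_"
    if not func.__name__.startswith(prefix):
        raise ValueError("function name must start with 'bedder_'")
    annotation = inspect.signature(func).return_annotation
    if annotation not in SUPPORTED_RETURN_TYPES:
        raise ValueError(f"missing or unsupported return annotation: {annotation!r}")
    return func.__name__[len(prefix):]


fragment = FakeFragment(
    a=FakeInterval("chr1", 100, 200),
    b=[
        FakeInterval("chr1", 90, 120),   # even start, counted
        FakeInterval("chr1", 151, 180),  # odd start, skipped
        FakeInterval("chr1", 160, 210),  # even start, counted
    ],
)

short_name = check_rules(bedder_odd)
count = bedder_odd(fragment)
print(short_name, count)
```

A check like this catches a missing annotation or a misspelled prefix early, but it says nothing about how bedder itself loads or validates the file.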
51+
52+
There is more info on the use of Python and the various intersection modes in [the examples](tests/examples/README.md).
53+
54+
## Library
55+
5356
This library aims to provide:
5457
55-
- [x] an abstraction so any interval types from sorted sources can be intersected together
56-
- [x] the rust implementation of the heap and Queue to find intersections with minimal overhead
57-
- [ ] bedder wrappers for:
58-
- [x] bed
59-
- [x] vcf/bcf
60-
- [ ] sam/bam/cram
61-
- [ ] gff/gtf
62-
- [ ] generalized tabixed/csi files
63-
- [ ] downstream APIs to perform operations on the intersections
64-
- [ ] a python library to interact with the intersections
58+
+ [x] an abstraction so any interval types from sorted sources can be intersected together
59+
+ [x] the rust implementation of the heap and Queue to find intersections with minimal overhead
60+
+ [ ] bedder wrappers for:
61+
+ [x] bed
62+
+ [x] vcf/bcf
63+
+ [ ] sam/bam/cram
64+
+ [ ] gff/gtf
65+
+ [ ] generalized tabixed/csi files
66+
+ [ ] downstream APIs to perform operations on the intersections
67+
+ [ ] a python library to interact with the intersections
6568
6669
The API looks as follows
6770
@@ -116,5 +119,5 @@ We use `Rc` because each database interval may be attached to more than one quer
116119

117120
# Acknowledgements
118121

119-
- We received very valuable `rust` feedback and code from @sstadick.
120-
- We leverage the excellent [noodles](https://github.com/zaeleus/noodles) library.
122+
+ We received very valuable `rust` feedback and code from @sstadick.
123+
+ We leverage the excellent [noodles](https://github.com/zaeleus/noodles) library.
