README.md (+35 -32)
@@ -10,8 +10,7 @@ cargo test --release --target $target \
 
 # bedder (tools)
 
-This is an early release of the library for feedback, especially from rust practitioners. If interested,
-read below and then, for example, have a look at [issue 2](https://github.com/quinlan-lab/bedder-rs/issues/2) and the associated [discussion](https://github.com/quinlan-lab/bedder-rs/discussions/3)
+This is an early release of the library for feedback, especially from rust practitioners.
 
 ## Problem statement
 
@@ -22,46 +21,50 @@ We want a library (in bedder) that is:
 2. fast enough
 3. extensible so that we don't need a custom tool for every possible use-case.
 
-## Solution
+### Solution
 
 To do this, we provide the machinery to intersect common genomics file formats (and more can be added by implementing a simple trait)
-and we allow the user to write custom python snippets that are applied to that intersection.
+and we allow the user to write python functions that then write columns to the output.
 As a silly example, the user may want to count overlaps but only if the start position of the overlapping interval is even; that could be
 done with this expression:
 
-```Python
-len(o for o in intersection.overlapping if o.start %2==0])
+```python
+def bedder_odd(fragment) -> int:
+    """count the overlapping b intervals whose start position is even"""
+    return sum(1 for b in fragment.b if b.start % 2 == 0)
 ```
 
-It is common to require certain *constraints* on the intersections like a percent or number of bases of overlap.
-We can get those with:
+There are several things to note here:
 
-```python
-a_mode = PyIntersectionMode.default() # report the full interval like -v in bedtools
-b_part = PyIntersectionPart.inverse() # report the part of the b-intervals that do not overlap the a-interval
-a_requirements = PyOverlapAmount.fraction(0.5) # require at least 50% of the a_interval to be covered
-line = [f"{ov.a.chrom}\t{ov.a.start}\t{ov.a.stop}"]
-for b in ov.b:
-    line.append(f"{b.start}\t{b.stop}")
-result.append("\t".join(line))
-"\n".join(result)
++ The function name must start with `bedder_`; whatever follows the prefix is used as the name on the command line and in the output, e.g. in the VCF INFO field.
++ The function must have a return type annotation (`int`, `str`, `float`, `bool` are supported).
++ The docstring will be used as the description for VCF output, if appropriate.
++ The function must accept a fragment--that is, a piece of an alignment.
+
+This function, if placed in a file named `example.py`, could be used as:
+
+```bash
+bedder -a some.bed -b other.bed -P example.py -c 'py:odd'
 ```
 
+where `odd` matches the function name above, without the `bedder_` prefix.
+
+There is more info on the use of python and the various intersection modes in [the examples](tests/examples/README.md).
+
+## Library
+
 This library aims to provide:
 
--[x] an abstraction so any interval types from sorted sources can be intersected together
--[x] the rust implementation of the heap and Queue to find intersections with minimal overhead
--[ ] bedder wrappers for:
-  -[x] bed
-  -[x] vcf/bcf
-  -[ ] sam/bam/cram
-  -[ ] gff/gtf
-  -[ ] generalized tabixed/csi files
--[ ] downstream APIs to perform operations on the intersections
--[ ] a python library to interact with the intersections
++ [x] an abstraction so any interval types from sorted sources can be intersected together
++ [x] the rust implementation of the heap and Queue to find intersections with minimal overhead
++ [ ] bedder wrappers for:
+  + [x] bed
+  + [x] vcf/bcf
+  + [ ] sam/bam/cram
+  + [ ] gff/gtf
+  + [ ] generalized tabixed/csi files
++ [ ] downstream APIs to perform operations on the intersections
++ [ ] a python library to interact with the intersections
 
 The API looks as follows
 
@@ -116,5 +119,5 @@ We use `Rc` because each database interval may be attached to more than one quer
 
 # Acknowledgements
 
-- We received very valuable `rust` feedback and code from @sstadick.
-- We leverage the excellent [noodles](https://github.com/zaeleus/noodles) library.
++ We received very valuable `rust` feedback and code from @sstadick.
++ We leverage the excellent [noodles](https://github.com/zaeleus/noodles) library.
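
Reviewer note: the `bedder_` function conventions added to the README in this change (name prefix, return type annotation, docstring used as description) can be tried without bedder itself. The following is only a sketch. `Interval`, `Fragment`, and `describe` are hypothetical stand-ins written for illustration, not bedder's actual classes or loader, and Python 3.9+ is assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, get_type_hints


# Hypothetical stand-ins for the objects bedder hands to Python.
# The real classes come from bedder and may look different.
@dataclass
class Interval:
    chrom: str
    start: int
    stop: int


@dataclass
class Fragment:
    a: Interval
    b: list[Interval] = field(default_factory=list)


def bedder_odd(fragment) -> int:
    """count the overlapping b intervals whose start position is even"""
    return sum(1 for b in fragment.b if b.start % 2 == 0)


# Return types the README lists as supported.
SUPPORTED_RETURN_TYPES = (int, str, float, bool)


def describe(func: Callable) -> tuple[str, type, str]:
    """Hypothetical check of the README conventions.

    Returns (column name, return type, description).
    """
    if not func.__name__.startswith("bedder_"):
        raise ValueError(f"{func.__name__}: name must start with 'bedder_'")
    return_type = get_type_hints(func).get("return")
    if return_type not in SUPPORTED_RETURN_TYPES:
        raise ValueError(f"{func.__name__}: unsupported or missing return annotation")
    # Whatever follows the prefix is the name used with `-c 'py:<name>'`.
    name = func.__name__[len("bedder_"):]
    return name, return_type, (func.__doc__ or "").strip()


# One query interval with three overlapping b intervals; two start on an even position.
fragment = Fragment(
    a=Interval("chr1", 100, 200),
    b=[Interval("chr1", 90, 120), Interval("chr1", 151, 180), Interval("chr1", 160, 210)],
)
name, return_type, description = describe(bedder_odd)
count = bedder_odd(fragment)
```

Here `name` is the string that would follow `py:` on the command line, and `count` is the value the function would contribute for this fragment.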