Add guessparse function #269

jakobnissen · 2023-02-27T10:12:39Z

This function is a quick-and-dirty parser function from AbstractString to LongSequence, with autodetection of the alphabet. It's meant to be used in ephemeral REPL work, and very clearly documented to be unstable and subject to change.

See #268

This is just a draft. The implementation is straightforward, but we might want to think about whether we want this, and what it should be called.
Preferably, the name should be:

* Short, since it's meant to be used in the REPL.
* Very clear in that this function is based on guesswork, and is therefore not suitable for long-lasting code.

TODO:

codecov · 2023-02-27T10:17:22Z

Codecov Report

❌ Patch coverage is 4.54545% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.58%. Comparing base (4a31474) to head (fdbb494).
⚠️ Report is 46 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/longsequences/constructors.jl | 5.26% | 18 Missing ⚠️ |
| src/alphabet.jl | 0.00% | 3 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #269      +/-   ##
==========================================
- Coverage   91.20%   90.58%   -0.63%     
==========================================
  Files          31       31              
  Lines        2400     2421      +21     
==========================================
+ Hits         2189     2193       +4     
- Misses        211      228      +17

| Flag | Coverage Δ |
|---|---|
| unittests | `90.58% <4.54%> (-0.63%)` ⬇️ |

Flags with carried forward coverage won't be shown.

cjprybol · 2023-02-27T13:11:40Z

src/alphabet.jl

+    possible_encodings(b::UInt8)::UInt8
+
+Returns a `UInt8` with any of the 4 lower bits set:
+* Bit 0: Valid `RNA`


Valid DNA?

cjprybol · 2023-02-27T13:15:04Z

This looks great, thanks @jakobnissen! I like having both the guessparse and guess_alphabet functions, as you've done here.

This seems much more efficient than I was initially picturing (using regex or the tryparse approach), which is excellent.
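
For readers comparing this with a regex or `tryparse` approach, here is a minimal sketch of how a per-byte bitmask can classify a string in a single pass. It is not the PR's code. The names `byte_mask` and `sequence_mask`, the bit layout, and the reduced symbol sets (uppercase only, no ambiguity codes or gaps) are all assumptions made for illustration.

```julia
# Hypothetical bit layout. The PR's docstring only says that the four lower
# bits are used; the assignment below is an assumption for this sketch.
const DNA_BIT = 0x01
const RNA_BIT = 0x02
const AA_BIT  = 0x04

# Simplified symbol sets (assumption): uppercase only, no ambiguity codes.
const DNA_SYMBOLS = "ACGTN"
const RNA_SYMBOLS = "ACGUN"
const AA_SYMBOLS  = "ACDEFGHIKLMNPQRSTVWY"

# 256-entry lookup table: entry b + 1 holds the mask for byte b.
const ENCODING_TABLE = let table = zeros(UInt8, 256)
    for (symbols, bit) in ((DNA_SYMBOLS, DNA_BIT), (RNA_SYMBOLS, RNA_BIT), (AA_SYMBOLS, AA_BIT))
        for byte in codeunits(symbols)
            table[byte + 1] |= bit
        end
    end
    table
end

# Mask of the alphabets in which byte `b` is a valid symbol.
byte_mask(b::UInt8) = ENCODING_TABLE[b + 1]

# AND the masks of all bytes together; stop early once no alphabet remains.
function sequence_mask(s::String)
    mask = DNA_BIT | RNA_BIT | AA_BIT
    for b in codeunits(s)
        mask &= byte_mask(b)
        mask == 0x00 && break
    end
    return mask
end
```

Under these assumptions, `sequence_mask("ACGU")` leaves only the RNA bit set, while `sequence_mask("ACGT")` leaves both the DNA and amino-acid bits set, which is the kind of ambiguity a guessing function then has to resolve by some priority rule.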

kescobo · 2023-02-28T02:18:19Z

I wonder if it would be worth looking at CSV.read() for inspiration here. There, the function goes through X lines and tries to guess the type of each column. X is user-selectable but has a sensible default.

I'm wondering if we want to always return a type, or throw an error if it's ambiguous... Maybe as an alternative to the latter, offer an optional type to use as a default, like `guessparse(seq, LongDNA{2})`

For bikeshedding, I quite liked swagparse, but that's not very discoverable... then again, might be worth being silly if we want to signal that this is experimental/unstable
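
The "throw on ambiguity, or accept a default" idea from this comment could look roughly like the sketch below. It is only an illustration: `guess_or_default`, `compatible_alphabets`, the use of symbols in place of sequence types, and the reduced symbol sets are hypothetical and do not come from the PR or from BioSequences.

```julia
# Simplified candidate alphabets (assumption): uppercase only, no ambiguity codes.
const CANDIDATES = (
    :DNA => Set("ACGTN"),
    :RNA => Set("ACGUN"),
    :AA  => Set("ACDEFGHIKLMNPQRSTVWY"),
)

# Names of all candidate alphabets that accept every character of `s`.
compatible_alphabets(s::AbstractString) =
    [name for (name, symbols) in CANDIDATES if all(in(symbols), s)]

# Return the unique match; if several alphabets fit, use `default` when it is
# one of them; otherwise throw instead of guessing silently.
function guess_or_default(s::AbstractString, default::Union{Symbol,Nothing}=nothing)
    matches = compatible_alphabets(s)
    isempty(matches) && throw(ArgumentError("no candidate alphabet fits the input"))
    length(matches) == 1 && return first(matches)
    (default !== nothing && default in matches) && return default
    throw(ArgumentError("ambiguous input, candidates: $(join(matches, ", "))"))
end
```

With these definitions, `guess_or_default("ACGT")` throws because the input fits both the DNA and amino-acid sets, whereas `guess_or_default("ACGT", :DNA)` resolves the tie through the caller-supplied default.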

Add guessparse function

fdbb494

This function is a quick-and-dirty parser function from `AbstractString` to `LongSequence`, with autodetection of the alphabet. It's meant to be used in ephemeral REPL work, and very clearly documented to be unstable and subject to change. See BioJulia#268

jakobnissen linked an issue Feb 27, 2023 that may be closed by this pull request

Record type inference #268

Closed

cjprybol mentioned this pull request Feb 27, 2023

consider removing kmer/sequence type inference from strings in favor of generalized version in BioSequences BioJulia/Kmers.jl#31

Closed

cjprybol reviewed Feb 27, 2023

cjprybol mentioned this pull request Mar 3, 2025

readgff fails with protein sequences BioJulia/GenomicAnnotations.jl#20

Open
