- `ExoLabel` has a new parameter to tune the performance of hop-length attenuation.
- Documentation and formatting updates.
- `ExoLabel` now uses hop-length attenuation to mitigate formation of massive communities.
- `ExoLabel` no longer supports `inflation`, since attenuation does a better job handling this without introducing additional parameters.
- `ExoLabel` will now print a lot less when running with `verbose=TRUE` in non-interactive mode -- just as informative, but less junk caused by lots of unrendered carriage returns.
- fixes multiple bugs in EstimRearrScen
- fixes bug in GeneVector.EvoWeaver that could affect DNA-based analyses
- fixes bug that prevented building on Windows
- adds multiple clustering support for ExoLabel
- `ExoLabel` will no longer crash when given a network lacking a trailing newline
- Various internal improvements and code refinements
- Lots of bug fixes for `ExoLabel`
- `ExoLabel` now reports disk consumption during execution
- `ExoLabel` is even faster due to in-place external sort for faster file I/O
- Other quality of life improvements to `ExoLabel`
- First development version of Bioconductor 3.21
- Official Bioconductor 3.20 release
- `ExoLabel` is much much faster and does a better job cleaning up when aborted early
- `ExoLabel` now has fewer arguments
- Updates to man pages
- Lots of internal improvements to `ExoLabel` to increase computational speed and decrease disk usage.
- `ExoLabel` will no longer crash if given relative paths.
- Adds more internal error checking to prevent some rare bugs.
- Updates man pages to reflect new changes.
- Updates `EstimateExoLabel` to reflect new changes.
- `ExoLabel` will no longer brick R during sorts on large files.
- `ExoLabel` reports more progress during some lengthy processing sections when `verbose=TRUE`
- Known issue: the "Copying source file" step is still non-interruptible; this will be fixed in a later update
- `ExoLabel` now allows an `inflation` argument to control application of inflation
- `predict.EvoWeaver` now supports returning p-values separately from raw score for some algorithms.
- OpenMP implementation for `EvoWeaver` algorithms that support it has been fixed
- `RandForest` function added to train random forest models
- Associated man pages for `RandForest` and `DecisionTree` objects
- New methods for `DecisionTree` objects to plot and coerce to `dendrogram`
- Small bugfix to `subset.dendrogram`
- Major updates to `EvoWeaver`:
  - `predict.EvoWeaver` now returns a `data.frame` by default
  - `Method` arguments are updated to match their names in the associated EvoWeaver manuscript
  - Above changes have propagated to documentation files
  - New Phylogenetic Profiling methods with improved accuracy
  - New meta-methods `PhylogeneticProfiling`, `PhylogeneticStructure`, `GeneOrganization`, `SequenceLevel` for `predict.EvoWeaver`
  - New pre-trained Ensemble models have been included
- Updates to `ExoLabel` for better status printing
- Bioc 3.19 release
- Addition of `FastLabelOOM` function to find communities in graphs/networks on disk space.
- Addition of `PrepareSeqs` function, beginning the process of deprecating `PairSummaries` in favor of more cohesive and user friendly functions.
- Fixes bug in JRF distance causing scores to be higher than expected.
- Updates to all EvoWeaver documentation files
- Fixed small bug in `PhyloDistance` causing `Method='JRF'` to return similarity rather than the distance
- Fixed small bug in `TreeDistance.EvoWeaver` resulting in an inconsistent calculation of score when using `TreeMethods='JRF'`
- Small fixes
- `ProtWeaver` and `ProtWeb` have been renamed to `EvoWeaver` and `EvoWeb`, respectively
- New sequence level method for `EvoWeaver`
- Various small internal updates to `EvoWeaver`
- Minor changes to `SelectByK` and vignette
- New predictor `PAPV.ProtWeaver` to calculate p-values for presence/absence profiles.
- `ContextTree` now uses `MirrorTree` with species tree correction and p/a overlap correction
- Updates to documentation
- `predict.ProtWeaver` now supports multiple algorithms at once (ex. `predict(ew, Method=c("Jaccard", "Hamming"))`)
- Documentation for `ProtWeaver` and associated methods has been updated to match recent updates.
- `FastQFromSRR` function added as a convenience wrapper for the SRAtoolkit function `fastq-dump`.
- `SuperTree` now works directly with `dist` objects, providing better performance and scaling
- Updates to `simMat` objects
  - No longer throw a warning when initialized in RStudio
  - Formatting is cleaner and supports larger object names
- Updates to `NVDC.ProtWeaver`
  - Now supports amino acid sequences using the `DNAseqs=FALSE` argument
  - Now calculates a p-value-weighted score
- Adds `MakeBlastDb` function to create a BLAST database from R, plus associated documentation updates
- Smaller fixes to some `ProtWeaver` methods
  - `predict.ProtWeaver` no longer returns using `invisible` (this was annoying and unnecessary)
  - APC correction for `MutualInformation.ProtWeaver` removed to allow for parallelization
  - `MirrorTree.ProtWeaver` now works correctly with `MTCorrection="speciestree"`
  - `CorrGL.ProtWeaver` now uses Fisher's Exact Test for p-values rather than the R value of Spearman correlation
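
As a rough illustration of the Fisher's Exact Test entry above, the base R sketch below tests whether two binary gain/loss profiles are associated. The profiles and the way the 2x2 table is built are assumptions made for this example; the changelog does not say how `CorrGL.ProtWeaver` constructs its table.

```r
# Hypothetical gain/loss indicators for two genes across ten lineages
# (1 = event observed, 0 = not observed). Illustrative data only.
gain_a <- c(1, 1, 0, 1, 0, 0, 1, 0, 1, 0)
gain_b <- c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)

# Fix the factor levels so the table is always 2x2, even if one
# profile happens to contain only zeros or only ones.
tab <- table(factor(gain_a, levels = 0:1),
             factor(gain_b, levels = 0:1))

# Two-sided Fisher's Exact Test on the contingency table.
p_value <- fisher.test(tab)$p.value
p_value
```

A small p-value here suggests the two profiles co-occur more (or less) often than expected by chance.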
- Many internal performance improvements
- `ProtWeaver` almost entirely uses `dist` objects rather than `matrix`, saving significantly on memory
- Faster `Cophenetic` function implemented internally
- Copied internal `.Call('cophenetic')` from `DECIPHER` to `SynExtend` to avoid potential namespace issues
- Small fixes to remove some notes from `BiocCheck::BiocCheck()`
- Variety of small updates to pass `BiocCheck`
- Official Bioconductor 3.17 release (even with SynExtend 1.11.8)
- Fixes various small bugs in `MoransI`
- Adds some multiprocessing support (more will be added in the future)
- Slight rework to species trees and their interaction with `ProtWeaver` objects
  - `ProtWeaver` has new attribute `speciesTree`, can be initialized with a `dendrogram` object
  - New method `SpeciesTree` to get species tree from a `ProtWeaver` object (or compute one, if it doesn't exist)
- Various internal improvements for Bioc style consistency
- Various documentation updates
- Adds new optimized `dendrapply` implementation (overloads `stats::dendrapply`)
- Adds `HungarianAlgorithm` for optimal solving of the linear assignment problem (O(n^3) complexity)
- Adds new C code for fast computation of Pearson's R and p-value
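
To make the linear assignment problem from the `HungarianAlgorithm` entry concrete, here is a brute-force base R sketch. It is deliberately not the Hungarian algorithm: it tries every permutation (O(n!)), which is only feasible for tiny matrices, and the helper names are hypothetical.

```r
# Enumerate all permutations of a vector (fine for very small n only).
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (rest in permutations(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], rest)
    }
  }
  out
}

# Assign each row to a distinct column so that the total cost is minimal.
assign_brute_force <- function(cost) {
  n <- nrow(cost)
  best <- NULL
  best_cost <- Inf
  for (p in permutations(seq_len(n))) {
    total <- sum(cost[cbind(seq_len(n), p)])  # cost of row i -> column p[i]
    if (total < best_cost) {
      best_cost <- total
      best <- p
    }
  }
  list(assignment = best, cost = best_cost)
}

cost <- matrix(c(4, 1, 3,
                 2, 0, 5,
                 3, 2, 2), nrow = 3, byrow = TRUE)
res <- assign_brute_force(cost)
res$assignment  # column chosen for rows 1, 2, 3
res$cost
```

An O(n^3) method such as the Hungarian algorithm reaches the same optimum without enumerating permutations, which is what makes it usable on large cost matrices.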
- Adds new `Ancestral.ProtWeaver` algorithm for calculating coevolution from correlated residue changes
- Supporting code and documentation for `Ancestral.ProtWeaver`
- Other new internal methods
- Various updates and optimizations to internal methods and documentations
- Updates `GRF` method to be called `CI` (for Clustering Information Distance)
- `Method="CI"` in `PhyloDistance` now calculates an approximate p-value using simulated data from Smith (2020)
- Adds new Residue method `NVDT` using gene sequence Natural Vector with Dinucleotide and Trinucleotide frequency
- Adds some new C methods to speed up calculation of NVDT
- Fixes `.Call()` not using `PACKAGE="SynExtend"`
- Updates to documentation
- Adds new colocalization algorithm `ColocMoran`, uses `Coloc` with `MoransI` to correct for phylogenetic signal
- Adds new colocalization algorithm `TranscripMI`, uses mutual information of transcriptional direction
- Adds new corrections/checks to allow for transcriptional direction to be in labels
- Various bugfixes to support new four number labelling scheme
- Various documentation updates
- Adds new function `MoransI` to calculate Moran's I for a set of spatially distributed signals
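
For readers unfamiliar with the statistic named in the `MoransI` entry, the following base R sketch computes Moran's I from its textbook definition. The inverse-distance weighting and the function name `morans_i` are assumptions for illustration; the package's own function may use different weights and arguments.

```r
# Moran's I: (n / sum(w)) * sum_ij w_ij * z_i * z_j / sum_i z_i^2,
# where z is the centered signal and w is a spatial weight matrix.
morans_i <- function(values, coords) {
  n <- length(values)
  d <- as.matrix(dist(coords))  # pairwise distances between locations
  w <- 1 / d                    # inverse-distance weights (assumed choice)
  diag(w) <- 0                  # a location has no weight with itself
  z <- values - mean(values)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

coords <- 1:6                          # six points on a line
clustered   <- c(1, 1, 1, 5, 5, 5)     # similar values sit together
alternating <- c(1, 5, 1, 5, 1, 5)     # neighbors differ

i_clustered   <- morans_i(clustered, coords)
i_alternating <- morans_i(alternating, coords)
```

Positive values indicate that nearby locations carry similar signal, and negative values indicate that neighbors tend to differ. The sketch assumes all locations are distinct, since duplicate coordinates would give infinite weights.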
- Internal code refactor
- `ShuffleC` now supports reproducibility using R's `set.seed`
- `ShuffleC` now supports sampling with replacement, performance is around 2.25x faster than `sample`
- Internal bugfixes for JRF Distance -- previous commit was incorrectly calculating values
- Adds new `TreeDistance` predictor for `ProtWeaver`, incorporating all tree distance metrics; these metrics are bundled due to some backend optimizations that improve performance
- Bugfixes for `PhyloDistance`
- Adds Random Projection for `MirrorTree` predictor to solve memory problems and increase accuracy
- New internal random number generator using xorshift, significantly faster than `sample()`
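
The Random Projection entry above refers to a general dimensionality-reduction technique. The base R sketch below shows a plain Gaussian random projection on simulated data; the dimensions, the scaling, and the data are assumptions, and the changelog does not describe the specific scheme used inside the `MirrorTree` predictor.

```r
set.seed(1)

n <- 20    # number of items (for example, genes)
d <- 500   # original dimension
k <- 50    # reduced dimension

# Simulated high-dimensional data, one row per item.
x <- matrix(rnorm(n * d), nrow = n)

# Gaussian projection matrix, scaled so squared distances are
# preserved in expectation.
proj <- matrix(rnorm(d * k), nrow = d) / sqrt(k)
y <- x %*% proj

# Pairwise distances before and after projection should be comparable.
distance_ratio <- mean(dist(y)) / mean(dist(x))
distance_ratio
```

The projected matrix holds `n * k` values instead of `n * d`, which is where the memory saving comes from.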
- `HammingGL` changed to `CorrGL`, now uses Pearson's R weighted by p-value
- Refactors internal predictors to reduce size of codebase and remove redundancies
- Internal `ShuffleC` function to replicate `sample` functionality with 2-6x speedup
- Method `GainLoss` now uses bootstrapping to estimate a p-value
- Updates to documentation files
- Adds KF Distance for trees
- Adds Jaccard Robinson Foulds Distance for trees
- Reworks tree distances into `PhyloDistance` function
- Numerous new documentation pages
- Updates internal functions to use `rapply` instead of `dendrapply` to avoid stack overflow issues due to R recursion
- Minor bugfix to RF distance
- updates gitignore for workflows
- Memory leak bugfix
- Adds new `RFDist` function to calculate Robinson-Foulds Distance
- Adds normalization for `GeneralizedRF` to make the distance between 0 and 1
- Minor bugfixes
- Documentation for new functions
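
As background for the `RFDist` entry, the Robinson-Foulds distance counts the bipartitions (splits) present in one tree but not the other. The base R sketch below assumes each tree is supplied as a list of its non-trivial clades (sets of leaf labels); this input format and the helper names are hypothetical, whereas the package function works on dendrograms.

```r
# Turn each clade into a canonical split label: always describe the
# side of the bipartition that does NOT contain the reference leaf.
canonical_splits <- function(clades, leaves) {
  ref <- sort(leaves)[1]
  keys <- vapply(clades, function(clade) {
    side <- if (ref %in% clade) setdiff(leaves, clade) else clade
    paste(sort(side), collapse = "|")
  }, character(1))
  unique(keys)
}

# RF distance = size of the symmetric difference of the two split sets.
rf_distance <- function(clades1, clades2, leaves) {
  s1 <- canonical_splits(clades1, leaves)
  s2 <- canonical_splits(clades2, leaves)
  length(setdiff(s1, s2)) + length(setdiff(s2, s1))
}

leaves <- c("A", "B", "C", "D", "E")
tree1 <- list(c("A", "B"), c("A", "B", "C"))  # ((A,B),C) vs (D,E)
tree2 <- list(c("A", "C"), c("A", "B", "C"))  # ((A,C),B) vs (D,E)

rf_same <- rf_distance(tree1, tree1, leaves)
rf_diff <- rf_distance(tree1, tree2, leaves)
```

Both trees share the split separating D and E from the rest, and each has one split the other lacks, so the two trees differ by two splits.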
- Adds new `GeneralizedRF` function to calculate information-theoretic Generalized Robinson-Foulds distance between two dendrograms.
- Documentation for new function
- New ProtWeaver predictor based off of `GeneralizedRF` metric
- New internal C source code for `GeneralizedRF`
- Adds new `DPhyloStatistic` function to calculate the D-statistic for a binary state against a phylogeny following Fritz and Purvis (2009).
- Documentation for new function
- New internal C source code for `DPhyloStatistic`
- New internal C source code for random utility functions, currently only has functions to generate random numbers
- Various internal improvements to presence/absence profile methods
- Adds new prediction algorithm `GainLoss`
- Adds new internal C implementation of dendrograms, significantly faster than R dendrograms
- `ProtWeaver` methods `Behdenna` and `GainLoss` can now infer a species tree when possible
- Updates `Jaccard` and `Hamming` methods to use C implementations for distance calculation
- Adds `HammingGL` method to calculate Hamming distance of gain/loss events
- Minor bugfixes to `ProtWeaver` methods relating to subsetting
- Updates to various `man` pages
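
The `Jaccard` and `Hamming` entries above refer to standard distances between presence/absence profiles. A minimal base R sketch of both follows, using made-up profiles; how the package scores or normalizes these values internally is not stated in this changelog.

```r
# Hypothetical presence/absence profiles of two genes across eight genomes.
profile_a <- c(1, 1, 0, 1, 0, 0, 1, 0)
profile_b <- c(1, 0, 0, 1, 0, 1, 1, 0)

# Hamming distance: number of genomes where the profiles disagree.
hamming_distance <- function(x, y) sum(x != y)

# Jaccard distance: 1 - (shared presences / presences in either profile).
jaccard_distance <- function(x, y) {
  both <- sum(x == 1 & y == 1)
  either <- sum(x == 1 | y == 1)
  if (either == 0) return(0)  # two all-absent profiles are treated as identical
  1 - both / either
}

h <- hamming_distance(profile_a, profile_b)
j <- jaccard_distance(profile_a, profile_b)
```

Unlike Hamming, Jaccard ignores genomes where both genes are absent, so the two can rank gene pairs differently when profiles are sparse.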
- Removes `flatdendrapply`, function was already included in SynExtend
- Minor bugfixes to `ProtWeaver`
- Edits to `SelectByK`, function can work as intended, but is still too conservative at false positive removal.
- Adds new function `flatdendrapply` for more options to apply functions to dendrograms. Function is used in `SuperTree`.
- Adds new function `SuperTree` to construct a species tree from a set of gene trees.
- Adds new dataset `SuperTreeEx` for `SuperTree` and `flatdendrapply` examples.
- `SelectByK` function argument `ClusterSelect` switched to `ClusterScalar`. Cluster number selection now performed by fitting sum of total within cluster sum of squares to a right hyperbola and taking the ceiling of the half-max. Scalar allows a user to pick different tolerances to select more, or fewer, clusters. Plotting behavior updated.
- `simMat` class now supports empty indexing (`s[]`)
- `simMat` class now supports logical accession (`s[c(T,F,T),]`)
- Added the function `SelectByK` that allows for quick removal of false positive predicted pairs based on a relatively simple k-means approach. Function is currently designed for use on the single genome-to-genome pairwise comparison, and not on an all-vs-all many genomes scale, though it may provide acceptable results on that scale.
- New `simMat` class for `dist`-like similarity matrices that can be manipulated like base matrices
- Major update to `ProtWeaver` internals
  - All internal calls use `simMat` objects whenever possible to decrease memory footprint
    - Note `ContextTree` and `ProfDCA` require matrices internally
  - Note `ProtWeb` objects now inherit from `simMat`
- `ProtWeb.show` and `ProtWeb.print` now display predictions in a more natural way
- `GetProtWebData()` deprecated; `ProtWeb` now inherits `as.matrix.simMat` and `as.data.frame.simMat`
- New documentation pages for `simMat` class
- `GetProtWebData` documentation page reworked into `ProtWeb` documentation file.
- Fixes new bug in `Method='Hamming'` introduced in SynExtend 1.9.9
- Fixes minor bug in `Method='Hamming'`
- Moves some code around
- Major refactor to file structure of `ProtWeaver` to make individual files more manageable
- Adds new documentation files for individual prediction streams of `predict.ProtWeaver`
- `BlockReconciliation` now returns an object of class `PairSummaries`.
- Fixes an error where warnings were mistakenly output to the user
- Moves platform-specific files in `src/` (originally added by mistake)
- Lots of bugfixes to `ResidueMI.ProtWeaver`
- `predict.ProtWeaver` now correctly labels rows/columns with gene names, not numbers
- `predict.ProtWeaver` now correctly handles `Subset` arguments
- `predict.ProtWeaver(..., Subset=3)` will correctly predict for all pairs involving gene `3` (or for any gene `x`, as long as `Subset` is a length 1 character or integer vector).
- Adds residue MI method to `ProtWeaver`
- Various bugfixes for `ProtWeaver`
- Various improvements for `GenRearrScen`, improves consistency and output formatting
- Major bugfix for `ProtWeaver` methods using dendrogram objects
- `ProtWeaver` now correctly guards against non-bifurcating dendrograms in methods that expect it
- Introduces new `ProtWeaver` class to predict functional association of genes from COGs or gene trees. This implements many algorithms commonly used in the literature, such as MirrorTree and Inverse Potts Models.
- `predict(ProtWeaverObject)` returns a `ProtWeb` class with information on predicted associations.
- Adds `BlastSeqs` to run BLAST queries on sequences stored as an `XStringSet` or `FASTA` file.
- Updates to `ExtractBy` function. Methods and inputs simplified and adjusted, and significant improvements to speed.
- Updated `NucleotideOverlaps` to now correctly register hits in genes with a large degree of overlap with the immediately preceding gene.
- Fixed aberrant behavior in `BlockExpansion` where contigs with zero features could cause an error in expansion attempts.
- `BlockReconciliation` now allows for setting either block size or mean PID for reconciliation precedence.
- Added retention thresholds to `BlockReconciliation`.
- `BlockExpansion` cases corrected for zero added rows.
- Improvements to `BlockExpansion` and `BlockReconciliation` functions.
- Began integration of `DECIPHER`'s `ScoreAlignment` function.
- Fixed a bug in `PairSummaries` function.
- Added `BlockExpansion` function.
- Adjustment in how `PairSummaries` handles default translation tables and GFF derived gene calls.
- Large changes under the hood to `PairSummaries`.
- Failure to accurately assign neighbors in some cases should now be fixed.
- Extraction of genomic features is now faster.
- `OffSetsAllowed` argument now defaults to `FALSE`. This argument may be dropped in the future in favor of a more complex function post-summary.
- Small edits to `SequenceSimilarity`
- Added the function `SubSetPairs` that allows for easy trimming of predicted pairs based on conflicting predictions and / or prediction statistics.
- Added the function `EstimageGenomeRearrangements` that generates rearrangement scenarios of large scale genomic events using the double cut and join model.
- Added the function `SequenceSimilarity` and made improvements to runtime in `DisjointSet`.
- Fixed a small bug in consensus scores in `PairSummaries` where features facing on different strands had their score computed incorrectly.
- Changes to consensus score in `PairSummaries`.
- Major changes to the `PairSummaries` function and minor changes to `NucleotideOverlaps`, `ExtractBy`, and `FindSets`. Adjustments to the model that `PairSummaries` calls on to predict PIDs.
- `ExtractBy` function has been added. Allows extraction of feature sequences into `XStringSet`s organized by a `PairSummaries` object or the single linkage clusters implied by pairings within the `PairSummaries` objects.
- `DisjointSet` function added to extract single linkage clusters from a `PairSummaries` object.
- `PairSummaries` now computes 4-mer distance between predicted pairs.
- `PairSummaries` now returns a column titled Adjacent that provides the number of directly adjacent neighbor pairs to a predicted pair. Gap filling code adjusted.
- The function `FindSets` has been added and performs single linkage clustering on a pairs list as represented by vectors of integers using the Union-Find algorithm. Long term this function will have a larger wrapper function for user ease of access but will remain exposed.
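
The `FindSets` entry describes single linkage clustering of a pairs list with Union-Find. The base R sketch below illustrates the idea on two integer vectors; the function name `find_sets` and its return format are hypothetical and are not the package's interface.

```r
# p1[k] and p2[k] are the two members of the k-th pair.
find_sets <- function(p1, p2) {
  ids <- sort(unique(c(p1, p2)))
  parent <- seq_along(ids)       # each element starts as its own root
  idx1 <- match(p1, ids)
  idx2 <- match(p2, ids)

  # Follow parent pointers until reaching a root.
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  # Union step: merge the sets containing the two members of each pair.
  for (k in seq_along(idx1)) {
    r1 <- find_root(idx1[k])
    r2 <- find_root(idx2[k])
    if (r1 != r2) parent[max(r1, r2)] <- min(r1, r2)
  }

  roots <- vapply(seq_along(ids), find_root, integer(1))
  split(ids, roots)              # one list element per cluster
}

sets <- find_sets(p1 = c(1L, 2L, 5L, 7L),
                  p2 = c(2L, 3L, 6L, 8L))
sets
```

Pairs 1-2 and 2-3 chain together into one cluster, which is the defining behavior of single linkage. This sketch omits path compression and union by rank, which a production implementation would normally include.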
- `NucleotideOverlap` now passes its GeneCalls object forward, allowing `PairSummaries` to forego inclusion of that object as an argument.
- Minor vignette and suggested package changes.
- `PairSummaries` now allows users to fill in specific matching gaps in blocks of predicted pairs with the arguments `AllowGaps` and `OffSetsAllowed`.
- Adjustments to progress bars in both `PairSummaries` and `NucleotideOverlap`.
- PID prediction models in `PairSummaries` adjusted.
- Contig name matching has been implemented. Scheme expects users to follow NCBI contig naming and gff formats, accepting contig names from gffs directly, and removing the first whitespace and everything thereafter from FASTA headers. Contig name matching can be disabled if users wish, using the argument `AcceptContigNames`, but ensuring that the correct contigs in GeneCalls objects are matched to the appropriate contigs in Synteny objects is then the user's responsibility.
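
The header-trimming rule in the contig name matching entry can be sketched in base R as follows. The FASTA headers and accession numbers are made up for illustration, and this is not the package's internal code.

```r
# Hypothetical FASTA headers in NCBI style: accession, then a description.
fasta_headers <- c(
  "NZ_XX000001.1 Example organism chromosome, complete genome",
  "NZ_XX000002.1 Example organism plasmid pEX1, complete sequence"
)

# Drop the first whitespace character and everything after it.
contig_names <- sub("\\s.*$", "", fasta_headers)

# Contig names as they might appear in the first column of a GFF file.
gff_contigs <- c("NZ_XX000002.1", "NZ_XX000001.1")

# Position of each GFF contig among the trimmed FASTA names.
contig_index <- match(gff_contigs, contig_names)
contig_index
```

An `NA` in `contig_index` would flag a GFF contig with no matching FASTA record.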
- `PairSummaries` now translates sequences based on `transl_table` attributes provided by gene calls
- `PairSummaries` now uses a generic model for predicting PID
- `gffToDataFrame` now parses out the `transl_table` attribute
- Minor changes to `NucleotideOverlap`
- Major changes to `PairSummaries`: can now take in objects of class `Genes` built by the DECIPHER function `FindGenes()`
- Vignette and help files edited for clarity
- `SynExtend` submitted to Bioconductor
- Added function `gffToDataFrame`
- Added function `NucleotideOverlap`
- Added function `PairSummaries`