Releases: tjparnell/biotoolbox
BioToolBox-v1.44
- Added new function to GeneTools for exporting to GTF format.
- Added new function to filter transcript subfeatures in a gene SeqFeature object by available Ensembl Transcript Support Level tags.
- Fixed critical bug with collapsing multiple transcripts in GeneTools function that resulted in too many overlapping exons.
- Fixed bug in exporting non-coding gene models to UCSC refFlat format.
- Other minor bug fixes.
BioToolBox-v1.43
- Fix bug with unique option in script get_gene_regions where too many regions were being discarded. Thanks to Mengyao.
- Fix bug with generating bigWig files in script bam2wig, and restore option to prefer bedGraphToBigWig if so desired
- Add option to ignore extraneous attribute tags when parsing GFF and GTF files to reduce memory (simplify). Enable this option by default when parsing annotation files when loading a table in Bio::ToolBox::Data.
BioToolBox-v1.42
- Changed bigWig convertor method to use primarily the wigToBigWig utility for simplicity
- Introduced new method to open a wigToBigWig utility filehandle to "print" wig files directly to a bigWig
- Updated bam2wig and data2wig scripts to write directly to the bigWig utility and skip writing temporary intermediate wig file
- Added functionality to bam2wig to record stranded shifted counts
- Fixed a critical bug in script get_gene_regions where transcripts weren't being filtered
- Improved file format taste testing to avoid GFF false positives
- Improved UCSC gene table parser behavior
BioToolBox-v1.41
- Added no header option when loading text files missing a column header row. Updated script manipulate_datasets to take advantage of the feature.
- Added option to combine multiple score columns into a single score when converting a file to a wig file in script data2wig
- Added option to split gff or vcf data files by an attribute tag in script split_data_file
- Improve handling of writing vcf files
- Fix critical errors with calculating cdsStart and cdsEnd in the GeneTools library
- Fix bugs in gff parser to continue when encountering errors in parsing and interpret transcript biotype gtf attributes
- Fix bug in properly handling start coordinates in script data2wig
BioToolBox-v1.40
- Major update introduces new SeqFeature object Bio::ToolBox::SeqFeature that is a little faster and more compact than equivalent BioPerl objects. This is the default object used in gene table parsers.
- New Module Bio::ToolBox::GeneTools for working with SeqFeature objects representing traditional nested feature gene, transcript, exon models. The script get_gene_regions now uses this module, as do other scripts.
- Expunged many scripts that are no longer considered part of the primary mission of the BioToolBox distribution. These are now available in a separate repository located at https://github.com/tjparnell/HCI-Scripts.
- Bio::ToolBox::Data objects can now parse all gene tables into memory and store the features in the object. This allows gene tables to be used without requiring a database to be setup.
- Added a file tasting method to determine whether a file looks like a specific file format, e.g. gff, UCSC gene table, etc.
- Added numerous little methods and method aliases here and there to improve functionality
- Added attribute rewrite functions for both GFF and VCF files
- Improved file format testing
- Numerous little optimizations in loading files
BioToolBox-v1.36
- added new option to script get_relative_data to allow user to specify what feature types to avoid
- fix bugs in script manipulate_datasets when exporting log2 treeview files and in script graph_profile when defining x axes
- fix annoying bug where manipulate_datasets will not re-show column list
- improve data file summarization
- some library method optimizations
BioToolBox-v1.35
- Add new options for setting dimensions and linear regression lines in script graph_data.
- Restored unique option in script data2gff.
- New convenience methods for Feature objects.
- Fixed bug with smoothing interpolation in get_relative_data
- Numerous other bug fixes regarding bed files, column names, file support, warnings.
BioToolBox-v1.34
- Changed the behavior of automatically converting interbase coordinates to base coordinates upon loading a file, and converting back as necessary when writing. This had the side effect of effectively changing coordinates when writing out nonstandard text files. Conversion is now done on the fly when using the start method of row Features. Start interbase coordinates are now recognized by appending a 0 to the column name. Output files should now look like the input files.
- Strand values are not automatically converted upon loading; they are converted as necessary on the fly using the row Feature strand method.
- Null values are not automatically converted to internal '.' null values. They are converted as necessary using the row Feature value method to maintain backward compatibility.
- Scripts data2bed and data2wig go back to using a Stream input to avoid high memory usage.
- Script data2wig now has a fast option to skip lots of checks on values and intervals. This speeds up conversion considerably at the risk of making improper wig files if the source file has issues.
- Script join_data_file is considerably faster by simply concatenating data lines without processing or checking.
- Script bam2wig has new recording option, mid extend, to record the middle portion of alignments or proper paired-end alignments. Credit to Ohad for recommending.
- Add explicit interbase support to scripts data2gff and data2fasta.
- Fix critical bug where extensions were not scored properly for coordinate features in script get_binned_data. Thanks to Mengyao.
- Fix bam2wig alignment illustrations in POD. Thanks to Ohad.
- Bug fixes regarding bed file integrity checking that were introduced in the previous release.
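
The on-the-fly interbase handling described for v1.34 can be sketched as follows. This is an illustration in Python, not BioToolBox's Perl code: the function name `base_start` and the example column names are hypothetical, and the only thing taken from the notes above is the convention that a start column whose name ends in `0` holds an interbase (0-based) coordinate.

```python
def base_start(column_name: str, value: int) -> int:
    """Return a 1-based start coordinate without altering the stored value.

    Assumption (hypothetical helper): a column name ending in "0",
    such as "start0", marks an interbase (0-based) start.
    """
    if column_name.endswith("0"):
        # Interbase start: shift by one only when the value is requested.
        return value + 1
    # Already a 1-based start: return unchanged.
    return value


# The stored row is never modified, so writing it back out
# reproduces the input file's coordinates exactly.
row = {"chromo": "chr1", "start0": 99, "end": 200}
start = base_start("start0", row["start0"])
```

Because the conversion happens only at read time, the stored value stays as it appeared in the input, which is why output files look like the input files.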
BioToolBox-v1.33
- Removed legacy_helper module. All scripts now properly updated to use Bio::ToolBox::Data and related objects. This was the last step of a long process to modernize all of the scripts to use the new libraries.
- All data collection modules are now chromosome naming-scheme agnostic, meaning that "chr1" and "1" for chromosome can be used equally, regardless of what the annotation or big data file uses.
- Minimal VCF file support is added, including the ability to parse INFO and SAMPLE attributes, and verify some file format integrity.
- Significantly improve GTF file parsing.
- Improve file format verification, including printing error messages. This should alleviate cryptic reasons for automatic file extension changes.
- Tons of bug fixes. See GitHub for a full change log.
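
One way to picture the naming-scheme agnostic lookup mentioned for v1.33 is the sketch below. It is written in Python purely for illustration; `resolve_chromosome` is a hypothetical helper and does not reflect how the BioToolBox modules actually implement this. It assumes the only difference between schemes is the presence or absence of a `chr` prefix.

```python
from typing import Iterable, Optional


def resolve_chromosome(name: str, available: Iterable[str]) -> Optional[str]:
    """Map a requested chromosome name onto the name a data file uses.

    Hypothetical helper: tries the name as given, then the alternate
    form with the "chr" prefix added or removed.
    """
    names = set(available)
    if name in names:
        return name
    # Build the alternate spelling: "chr1" <-> "1".
    alternate = name[3:] if name.startswith("chr") else "chr" + name
    return alternate if alternate in names else None


# A file indexed with "chr"-prefixed names still answers a query for "1".
file_chromosomes = ["chr1", "chr2", "chrX"]
match = resolve_chromosome("1", file_chromosomes)
```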
v1.32
No new features, just lots of fixes for bugs introduced in the last version.
- Fix bug with adding a new column to Data object, particularly when selected from a database.
- Fix bugs related to adding, deleting, or modifying columns for a specific file format, such as BED or GFF
- Introduce additional Data structure verification tests, including proper strand information, to verify correct file formatting, such as BED and GFF
- Fix bugs when writing data files that incorrectly maintained file extensions for a given format even when the structure was no longer valid.
- Add support for .bigwig and .bigbed file extensions.
- Fix bug with opening fai fasta index and forked databases in script CpG_calculator.