Genoogle

Fundamental Information

Genoogle is software for searching similar DNA sequences, developed by Felipe Albrecht (home page, email contact).

Genoogle uses indexing and parallel processing techniques and is developed in Java. It is free and open source. The name Genoogle comes from Genes + Google: the final world domination plan is to develop software that locates genes the way Google locates information on the Web. Genoogle has no affiliation with Google Inc., and I hope its name will not cause problems.

This is a beta version of Genoogle, which means it lacks some features and has some known and a lot of unknown bugs. So I hope that users (YOU!) will tell me about bugs and about features you would like to have.

If you really want to develop something for Genoogle, contact me.

Features

Current features:

Fast searching for similar sequences.
Really good sensitivity.
Text mode interfaces.
Web services interface.
Very simple web interface, but with support for JSP.
Modest memory requirements. (A 4-gigabyte data bank needs no more than 4 gigabytes of RAM.)
Works (and is tested) on Windows and Linux.
Supports data banks larger than 8 gigabytes.
Console and batch interfaces.

Missing and planned features:

Better web interface.
RNA sequence indexing and searching.

Missing features not planned for the near future:

Protein indexing and searching. (It would be a lot of work to implement, but it is possible.)
Cluster implementation. (May be my Ph.D. project.)

Installation

Requirements

To run Genoogle you need:

JRE >= 1.6, with the environment variable JAVA_HOME pointing to where the JRE is installed, for example: JAVA_HOME="/usr/lib/jvm/java-6-sun"
RAM: Genoogle needs approximately 80% of the data bank size plus approximately 40 MB for the Java runtime.
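As a rough sizing aid, the rule of thumb above can be written as a small calculation. This is only a sketch and not part of Genoogle: the function name `estimate_ram_mb` is hypothetical, and the 80% and 40 MB constants are the approximations stated above, not measured values.

```python
def estimate_ram_mb(databank_size_mb: float) -> float:
    """Approximate RAM needed, in MB, for a data bank of the given size.

    Hypothetical helper: applies the README's rule of thumb of about 80%
    of the data bank size plus about 40 MB for the Java runtime.
    """
    if databank_size_mb < 0:
        raise ValueError("data bank size must be non-negative")
    return 0.8 * databank_size_mb + 40.0


if __name__ == "__main__":
    # Estimate for a 4 GB (4096 MB) data bank.
    print(round(estimate_ram_mb(4096)), "MB")
```

For a 4096 MB data bank the estimate is about 3317 MB, which stays under the 4 gigabytes mentioned in the feature list.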

Installation process

Download the package here (TBA)
Unpack it.
Copy the FASTA data bank files into the files/fasta folder.
Edit the conf/genoogle.xml file and add the copied files to the <genoogle:split-databanks> section as new <genoogle:databank> entries:

```xml
<genoogle:split-databanks name="RefSeq" path="files/fasta" mask="111010010100110111" number-of-sub-databanks="1" sub-sequence-length="11">
    <genoogle:databank name="Cow" path="cow.rna.fna" />
    <genoogle:databank name="Frog" path="frog.rna.fna" />
</genoogle:split-databanks>
```

Run the format_db.sh script.
Wait while the data bank is formatted and the inverted index is built.
Execute one of:
- run_web.sh, for web services, web page, and console,
- run_standalone_web.sh, for a web page that accesses Genoogle through the web service,
- run_console.sh, for the console-only interface.
Have fun!
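A note on the mask attribute in the configuration example above: it is a string of 18 binary digits, 11 of which are 1, matching sub-sequence-length="11". One plausible reading, which is an assumption and not something this README confirms, is that the mask works like a spaced seed: from each 18-base window only the positions marked 1 are kept, giving an 11-base sub-sequence for the index. The sketch below illustrates that reading only; `apply_mask` is a hypothetical helper, not Genoogle's code.

```python
MASK = "111010010100110111"  # mask value from the configuration example
SUB_SEQUENCE_LENGTH = 11      # sub-sequence-length from the same example


def apply_mask(window: str, mask: str) -> str:
    """Keep the characters of `window` at positions where `mask` has a '1'.

    Hypothetical illustration of a spaced-seed style mask; how Genoogle
    actually uses the attribute is an assumption here.
    """
    if len(window) != len(mask):
        raise ValueError("window and mask must have the same length")
    if set(mask) - {"0", "1"}:
        raise ValueError("mask must contain only '0' and '1'")
    return "".join(base for base, bit in zip(window, mask) if bit == "1")


if __name__ == "__main__":
    # The number of '1' positions equals the configured sub-sequence length.
    print(MASK.count("1") == SUB_SEQUENCE_LENGTH)
    # Made-up 18-base window, reduced to the 11 masked-in positions.
    print(apply_mask("ACGTACGTACGTACGTAC", MASK))
```

Whatever the exact semantics, checking that the count of 1s in the mask equals sub-sequence-length is a cheap sanity check before running format_db.sh on a large data bank.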

Searching

Genoogle has three interfaces: a very simple web page, a text-mode console, and a web services interface.

Searching through the web page is very simple. Open the address localhost:8080 in your browser, put the query sequence in the input box, and click the Search sequence button. Wait, and the results will be shown. The console interface is much better!

To use the WebServices, please check their wiki.

Console interface

The console interface has the following commands:

search : runs a search. As in the example below, it takes the data bank name, the input file, the output file, and optional parameter=value pairs.
list : lists the data banks.
parameters : shows the search parameters and their values.
set parameter=value : sets the value of a parameter.
gc : runs the Java garbage collector.
prev or l : repeats the last command.
batch file : runs the commands listed in the given batch file.
exit : ends the Genoogle session.

The search parameters are:

MaxSubSequenceDistance : maximum distance between index entries for them to be considered part of the same HSP.
SequencesExtendDropoff : drop-off for sequence extension.
MaxHitsResults : maximum number of results returned.
QuerySplitQuantity : number of slices the input query is divided into.
MinQuerySliceLength : minimum size of each input query slice.
MaxThreadsIndexSearch : number of threads used for the index search. (Should satisfy MaxThreadsIndexSearch <= QuerySplitQuantity * 2.)
MaxThreadsExtendAlign : number of threads used to extend and align the HSPs.
MatchScore : score for a match in the alignment.
MismatchScore : score for a mismatch in the alignment.
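The one numeric constraint in the list above, MaxThreadsIndexSearch <= QuerySplitQuantity * 2, is easy to check before starting a long search. The following is a minimal sketch, not part of Genoogle; `check_thread_settings` is a hypothetical helper name.

```python
def check_thread_settings(query_split_quantity: int,
                          max_threads_index_search: int) -> bool:
    """Return True if MaxThreadsIndexSearch <= QuerySplitQuantity * 2.

    Hypothetical helper that only encodes the recommendation stated in
    the parameter list; it does not talk to Genoogle.
    """
    if query_split_quantity < 1 or max_threads_index_search < 1:
        raise ValueError("both values must be positive integers")
    return max_threads_index_search <= query_split_quantity * 2


if __name__ == "__main__":
    print(check_thread_settings(2, 2))  # within the recommended limit
    print(check_thread_settings(2, 5))  # more threads than recommended
```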

An example search is shown below:

search Genomes_RefSeq BA000002 result_file QuerySplitQuantity=2 MaxThreadsIndexSearch=2 MaxHitsResults=20

This command searches the Genomes_RefSeq data bank using the file BA000002 as input and saves the results to the file "result_file.xml". The input query is split into 2 parts, and 2 threads are used to search the inverted index for the query's sub-sequences. At the end, the 20 best-scoring results are returned to the user.
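Search lines like the one above can also be generated by a script, for example to fill a file for the batch command. The sketch below is not part of Genoogle: `build_search_command` is a hypothetical helper, the second query name is made up, and it assumes the argument order seen in the example (data bank, input file, output file, then parameter=value pairs) and one command per line in a batch file.

```python
def build_search_command(databank, input_file, output_file, **parameters):
    """Build one console `search` line in the layout of the README example.

    Hypothetical helper; keyword arguments become parameter=value pairs
    in the order they are passed.
    """
    parts = ["search", databank, input_file, output_file]
    parts += [f"{name}={value}" for name, value in parameters.items()]
    return " ".join(parts)


if __name__ == "__main__":
    # BA000002 comes from the example; BA000003 is a made-up second query.
    queries = ["BA000002", "BA000003"]
    lines = [
        build_search_command("Genomes_RefSeq", query, f"result_{query}",
                             QuerySplitQuantity=2, MaxThreadsIndexSearch=2,
                             MaxHitsResults=20)
        for query in queries
    ]
    # Assumed batch layout: one console command per line.
    print("\n".join(lines))
```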

Dependencies

Genoogle uses:

Dom4j for XML parsing.
JUnit4 for unit testing.
EasyMock and CGLIB for mock creation in the JUnit tests.
Google Collections
Protocol Buffers for data bank and index serialization.
Log4J for logging.
Jetty for embedded web server.
Jax-WS for WebServices implementation.

All these libraries are in the directory https://github.com/felipealbrecht/Genoogle/tree/master/lib
