This GUI tool is designed to facilitate the preprocessing of sequencing data. It provides a user-friendly interface to perform various operations on sequencing files, such as adapter trimming, mapping, removing unwanted reads, converting SAM to BAM, removing PCR duplicates, and converting to bedGraph and bigWig formats.
This program has a standard naming rule. Cell line name-number_Read number.format or other informations
Ex) HCT116-1_R1.fastq.gz, HEK293T-1_R2.fastq.gz, HCT116-1.sam.
If the Cell line name contains '-', '_', '.', an error may occur, so pre-processing is required.
Make sure you have the following software installed on your system:
- Python (>= 3.7)
- Tkinter (included with Python)
- Cutadapt
- Bowtie2
- Samtools
- picard-tools;MarkDuplicates.jar (optional, for remove PCR duplicates)
- bam2bed_shift.pl (optional)
- norm_bedGraph.pl (optional)
- genomeCoverageBed (optional)
- GenomeCoverageBed (optional)
- CRISPResso2 (optional, for CRISPResso)
- Clone the repository from GitHub:
git clone https://github.com/Operon1226/GUI_sequencing_preprocessor.git
- Navigate to the cloned directory:
cd GUI_sequencing_preprocessor
- Run the GUI:
python3 gui.py
Please set the location of Python to use when you run this program in shebang. Default is '#!/usr/bin/env python3'
In Python, installed as Anaconda, we found that changing the font or size of the letters did not apply well. Please set original interpreters; ex) #!/usr/bin/env python3 or #!/usr/bin/python3
Traceback (most recent call last):
File "gui.py", line 108, in <module>
main()
File "gui.py", line 102, in main
mainframe = tk.Tk()
File "/home/intern/anaconda3/lib/python3.7/tkinter/__init__.py", line 2023, in __init__
self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: no display name and no $DISPLAY environment variable
This error may occur if X11 forwarding is disabled during the SSH process. To resolve it:
- Right-click on the server icon.
- Click "Edit Session."
- Click "Advanced SSH settings."
- Check "X11-forwarding."
- Reconnect to the server.
Add the '-X' option when connecting to the server using SSH: `ssh -X -p port ID@IP``
connect /tmp/.X11-unix/X0: No such file or directory
connect /tmp/.X11-unix/X0: No such file or directory
Traceback (most recent call last):
File "gui.py", line 782, in <module>
main()
File "gui.py", line 776, in main
mainframe = tk.Tk()
File "/home/intern/anaconda3/lib/python3.7/tkinter/__init__.py", line 2023, in __init__
self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: couldn't connect to display "localhost:12.0"
If you run a program using a different tkinter on the same account, you may have problems.
- Open the GUI application.
- Use the 'File' menu to import your sequencing files.
- Select the preprocessing operation you want to perform from the 'Operation' menu.
- Follow the prompts and provide any required input.
- Click on the operation to execute it, and view the progress and results in the display.
This function uses the 'cutadapt' command for adapter trimming. The adapter sequence is set to 'CTGTCTCTTATACACATCT'.
This function uses the 'bowtie2' command for mapping. mapping option is 'very sensitive' This program also implemented to take the current CPU usage and show the number of threads available so that more than one thread can be used when mapping.
This function uses awk command and samtools. Using awk, they filtered chrM and chrEBV. And also, using samtools view, selects only reads with a mapping quality score greater than or equal to 10. output files are "name.pe.q10.sort.bam"
This function uses samtools. They will convert sam files to bam files.
This function uses picard-tools;MarkDuplicates.jar, and samtools
This function uses genomeCoverageBed, bam2bed_shift.pl, norm_bedGraph.pl, bedGraphToBigWig
If a file already exists, there is a function to ask if you want to overwrite it.
If you already have a directory, this program has the ability to ask if you want to save outputs to that directory. It also checks that the directory entered is valid. And if the input files' directory is different, a warning popup will occur and they will request setting the output directory to users
When operations are executed, log files are created at same directory as output files. And if, error were occured during executing, error messages are also saved as "Error_message.txt". Since samtools and bowtie2 return values as stderr, It is not created when you run both tools.
If input files are not proper to executing that operation, warning message will printed to GUI display.
The 'PopupWindow' class is used to control popup windows. It has three methods:
This function is designed to receive input from the user and returns the user's input. The 'title' parameter indicates the title of the popup window, and the 'detail' parameter provides a description displayed below the title.
This function creates a popup window with the title "Running command" to indicate that a command is executing. The 'detail' parameter works similarly to the 'popup_input' function.
This function is used to destroy popup windows.
The 'TextModule' class is used to control the text displayed in the GUI. It has three methods:
This function is designed to indicate imported file paths on the display. It takes a list of file paths as a parameter.
This function is used to print text on the display. The 'text' parameter must be a string.
This function is used to clear all information displayed on the GUI.
The 'FileMenu' class is responsible for controlling the 'File' menu in the menubar. It has three functions:
This function is designed to open the file explorer and save the selected file paths to another variable.
This function is used to exit the GUI program and terminate all subprocesses.
This function is used to receive the output directory from the user and set it as the output directory. It also checks whether the entered directory is valid.
The 'OperationMenu' class is responsible for controlling various preprocessing operations. It includes methods for adapter trimming, mapping, removing unwanted reads, converting SAM to BAM, removing PCR duplicates, and converting to bedGraph and bigWig formats.
This function performs adapter trimming on the imported ATAC-seq files. It uses the 'cutadapt' command with the default adapter sequence 'CTGTCTCTTATACACATCT'.
This function maps the preprocessed reads to a reference genome using Bowtie2.
This function removes unwanted reads based on chromosome(chrM,chrEBV). And also, using samtools view, selects only reads with a mapping quality score greater than or equal to 10.
This function converts the mapped reads from SAM to BAM format using Samtools.
This function removes PCR duplicates from the BAM files using picard-tools.
This function converts the BAM files to bedGraph format and bigwig format. This function converts BAM files to bed files using bam2bed_shift.pl And using genomeCoverageBed, convert bed to bedGraph. Using norm_bedGraph.pl, bedGraph were normalized. Using bedGraphToBigWig, convert bedGraph to bigwig
The 'MainMenu' class is the main entry point for the GUI application. It sets up the main window and composes the other classes to create the GUI elements.
If you encounter any errors, have inquiries, or want to suggest improvements, please contact:
Your feedback is appreciated, and it will help me enhance the tool further. Thank you for using the Sequencing Preprocessor GUI!