Home
Welcome to the PDB2MovieWeb wiki!
PDB2MovieWeb is a web service designed to take user-submitted files and parameters and use them as arguments when running the PDB2Movie scripts.
- Latest LTS Debian Linux server
- Apache 2.4.6 or greater
- MariaDB 5.5.56 or greater
- PHP 5.4.16 or greater
- Access to a processing server:
  - php-ssh2 PECL library
  - A server with the Torque job scheduler, OR the same web server
- PDB2Movie
The application can be installed in two ways: with a remote processing server, or without. Please keep in mind that this application was primarily built to work with a processing server. If run locally on the same web server, the processes will run in individual bash shells and will be very resource intensive. It is highly recommended to handle processing on an external server.
- Clone this repository into your httpd or Apache folder, the one containing the html/ directory (default /var/www/).
- cd into PDB2MovieWeb and run the command "mysql -u <YOUR_SQL_USER> -p < ./database.sql".
- Clone the repository again in any location that the user you plan to use for the remote connection can access.
- Clone the PDB2Movie repository into ./script/proc/ within PDB2MovieWeb.
- Follow the installation instructions for PDB2Movie in its README.md. If you are installing PDB2Movie on the same server as the web application, there is no need to clone the same repository twice; simply go to ./script/proc/ within PDB2MovieWeb and follow the same steps.
- cd into the PDB2MovieWeb directory and run the command "cp config-template.conf config.conf".
- Fill in the required fields. The main sections to fill out are:
  - Local server details: self-referential information about this server, also shared with the external processing server.
  - Remote server details: where to look for the files, and the information needed to connect to the remote server.
  - SSH information: the locations of the SSH public and private keys to use for the remote connection.
  - SQL information: the credentials needed to connect to the SQL database.
  - RemoteProcessing: accepts 1 or 0 depending on whether remote processing is used or not, respectively.
The configuration file comes with comments that describe the required information in more detail.
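As an illustration, a filled-in config.conf might look like the sketch below. Every field name and value here is an invented assumption for illustration only; the authoritative field names and their meanings are documented in the comments of config-template.conf.

```
# Illustrative sketch only; see config-template.conf for the real field names.
# Local server details
LocalURL=https://example.org/PDB2MovieWeb
# Remote server details
RemoteHost=proc.example.org
RemoteUser=pdb2movie
# SSH information
PublicKey=/home/apache/.ssh/id_rsa.pub
PrivateKey=/home/apache/.ssh/id_rsa
# SQL information
SQLHost=localhost
SQLDatabase=pdb2movieweb
SQLUser=webapp
SQLPassword=changeme
# 1 = use a remote processing server, 0 = process on this web server
RemoteProcessing=1
```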
The web front-end is written in HTML/CSS/JavaScript. Bootstrap 4 is used as the CSS framework to handle liquid layout design and the fundamental assets of the website, and the jQuery JavaScript library handles dynamic object manipulation, minor animations and AJAX HTTP requests. The HTTP requests sent from the website are formatted through a FormData object. The code to bind it is:
var fData = new FormData();
$.each(params, function(key, value) {
    fData.append(key, value);
});
where params is a JSON object containing named key/value pairs. If additional data is to be added to the HTTP requests, the params value is set at the end of the function check(), where all values go through the first round of sanitising and error checking on the front-end. All responses come back as three items: "status", "title" and "text". "status" will be either "success" or "failure"; "title" is the first piece of text, typically added to a title attribute in the DOM; "text" is the second piece of text, which goes below the title. An exception to this is the response to the removal of a processing request, which only uses the title to write "This has been removed."; the "text" item is still present in the response, but is an empty string.
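As a concrete illustration, a successful submission response could look like the following. The field values are invented; only the three-key shape is defined by the application.

```
{
  "status": "success",
  "title": "Request submitted",
  "text": "You will receive an email when processing begins."
}
```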
There are two programming languages on the server side: the PHP scripts, which handle the HTTP requests and responses as well as the sanitisation and parsing of files and arguments, and the bash scripts, which handle messages from the processing server after the PHP scripts finish and the request has been passed over to it.
These files are located in ./html/php/. Most of the scripts handle the HTTP requests, and they are similarly structured; their process follows:
- Load the config file and sanitise all parameters; if files are included, sanitise them too.
- Interact with the database: establish a connection, then query, add or delete.
- Interact with the processing server: connect and authenticate, then perform a command with the script to run and the parameters in the arguments.
The files that handle the requests are index, review and delete.
- index
- review
- delete
The other files are PHP classes. These are FileChecker, PDBChecker, PythonChecker and RemoteConnection.
- FileChecker checks for typical errors that can occur when transferring a file from the client to the server and returns a pass or fail value. It is extended by PDBChecker and PythonChecker.
- PDBChecker, extending FileChecker, after receiving a success from the base checking, then goes line by line and flags any file that contains a special character.
- PythonChecker, extending FileChecker, after receiving a success from the base checking, then checks the formatting of the file against its whitelist of allowed commands. So far, the only thing allowed is the repositioning of the camera (EDIT: not yet implemented; if you use this program, you may want to disable the video file capability for security reasons; check the config files).
- RemoteConnection simply loads the config files and uses the SSH information to connect to the remote server, going through the stages of authentication before returning an SSH2 session object that can be used to interact with the remote server. If the value returned is a string, it will be a JSON-formatted string detailing the failure point in the authentication process.
These files are located within ./script/web/. The bash scripts typically handle the interactions after the processing request has been handed to the processing server. These files are mailer and updateFinishedTask.
- mailer handles all the emails sent to the user. This script interacts with an adjacent directory called email-texts/, which contains text files holding the bulk of the written content used in the emails. Each text file is named after the type of email it is used for. There are four emailing scenarios:
- submitted.txt, where the request is first sent by the user.
- processing.txt, where the queued request has begun processing.
- complete.txt, when the request has finished processing and a download link is required by the user to get their contents off of the web server.
- removed.txt, when a request is deleted, an email confirming the removal is sent to the user.
The arguments that the mailer takes are: the recipient email, the subject of the email, the name of the txt file to be used, the download link ("NULL" if it is not complete.txt), and then the rest of the arguments are all of the parameters sent by the user, relayed back to them so they understand which request the email refers to. These parameters are printed out in the order they are arranged when submitting the bash command. Check the outputted email or the script itself to see what order these parameters are in if you wish to add additional values or call this script in a different context.
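Based on the description above, a call to mailer might look like the sketch below. The argument order, script path and trailing parameters are assumptions for illustration; the script itself is the authoritative reference.

```shell
# Hypothetical mailer invocation; echoed rather than executed so the sketch is
# self-contained. Argument order is an assumption - verify against the script.
recipient="user@example.com"
subject="PDB2MovieWeb: your request is complete"
template="complete.txt"                          # one of the files in email-texts/
link="https://example.org/download/abc123.zip"   # "NULL" unless using complete.txt
# Any remaining arguments relay the user's submitted parameters back to them.
echo ./mailer "$recipient" "$subject" "$template" "$link" "example.pdb" "mp4"
```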
- updateFinishedTask takes the information about the request in question and changes the complete attribute from 0 to 1. This marks the request in the database as complete, which is used when formatting the query section of the web app and during the pre-queries run before a user submits another request.
The other bash script is GarbageCollection. Once a day, every file in the download folder is checked; if its last-modified timestamp is more than seven days old, the file is removed. This ensures that the space on the server is not exceeded and that the user has a seven-day download window.
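The seven-day rule amounts to a single find invocation. The sketch below demonstrates it against a throwaway directory rather than the real download folder, whose path depends on your installation; `touch -d` assumes GNU coreutils.

```shell
# Demonstrate the cleanup rule GarbageCollection applies to the download folder.
dir=$(mktemp -d)
touch -d '10 days ago' "$dir/old.zip"   # stale download: should be deleted
touch "$dir/new.zip"                    # recent download: should survive
find "$dir" -type f -mtime +7 -delete   # remove files older than seven days
remaining=$(ls "$dir")
echo "$remaining"
rm -rf "$dir"
```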
The processing server is supported by the Centre of Scientific Computing; as mentioned above, their Linux server uses the Torque job scheduler to run processing requests. This is done with bash files carrying unique parameter declarations at the start, marked by #PBS, which state the number of nodes, clusters, memory, walltime, and the error log formatting. In all requests, 2GB of memory is given, 1 node and 20 clusters are used, with a walltime of four hours. The two main files used are submit and remove. Each is used for a different user request: one submits a job to the processing server, the other removes that previous job.
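A job script header matching the limits above might look like the following sketch. The exact directive spellings are assumptions based on common Torque usage; treat the real submit script as authoritative.

```shell
#PBS -l nodes=1:ppn=20      # 1 node, 20 processors ("clusters")
#PBS -l mem=2gb             # 2GB of memory
#PBS -l walltime=04:00:00   # four-hour walltime
#PBS -j oe                  # merge the error log into the output log
# Outside of Torque the #PBS lines are plain comments, so the script still runs:
job_label="pdb2movie-request"
echo "$job_label"
```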
- submit takes all of the arguments and file locations of the request and constructs the flags and syntax used for the qsub Torque command. It then runs that command; when it is complete and the files are generated, the compression type and the files to compress are checked, and the files are formatted and compressed accordingly. After this, unneeded files are removed from the processing server and the compressed zip is moved from the processing server to the download section.
- remove simply takes the unique filename generated for each request and finds a similarly named job on the Torque system. It finds the associated job ID and deletes the job, then looks for directories made with the same job name and removes them all. Note that the job name is only 16 characters long, whilst the filename is a 40-character hash, so there is a chance of collisions where two jobs have the same name. That said, the odds of two independent jobs having exactly the same first 16 characters while both active within the seven-day download window are incredibly low. If this is found to be a problem, there is a historical rewrite of this code.
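The 16-character truncation can be illustrated in one line of bash (the hash value below is invented for illustration):

```shell
filename="0123456789abcdef0123456789abcdef01234567"  # illustrative 40-char hash
jobname="${filename:0:16}"   # Torque only sees the first 16 characters
echo "$jobname"              # prints 0123456789abcdef
```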