- Nginx (Other possible when modifying the sample download section)
- Python 3 (Flask and other dependencies)
- MySQL
- Pure-FTPD with mysql
To configure the GCP for the platform, see the installation guide.
Automated install only works for the platform section; make sure to have configured the GCP before continuing the platform installation.
For deployment of the platform on a Google Cloud VM instance, one would require to create an instance and configure some network firewall settings. If you are deploying the platform on a local instance, you can skip these two steps.
-
Creating a VM instance
- Open the Google Cloud console, navigate to Compute Engine -> VM instances section, and click on "Create Instance".
- The following are the details of the VM instance to be entered (most of the default configuration below can be changed as per the requirements):
- Region: us-central1 (Iowa)
- Zone: us-central1-a
- Machine Family: General Purpose
- Series: E2
- Machine Type: e2-small
- Boot Disk
- For Linux:
- OS: Ubuntu
- Version: Ubuntu 22.04 LTS (x86)
- Boot type disk: Balanced persistent disk
- Size: 10GB
- For Windows:
- OS: Windows Server
- Version: Windows Server 2019 Datacenter
- Boot type disk: Balanced persistent disk
- Size: 50GB
- For Linux:
- Choose the service account as the service account you just created for the platform.
- Select the "Allow HTTP traffic" and "Allow HTTPS traffic" checkboxes.
- Navigate to Advanced options -> Networking -> Network Interfaces -> External IPv4 address, and click on Create IP Address and reserve a new static external IP address for the platform.
-
Setting up firewall settings
To allow access to the platform through an external IPv4 address just created, there are some firewall configurations to be made:
- Navigate to VPC network -> Firewall and click on "Create Firewall Rule".
- Set the rule name as "default-allow-https" and enter the following details for this rule:
- Priority: 1000
- Direction of Traffic: Ingress
- Action on the match: Allow
- Target type: Specified target tags -> Target Tags: "https-server"
- Source Filter: IPv4 ranges
- Source IPv4 ranges: 0.0.0.0/0
- Protocols and ports -> Specified protocols and ports -> TCP -> Port: 443 (Nginx server is configured on this port)
- Now click on "Save"
- Now create another firewall rule for HTTP as "default-allow-http" with the following changes in the above configuration:
- Target type: Specified target tags -> Target Tags: "http-server"
- Protocols and ports -> Specified protocols and ports -> TCP -> Port: 80
Clone the latest sample-platform repository from
https://github.com/CCExtractor/sample-platform.
Note that root (or sudo) is required for both installation and running the program.
The sample-repository
directory needs to be accessible by www-data
. The
recommended directory is thus /var/www/
.
cd /var/www/
sudo git clone https://github.com/CCExtractor/sample-platform.git
Mounting on Linux OS can be done using Google Cloud Storage FUSE.
Steps:
-
Install gcsfuse using official documentation or using the following script
curl -L -O https://github.com/GoogleCloudPlatform/gcsfuse/releases/download/v0.39.2/gcsfuse_0.39.2_amd64.deb sudo dpkg --install gcsfuse_0.39.2_amd64.deb rm gcsfuse_0.39.2_amd64.deb
-
Now, there are multiple ways to mount the bucket, official documentation here.
For Ubuntu and derivatives, assuming
/repository
to be the location of samples to be configured, an entry can be added to/etc/fstab
file, replace GCS_BUCKET_NAME with the name of the bucket created for the platform:echo "GCS_BUCKET_NAME /repository gcsfuse rw,gid=33,noatime,async,_netdev,noexec,user,implicit_dirs,allow_other,file_mode=774,dir_mode=775 0 0" | sudo tee -a /etc/fstab
-
Now run the following command as root to mount the bucket:
sudo mkdir /repository sudo mount /repository
You may check if the mount was successful and if the bucket is accessible by running ls /repository
command.
In case you get "permission denied" for /repository
, you can check for the following reasons:
- Check if the service account created has access to the GCS bucket.
- Check the output of
sudo mount /repository
command.
Place the service account key file at the root of the sample-platform folder.
The platform has been tested for MySQL v8.0 and Python 3.7 to 3.9.
It is recommended to install python and MySQL beforehand to avoid any inconvenience. Here is the installation link of MySQL on Ubuntu 22.04.
Next, navigate to the install
folder and run install.sh
with root
permissions.
cd sample-platform/install/
sudo ./install.sh
The install.sh
will begin downloading and updating all the necessary dependencies. Once done, it'll ask to enter some details in order to set up the sample-platform. After filling in these details, the platform should be ready for use.
Please read the below troubleshooting notes in case of any error or doubt.
- Install cygwin (http://cygwin.com/install.html). When cygwin asks which packages to install, select Python, MySql, and google-api-client. If you already have cygwin installed, you must run its setup file to install the new packages. Make sure the dropdown menu is set to Full, so you can all packages. To select one, click skip and it will change to the version number of the package. Use the end of this tutorial for help on getting cygwin to recognize python.
- Install WinFsp and Rclone from their official websites.
- Now rclone is a command line program, follow the official documentation to mount the google cloud storage bucket, using the service account key file.
- By default home directory of Cygwin is
C:\cygwin\home\<USERNAME>\
(this can be obtained by runningcygpath -w ~
from cygwin terminal), and assuming\repository
to be the location of samples to be configured, mount the bucket using rclone atC:\cygwin\home\<USERNAME>\repository
through the following command using command prompt:rclone mount GCS_BUCKET_NAME MOUNT_LOCATION --no-console
- Start a terminal session once installation is complete.
- Install Nginx (see http://nginx.org/en/docs/windows.html)
- Follow the steps from the Linux installation from within the Cygwin terminal to complete the platform's installation.
- The configuration of the sample platform assumes that no other application
already runs on port 80. A default installation of nginx leaves a
default
config file behind, so it's advised to delete that and stop other applications that use the port.
Note : Do not forget to do service nginx reload
and
service platform start
after making any changes to the nginx configuration
or platform configuration.
- SSL is required. If the platform runs on a publicly accessible server, it's recommended to use a valid certificate. Let's Encrypt offers free certificates. For local testing, a self-signed certificate can be enough.
- When the server name is asked during installation, enter the domain name
that will run the platform. E.g., if the platform will run locally, enter
localhost
as the server name. - In case of a
502 Bad Gateway
response, the platform didn't start correctly. Manually runningbootstrap_gunicorn.py
(as root!) can help to determine what goes wrong. The snippet below shows how this can be done:
cd /var/www/sample-platform/
sudo python bootstrap_gunicorn.py
-
If
gunicorn
boots up successfully, most relevant logs will be stored in thelogs
directory. Otherwise they'll likely be insyslog
. -
If any issue still persists, follow the mentioned steps to debug and troubleshoot your issue:
- Firstly check the Platform Installation log file in the install folder. Check for any errors, which may have been caused during platform installation on your system, and then try to resolve them accordingly.
- Next check for nginx status by
service nginx status
command, if it is not active, check nginx error log file, possibly in/var/log/nginx/error.log
file. - Next check for platform status by
service platform status
command, if it is notactive(running)
then check for platform logs in thelogs
directory of your project. - In case of any gunicorn error try manually running
/etc/init.d/platform start
command and recheck the platform status.
After the completion of the automated installation of the platform, the following folder structure is created in the 'SAMPLE_REPOSITORY' set during installation:
LogFiles/
- Directory containing log files of the tests completedQueuedFiles/
- Directory containing files related to queued samplesREADME
- A readme file related to SSL certificates required by the platformTempFiles/
- Directory containing temporary filesTestData/
- Directory containing files required for starting a test - runCI files, variables file, testerTestFiles/
- Directory containing regression test samplesTestResults/
- Direction containing regression test resultsvm_data/
- Directory containing test-specific subfolders, each folder containing files required for testing to be passed to the VM instance, test files and CCExtractor build artefact.
Now for tests to run, we need to download the CCExtractor testsuite release file, extract and put it in TestData/ci-linux
and TestData/ci-windows
folders.
To serve file downloads directly from the private GCS bucket, Signed download URLs have been used.
The serve_file_download
function in the utility.py
file implements the generation of a signed URL for the file to be downloaded that would expire after a configured time limit (maximum limit: 7 days) and redirects the client to the URL.
For more information about Signed URLs, you can refer to the official documentation.
Now the server being running, new tests would be queued and therefore a cron job is to be setup to run those tests.
The file mod_ci/cron.py
is to be run in periodic intervals. To setup a cron job follow the steps below:
- Open your terminal and enter the command
sudo crontab -e
. - To setup a cron job that runs this file every 10 minutes, append this at the bottom of the file
Change the
*/10 * * * * python /var/www/sample-platform/mod_ci/cron.py > /var/www/sample-platform/logs/cron.log 2>&1
/var/www/sample-plaform
directory, if you have installed the platform in a different directory.
In order to accept big files through HTTP uploads, some files need to be adapted.
If the upload is too large, Nginx will throw a
413 Request entity too large
. To remedy this error, modify the next section
in the nginx config:
# Increase Nginx upload limit
client_max_body_size 1G;
Besides HTTP, there is also an option available to upload files through FTP. As the privacy of individual sample submitters should be respected, each user must get it's own FTP account. However, system accounts pose a possible security threat, so virtual accounts (using MySQL) are to be used instead. Virtual users also offer the added benefit of easier management.
sudo apt-get install pure-ftpd-mysql
If requested, answer the following questions as follows:
Run pure-ftpd from inetd or as a standalone server? <-- standalone
Do you want pure-ftpwho to be installed setuid root? <-- No
All MySQL users will be mapped to this user. A group and user id that is still available should be chosen.
sudo groupadd -g 2015 ftpgroup
sudo useradd -u 2015 -s /bin/false -d /bin/null -c "pureftpd user" -g ftpgroup ftpuser
Edit the /etc/pure-ftpd/db/mysql.conf
file (in case of Debian/Ubuntu) so it
matches the next configuration:
MYSQLSocket /var/run/mysqld/mysqld.sock
# user from the DATABASE_USERNAME in the configuration, or a separate one
MYSQLUser user
# password from the DATABASE_PASSWORD in the configuration, or a separate one
MYSQLPassword ftpdpass
# The database name configured in the DATABASE_SOURCE_NAME dsn string in the configuration
MYSQLDatabase pureftpd
# For now we use plaintext. While this is terribly insecure in case of a database leakage, it's not really an issue,
# given the fact that the passwords for the FTP accounts will be randomly generated and hence do not contain sensitive
# user info (we need to show the password on the site after all).
MYSQLCrypt plaintext
# Queries
MYSQLGetPW SELECT password FROM ftpd WHERE user_name="\L" AND status="1" AND (ip_access = "*" OR ip_access LIKE "\R")
MYSQLGetUID SELECT uid FROM ftpd WHERE user_name="\L" AND status="1" AND (ip_access = "*" OR ip_access LIKE "\R")
MYSQLGetGID SELECT gid FROM ftpd WHERE user_name="\L" AND status="1" AND (ip_access = "*" OR ip_access LIKE "\R")
MYSQLGetDir SELECT dir FROM ftpd WHERE user_name="\L" AND status="1" AND (ip_access = "*" OR ip_access LIKE "\R")
MySQLGetQTAFS SELECT quota_files FROM ftpd WHERE user_name="\L" AND status="1" AND (ip_access = "*" OR ip_access LIKE "\R")
# Override queries for UID & GID
MYSQLDefaultUID 2015 # Set the UID of the ftpuser here
MYSQLDefaultGID 2015 # Set the GID of the ftpgroup here
Next, a file called ChrootEveryone
must be created, so that the individual
users are jailed:
echo "yes" | sudo tee /etc/pure-ftpd/conf/ChrootEveryone
The same needs to be done for CreateHomeDir
and CallUploadScript
:
echo "yes" | sudo tee /etc/pure-ftpd/conf/CreateHomeDir
echo "yes" | sudo tee /etc/pure-ftpd/conf/CallUploadScript
Set PassivePortRange
for pure-ftpd:
echo "30000 50000" | sudo tee /etc/pure-ftpd/conf/PassivePortRange
Also /etc/default/pure-ftpd-common
needs some modification:
UPLOADSCRIPT=/path/to/cron/progress_ftp_upload.py
UPLOADUID=33 # User that owns the upload.sh script
UPLOADGID=33 # Group that owns the upload.sh script
You can get the UPLOADUID and UPLOADGID using the following command:
ls -l /path/to/cron/progress_ftp_upload.py
Since we provide bucket access only to group www-data
, we will now add ftpuser
to the group:
sudo usermod -a -G www-data ftpuser
In manual installation, you may change www-data
to the group you provided the bucket access.
When necessary, an appropriate value in the Umask file
(/etc/pure-ftpd/conf/Umask
) should be set as well.
After this you Pure-FTPD can be restarted using
sudo /etc/init.d/pure-ftpd-mysql restart
Note: if there is no output saying:
Restarting ftp upload handler: pure-uploadscript.
, the uploadscript will
need to be started. This can be done using the next command (assuming 33 is
the gid
and uid
of the user which was specified earlier):
sudo pure-uploadscript -u 33 -g 33 -B -r /home/path/to/src/cron/progress_ftp_upload.py
sudo chown 33:33 /home/path/to/src/cron/progress_ftp_upload.py
To check if the upload script is running, the next command can help:
ps aux | grep pure-uploadscript
. If it still doesn't work, rebooting the
server might help as well.
NOTE: These steps need to be performed only if the platform is being installed on a GCP VM instance.
Since pure-ftpd server runs over TCP port 21 and we allow passive TCP port range as 30000-50000, we should now allow incoming communication requests through these ports to the platform server:
- Navigate to VPC network -> Firewall and click on "Create Firewall Rule".
- Set the name as "default-allow-ftp-ingress" and enter the following details for this rule:
- Priority: 1000
- Direction of Traffic: Ingress
- Action on the match: Allow
- Target type: "Allow instances in the network"
- Source Filter: IPv4 ranges
- Source IPv4 ranges: 0.0.0.0/0
- Protocols and ports -> Specified protocols and ports -> TCP -> Port: 21,30000-50000
- Now click on "Save"
To setup automated deployments via GitHub workflows, follow these steps:
- Ensure GitHub Actions is enabled in the repository (It is disabled for forks by default).
- Create two new GitHub repository variables, from the Settings tab
-
Navigate to "Secrets and variables" -> "Actions" -> "Variables".
-
Click on "New repository variable" and setup the following variables:
-
PLATFORM_DOMAIN
: The domain your platform is to be deployed, for example:sampleplatform.ccextractor.org
. -
SSH_USER
: The user that has root access and would be used to SSH into the server.(You can see a list of users in your system by running the command
less /etc/passwd
)
-
-
Now get the SSH private and public(
.pub
) keys by running the following command locally:ssh-keygen -t ed25519 -C "[email protected]"
Note: If you are using a legacy system that doesn't support the Ed25519 algorithm, use:
ssh-keygen -t rsa -b 4096 -C "[email protected]"
-
Now SSH into the VM instance where the platform is to be deployed, open the file
/home/<SSH_USER>/.ssh/authorized_keys
and append the contents of the public key to the end of the file. -
Now the last step is to add the SSH private key to the GitHub secrets
- Navigate to "Secrets and variables" -> "Actions" -> "Secrets".
- Click on "New repository secret" and setup the following variable:
SSH_KEY_PRIVATE
: Save the contents of the private SSH key file created in the last step as this secret.
-
Also checkout the variables
INSTALL_FOLDER
andSAMPLE_REPOSITORY
in the deployment pipeline in case you have configured values other than default.
-