
How to use DUSC

Suhas Srinivasan edited this page Dec 29, 2018 · 5 revisions

DUSC uses DAWN and EM to generate cell clusters from the scRNA-seq data. The steps to use DUSC are listed below.

Feature Learning

  1. The scRNA-seq data should be in a Cell x Gene matrix, where Cells are the rows and Genes are the columns.
  2. The matrix values can be counts or one of the four RNA-seq expression units (RPM, TPM, FPKM, or RPKM).
    Note: Do not use log-normalized values.
  3. Perform any necessary filtering of cells based on your quality criteria.
  4. Remove all row and column labels so that only the numerical matrix remains, and save it as a comma-separated values (CSV) file.
  5. Run DAWN feature learning: python dawn.py <path to input csv file>.
    DAWN creates the following two files and saves them to the same directory as the input file:
    i. A binary file that contains the preprocessed data. Its name is similar to the input file name, but with a .bin extension.
    ii. A CSV file that contains the latent features. Its name is also similar to the input file name, but with the suffix latent_features.
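
Steps 1 to 4 above can be scripted. The following is a minimal sketch, not part of DUSC: it assumes the labeled matrix is a CSV whose first row holds gene names and whose first column holds cell IDs, and the function name strip_labels is hypothetical. It writes a label-free copy and rejects non-numeric or negative entries. The negative-value check is only a heuristic (counts and RPM/TPM/FPKM/RPKM values are never negative) and will not catch every log-transformed matrix.

```python
import csv


def strip_labels(labeled_csv, output_csv):
    """Write a label-free numeric copy of a Cell x Gene CSV.

    Assumption (not from the DUSC docs): row 1 is a header of gene names
    and column 1 holds cell IDs. Returns the number of cells written.
    """
    n_cells = 0
    with open(labeled_csv, newline="") as src, \
            open(output_csv, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        next(reader, None)  # drop the header row (gene names)
        for row in reader:
            values = row[1:]  # drop the first column (cell ID)
            for v in values:
                # float() raises ValueError on any leftover non-numeric label
                if float(v) < 0:
                    raise ValueError(
                        "negative value %r: not counts or an expression unit" % v
                    )
            writer.writerow(values)
            n_cells += 1
    return n_cells
```

For example, strip_labels("labeled.csv", "matrix.csv") would produce a matrix.csv that can be passed to dawn.py; both file names here are placeholders.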

Clustering

  1. After DAWN completes, follow the steps below to load the file for clustering:
    a. Start Weka
    b. Click the Explorer button on the right
    c. In the Preprocess tab, select Open file…
    d. In the file browser popup, at the bottom select Files of type: CSV data files
    e. Navigate to the DAWN output path and choose the output csv file
  2. Select the Cluster tab. There are two options for generating clusters:
    i. If you have no expectation for the number of clusters: click Start. EM selects the number of clusters by cross-validation and generates the cluster assignments.
    ii. If you have some expectation for the number of clusters (more details here): click on the EM textbox, enter the required value in the numClusters field, and then click Start.

Note: To compute the clusters faster, click on the EM textbox and enter the number of available CPU cores in the numExecutionSlots field.
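
The same EM settings can also be passed to Weka on the command line instead of through the Explorer. The sketch below only builds the argument list and is not part of DUSC. It assumes a local weka.jar, that your Weka version can load a CSV file directly through -t (convert the file to ARFF first if it cannot), and the helper name build_em_command is hypothetical. -N corresponds to numClusters, where -1 lets EM choose by cross-validation, and -num-slots corresponds to numExecutionSlots.

```python
import os


def build_em_command(weka_jar, latent_csv, num_clusters=-1, num_slots=None):
    """Build a command line that runs Weka's EM clusterer.

    Assumptions: weka_jar and latent_csv are placeholder paths supplied by
    the caller. num_clusters=-1 asks EM to pick the number of clusters by
    cross-validation; num_slots defaults to the CPU cores Python reports.
    """
    if num_slots is None:
        num_slots = os.cpu_count() or 1
    return [
        "java", "-cp", weka_jar,
        "weka.clusterers.EM",
        "-t", latent_csv,               # data to cluster
        "-N", str(num_clusters),        # numClusters
        "-num-slots", str(num_slots),   # numExecutionSlots
    ]
```

The returned list can be run with subprocess.run(cmd, check=True) once Java and Weka are installed; check the output against a run in the Explorer before relying on it.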