- Replicates from the same library, prepared with the same techniques
- Reduces random noise
- From biologically distinct samples
- Accounts for biological variation, such as differences in temperature and environment…
Why do we have to normalize the count matrix from RNA-seq data?
Raw counts don’t account for a lot of biases.
Then which biases do you have to account for?
Assume Gene A is longer than Gene B.
Then more reads will map to Gene A, even if both genes are expressed at the same level.
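As a rough illustration of this length bias, here is a minimal sketch. It assumes a toy model in which the expected number of reads is proportional to the number of transcripts times the transcript length; the function name `expected_reads` and the `reads_per_kb` parameter are hypothetical and only serve the example.

```python
# Toy model (an assumption for illustration only): a longer transcript is
# broken into more fragments, so expected reads scale with length.
def expected_reads(n_transcripts, length_kb, reads_per_kb=10):
    """Expected read count for a gene under the toy model."""
    return n_transcripts * length_kb * reads_per_kb


# Both genes have the same number of transcripts (same expression level),
# but Gene A is twice as long as Gene B.
gene_a = expected_reads(n_transcripts=100, length_kb=2.0)
gene_b = expected_reads(n_transcripts=100, length_kb=1.0)

print("raw reads, Gene A vs Gene B:", gene_a, gene_b)

# Dividing by gene length removes the difference caused by length alone.
print("per-kb reads, Gene A vs Gene B:", gene_a / 2.0, gene_b / 1.0)
```

In this sketch the raw counts differ by a factor of two even though expression is identical, while the per-kilobase values agree.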
Now we understand why we have to normalize.
- RPKM (Reads Per Kilobase of transcript per Million mapped reads)
  - Normalizes for gene length and sequencing depth
  - The higher the RPKM of a gene, the higher its expression
  - Used to quantify transcripts from single-end reads
  - CANNOT BE USED FOR DIFFERENTIAL GENE EXPRESSION ANALYSIS (DESeq2 / edgeR)
    - These tools take raw, non-normalized counts as input
    - RPKM doesn’t account for some of the biases that arise in these techniques
- FPKM (Fragments Per Kilobase of transcript per Million mapped fragments)
  - Analogous to RPKM
  - The higher the FPKM, the higher the expression
  - Used for paired-end data, where a read pair (one fragment) is counted rather than single reads
  - FPKM ≠ 2 * RPKM
  - CANNOT BE USED FOR DIFFERENTIAL GENE EXPRESSION ANALYSIS (DESeq2 / edgeR)
- TPM (Transcripts Per Million)
  - Normalizes for gene length and sequencing depth
  - TPM is better suited to compare expression between two samples
  - CANNOT BE USED FOR DIFFERENTIAL GENE EXPRESSION ANALYSIS (DESeq2 / edgeR)
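The three measures above can be written compactly as follows. The symbols are assumptions introduced here for illustration, not notation from these notes: $c_g$ is the number of reads mapped to gene $g$, $f_g$ the number of fragments (read pairs) mapped to it, $L_g$ the gene length in kilobases, and $N$ and $N_f$ the total number of mapped reads and mapped fragments in the sample.

$$
\begin{aligned}
\mathrm{RPKM}_g &= \frac{c_g}{(N / 10^{6}) \cdot L_g} \\
\mathrm{FPKM}_g &= \frac{f_g}{(N_f / 10^{6}) \cdot L_g} \\
\mathrm{TPM}_g  &= 10^{6} \cdot \frac{c_g / L_g}{\sum_{j} c_j / L_j}
\end{aligned}
$$

Written this way, the TPM values of one sample always sum to $10^{6}$, whereas the RPKM or FPKM values of a sample have no fixed total.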
- 6M library size for each replicate

| Genes | Gene length | Technical replicate 1 | Technical replicate 2 | Technical replicate 3 |
|---|---|---|---|---|
| gene A | 1.5 kb | 50 | 25 | 85 |
| gene B | 2 kb | 75 | 50 | 90 |
| … | … | … | … | … |
| Total reads mapped | … | 125 | 75 | 175 |
- Normalize for sequencing depth
  - Divide each count by the total number of reads mapped (RPM)
  - For gene A in replicate 1: 50 / 125 = 0.4
  - With real data this would be a very small number, which is why you multiply by 1,000,000
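- Normalize by gene length
  - Scale the value to 1 kb of gene length; applied after the depth step, this gives RPKM
  - 0.4 : 1.5 kb = ?? : 1 kb, so the value per 1 kb is 0.4 / 1.5 ≈ 0.27
- For TPM, you do the RPKM steps in reverse order: normalize by gene length first (RPK), then by sequencing depth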
TPM is better suited for comparing technical replicates, because the TPM values in every replicate sum to the same total (one million).
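The two orders of normalization worked through above can be sketched in plain Python. This is a minimal sketch under stated assumptions: genes A and B are treated as the only genes (which matches the column totals 125, 75 and 175 in the table), and the function names `rpkm`, `tpm` and `column_sums` are hypothetical helpers written for this example, not part of any library.

```python
# Toy data from the table. Assumption: genes A and B are the only genes,
# which is consistent with the column totals (125, 75, 175).
counts = {
    "gene A": [50, 25, 85],
    "gene B": [75, 50, 90],
}
length_kb = {"gene A": 1.5, "gene B": 2.0}
n_samples = 3


def column_sums(table):
    """Sum each replicate (column) across all genes."""
    return [sum(row[j] for row in table.values()) for j in range(n_samples)]


def rpkm(counts, length_kb, scale=1_000_000):
    """Normalize for sequencing depth first (RPM), then for gene length."""
    totals = column_sums(counts)
    return {
        gene: [row[j] / totals[j] * scale / length_kb[gene] for j in range(n_samples)]
        for gene, row in counts.items()
    }


def tpm(counts, length_kb, scale=1_000_000):
    """Normalize for gene length first (RPK), then for sequencing depth."""
    rpk = {gene: [c / length_kb[gene] for c in row] for gene, row in counts.items()}
    rpk_totals = column_sums(rpk)
    return {
        gene: [row[j] / rpk_totals[j] * scale for j in range(n_samples)]
        for gene, row in rpk.items()
    }


rpkm_values = rpkm(counts, length_kb)
tpm_values = tpm(counts, length_kb)

for gene in counts:
    print(gene, "RPKM:", [round(v) for v in rpkm_values[gene]])
    print(gene, "TPM: ", [round(v) for v in tpm_values[gene]])

# TPM columns share one total; RPKM columns do not.
print("RPKM column sums:", [round(s) for s in column_sums(rpkm_values)])
print("TPM column sums: ", [round(s) for s in column_sums(tpm_values)])
```

In this sketch every TPM column sums to 1,000,000, while the RPKM column sums differ from replicate to replicate, so a TPM value reads as the same share of the total in every replicate.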