E) hogwash outputs

Katie Saund edited this page Sep 14, 2020 · 46 revisions

Plots

Each test will return a .pdf with a Manhattan plot of the genetic loci (x-axis) vs. the -ln(P-value) (y-axis). The horizontal red line is the threshold for significance provided by the user, with significant loci appearing above the line.
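
As a rough illustration of the y-axis transformation, the Python sketch below converts P-values to -ln(P) and compares them with the height of the threshold line. It is not hogwash's own R code. The locus names, P-values, and `alpha` are made up, and it assumes the user-provided threshold is a P-value cutoff drawn at -ln(threshold).

```python
import math


def neg_ln(p_value):
    """Return -ln(p), the y-axis value of a Manhattan plot."""
    return -math.log(p_value)


# Hypothetical P-values for three loci (illustrative numbers only).
p_values = {"locus_A": 0.001, "locus_B": 0.20, "locus_C": 0.04}
alpha = 0.05  # hypothetical user-provided significance threshold

# Assumed height of the horizontal red line.
threshold_line = neg_ln(alpha)

# Loci whose -ln(P) lies above the line count as significant.
significant = sorted(
    name for name, p in p_values.items() if neg_ln(p) > threshold_line
)
print(significant)
```

A smaller P-value gives a larger -ln(P), which is why significant loci appear above the line.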

Each test will return a heatmap to show the relationship between the genotype transition edges and the phenotype (either presence, transitions, or |Δ|). The heatmap is only generated if at least two loci were significant. Each column is a genotype. Each row corresponds to an edge on the tree.

The heatmap cells encode the genotype on each edge:

  • 1 (black) = transition edge
  • 0 (white) = non-transition edge
  • NA (grey) = low confidence

Column annotations:

  • -ln(FDR Corrected P-value) (green)
  • Locus Significance: whether or not the P-value is significant after multiple test correction (not significant in white, significant in blue)
  • Optional: Number of genetic loci included in the group (only appears when genotypes are grouped according to a user-provided key)

Row annotations:

  • Phenotype presence or absence
    • Present (red)
    • Absent (white)
    • Low confidence (grey)

Each test returns the phenotype on the tree. PhyC plots the phenotype reconstruction.

Plots for just the significant hits

Hogwash will return a page in the pdf for each genotype found to be significant. Genotypes are ordered by the rank of the FDR corrected P-value.
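
To make the ordering concrete, here is a minimal Python sketch of ranking genotypes by FDR-corrected P-value. It assumes the correction is Benjamini-Hochberg, which this page does not state. The function name, genotype names, and P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical genotypes and unadjusted P-values.
names = ["geno_1", "geno_2", "geno_3", "geno_4"]
raw = [0.01, 0.04, 0.03, 0.50]

fdr = benjamini_hochberg(raw)

# Smallest corrected P-value first, mirroring the page order in the pdf.
ranked = sorted(zip(names, fdr), key=lambda pair: pair[1])
for name, value in ranked:
    print(name, round(value, 4))
```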

Genotype transition plot.

NPhyC null distribution with observed value.

.rda file

This R data file contains multiple types of data objects.

Binary phenotype output (both Synchronous & PhyC)

The prefix will be either hogwash_synchronous or hogwash_phyc, as appropriate.

  • $log A description of your R environment
  • $no_convergence_genotypes A character vector with the names of genotypes excluded from both ancestral reconstruction and testing. A genotype is excluded when it is absent from all samples or all but one sample, or present in all samples or all but one sample.
  • $contingency_table A list of matrices. Each tested genotype has a corresponding contingency table.
    • Synchronous: Relationship between the genotype not/transition and phenotype not/transition on each tree edge.
    • PhyC: Relationship between the genotype not/transition and phenotype presence/absence on each tree edge.
  • $hit_pvals The -log FDR corrected P-value for each genotype tested.
  • $sig_pvals The -log FDR corrected P-value for only genotypes significantly associated with the phenotype.
  • $raw_pvals The -log unadjusted P-value for each genotype tested (FDR not applied).
  • $hi_confidence_transition_edge A list of numeric vectors. Each vector corresponds to a tested genotype. The vectors are ordered by tree edges. High confidence genotype transition edges are indicated by 1, low confidence by 0.
  • $num_hi_conf_transition_edge Named numeric vector with the total number of high confidence genotype transition edges for the respective genotype.
  • $dropped_genotypes A character vector with the names of genotypes removed from testing because they did not have at least two high confidence genotype transition edges.
  • $convergence
    • $epsilon The ε value for each genotype-phenotype pair.
    • $pheno_beta β_phenotype
    • $geno_beta β_genotype for each genotype tested.
    • $num_hi_conf_edges Number of high confidence edges for each genotype-phenotype pair.
    • $N ∑ β_phenotype · β_genotype for each genotype-phenotype pair.
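
The Python sketch below illustrates how a per-genotype contingency table and the sum stored in $N relate to per-edge vectors. It is not hogwash's implementation. It assumes both inputs are 0/1 vectors over the same high confidence edges. The row and column layout, the function names, and the example vectors are hypothetical.

```python
def contingency_table(geno_edges, pheno_edges):
    """Count tree edges by (genotype, phenotype) state.

    Rows: genotype transition, genotype non-transition.
    Columns: phenotype 1, phenotype 0 (a transition for Synchronous,
    presence for PhyC). This layout is an assumption.
    """
    table = [[0, 0], [0, 0]]
    for g, p in zip(geno_edges, pheno_edges):
        table[1 - g][1 - p] += 1
    return table


def overlap_count(geno_beta, pheno_beta):
    """Sum of elementwise products, i.e. edges where both vectors are 1."""
    return sum(g * p for g, p in zip(geno_beta, pheno_beta))


# Hypothetical per-edge vectors for one genotype-phenotype pair.
geno = [1, 0, 1, 0, 0, 1]
pheno = [1, 0, 0, 0, 1, 1]

table = contingency_table(geno, pheno)
n_overlap = overlap_count(geno, pheno)
print(table, n_overlap)
```

For 0/1 vectors, the sum of products equals the top-left cell of the table: the number of edges where the genotype and phenotype events coincide.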

Continuous phenotype output (Continuous Test)

  • hogwash_continuous$log A description of your R environment
  • hogwash_continuous$no_convergence_genotypes A character vector with the names of genotypes excluded from both ancestral reconstruction and testing. A genotype is excluded when it is absent from all samples or all but one sample, or present in all samples or all but one sample.
  • hogwash_continuous$dropped_genotypes A character vector with the names of genotypes removed from testing because they did not have at least two high confidence genotype transition edges.
  • hogwash_continuous$hit_pvals The FDR corrected P-value for each genotype tested.
  • hogwash_continuous$sig_pvals The FDR corrected P-value for only genotypes significantly associated with the phenotype.
  • hogwash_continuous$raw_pvals The -log unadjusted P-value for each genotype tested (FDR not applied).
  • hogwash_continuous$hi_confidence_transition_edge A list of numeric vectors. Each vector corresponds to a tested genotype. The vectors are ordered by tree edges. High confidence genotype transition edges are indicated by 1, low confidence by 0.
  • hogwash_continuous$num_hi_conf_transition_edge Named numeric vector with the total number of high confidence genotype transition edges for the respective genotype.
  • hogwash_continuous$genotype_transition_edge Matrix. Rows correspond to tree edges. Columns correspond to tested genotypes. Encoding:
    • 0 == not a transition edge
    • 1 == transition edge where parent node is 0 and child node is 1
    • -1 == transition edge where parent node is 1 and child node is 0
    • NA == low confidence edge
  • hogwash_continuous$phenotype_transition_edges Matrix. Rows correspond to tree edges. Values are the absolute change in the phenotype on that tree edge.
  • hogwash_continuous$delta_pheno_table A list of named matrices. Each tested genotype has a matrix that gives the sum of the absolute phenotype change on each of the following types of tree edges: genotype 0 → 1, genotype 1 → 0, and no change in genotype (either 0 → 0 or 1 → 1).
    Example:

                              sum(|Δphenotype|)
      geno_parent_0_child_1                0.88
      geno_parent_1_child_0                0.00
      geno_no_change                      80.08
  • hogwash_continuous$delta_pheno_list A list of named lists. Each sublist corresponds to one tested genotype. The sublist has three numeric vectors: $geno_parent_0_child_1, $geno_parent_1_child_0, and $geno_no_change. Each value in a vector is the absolute value of the phenotype change for an edge of that category. The edge values are not in any particular order.
  • hogwash_continuous$convergence
    • $epsilon The ε for each genotype-phenotype pair.
    • $pheno_beta β_phenotype for each genotype-phenotype pair.
    • $geno_beta β_genotype for each genotype tested.
    • $num_hi_conf_edges Number of high confidence edges for each genotype-phenotype pair.
    • $N β_genotype^T β_phenotype for each genotype-phenotype pair.
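
The Python sketch below illustrates how the genotype_transition_edge encoding, delta_pheno_list, and delta_pheno_table described above fit together for one genotype. It is not hogwash's R code. The function names and edge data are hypothetical, `None` stands in for NA, and skipping low confidence edges when grouping is an assumption.

```python
def encode_transition(parent, child, high_confidence=True):
    """Encode one tree edge as 0, 1, -1, or None (standing in for NA)."""
    if not high_confidence:
        return None
    # 0 -> 1 gives 1, 1 -> 0 gives -1, no change gives 0.
    return child - parent


def delta_pheno_summary(transitions, abs_delta_pheno):
    """Group |delta phenotype| by genotype edge type, then sum each group."""
    key_for = {
        1: "geno_parent_0_child_1",
        -1: "geno_parent_1_child_0",
        0: "geno_no_change",
    }
    groups = {name: [] for name in key_for.values()}
    for t, delta in zip(transitions, abs_delta_pheno):
        if t is None:
            continue  # assumption: low confidence edges are left out
        groups[key_for[t]].append(delta)
    sums = {name: sum(values) for name, values in groups.items()}
    return groups, sums


# Hypothetical edges: (parent state, child state, high confidence?).
edges = [(0, 1, True), (1, 1, True), (1, 0, True), (0, 0, False), (0, 1, True)]
abs_delta = [0.5, 0.25, 1.0, 2.0, 0.25]  # hypothetical |delta phenotype| per edge

transitions = [encode_transition(p, c, hc) for p, c, hc in edges]
delta_list, delta_table = delta_pheno_summary(transitions, abs_delta)
print(transitions)
print(delta_table)
```

Here `delta_list` plays the role of one sublist of delta_pheno_list, and `delta_table` plays the role of the corresponding matrix in delta_pheno_table.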

Next: Some suggestions for exploring your hogwash results.