---
title: "Homework Unit 9"
output: word_document
---

# Assignment for Biostatistics Week 9

Download and open the “Stenosis” data from the class webpage. Assume this is a random sample from some population. Answer the following questions referring to this dataset.

```{r}
library(readr)
stenosis <- read_csv("~/Papers/Biostatistics JHU 2021/stenosis.txt")
```

In class:

1. What proportion of this sample has aortic stenosis (disease)? Give a 95% CI for the proportion of the population that this sample came from that has aortic stenosis.

```{r}
# Proportion of the sample with aortic stenosis (disease is coded 0/1)
n <- length(stenosis$disease)
p.hat <- mean(stenosis$disease)
p.hat

# Wald (normal-approximation) 95% CI: p.hat +/- 1.96 * SE
se <- sqrt(p.hat * (1 - p.hat) / n)
p.hat - 1.96 * se
p.hat + 1.96 * se

# Exact (Clopper-Pearson) 95% CI: 105 of the 215 people have stenosis
binom.test(105, 215)
```
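
The Wald interval above can be wrapped in a small helper so it is easy to reuse and to compare with the exact interval from `binom.test()`. This is a minimal sketch: `wald_ci()` is a hypothetical helper name, and it assumes the counts used above (105 cases out of 215).

```{r}
# Hypothetical helper: Wald (normal-approximation) CI for a single proportion
wald_ci <- function(x, n, conf = 0.95) {
  p.hat <- x / n
  z <- qnorm(1 - (1 - conf) / 2)        # about 1.96 for a 95% interval
  se <- sqrt(p.hat * (1 - p.hat) / n)   # estimated standard error of p.hat
  c(estimate = p.hat, lower = p.hat - z * se, upper = p.hat + z * se)
}

wald <- wald_ci(105, 215)               # assumes 105 cases out of 215
exact <- binom.test(105, 215)$conf.int  # exact (Clopper-Pearson) interval
wald
exact
```

With a sample of this size the two intervals should be close; the exact interval is the safer choice when counts are small.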

2. Test the null hypothesis that 40% of this population has aortic stenosis. Write a one-sentence conclusion.

```{r}
binom.test(105, 215, p = 0.4)
```
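
*Because the p-value is below 0.05, we reject the null hypothesis that 40% of this population has aortic stenosis; the sample proportion (105/215, about 49%) suggests the true proportion is higher.*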
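
As a cross-check on the exact test, here is a sketch of the large-sample z-test for the same hypothesis. It is a swapped-in normal approximation, not what the question requires, and it assumes the same counts (105 of 215).

```{r}
# Normal-approximation test of H0: p = 0.4 (assumes 105 cases out of 215)
x <- 105
n <- 215
p0 <- 0.4
p.hat <- x / n
z <- (p.hat - p0) / sqrt(p0 * (1 - p0) / n)  # SE computed under H0
p.value <- 2 * pnorm(-abs(z))                # two-sided p-value
c(z = z, p.value = p.value)

# prop.test() without continuity correction reports X-squared = z^2
prop.test(x, n, p = p0, correct = FALSE)
```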

3. Make a 2x2 table showing the association between gender and aortic stenosis.

```{r}
# 2x2 table of aortic stenosis (rows) by the `sex: male` indicator (columns)
xtabs(~ disease + `sex: male`, data = stenosis)

my.table <- table(stenosis$disease, stenosis$`sex: male`)
my.table
```

a. Conduct a Fisher’s exact test on the 2x2 table to test the null hypothesis that there is no association between gender and aortic stenosis against the alternative hypothesis that there is an association. Write one (full) sentence giving your conclusions.

```{r}
fisher.test(my.table)
```
*The p-value is much less than 0.05, so we reject the null hypothesis that sex and aortic stenosis are independent and conclude that there is an association between biological sex and aortic stenosis.*

b. Conduct a chi-square test on the same table. Do you get the same results as the Fisher’s exact test? Any guess why or why not?

```{r}
chisq.test(my.table)
```
*Yes. Both tests give p-values small enough to reject the null hypothesis, so the probability of having aortic stenosis differs between males and females. The results likely agree because, with 215 people, the expected cell counts are large enough for the chi-square approximation (which `chisq.test()` applies with Yates' continuity correction by default for a 2x2 table) to be close to the exact Fisher test.*


4. Pretend the people with aortic stenosis are a random sample of all people with this disease. Test the hypothesis that among the people with aortic stenosis, the proportion of women is 0.5. Give the estimated proportion and a 95% CI along with the test.

```{r}
# Exact binomial test of H0: proportion of women = 0.5 among people with stenosis
# (62 women out of 105); the output also gives the estimate and an exact 95% CI
binom.test(62, 105, p = 0.5)
```
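
A sketch of the normal-approximation version of this test, assuming the same count of 62 women among 105 people with stenosis. The z-test and Wald interval are approximations to the exact `binom.test()` results, not replacements for them.

```{r}
# Normal-approximation test of H0: p = 0.5 (assumes 62 women out of 105)
x <- 62
n <- 105
p0 <- 0.5
p.hat <- x / n
z <- (p.hat - p0) / sqrt(p0 * (1 - p0) / n)  # SE under H0
p.value <- 2 * pnorm(-abs(z))                # two-sided p-value
ci <- p.hat + c(-1, 1) * 1.96 * sqrt(p.hat * (1 - p.hat) / n)  # Wald 95% CI
list(estimate = p.hat, z = z, p.value = p.value, ci = ci)
```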

Sample code:

```
stenosis <- read.csv("stenosis.txt")
table(stenosis$disease)
binom.test(table(stenosis$disease))
binom.test(table(stenosis$disease),p=.4)
table(stenosis$disease,stenosis$sex..male)
fisher.test(table(stenosis$disease,stenosis$sex..male))
chisq.test(table(stenosis$disease,stenosis$sex..male))
```

7. In a study of maternal smoking and congenital malformations, consider children born with an oral cleft. In a random sample of 27 such infants, 15 have mothers who smoked during pregnancy.

a. What is the point estimate for p? Construct a 95% confidence interval for the population proportion.

*A point estimate for p is $\hat{p} = 15/27 \approx 0.56$.*

The 95% confidence interval is:

$$\hat{p} \pm z_{0.975}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

$$0.56 \pm 1.96\sqrt{\frac{0.56\,(1-0.56)}{27}} = (0.3727618,\ 0.7472382)$$
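
The hand calculation rounds $\hat{p}$ to 0.56 before building the interval. A minimal sketch of the same Wald interval without that rounding:

```{r}
# Wald 95% CI for the oral-cleft proportion, using the unrounded p-hat
x <- 15
n <- 27
p.hat <- x / n                              # 0.5556 rather than 0.56
se <- sqrt(p.hat * (1 - p.hat) / n)
ci <- p.hat + c(-1, 1) * qnorm(0.975) * se  # qnorm(0.975) is about 1.96
ci
```

With the unrounded estimate the interval is roughly (0.368, 0.743), slightly lower than (0.373, 0.747).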

b. You would like to know whether the proportion of mothers who smoked during pregnancy for children with an oral cleft is identical to the proportion of mothers who smoked for children with other types of malformations, which is 32.8%. What is the null hypothesis of the appropriate test?
*The null hypothesis is that the proportion of mothers who smoked during pregnancy among infants with an oral cleft is the same as among infants with other malformations: $H_0: p = 0.328$.*

c. What is the alternative hypothesis?
*The alternative hypothesis is that this proportion differs from 32.8%: $H_a: p \neq 0.328$.*

d. Conduct the test at the 0.01 level of significance.

$H_0: p = 0.328$

$H_a: p \neq 0.328$

The test statistic is

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.56 - 0.328}{\sqrt{0.328\,(1-0.328)/27}} = 2.567724$$

The two-sided p-value is $2P(Z > 2.57) = 0.0102$.
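
e. What do you conclude?
*Since the two-sided p-value (0.0102) is larger than 0.01, we do not reject $H_0$ at the 0.01 level: there is not enough evidence that the proportion of mothers who smoked during pregnancy among infants with an oral cleft differs from 32.8%.*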
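
A sketch repeating the part d test with the unrounded $\hat{p} = 15/27$, plus the exact binomial test as an optional check (a swapped-in method, not part of the original solution):

```{r}
# One-sample z-test of H0: p = 0.328 with the unrounded estimate
x <- 15
n <- 27
p0 <- 0.328
p.hat <- x / n
z <- (p.hat - p0) / sqrt(p0 * (1 - p0) / n)  # SE under H0
p.value <- 2 * pnorm(-abs(z))                # two-sided p-value
c(z = z, p.value = p.value)

binom.test(x, n, p = p0)  # exact binomial test of the same H0
```

The unrounded z is about 2.52 with a two-sided p-value of about 0.012, so the conclusion at the 0.01 level does not change.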

12. Suppose you are interested in investigating the factors that affect the prevalence of tuberculosis among intravenous drug users. In a group of 97 individuals who admit to sharing needles, 24.7% had a positive tuberculin skin test result; among 161 drug users who deny sharing needles, 17.4% had a positive test result.
```{r}
# Counts matching the reported percentages: 24/97 = 24.7%, 28/161 = 17.4%
TB <- matrix(c(24, 73, 28, 133), ncol = 2, byrow = TRUE)
colnames(TB) <- c("TB test+", "TB test-")
rownames(TB) <- c("Sharing Needles", "Not Sharing")
TB <- as.table(TB)
TB
summary(TB)  # includes a chi-square test of independence
```
a. Assuming that the population proportions of positive skin test results are in fact equal, estimate the common value p.

$$\hat{p} = \frac{97(0.247) + 161(0.174)}{97 + 161} = 0.2014457$$

*The estimated common proportion is $\hat{p} \approx 0.2014$ (20.14%).*
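
A sketch of the pooled estimate computed two ways: from the reported percentages, as in the hand calculation, and from the counts in the `TB` table.

```{r}
# Pooled estimate of the common proportion under H0: p1 = p2
n1 <- 97
n2 <- 161
# From the reported (rounded) percentages
p.pool.pct <- (n1 * 0.247 + n2 * 0.174) / (n1 + n2)
# From the counts in the TB table (24 and 28 positive tests)
p.pool.cnt <- (24 + 28) / (n1 + n2)
c(from.percentages = p.pool.pct, from.counts = p.pool.cnt)
```

The two estimates differ slightly (0.2014 vs. 0.2016) only because the percentages are rounded.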

b. Test the null hypothesis that the proportions of intravenous drug users who have a positive tuberculin skin test result are identical for those who share needles and those who do not.

$$z = \frac{0.247 - 0.174}{\sqrt{0.2014\,(0.7986)\left(\frac{1}{97} + \frac{1}{161}\right)}} = 1.416$$

The two-sided p-value is $2P(Z > 1.416) \approx 0.157$.
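
A sketch of the same two-sample z-test using the pooled standard error, with a cross-check against `prop.test()` on the counts. The cross-check is an added comparison, not part of the by-hand solution.

```{r}
# Two-sample z-test for equal proportions with the pooled SE
n1 <- 97
n2 <- 161
p1 <- 0.247
p2 <- 0.174
p.pool <- (n1 * p1 + n2 * p2) / (n1 + n2)
se.pool <- sqrt(p.pool * (1 - p.pool) * (1 / n1 + 1 / n2))
z <- (p1 - p2) / se.pool
p.value <- 2 * pnorm(-abs(z))  # two-sided p-value
c(z = z, p.value = p.value)

# Cross-check on the counts: prop.test() without continuity correction
# reports X-squared, which equals z^2 when z uses the count-based proportions
pt.res <- prop.test(c(24, 28), c(n1, n2), correct = FALSE)
pt.res
```

Because `prop.test()` uses the exact counts rather than the rounded percentages, its statistic corresponds to a z of about 1.43 instead of 1.416.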
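
c. What do you conclude?
*Because the p-value (about 0.16) is larger than 0.05, we do not reject the null hypothesis: there is no significant evidence that the proportion of positive tuberculin skin tests differs between drug users who share needles and those who do not.*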

d. Construct a 95% confidence interval for the true difference in proportions.

$$(0.247 - 0.174) \pm 1.96\sqrt{\frac{0.247\,(0.753)}{97} + \frac{0.174\,(0.826)}{161}} = (-0.0309,\ 0.1769)$$

*The 95% CI for the difference is (-0.0309, 0.1769); because it contains 0, it agrees with the test in part b.*
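
A sketch of the interval in R. Unlike the test, the interval uses the unpooled standard error, since it does not assume the two proportions are equal.

```{r}
# Wald 95% CI for p1 - p2 with the unpooled SE
n1 <- 97
n2 <- 161
p1 <- 0.247
p2 <- 0.174
se.diff <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci <- (p1 - p2) + c(-1, 1) * 1.96 * se.diff
ci
```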

Note/tip: Please try to do this "by hand", i.e., using the computer as a calculator rather than just asking it to do the calculations for you. Something along these lines (but make sure you understand where this comes from and don’t just cut and paste!):

```
E <- c(70*33/440 , 70*110/440, 70*297/440, 
         211*33/440 , ... <<etc... finish this>>

O <- c(6,22,42, <<etc... finish this>>

stat <- sum( (( O-E )^2)/ <<etc... finish this>>

# df = (number of rows - 1) * (number of columns - 1)
pval <- 1-pchisq(stat, df=1)
pval
```
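
The tip's recipe can be completed for a table that is fully given in this document. A sketch applying it to the Question 12 TB counts (not to the tip's own table, whose remaining counts are not shown here), with `outer()` building all expected counts at once:

```{r}
# By-hand Pearson chi-square test on the Question 12 TB counts
O <- matrix(c(24, 73, 28, 133), ncol = 2, byrow = TRUE)  # observed counts
n <- sum(O)
E <- outer(rowSums(O), colSums(O)) / n  # expected = row total * column total / n
stat <- sum((O - E)^2 / E)              # Pearson chi-square statistic
dof <- (nrow(O) - 1) * (ncol(O) - 1)    # 1 for a 2x2 table
pval <- 1 - pchisq(stat, df = dof)
c(stat = stat, dof = dof, pval = pval)

# Should match R's built-in test with the continuity correction turned off
chisq.test(O, correct = FALSE)
```

The statistic comes out at about 2.03, which should also match the chi-square reported by `summary(TB)`.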