---
title: "Homework 2"
output:
  word_document: default
  pdf_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Homework 2

## Computer Assignment for Biostatistics Week 2:

Download and read in the est1.csv data.

This data is a subset of the Coronary Artery Surgery Study (CASS), looking at the use of an Exercise Stress Test (y1; 1=positive) and history of Chest Pain (y2; 1=yes) as predictors of coronary artery disease (d; 1=yes), as diagnosed by the gold standard of arteriography. The men in this sample were all undergoing coronary arteriography for suspected disease. The data is taken from “The Statistical Evaluation of Medical Tests for Classification and Prediction” by Margaret Pepe.

Assume this data is a random sample from the population of men who would undergo arteriography for suspected coronary artery disease.

```{r}
# read in the data here
library(readr)
est1 <- read_csv("~/Papers/Biostatistics JHU 2021/est1.csv")
```

From this data, compute:

a) Proportion of men in this sample who have disease

```{r}
n <- length(est1$d)
table(est1$d)/n
```

*A proportion of 0.698 (69.8%) of the men in this sample have the disease.*

b) Proportion of men in this sample who score “Positive” on the Exercise Stress Test (y1=1)

```{r}
table(est1$y1)/n
```

*A proportion of 0.635 (63.5%) of the men scored positive on the Exercise Stress Test.*

c) Proportion of men in this sample who score “Positive” on the Exercise Stress Test and have disease (y1=1 & d=1)

```{r}
table(est1$d, est1$y1)/n
```

*A proportion of 0.556 (55.6%) of the men scored “Positive” on the Exercise Stress Test and have the disease.*

From a-c, do you think Disease status and result from EST are independent? Why or why not?

*No, they do not appear to be independent. If they were, the proportion of men who are both EST positive and diseased would be close to the product of the two marginal proportions, 0.635 × 0.698 ≈ 0.443. The observed proportion from (c) is 0.556, which is noticeably higher, so a positive EST occurs together with disease more often than independence would predict.*
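
Independence of the two events means P(y1=1 and d=1) = P(y1=1) × P(d=1), so the question can be checked numerically by comparing the observed joint proportion with the product of the marginal proportions. A minimal sketch, assuming both variables are coded 0/1 as described above; the helper name `check_independence` is hypothetical and not part of the assignment.

```{r}
# Hypothetical helper: compare the observed joint proportion of two 0/1
# indicators with the product of their marginal proportions.
# Under independence the two numbers should be (approximately) equal.
check_independence <- function(x, y) {
  p_x <- mean(x == 1)               # marginal proportion, e.g. P(y1 = 1)
  p_y <- mean(y == 1)               # marginal proportion, e.g. P(d = 1)
  p_joint <- mean(x == 1 & y == 1)  # observed P(x = 1 and y = 1)
  c(joint = p_joint, product = p_x * p_y, difference = p_joint - p_x * p_y)
}

# Product of the marginal proportions reported in (a) and (b),
# to be compared with the joint proportion reported in (c)
0.635 * 0.698
```

With the data loaded, `check_independence(est1$y1, est1$d)` makes the same comparison directly from the raw data instead of the rounded proportions.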

Also compute:

d) Sensitivity, Specificity, and Predictive Value Positive of EST (y1)

```{r}
xtabs(~ d + y1, data = est1)
```

*Sensitivity: 0.7967, Specificity: 0.7398, Predictive Value Positive: 0.8763*
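
Sensitivity, specificity, and predictive value positive are each a ratio of cells in the 2×2 table of marker result against disease status. A minimal sketch that computes all three from two 0/1 vectors; the function name `diag_accuracy` is hypothetical, and it assumes the marker and disease indicator are coded 0/1 with no missing values.

```{r}
# Hypothetical helper: diagnostic accuracy of a 0/1 marker against a 0/1
# gold-standard disease indicator.
diag_accuracy <- function(test, disease) {
  tp <- sum(test == 1 & disease == 1)  # true positives
  fn <- sum(test == 0 & disease == 1)  # false negatives (missed cases)
  tn <- sum(test == 0 & disease == 0)  # true negatives
  fp <- sum(test == 1 & disease == 0)  # false positives
  c(sensitivity = tp / (tp + fn),  # P(test = 1 | disease = 1)
    specificity = tn / (tn + fp),  # P(test = 0 | disease = 0)
    ppv         = tp / (tp + fp))  # P(disease = 1 | test = 1)
}

# With est1 loaded (as in the first chunk), for example:
# diag_accuracy(est1$y1, est1$d)
# diag_accuracy(est1$y2, est1$d)
```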

e) Sensitivity, Specificity, and Predictive Value Positive of History of Chest Pain (y2)

```{r}
table(est1$d, est1$y2)
xtabs(~ d + y2, data = est1)
```

*Sensitivity: 0.9472, Specificity: 0.4457, Predictive Value Positive: 0.7982*
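
The same quantities can also be read off a table of conditional proportions: dividing each row of the d-by-marker table by its row total gives P(marker | d), and dividing each column by its column total gives P(d | marker). A sketch using base R's `prop.table()` with its `margin` argument; the wrapper name `conditional_tables` is hypothetical.

```{r}
# Rows = disease status d, columns = marker result.
# margin = 1 divides each row by its row total:    P(marker | d)
# margin = 2 divides each column by its column total: P(d | marker)
conditional_tables <- function(marker, disease) {
  tab <- table(d = disease, marker = marker)
  list(
    # cell ["1", "1"] = sensitivity; cell ["0", "0"] = specificity
    given_disease = prop.table(tab, margin = 1),
    # cell ["1", "1"] = predictive value positive
    given_marker  = prop.table(tab, margin = 2)
  )
}

# With est1 loaded, for example:
# conditional_tables(est1$y2, est1$d)
```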

If you were choosing one of the two as a screening marker, and the most important thing was not to miss cases, which might you prefer?

*Based on (d) and (e), I would prefer history of chest pain (y2). Its sensitivity (0.947) is higher than that of the EST (0.797), so it misses a smaller share of the men who actually have disease.*
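
"Not missing cases" is a statement about false negatives among men who do have disease, so the relevant quantity is 1 - sensitivity. A small sketch using the sensitivities reported in (d) and (e) rather than recomputing them from the data; the vector names are illustrative.

```{r}
# Sensitivities reported in (d) and (e)
sensitivity <- c(EST = 0.7967, chest_pain = 0.9472)
# Share of diseased men each marker would miss (false-negative rate)
miss_rate <- 1 - sensitivity
miss_rate
```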
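
If you were a patient with a positive on one of the two markers, which would give you a better shot of not having disease?

*A positive history of chest pain. Its predictive value positive (0.798) is lower than that of the EST (0.876), so among men who are positive, the chance of not having disease is about 1 - 0.798 ≈ 0.20 for chest pain versus about 1 - 0.876 ≈ 0.12 for the EST.*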
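
For someone who is already positive, the chance of not having disease is P(d=0 | positive) = 1 - PPV. A short sketch using the predictive values positive reported in (d) and (e); the vector names are illustrative.

```{r}
# Predictive values positive reported in (d) and (e)
ppv <- c(EST = 0.8763, chest_pain = 0.7982)
# Probability of NOT having disease given a positive result on each marker
p_no_disease_if_positive <- 1 - ppv
p_no_disease_if_positive
```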

```{r}
### *Below here is R code that might be helpful:*
#cass <- read.csv("Users/MCCStudent/Desktop/cass.txt") #Change the above ^^^ to match your path

#head(cass)
#summary(cass)

#What proportion have disease; SINCE d coded 0/1!!!
#mean(cass$d)

#What proportion have disease AND y1=1
#mean(cass$d & cass$y1)

#What proportion have y1=1 AMONG those with disease
#mean(cass$y1[cass$d==1]) or mean(subset(cass,d==1)$y1)

# xtabs(~d+y1,data=cass)
# xtabs(~d+y2,data=cass)
```

## Other homework: Chapter 6, #8 and #10 from Principles of Biostatistics:

8. For Mexican American infants born in Arizona in 1986 and 1987, the probability that a child’s gestational age is less than 37 weeks is 0.142 and the probability that his or her birth weight is less than 2500 grams is 0.051. Furthermore, the probability that the two events occur simultaneously is 0.031.

a) Let A be the event that an infant’s gestational age is less than 37 weeks, and B be the event that his or her birth weight is less than 2500 grams. Construct a Venn diagram to illustrate the relationship between A and B (this can be sketched by hand or with shapes in Word; it doesn't have to be done in R, though it should be roughly to scale)

b) Are A and B independent?

c) For a randomly selected Mexican American newborn, what is the probability that A or B or both occur?

d) What is the probability that event A occurs if event B occurs?
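
Parts (b) to (d) follow from the multiplication rule for independent events, the addition rule, and the definition of conditional probability. A minimal sketch of that arithmetic in R, using only the probabilities stated in the problem; the variable names are illustrative.

```{r}
# Probabilities given in problem 8
p_A  <- 0.142   # P(A): gestational age < 37 weeks
p_B  <- 0.051   # P(B): birth weight < 2500 grams
p_AB <- 0.031   # P(A and B)

# b) A and B are independent only if P(A and B) = P(A) * P(B);
#    compare this product with p_AB
p_A * p_B

# c) Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_A_or_B <- p_A + p_B - p_AB

# d) Conditional probability: P(A | B) = P(A and B) / P(B)
p_A_given_B <- p_AB / p_B

c(p_A_or_B = p_A_or_B, p_A_given_B = p_A_given_B)
```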

10. The probabilities associated with the expected principal source of payment for hospital discharges in the United States in 1990 are listed below:

Principal Source of Payment | Probability
----- | -------
Private Insurance | 0.387
Medicare | 0.345
Medicaid | 0.116
Other Government Program | 0.033
Self-payment | 0.058
Other/No charge | 0.028
Not stated | 0.033
Total | 1.00

a) What is the probability that the principal source of payment for a given hospital discharge is the patient’s private insurance?

b) What is the probability that the principal source of payment is a government program (including Medicare, Medicaid, and Other)?

c) Given that the primary source of payment is a government program, what is the probability that it is Medicare?
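
Because each discharge has exactly one principal source of payment, the categories in the table are mutually exclusive and their probabilities sum to 1. That lets the government-program probability be found by adding categories, and the conditional probability in (c) by dividing within that group. A sketch of the arithmetic, with illustrative variable names:

```{r}
# Probabilities from the table in problem 10
payment <- c(
  "Private Insurance"        = 0.387,
  "Medicare"                 = 0.345,
  "Medicaid"                 = 0.116,
  "Other Government Program" = 0.033,
  "Self-payment"             = 0.058,
  "Other/No charge"          = 0.028,
  "Not stated"               = 0.033
)
# Sanity check: the categories are exhaustive, so they sum to 1
stopifnot(isTRUE(all.equal(sum(payment), 1)))

# a) A single category: read it straight from the table
p_private <- payment[["Private Insurance"]]

# b) Mutually exclusive categories, so their probabilities add
government <- c("Medicare", "Medicaid", "Other Government Program")
p_government <- sum(payment[government])

# c) P(Medicare | government) = P(Medicare and government) / P(government);
#    Medicare is one of the government programs, so the numerator is P(Medicare)
p_medicare_given_gov <- payment[["Medicare"]] / p_government

c(a = p_private, b = p_government, c = p_medicare_given_gov)
```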

0 commit comments

Comments
 (0)