update

abdullahau · Mar 9, 2025 · e4c213d · e4c213d
1 parent f38575e
commit e4c213d
Show file tree

Hide file tree

Showing 10 changed files with 356 additions and 229 deletions.
diff --git a/01 - The Garden of Forking Data.qmd b/01 - The Garden of Forking Data.qmd
@@ -1597,7 +1597,7 @@ The terms **PMF**, **PDF, CDF**, and **PPF** are fundamental concepts in probabi
 **Key Differences**
 
 | **Term** | **Full Name** | **Applies To** | **Description** |
-|------------------|------------------|------------------|------------------|
+|----|----|----|----|
 | PMF | Probability Mass Function | Discrete random variables | Probability that a discrete random variable takes on a specific value. |
 | CDF | Cumulative Distribution Function | Discrete and continuous random variables | Probability that a random variable is less than or equal to a specific value. |
 | PDF | Probability Density Function | Continuous random variables | Relative likelihood of a continuous random variable taking on a specific value. |

diff --git a/02 - Sampling The Imaginary.qmd b/02 - Sampling The Imaginary.qmd
@@ -1498,7 +1498,7 @@ model = utils.StanQuap('stan_models/m2', m2, data)
 summary = model.precis().round(7)
 summary
 
-samples = model.draws()['p']
+samples = model.extract_samples()['p']
 
 # plot the posterior
 def plot_pos():

diff --git a/03a - Geocentric Models.qmd b/03a - Geocentric Models.qmd
@@ -888,7 +888,7 @@ The two-element vector in the output is the list of variances. If you take the s
 Okay, so how do we get samples from this multi-dimensional posterior? Now instead of sampling single values from a simple Gaussian distribution, we sample vectors of values from a multi-dimensional Gaussian distribution. The `StanQuap` object provides a convenience function to do exactly that:
 
 ```{python}
-post = m4_1_model.draws()
+post = m4_1_model.extract_samples()
 post = pd.DataFrame(post)
 post.head()
 ```
@@ -916,7 +916,7 @@ These samples also preserve the covariance between $µ$ and $σ$. This hardly ma
 
 #### Under the hood with multivariate sampling
 
-The `draw` and `laplace_sample` methods are for convenience. Once called, these Stan methods run simple simulation of the sort we conducted earlier. Here’s a peak at the motor. The work is done by a multi-dimensional version of `stats.norm.rvs`, `stats.multivariate_normal.rvs`. The function `stats.norm.rvs` simulates random Gaussian values, while `stats.multivariate_normal.rvs` simulates random vectors of multivariate Gaussian values. Here’s how to use it to do what `laplace_sample` does:
+The `extract_samples` and `laplace_sample` methods are for convenience. Once called, these Stan methods run simple simulation of the sort we conducted earlier. Here’s a peak at the motor. The work is done by a multi-dimensional version of `stats.norm.rvs`, `stats.multivariate_normal.rvs`. The function `stats.norm.rvs` simulates random Gaussian values, while `stats.multivariate_normal.rvs` simulates random vectors of multivariate Gaussian values. Here’s how to use it to do what `laplace_sample` does:
 
 ```{python}
 stats.multivariate_normal.rvs(

diff --git a/03b - Geocentric Models.qmd b/03b - Geocentric Models.qmd
diff --git a/R_code.qmd b/R_code.qmd
@@ -59,6 +59,20 @@ class BridgeStan(bs.StanModel):
             super().__init__(stan_so, data, make_args=make_args, warn=False)
 
 class StanQuap(object):
+  '''
+  Description:
+  Find mode of posterior distribution for arbitrary fixed effect models and 
+  then produce an approximation of the full posterior using the quadratic    
+  curvature at the mode.
+  
+  This command provides a convenient interface for finding quadratic approximations 
+  of posterior distributions for models defined in Stan. This procedure is equivalent 
+  to penalized maximum likelihood estimation and the use of a Hessian for profiling, 
+  and therefore can be used to define many common regularization procedures. The point 
+  estimates returned correspond to a maximum a posterior, or MAP, estimate. However the 
+  intention is that users will use `draws` and `laplace_sample` and other methods to work 
+  with the full posterior.
+  '''
     def __init__(self,
                  stan_file: str, 
                  stan_code: str, 
@@ -101,8 +115,8 @@ class StanQuap(object):
                                               draws=draws, 
                                               jacobian=self.jacobian)
       
-    def draws(self, draws: int = 100_000, dict_out: bool = True):
-        laplace_obj = self.laplace_sample(draws)
+    def extract_samples(self, n: int = 100_000, dict_out: bool = True):
+        laplace_obj = self.laplace_sample(n)
         if dict_out:
           return laplace_obj.stan_variables()
         return laplace_obj.draws()
@@ -179,8 +193,8 @@ class StanQuap(object):
           'Parameter': list(self.params.keys()),
           'Mean': pos_mu,
           'StDev': pos_sigma,
-          f'{plo:.2%}': lo,
-          f'{phi:.2%}': hi})
+          f'{plo:.1%}': lo,
+          f'{phi:.1%}': hi})
         return res.set_index('Parameter')
 ```
 
@@ -422,32 +436,41 @@ library(rethinking)
 data(Howell1)
 d <- Howell1
 d2 <- d[ d$age >= 18 , ]
+plot( d2$height ~ d2$weight )
+```
+
+
+```{r}
+# load data again, since it's a long way back
+library(rethinking)
+data(Howell1); d <- Howell1; d2 <- d[ d$age >= 18 , ]
+# define the average weight, x-bar
+xbar <- mean(d2$weight)
+# fit model
+
+m4.3 <- quap(
+alist(
+height ~ dnorm( mu , sigma ) ,
+mu <- a + b*( weight - xbar ) ,
+a ~ dnorm( 178 , 20 ) ,
+b ~ dlnorm( 0 , 1 ) ,
+sigma ~ dunif( 0 , 50 )
+) , data=d2 )
 ```
 
 ```{r}
-d3 <- sample( d2$height , size=20 )
-
-mu.list <- seq( from=150, to=170 , length.out=200 )
-sigma.list <- seq( from=4 , to=20 , length.out=200 )
-post2 <- expand.grid( mu=mu.list , sigma=sigma.list )
-post2$LL <- sapply( 1:nrow(post2) , function(i)
-sum( dnorm( d3 , mean=post2$mu[i] , sd=post2$sigma[i] ,
-log=TRUE ) ) )
-post2$prod <- post2$LL + dnorm( post2$mu , 178 , 20 , TRUE ) +
-dunif( post2$sigma , 0 , 50 , TRUE )
-post2$prob <- exp( post2$prod - max(post2$prod) )
-sample2.rows <- sample( 1:nrow(post2) , size=1e4 , replace=TRUE ,
-prob=post2$prob )
-sample2.mu <- post2$mu[ sample2.rows ]
-sample2.sigma <- post2$sigma[ sample2.rows ]
-plot( sample2.mu , sample2.sigma , cex=0.5 ,
-col=col.alpha(rangi2,0.1) ,
-xlab="mu" , ylab="sigma" , pch=16 )
+precis( m4.3 )
+pairs(m4.3)
 ```
 
 ```{r}
-dens( sample2.sigma , norm.comp=TRUE )
+mu <- link( m4.3 )
+?link
 ```
 
 
+```{r}
+m4.3@links
+```
+
 
diff --git a/stan_models/m4_3.stan b/stan_models/m4_3.stan
@@ -12,6 +12,7 @@ parameters {
 }
 model {
     vector[N] mu;
+    // linear model
     mu = a + b * (weight - xbar);
 
     // Likelihood Function

diff --git a/stan_models/m4_3b b/stan_models/m4_3b
diff --git a/stan_models/m4_3b.stan b/stan_models/m4_3b.stan
@@ -0,0 +1,24 @@
+
+data {
+    int<lower=0> N;
+    vector[N] height;
+    vector[N] weight;
+    real xbar;
+}
+parameters {
+    real a;
+    real log_b;
+    real<lower=0, upper=50> sigma;
+}
+model {
+    vector[N] mu;
+    mu = a + exp(log_b) * (weight - xbar);
+
+    // Likelihood Function
+    height ~ normal(mu, sigma);
+
+    // Priors
+    a ~ normal(178, 20);
+    log_b ~ normal(0, 1);
+    sigma ~ uniform(0, 50);
+}
diff --git a/stan_models/m4_3b_model.so b/stan_models/m4_3b_model.so
diff --git a/utils.py b/utils.py
@@ -56,6 +56,20 @@ def __init__(self, stan_file: str, data: dict, force_compile=False):
             super().__init__(stan_so, data, make_args=make_args, warn=False)
 
 class StanQuap(object):
+  '''
+  Description:
+  Find mode of posterior distribution for arbitrary fixed effect models and 
+  then produce an approximation of the full posterior using the quadratic    
+  curvature at the mode.
+  
+  This command provides a convenient interface for finding quadratic approximations 
+  of posterior distributions for models defined in Stan. This procedure is equivalent 
+  to penalized maximum likelihood estimation and the use of a Hessian for profiling, 
+  and therefore can be used to define many common regularization procedures. The point 
+  estimates returned correspond to a maximum a posterior, or MAP, estimate. However the 
+  intention is that users will use `draws` and `laplace_sample` and other methods to work 
+  with the full posterior.
+  '''
     def __init__(self,
                  stan_file: str, 
                  stan_code: str, 
@@ -98,8 +112,8 @@ def laplace_sample(self, draws: int = 100_000):
                                               draws=draws, 
                                               jacobian=self.jacobian)
 
-    def draws(self, draws: int = 100_000, dict_out: bool = True):
-        laplace_obj = self.laplace_sample(draws)
+    def extract_samples(self, n: int = 100_000, dict_out: bool = True):
+        laplace_obj = self.laplace_sample(draws=n)
         if dict_out:
           return laplace_obj.stan_variables()
         return laplace_obj.draws()
@@ -203,16 +217,16 @@ def invlogit(x: float) -> float:
 
 
 def precis(samples, prob=0.89):
-    if isinstance(samples, dict):
+    if isinstance(samples, dict) or isinstance(samples, pd.Series):
         samples = pd.DataFrame(samples)
     plo = (1-prob)/2
     phi = 1 - plo   
     res = pd.DataFrame({
         'Parameter':samples.columns.to_numpy(),
         'Mean': samples.mean().to_numpy(),
-      'StDev': samples.std().to_numpy(),
-      f'{plo:.1%}': samples.quantile(q=plo).to_numpy(),
-      f'{phi:.1%}': samples.quantile(q=phi).to_numpy()
+        'StDev': samples.std().to_numpy(),
+        f'{plo:.1%}': samples.quantile(q=plo).to_numpy(),
+        f'{phi:.1%}': samples.quantile(q=phi).to_numpy()
     })
     return res.set_index('Parameter')