
add sus scrofa #1312 #1672

Open: wants to merge 13 commits into main


AprilYUZhang

No description provided.

@gregorgorjanc
Contributor

I will work with @AprilYUZhang on revising this PR (for issue #1312) and the follow-up QC.

_ZhangEtAl = stdpopsim.Citation(
doi="https://doi.org/10.1016/j.gpb.2022.02.001",
year=2022,
author="Mingpeng Zhang et al.",
Contributor

Change to author="Zhang et al.",

_JohnssonEtAl = stdpopsim.Citation(
doi="https://doi.org/10.1186/s12711-021-00643-0",
year=2021,
author="Martin Johnsson et al.",
Contributor

Change to author="Johnsson et al.",

"16": 1.1407464686635653e-08,
"17": 1.3779949516098884e-08,
"18": 1.3679923888648167e-08,
"X": -1,
Contributor

We should provide a value for the X chromosome too. I appreciate that Johnsson et al. (2021) does not list an X-chromosome recombination rate. Hence, I suggest that you calculate the average recombination rate across the other chromosomes (excluding Y and MT) and use it as the X-chromosome recombination rate. Make this calculation clear in the code here.
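A minimal sketch of the suggested fill-in, assuming the rates live in a plain dict keyed by chromosome name (as in the excerpt above) and that chromosome lengths are available in a second dict. `fill_x_rate` is a hypothetical helper, not part of stdpopsim, and the lengths are made-up placeholders rather than the assembly values. It uses a length-weighted mean; an unweighted mean would be the other option.

```python
# Hypothetical helper (not part of stdpopsim): set the X-chromosome rate to
# the length-weighted mean of the autosomal rates, excluding X, Y and MT.
NON_AUTOSOMES = {"X", "Y", "MT"}


def fill_x_rate(rates, lengths):
    """Return a copy of `rates` with rates["X"] replaced by the autosomal mean.

    rates   -- recombination rate per bp per generation, keyed by chromosome
    lengths -- chromosome length in bp, keyed by chromosome
    """
    autosomes = [c for c in rates if c not in NON_AUTOSOMES]
    total_bp = sum(lengths[c] for c in autosomes)
    # Length weighting gives (expected crossovers on autosomes) / (autosomal bp).
    mean_rate = sum(rates[c] * lengths[c] for c in autosomes) / total_bp
    filled = dict(rates)  # leave the input dict untouched
    filled["X"] = mean_rate
    return filled


# Rates for chromosomes 16-18 are copied from the excerpt above; the
# lengths are made-up placeholders, NOT the assembly's chromosome lengths.
rates = {
    "16": 1.1407464686635653e-08,
    "17": 1.3779949516098884e-08,
    "18": 1.3679923888648167e-08,
    "X": -1,
    "MT": 0,
}
lengths = {"16": 80_000_000, "17": 60_000_000, "18": 55_000_000,
           "X": 125_000_000, "MT": 16_000}

filled = fill_x_rate(rates, lengths)
print(f"X: {filled['X']:.3e}")
```

In the actual catalogue entry the lengths would come from the assembly and every autosome, not just 16 to 18, would enter the mean.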

# citation is optional and can be deleted if not needed.]
citations=[
stdpopsim.Citation(
author="", year=-1, doi="", reasons={stdpopsim.CiteReason.ASSEMBLY}
Contributor

@AprilYUZhang provide a reference for the pig genome assembly that we use here

common_name="Pig",
genome=_genome,
ploidy=2,
# The age at first pregnancy varies in the wild from about 10 to 20 months,
Contributor

@AprilYUZhang cite the paper here in the comments and point to the page/figure/table for their estimates

Author

Sorry, I'm not sure; this sentence comes from the paper's explanation of generation time. Should I write "see _ZhangEtAl for details"?

Contributor

Point to where in the paper this detail is, e.g., the page.

author="Martin Johnsson et al.",
reasons={stdpopsim.CiteReason.REC_RATE},
)
# [The following are notes for implementers and should be deleted
Contributor

@AprilYUZhang cite the paper here in the comments and point to the page/figure/table for their estimates

Contributor

@gregorgorjanc left a comment

@AprilYUZhang I left some comments for you to address. I will then have another look. Thanks for this addition to the stdpopsim catalogue!

"MT": 0,
}

# the de novo mutation rate is inferred by Mingpeng Zhang et al 2022,
Contributor

@AprilYUZhang point to the page/figure/table for their estimates from Zhang et al.


codecov bot commented Feb 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.85%. Comparing base (237d101) to head (ee06b14).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1672   +/-   ##
=======================================
  Coverage   99.85%   99.85%           
=======================================
  Files         136      139    +3     
  Lines        4690     4731   +41     
  Branches      470      470           
=======================================
+ Hits         4683     4724   +41     
  Misses          3        3           
  Partials        4        4           


@gregorgorjanc
Contributor

I advised @AprilYUZhang offline on this implementation, which is now very complete, and I volunteer to QC it.

I have a question, though, about recombination rate averaging. The recombination rate information for this species comes from a study with an extensive pedigree (so we consider it very reliable), and the study reported the recombination rate per Mb for each autosome. To get a value for the X chromosome, @AprilYUZhang has:

  1. summed these per-Mb recombination rates for each chromosome and divided by the chromosome length, to get the recombination rate per base pair for each autosome
  2. calculated a weighted average to get the average recombination rate per base pair across the genome
  3. used the value from 2) for the X chromosome.

I think this makes sense, but I would like to hear what others here have done in such cases. I think a weighted average is the right way to represent the per-base-pair recombination rate for the genome (average of ratios ...). But is this the right value to use for the X chromosome (in the absence of other information), or should one use the simple (unweighted) average? It's a minor detail, but I thought I would ask here.
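The three steps can be sketched as follows, under the assumption (not stated in the thread) that the published values are rates in cM/Mb over consecutive 1 Mb windows; the two chromosomes and their numbers are hypothetical. The sketch also shows when the weighted and unweighted averages diverge: the length-weighted mean equals total map length divided by total physical length, so it reproduces the genome-wide expected number of crossovers, while the unweighted mean gives short chromosomes the same say as long ones.

```python
# Sketch of steps 1-3, ASSUMING each autosome's published values are
# recombination rates in cM/Mb for consecutive 1 Mb windows. The two
# chromosomes below are hypothetical and only illustrate the arithmetic.


def per_bp_rate(window_cm_per_mb, length_bp):
    """Step 1: sum windowed cM/Mb values and divide by chromosome length."""
    total_cm = sum(window_cm_per_mb)  # each 1 Mb window adds (cM/Mb) * 1 Mb
    total_morgans = total_cm / 100.0  # 100 cM = 1 Morgan
    return total_morgans / length_bp  # crossovers per bp per generation


def weighted_mean(rates, lengths):
    """Step 2: length-weighted mean = total Morgans / total bp."""
    return sum(r * n for r, n in zip(rates, lengths)) / sum(lengths)


def unweighted_mean(rates):
    """Alternative raised in the question: every chromosome counts equally."""
    return sum(rates) / len(rates)


# Hypothetical: one long, low-rate chromosome and one short, high-rate one.
lengths = [200_000_000, 50_000_000]
rates = [
    per_bp_rate([0.5] * 200, lengths[0]),  # 100 cM over 200 Mb -> 5e-9 per bp
    per_bp_rate([1.5] * 50, lengths[1]),   # 75 cM over 50 Mb -> 1.5e-8 per bp
]

x_weighted = weighted_mean(rates, lengths)  # step 3 candidate, about 7e-9
x_unweighted = unweighted_mean(rates)       # about 1e-8
print(x_weighted, x_unweighted)
```

If all chromosomes had the same per-bp rate the two averages would coincide; they differ only when the rate varies with chromosome length.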

Otherwise, I think this looks very good @AprilYUZhang and could soon be merged in by the maintainers. You will have to squash the commits before this happens though!

@AprilYUZhang is also looking at adding the recombination map and demographic models later.

@petrelharp
Contributor

Weighted average is consistent with what we do elsewhere - sounds good!

# which is roughly equivalent to the average age of
# the first pregnancy in sows and the beginning of rut in boars
# plus the pregnant gestation period of sows."
generation_time=3,
Contributor

Hm - generation time isn't the average age of first pregnancy of an individual, it's the average age of a parent at time of birth of their child, i.e., the thing you'd get by picking a random individual and asking how old their parent was when they were born. For instance, human generation time is 25-30 years (estimates vary).
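To make the distinction concrete, here is a small sketch with made-up litter records for two sows (all data hypothetical). Averaging the mother's age over every piglet born counts later litters too, so it exceeds the mean age at first litter whenever sows keep breeding after their first litter. A full generation-time estimate would also average in the sires; this sketch covers dams only.

```python
# Illustrative only: generation time as the mean age of parents when their
# offspring are born, averaged over offspring. Hypothetical litter records
# for dams (sows); a full estimate would also average in the sires.
from statistics import mean

# (sow_id, sow_age_in_years_at_farrowing, litter_size) -- made-up data
litters = [
    ("sow_a", 1.5, 6), ("sow_a", 2.5, 8), ("sow_a", 3.5, 7),
    ("sow_b", 1.2, 5), ("sow_b", 2.2, 6),
]


def generation_time(litters):
    """Mean parental age over all offspring (a litter of n counts n times)."""
    return mean(age for _, age, size in litters for _ in range(size))


def mean_age_at_first_litter(litters):
    """Mean over sows of their age at their first litter."""
    first = {}
    for sow, age, _ in litters:
        first[sow] = min(age, first.get(sow, age))
    return mean(first.values())


print(round(generation_time(litters), 2))           # 2.27
print(round(mean_age_at_first_litter(litters), 2))  # 1.35
```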

Contributor

@petrelharp you are totally right, but this is verbatim text from the paper that we are following, so I guess we stick to their 3 years, which is an OK value.

Contributor

Um, I guess? I mean, according to the stated rationale, that value is an underestimate, probably by a good factor, unless most pigs die after one litter.

This paper used 5 years, and the paper you're citing cites several other papers as using that value? Hm, that's "based on a studbook"; I wonder what the method is?

However, googling suggests that the mean lifespan is about 4-8 years, so 3 can't be too far off (maybe just a factor of 2). If you do want to use this value, make sure you document that it is not the actual generation time.

Contributor

@petrelharp you are completely right here! Darn, I should know this (generation interval is a key quantity in the breeder's equation) :( I guess I was too fixated on following the publication. Looking at this now, 3 years is indeed low. @AprilYUZhang, best if you follow https://link.springer.com/article/10.1186/s12711-016-0204-2 for the generation interval statistic. This will be a "fun" discrepancy when dealing with demographic models :(

Contributor

I wonder if part of the discrepancy is that for the breeder's equation the relevant "generation time" is "how fast can we get these pigs to breed" which is more like the minimum value, not the average-in-the-wild value?

Contributor

No, your definition still stands! Generation interval is the average age of parents at the birth of their progeny. Having said this, to increase the response to selection, breeders aim to reduce this average by replacing parents as soon as possible.

@petrelharp
Contributor

Looks good! Just a thing about the generation time.

@AprilYUZhang
Author

Thank you for your suggestions @gregorgorjanc @petrelharp! I will follow these comments. I still have a question: do we need to change other parameters? There are currently two main sets of pig demography parameters. One comes from "Analyses of pig genomes provide insight into porcine demography and evolution", where the generation interval is 5 years and the mutation rate is $2.5 × 10^{-8}$. The other comes from "Revisiting the Evolutionary History of Pigs via De Novo Mutation Rate Estimation in A Three-generation Pedigree", where the generation interval is 3 years and the mutation rate is $3.6 × 10^{-9}$. There is quite a big difference between these two papers.

@petrelharp
Contributor

That is a good question. I think that for the "species" parameters you should choose what you judge to be the "best" values. However, for just this sort of reason, demographic models can specify their own mutation rate (and I think generation time?) that overrides the species parameters, so things are self-consistent. If you run into problems, let me know?
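A back-of-the-envelope sketch of why a model's mutation rate and generation time need to travel together, assuming the model was inferred from sequence diversity under the standard neutral relation θ = 4N_eμ (as in PSMC-style inference). Under that assumption, inferred population sizes and times in generations scale as 1/μ, and times in years additionally scale with the generation time g. `rescale_factors` is a hypothetical helper, not a stdpopsim API.

```python
# Illustrative rescaling, assuming sizes were inferred via theta = 4 * N_e * mu,
# so N_e and times in generations scale as 1/mu, and years = generations * g.
# Hypothetical helper, not part of stdpopsim.


def rescale_factors(mu_old, g_old, mu_new, g_new):
    """Return (size_factor, years_factor) when switching (mu, g) values."""
    size_factor = mu_old / mu_new                       # N_e = theta / (4 * mu)
    years_factor = (mu_old / mu_new) * (g_new / g_old)  # generations * g
    return size_factor, years_factor


# Values discussed in this thread: 2.5e-8 with 5 years vs 3.6e-9 with 3 years.
size_f, years_f = rescale_factors(2.5e-8, 5, 3.6e-9, 3)
print(f"sizes x{size_f:.2f}, times in years x{years_f:.2f}")
# prints: sizes x6.94, times in years x4.17
```

So reusing a model fitted with one (μ, g) pair while simulating with the other would shift its population sizes by roughly 7-fold and its dates by roughly 4-fold.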

@gregorgorjanc
Contributor

gregorgorjanc commented Feb 11, 2025

@AprilYUZhang That 2.5x10^-8 is from human studies (at least that's what https://pmc.ncbi.nlm.nih.gov/articles/PMC3566564/ and https://pmc.ncbi.nlm.nih.gov/articles/PMC4053821/ state, and https://link.springer.com/article/10.1186/s12711-016-0204-2 cites these two papers as a source).

The one you started with (3.6x10^-9) is an independent estimate from de novo mutations in a pedigree, so it should be better, but it's an order of magnitude lower than the human one (which might be OK) and than the recombination rate (which makes me wonder).

Annoyingly, this new demographic model preprint https://www.biorxiv.org/content/10.1101/2025.02.05.636574v1 also used 5 years for generation interval and 2.5x10^-8 mutation rate (citing https://link.springer.com/article/10.1186/s12711-016-0204-2), so these values keep propagating because everyone uses them:(

Chrissy has done work in this space (https://icqg6.org/wp-content/uploads/2020/11/ICQG6_2020_Abstract_Book.pdf#page=85, https://link.springer.com/article/10.1186/s12864-023-09296-3) so talk to her - she will be able to guide you to the best current estimate for mutation rate.

But for the generation interval, I do think that 3 years could indeed be low, particularly for a wild population, which would correspond to most of the period that we would simulate here! I reckon that 5 years is a better estimate; however, I am now starting to wonder whether the value of 5 has propagated through the literature in the same way as the human mutation rate!? Can you please check the average age at first litter for sows and for boars, their average lifespan, and find a reference that looked into this without plucking a value from the air?

@AprilYUZhang
Author

Hi @gregorgorjanc and @petrelharp, I have compiled the literature on generation interval; maybe we can weigh up these studies against each other.

| Study subject | Generation time | Reference | Annotation |
|---|---|---|---|
| Wild boar (for PSMC inference) | 5 years | "Analyses of Pig Genomes Provide Insight into Porcine Demography and Evolution." Nature 491 (7424): 393–98. https://doi.org/10.1038/nature11622 | |
| French and Italian wild boars | 3.6 years | "Influence of Harvesting Pressure on Demographic Tactics: Implication for Wildlife Management." Journal of Applied Ecology 48 (June): 835–43. https://doi.org/10.1111/j.1365-2664.2011.02017.x | Generation times were 3.6 years in the lightly hunted population and only 2.3 years in the heavily hunted population. Generation time was calculated as the inverse of the relative elasticity of the population growth rate to a change in all recruitment parameters. |
| Wild boar and domesticated pigs | 3 years | "Revisiting the Evolutionary History of Pigs via De Novo Mutation Rate Estimation in a Three-Generation Pedigree." Genomics, Proteomics & Bioinformatics 20 (6): 1040–52. https://doi.org/10.1016/j.gpb.2022.02.001 | "The ages of estrus in sows and boars in the wild are different. The age at first pregnancy varies in the wild from about 10 to 20 months [26], while boars begin rut when they are three to five years old. The first rut age of 4–5 years was documented in Russian wild boars [27], and 3–4 years was recorded in Chinese wild boars [28]. Pigs are multiparous animals, and the gestation period lasts about 114–130 days [29]. Comparing to the animals with single birth, such as cattle and yaks, we believe that the generation transmission of pigs is of good continuity. Therefore, we set the generation interval of pigs as 3 years, which is roughly equivalent to the average age of the first pregnancy in sows and the beginning of rut in boars plus the pregnant gestation period of sows." |
| Two closed purebred maternal pig lines | 1.43 / 1.42 years (two lines) | "Changes in Allele Frequencies and Genetic Architecture Due to Selection in Two Pig Populations." Genetics Selection Evolution 56 (1): 76. https://doi.org/10.1186/s12711-024-00941-3 | Calculated from the pedigree and practical data collected from the breeding system. |

@AprilYUZhang
Author

Regarding average age, see the figure below from the second paper's results.
[Figure: age structure of the wild boar populations in the second paper (screenshot)]
Yearlings are 0-1 years old, sub-adults are 1-2 years old, and adults are 3 years or older.
From this age structure, 3.6 years looks plausible. I think the true value is more likely to be below 5.

@AprilYUZhang
Author

This is the literature review of mutation rate. Interestingly, these results show a big difference.

| Paper | Type | Mutation rate (per site per generation) | Trios |
|---|---|---|---|
| Analyses of pig genomes provide insight into porcine demography and evolution [@groenenAnalysesPigGenomes2012] | - | $2.5 \times 10^{-8}$ | - |
| Revisiting the Evolutionary History of Pigs via De Novo Mutation Rate Estimation in a Three-Generation Pedigree [@zhangRevisitingEvolutionaryHistory2022] | Germline de novo mutation | $3.6 \times 10^{-9}$ | 6 |
| Evolution of the germline mutation rate across vertebrates [@bergeronEvolutionGermlineMutation2023] | Germline de novo mutation | $4.32 \times 10^{-9}$ | ~2 |
| Estimating mutation rate and characterising single nucleotide de novo mutations in pigs, C.M. Rochus (in review) | Germline de novo mutation | $8.2 \times 10^{-9}$ per gamete | 46 |

@gregorgorjanc
Contributor

@AprilYUZhang for the mutation rate, I suggest you implement the value from zhangRevisitingEvolutionaryHistory2022, since they have more trios than bergeronEvolutionGermlineMutation2023 (and the value is very close). However, once the work from Chrissy Rochus is published, I advise we switch to her value: she has a much larger number of trios, and from talking to her, she literally looked at every single de novo mutation in her dataset and found that it's very hard to define general criteria for discarding or keeping de novo mutations in order to filter out data errors. I guess the latter is one important source of differences in mutation rates between studies.

@gregorgorjanc
Contributor

@AprilYUZhang generation interval is tricky since estimates are indeed very different.

The 5 years used in several papers comes from one paper we mentioned above that made an assumption (I guess because no good published data or estimates were available!?).

As discussed in this thread, the 3-year assumption used in one of the papers is likely an underestimate: it was the average age of parents at first litter, but there are subsequent litters.

The estimate of 3.6 years from the lightly hunted wild pig population in Italy is interesting because it's some actual concrete data about the wild pigs, but it's tricky to know how much hunting there was. I guess there is selection/predation in nature too, so maybe this is as good as we will get!?

The low values of ~1.4 years come from an intensive selective breeding programme, which is not a good reflection of the generation interval across the long period that we aim to simulate here.

Taken together, I vote for 3.6 years, since that study is the only one with some concrete data. I just spent a bit of time searching and found it very hard to find a study that does not either report on selectively bred populations (with values of 2 years or so), simply assume 5 or 3 years, or cite a paper that assumed 5 or 3 years.

@@ -15,6 +15,16 @@
},
)

# Generation time in wild boar
_ZhangEtAl = stdpopsim.Citation(
Contributor

@AprilYUZhang change this to _ServantyEtAl and use this citation

@petrelharp
Contributor

The estimate of 3.6 years from the lightly hunted wild pig population in Italy is interesting because it's some actual concrete data about the wild pigs

This is my vote, too! Recall that ancestral pig populations were also hunted (by humans and other now-scarce megafauna) - but also, this is our best available estimate, seems like.
