TR improvement #203

Open
1 of 2 tasks
maxaalexeeva opened this issue Mar 26, 2021 · 11 comments
@maxaalexeeva
Contributor

maxaalexeeva commented Mar 26, 2021

  • handle parameter settings expressed in natural language, e.g., where b is a positive number
  • handle compound vars with Greek characters, e.g., _αs and αc are soil evaporation coefficient and crop transpiration coefficient, respectively._ Here, alpha c does not get reduced to one variable even though keepLongestVariable should take care of that.
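A minimal sketch of the "keep longest" idea for the compound Greek variable case, written in Python rather than the project's Scala. The `Mention` type, the `keep_longest` function, and the token indices are all hypothetical; this is not the project's keepLongestVariable action, only an illustration of the behavior one would expect from it.

```python
from typing import List, NamedTuple


class Mention(NamedTuple):
    text: str
    start: int  # inclusive token index
    end: int    # exclusive token index


def keep_longest(mentions: List[Mention]) -> List[Mention]:
    """Drop any mention whose token span lies inside a longer kept mention."""
    kept: List[Mention] = []
    # Visit longest spans first so nested shorter spans are compared against them.
    for m in sorted(mentions, key=lambda m: (m.start - m.end, m.start)):
        contained = any(k.start <= m.start and m.end <= k.end for k in kept)
        if not contained:
            kept.append(m)
    return sorted(kept, key=lambda m: m.start)


# Assumed tokenization: ["α", "s", "and", "α", "c", "are", ...]
mentions = [Mention("α", 3, 4), Mention("c", 4, 5), Mention("α c", 3, 5)]
result = keep_longest(mentions)
```

If the compound span is never produced in the first place (for example, because the compound var rule does not match Greek characters), a filter like this has nothing longer to keep, which may be what is happening with alpha c.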
@maxaalexeeva
Contributor Author

@alicekwak @paul, if you notice any issues with TR extractions, or any cases you think we should handle for the currently extracted mentions (variables, definitions, parameter settings and intervals, or units), feel free to add them to this thread, especially if it is something you would like me to have a look at.

@maxaalexeeva maxaalexeeva self-assigned this Mar 26, 2021
@alicekwak
Contributor

Issues for definition extraction

  • handle cases where multiple definitions are given for a variable (e.g., t2b - kl is the water extraction rate, an empiric soil–root factor for the fraction of available water that can be supplied to the plant from each rooted soil layer; t4h - the limiting value R∞, called the total size of the epidemic which is the total number of people having the disease at the end of the epidemic).
  • handle cases where expansion on acl:relcl causes issues (e.g., t2c - "distance from the axial center (m), S is a sink or source term" is wrongly extracted as the definition of d21)
  • handle cases where a variable's definition is separated by parentheses (e.g., t2l - standardized reference crop evapotranspiration for short (ETos) or tall (ETrs) surfaces)

@alicekwak
Contributor

alicekwak commented Apr 7, 2021

Issues for variable extraction

  • handle variables with special characters (e.g., R∞ in t4h).
  • handle compound variables with complex structure (e.g., e°(Tmax) in t4l). We need to add cases like this to the compound var rule and the looksLikeAVar action.
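One possible shape for a surface check that accepts the two examples above, sketched in Python with a hypothetical `looks_like_a_var` function and pattern. It is not the project's looksLikeAVar action, and the length cap and the set of allowed symbols are assumptions chosen only to cover R∞ and e°(Tmax).

```python
import re

# Hypothetical pattern: a short letter-led base (Latin or Greek), an optional
# symbol suffix such as ∞ or °, and an optional parenthesized argument.
VAR_PATTERN = re.compile(
    r"^[A-Za-zα-ωΑ-Ω][A-Za-z0-9α-ωΑ-Ω]{0,5}[∞°'′]*(\([A-Za-z0-9]+\))?$"
)


def looks_like_a_var(text: str) -> bool:
    """Return True when the string has the surface shape of a variable."""
    return bool(VAR_PATTERN.match(text))
```

A purely surface-based check like this also accepts short ordinary words, so it would still need whatever stop-word or context filtering the real action applies.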

@alicekwak
Contributor

alicekwak commented Apr 26, 2021

Issues for function extraction

  • handle mentions with overlapping inputs (e.g., t6f: one variable input (LAI) and one concept input (respective LAI) overlap with each other but do not have the exact same tokenInterval)
  • need a separate expansionHandler for function rules (e.g., t3a: function extraction should not expand on conj_and, but definition extraction should)
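For the overlapping-inputs case, an equality check on token intervals misses pairs like LAI and respective LAI. A rough Python sketch of an overlap-based filter follows; `Interval`, `overlaps`, and `merge_overlapping_inputs` are hypothetical names, the token indices are made up, and preferring the wider span is a policy assumption rather than the project's behavior.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    start: int  # inclusive token index
    end: int    # exclusive token index


def overlaps(a: Interval, b: Interval) -> bool:
    """True when the two half-open token intervals share at least one token."""
    return a.start < b.end and b.start < a.end


def merge_overlapping_inputs(inputs: List[Interval]) -> List[Interval]:
    """Keep one input per overlapping group, preferring the wider span."""
    kept: List[Interval] = []
    for cand in sorted(inputs, key=lambda i: (i.start - i.end, i.start)):
        if not any(overlaps(cand, k) for k in kept):
            kept.append(cand)
    return sorted(kept)


# Assumed positions: "respective LAI" at tokens 4-5, "LAI" at token 5.
inputs = [Interval(5, 6), Interval(4, 6), Interval(9, 10)]
merged = merge_overlapping_inputs(inputs)
```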

@alicekwak
Contributor

alicekwak commented Jun 23, 2021

Issue for parameter setting

  • fractions with variables as denominators interfere with parameter setting
    ex1) If we take a = 0.47 and b = 0.103 , then 1 / a + 1 / b = 11.83. (italicized part captured as parameter setting)
    ex2) so the length of the latent period is distributed exponentially with mean equals to e-bs ds = 1 / b (italicized part captured as parameter setting)
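A small Python sketch of one way to keep a denominator from being read as the variable of a parameter setting. The `SETTING` pattern and the `param_settings` function are hypothetical and regex-based, unlike the actual dependency-based rules; the only idea shown is rejecting a match whose variable directly follows a division sign.

```python
import re
from typing import List, Tuple

# Hypothetical surface pattern for "<identifier> = <number>".
SETTING = re.compile(r"(?P<var>\b[A-Za-z]\w*)\s*=\s*(?P<value>-?\d+(?:\.\d+)?)")


def param_settings(sentence: str) -> List[Tuple[str, str]]:
    """Collect (variable, value) pairs, skipping variables used as denominators."""
    found = []
    for m in SETTING.finditer(sentence):
        before = sentence[: m.start()].rstrip()
        # In "1 / b = 11.83" the "b" is a denominator, not a parameter being set.
        if before.endswith("/"):
            continue
        found.append((m.group("var"), m.group("value")))
    return found


ex1 = "If we take a = 0.47 and b = 0.103 , then 1 / a + 1 / b = 11.83."
settings = param_settings(ex1)
```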

@alicekwak
Contributor

alicekwak commented Jul 21, 2021

Issues for parameter setting

  • parameter setting for phrases is not extracted.
    (See below for the example sentences; examples are either from the 2003-double-epidemic paper or from "A Modeling Approach to Determining the Relationship Between Erosion and Soil Productivity")
      • Note that the SIR model is the limiting case of the SEIR model when the time interval from the infection to onset is zero;
      • most of the daily new number of confirmed cases from the community stayed around 15 to 25
      • The limiting value of the curve R is 1011.
      • The observed mean of the time from onset to admission is about 3.75.
      • If the temperature in the second soil layer is less than 0 °C, the curve number is assigned a value of 98 regardless of the soil water content.
      • If a snow cover exists with 5 mm or greater water content, the value of albedo is set to 0.8.
      • If irrigation water is applied to land without furrows, the peak runoff rate is assumed to be 0.00189 m3/s per m of field width.
      • The lower limit of enrichment ratio is 1.0
      • LFI is the labile P immobilization factor allowing the P:C ratio of soil microorganisms to range from 0.01 to 0.02 as a function of labile P concentration. The labile P immobilization factor varies linearly from 0.01 for cLP = 0 to 0.02 for cLP = 10 and remains constant at 0.02 for cLP > 10.
      • At planting time, the model takes a soil sample and applies up to 15 kg/ha of N fertilizer if needed.

  • "the value of identifier" or "the value of number" patterns are not properly addressed by the current parameter setting rules
      • If the parameters above are used to simulate the future spread of epidemic we obtain the value of R∞ to be 350.
      • The value of EA ranges from 0 to 1.0 according to the equation (Williams et al., 1983)
      • Plant evaporation is computed as a linear function of LAI and E0 up to a LAI value of 3.
      • The value of kd used in EPIC is 175.
      • The value of BFT ranges from 0 to 1.0.

  • Other patterns which current parameter setting rules fail to account for
      • The time t0 is the critical point of the function I at which dl/dt = 0 (note: should this be addressed as a function?)
      • where CE, the crop management factor, has a constant value of 0.5.
      • DN is the denitrification rate in layer j in kg/ha per d, CDN is the denitrification constant (~-0.035),
      • DCR' is the decay rate constant adjusted for limiting N or P, and 0.16 is the result of assuming that c = 0.4 FR and 0.4 of the c in FR is assimilated by soil microorganisms;
      • EFL is the Kamprath (1970) efficiency factor (~1.5)
      • where BD2 is the bulk density that gives SS ~ 0.2 for a particular percent sand, SAN.

  • Million only (not 0.5 million or 13.82 million, even though 0.5 million and 13.82 million are captured as values) is linked to IP(0) and S(0) in the examples below (probably due to keepLongest?)
      • Since the population of Hong Kong is about 6.8 million, we therefore assume that S(0) = 6.8 – 0.5 = 6.3 million. (note: allowed “a - b” to be a value, would it be okay to keep it this way??)
      • First, we assume that IP(0) = 0.5 million, that is 0.5 million Hong Kong people are infected by disease B.
      • The initial population S(0) was set to be 13.82 millions;

  • Cases where values are captured as parameter settings where they shouldn't be. (Probably because values are phrases themselves.)
      • Therefore, the observed sum 10.12 = 6.37+3.75 which is close to the estimation 11.83.
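To make the "the value of identifier" patterns concrete, here is a hedged Python sketch that covers two of the listed sentences. The patterns and the `extract_value_of` function are hypothetical and purely surface-level; the real parameter setting rules work over parses, so this only shows which spans one would expect to be linked.

```python
import re
from typing import Dict, Optional, Union

NUM = r"-?\d+(?:\.\d+)?"
# "The value of X ranges from A to B"
VALUE_OF_RANGE = re.compile(
    r"[Tt]he value of (?P<var>\S+) ranges from (?P<low>" + NUM + r") to (?P<high>" + NUM + r")"
)
# "The value of X [up to three extra words] is N"
VALUE_OF_IS = re.compile(
    r"[Tt]he value of (?P<var>\S+)(?:\s+\S+){0,3}? is (?P<value>" + NUM + r")"
)


def extract_value_of(sentence: str) -> Optional[Dict[str, Union[str, float]]]:
    """Return the variable and its value or interval, or None if nothing matches."""
    m = VALUE_OF_RANGE.search(sentence)
    if m:
        return {"variable": m["var"], "min": float(m["low"]), "max": float(m["high"])}
    m = VALUE_OF_IS.search(sentence)
    if m:
        return {"variable": m["var"], "value": float(m["value"])}
    return None
```

For example, `extract_value_of("The value of kd used in EPIC is 175.")` links kd to 175, while the "we obtain the value of R∞ to be 350" sentence is not covered by either pattern and would need its own variant.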

@maxaalexeeva
Contributor Author

maxaalexeeva commented Jul 21, 2021

Issues for parameter setting

  • parameter setting for phrases is not extracted.
    (See below for the example sentences; examples are either from the 2003-double-epidemic paper or from "A Modeling Approach to Determining the Relationship Between Erosion and Soil Productivity")
      • Note that the SIR model is the limiting case of the SEIR model when the time interval from the infection to onset is zero;
      • most of the daily new number of confirmed cases from the community stayed around 15 to 25
        added rule for stay; might overgenerate; need to expand concepts in param settings.
      • The limiting value of the curve R is 1011.
        1011 is labeled DATE by the parser, and we are avoiding dates. Have to make peace with dates extracted as values then...
      • The observed mean of the time from onset to admission is about 3.75.
        extracts but needs expansion
      • If the temperature in the second soil layer is less than 0 °C, the curve number is assigned a value of 98 regardless of the soil water content.
        added a less-than int param setting rule and "assigned" param setting rule
      • If a snow cover exists with 5 mm or greater water content, the value of albedo is set to 0.8.
      • If irrigation water is applied to land without furrows, the peak runoff rate is assumed to be 0.00189 m3/s per m of field width.
        "assumed" added to the "assigned" rule
      • The lower limit of enrichment ratio is 1.0
        added a lower limit rule
      • LFI is the labile P immobilization factor allowing the P:C ratio of soil microorganisms to range from 0.01 to 0.02 as a function of labile P concentration. The labile P immobilization factor varies linearly from 0.01 for cLP = 0 to 0.02 for cLP = 10 and remains constant at 0.02 for cLP > 10.
        adjusted ranges and vary rules; added a constant rule
      • At planting time, the model takes a soil sample and applies up to 15 kg/ha of N fertilizer if needed.
        a very flaky rule added

  • "the value of identifier" or "the value of number" patterns are not properly addressed by the current parameter setting rules
    Made small adjustments to existing rules and added a value_of rule, so this section is mainly fine except for one comment below
      • If the parameters above are used to simulate the future spread of epidemic we obtain the value of R∞ to be 350.
      • The value of EA ranges from 0 to 1.0 according to the equation (Williams et al., 1983)
      • Plant evaporation is computed as a linear function of LAI and E0 up to a LAI value of 3.
      • The value of kd used in EPIC is 175.
        keep longest action interferes
      • The value of BFT ranges from 0 to 1.0.

  • Other patterns which current parameter setting rules fail to account for
      • The time t0 is the critical point of the function I at which dl/dt = 0 (note: should this be addressed as a function?)
      • where CE, the crop management factor, has a constant value of 0.5.
      • DN is the denitrification rate in layer j in kg/ha per d, CDN is the denitrification constant (~-0.035),
      • DCR' is the decay rate constant adjusted for limiting N or P, and 0.16 is the result of assuming that c = 0.4 FR and 0.4 of the c in FR is assimilated by soil microorganisms;
        not sure what to include here.
      • EFL is the Kamprath (1970) efficiency factor (~1.5)
        added a tilde rule
      • where BD2 is the bulk density that gives SS ~ 0.2 for a particular percent sand, SAN.

  • Million only (not 0.5 million or 13.82 million, even though 0.5 million and 13.82 million are captured as values) is linked to IP(0) and S(0) in the examples below (probably due to keepLongest?)
    mainly fixed the million issue
      • Since the population of Hong Kong is about 6.8 million, we therefore assume that S(0) = 6.8 – 0.5 = 6.3 million. (note: allowed “a - b” to be a value, would it be okay to keep it this way??)
      • First, we assume that IP(0) = 0.5 million, that is 0.5 million Hong Kong people are infected by disease B.
      • The initial population S(0) was set to be 13.82 millions;

  • Cases where values are captured as parameter settings where they shouldn't be. (Probably because values are phrases themselves.)
      • Therefore, the observed sum 10.12 = 6.37+3.75 which is close to the estimation 11.83.
        adjusted a rule
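On the million issue, the intended outcome is that the whole phrase (number plus magnitude word) is the value, and a bare "million" is not. A minimal Python sketch of that normalization, with a hypothetical `parse_value` function and multiplier table that are not part of the project:

```python
import re
from typing import Optional

# Hypothetical table of magnitude words and their scale factors.
MULTIPLIERS = {"thousand": 1e3, "million": 1e6, "billion": 1e9}
VALUE = re.compile(
    r"(?P<num>\d+(?:\.\d+)?)(?:\s+(?P<mult>thousand|million|billion)s?)?",
    re.IGNORECASE,
)


def parse_value(text: str) -> Optional[float]:
    """Convert '0.5 million' style phrases to a number; reject bare magnitude words."""
    m = VALUE.fullmatch(text.strip())
    if m is None:
        return None
    mult = m.group("mult")
    scale = MULTIPLIERS[mult.lower()] if mult else 1.0
    return float(m.group("num")) * scale
```

Under this sketch, "0.5 million" and "13.82 millions" both parse to a scaled number, while "million" on its own returns `None` and so could not be linked to IP(0) or S(0).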

@maxaalexeeva
Contributor Author

  • The time t0 is the critical point of the function I at which dl/dt = 0

@maxaalexeeva
Contributor Author

With the commented-out part included, I lose discont char attachment in this sentence: The model consists of individuals who are either Susceptible (S), Infected (I), or Recovered (R).

if (newDescriptions.nonEmpty) {// && newDescriptions.length == variables.length) {

Without it, though, there's an error when there are too few new descriptions (it shows up when running the cosmos doc corpus, possibly in this doc: APSIM7.10_CropModule_WheatDocumentation.json).
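One possible middle ground, sketched in Python under the assumption that the error comes from indexing past the end of the shorter list (the thread does not say what the error is). The `attach_descriptions` function is hypothetical; it pairs what is available instead of requiring equal lengths.

```python
from typing import List, Tuple


def attach_descriptions(
    variables: List[str], new_descriptions: List[str]
) -> List[Tuple[str, str]]:
    """Pair variables with descriptions without indexing past the shorter list."""
    if not new_descriptions:
        return []
    # zip stops at the shorter input, so too few descriptions cannot raise an
    # index error; the unmatched variables are simply left without a description.
    return list(zip(variables, new_descriptions))


full = attach_descriptions(["S", "I", "R"], ["Susceptible", "Infected", "Recovered"])
partial = attach_descriptions(["S", "I", "R"], ["Susceptible"])
```

Positional pairing can attach the wrong description when the counts differ, so this only avoids the crash; deciding which variable each description belongs to would still need the token positions.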

@maxaalexeeva
Contributor Author

  • in this sentence, expanding the function variable also expands the variable in the param setting (in the param setting branch):

_The barley module allows a total retranslocation of no more than 20 % of stem biomass present at the start of grainfilling Grain yield on a commercial moisture basis is calculated using the parameter grn_water_cont = 0.125._

possible solutions:

  • only expand vars in functions if the var is a concept (not identifier) - this is probably the way to go
  • separating func and param setting expansion
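The first option could look roughly like the Python sketch below. The `Arg` class, the label strings, and the example arguments are hypothetical stand-ins for the project's mention types, and which spans are actually the function arguments in the barley sentence is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Arg:
    text: str
    label: str  # hypothetical labels: "Concept" or "Identifier"


def args_to_expand(function_args: List[Arg]) -> List[Arg]:
    """Select only concept arguments for expansion; identifiers stay as extracted."""
    return [a for a in function_args if a.label == "Concept"]


# Illustrative arguments only; not taken from actual extractor output.
args = [Arg("Grain yield", "Concept"), Arg("grn_water_cont", "Identifier")]
expandable = args_to_expand(args)
```

With this gate, an identifier such as grn_water_cont is never expanded from the function branch, so the shared mention in the param setting branch stays untouched.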

@maxaalexeeva
Contributor Author

maxaalexeeva commented Aug 11, 2021

  • The current model specifies sla_max as varying from 27 000 to 22000 mm 2 g -1

min and max are in the wrong order
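A hedged Python sketch of normalizing such an interval so that the smaller number always ends up as the minimum, whatever order the text gives them in. The `RANGE` pattern and the `parse_interval` function are hypothetical; the pattern also assumes that a space followed by exactly three digits is a thousands separator, as in "27 000".

```python
import re
from typing import Optional, Tuple

# Digits, optionally followed by space-separated groups of three digits.
NUM = r"\d+(?: \d{3})*(?:\.\d+)?"
RANGE = re.compile(r"from\s+(?P<a>" + NUM + r")\s+to\s+(?P<b>" + NUM + r")")


def parse_interval(sentence: str) -> Optional[Tuple[float, float]]:
    """Return (min, max) for a 'from A to B' phrase, ordered numerically."""
    m = RANGE.search(sentence)
    if m is None:
        return None
    a, b = (float(m.group(g).replace(" ", "")) for g in ("a", "b"))
    return (min(a, b), max(a, b))


sentence = "The current model specifies sla_max as varying from 27 000 to 22000 mm 2 g -1"
interval = parse_interval(sentence)
```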

2 participants