Summary
The proposed change is to the fitting interface, which should allow arbitrary models to be applied to a property for fitting. Currently, only linear models with a potential centering parameter are supported. The change should support applying exponential, Gaussian, and other models for a particular parameter. Generalizing the current interface will also make it easy for users to extend the set of models or develop their own.
Goals
Provide a utility for fitting idpflex data structures using arbitrary sets of lmfit models. The solution should be simpler than what a user can achieve by interacting directly with lmfit. This can be achieved since idpflex can leverage details of data structures which may be unfamiliar to users (tree, PropertyDict, Parameters, etc.). Example use cases include fitting a tree of properties or fitting to an arbitrary set of property groups.
Non Goals
The solution should not implement a new parameter, model, or fitting interface and/or structure. The solution should not hide the structure or process internally and should make the effort to expose the model, parameters, and fitting to the user. The solution for multiple structures will be a linear combination of the structures. The models applied to properties will expect independence from the other properties. Multiple properties will be concatenated to create a feature vector.
Proposed Design
The design will be layered. First, a generic solution for creating a model of (potentially) multiple properties will be developed. Second, a multi-structure model can be made by linearly combining the multi-property models. Finally, a function for modelling and fitting a tree will be described.
Model Requirements
For consistency, all models applied to the properties should take a keyword argument prop during initialization and should have a single independent variable named x. This somewhat conflicts with the Non Goal of not changing the model interface, since it places restrictions on models. An alternative approach would be nice. An example model is below.
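A minimal sketch of the two conventions, using a plain Python function as a stand-in rather than an lmfit model. The names linear and meets_requirements are hypothetical, and the assumption that prop carries the simulated values to be scaled follows the equations used elsewhere in this proposal.

```python
import inspect


def linear(x, slope=1.0, intercept=0.0, prop=None):
    """Scale and shift the values carried by `prop`.

    `x` is the single independent variable; `prop` is assumed to be a
    sequence of simulated values (a stand-in for an idpflex property).
    """
    return [slope * value + intercept for value in prop]


def meets_requirements(func):
    """Return True if `func` has `x` as first argument and a `prop` keyword."""
    params = inspect.signature(func).parameters
    names = list(params)
    has_x_first = bool(names) and names[0] == "x"
    has_prop_keyword = (
        "prop" in params
        and params["prop"].default is not inspect.Parameter.empty
    )
    return has_x_first and has_prop_keyword


print(meets_requirements(linear))
print(linear([0, 1], slope=2.0, intercept=1.0, prop=[1.0, 2.0]))
```

A check like this could run when a model is handed to the fitting utilities, so that a function with the wrong signature fails early instead of during the fit.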
Multiple Property Fitting
The first aspect of the proposed design is a function that accepts a PropertyDict and a dictionary or list of lmfit models. This will provide a simple interface for fitting multiple properties of a single structure with arbitrary models. The container of models should have a model for each property. If a single model is given instead of a container of models, the same model will be applied to each property. The function will apply the appropriate model to each property and create a composite model that concatenates these models. The composite model will have a complete set of parameters for all of the sub-models, each prefixed by the property name.
An example which minimizes the following equations using optional weights:
sans_ws*((sans_slope*sansProp + sans_intercept) - exp_sansProp)
saxs_ws*(saxs_c - exp_saxsProp)
properties = PropertyDict([sansProp, saxsProp])
exp_properties = PropertyDict([exp_sansProp, exp_saxsProp])
models = [LinearModel, ConstantModel]
multiproperty_model = create_model_from_property_group(properties, models, ws=None)
# Yielding
# multiproperty_model.make_params() == Parameters([Parameter('sans_slope', ...),
#                                                  Parameter('sans_intercept', ...),
#                                                  Parameter('saxs_c', ...)])
# The parameter constraints, values, etc. could then be changed.
params = multiproperty_model.make_params()
# Iterate over the Parameter objects; iterating over params directly yields only names.
for param in params.values():
    if 'intercept' in param.name:
        pass
# Which can be fit using
multiproperty_fit = multiproperty_model.fit(exp_properties.feature_vector,
                                            x=exp_properties.feature_domain,
                                            weights=None,  # could be changed
                                            method='leastsq',  # could be changed
                                            params=params)
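The concatenation and parameter prefixing described above can be sketched without lmfit. This is only an illustration under stated assumptions: compose, linear, and constant are hypothetical names, properties are plain lists of values keyed by name, and the real function would return an lmfit composite model instead of a closure.

```python
import inspect


def linear(x, slope=1.0, intercept=0.0, prop=None):
    """Linear sub-model acting on the values carried by `prop`."""
    return [slope * value + intercept for value in prop]


def constant(x, c=0.0, prop=None):
    """Constant sub-model with one value per entry of `prop`."""
    return [c for _ in prop]


def compose(properties, funcs):
    """Pair each property with a sub-model and concatenate the outputs.

    `properties` maps a property name to its values; `funcs` holds one
    function per property. Parameter names are prefixed by property name.
    """
    pieces = []
    for (name, values), func in zip(properties.items(), funcs):
        args = [a for a in inspect.signature(func).parameters
                if a not in ("x", "prop")]
        pieces.append((name, values, func, args))
    param_names = [f"{name}_{arg}" for name, _, _, args in pieces for arg in args]

    def evaluate(x, **params):
        output = []
        for name, values, func, args in pieces:
            kwargs = {arg: params[f"{name}_{arg}"] for arg in args}
            output.extend(func(x, prop=values, **kwargs))
        return output

    return param_names, evaluate


names, evaluate = compose({"sans": [1.0, 2.0], "saxs": [5.0, 6.0, 7.0]},
                          [linear, constant])
print(names)
print(evaluate(None, sans_slope=2.0, sans_intercept=1.0, saxs_c=3.0))
```

The output of evaluate plays the role of the feature vector: the sub-model results are laid end to end in the same order as the properties.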
Multiple Structure Fitting
The above will be used as an internal building block for multi-structure fitting. The goal will be to provide a function that will take a list of property structures and a container of models. The container of models will be directly passed to the function described above. The result will be a single model that is composed of the output of the above. The parameters that are in common across structures will be linked to have the same shared value. Additionally, the structures will be linearly combined using probabilities that sum to one.
Note: lmfit requires each sub-model to have unique parameter names, which is what creates the large number of redundant parameters in the example.
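An example which minimizes the following equations using optional weights:
sans_ws*((sans_slope*(struct0_prob*sansProp0+struct1_prob*sansProp1) + sans_intercept) - exp_sansProp)
saxs_ws*(saxs_c - exp_saxsProp)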
properties = [PropertyDict([sansProp0, saxsProp0]), PropertyDict([sansProp1, saxsProp1])]
exp_properties = PropertyDict([exp_sansProp, exp_saxsProp])
models = [LinearModel, ConstantModel]
multistructure_model = create_model_from_property_groups(properties, models)
# Result
# multistructure_model.params == Parameters([
#     Parameter('struct0_sans_slope', expr='sans_slope', ...),
#     Parameter('struct1_sans_slope', expr='sans_slope', ...),
#     Parameter('struct0_sans_intercept', expr='sans_intercept', ...),
#     Parameter('struct1_sans_intercept', expr='sans_intercept', ...),
#     Parameter('struct0_saxs_c', expr='saxs_c', ...),
#     Parameter('struct1_saxs_c', expr='saxs_c', ...),
#     Parameter('sans_slope', ...),
#     Parameter('sans_intercept', ...),
#     Parameter('saxs_c', ...),
#     Parameter('struct0_prob', min=0, max=1, ...),
#     Parameter('struct1_prob', expr='1 - (struct0_prob)', min=0, max=1, ...),
# ])
# The parameter constraints, initial values, etc. could then be changed.
params = multistructure_model.make_params()
# Iterate over the Parameter objects; iterating over params directly yields only names.
for param in params.values():
    if 'intercept' in param.name:
        pass
# Which can be fit using
multistructure_fit = multistructure_model.fit(exp_properties.feature_vector,
                                              x=exp_properties.feature_domain,
                                              weights=None,  # could be changed
                                              method='leastsq',  # could be changed
                                              params=params)
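To make the linear combination concrete, here is a small plain-Python sketch of probabilities that sum to one, with the last probability derived from the others as in the struct1_prob expression. The function names are hypothetical and the profiles are plain lists, not idpflex properties.

```python
def structure_probabilities(free):
    """Return all probabilities given the n-1 free ones; the last is 1 - sum."""
    last = 1.0 - sum(free)
    if last < 0.0 or any(p < 0.0 or p > 1.0 for p in free):
        raise ValueError("probabilities must lie in [0, 1] and sum to one")
    return list(free) + [last]


def combine(profiles, probs):
    """Probability-weighted sum of per-structure profiles, point by point."""
    return [sum(p * profile[i] for p, profile in zip(probs, profiles))
            for i in range(len(profiles[0]))]


def weighted_residual(exp, profiles, free_probs, slope, intercept, ws=1.0):
    """ws * ((slope * mixed + intercept) - exp) for a linear model."""
    mixed = combine(profiles, structure_probabilities(free_probs))
    return [ws * ((slope * m + intercept) - e) for m, e in zip(mixed, exp)]


sans_profiles = [[1.0, 2.0], [3.0, 6.0]]  # one profile per structure
print(structure_probabilities([0.25]))
print(weighted_residual([6.0, 10.0], sans_profiles, [0.25],
                        slope=2.0, intercept=1.0, ws=0.5))
```

Deriving the last probability from the others keeps the sum fixed at one without a separate constraint, but it means the derived value can leave [0, 1] unless the free values are bounded together.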
Tree Fitting
Tree fitting will use multi-structure fitting at each depth in the same procedure as currently available in idpflex. The function will take a tree filled with PropertyDict and a list of models, one for each property. It will output a list of multi-structure models described above. A utility fitting function (the same as currently available) can be used to fit every depth of the tree.
Additional Considerations
Parameter Initializations
During what step should parameters be initialized, bounded, etc.? Probabilities can be set equal for all structures. What responsibility does idpflex have for initializing parameters? (lmfit leaves this to the user or model creator.) Default values can (should?) be applied in the function definition, e.g. def func(x, slope=1, intercept=0, prop=None):.
Should all models be required to implement a guess method? This could potentially simplify initialization but increases the difficulty for users to create custom models.
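As a rough illustration of what initialization could look like, the sketch below sets equal probabilities and computes a closed-form least-squares starting point for a linear model. Both function names are hypothetical, and a real guess method would return lmfit parameters, not a dictionary.

```python
def equal_probabilities(n_structures):
    """Start every structure with the same probability."""
    return [1.0 / n_structures] * n_structures


def guess_linear(data, prop):
    """Least-squares starting values for data ~ slope * prop + intercept."""
    n = len(prop)
    mean_p = sum(prop) / n
    mean_d = sum(data) / n
    sxx = sum((p - mean_p) ** 2 for p in prop)
    if sxx == 0:
        # A flat property cannot constrain the slope; fall back to a constant.
        return {"slope": 0.0, "intercept": mean_d}
    sxy = sum((p - mean_p) * (d - mean_d) for p, d in zip(prop, data))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_d - slope * mean_p}


print(equal_probabilities(4))
print(guess_linear(data=[3.0, 5.0, 7.0], prop=[1.0, 2.0, 3.0]))
```

A guess like this is cheap for linear and constant models. For custom models it is extra work for the author, which is the trade-off raised above.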
Parameter Adjustments
These methods will potentially create complex models with sets of complicated parameters. It may be useful to create utility functions for working with these parameters, similar to the utility fitting function which maps over models. For example, mapping over all parameters and setting the min value of slopes to 0, or setting all constants to not vary:
new_params = idpflex.bayes.apply_to_constants(params, vary=False)
This is likely unnecessary and tricky to generalize. A user should be able to iterate over the parameters themselves. Instead, examples, a tutorial, or documentation can be provided somewhere to demonstrate.
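The kind of loop a user would write can be sketched with a stand-in parameter class. Param and adjust are hypothetical names used only for illustration. With lmfit's dict-like Parameters, the same loop would presumably call param.set(min=0) or param.set(vary=False) instead of assigning attributes.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Stand-in for a fit parameter; this is not the lmfit class."""
    name: str
    value: float = 0.0
    vary: bool = True
    min: float = float("-inf")


def adjust(params):
    """Bound slopes at zero and freeze constants, matching on name suffix."""
    for name, param in params.items():
        if name.endswith("_slope"):
            param.min = 0.0
        elif name.endswith("_c"):
            param.vary = False
    return params


params = {p.name: p for p in (Param("sans_slope"),
                              Param("sans_intercept"),
                              Param("saxs_c"))}
adjust(params)
for param in params.values():
    print(param)
```

Matching on the name suffix works here only because the composite model prefixes parameters with the property name, so the naming convention would need to be documented for users.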
Model Creation
Since the proposed model interface requires an independent variable 'x' and a keyword argument 'prop', the models in lmfit.models will be unavailable to users. This can be remedied by duplicating these in idpflex in compatible forms. However, this would inflate the codebase, and the duplicated models are not likely to be widely used. Instead, a handful of property-compatible models (Linear, Constant, Gaussian, etc.) can be provided by idpflex and serve as examples of model creation.
It would also be possible for the model creation methods to accept functions or lambdas directly to prevent the pattern below.
Structure Combination
The proposed solution exclusively supports combining structures linearly with a variable 'probability'. There may be other desired methods for combining structures (quadratic?), but these are outside the scope of the proposed solution. They could be achieved in a hack-y fashion by creating "new" structures in the desired fashion and running those through the multi-structure fitting. Furthermore, the probability parameters are exposed to the user, allowing customization.
@jmborr Here is my outlined proposal for approaching the generic modeling. Do you have any comments or suggestions for the implementation or final interfaces?
However, no testing has been completed and the "legacy"
multiproperty fitting has not been removed. Initial use with the
`fitting_both.py` script indicates successful fitting.
Replaces references to the MultiPropertyModel class for the new
model creation. Adjustments to the test to reflect the change in
model creation interface. Tests for acceptance of general models
and tests for parameter naming are still to be completed.
Generalized Fitting Models
Project: idpflex
Author: Connor Pigg
Date: 25-June-2019