Add support for arbitrary linear combination gradient recipes #909
Changes from 38 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -65,10 +65,22 @@ | |
transformation on the quadrature operators. | ||
|
||
For gates that *are* supported via the analytic method, the gradient recipe | ||
(with multiplier :math:`c_k`, parameter shift :math:`s_k` for parameter :math:`\phi_k`) | ||
works as follows: | ||
|
||
.. math:: \frac{\partial}{\partial\phi_k}O = c_k\left[O(\phi_k+s_k)-O(\phi_k-s_k)\right]. | ||
.. math:: \frac{\partial}{\partial\phi_k}f = \sum_{i \in I_{k}} c_i * f(a_i * \phi_k+s_i). | ||
|
||
|
||
where :math:`f` is the expectation value of an observable on a circuit that has | ||
been evolved by the operation being considered with parameter :math:`\phi_k`, | ||
there are multiple terms indexed with :math:`i` for each parameter :math:`\phi` | ||
and the :math:`[c_i, a_i, s_i]` are coefficients specific to the gate. | ||
|
||
The following specific case holds for example for qubit operations that are | ||
generated by one of the Pauli matrices and results in an overall positive and | ||
negative shift: | ||
Review comment on lines +77 to +79: Very nice way of putting this! I find it very clear to read and understand.
||
|
||
.. math:: \frac{\partial}{\partial\phi_k}f = \frac{1}{2}\left[f(\phi_k+\frac{\pi}{2})-f(\phi_k-\frac{\pi}{2})\right], | ||
|
||
|
||
so that :math:`[c_0, a_0, s_0]=[1/2, 1, \pi/2]` and :math:`[c_1, a_1, s_1]=[-1/2, 1, -\pi/2]`. | |
|
||
CV Operation base classes | ||
~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
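The linear-combination recipe described in the docstring above can be checked numerically with a short sketch. The function name `recipe_gradient` is hypothetical and not part of the PR, and `np.cos` is used only as a stand-in for an expectation value (for example, the Pauli-Z expectation after an RX rotation of the ground state), so this is an illustration under those assumptions rather than the library's implementation.

```python
import numpy as np


def recipe_gradient(f, phi, recipe):
    """Evaluate sum_i c_i * f(a_i * phi + s_i) for a recipe of [c, a, s] terms.

    Hypothetical helper for illustration only.
    """
    return sum(c * f(a * phi + s) for c, a, s in recipe)


# Two-term recipe quoted in the docstring for Pauli-generated qubit gates.
two_term = [[0.5, 1, np.pi / 2], [-0.5, 1, -np.pi / 2]]

phi = 0.3
grad = recipe_gradient(np.cos, phi, two_term)

# For f = cos the exact derivative is -sin(phi).
exact = -np.sin(phi)
```

With the two-term recipe the sum reduces to the familiar positive and negative shift, so `grad` and `exact` should agree to floating-point precision.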
@@ -613,19 +625,22 @@ def grad_method(self): | |
return None if self.num_params == 0 else "F" | ||
|
||
grad_recipe = None | ||
r"""list[tuple[float]] or None: Gradient recipe for the parameter-shift method. | ||
r"""tuple(Union(list[list[float]], None)) or None: Gradient recipe for the | ||
parameter-shift method. | ||
|
||
This is a list with one tuple per operation parameter. For parameter | ||
:math:`k`, the tuple is of the form :math:`(c_k, s_k)`, resulting in | ||
a gradient recipe of | ||
This is a tuple with one nested list per operation parameter. For | ||
parameter :math:`k`, the nested list contains elements of the form | ||
|
||
:math:`[c_i, a_i, s_i]` where :math:`i \in I_{k}` is the index of the | ||
term, resulting in a gradient recipe of | ||
|
||
.. math:: \frac{\partial}{\partial\phi_k}O = c_k\left[O(\phi_k+s_k)-O(\phi_k-s_k)\right]. | ||
.. math:: \frac{\partial}{\partial\phi_k}O = \sum_{i \in I_{k}} c_i * O(a_i * \phi_k+s_i). | ||
|
||
If ``None``, the default gradient recipe | ||
:math:`(c_k, s_k)=(1/2, \pi/2)` is assumed for every parameter. | ||
If ``None``, the default gradient recipe containing the two terms | ||
:math:`[c_0, a_0, s_0]=[1/2, 1, \pi/2]` and :math:`[c_1, a_1, | ||
s_1]=[-1/2, 1, -\pi/2]` is assumed for every parameter. | ||
|
||
""" | ||
|
||
def get_parameter_shift(self, idx): | ||
def get_parameter_shift(self, idx, shift=np.pi / 2): | ||
"""Multiplier and shift for the given parameter, based on its gradient recipe. | ||
|
||
Args: | ||
|
@@ -636,16 +651,32 @@ def get_parameter_shift(self, idx): | |
""" | ||
# get the gradient recipe for this parameter | ||
recipe = self.grad_recipe[idx] | ||
multiplier, shift = (0.5, np.pi / 2) if recipe is None else recipe | ||
|
||
# internal multiplier in the Variable | ||
var_mult = self.data[idx].mult | ||
# Default values | ||
multiplier = 0.5 / np.sin(shift) | ||
a = 1 | ||
|
||
# We set the default recipe following: | ||
# ∂f(x) = c*f(x+s) - c*f(x-s) | ||
# where we express a positive and a negative shift by default | ||
default_param_shift = [[multiplier, a, shift], [-multiplier, a, -shift]] | ||
param_shift = default_param_shift if recipe is None else recipe | ||
|
||
if hasattr(self.data[idx], "mult"): | ||
# Parameter is a variable, we are in non-tape mode | ||
# Need to use the internal multiplier in the Variable to update the | ||
# multiplier and the shift | ||
var_mult = self.data[idx].mult | ||
|
||
for elem in param_shift: | ||
|
||
multiplier *= var_mult | ||
if var_mult != 0: | ||
# zero multiplier means the shift is unimportant | ||
shift /= var_mult | ||
return multiplier, shift | ||
# Update the multiplier | ||
elem[0] *= var_mult | ||
if var_mult != 0: | ||
# Update the shift | ||
# zero multiplier means the shift is unimportant | ||
elem[2] /= var_mult | ||
|
||
return param_shift | ||
|
||
@property | ||
def generator(self): | ||
|
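The default recipe and the rescaling by the internal Variable multiplier in `get_parameter_shift` can be sketched in isolation as follows. The helper names `default_recipe` and `rescale_recipe` are hypothetical, and returning a deep copy is a choice made in this sketch (so that repeated calls cannot compound the rescaling on a shared recipe), not something taken from the diff.

```python
import copy

import numpy as np


def default_recipe(shift=np.pi / 2):
    """Two-term recipe with multiplier 0.5 / sin(shift), as in the diff."""
    multiplier = 0.5 / np.sin(shift)
    return [[multiplier, 1, shift], [-multiplier, 1, -shift]]


def rescale_recipe(recipe, var_mult):
    """Return a copy of the recipe adjusted for a parameter scaled by var_mult."""
    rescaled = copy.deepcopy(recipe)
    for term in rescaled:
        term[0] *= var_mult  # update the multiplier
        if var_mult != 0:
            # a zero multiplier means the shift is unimportant
            term[2] /= var_mult
    return rescaled


# Example: the gate sees 2 * x, so the function of x is cos(2 * x).
base = default_recipe(np.pi / 4)
rescaled = rescale_recipe(base, 2.0)

x = 0.4
grad = sum(c * np.cos(2.0 * (a * x + s)) for c, a, s in rescaled)
```

Here `grad` should match the exact derivative `-2 * sin(2 * x)`, and `base` is left unchanged by the rescaling.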
@@ -1588,16 +1619,33 @@ def heisenberg_pd(self, idx): | |
""" | ||
# get the gradient recipe for this parameter | ||
recipe = self.grad_recipe[idx] | ||
multiplier = 0.5 if recipe is None else recipe[0] | ||
shift = np.pi / 2 if recipe is None else recipe[1] | ||
|
||
# Default values | ||
multiplier = 0.5 | ||
a = 1 | ||
shift = np.pi / 2 | ||
|
||
# We set the default recipe as follows: | |
# ∂f(x) = c*f(x+s) - c*f(x-s) | ||
default_param_shift = [[multiplier, a, shift], [-multiplier, a, -shift]] | ||
param_shift = default_param_shift if recipe is None else recipe | ||
|
||
pd = None # partial derivative of the transformation | ||
|
||
p = self.parameters | ||
# evaluate the transform at the shifted parameter values | ||
p[idx] += shift | ||
U2 = self._heisenberg_rep(p) # pylint: disable=assignment-from-none | ||
p[idx] -= 2 * shift | ||
U1 = self._heisenberg_rep(p) # pylint: disable=assignment-from-none | ||
return (U2 - U1) * multiplier # partial derivative of the transformation | ||
|
||
original_p_idx = p[idx] | ||
for c, _a, s in param_shift: | ||
# evaluate the transform at the shifted parameter values | ||
p[idx] = _a * original_p_idx + s | ||
U = self._heisenberg_rep(p) # pylint: disable=assignment-from-none | ||
|
||
if pd is None: | ||
pd = c * U | ||
else: | ||
pd += c * U | ||
|
||
return pd | ||
|
||
def heisenberg_tr(self, wires, inverse=False): | ||
r"""Heisenberg picture representation of the linear transformation carried | ||
|
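The accumulation loop in `heisenberg_pd` can be illustrated on a small matrix-valued function. In the sketch below, `rotation_rep` is a hypothetical stand-in for `_heisenberg_rep` (a phase-space rotation written out by hand), and `heisenberg_pd_sketch` is not the PR's method; it only mirrors the idea of summing `c * U(a * phi + s)` over the recipe terms.

```python
import numpy as np


def rotation_rep(phi):
    """Hypothetical stand-in for _heisenberg_rep: a phase-space rotation."""
    return np.array(
        [
            [1.0, 0.0, 0.0],
            [0.0, np.cos(phi), -np.sin(phi)],
            [0.0, np.sin(phi), np.cos(phi)],
        ]
    )


def heisenberg_pd_sketch(rep, phi, recipe):
    """Accumulate the linear combination sum_i c_i * rep(a_i * phi + s_i)."""
    pd = None  # partial derivative of the transformation
    for c, a, s in recipe:
        term = c * rep(a * phi + s)
        pd = term if pd is None else pd + term
    return pd


recipe = [[0.5, 1, np.pi / 2], [-0.5, 1, -np.pi / 2]]
phi = 0.7
pd = heisenberg_pd_sketch(rotation_rep, phi, recipe)

# Entry-wise derivative of rotation_rep with respect to phi.
expected = np.array(
    [
        [0.0, 0.0, 0.0],
        [0.0, -np.sin(phi), -np.cos(phi)],
        [0.0, np.cos(phi), -np.sin(phi)],
    ]
)
```

The constant entry cancels between the two terms, while the trigonometric entries pick up their derivatives, so `pd` should equal `expected` up to rounding.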
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -181,20 +181,37 @@ def _pd_analytic(self, idx, args, kwargs, **options): | |
temp_var.idx = n | ||
op.data[p_idx] = temp_var | ||
|
||
multiplier, shift = op.get_parameter_shift(p_idx) | ||
|
||
# shifted parameter values | ||
shift_p1 = np.r_[args, args[idx] + shift] | ||
shift_p2 = np.r_[args, args[idx] - shift] | ||
param_shift = op.get_parameter_shift(p_idx) | ||
|
||
if not force_order2 and op.use_method != "B": | ||
# basic parameter-shift method, for Gaussian CV gates | ||
# succeeded by order-1 observables | ||
# evaluate the circuit at two points with shifted parameter values | ||
y2 = np.asarray(self.evaluate(shift_p1, kwargs)) | ||
y1 = np.asarray(self.evaluate(shift_p2, kwargs)) | ||
pd += (y2 - y1) * multiplier | ||
# evaluate the circuit at multiple points with the linear | ||
# combination of parameter values (in most cases at two points) | ||
for multiplier, a, shift in param_shift: | ||
|
||
# shifted parameter values | ||
shift_p = np.r_[args, a * args[idx] + shift] | ||
|
||
term = multiplier * np.asarray(self.evaluate(shift_p, kwargs)) | ||
pd += term | ||
else: | ||
if len(param_shift) != 2: | ||
# The 2nd order CV parameter-shift rule only accepts two-term shifts | ||
raise NotImplementedError( | ||
"Taking the analytic gradient for order-2 operators is " | ||
f"unsupported for {op} which contains a parameter with a " | |
"gradient recipe of more than two terms." | ||
) | ||
Review comment on lines +201 to +205: This line (and the equivalent in the CV tape) are showing as not covered by codecov, which is weird, since you explicitly include tests for them.
Reply: Indeed, was also wondering about that 🤔 Locally that was covered.
||
|
||
# Get the shifts and the multipliers | ||
pos_multiplier, a1, pos_shift = param_shift[0] | ||
neg_multiplier, a2, neg_shift = param_shift[1] | ||
|
||
# shifted parameter values | ||
shift_p1 = np.r_[args, a1 * args[idx] + pos_shift] | ||
shift_p2 = np.r_[args, a2 * args[idx] + neg_shift] | ||
|
||
# order-2 parameter-shift method, for gaussian CV gates | ||
# succeeded by order-2 observables | ||
# evaluate transformed observables at the original parameter point | ||
|
@@ -203,7 +220,7 @@ def _pd_analytic(self, idx, args, kwargs, **options): | |
Z2 = op.heisenberg_tr(self.device.wires) | ||
self._set_variables(shift_p2, kwargs) | ||
Z1 = op.heisenberg_tr(self.device.wires) | ||
Z = (Z2 - Z1) * multiplier # derivative of the operation | ||
Z = pos_multiplier * Z2 + neg_multiplier * Z1 # derivative of the operation | ||
|
||
unshifted_args = np.r_[args, args[idx]] | ||
self._set_variables(unshifted_args, kwargs) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -128,16 +128,18 @@ def _pd_analytic(self, idx, args, kwargs, **options): | |
temp_var.idx = n | ||
op.data[p_idx] = temp_var | ||
|
||
multiplier, shift = op.get_parameter_shift(p_idx) | ||
param_shift = op.get_parameter_shift(p_idx) | ||
|
||
# shifted parameter values | ||
shift_p1 = np.r_[args, args[idx] + shift] | ||
shift_p2 = np.r_[args, args[idx] - shift] | ||
for multiplier, a, shift in param_shift: | ||
|
||
# evaluate the circuit at two points with shifted parameter values | ||
y2 = np.asarray(self.evaluate(shift_p1, kwargs)) | ||
y1 = np.asarray(self.evaluate(shift_p2, kwargs)) | ||
pd += (y2 - y1) * multiplier | ||
# shifted parameter values | ||
shift_p = np.r_[args, a * args[idx] + shift] | ||
Review comment: Think I need to understand what
Reply: It's essentially a slightly more efficient way of doing
Reply: Oh nice! My follow up thought was why this is a use case for doing that 🤔
Reply: Yeah, the gradient rules in non-tape mode are written a bit strangely 😆 Before shifting a parameter, a new 'temp' parameter is added to the circuit. It is this parameter that is shifted; it is then deleted once the gradient has been computed. The tape mode implementation is a lot 'cleaner' imho
Reply: See the corresponding logic in
def _pd_analytic(self, idx, args, kwargs, **options):
    n = self.num_variables
    pd = 0.0
    # find the Operators in which the free parameter appears, use the product rule
    for op, p_idx in self.variable_deps[idx]:
        # We temporarily edit the Operator such that parameter p_idx is replaced by a new one,
        # which we can modify without affecting other Operators depending on the original.
        orig = op.data[p_idx]
        assert orig.idx == idx
        # reference to a new, temporary parameter with index n, otherwise identical with orig
        temp_var = copy.copy(orig)
        temp_var.idx = n
        op.data[p_idx] = temp_var
        param_shift = op.get_parameter_shift(p_idx)
Reply: Here I mainly kept the implementation as was before.
||
|
||
# evaluate the circuit at point with shifted parameter values | ||
y = np.asarray(self.evaluate(shift_p, kwargs)) | ||
|
||
# add the contribution to the partial derivative | ||
pd += multiplier * y | ||
|
||
|
||
# restore the original parameter | ||
op.data[p_idx] = orig | ||
|
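The `np.r_[args, a * args[idx] + shift]` expression from the diff builds the argument vector with one extra, shifted entry appended, which matches the reviewer's description of a temporary parameter being added and then shifted. A minimal sketch with made-up values for `args`, `idx`, and the recipe term:

```python
import numpy as np

# Illustrative values only; in the PR these come from the QNode and the recipe.
args = np.array([0.1, 0.2, 0.3])
idx = 1
multiplier, a, shift = 0.5, 1, np.pi / 2

# np.r_ concatenates along the first axis: the original arguments are kept
# and the shifted copy of args[idx] is appended as a new last entry.
shift_p = np.r_[args, a * args[idx] + shift]

# The same array built with np.append, for comparison.
same = np.append(args, a * args[idx] + shift)
```

The original entries are untouched, so other operations that depend on `args[idx]` still see the unshifted value.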
Review comment: Thought it was best to add some more details! Might have to double check that the math renders correctly.
Reply: Thanks! Checked and updated. Update: enclosing with $$ didn't seem to work, added it with <img src="https://render.githubusercontent.com/render/math?math=">.