Add support for arbitrary linear combination gradient recipes #909

Merged: 48 commits, Nov 25, 2020
Changes from 19 commits
Commits
6246431
Have positive and negative multiplier and shift values
antalszava Nov 18, 2020
8f4f5d3
No print
antalszava Nov 18, 2020
19d5ad9
Formatting
antalszava Nov 18, 2020
a77d73f
Merge branch 'master' into multiple_shifts
antalszava Nov 18, 2020
7eeb013
3 element terms for grad_recipes; qubit okay; CV draft
antalszava Nov 18, 2020
50db4f9
CV for tape mode
antalszava Nov 18, 2020
e166e5c
Merge branch 'master' into multiple_shifts
antalszava Nov 18, 2020
e34c034
Comments
antalszava Nov 18, 2020
b966cd9
Merge branch 'multiple_shifts' of https://github.com/XanaduAI/pennyla…
antalszava Nov 18, 2020
ef78263
Remove unused
antalszava Nov 18, 2020
0a0dbdf
Formatting
antalszava Nov 18, 2020
95d73f7
Solve casting by specifying dtype at creation
antalszava Nov 19, 2020
538bec5
No casting needed for shifted
antalszava Nov 19, 2020
b1b4334
Update module docstring and Operation.grad_recipe docstring
antalszava Nov 19, 2020
8b30ce5
Development guide update
antalszava Nov 19, 2020
8e201b1
Wording
antalszava Nov 19, 2020
6f08762
Adding tests; adding error raised for unsupported logic for tape seco…
antalszava Nov 19, 2020
9741a6b
No f strings
antalszava Nov 19, 2020
573795a
Merge branch 'master' into multiple_shifts
antalszava Nov 20, 2020
9489dcd
Update pennylane/qnodes/cv.py
antalszava Nov 20, 2020
a92decf
Update pennylane/tape/tapes/cv_param_shift.py
antalszava Nov 20, 2020
9d7096a
Simplify using np.dot in CV param shift tape
antalszava Nov 20, 2020
34debc9
Merge branch 'master' into multiple_shifts
antalszava Nov 20, 2020
7680d25
Merge branch 'master' into multiple_shifts
josh146 Nov 21, 2020
1dc4234
Update tests/qnodes/test_qnode_cv.py
antalszava Nov 23, 2020
db02026
get_parameter_shift in tape mode as per Josh's suggestion; use that
antalszava Nov 23, 2020
32de6d2
Merge branch 'multiple_shifts' of https://github.com/XanaduAI/pennyla…
antalszava Nov 23, 2020
aa7d6ea
Changelog
antalszava Nov 23, 2020
b0916a0
Update tests/tape/tapes/test_cv_param_shift.py
antalszava Nov 23, 2020
5970497
Update .github/CHANGELOG.md
antalszava Nov 23, 2020
19783eb
Merge branch 'master' into multiple_shifts
antalszava Nov 23, 2020
3d155c5
merge in changes from 915
josh146 Nov 24, 2020
9c425b4
Merge branch 'master' into multiple_shifts
josh146 Nov 24, 2020
debf057
Update pennylane/operation.py
antalszava Nov 24, 2020
1bac591
Update grad recipe formulae as per Tom's suggestions
antalszava Nov 24, 2020
ba443c0
Merge branch 'multiple_shifts' of https://github.com/XanaduAI/pennyla…
antalszava Nov 24, 2020
456b563
Update other formula in comment
antalszava Nov 24, 2020
4c00dc1
Merge branch 'master' into multiple_shifts
josh146 Nov 24, 2020
c5ca866
CHANGELOG
antalszava Nov 24, 2020
59454bf
Add rendering img url approach
antalszava Nov 24, 2020
30b3314
Plus
antalszava Nov 24, 2020
e8c849c
Update pennylane/operation.py
antalszava Nov 24, 2020
a3a821d
Applying review suggestions
antalszava Nov 24, 2020
85881e3
Merge branch 'multiple_shifts' of https://github.com/XanaduAI/pennyla…
antalszava Nov 24, 2020
4a03b07
Update doc/development/plugins.rst
antalszava Nov 24, 2020
834801e
Update pennylane/operation.py
antalszava Nov 24, 2020
b0989c2
Merge branch 'master' into multiple_shifts
antalszava Nov 24, 2020
a9a8122
equation formatting fixes
josh146 Nov 25, 2020
15 changes: 9 additions & 6 deletions doc/development/plugins.rst
Expand Up @@ -466,11 +466,13 @@ where
* :attr:`~.Operation.grad_method`: the gradient computation method; ``'A'`` for the analytic
method, ``'F'`` for finite differences, and ``None`` if the operation may not be differentiated

* :attr:`~.Operation.grad_recipe`: The gradient recipe for the analytic ``'A'`` method.
This is a list with one tuple per operation parameter. For parameter :math:`k`, the tuple is of
the form :math:`(c_k, s_k)`, resulting in a gradient recipe of
* :attr:`~.Operation.grad_recipe`: The gradient recipe for the analytic ``'A'``
method. This is a tuple with one nested list per operation parameter. For
parameter :math:`k`, the nested list contains elements of the form
:math:`[c_i, a_i, s_i]` where :math:`i \in I_{k}` is the index of the term,
resulting in a gradient recipe of

.. math:: \frac{d}{d\phi_k}f(O(\phi_k)) = c_k\left[f(O(\phi_k+s_k))-f(O(\phi_k-s_k))\right].
.. math:: \frac{\partial}{\partial\phi_k}f(O(\phi_k)) = \sum_{i \in I_{k}} c_i f(O(a_i \phi_k+s_i)).

where :math:`f` is an expectation value that depends on :math:`O(\phi_k)`, an example being

Expand All @@ -479,8 +481,9 @@ where
which is the simple expectation value of the operator :math:`\hat{B}` evolved via the gate
:math:`O(\phi_k)`.

Note that if ``grad_recipe = None``, the default gradient recipe is
:math:`(c_k, s_k)=(1/2, \pi/2)` for every parameter.
Note that if ``grad_recipe = None``, the default gradient recipe containing
the two terms :math:`[c_0, a_0, s_0]=[1/2, 1, \pi/2]` and :math:`[c_1, a_1,
s_1]=[-1/2, 1, -\pi/2]` is assumed for every parameter.
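
As a minimal sketch of how such a recipe could be evaluated, the sum can be written directly in NumPy. The helper name `apply_recipe` is hypothetical and is not part of PennyLane. With the default two-term recipe and f(phi) = sin(phi), the sum reproduces the exact derivative cos(phi):

```python
import numpy as np


def apply_recipe(f, phi, recipe=None):
    """Evaluate sum_i c_i * f(a_i * phi + s_i) for one parameter.

    Hypothetical helper for illustration only; it mirrors the formula
    in the text and is not a PennyLane function.
    """
    if recipe is None:
        # default two-term recipe: [1/2, 1, pi/2] and [-1/2, 1, -pi/2]
        recipe = [[0.5, 1, np.pi / 2], [-0.5, 1, -np.pi / 2]]
    return sum(c * f(a * phi + s) for c, a, s in recipe)


phi = 0.3
grad = apply_recipe(np.sin, phi)
print(grad, np.cos(phi))  # the two values agree up to rounding
```

Passing a different list of `[c, a, s]` triples as `recipe` evaluates any other linear combination in the same way.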

The user can then import this operation directly from your plugin, and use it when defining a QNode:

Expand Down
87 changes: 64 additions & 23 deletions pennylane/operation.py
Expand Up @@ -65,8 +65,15 @@
transformation on the quadrature operators.

For gates that *are* supported via the analytic method, the gradient recipe
(with multiplier :math:`c_k`, parameter shift :math:`s_k` for parameter :math:`\phi_k`)
works as follows:
(with multipliers :math:`c_i`, scaling factors :math:`a_i` and parameter shifts
:math:`s_i` for parameter :math:`\phi_k` where :math:`i \in I_{k}`) works as
follows:

.. math:: \frac{\partial}{\partial\phi_k}O = \sum_{i \in I_{k}} c_i * O(a_i * \phi_k+s_i).

The following specific case holds for example for qubit operations that are
generated by one of the Pauli matrices and results in an overall positive and
negative shift:
Comment on lines +77 to +79
Member:
Very nice way of putting this! I find it very clear to read and understand.


.. math:: \frac{\partial}{\partial\phi_k}O = c_k\left[O(\phi_k+s_k)-O(\phi_k-s_k)\right].
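
To see how the two formulas relate, substitute the two terms :math:`[c_k, 1, s_k]` and :math:`[-c_k, 1, -s_k]` into the general sum. This is a short worked step that uses only the symbols defined in the docstring:

```latex
\begin{align}
\frac{\partial}{\partial\phi_k}O
  &= \sum_{i \in I_{k}} c_i\, O(a_i \phi_k + s_i) \\
  &= c_k\, O(1 \cdot \phi_k + s_k) + (-c_k)\, O(1 \cdot \phi_k - s_k) \\
  &= c_k\left[O(\phi_k+s_k)-O(\phi_k-s_k)\right].
\end{align}
```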

Expand Down Expand Up @@ -613,16 +620,19 @@ def grad_method(self):
return None if self.num_params == 0 else "F"

grad_recipe = None
r"""list[tuple[float]] or None: Gradient recipe for the parameter-shift method.
r"""tuple(Union(list[list[float]], None)) or None: Gradient recipe for the
parameter-shift method.

This is a list with one tuple per operation parameter. For parameter
:math:`k`, the tuple is of the form :math:`(c_k, s_k)`, resulting in
a gradient recipe of
This is a tuple with one nested list per operation parameter. For
parameter :math:`k`, the nested list contains elements of the form
:math:`[c_i, a_i, s_i]` where :math:`i \in I_{k}` is the index of the
term, resulting in a gradient recipe of

.. math:: \frac{\partial}{\partial\phi_k}O = c_k\left[O(\phi_k+s_k)-O(\phi_k-s_k)\right].
.. math:: \frac{\partial}{\partial\phi_k}O = \sum_{i \in I_{k}} c_i * O(a_i * \phi_k+s_i).

If ``None``, the default gradient recipe
:math:`(c_k, s_k)=(1/2, \pi/2)` is assumed for every parameter.
If ``None``, the default gradient recipe containing the two terms
:math:`[c_0, a_0, s_0]=[1/2, 1, \pi/2]` and :math:`[c_1, a_1,
s_1]=[-1/2, 1, -\pi/2]` is assumed for every parameter.
"""

def get_parameter_shift(self, idx):
Expand All @@ -636,16 +646,30 @@ def get_parameter_shift(self, idx):
"""
# get the gradient recipe for this parameter
recipe = self.grad_recipe[idx]
multiplier, shift = (0.5, np.pi / 2) if recipe is None else recipe

# Default values
multiplier = 0.5
a = 1
shift = np.pi / 2

# We set the default recipe following:
# ∂f(x) = c1*f(a1*x+s1) + c2*f(a2*x+s2)
# where we express a positive and a negative shift by default
default_param_shift = [[multiplier, a, shift], [-multiplier, a, -shift]]
param_shift = default_param_shift if recipe is None else recipe

# internal multiplier in the Variable
var_mult = self.data[idx].mult

multiplier *= var_mult
if var_mult != 0:
# zero multiplier means the shift is unimportant
shift /= var_mult
return multiplier, shift
for elem in param_shift:

# Update the multiplier
elem[0] *= var_mult
if var_mult != 0:
# Update the shift
# zero multiplier means the shift is unimportant
elem[2] /= var_mult
return param_shift
Member:
Apart from this block here, it's a shame we can't re-use this method for the tape-mode gradient 🙁

Do you think it makes sense to do the following?

def get_parameter_shift(self, idx, shift=np.pi/2):
    recipe = self.grad_recipe[idx]

    # Default values
    multiplier = 0.5 / np.sin(shift)
    a = 1

    # We set the default recipe following:
    # ∂f(x) = c1*f(a1*x+s1) + c2*f(a2*x+s2)
    # where we express a positive and a negative shift by default
    default_param_shift = [[multiplier, a, shift], [-multiplier, a, -shift]]
    param_shift = default_param_shift if recipe is None else recipe

    if hasattr(self.data[idx], "mult"):
        # Parameter is a variable, we are in non-tape mode
        ...

    return param_shift

This way:

  • We can call this method inside both the old QNode gradient methods, and the new tape gradient methods.

  • We can pass a different default shift value, for the qubit parameter-shift rule

  • The branch that worries about variables only is called in non-tape mode. Note that the above is safer than using qml.tape_active(), which only checks if the user has activated tape mode, not if the object itself is a tape.

Contributor Author:
Hi Josh, thanks so much (forgot to reply here 😅). Worked great, adjusted the method!
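
A minimal, self-contained sketch of the idea in the suggestion, written as free functions so that it runs without PennyLane. The names `default_recipe` and `rescale_recipe` are hypothetical. Unlike the loop in the diff, which updates `elem[0]` and `elem[2]` in place, this sketch returns a fresh list so that a recipe stored as a class attribute is left unchanged; that is a design choice of the sketch, not something the PR states.

```python
import numpy as np


def default_recipe(shift=np.pi / 2):
    """Two-term recipe [[c, 1, s], [-c, 1, -s]] with c = 0.5 / sin(s)."""
    multiplier = 0.5 / np.sin(shift)
    a = 1
    return [[multiplier, a, shift], [-multiplier, a, -shift]]


def rescale_recipe(recipe, var_mult):
    """Account for an internal multiplier, as done in non-tape mode.

    Each multiplier c is scaled by var_mult; each shift s is divided by
    var_mult unless var_mult is zero (the shift is then unimportant).
    Returns a new list and leaves `recipe` untouched.
    """
    rescaled = []
    for c, a, s in recipe:
        new_s = s / var_mult if var_mult != 0 else s
        rescaled.append([c * var_mult, a, new_s])
    return rescaled


recipe = default_recipe()
print(rescale_recipe(recipe, var_mult=2.0))
print(recipe)  # the original recipe is unchanged
```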


@property
def generator(self):
Expand Down Expand Up @@ -1588,16 +1612,33 @@ def heisenberg_pd(self, idx):
"""
# get the gradient recipe for this parameter
recipe = self.grad_recipe[idx]
multiplier = 0.5 if recipe is None else recipe[0]
shift = np.pi / 2 if recipe is None else recipe[1]

# Default values
multiplier = 0.5
a = 1
shift = np.pi / 2

# We set the default recipe as follows:
# ∂f(x) = c1*f(a1*x+s1) + c2*f(a2*x+s2)
default_param_shift = [[multiplier, a, shift], [-multiplier, a, -shift]]
param_shift = default_param_shift if recipe is None else recipe

pd = None # partial derivative of the transformation

p = self.parameters
# evaluate the transform at the shifted parameter values
p[idx] += shift
U2 = self._heisenberg_rep(p) # pylint: disable=assignment-from-none
p[idx] -= 2 * shift
U1 = self._heisenberg_rep(p) # pylint: disable=assignment-from-none
return (U2 - U1) * multiplier # partial derivative of the transformation

original_p_idx = p[idx]
for c, _a, s in param_shift:
# evaluate the transform at the shifted parameter values
p[idx] = _a * original_p_idx + s
U = self._heisenberg_rep(p) # pylint: disable=assignment-from-none

if pd is None:
pd = c * U
else:
pd += c * U

return pd
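
The loop above can be exercised outside PennyLane with a toy matrix-valued function. In the sketch below, `rotation_rep` is a hypothetical stand-in for `_heisenberg_rep` (a phase-space rotation acting on the vector (1, x, p)), and `matrix_pd` follows the same accumulate-over-terms pattern as the new `heisenberg_pd` body. Both names are assumptions made for illustration.

```python
import numpy as np


def rotation_rep(p):
    """Toy Heisenberg-picture matrix for a single-parameter rotation."""
    c, s = np.cos(p[0]), np.sin(p[0])
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])


def matrix_pd(rep, params, idx, recipe):
    """Partial derivative of rep(params) w.r.t. params[idx] via a recipe."""
    p = list(params)  # copy so the caller's parameters are not modified
    original = p[idx]
    pd = None
    for c, a, s in recipe:
        # evaluate the transform at the scaled and shifted parameter value
        p[idx] = a * original + s
        U = rep(p)
        pd = c * U if pd is None else pd + c * U
    return pd


default = [[0.5, 1, np.pi / 2], [-0.5, 1, -np.pi / 2]]
phi = 0.3
# entry-wise derivative of rotation_rep with respect to phi
expected = np.array(
    [[0, 0, 0], [0, -np.sin(phi), -np.cos(phi)], [0, np.cos(phi), -np.sin(phi)]]
)
print(np.allclose(matrix_pd(rotation_rep, [phi], 0, default), expected))
```

The constant entry in the top-left corner cancels because the multipliers of the two terms sum to zero.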

def heisenberg_tr(self, wires, inverse=False):
r"""Heisenberg picture representation of the linear transformation carried
Expand Down
28 changes: 22 additions & 6 deletions pennylane/ops/cv.py
Expand Up @@ -138,7 +138,9 @@ class Squeezing(CVOperation):
grad_method = "A"

shift = 0.1
grad_recipe = [(0.5 / math.sinh(shift), shift), None]
multiplier = 0.5 / math.sinh(shift)
a = 1
grad_recipe = ([[multiplier, a, shift], [-multiplier, a, -shift]], None)

@staticmethod
def _heisenberg_rep(p):
Expand Down Expand Up @@ -180,7 +182,9 @@ class Displacement(CVOperation):
grad_method = "A"

shift = 0.1
grad_recipe = [(0.5 / shift, shift), None]
multiplier = 0.5 / shift
a = 1
grad_recipe = ([[multiplier, a, shift], [-multiplier, a, -shift]], None)

@staticmethod
def _heisenberg_rep(p):
Expand Down Expand Up @@ -278,8 +282,11 @@ class TwoModeSqueezing(CVOperation):
par_domain = "R"

grad_method = "A"

shift = 0.1
grad_recipe = [(0.5 / math.sinh(shift), shift), None]
multiplier = 0.5 / math.sinh(shift)
a = 1
grad_recipe = ([[multiplier, a, shift], [-multiplier, a, -shift]], None)

@staticmethod
def _heisenberg_rep(p):
Expand Down Expand Up @@ -326,8 +333,11 @@ class QuadraticPhase(CVOperation):
par_domain = "R"

grad_method = "A"

shift = 0.1
grad_recipe = [(0.5 / shift, shift)]
multiplier = 0.5 / shift
a = 1
grad_recipe = ([[multiplier, a, shift], [-multiplier, a, -shift]],)

@staticmethod
def _heisenberg_rep(p):
Expand Down Expand Up @@ -371,8 +381,11 @@ class ControlledAddition(CVOperation):
par_domain = "R"

grad_method = "A"

shift = 0.1
grad_recipe = [(0.5 / shift, shift)]
multiplier = 0.5 / shift
a = 1
grad_recipe = ([[multiplier, a, shift], [-multiplier, a, -shift]],)

@staticmethod
def _heisenberg_rep(p):
Expand Down Expand Up @@ -417,8 +430,11 @@ class ControlledPhase(CVOperation):
par_domain = "R"

grad_method = "A"

shift = 0.1
grad_recipe = [(0.5 / shift, shift)]
multiplier = 0.5 / shift
a = 1
grad_recipe = ([[multiplier, a, shift], [-multiplier, a, -shift]],)

@staticmethod
def _heisenberg_rep(p):
Expand Down
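
The two multiplier choices used in these recipes can be checked numerically. The sketch assumes, for illustration, that the differentiated quantity depends on the parameter either linearly (the `0.5 / shift` recipes) or through `cosh`, `sinh` or `exp` (the `0.5 / math.sinh(shift)` recipes). Under that assumption the two-term recipe is exact for any nonzero shift, since cosh(r + s) - cosh(r - s) = 2 sinh(r) sinh(s). The helper `two_term` is hypothetical.

```python
import math


def two_term(f, x, multiplier, shift):
    """Evaluate multiplier * f(x + shift) - multiplier * f(x - shift)."""
    recipe = ([multiplier, 1, shift], [-multiplier, 1, -shift])
    return sum(c * f(a * x + s) for c, a, s in recipe)


shift = 0.1
x = 0.7

# hyperbolic dependence: multiplier 0.5 / sinh(shift)
d_cosh = two_term(math.cosh, x, 0.5 / math.sinh(shift), shift)
d_exp = two_term(math.exp, x, 0.5 / math.sinh(shift), shift)

# linear dependence: multiplier 0.5 / shift
d_lin = two_term(lambda t: 3.0 * t + 1.0, x, 0.5 / shift, shift)

print(d_cosh, math.sinh(x))
print(d_exp, math.exp(x))
print(d_lin, 3.0)
```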
37 changes: 27 additions & 10 deletions pennylane/qnodes/cv.py
Expand Up @@ -181,20 +181,37 @@ def _pd_analytic(self, idx, args, kwargs, **options):
temp_var.idx = n
op.data[p_idx] = temp_var

multiplier, shift = op.get_parameter_shift(p_idx)

# shifted parameter values
shift_p1 = np.r_[args, args[idx] + shift]
shift_p2 = np.r_[args, args[idx] - shift]
param_shift = op.get_parameter_shift(p_idx)

if not force_order2 and op.use_method != "B":
# basic parameter-shift method, for Gaussian CV gates
# succeeded by order-1 observables
# evaluate the circuit at two points with shifted parameter values
y2 = np.asarray(self.evaluate(shift_p1, kwargs))
y1 = np.asarray(self.evaluate(shift_p2, kwargs))
pd += (y2 - y1) * multiplier
# evaluate the circuit at multiple points with the linear
# combination of parameter values (in most cases at two points)
for multiplier, a, shift in param_shift:

# shifted parameter values
shift_p = np.r_[args, a * args[idx] + shift]

term = multiplier * np.asarray(self.evaluate(shift_p, kwargs))
pd += term
else:
if len(param_shift) != 2:
# TODO: check if more than two terms is supported
raise NotImplementedError(
"Taking the analytic gradient for order-2 operators is "
"unsupported for {op} which contains a parameter with a "
"gradient recipe of more than two terms.".format(op=op.name)
)
Comment on lines +201 to +205
Member:
This line (and the equivalent in the CV tape) are showing as not covered by codecov, which is weird, since you explicitly include tests for them.

Contributor Author:
Indeed, was also wondering about that 🤔 Locally that was covered.


# Get the shifts and the multipliers
pos_multiplier, a1, pos_shift = param_shift[0]
neg_multiplier, a2, neg_shift = param_shift[1]

# shifted parameter values
shift_p1 = np.r_[args, a1 * args[idx] + pos_shift]
shift_p2 = np.r_[args, a2 * args[idx] + neg_shift]

# order-2 parameter-shift method, for gaussian CV gates
# succeeded by order-2 observables
# evaluate transformed observables at the original parameter point
Expand All @@ -203,7 +220,7 @@ def _pd_analytic(self, idx, args, kwargs, **options):
Z2 = op.heisenberg_tr(self.device.wires)
self._set_variables(shift_p2, kwargs)
Z1 = op.heisenberg_tr(self.device.wires)
Z = (Z2 - Z1) * multiplier # derivative of the operation
Z = pos_multiplier * Z2 + neg_multiplier * Z1 # derivative of the operation

unshifted_args = np.r_[args, args[idx]]
self._set_variables(unshifted_args, kwargs)
Expand Down
18 changes: 10 additions & 8 deletions pennylane/qnodes/qubit.py
Expand Up @@ -128,16 +128,18 @@ def _pd_analytic(self, idx, args, kwargs, **options):
temp_var.idx = n
op.data[p_idx] = temp_var

multiplier, shift = op.get_parameter_shift(p_idx)
param_shift = op.get_parameter_shift(p_idx)

# shifted parameter values
shift_p1 = np.r_[args, args[idx] + shift]
shift_p2 = np.r_[args, args[idx] - shift]
for multiplier, a, shift in param_shift:

# evaluate the circuit at two points with shifted parameter values
y2 = np.asarray(self.evaluate(shift_p1, kwargs))
y1 = np.asarray(self.evaluate(shift_p2, kwargs))
pd += (y2 - y1) * multiplier
# shifted parameter values
shift_p = np.r_[args, a * args[idx] + shift]
Contributor:
Think I need to understand what r_ is doing here 🤔

Member:
np.r_ is a really nice way of doing efficient row-wise concatenation using slices :)

Member:
It's essentially a slightly more efficient way of doing np.concatenate([args, np.array([a * args[idx] + shift])]), avoiding the need to create intermediate arrays

Contributor:
Oh nice! My follow up thought was why this is a use case for doing that 🤔
E.g.
params = [1, 2, 3]
idx = 1, shift = 0.5
This means that
shift_p = [1, 2, 3, 2.5]? I guess I was expecting [1, 2.5, 3].

Member:
Yeah, the gradient rules in non-tape mode are written a bit strangely 😆 Before shifting a parameter, a new 'temp' parameter is added to the circuit. It is this parameter that is shifted; it is then deleted once the gradient has been computed.

The tape mode implementation is a lot 'cleaner' imho
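
The example from this thread can be reproduced directly. The sketch below uses the values quoted above (`params = [1, 2, 3]`, `idx = 1`, `shift = 0.5`) together with a scaling factor `a = 1`, and shows that `np.r_` appends the shifted value as a new trailing entry, giving the same result as `np.concatenate`:

```python
import numpy as np

args = np.array([1.0, 2.0, 3.0])
idx, a, shift = 1, 1, 0.5

# np.r_ concatenates its arguments along the first axis
shift_p = np.r_[args, a * args[idx] + shift]
same = np.concatenate([args, np.array([a * args[idx] + shift])])

print(shift_p.tolist())  # [1.0, 2.0, 3.0, 2.5]
print(np.array_equal(shift_p, same))
```

The extra trailing entry corresponds to the temporary parameter with index `n` that non-tape mode adds before shifting.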

Member (josh146, Nov 24, 2020):
See the corresponding logic in QubitQNode._pd_analytic:

def _pd_analytic(self, idx, args, kwargs, **options):
    n = self.num_variables
    pd = 0.0
    # find the Operators in which the free parameter appears, use the product rule
    for op, p_idx in self.variable_deps[idx]:

        # We temporarily edit the Operator such that parameter p_idx is replaced by a new one,
        # which we can modify without affecting other Operators depending on the original.
        orig = op.data[p_idx]
        assert orig.idx == idx

        # reference to a new, temporary parameter with index n, otherwise identical with orig
        temp_var = copy.copy(orig)
        temp_var.idx = n
        op.data[p_idx] = temp_var

        param_shift = op.get_parameter_shift(p_idx)

Contributor Author:
Here I mainly kept the implementation as was before.


# evaluate the circuit at point with shifted parameter values
y = np.asarray(self.evaluate(shift_p, kwargs))

# add the contribution to the partial derivative
pd += multiplier * y

# restore the original parameter
op.data[p_idx] = orig
Expand Down
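
Because the qubit loop evaluates one circuit per recipe term, a recipe with more than two terms needs no special handling there. As an illustration only (this specific recipe does not appear in the diff), the sketch below encodes a four-term shift rule with shifts of ±π/2 and ±3π/2 as `[c, a, s]` triples and checks it on a toy function containing the frequencies 1/2 and 1. The function `f` and the helper `apply_recipe` are hypothetical stand-ins for a circuit evaluation.

```python
import numpy as np

# Four-term recipe written as [c, a, s] triples (illustrative values).
c_plus = (np.sqrt(2) + 1) / (4 * np.sqrt(2))
c_minus = (np.sqrt(2) - 1) / (4 * np.sqrt(2))
four_term = [
    [c_plus, 1, np.pi / 2],
    [-c_plus, 1, -np.pi / 2],
    [-c_minus, 1, 3 * np.pi / 2],
    [c_minus, 1, -3 * np.pi / 2],
]


def apply_recipe(f, phi, recipe):
    """Evaluate sum_i c_i * f(a_i * phi + s_i)."""
    return sum(c * f(a * phi + s) for c, a, s in recipe)


def f(x):
    # toy stand-in for an expectation value with frequencies 1/2 and 1
    return 0.7 * np.cos(x / 2) + 0.2 * np.sin(x) + 0.1


def df(x):
    # exact derivative of the toy function, for comparison
    return -0.35 * np.sin(x / 2) + 0.2 * np.cos(x)


phi = 0.4
print(apply_recipe(f, phi, four_term), df(phi))
```

The same recipe could not be written in the old `(c_k, s_k)` form, which allowed only one multiplier and one shift per parameter.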