Add support for arbitrary linear combination gradient recipes (PennyLaneAI#909)

* Have positive and negative multiplier and shift values

* No print

* Formatting

* 3 element terms for grad_recipes; qubit okay; CV draft

* CV for tape mode

* Comments

* Remove unused

* Formatting

* Solve casting by specifying dtype at creation

* No casting needed for shifted

* Update module docstring and Operation.grad_recipe docstring

* Development guide update

* Wording

* Adding tests; adding error raised for unsupported logic for tape second-order CV case

* No f strings

* Update pennylane/qnodes/cv.py

Co-authored-by: Josh Izaac <[email protected]>

* Update pennylane/tape/tapes/cv_param_shift.py

* Simplify using np.dot in CV param shift tape

* Update tests/qnodes/test_qnode_cv.py

Co-authored-by: Josh Izaac <[email protected]>

* get_parameter_shift in tape mode as per Josh's suggestion; use that

* Changelog

* Update tests/tape/tapes/test_cv_param_shift.py

Co-authored-by: Josh Izaac <[email protected]>

* Update .github/CHANGELOG.md

Co-authored-by: Tom Bromley <[email protected]>

* merge in changes from 915

* Update pennylane/operation.py

Co-authored-by: Tom Bromley <[email protected]>

* Update grad recipe formulae as per Tom's suggestions

* Update other formula in comment

* CHANGELOG

* Add rendering img url approach

* Plus

* Update pennylane/operation.py

Co-authored-by: Tom Bromley <[email protected]>

* Applying review suggestions

* Update doc/development/plugins.rst

* Update pennylane/operation.py

* equation formatting fixes

Co-authored-by: Josh Izaac <[email protected]>
Co-authored-by: Tom Bromley <[email protected]>
3 people authored and alejomonbar committed Dec 1, 2020
1 parent 9a85e96 commit 63fe205
Showing 12 changed files with 263 additions and 96 deletions.
14 changes: 14 additions & 0 deletions .github/CHANGELOG.md
@@ -305,6 +305,20 @@

<h3>Breaking changes</h3>

* Updated how parameter-shift gradient recipes are defined for operations, allowing
gradient recipes to be specified as an arbitrary number of terms.
[(#909)](https://github.com/PennyLaneAI/pennylane/pull/909)

Previously, `Operation.grad_recipe` was restricted to two-term parameter-shift formulas.
With this change, a gradient recipe contains elements of the form
:math:`[c_i, a_i, s_i]`, resulting in a gradient recipe of
:math:`\frac{\partial}{\partial\phi_k}f(\phi_k) = \sum_{i} c_i f(a_i \phi_k + s_i)`.

As this is a breaking change, all custom operations with defined gradient recipes must be
updated to continue working with PennyLane 0.13. Note, however, that if `grad_recipe = None`,
the default gradient recipe remains unchanged, corresponding to the two terms
:math:`[c_0, a_0, s_0]=[1/2, 1, \pi/2]` and :math:`[c_1, a_1, s_1]=[-1/2, 1, -\pi/2]`
for every parameter.
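
To illustrate the new format, here is a minimal sketch of a custom operation (the gate itself is a
hypothetical `RX` clone; only the `grad_recipe` layout follows the change described here):

```python
import numpy as np
from pennylane.operation import Operation

class MyRotation(Operation):
    """Hypothetical single-parameter gate illustrating the new recipe layout."""
    num_params = 1
    num_wires = 1
    par_domain = "R"
    grad_method = "A"
    # one nested list per parameter; each term [c_i, a_i, s_i] contributes
    # c_i * f(a_i * phi + s_i) to the derivative
    grad_recipe = ([[0.5, 1.0, np.pi / 2], [-0.5, 1.0, -np.pi / 2]],)

    @classmethod
    def _matrix(cls, *params):
        phi = params[0]
        c, s = np.cos(phi / 2), np.sin(phi / 2)
        return np.array([[c, -1j * s], [-1j * s, c]])
```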

- The `VQECost` class has been renamed to `ExpvalCost` to reflect its general applicability
beyond VQE. Use of `VQECost` is still possible but will result in a deprecation warning.
[(#913)](https://github.com/PennyLaneAI/pennylane/pull/913)
22 changes: 10 additions & 12 deletions doc/development/plugins.rst
@@ -466,21 +466,19 @@ where
* :attr:`~.Operation.grad_method`: the gradient computation method; ``'A'`` for the analytic
method, ``'F'`` for finite differences, and ``None`` if the operation may not be differentiated

* :attr:`~.Operation.grad_recipe`: The gradient recipe for the analytic ``'A'`` method.
This is a list with one tuple per operation parameter. For parameter :math:`k`, the tuple is of
the form :math:`(c_k, s_k)`, resulting in a gradient recipe of
* :attr:`~.Operation.grad_recipe`: The gradient recipe for the analytic ``'A'``
method. This is a tuple with one nested list per operation parameter. For
parameter :math:`\phi_k`, the nested list contains elements of the form
:math:`[c_i, a_i, s_i]`, resulting in a gradient recipe of

.. math:: \frac{d}{d\phi_k}f(O(\phi_k)) = c_k\left[f(O(\phi_k+s_k))-f(O(\phi_k-s_k))\right].
.. math:: \frac{\partial}{\partial\phi_k}f(\phi_k) = \sum_{i} c_i f(a_i \phi_k+s_i),

where :math:`f` is an expectation value that depends on :math:`O(\phi_k)`, an example being
where :math:`f` is the expectation value of an observable, measured on a circuit to which the
operation with parameter :math:`\phi_k` has been applied.

.. math:: f(O(\phi_k)) = \braket{0 | O^{\dagger}(\phi_k) \hat{B} O(\phi_k) | 0}

which is the simple expectation value of the operator :math:`\hat{B}` evolved via the gate
:math:`O(\phi_k)`.

Note that if ``grad_recipe = None``, the default gradient recipe is
:math:`(c_k, s_k)=(1/2, \pi/2)` for every parameter.
Note that if ``grad_recipe = None``, the default gradient recipe containing
the two terms :math:`[c_0, a_0, s_0]=[1/2, 1, \pi/2]` and :math:`[c_1, a_1,
s_1]=[-1/2, 1, -\pi/2]` is assumed for every parameter.
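
As a quick numerical check (a sketch using the built-in ``default.qubit`` device), the default
two-term recipe reproduces the exact derivative of :math:`\langle Z\rangle = \cos\phi` for
``qml.RX``:

```python
import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def f(phi):
    qml.RX(phi, wires=0)
    return qml.expval(qml.PauliZ(0))

phi = 0.37
terms = [[0.5, 1, np.pi / 2], [-0.5, 1, -np.pi / 2]]  # default recipe
grad = sum(c * f(a * phi + s) for c, a, s in terms)
assert np.allclose(grad, -np.sin(phi))  # d/dphi cos(phi) = -sin(phi)
```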

The user can then import this operation directly from your plugin, and use it when defining a QNode:

103 changes: 77 additions & 26 deletions pennylane/operation.py
@@ -65,10 +65,25 @@
transformation on the quadrature operators.
For gates that *are* supported via the analytic method, the gradient recipe
(with multiplier :math:`c_k`, parameter shift :math:`s_k` for parameter :math:`\phi_k`)
works as follows:
.. math:: \frac{\partial}{\partial\phi_k}O = c_k\left[O(\phi_k+s_k)-O(\phi_k-s_k)\right].
.. math:: \frac{\partial}{\partial\phi_k}f = \sum_{i} c_i f(a_i \phi_k+s_i).
where :math:`f` is the expectation value of an observable, measured on a circuit to which the
operation with parameter :math:`\phi_k` has been applied. Each parameter may contribute
multiple terms, indexed by :math:`i`, and the coefficients :math:`[c_i, a_i, s_i]` are
specific to the gate.
For example, qubit operations generated by one of the Pauli matrices reduce to the following
special case, with one positive and one negative shift:
.. math::
\frac{\partial}{\partial\phi_k}f = \frac{1}{2}\left[f \left( \phi_k+\frac{\pi}{2} \right) - f
\left( \phi_k-\frac{\pi}{2} \right)\right],
i.e., so that :math:`[c_0, a_0, s_0]=[1/2, 1, \pi/2]` and :math:`[c_1, a_1, s_1]=[-1/2, 1, -\pi/2]`.
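
As a quick check of this special case: for a gate generated by a Pauli matrix, the expectation
value takes the form :math:`f(\phi) = A + B\cos\phi + C\sin\phi` (a standard fact assumed here), so

.. math::

    \frac{1}{2}\left[f\left(\phi+\frac{\pi}{2}\right) - f\left(\phi-\frac{\pi}{2}\right)\right]
    = \frac{1}{2}\left[(A - B\sin\phi + C\cos\phi) - (A + B\sin\phi - C\cos\phi)\right]
    = -B\sin\phi + C\cos\phi
    = \frac{\partial f}{\partial\phi},

i.e., the two default terms give the exact derivative.
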
CV Operation base classes
~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -613,19 +628,22 @@ def grad_method(self):
return None if self.num_params == 0 else "F"

grad_recipe = None
r"""list[tuple[float]] or None: Gradient recipe for the parameter-shift method.
r"""tuple(Union(list[list[float]], None)) or None: Gradient recipe for the
parameter-shift method.
This is a list with one tuple per operation parameter. For parameter
:math:`k`, the tuple is of the form :math:`(c_k, s_k)`, resulting in
a gradient recipe of
This is a tuple with one nested list per operation parameter. For
parameter :math:`\phi_k`, the nested list contains elements of the form
:math:`[c_i, a_i, s_i]` where :math:`i` is the index of the
term, resulting in a gradient recipe of
.. math:: \frac{\partial}{\partial\phi_k}O = c_k\left[O(\phi_k+s_k)-O(\phi_k-s_k)\right].
.. math:: \frac{\partial}{\partial\phi_k}f = \sum_{i} c_i f(a_i \phi_k + s_i).
If ``None``, the default gradient recipe
:math:`(c_k, s_k)=(1/2, \pi/2)` is assumed for every parameter.
If ``None``, the default gradient recipe containing the two terms
:math:`[c_0, a_0, s_0]=[1/2, 1, \pi/2]` and :math:`[c_1, a_1,
s_1]=[-1/2, 1, -\pi/2]` is assumed for every parameter.
"""

def get_parameter_shift(self, idx):
def get_parameter_shift(self, idx, shift=np.pi / 2):
"""Multiplier and shift for the given parameter, based on its gradient recipe.
Args:
@@ -636,16 +654,32 @@ def get_parameter_shift(self, idx):
"""
# get the gradient recipe for this parameter
recipe = self.grad_recipe[idx]
multiplier, shift = (0.5, np.pi / 2) if recipe is None else recipe

# internal multiplier in the Variable
var_mult = self.data[idx].mult
# Default values
multiplier = 0.5 / np.sin(shift)
a = 1

# We set the default recipe following:
# ∂f(x)/∂x = c*f(x+s) - c*f(x-s)
# where we express a positive and a negative shift by default
default_param_shift = [[multiplier, a, shift], [-multiplier, a, -shift]]
param_shift = default_param_shift if recipe is None else recipe

if hasattr(self.data[idx], "mult"):
# Parameter is a variable, we are in non-tape mode
# Need to use the internal multiplier in the Variable to update the
# multiplier and the shift
var_mult = self.data[idx].mult

for elem in param_shift:

multiplier *= var_mult
if var_mult != 0:
# zero multiplier means the shift is unimportant
shift /= var_mult
return multiplier, shift
# Update the multiplier
elem[0] *= var_mult
if var_mult != 0:
# Update the shift
# zero multiplier means the shift is unimportant
elem[2] /= var_mult
return param_shift

@property
def generator(self):
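
A small sketch of what the updated ``get_parameter_shift`` returns for an operation that falls
back to the default recipe (assumes a PennyLane build containing this change; the operation is
constructed outside a QNode, so no ``Variable`` rescaling applies):

```python
import pennylane as qml

op = qml.RX(0.3, wires=0)          # RX leaves grad_recipe = None
print(op.get_parameter_shift(0))   # expected: [[0.5, 1, pi/2], [-0.5, 1, -pi/2]]
```
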
@@ -1588,16 +1622,33 @@ def heisenberg_pd(self, idx):
"""
# get the gradient recipe for this parameter
recipe = self.grad_recipe[idx]
multiplier = 0.5 if recipe is None else recipe[0]
shift = np.pi / 2 if recipe is None else recipe[1]

# Default values
multiplier = 0.5
a = 1
shift = np.pi / 2

# We set the default recipe as follows:
# ∂f(x)/∂x = c*f(x+s) - c*f(x-s)
default_param_shift = [[multiplier, a, shift], [-multiplier, a, -shift]]
param_shift = default_param_shift if recipe is None else recipe

pd = None # partial derivative of the transformation

p = self.parameters
# evaluate the transform at the shifted parameter values
p[idx] += shift
U2 = self._heisenberg_rep(p) # pylint: disable=assignment-from-none
p[idx] -= 2 * shift
U1 = self._heisenberg_rep(p) # pylint: disable=assignment-from-none
return (U2 - U1) * multiplier # partial derivative of the transformation

original_p_idx = p[idx]
for c, _a, s in param_shift:
# evaluate the transform at the shifted parameter values
p[idx] = _a * original_p_idx + s
U = self._heisenberg_rep(p) # pylint: disable=assignment-from-none

if pd is None:
pd = c * U
else:
pd += c * U

return pd

def heisenberg_tr(self, wires, inverse=False):
r"""Heisenberg picture representation of the linear transformation carried
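
To see what the generalized accumulation in ``heisenberg_pd`` computes, here is a standalone sketch
(assuming the standard phase-space rotation matrix on :math:`(\mathbb{1}, \hat{x}, \hat{p})` as
the Heisenberg representation):

```python
import numpy as np

def heisenberg_rep(phi):
    # Heisenberg-picture matrix of a phase-space rotation acting on (1, x, p)
    return np.array([
        [1, 0, 0],
        [0, np.cos(phi), -np.sin(phi)],
        [0, np.sin(phi), np.cos(phi)],
    ])

phi = 0.7
terms = [[0.5, 1.0, np.pi / 2], [-0.5, 1.0, -np.pi / 2]]  # default recipe
pd = sum(c * heisenberg_rep(a * phi + s) for c, a, s in terms)

# exact elementwise derivative of the matrix with respect to phi
exact = np.array([
    [0, 0, 0],
    [0, -np.sin(phi), -np.cos(phi)],
    [0, np.cos(phi), -np.sin(phi)],
])
assert np.allclose(pd, exact)
```
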
28 changes: 22 additions & 6 deletions pennylane/ops/cv.py
@@ -138,7 +138,9 @@ class Squeezing(CVOperation):
grad_method = "A"

shift = 0.1
grad_recipe = [(0.5 / math.sinh(shift), shift), None]
multiplier = 0.5 / math.sinh(shift)
a = 1
grad_recipe = ([[multiplier, a, shift], [-multiplier, a, -shift]], None)

@staticmethod
def _heisenberg_rep(p):
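
The :math:`1/(2\sinh s)` multiplier used here is exact despite the finite shift :math:`s = 0.1`:
the squeezing Heisenberg matrix elements are linear combinations of :math:`e^{\pm r}` (a standard
fact assumed here), and

.. math::

    \frac{e^{-(r+s)} - e^{-(r-s)}}{2\sinh s} = \frac{e^{-r}\left(e^{-s} - e^{s}\right)}{2\sinh s} = -e^{-r} = \frac{d}{dr}e^{-r},

with the analogous identity holding for :math:`e^{+r}`, so the two terms
:math:`[\pm 1/(2\sinh s),\, 1,\, \pm s]` reproduce the exact derivative.
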
@@ -180,7 +182,9 @@ class Displacement(CVOperation):
grad_method = "A"

shift = 0.1
grad_recipe = [(0.5 / shift, shift), None]
multiplier = 0.5 / shift
a = 1
grad_recipe = ([[multiplier, a, shift], [-multiplier, a, -shift]], None)

@staticmethod
def _heisenberg_rep(p):
@@ -278,8 +282,11 @@ class TwoModeSqueezing(CVOperation):
par_domain = "R"

grad_method = "A"

shift = 0.1
grad_recipe = [(0.5 / math.sinh(shift), shift), None]
multiplier = 0.5 / math.sinh(shift)
a = 1
grad_recipe = ([[multiplier, a, shift], [-multiplier, a, -shift]], None)

@staticmethod
def _heisenberg_rep(p):
@@ -326,8 +333,11 @@ class QuadraticPhase(CVOperation):
par_domain = "R"

grad_method = "A"

shift = 0.1
grad_recipe = [(0.5 / shift, shift)]
multiplier = 0.5 / shift
a = 1
grad_recipe = ([[multiplier, a, shift], [-multiplier, a, -shift]],)

@staticmethod
def _heisenberg_rep(p):
@@ -371,8 +381,11 @@ class ControlledAddition(CVOperation):
par_domain = "R"

grad_method = "A"

shift = 0.1
grad_recipe = [(0.5 / shift, shift)]
multiplier = 0.5 / shift
a = 1
grad_recipe = ([[multiplier, a, shift], [-multiplier, a, -shift]],)

@staticmethod
def _heisenberg_rep(p):
@@ -417,8 +430,11 @@ class ControlledPhase(CVOperation):
par_domain = "R"

grad_method = "A"

shift = 0.1
grad_recipe = [(0.5 / shift, shift)]
multiplier = 0.5 / shift
a = 1
grad_recipe = ([[multiplier, a, shift], [-multiplier, a, -shift]],)

@staticmethod
def _heisenberg_rep(p):
37 changes: 27 additions & 10 deletions pennylane/qnodes/cv.py
@@ -181,20 +181,37 @@ def _pd_analytic(self, idx, args, kwargs, **options):
temp_var.idx = n
op.data[p_idx] = temp_var

multiplier, shift = op.get_parameter_shift(p_idx)

# shifted parameter values
shift_p1 = np.r_[args, args[idx] + shift]
shift_p2 = np.r_[args, args[idx] - shift]
param_shift = op.get_parameter_shift(p_idx)

if not force_order2 and op.use_method != "B":
# basic parameter-shift method, for Gaussian CV gates
# succeeded by order-1 observables
# evaluate the circuit at two points with shifted parameter values
y2 = np.asarray(self.evaluate(shift_p1, kwargs))
y1 = np.asarray(self.evaluate(shift_p2, kwargs))
pd += (y2 - y1) * multiplier
# evaluate the circuit at the shifted parameter values and accumulate
# the linear combination of the results (in most cases two points)
for multiplier, a, shift in param_shift:

# shifted parameter values
shift_p = np.r_[args, a * args[idx] + shift]

term = multiplier * np.asarray(self.evaluate(shift_p, kwargs))
pd += term
else:
if len(param_shift) != 2:
# The 2nd order CV parameter-shift rule only accepts two-term shifts
raise NotImplementedError(
"Taking the analytic gradient for order-2 operators is "
"unsupported for {op} which contains a parameter with a "
"gradient recipe of more than two terms."
)

# Get the shifts and the multipliers
pos_multiplier, a1, pos_shift = param_shift[0]
neg_multiplier, a2, neg_shift = param_shift[1]

# shifted parameter values
shift_p1 = np.r_[args, a1 * args[idx] + pos_shift]
shift_p2 = np.r_[args, a2 * args[idx] + neg_shift]

# order-2 parameter-shift method, for gaussian CV gates
# succeeded by order-2 observables
# evaluate transformed observables at the original parameter point
@@ -203,7 +220,7 @@ def _pd_analytic(self, idx, args, kwargs, **options):
Z2 = op.heisenberg_tr(self.device.wires)
self._set_variables(shift_p2, kwargs)
Z1 = op.heisenberg_tr(self.device.wires)
Z = (Z2 - Z1) * multiplier # derivative of the operation
Z = pos_multiplier * Z2 + neg_multiplier * Z1 # derivative of the operation

unshifted_args = np.r_[args, args[idx]]
self._set_variables(unshifted_args, kwargs)
18 changes: 10 additions & 8 deletions pennylane/qnodes/qubit.py
@@ -128,16 +128,18 @@ def _pd_analytic(self, idx, args, kwargs, **options):
temp_var.idx = n
op.data[p_idx] = temp_var

multiplier, shift = op.get_parameter_shift(p_idx)
param_shift = op.get_parameter_shift(p_idx)

# shifted parameter values
shift_p1 = np.r_[args, args[idx] + shift]
shift_p2 = np.r_[args, args[idx] - shift]
for multiplier, a, shift in param_shift:

# evaluate the circuit at two points with shifted parameter values
y2 = np.asarray(self.evaluate(shift_p1, kwargs))
y1 = np.asarray(self.evaluate(shift_p2, kwargs))
pd += (y2 - y1) * multiplier
# shifted parameter values
shift_p = np.r_[args, a * args[idx] + shift]

# evaluate the circuit at point with shifted parameter values
y = np.asarray(self.evaluate(shift_p, kwargs))

# add the contribution to the partial derivative
pd += multiplier * y

# restore the original parameter
op.data[p_idx] = orig
1 change: 1 addition & 0 deletions pennylane/tape/qnode.py
@@ -112,6 +112,7 @@ class QNode:
h=1e-7 (float): step size for the finite difference method
order=1 (int): The order of the finite difference method to use. ``1`` corresponds
to forward finite differences, ``2`` to centered finite differences.
shift=pi/2 (float): the size of the shift for two-term parameter-shift gradient computations
**Example**
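
A hedged usage sketch of these options (assumes PennyLane 0.13-style tape mode and that gradient
options such as ``shift`` are forwarded to the parameter-shift method, as the docstring entry
above suggests):

```python
import pennylane as qml

qml.enable_tape()  # tape mode, as used in PennyLane 0.13
dev = qml.device("default.qubit", wires=1)

# the shift keyword here is the assumption being illustrated
@qml.qnode(dev, diff_method="parameter-shift", shift=0.3)
def circuit(phi):
    qml.RX(phi, wires=0)
    return qml.expval(qml.PauliZ(0))
```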
