This repository was archived by the owner on Dec 2, 2023. It is now read-only.

tangent.grad_dot fails with (3,) (3,) arguments #59

Open
stefdoerr opened this issue Feb 23, 2018 · 6 comments

stefdoerr commented Feb 23, 2018

After I build my gradient function with tangent.grad, calling it fails with the following error:

/shared/sdoerr/Software/miniconda3/lib/python3.6/site-packages/tangent/utils.py in grad_dot(dy, x1, x2)
    773       numpy.sum(x2, axis=tuple(numpy.arange(numpy.ndim(x2) - 2)))))
    774   dy_x2 = numpy.sum(dy, axis=tuple(-numpy.arange(numpy.ndim(x2) - 2) - 2))
--> 775   return numpy.reshape(numpy.dot(dy_x2, x2_t), numpy.shape(x1))
    776 
    777 

ValueError: shapes (1,1) and (3,1) not aligned: 1 (dim 1) != 3 (dim 0)

ipdb> x1
array([ 0.63199997, -0.01399994,  1.66399956])
ipdb> x2
array([1.32600021, 1.09599972, 0.45800018])
ipdb> dy
array([[0.00041678]])

I had to use np.dot(x[jj], np.reshape(x[kk], (-1, 1))) to work around it. Not a huge issue, but it could confuse users.
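For reference, with two 1-D arguments np.dot is an inner product, so the gradient with respect to x1 is simply the upstream gradient times x2, with the same shape as x1. The sketch below uses plain NumPy and is not tangent's code: `inner_product_grad` is a hypothetical helper, and collapsing the (1, 1) seed to a scalar is an assumption about the intended result. A finite-difference check is included for comparison.

```python
import numpy as np

def inner_product_grad(dy, x1, x2):
    """Gradient of np.dot(x1, x2) w.r.t. x1 for 1-D x1 and x2 (hypothetical helper)."""
    dy = np.asarray(dy).reshape(())  # assumption: collapse a size-1 seed to a scalar
    return dy * x2

# Values taken from the debugger session above.
x1 = np.array([0.63199997, -0.01399994, 1.66399956])
x2 = np.array([1.32600021, 1.09599972, 0.45800018])
dy = np.array([[0.00041678]])

grad = inner_product_grad(dy, x1, x2)

# Central finite differences along each coordinate of x1, scaled by the seed.
eps = 1e-6
numeric = np.array([
    (np.dot(x1 + eps * e, x2) - np.dot(x1 - eps * e, x2)) / (2 * eps)
    for e in np.eye(3)
]) * dy.item()
```

The result has shape (3,), matching x1, which is what the reshape at line 775 of the traceback appears to be aiming for.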

mdanatg commented Feb 24, 2018

Do you have the original code and arguments you used for gradients?

A possible cause might be that tangent.grad assumes scalar output - can you try to see if using tangent.autodiff works instead (that will require you to supply an initial gradient value, if it's not scalar).

stefdoerr commented

import tangent
import numpy as np

def test(x, y):
    return np.dot(x, y)

xxx = tangent.grad(test)
xxx(np.random.rand(1, 3), np.random.rand(1, 3))

mdanatg commented Feb 26, 2018

I see. The error originates in np.dot itself: the matrix shapes don't align. For a valid matrix multiplication the call should be xxx(np.random.rand(1, 3), np.random.rand(3, 1)), which matches the reshape you mentioned.

For example, the following code fails with the same kind of error, with no tangent involved:

import numpy as np

def test(x, y):
  return np.dot(x, y)

test(np.random.rand(1, 3), np.random.rand(1, 3))

That said, I think it would be useful to wrap errors and more clearly indicate when an error originates in the forward code.
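One way such wrapping could look from the user's side is to run the plain function once before differentiating it, so a shape problem is reported as a forward-pass failure. This is only a sketch: `check_forward` and `dot_product` are hypothetical names, not part of tangent.

```python
import numpy as np

def check_forward(f, *args):
    """Run f once and re-raise any failure, flagged as a forward-pass error.

    Hypothetical helper, not part of tangent.
    """
    try:
        return f(*args)
    except Exception as exc:
        raise RuntimeError(
            f"forward pass of {f.__name__} failed before any gradient code ran: {exc}"
        ) from exc

def dot_product(x, y):
    return np.dot(x, y)

x = np.random.rand(1, 3)

# (1, 3) . (3, 1): inner dimensions match, result is a 1 x 1 matrix.
ok = check_forward(dot_product, x, np.random.rand(3, 1))

# (1, 3) . (1, 3): inner dimensions 3 and 1 do not match, so np.dot raises.
try:
    check_forward(dot_product, x, np.random.rand(1, 3))
    message = None
except RuntimeError as err:
    message = str(err)
```

If the check passes and the gradient call still fails, the problem is more likely in the generated gradient code than in the original function.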

stefdoerr commented

Oh right, sorry. Still, you can write a version that works with plain numpy but fails under tangent, although the error is a different one now:

import tangent
import numpy as np

def test(x, y):
    return np.dot(x[0, :], y[0, :])

test(np.random.rand(1, 3), np.random.rand(1, 3))  # runs
xxx = tangent.grad(test)  # doesn't
xxx(np.random.rand(1, 3), np.random.rand(1, 3))

mdanatg commented Feb 26, 2018

Yes, it seems that we have a bug in the handling of slice operators. This might be insufficient for your immediate needs, but it should run:

def test(x, y):
    return np.dot(x, y)[0, 0]  # Use matrix multiply instead of inner product

test(np.random.rand(1, 3), np.random.rand(3, 1))
xxx = tangent.grad(test)
xxx(np.random.rand(1, 3), np.random.rand(3, 1))
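If the data is stored as two (1, 3) rows, as in the earlier snippet, a quick NumPy check (no tangent involved) confirms that the matrix-multiply form computes the same number as the inner product of the rows once the second argument is transposed. Doing the transpose outside the function is an assumption made here to keep slicing and transposes out of the differentiated code; `inner_rows` and `matmul_scalar` are hypothetical names.

```python
import numpy as np

def inner_rows(x, y):
    # Original formulation: inner product of the first rows.
    return np.dot(x[0, :], y[0, :])

def matmul_scalar(x, y_col):
    # Workaround formulation: (1, 3) . (3, 1) -> (1, 1), then pick the single entry.
    return np.dot(x, y_col)[0, 0]

rng = np.random.default_rng(0)
x = rng.random((1, 3))
y = rng.random((1, 3))

a = inner_rows(x, y)
b = matmul_scalar(x, y.T)  # transpose done outside the function
```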

Alternatively, you could use tangent.autodiff which does not assume the result is a scalar. Implementation-wise they are very similar, but tangent.autodiff is more technically correct in that case:

def test(x, y):
    return np.dot(x, y)  # Result is a 1 x 1 matrix

test(np.random.rand(1, 3), np.random.rand(3, 1))
xxx = tangent.autodiff(test)
# Add a third argument for the gradient seed, which must match the shape of test's result.
xxx(np.random.rand(1, 3), np.random.rand(3, 1), np.ones((1, 1)))
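As a cross-check on what the seed means: for z = np.dot(x, y) with a seed dz shaped like z, reverse mode should give dx = np.dot(dz, y.T) and dy = np.dot(x.T, dz). The sketch below is the textbook vector-Jacobian product written in plain NumPy, not tangent's generated code, and `dot_vjp` is a hypothetical name.

```python
import numpy as np

def dot_vjp(x, y, dz):
    """Reverse-mode gradients of z = np.dot(x, y) for 2-D x and y, given seed dz."""
    return np.dot(dz, y.T), np.dot(x.T, dz)

rng = np.random.default_rng(0)
x = rng.random((1, 3))
y = rng.random((3, 1))
dz = np.ones((1, 1))  # seed matching the 1 x 1 result

dx, dy = dot_vjp(x, y, dz)
```

With a seed of ones, dx equals y.T and dy equals x.T, which gives a handy reference value to compare against whatever the generated function returns.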

mdanatg commented Feb 26, 2018

Opened #62 for the slice issue.
