[question] Camera intrinsics matrix from cameras.sfm #2326

Open
AndreaMaestri18 opened this issue Feb 21, 2024 · 10 comments

@AndreaMaestri18

AndreaMaestri18 commented Feb 21, 2024

Describe the problem
I am trying to obtain the projection coordinates on an image of a 3D point of the mesh generated with Meshroom. However, when constructing the camera intrinsic matrix as described in https://en.wikipedia.org/wiki/Camera_resectioning, I get wrong results.
This is what I get from cameras.sfm after structure from motion:

        {  "intrinsicId": "1128448763",
            "width": "4000",
            "height": "3000",
            "sensorWidth": "6.1699999999999999",
            "sensorHeight": "4.6275000000000004",
            "serialNumber": "6a3bff7cff7dffff206affff13ff3028",
            "type": "radial3",
            "initializationMode": "estimated",
            "initialFocalLength": "3.6099999999999994",
            "focalLength": "3.5988558298758133",
            "pixelRatio": "1",
            "pixelRatioLocked": "true",
            "principalPoint": [
                "13.498251531673015",
                "-26.248889380993322"
            ],
            "distortionInitializationMode": "none",
            "distortionParams": [
                "-0.0063465089616403896",
                "0.0030250668407204571",
                "-0.0017936039159354772"
            ],
            "undistortionOffset": [
                "0",
                "0"
            ],
            "undistortionParams": "",
            "locked": "true"
        }
    ],
    "poses": [
        {
            "poseId": "15822413",
            "pose": {
                "transform": {
                    "rotation": [
                        "-0.35775470843537033",
                        "0.72431172821648682",
                        "-0.58939298346720159",
                        "-0.00060470419753410275",
                        "0.6309865263412392",
                        "0.77579355366530955",
                        "0.93381540088239956",
                        "0.27790020500867246",
                        "-0.2253004064154816"
                    ],
                    "center": [
                        "29.711263680710861",
                        "-35.714602995244938",
                        "11.560545223684104"
                    ]
                },
                "locked": "1"
            }
        },
        ...

from here I did the following:

import numpy as np

f = 3.5988558298758133    # focalLength (mm)
mx = 6.1699999999999999   # sensorWidth (mm)
my = 4.6275000000000004   # sensorHeight (mm)
px = 13.498251531673015
py = -26.248889380993322
p_w = np.array([12.168, 20.072, 15.644, 1])  # point coordinates in world
t = np.array([29.711263680710861, -35.714602995244938, 11.560545223684104])  # center of camera in world

K = np.array([
    [f/mx, 0, px],
    [0, f/my, py],
    [0, 0, 1]
])
R = np.array([[-0.35775470843537033, -0.00060470419753410275, 0.93381540088239956],
              [0.72431172821648682, 0.6309865263412392, 0.27790020500867246],
              [-0.58939298346720159, 0.77579355366530955, -0.2253004064154816]], dtype="double")

I = np.identity(3)
q = np.zeros((3, 4))
q[0:3, 0:3] = I
q[0:3, 3] = -t
M = K @ R @ q
point_image = M @ p_w / (M @ p_w)[2]

obtaining point_image = array([ 13.60954989, -25.90018649, 1. ]), which is unfortunately incorrect. Therefore my questions are the following:

  1. Is K correct?
  2. Is R correct?
  3. What are the units of the principal point? Is it in mm or in pixel coordinates?
  4. Ultimately, what am I doing wrong here?

Additional info: both the coordinates of the point and the camera center are in meters (obtained from Blender), but I guess rescaling everything to mm would not make a difference. Is this wrong?

Desktop (please complete the following and other pertinent information):

  • OS: [linux]
  • Python version [e.g. 3.10]
  • Meshroom version: please specify if you are using a release version or your own build
    • Binary version (if applicable) [e.g. 2023]
@simogasp
Member

simogasp commented Feb 21, 2024

px and py are offsets with respect to the center of the image. I know it's not the standard way that everybody uses, but here you have to add px and py to width/2 and height/2, respectively, to get the real principal point.
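
For instance, with the values from the json above, the conversion looks like this (a minimal sketch in plain Python; the variable names are mine):

width, height = 4000, 3000
px_offset, py_offset = 13.498251531673015, -26.248889380993322  # "principalPoint" from the json

cx = width / 2 + px_offset   # ≈ 2013.5 px
cy = height / 2 + py_offset  # ≈ 1473.8 px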

Also, I never remember in which format, row-major or column-major, the matrices are saved. If after fixing the px and py problem you still have a large projection error, try to read R and transpose it before using it.

@simogasp
Member

Also, for the focal length, again it is not a standard format, as it is expressed in mm. To get it back in pixels you can use the formula

pxFocalLength = (focalLength / sensorWidth) * std::max(image().Width(), image().Height());
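
Plugging in the numbers from the json above gives roughly (a quick sanity-check sketch in Python, not the actual AliceVision code):

focal_length_mm = 3.5988558298758133   # "focalLength" from the json, in mm
sensor_width_mm = 6.17                 # "sensorWidth" from the json, in mm
width, height = 4000, 3000

px_focal_length = (focal_length_mm / sensor_width_mm) * max(width, height)  # ≈ 2333.1 px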

@simogasp
Member

As for the matrix, it should be stored in column-major order, as per the default in Eigen:
https://eigen.tuxfamily.org/dox/group__TopicStorageOrders.html
So your R should be OK.
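
In numpy terms, column-major just means the nine rotation values fill the matrix column by column, e.g. (a sketch; vals is the flat "rotation" list from the json above):

import numpy as np

vals = [-0.35775470843537033, 0.72431172821648682, -0.58939298346720159,
        -0.00060470419753410275, 0.6309865263412392, 0.77579355366530955,
        0.93381540088239956, 0.27790020500867246, -0.2253004064154816]
R = np.array(vals).reshape(3, 3, order='F')  # 'F' = column-major (Fortran) order

which gives exactly the R written out by hand in the snippets above.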

@AndreaMaestri18
Author

hey @simogasp, thanks a lot!

The projection is still way off unfortunately.

pxFocalLength = (f / mx) * 4000
pyFocalLength = (f / my) * 3000
K = np.array([
    [pxFocalLength, 0, 2000 + px],
    [0, pyFocalLength, 1500 + py],
    [0, 0, 1]
])
R = np.array([[-0.35775470843537033, -0.00060470419753410275, 0.93381540088239956],
              [0.72431172821648682, 0.6309865263412392, 0.27790020500867246],
              [-0.58939298346720159, 0.77579355366530955, -0.2253004064154816]], dtype="double")
RT = R.T
I = np.identity(3)
q = np.zeros((3, 4))
q[0:3, 0:3] = I
q[0:3, 3] = -t
M = K @ R @ q
print(M @ p_w / (M @ p_w)[2])

gives array([2.45869170e+03, 2.51985979e+03, 1.00000000e+00]).
This should be the result (see green box):
[Screenshot 2024-02-21 at 18 03 57: expected projection]
but I get:
[Screenshot 2024-02-21 at 18 04 41: actual projection]

Also, why is it pxFocalLength = (focalLength / sensorWidth) * std::max(image().Width(), image().Height()); and not times the width for pxFocalLength and times the height for pyFocalLength?

@simogasp
Member

simogasp commented Feb 21, 2024

also why is it pxFocalLength = (focalLength / sensorWidth) *std::max(image().Width(), image().Height()); and not times the width for pxFocalLength and times the height for pyFocalLength?

It's coming from here when reading the EXIF:
https://github.com/alicevision/AliceVision/blob/57cc8a02f653ce1f754cda2dcf8a3cf517405bf0/src/aliceVision/sfmDataIO/viewIO.cpp#L193

and from here when reading from the JSON:
https://github.com/alicevision/AliceVision/blob/57cc8a02f653ce1f754cda2dcf8a3cf517405bf0/src/aliceVision/sfmDataIO/jsonIO.cpp#L287
Here it is just

fx = (fmm / sensorWidth) * double(width);

with fmm the focal length in mm from the JSON and fx the focal length on x in pixels.

It's confusing because the focal length in pixels is always used internally for all computations, but it's exported in mm for compatibility with the ABC format and software like Maya, Blender, and so on.

@fabiencastan @servantftechnicolor can you check if it is the right conversion?

I see that you transpose R in the code snippet. I imagine it does not work without transposing either, does it?

@AndreaMaestri18
Author

Yeah, indeed it doesn't work without transposing R either. Is perhaps the way I compute the coordinates wrong?

@simogasp
Member

Just to be sure, because I don't speak numpy: I was assuming that

M = K@R@q

is the matrix product of the matrices, right? So that we correctly have K * [R | -R*t]. Am I right?
(You'd better not call it t, because it could confuse people: that is the center c of the camera in world coordinates, and the actual t of the roto-translation matrix is t = -R*c.)
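
In numpy, with c denoting the camera center (what the snippets above call t), that product would read along these lines (a sketch; the names P, x_h and pixel are mine):

c = np.array([29.711263680710861, -35.714602995244938, 11.560545223684104])  # camera center in world
t = -R @ c                               # the actual translation of the roto-translation
P = K @ np.hstack([R, t.reshape(3, 1)])  # 3x4 projection matrix, i.e. K [R | -R c]
x_h = P @ p_w                            # homogeneous image coordinates
pixel = x_h / x_h[2]                     # normalize to get pixel coordinates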

@AndreaMaestri18
Author

Thanks a lot! I got it just now. All the things you suggested were correct:

  • translating the principal point by adding width/2 and height/2
  • FX = (f/sensor_width)*width
  • FY = (f/sensor_height)*height

The reason it was not working for me is that I was reading the coordinates of the point in 3D space from Blender, which flips the axes. I was therefore expecting something that would never happen 😂

But now it works perfectly, even without distortion parameters! Thanks again!!

@simogasp
Member

simogasp commented Feb 22, 2024

The reason it was not working for me is that I was reading the coordinates of the point in 3D space from Blender, which flips the axes.

That was my next question. I was smelling the usual problem with the different conventions used for expressing the camera frame from computer graphics and computer vision... It's always the usual suspect! ;-)

Just for future reference would you mind posting your working snippet of code, like the one above? Thanks!

@AndreaMaestri18
Author

Yes, of course! From the JSON I posted at the beginning of the question I get the data, then:

import numpy as np

# info from json
f = 3.5988558298758133
mx = 6.1699999999999999
my = 4.6275000000000004
px = 13.498251531673015
py = -26.248889380993322
width = 4000
height = 3000

# points
p_w = np.array([12.187, -20.025,  -16.133, 1]) # point coordinates in world
t = np.array([29.711263680710861,-35.714602995244938, 11.560545223684104]) # center of camera in world

pxFocalLength = (f / mx) * width
pyFocalLength = (f / my) * height

K = np.array([
    [pxFocalLength , 0, px+width/2 ],
    [0, pyFocalLength, py+height/2],
    [0, 0, 1]
])
R = np.array([[-0.35775470843537033, -0.00060470419753410275, 0.93381540088239956],
            [0.72431172821648682, 0.6309865263412392, 0.27790020500867246],
            [-0.58939298346720159, 0.77579355366530955, -0.2253004064154816]], dtype = "double"
            )

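# build the extrinsics [R | -R·c]; note the variable t here holds the camera center c in world coordinates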
q = np.zeros((3, 4))
q[0:3, 0:3] = R
q[0:3, 3] = -np.dot(R, t)

M = np.dot(K, q)

pixel_coordinates = np.dot(M, p_w) / np.dot(M, p_w)[2]

and that works perfectly :)

thanks again
