# Co-SLAM Documentation

### [Paper](https://arxiv.org/pdf/2304.14377.pdf) | [Project Page](https://hengyiwang.github.io/projects/CoSLAM) | [Video](https://hengyiwang.github.io/projects/Co-SLAM/videos/presentation.mp4)

> Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM <br />
> [Hengyi Wang](https://hengyiwang.github.io/), [Jingwen Wang](https://jingwenwang95.github.io/), [Lourdes Agapito](http://www0.cs.ucl.ac.uk/staff/L.Agapito/)<br />
> CVPR 2023

This is the documentation for Co-SLAM. It covers data capture and the details of the different hyper-parameters.

## Create your own dataset using an iPhone/iPad Pro

1. Download [Stray Scanner](https://apps.apple.com/us/app/stray-scanner/id1557051662) from the App Store.
2. Record an RGB-D video.
3. The output folder should be at `Files/On My iPad/Stray Scanner/`.
4. Create your own config file for the dataset and set `dataset: 'iphone'`; check `./configs/iPhone` for details. (Note that the intrinsics given by Stray Scanner should be divided by 7.5 to align with the RGB-D frame resolution; see the sketch below.)
5. Create a config file for your specific scene. You can define the scene bound yourself, or use the provided `vis_bound.ipynb` to determine it.

NOTE: The resolution of the depth map is (256, 192), which is quite small, so the camera tracking of neural SLAM will not be very robust on the iPhone dataset. It is recommended to use a RealSense camera for data capture. Any suggestions on this are welcome.
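As a concrete illustration of step 4, the sketch below rescales the Stray Scanner RGB intrinsics by 7.5 so that they match the (256, 192) depth frames. The file name and CSV layout are assumptions about a typical Stray Scanner export, not part of the Co-SLAM code.

```python
import numpy as np

# Assumed layout: Stray Scanner exports per-frame 3x3 intrinsics as rows of a CSV
# (check your own capture folder for the exact file name). RGB is typically
# captured at 1920x1440 while depth is 256x192, hence the factor of 7.5.
K_rgb = np.atleast_2d(np.loadtxt('camera_matrix.csv', delimiter=','))[0].reshape(3, 3)

scale = 7.5
fx, fy = K_rgb[0, 0] / scale, K_rgb[1, 1] / scale
cx, cy = K_rgb[0, 2] / scale, K_rgb[1, 2] / scale

# These are the values to put into your scene config.
print(f'fx: {fx:.2f}, fy: {fy:.2f}, cx: {cx:.2f}, cy: {cy:.2f}')
```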

## Parameters

### Tracking

```python
iter: 10 # num of iterations for tracking
sample: 1024 # num of sampled pixels for tracking
pc_samples: 40960 # num of samples for tracking using the point-cloud loss (not used)
lr_rot: 0.001 # lr for rotation
lr_trans: 0.001 # lr for translation
ignore_edge_W: 20 # num of edge pixels to ignore (W)
ignore_edge_H: 20 # num of edge pixels to ignore (H)
iter_point: 0 # num of iterations for tracking using the point-cloud loss (not used)
wait_iters: 100 # stop optimizing if there is no improvement for this many iterations
const_speed: True # constant-speed assumption for initializing the pose
best: True # use the pose with the smallest loss (otherwise use the last pose)
```
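For reference, below is a minimal sketch of what a constant-speed pose initialization (`const_speed: True`) amounts to: the relative motion between the two previous frames is applied once more to predict the new pose. The function and variable names are illustrative, not taken from the Co-SLAM source.

```python
import numpy as np

def init_pose_const_speed(c2w_prev: np.ndarray, c2w_prev2: np.ndarray) -> np.ndarray:
    """Extrapolate the pose of frame t from frames t-1 and t-2 (4x4 camera-to-world matrices)."""
    delta = c2w_prev @ np.linalg.inv(c2w_prev2)  # relative motion from t-2 to t-1
    return delta @ c2w_prev                      # apply the same motion again for frame t
```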


### Mapping

```python
sample: 2048 # num of pixels used for BA
first_mesh: True # save the first mesh
iters: 10 # num of iterations for BA

# lr for the representation. An interesting observation: if you set
# lr_embed=0.001 and lr_decoder=0.01, the decoder relies more on the
# coordinate encoding, which results in better completion. This is
# suitable for room-scale scenes, but not for TUM RGB-D...
lr_embed: 0.01 # lr for HashGrid
lr_decoder: 0.01 # lr for decoder

lr_rot: 0.001 # lr for rotation
lr_trans: 0.001 # lr for translation
keyframe_every: 5 # select a keyframe every 5 frames
map_every: 5 # perform BA every 5 frames
n_pixels: 0.05 # fraction of pixels saved for each frame
first_iters: 500 # num of iterations for first-frame mapping

# As we perform global BA, we need to make sure that 1) every iteration
# contains samples from the current frame, and 2) we do not sample too many
# pixels from the current frame, which may introduce bias. It is suggested
# to set min_pixels_cur = 0.01 * #samples.
optim_cur: False # for challenging scenes, avoid optimizing the current frame pose during BA
min_pixels_cur: 20 # min pixels sampled from the current frame
map_accum_step: 1 # num of steps for accumulating gradients for the model
pose_accum_step: 5 # num of steps for accumulating gradients for the poses
map_wait_step: 0 # wait n iterations before starting to update the model
filter_depth: False # whether to filter out depth outliers
```
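The two constraints above can be met with a sampling scheme along the lines of the sketch below, which always reserves `min_pixels_cur` of the `sample` budget for the current frame. Shapes and names are illustrative, not Co-SLAM's actual data structures.

```python
import torch

def sample_global_ba_rays(kf_rays: torch.Tensor, cur_rays: torch.Tensor,
                          n_samples: int = 2048, min_pixels_cur: int = 20) -> torch.Tensor:
    """kf_rays: (N_kf, C) rays stored from past keyframes; cur_rays: (N_cur, C) rays of the current frame."""
    n_cur = min_pixels_cur                    # guaranteed share from the current frame
    n_kf = n_samples - n_cur                  # the rest comes from the global keyframe database
    idx_kf = torch.randint(0, kf_rays.shape[0], (n_kf,))
    idx_cur = torch.randint(0, cur_rays.shape[0], (n_cur,))
    return torch.cat([kf_rays[idx_kf], cur_rays[idx_cur]], dim=0)
```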


### Parametric encoding

```python
enc: 'HashGrid' # type of grid; 'DenseGrid' and 'TiledGrid' are also available, as described in tinycudann
tcnn_encoding: True # use tcnn encoding
hash_size: 19 # hash table size; refer to our paper for the setting used on each dataset
voxel_color: 0.08 # voxel size for the color grid (if applicable)
voxel_sdf: 0.04 # voxel size for the SDF grid (a value larger than 10 is treated as the voxel dimension instead, i.e. a fixed resolution)
oneGrid: True # use only one grid (no separate color grid)
```
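A minimal sketch of how such a grid encoding can be instantiated with tinycudann follows; the level count, features per level, and resolutions are placeholder values, since Co-SLAM derives the actual resolutions from the scene bound and the voxel sizes above.

```python
import torch
import tinycudann as tcnn

# Placeholder hyper-parameters; log2_hashmap_size plays the role of the
# hash_size entry above, the other values are only illustrative.
encoding = tcnn.Encoding(
    n_input_dims=3,
    encoding_config={
        "otype": "HashGrid",       # could also be "DenseGrid" or "TiledGrid"
        "n_levels": 16,
        "n_features_per_level": 2,
        "log2_hashmap_size": 19,
        "base_resolution": 16,
        "per_level_scale": 1.5,
    },
)

xyz = torch.rand(1024, 3, device="cuda")  # query points normalized to [0, 1]^3
feat = encoding(xyz)                      # (1024, n_levels * n_features_per_level)
```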


### Coordinate encoding

```python
enc: 'OneBlob' # type of coordinate encoding
n_bins: 16 # number of bins for OneBlob
```
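The one-blob coordinate encoding can likewise be created through tinycudann; this is just a sketch using the two parameters listed above.

```python
import tinycudann as tcnn

coord_enc = tcnn.Encoding(
    n_input_dims=3,
    encoding_config={"otype": "OneBlob", "n_bins": 16},  # n_bins as in the config above
)
```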


### Decoder

```python
geo_feat_dim: 15 # dim of the geometric feature passed to the color decoder
hidden_dim: 32 # hidden dim of the SDF MLP
num_layers: 2 # num of layers of the SDF MLP
num_layers_color: 2 # num of layers of the color MLP
hidden_dim_color: 32 # hidden dim of the color MLP
tcnn_network: False # use a tinycudann MLP instead of a PyTorch MLP
```
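A minimal PyTorch sketch of a decoder with these dimensions: a 2-layer SDF MLP that outputs one SDF value plus a 15-dim geometric feature, and a 2-layer color MLP that consumes that feature together with the coordinate encoding. The input dimensions and wiring are illustrative assumptions, not copied from the Co-SLAM model code.

```python
import torch
import torch.nn as nn

class TwoHeadDecoder(nn.Module):
    def __init__(self, in_dim_sdf: int, in_dim_color: int,
                 hidden_dim: int = 32, geo_feat_dim: int = 15):
        super().__init__()
        # num_layers = 2 -> one hidden layer plus one output layer
        self.sdf_net = nn.Sequential(
            nn.Linear(in_dim_sdf, hidden_dim), nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 1 + geo_feat_dim),  # SDF value + geometric feature
        )
        self.color_net = nn.Sequential(
            nn.Linear(in_dim_color + geo_feat_dim, hidden_dim), nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 3),                 # RGB
        )

    def forward(self, feat_sdf: torch.Tensor, feat_color: torch.Tensor):
        h = self.sdf_net(feat_sdf)
        sdf, geo_feat = h[..., :1], h[..., 1:]
        rgb = torch.sigmoid(self.color_net(torch.cat([feat_color, geo_feat], dim=-1)))
        return sdf, rgb
```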


### Training

```python
rgb_weight: 5.0 # weight of the RGB loss
depth_weight: 0.1 # weight of the depth loss
sdf_weight: 1000 # weight of the SDF loss (within the truncation region)
fs_weight: 10 # weight of the SDF loss (free space)
eikonal_weight: 0 # weight of the eikonal loss (not used)
smooth_weight: 0.001 # weight of the smoothness loss (small, as it is applied to features)
smooth_pts: 64 # dim of the randomly sampled grid for smoothness
smooth_vox: 0.1 # voxel size of the randomly sampled grid for smoothness
smooth_margin: 0.05 # margin of the sampled grid
#n_samples: 256
n_samples_d: 96 # num of sample points for rendering
range_d: 0.25 # range for depth-guided sampling, i.e. [-25cm, 25cm] around the observed depth
n_range_d: 21 # num of depth-guided sample points
n_importance: 0 # num of sample points for importance sampling
perturb: 1 # random perturbation (1: True)
white_bkgd: False
trunc: 0.1 # truncation range (10cm for room-scale scenes, 5cm for TUM RGB-D)
rot_rep: 'quat' # rotation representation (axis-angle does not support identity initialization)
```
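To make the SDF-related weights concrete, below is a simplified sketch of how truncation-based SDF and free-space losses are commonly formulated in neural RGB-D SLAM: inside the truncation band the predicted SDF is regressed against (depth - z) / trunc, and in free space it is pushed towards 1. This illustrates the general technique, not Co-SLAM's exact loss code.

```python
import torch

def sdf_and_fs_loss(z_vals: torch.Tensor, depth: torch.Tensor,
                    sdf_pred: torch.Tensor, trunc: float = 0.1):
    """z_vals: (R, N) sample depths per ray; depth: (R, 1) sensor depth; sdf_pred: (R, N)."""
    front_mask = z_vals < (depth - trunc)             # free space in front of the surface
    sdf_mask = (z_vals - depth).abs() <= trunc        # within the truncation band

    fs_loss = ((sdf_pred[front_mask] - 1.0) ** 2).mean()
    target_sdf = (depth - z_vals) / trunc             # approximate ground-truth SDF in [-1, 1]
    sdf_loss = ((sdf_pred[sdf_mask] - target_sdf[sdf_mask]) ** 2).mean()
    return fs_loss, sdf_loss
```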