Collection of questions #85
Comments
It's just a design choice. I wanted to keep the input tensors as (B, 3, N), but you could certainly use fully connected layers instead.
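For reference, a minimal sketch of what such a point-wise MLP looks like when the queries are kept as (B, 3, N) tensors; the module name and layer sizes here are illustrative, not the repository's actual code:

```python
import torch
import torch.nn as nn

# A point-wise MLP built from kernel-size-1 Conv1d layers. Each of the N
# query points is processed independently, which is equivalent to applying
# a shared fully connected layer per point.
class PointwiseMLP(nn.Module):
    def __init__(self, in_ch=3, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv1d(hidden, 1, 1),
        )

    def forward(self, x):  # x: (B, 3, N)
        return self.net(x)  # (B, 1, N)

points = torch.randn(2, 3, 1024)  # batch of 2, 1024 query points each
out = PointwiseMLP()(points)      # -> (2, 1, 1024)
```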
This is also a design choice. We pose it as a classification problem over inside/outside occupancy rather than a regression problem over an SDF. Both are valid choices.
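The two formulations differ only in the target and the loss; a hedged sketch with made-up tensors (none of these names come from the repository):

```python
import torch
import torch.nn.functional as F

occ_pred = torch.rand(2, 1, 1024)                # predicted occupancy in [0, 1]
occ_gt = (torch.rand(2, 1, 1024) > 0.5).float()  # 1 = inside, 0 = outside

# Classification formulation (PIFu's choice): binary cross-entropy on occupancy.
loss_cls = F.binary_cross_entropy(occ_pred, occ_gt)

# Regression alternative: predict a signed distance and regress it, e.g. with L1.
sdf_pred = torch.randn(2, 1, 1024)
sdf_gt = torch.randn(2, 1, 1024)
loss_reg = F.l1_loss(sdf_pred, sdf_gt)
```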
The problem is how to define "world space". Unlike ShapeNet objects, humans are articulated, which makes it difficult to define a canonicalized space. Unless all the data samples share the same normalization, providing world coordinates is of little help. The multiview PIFu takes xyz information in a view-dependent way, and average pooling consolidates the per-view information to make the final prediction in the shared coordinate space.
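Conceptually, that consolidation step can be as simple as the following sketch (shapes and names are assumptions for illustration):

```python
import torch

num_views, B, C, N = 3, 2, 256, 1024

# One feature tensor per view, computed from view-dependent xyz and image features.
per_view_feats = torch.randn(num_views, B, C, N)

# Average pooling across views yields a single view-independent feature per point,
# so the final prediction is made in the shared coordinate space.
fused = per_view_feats.mean(dim=0)  # (B, C, N)
```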
The feature from the geometry module is concatenated after the image encoder (yes, the figure is misleading). You can find how it is added below: line 79 in 9753311.
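In other words, the geometry feature is appended along the channel dimension before the surface classifier; a small sketch under assumed shapes (not the repository's exact variables):

```python
import torch

img_feat = torch.randn(2, 256, 1024)  # pixel-aligned image-encoder features
geo_feat = torch.randn(2, 64, 1024)   # features from the geometry module

# Channel-wise concatenation: the MLP then sees both sources for every query point.
mlp_input = torch.cat([img_feat, geo_feat], dim=1)  # (2, 320, 1024)
```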
Ahh, I see, thank you! I have one last question: I have changed the entire setup quite dramatically, so I don't know if this also applies to your original code, but I found that using only the last stack (so no intermediate loss) during training dramatically improves the result: it goes from 60% IoU to around 80%.
Sorry for the late reply.
Yes. The original motivation was to let each stack predict consistent feature vectors by sharing the same MLP. On the other hand, this may limit the expressiveness of the reconstruction from the last feature (and it looks like your experiment indicates that too). Please let me know if you publish/release your ongoing experiments somewhere. PIFu can still be improved in many aspects, and I'm curious how others advance this field!
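For concreteness, the two supervision schemes being compared look roughly like this (function and variable names are illustrative):

```python
import torch.nn.functional as F

def loss_all_stacks(preds, gt):
    # Intermediate supervision: average the loss over every hourglass stack.
    # preds is a list of (B, 1, N) occupancy predictions, one per stack.
    return sum(F.binary_cross_entropy(p, gt) for p in preds) / len(preds)

def loss_last_stack(preds, gt):
    # Last-stack-only supervision, the variant reported above to improve IoU.
    return F.binary_cross_entropy(preds[-1], gt)
```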
Hi Robin, I am currently working on understanding PIFu in detail. I am curious how you improved the IoU by such a large margin. Could you please elaborate on your changes? Have you published them somewhere? Thanks a lot!
You can find my changes here, but I made quite a lot of them: https://github.com/GTO2013/PIFu. It is multi-view, and each image can be a different size, for example. I used it to reconstruct cars from blueprints: https://twitter.com/RobinK961/status/1387651500302815233
Thanks for your quick reply! I will check this.
Hi,
I am currently working on understanding PIFu in detail so I can apply it to cars. I already wrote you an email, if you remember. I have it running, and it's doing decently already given the short timeframe :)
Here are my questions:
From this image it looks like the input image is directly added to the feature vector after the encoder (the line below the upper image encoder). But where is this done in the code? I have looked for it but didn't find it. The input always goes through some filtering before the classifier sees it; is that correct?
Sorry for the number of questions; I hope you find some time to answer them.
Thank you!