Visual Odometry is All You Need for Realistic PointGoal Navigation
What scientific questions do we (try to) answer?
Is the visual odometry module all you need to close the gap?
HC 2021 benchmark results
Swapping GT localization for VO costs only a 0.04 SPL hit with any GT-trained nav policy
Significantly outperforms map-based approaches
Plug-and-play usability: no need to fine-tune the policy
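The plug-and-play claim above means the VO module can replace the GPS+Compass sensor: it predicts the agent's egomotion from consecutive observations, and that motion is integrated to re-express the PointGoal in the new agent frame. A minimal sketch of that goal update; the function name and frame conventions are illustrative, not the paper's API:

```python
import numpy as np

def update_goal_estimate(goal_xy, dx, dy, dyaw):
    """Re-express the PointGoal in the new agent frame after one step.

    goal_xy: goal position in the previous agent frame (meters, x forward).
    dx, dy, dyaw: egomotion predicted by the VO model (agent frame).
    """
    # Undo the translation, then rotate into the new heading.
    shifted = np.asarray(goal_xy, dtype=float) - np.array([dx, dy])
    c, s = np.cos(-dyaw), np.sin(-dyaw)
    rot = np.array([[c, -s], [s, c]])
    return rot @ shifted

# Example: moving 0.25 m toward a goal 1 m ahead leaves it 0.75 m ahead.
print(update_goal_estimate([1.0, 0.0], 0.25, 0.0, 0.0))
```

Because the policy only ever consumes the (relative) goal coordinate, a GT-trained policy keeps working when the coordinate comes from integrated VO instead of the GPS+Compass sensor.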
Will the PointNav v2 agent ("realistic") that does fully visual/perceptual navigation transfer from simulation to reality?
Can the PointNav v1 and PointNav v2 Success and SPL gap be closed?
Estimate: the PointNav v2 ceiling is 0.84 SPL, not 0.99 SPL
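For reference, SPL (Success weighted by Path Length, Anderson et al., 2018) penalizes any travel beyond the shortest path, so under actuation noise even a perfect policy lands below 1.0, especially on short episodes where a small overshoot is a large fraction of the path. A minimal sketch (the episode numbers are made up for illustration):

```python
def spl(success, shortest_path, agent_path):
    """Success weighted by Path Length for a single episode."""
    if not success:
        return 0.0
    return shortest_path / max(agent_path, shortest_path)

# Under actuation noise even an oracle policy drifts: a 2.5 m episode
# executed with 3.0 m of actual travel already caps SPL at ~0.83.
noisy_oracle = spl(True, 2.5, 3.0)
```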
What is Visual Odometry?
Why do we need VO?
Camera is cheap
GPS & Compass are too noisy in indoor environments
Why not use noisy egomotion from wheels?
What is PointGoal Navigation?
PointNav v2 (Realistic)
What are approaches to solve this setting?
PointNav v1 (already solved)
How is this method different from Datta et al.?
Datta et al. had less significant results at the time, and the map-based Occupancy Anticipation approach dominated
The VO module is trained on policy-agnostic data (shortest-path-follower trajectories)
Relatively small dataset size
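Policy-agnostic training data means the VO dataset is built offline from shortest-path-follower rollouts: store (obs_t, obs_t+1, action) and label each pair with the GT relative motion between the two simulator poses. A sketch of the labeling step for planar (x, y, yaw) poses; the function name is illustrative:

```python
import numpy as np

def egomotion_label(pose_t, pose_t1):
    """Relative SE(2) motion between two global poses (x, y, yaw),
    expressed in the frame of the first pose."""
    x0, y0, yaw0 = pose_t
    x1, y1, yaw1 = pose_t1
    c, s = np.cos(-yaw0), np.sin(-yaw0)
    dx = c * (x1 - x0) - s * (y1 - y0)
    dy = s * (x1 - x0) + c * (y1 - y0)
    dyaw = (yaw1 - yaw0 + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
    return dx, dy, dyaw
```

Because the labels come from GT simulator state, any trajectory source works; the follower is just a convenient, policy-independent one.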
How is this method different from Zhao et al.?
Dropout is not applied
The action embedding is concatenated to every FC layer
Zhao et al.'s code is open-sourced. Should we plug in Depth Discretization / TopDownMap as they did and check the results?
Much smaller nav-metric hit on Gibson val between policy+VO and policy+GT localization
Is it correct to state "ensembling via data augmentations"?
Should we retrain our approach in the HC2020 setting to compare with Zhao et al.?
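One reading of "ensembling via data augmentations" is test-time augmentation: run the VO model on the original and on a horizontally flipped observation pair, un-mirror the flipped prediction (lateral translation and yaw change sign under a left-right flip), and average. A hedged sketch; `vo_model` and the helper name are placeholders, not the paper's code:

```python
import numpy as np

def predict_with_flip_ensemble(vo_model, obs_t, obs_t1):
    """Average a VO prediction with its horizontally flipped counterpart.

    vo_model: any callable (obs_t, obs_t1) -> (dx, dy, dyaw).
    """
    dx, dy, dyaw = vo_model(obs_t, obs_t1)
    # np.flip(..., axis=1) mirrors an (H, W, C) image left-right.
    fdx, fdy, fdyaw = vo_model(np.flip(obs_t, axis=1), np.flip(obs_t1, axis=1))
    # Un-mirror the flipped prediction before averaging.
    return ((dx + fdx) / 2, (dy - fdy) / 2, (dyaw - fdyaw) / 2)
```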
Distributed dataset generation (images resized and re-encoded to save space)
Open-sourced code and checkpoints to push Embodied AI forward
Distributed (large scale) VO training
Future work: Where to move next?
Online training from HSim?
Reward how close the agent stops to the goal center
Reduce the success zone
Allow stopping within a range of the goal
Take a look at distance-to-goal at failure
Properties and (potential) applications
Reuse as a fully visual navigator for Rearrangement (or other tasks that currently require GT localization, i.e. GPS & Compass)?
Mention that the 2.5 × SPL reward term brings +2 SPL?
1) The PointGoal task is foundational for robots (Boston Dynamics, Roomba) and for Embodied AI tasks.
2) Robots still use SLAM for PointNav even though PointNav v1 is solved. Solving PointNav v2 could change that, but it is challenging.
3) Can a map-free approach with general-purpose models solve PointNav v2 the way PointNav v1 was solved?
4) First, we investigate what "solved" means: 0.84 SPL, 0.98 Success, because of actuation noise on short episodes.
5) We show that even with GT localization, 0.84 SPL is hard to reach at the same scale as PointNav v1; we reach 0.80 SPL vs 0.71 SPL in Zhao et al.
6) We focus on decreasing the hit from switching from GT localization to VO-estimated localization, reaching only a 0.0x SPL hit.
7) We learned that the action embedding, test-time augmentation, and the scale of the dataset and model are key ingredients for boosting VO robustness:
Action embedding: +8 Success, +6 SPL
Train/test augmentations: +5 Success, +4 SPL
Larger dataset (0.5M → 1.5M): +8 Success, +6 SPL
More powerful encoder: +3 Success, +3 SPL
8) All of that sets a new SOTA of XX, leaving a difference of Y for the task to be solved.
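The action-embedding ingredient listed above can be sketched as a VO regression head where the embedded action is concatenated to the input of every FC layer, so the action conditions each stage of the prediction. Layer sizes and names here are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class ActionConditionedHead(nn.Module):
    """VO head with the action embedding concatenated to every FC layer."""

    def __init__(self, feat_dim=512, n_actions=4, emb_dim=16, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(n_actions, emb_dim)
        self.fc1 = nn.Linear(feat_dim + emb_dim, hidden)
        self.fc2 = nn.Linear(hidden + emb_dim, hidden)
        self.out = nn.Linear(hidden + emb_dim, 3)  # (dx, dy, dyaw)

    def forward(self, visual_feats, action):
        a = self.embed(action)
        # Re-concatenate the action embedding at every FC layer.
        h = torch.relu(self.fc1(torch.cat([visual_feats, a], dim=-1)))
        h = torch.relu(self.fc2(torch.cat([h, a], dim=-1)))
        return self.out(torch.cat([h, a], dim=-1))
```

Compared to feeding the action only at the input, this keeps the action signal from being washed out by the deeper layers.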