Visual Odometry is All You Need for Realistic PointGoal Navigation
What scientific questions do we (try to) answer?
Is the visual odometry module all you need to close the gap?
Will the PointNav v2 agent ("realistic") that does fully visual/perceptual navigation transfer from simulation to reality?
Can the PointNav v1 and PointNav v2 Success and SPL gap be closed?
What is Visual Odometry?
What is PointGoal Navigation?
Why do we need VO?
PointNav v2 (Realistic)
PointNav v1 (already solved)
What are approaches to solve this setting?
Sim2Real experiment
MP3D transfer
VO ablations
HC 2021 benchmark results
VO consistently gives only a 0.04 SPL hit to any GT nav policy
How is this method different from Datta et al.?
Map-free
Map-based
How is this method different from Zhao et al.?
Single model
Fewer parameters
Action embedding
Dropout is not applied
Concatenated to every FC layer
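A minimal sketch of the "action embedding concatenated to every FC layer, no dropout" idea. Layer sizes, names, and the head structure here are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class VOHead(nn.Module):
    """Regresses the pose delta (dx, dz, dyaw) from visual features.
    A learned action embedding is concatenated to the input of every FC
    layer; no dropout is used. Sizes/names are assumptions for illustration."""
    def __init__(self, feat_dim=512, num_actions=3, act_dim=16, hidden=256):
        super().__init__()
        self.act_emb = nn.Embedding(num_actions, act_dim)
        self.fc1 = nn.Linear(feat_dim + act_dim, hidden)
        self.fc2 = nn.Linear(hidden + act_dim, hidden)
        self.out = nn.Linear(hidden + act_dim, 3)

    def forward(self, feats, action):
        a = self.act_emb(action)  # (B, act_dim), one row per discrete action
        x = torch.relu(self.fc1(torch.cat([feats, a], dim=-1)))
        x = torch.relu(self.fc2(torch.cat([x, a], dim=-1)))
        return self.out(torch.cat([x, a], dim=-1))  # (B, 3)

head = VOHead()
delta = head(torch.randn(4, 512), torch.tensor([0, 1, 2, 1]))  # batch of 4
```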
Zhao et al.'s code is open-sourced (https://github.com/Xiaoming-Zhao/PointNav-VO). Should we plug in Depth Discretization/TopDownMap as they did and check the results?
Datta et al. had less significant results before, and the Anticipation map dominated
Much lower nav-metrics hit on Gibson val between Policy+VO and Policy+GT localization
Engineering contribution
Distributed dataset generation (image resizing and format to preserve space)
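The space-saving part of the dataset generation (resize frames, store them in a compact format) could look roughly like this; the target sizes and the uint16 depth quantization are assumptions for illustration, not the exact pipeline:

```python
import numpy as np

def resize_nearest(img, h, w):
    """Nearest-neighbor resize over the first two axes (no extra deps)."""
    ys = np.arange(h) * img.shape[0] // h
    xs = np.arange(w) * img.shape[1] // w
    return img[ys][:, xs]

def encode_depth_uint16(depth_m, max_depth=10.0):
    """Quantize float32 depth (meters) to uint16: half the bytes per pixel.
    max_depth is an assumed sensor range for normalization."""
    d = np.clip(depth_m / max_depth, 0.0, 1.0)
    return (d * np.iinfo(np.uint16).max).round().astype(np.uint16)

# Example: downscaling 2x per axis + uint16 shrinks a float32 frame 8x
depth = np.random.rand(360, 640).astype(np.float32) * 10.0
small = encode_depth_uint16(resize_nearest(depth, 180, 320))
print(depth.nbytes // small.nbytes)  # -> 8
```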
Future work: Where to move next?
Open-sourced code and checkpoints to push Embodied AI forward
Distributed (large scale) VO training
Online training from HSim?
Estimation that PointNav v2 max is 0.84 SPL, not 0.99 SPL
Camera is cheap
GPS & Compass are too noisy in indoor environments
Why not use noisy egomotion from wheels?
Properties and (potential) applications
Is it correct to state "ensembling via data augmentations"?
Significantly outperforms map-based approaches
Reuse as a fully visual navigator for Rearrangement (or other tasks that require GT localization (GPS & Compass))?
VO is trained on policy-agnostic data (shortest-path-follower trajectories)
Plug and play usability: no need to fine-tune
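The "ensembling via data augmentations" property above can be sketched as test-time augmentation over VO inputs. Everything here (the flip choice, the sign convention, the `predict` callable) is an assumed illustration, not the paper's exact procedure:

```python
import numpy as np

def tta_vo(predict, rgb, depth):
    """Test-time-augmentation "ensemble": average the VO prediction over the
    original and the horizontally flipped observation. Mirroring the scene
    negates lateral translation (dx) and yaw, so those components are
    un-flipped before averaging. `predict` stands in for any VO model that
    returns (dx, dz, dyaw) for (H, W, ...) observations."""
    dx, dz, dyaw = predict(rgb, depth)
    fdx, fdz, fdyaw = predict(rgb[:, ::-1], depth[:, ::-1])  # flip width axis
    return (dx - fdx) / 2, (dz + fdz) / 2, (dyaw - fdyaw) / 2

# Dummy model: with a constant predictor, dx and dyaw cancel out, dz is kept
predict = lambda rgb, depth: (0.1, 0.25, 0.05)
obs = np.zeros((4, 4))
dx, dz, dyaw = tta_vo(predict, obs, obs)
```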
Mention that 2.5 SPL reward brings +2 SPL?
nav-policy
Reward how close the agent is to the zone center
Reduce success in the zone
Stop on a range
Take a look at distance to goal at failure
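The reward-shaping ideas above could be sketched as follows. This is a hypothetical shaping function: the success radius, slack penalty, and center-bonus scale are assumed values, not the paper's configuration:

```python
def shaped_reward(dist, prev_dist, success_radius=0.36,
                  slack=-0.01, center_bonus=2.5):
    """Hypothetical shaping: the usual dense progress reward plus an extra
    term, inside the success zone, for being close to the zone center
    (encouraging the agent to stop near the center rather than at the rim).
    The 0.36 m radius and 2.5 scale are illustrative assumptions."""
    r = slack + (prev_dist - dist)  # dense progress toward the goal
    if dist < success_radius:       # inside the success zone
        r += center_bonus * (1.0 - dist / success_radius)
    return r
```

For example, a step that closes 0.1 m of distance outside the zone earns roughly 0.09, while a step ending halfway into the zone additionally earns half the center bonus.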
Relatively small dataset size
Should we retrain our approach in the HC2020 setting to compare with Zhao et al.?
HM3D transfer?
Introduction flow:
1) The PointGoal task is foundational for robots (e.g., Boston Dynamics, Roomba) and for Embodied AI tasks.
2) Robots still use SLAM for PointNav even though PointNav v1 was solved; solving PointNav v2 could change that, but it is challenging.
3) Can a map-free approach with general-purpose models solve PointNav v2 the way PointNav v1 was solved?
4) First, we investigate what "solved" means: 0.84 SPL and 0.98 Success, because of actuation noise on short episodes.
5) We show that even with GT localization, 0.84 SPL is hard to reach at the same scale as PointNav v1; we reach 0.80 SPL vs 0.71 SPL in Zhao et al.
6) We focus on decreasing the hit from switching from GT localization to VO-estimated localization, reaching only a 0.0x SPL hit.
7) We learned that action embedding, test augmentation, and the scale of the dataset and model are the key ingredients that boost VO robustness:
Action embedding: +8 Success, +6 SPL
Train/test augmentations: +5 Success, +4 SPL
Larger dataset (0.5M -> 1.5M): +8 Success, +6 SPL
More powerful encoder: +3 Success, +3 SPL
8) All of that sets a new SOTA of XX, leaving a Y gap before the task is solved.