cs231n - Coggle Diagram
cs231n
3D
shape representations
depth map
- rgb+d, ~2.5D: represents only visible part
- for each px gives distance(camera, object)
- rgb -> FCN -> predicted depth: 1xHxW -> per-pixel scale-invariant loss
- sensors: Microsoft Kinect
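
The notes do not say which scale-invariant loss is meant. A minimal sketch, assuming the log-depth form from Eigen et al. (2014): with \(d = \log \hat{y} - \log y\), the loss is \(\mathrm{mean}(d^2) - \lambda\,\mathrm{mean}(d)^2\). The function name, the default `lam=1.0`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def scale_invariant_loss(pred, target, lam=1.0):
    """Scale-invariant log-depth error for two positive HxW depth maps.

    d = log(pred) - log(target); loss = mean(d^2) - lam * mean(d)^2.
    lam=1.0 makes the loss ignore a global scale factor on pred (assumed default).
    """
    d = np.log(pred) - np.log(target)
    return float(np.mean(d ** 2) - lam * np.mean(d) ** 2)

rng = np.random.default_rng(0)
target = rng.uniform(1.0, 10.0, size=(4, 5))                 # toy ground-truth depths
scaled = 3.0 * target                                        # correct up to a global scale
noisy = target * rng.uniform(0.5, 2.0, size=target.shape)    # per-pixel errors

loss_scaled = scale_invariant_loss(scaled, target)   # close to zero
loss_noisy = scale_invariant_loss(noisy, target)     # clearly positive
```

A prediction that is wrong only by a global factor is not penalized, which matches the ambiguity between a small close object and a large distant one in a single image.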
surface normals
- for each px gives normal vector to object
- orientation of surface
- cannot represent occluded part of img
- rgb -> FCN -> predicted normals: 3xHxW -> per pixel loss
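- loss: based on the per-pixel cosine similarity \(\frac{x \cdot y}{|x||y|}\) of predicted normal \(x\) and ground-truth normal \(y\)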
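
A minimal sketch of a per-pixel normal loss built on the cosine similarity above. Turning the similarity into a loss as `1 - cos` is an assumption (minimizing the negative similarity would work too), and the function name and toy data are made up for illustration.

```python
import numpy as np

def normal_cosine_loss(pred, target, eps=1e-8):
    """Mean over pixels of 1 - cos(angle) between 3xHxW normal maps."""
    dot = np.sum(pred * target, axis=0)                                    # (H, W)
    norms = np.linalg.norm(pred, axis=0) * np.linalg.norm(target, axis=0)  # (H, W)
    cos = dot / (norms + eps)        # eps guards against zero-length vectors
    return float(np.mean(1.0 - cos))

rng = np.random.default_rng(0)
target = rng.normal(size=(3, 4, 5))
target /= np.linalg.norm(target, axis=0, keepdims=True)   # unit normals

loss_same = normal_cosine_loss(2.5 * target, target)   # same direction: ~0
loss_flip = normal_cosine_loss(-target, target)        # opposite direction: ~2
```

Because the similarity is normalized by both lengths, only the direction of the predicted vector matters, not its magnitude.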
voxels
- ~VxVxV grid of occupancies
- need high spatial resolution to capture fine structure
- scaling to high resolution is not trivial
- storing \(1024^3\) grid takes 4gb
- optimizations: octrees, nested shapes
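
The 4 GB figure can be checked with a line of arithmetic. A small sketch, assuming one 4-byte float per voxel (the notes do not state the storage type):

```python
def voxel_grid_bytes(v, bytes_per_voxel=4):
    """Memory for a dense VxVxV grid; 4 bytes per voxel (float32) is an assumption."""
    return v ** 3 * bytes_per_voxel

for v in (32, 128, 1024):
    print(f"{v}^3 grid: {voxel_grid_bytes(v) / 2**30:.4f} GiB")
```

Memory grows cubically, so doubling the resolution costs 8x the storage, which is why sparse structures such as octrees are attractive.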
- classification: voxel -> 3D CNN -> classification loss
- generation:
- img -> 2d cnn -> FC -> 3D CNN -> per voxel cross entropy
3D convs are very expensive
- img -> 2d encoder -> 2d decoder -> voxel
- final conv layer: V filters, interpreted as a "tube" of voxel scores
- we lose translation invariance in z-dim
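
The "voxel tube" idea can be sketched as a 1x1 convolution with V filters whose output channels are read as the depth axis of the voxel grid. This is an illustrative sketch only: the sizes, the random weights, and the function names are assumptions, and the per-voxel cross entropy is taken to be binary (occupied or empty).

```python
import numpy as np

def voxel_tube_scores(features, weight, bias):
    """1x1 conv with V filters: features (C,H,W) -> scores (V,H,W).

    scores[:, i, j] is the "tube" of V voxel scores behind pixel (i, j).
    """
    return np.einsum("vc,chw->vhw", weight, features) + bias[:, None, None]

def per_voxel_bce(scores, occupancy):
    """Binary cross-entropy from logits, averaged over all voxels."""
    # Numerically stable form: max(s, 0) - s*y + log(1 + exp(-|s|))
    return float(np.mean(np.maximum(scores, 0.0) - scores * occupancy
                         + np.log1p(np.exp(-np.abs(scores)))))

rng = np.random.default_rng(0)
C, V = 16, 8                                    # toy sizes, with H = W = V
features = rng.normal(size=(C, V, V))           # stand-in for the 2D decoder output
weight = 0.1 * rng.normal(size=(V, C))          # hypothetical 1x1 conv weights
bias = np.zeros(V)
occupancy = (rng.random((V, V, V)) > 0.5).astype(float)   # toy ground truth

scores = voxel_tube_scores(features, weight, bias)
loss = per_voxel_bce(scores, occupancy)
```

Each depth slice gets its own filter, so weights are shared across x and y but not along z, which is one way to see why translation invariance in the z-dimension is lost.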
triangle mesh
- polygonal mesh
- file formats: OBJ, OFF, WRL
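
As an illustration of how a triangle mesh is stored, here is a minimal sketch that writes and reads the two most basic OBJ record types: `v x y z` for a vertex and `f i j k` for a triangle with 1-based vertex indices. The helper names are hypothetical, and normals, texture coordinates, and negative indices are ignored.

```python
def to_obj(verts, faces):
    """Serialize vertices (x, y, z) and triangles (0-based indices) as OBJ text."""
    lines = [f"v {x} {y} {z}" for x, y, z in verts]
    lines += [f"f {a + 1} {b + 1} {c + 1}" for a, b, c in faces]  # OBJ is 1-based
    return "\n".join(lines) + "\n"

def from_obj(text):
    """Parse only 'v' and 'f' records; every other record type is skipped."""
    verts, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            verts.append(tuple(float(p) for p in parts[1:4]))
        elif parts[0] == "f":
            # "f 1/2/3 ..." carries extra indices after "/"; keep the vertex index
            faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:4]))
    return verts, faces

# A tetrahedron: 4 vertices, 4 triangular faces
tetra_verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tetra_faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

obj_text = to_obj(tetra_verts, tetra_faces)
verts_back, faces_back = from_obj(obj_text)
```

The mesh is just a vertex list plus a face list of indices into it, which is the same layout most mesh-processing code uses in memory.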
Mesh R-CNN
- Mask R-CNN + mesh head + mesh regularizer
- with mesh deformation, the topology is fixed
- ⟳ voxel prediction to create initial mesh
- regularizer = minimize the L2 norm of the mesh edges
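
A minimal sketch of an edge-length regularizer, assuming it is the mean squared length of the unique mesh edges. Whether the original uses a mean or a sum, and squared or unsquared lengths, is not stated in the notes; the function names are made up for illustration.

```python
import numpy as np

def unique_edges(faces):
    """Undirected edges (E, 2) of a triangle mesh given faces (F, 3)."""
    e = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    return np.unique(np.sort(e, axis=1), axis=0)   # drop duplicates shared by faces

def edge_regularizer(verts, faces):
    """Mean squared edge length; penalizes long, stretched edges."""
    e = unique_edges(faces)
    diff = verts[e[:, 0]] - verts[e[:, 1]]
    return float(np.mean(np.sum(diff ** 2, axis=1)))

# Unit square split into two triangles along the diagonal 0-2
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2], [0, 2, 3]])

edges = unique_edges(faces)            # 4 sides + 1 diagonal
reg = edge_regularizer(verts, faces)   # (1 + 1 + 1 + 1 + 2) / 5
```

Shrinking the mesh lowers this term, so in practice it is only useful as a small penalty added to a data term that holds the shape in place.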
Pixel2Mesh
- iterative refinement
- graph convolution
- vertex-aligned features
- Chamfer loss function
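
A minimal sketch of the Chamfer distance between two point sets: for every point, take the squared distance to its nearest neighbour in the other set, and add up both directions. Using sums rather than means is an assumption, and the brute-force pairwise matrix is only suitable for small point sets.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N,3) and b (M,3)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)   # (N, M) squared distances
    return float(d2.min(axis=1).sum() + d2.min(axis=0).sum())

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])

cd = chamfer_distance(a, b)   # a -> b contributes 0, b -> a contributes (3 - 1)^2
```

For meshes, the two point sets would typically be points sampled from the predicted surface and from the ground-truth surface, so no vertex correspondence is needed.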
history
classification problem
linear model
optimization
computation graph
NN
CNN
training
DL software