Algorithm for AI Chips company
Efficient Deep Learning (EDL)
balance between computations and I/O operations
operation fusions
Winograd method
asynchronous BP
polyhedral optimisation
BP re-forwarding
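One concrete instance of the "Winograd method" node above: Winograd's minimal filtering algorithm F(2,3) computes two outputs of a 1-D convolution with a 3-tap filter using 4 multiplications instead of the 6 a direct computation needs, trading multiplies for cheap additions. A minimal pure-Python sketch (the function name is illustrative):

```python
# Winograd F(2,3): 2 outputs of a 1-D convolution with a 3-tap filter
# in 4 multiplications (a direct sliding dot product needs 6).
def winograd_f23(d, g):
    """d: 4 input values, g: 3 filter taps -> 2 convolution outputs."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (in a real layer this is precomputed once per filter)
    G0 = g0
    G1 = (g0 + g1 + g2) / 2
    G2 = (g0 - g1 + g2) / 2
    G3 = g2
    # The 4 multiplications, on transformed input values
    m0 = (d0 - d2) * G0
    m1 = (d1 + d2) * G1
    m2 = (d2 - d1) * G2
    m3 = (d1 - d3) * G3
    # Output transform: only additions/subtractions
    return [m0 + m1 + m2, m1 - m2 - m3]

print(winograd_f23([1, 2, 3, 4], [1, 1, 1]))  # → [6.0, 9.0], same as direct conv
```

The same idea generalises to 2-D tiles (e.g. F(2x2, 3x3)), which is how it is typically applied to convolutional layers.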
compress bitwidth of data
model quantisation
binary/ternary nets
mixed precision training
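A minimal sketch of what "model quantisation" means in practice: symmetric uniform quantisation maps real-valued weights to int8 codes through a single per-tensor scale, cutting the bitwidth from 32 to 8. All names here are illustrative, and the sketch assumes the tensor is not all zeros:

```python
# Symmetric uniform quantisation to int8: real values are mapped to
# integer codes in [-127, 127] via one scale per tensor.
def quantise(xs, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1           # 127 for int8
    scale = max(abs(x) for x in xs) / qmax   # assumes xs is not all zeros
    q = [max(-qmax, min(qmax, round(x / scale))) for x in xs]
    return q, scale

def dequantise(q, scale):
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, s = quantise(weights)
print(q)                 # small integer codes, e.g. [50, -127, 3, 100]
print(dequantise(q, s))  # approximate reconstruction of the originals
```

Binary/ternary nets push the same idea to 1 or 2 bits; mixed precision training instead keeps a high-precision master copy of the weights while computing in low precision.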
simplify network architecture
knowledge distillation
compact network design
model compression
network pruning
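As a concrete instance of the "network pruning" node: magnitude-based pruning zeroes out the smallest-magnitude fraction of the weights, on the assumption that they contribute least to the output. A minimal sketch over a flat weight list (names illustrative):

```python
# Magnitude-based pruning: zero out the smallest-magnitude fraction
# `sparsity` of the weights (ties at the threshold may zero a few extra).
def prune_by_magnitude(weights, sparsity=0.5):
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k - 1] if k > 0 else float("-inf")
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
print(prune_by_magnitude(w, sparsity=0.5))  # → [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

In practice pruning is usually interleaved with fine-tuning so the remaining weights can compensate for the removed ones.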
for distributed settings
AllReduce algorithm
tricks for large batch training, such as modified optimisers, regularisers, normalisation, etc.
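The "AllReduce algorithm" node is commonly realised as a ring AllReduce: each of N workers passes 1/N of its data per step around a ring, so after N-1 reduce-scatter steps and N-1 all-gather steps every worker holds the full sum while per-worker traffic stays constant in N. A toy single-process simulation (one scalar per chunk; names illustrative):

```python
# Toy ring AllReduce: values[r][c] is chunk c held by worker r.
# Returns the state after the algorithm: every worker holds the
# element-wise sum over all workers.
def ring_allreduce(values):
    n = len(values)
    data = [list(v) for v in values]
    # Reduce-scatter: after n-1 steps, worker r fully owns chunk (r+1) % n.
    # Sends are collected first to simulate a synchronous exchange.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, data[r][(r - step) % n]) for r in range(n)]
        for r, c, val in sends:
            data[(r + 1) % n][c] += val
    # All-gather: circulate each fully reduced chunk to every worker.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, data[r][(r + 1 - step) % n]) for r in range(n)]
        for r, c, val in sends:
            data[(r + 1) % n][c] = val
    return data

print(ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# → [[12, 15, 18], [12, 15, 18], [12, 15, 18]]
```

Production implementations (e.g. in collective-communication libraries) apply the same schedule to large gradient tensors split into N chunks.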
Learning Theory
to address the loss of accuracy
a variety of other optimisers, such as evolutionary methods, swarm methods, interior point methods, etc.
gradient-based optimisers
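The "gradient-based optimisers" node in its simplest form is plain gradient descent, which the other optimiser families above are alternatives to. A minimal sketch minimising f(x) = (x - 3)^2, whose gradient is 2(x - 3) (names illustrative):

```python
# Plain gradient descent: repeatedly step against the gradient.
def gradient_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x, 4))  # ≈ 3.0, the true minimiser of (x - 3)^2
```

Evolutionary, swarm, and interior point methods dispense with (or supplement) this gradient signal, which is why they appear as a separate branch above.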
Hardware-Software Co-design (HSC)
why
Existing hardware is not well suited to existing models
Existing models utilize hardware unevenly
how
Specific hardware-oriented model design
Solutions from hardware-side
General solutions from software-side
Comparison between Training and Inference
Bandwidth between Ext-mem and host-mem
Bandwidth between Ext-mem and L2-mem
Host Memory Requirements
L2 Memory Requirements
Batch Size
Sensitivity to Precision Loss