Perception | Physical AI | Multimodal - Coggle Diagram

- - - - GPT-4o — native multimodal (text+vision+audio in one pass)
      - Gemini 2.5 Pro — top Chatbot Arena score (1446), leads video understanding
      - Claude 4 Opus — 200K context, strong vision + reasoning
      - Grok 3 — multimodal with real-time X/web grounding
    - - Sora (OpenAI) — text-to-video world model
      - Veo 2 (Google) — photorealistic video generation
      - Video understanding at minute+ scale for complex reasoning
    - - Any-to-any: input text, output image + audio simultaneously
      - Real-time voice + vision (GPT-4o voice mode)
      - Sensor + language fusion for robotics (VLA)
    - - Multimodal AI market: USD 1.6B (2024) → USD 27B (2034)
      - 32.7% CAGR — fastest-growing AI subsegment
      - Enterprise adoption: healthcare imaging, legal doc review, manufacturing QA
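The market figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes a 10-year horizon (2024 to 2034) and annual compounding; the helper names `cagr` and `project` are illustrative and do not come from the source.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by a start value, end value and horizon."""
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: int) -> float:
    """Value after compounding `start` at `rate` per year for `years` years."""
    return start * (1 + rate) ** years


# Figures quoted in the outline: USD 1.6B (2024), USD 27B (2034), 32.7% CAGR.
implied_rate = cagr(1.6, 27.0, 10)
projected_2034 = project(1.6, 0.327, 10)

print(f"CAGR implied by the two endpoints: {implied_rate:.1%}")
print(f"2034 size if 1.6B compounds at 32.7%: USD {projected_2034:.1f}B")
```

Under these assumptions the two endpoints and the quoted growth rate agree to within rounding, so the three numbers are internally consistent.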
    - - Real world is multimodal — humans process all senses together
      - Unimodal AI hits ceiling; multimodal unlocks emergent reasoning
      - Autonomous agents need to see, hear, speak & act simultaneously
      - Healthcare, defence, robotics all demand multi-sensor AI
      - Token-based architecture makes adding modalities modular
  - - - Built on Gemma 3 (27B) — open-weight model, licensed for commercial use
      - Trained on 1 trillion+ tokens with heavy SEA dataset curation
      - Native multimodal: image understanding + text (128K context)
      - Supports 11 SEA languages + English with cultural nuance
      - Agentic workflows: function calling, JSON output, tool use
    - - First open vision-language model for Southeast Asian context
      - Understands culturally specific imagery, signage, context
      - Released for community to build SEA-specific applications
    - - Audio large language model for Singapore/SEA accents & languages
      - Bridges audio and language understanding
      - Targets local speech diversity — Singlish, Malay, Tamil
    - - Safety-focused LLM fine-tuned for SEA cultural norms
      - Moderates content according to regional values & standards
    - - SEA-LION embodies NAIS goal: AI that works for Singaporeans
      - Multilingual AI as sovereign capability for SEA leadership
    - - Multimodal diagnostic AI — medical imaging + clinical notes + labs
      - AI triage combining visual scan + patient history
    - - AISG funds SEA-LION as 100E (100 Experiments) programme output
      - NUS/NTU multimodal AI research groups
      - SEA-LION Summit 2025 — regional multimodal AI community
    - - SEA is linguistically diverse — needs localised multimodal AI
      - Singapore as SEA AI hub — export multimodal models regionally
      - IMDA AI Verify extended to cover multimodal system governance
      - NAIS: AI must serve Singapore's social & cultural context
- - - - SAM 2 — segment anything, zero-shot
      - Grounding DINO — open-vocabulary detection
      - Florence-2 — unified vision backbone
  - - - ML modules reduce false alarms in drone detection
      - Physics + knowledge + data-based AI fusion
      - Faster, more accurate drone classification
    - - RSAF partnership on manned-unmanned teaming
      - Autonomous mission execution in contested airspace
    - - Manned-unmanned collaborative operations
      - AI-enhanced situational awareness
    - - Internal tool for mission planning & decision support
    - - Multi-modal EW & ISR sensor fusion
    - - Singapore Vision Day 2026 — leading CV conference
      - Research in medical imaging, AV perception, 3D vision
    - - Machine vision for defect detection in fabs
      - Predictive maintenance via sensor streams
    - - Smart cameras & radar for port + logistics AI
- - - - RT-1 (2022) — Google, first large-scale transformer robot policy
      - RT-2 (2023) — VLM backbone controlling robot directly
      - OpenVLA (2024) — open-source VLA for community
      - π0 (2024) — Physical Intelligence, dexterous flow matching
      - GR00T N1.7 (Apr 2026) — NVIDIA, 3B params, EgoScale on 20,854h video
      - GR00T N2 (preview) — world action model, 2x success on new tasks
    - - NVIDIA Cosmos 3 (COMPUTEX 2026) — world foundation model for autonomous systems
      - Synthetic data generation — train in simulation, close the sim-to-real gap with domain randomisation
      - Isaac Sim / IsaacLab — physics-accurate robot simulation
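Domain randomisation, mentioned above, can be sketched in plain Python: each training episode draws its simulator parameters from a range, so a policy never sees exactly the same physics or visuals twice. This is a minimal illustration only. The `SimParams` fields, the `RANGES` values and the helper names are hypothetical, and the code does not call the Isaac Sim or Cosmos APIs.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    """Hypothetical per-episode simulator settings (not a real simulator API)."""
    friction: float
    payload_mass_kg: float
    light_intensity: float
    camera_noise_std: float


# Illustrative (low, high) ranges; real ranges would be tuned to the target robot.
RANGES = {
    "friction": (0.4, 1.2),
    "payload_mass_kg": (0.1, 2.0),
    "light_intensity": (0.3, 1.5),
    "camera_noise_std": (0.0, 0.05),
}


def sample_params(rng: random.Random) -> SimParams:
    """Draw every parameter uniformly and independently from its range."""
    return SimParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


def make_episodes(n: int, seed: int = 0) -> list[SimParams]:
    """Generate n randomised episode configurations, reproducible via the seed."""
    rng = random.Random(seed)
    return [sample_params(rng) for _ in range(n)]


episodes = make_episodes(5)
for i, params in enumerate(episodes):
    print(i, params)
```

In a real pipeline each `SimParams` instance would configure one simulated rollout. The idea is that a policy trained across the whole range is more likely to treat the real world as just another sample from it.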
    - - Figure, Agility Robotics, 1X entering factory floors
      - Boston Dynamics Atlas — fully electric humanoid (2024)
      - Industrial robot density accelerating globally
    - - LLMs cracked language — VLAs applying same recipe to physical actions
      - Labour shortage: ageing populations need robot augmentation
      - Cost of robot hardware dropping — sensors, actuators, compute
      - World models reduce the need for millions of real-world trials
      - Foundation model paradigm shift: one model, many tasks
  - - - Singapore's first multi-operator Physical AI testbed
      - IMDA + JTC + Singapore Institute of Technology (SIT)
      - Partners: Certis (security), DHL (logistics), Grab (delivery), QuikBot
      - Real mixed-use public space — not a lab environment
      - Test food delivery, parcel, cleaning, security patrol robots
    - - Co-announced with NAIS Update 2026
      - Focus on Physical AI research in Singapore context
      - Collaboration with local universities & industry
    - - One of highest per-capita robotics investments globally
      - Covers industrial, service, healthcare & defence robotics
      - Singapore robot density ~5x global average
    - - Robots for semiconductor, aerospace, precision engineering
      - Physical AI for process redesign via digital twins
      - Predictive maintenance reducing production downtime
    - - Surgical robots — AI-guided precision surgery
      - Care robots for ageing population (eldercare)
    - - Autonomous port logistics (PSA Singapore)
      - Last-mile delivery robots in HDB estates
    - - Dense urban environment = ideal testbed for robots
      - Labour constraints drive urgency — ageing + tight labour market
      - Strong manufacturing base needs robot augmentation
      - Government mandate: AI delivery phase, not just planning