Business Case for Selecting LC GPU Servers/Workstations
Market
Trends in GPU Computing
1.4 Selecting the best GPU server/workstation technology: Business leaders want to be more competitive.
1.4.3 How to achieve 2: Decrease TCO (cooling costs, cloud-based services, etc.)
1.4.2 How to achieve: Get more performance out of IT equipment
1.4.4 How to improve uptime and increase employee productivity
1.4.1. Become more competitive and profitable as a company by being leaner and more agile than rivals.
1.1 Introduction
1.1.1 Growth in AI and increased need for GPU power in content creation workflows
1.1.1.2 Content creation
1.1.1.2.1 Application 1: Designing new products and solutions
1.1.1.2.2 Application 2: Architecture
1.1.1.2.3 Application 3: Digital content for media and entertainment consumption
1.1.1.2.4 Data science
1.1.1.1 AI/ML
1.1.1.1.1 Autonomous machines and self-driving cars
1.1.1.1.2 Machine learning and deep learning
1.1.1.1.3 Healthcare: Drug discovery, medical imaging
1.1.2 Digital investments are not meant to change an organization's technology. They are meant to transform the business.
1.1.3 According to a PwC white paper, the digital ROI framework groups focus areas into: customers, employees, operations, safety and soundness, infrastructure, and disruption and innovation.
1.1.4 Advances in GPU computing technology make achieving success in each category easier.
1.2. Problem
1.2.2 How to get more GPU power per dollar spent
1.2 What to consider in selecting GPU server/workstation
1.2.1 Compute power
1.2.1.1 FLOPS: Compare hardware cost for the same level of compute power in FLOPS (EKFW vs. competitor)
1.2.1.2 Air-cooled (AC) GPU computers often cannot sustain their maximum rated performance.
1.2.2 Energy efficiency
1.2.2.1 FLOPS/Watt:
1.2.3 TCO over time
1.2.3.1 Assuming running above
1.2.3.2 AC CPU/GPU servers/workstations are expensive to run: cooling accounts for approximately 40% of data center power consumption.
1.2.4 Serviceability
1.2.4.1 Closed box vs. User-serviceable features.
1.2.4.2. Importance: Shorter repair time, higher uptime
1.2.5 Reliability
1.2.5.1 Electronics failures are mainly caused by the high temperatures that components generate
1.2.6 Expandability
1.2.6.1 Pay-as-you-grow. User-installable/replaceable.
1.2.7 Ease of deployment
1.2.7.1 Any office environment
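The compute power, energy efficiency, and TCO criteria above can be put side by side with a small calculation. The following is a minimal sketch, not a validated cost model. The class name `GpuSystem`, its fields, and every number are hypothetical placeholders. It assumes 24/7 operation, and it assumes that total power consists only of system power plus cooling power, so that a 40% cooling share means system power is the remaining 60%.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760  # assumption: the system runs 24/7


@dataclass
class GpuSystem:
    """Hypothetical description of one GPU server/workstation option."""
    name: str
    hardware_cost_usd: float   # purchase price
    peak_tflops: float         # rated peak compute
    sustained_fraction: float  # share of peak actually sustained (1.0 = no throttling)
    it_power_kw: float         # power drawn by the system itself
    cooling_share: float       # cooling as a share of total power (0.40 = 40%)

    def sustained_tflops(self) -> float:
        return self.peak_tflops * self.sustained_fraction

    def cost_per_tflops(self) -> float:
        # Hardware dollars per sustained TFLOPS (criterion 1.2.1.1)
        return self.hardware_cost_usd / self.sustained_tflops()

    def total_power_kw(self) -> float:
        # Simplifying assumption: total power = system power + cooling power
        return self.it_power_kw / (1.0 - self.cooling_share)

    def tflops_per_kw(self) -> float:
        # Energy efficiency (criterion 1.2.2.1); TFLOPS/kW equals GFLOPS/W
        return self.sustained_tflops() / self.total_power_kw()

    def tco_usd(self, years: float, usd_per_kwh: float) -> float:
        # Purchase price plus energy cost over the period (criterion 1.2.3)
        energy_kwh = self.total_power_kw() * HOURS_PER_YEAR * years
        return self.hardware_cost_usd + energy_kwh * usd_per_kwh


# Placeholder inputs only; replace with measured or quoted values.
ac = GpuSystem("AC example", 10_000, 100.0, 0.8, 2.0, 0.40)
lc = GpuSystem("LC example", 12_000, 100.0, 1.0, 2.0, 0.20)

for system in (ac, lc):
    print(
        system.name,
        round(system.cost_per_tflops(), 2),
        round(system.tflops_per_kw(), 2),
        round(system.tco_usd(years=3, usd_per_kwh=0.10), 2),
    )
```

The comparison is only as good as its inputs. Maintenance, floor space, and downtime costs are left out of this sketch and would need to be added for a full TCO figure.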
Latest trend in AI/DCC: LC GPU computer
2.1 Describe: AC and LC Technologies and their basic differences
2.1.1 AC
2.1.2 LC
2.2 AC: Pros and Cons
2.2.1 AC Pro1: Lower initial costs
Recommendation: If the CPU/GPU will not run hot (above xx °C), then AC is the best choice.
2.2.2 AC Con1: Has cooling performance limits
Consequence: Thermal throttling degrades CPU/GPU performance by approximately 20%(?)
2.2.3 AC Con2: Noisy fans
Consequence: Lower productivity and higher stress for users
2.2.4 AC Con3: Thermal fatigue on electronic components
Consequence: Less reliable performance
Consequence: Shorter operational life of equipment.
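To show how a throttling loss translates into waiting time, here is a minimal sketch. It assumes job time scales inversely with sustained compute, which is a simplification. The function names are hypothetical, and the 20% figure is the outline's own unverified estimate, not a measurement.

```python
def slowdown_factor(throttle_loss: float) -> float:
    """Return how many times longer a job takes when performance drops.

    throttle_loss is the fraction of performance lost (0.20 = 20%).
    Assumes runtime is inversely proportional to sustained performance.
    """
    if not 0.0 <= throttle_loss < 1.0:
        raise ValueError("throttle_loss must be in [0, 1)")
    return 1.0 / (1.0 - throttle_loss)


def job_hours(work_tflop_hours: float, peak_tflops: float,
              throttle_loss: float = 0.0) -> float:
    """Estimate job duration in hours for a given amount of work."""
    return work_tflop_hours / peak_tflops * slowdown_factor(throttle_loss)


# Placeholder workload: 100 TFLOP-hours on a 10 TFLOPS system.
unthrottled = job_hours(100.0, 10.0)
throttled = job_hours(100.0, 10.0, throttle_loss=0.20)
print(unthrottled, throttled)
```

Under this assumption, a 20% performance loss lengthens a job by a factor of 1/0.8 = 1.25, so the same work takes 25% longer rather than 20% longer.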
2.3 LC: Pros and Cons
2.3.1 LC Pro1: Near silent operation
2.3.1.1 LC Pro1.1: Install anywhere
2.3.1.2 LC Pro1.2: Less user stress
2.3.2 LC Pro2: Lower TCO
2.3.1.3 LC Pro2.1: Energy efficiency
2.3.1.3.1 LC Pro2.1.1: 30-50% less energy usage in a data center (DC) environment
2.3.3 LC Pro3: More reliable
2.3.3.1. LC Pro3.1: (NEED RELIABILITY DATA)
2.3.4 LC Pro4: Better utilization of CPU/GPU performance capacities
2.3.4 LC Pro5: Faster ROI than AC
2.3.4.1 LC Pro5.1: Increased employee productivity through less waiting time
2.3.4.1.1 LC Pro5.1.1: Compare the workload processing times of similarly configured AC and LC setups.
2.3.6 LC Con1: Difficult to reconfigure machine
2.3.6.1 LC Con1.1: Many LC machines are custom-made. Custom design means custom problems.
2.3.7 LC Con2: Potential for leakage
2.3.7.1 LC Con2.1: Non-commercial-grade, custom-designed LC systems can have a higher chance of liquid leakage.
2.3.8 LC Con3: Serviceability and repair recovery time
2.3.8.1 LC Con3.1: Lack of component standardization increases time to repair.
2.3.5 LC Pro6: Higher thermal capacity
2.3.5.1 LC Pro6.1: Water carries on the order of 4,000 times more heat than air at the same volume.
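The water-versus-air comparison can be checked with volumetric heat capacity, which is density multiplied by specific heat. The sketch below uses approximate textbook values near room temperature and sea-level pressure. These values are assumptions that vary with temperature and pressure, and the variable names are illustrative.

```python
# Approximate properties near 20-25 °C at sea-level pressure (assumed values)
WATER_DENSITY_KG_M3 = 997.0
WATER_SPECIFIC_HEAT_J_KG_K = 4186.0
AIR_DENSITY_KG_M3 = 1.2
AIR_SPECIFIC_HEAT_J_KG_K = 1005.0


def volumetric_heat_capacity(density_kg_m3: float,
                             specific_heat_j_kg_k: float) -> float:
    """Heat stored per cubic metre per kelvin, in J/(m^3*K)."""
    return density_kg_m3 * specific_heat_j_kg_k


water = volumetric_heat_capacity(WATER_DENSITY_KG_M3, WATER_SPECIFIC_HEAT_J_KG_K)
air = volumetric_heat_capacity(AIR_DENSITY_KG_M3, AIR_SPECIFIC_HEAT_J_KG_K)
ratio = water / air
print(f"water: {water:.0f} J/(m^3*K), air: {air:.0f} J/(m^3*K), ratio: {ratio:.0f}")
```

With these inputs the ratio comes out in the low thousands, the same order of magnitude as the figure quoted above. The exact multiple depends on the temperatures and pressure assumed.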
Standardized, industrial-grade GPU modules
3.1 Why are liquid-cooled modular GPUs with quick-disconnect couplings (QDCs) the next stage in evolution for GPU servers/workstations?
3.1.1 Current market offerings
3.1.1.1 Air-cooled (blower-type) GPUs
3.1.1.1.1 Pros: Easy to install, replace, expand
3.1.1.1.2 Cons: Lower performance due to thermal issues (See AC Cons list)
3.1.1.2 Liquid-cooled, but closed boxes
3.1.1.2.1 Pros:
3.1.1.2.1.1 Pro1. Industrial-grade, permanent configuration built solidly according to manufacturer specs.
3.1.1.2.1.2 Pro2. Thermal performance optimized to push GPU performance to the limit
3.1.1.2.2 Cons:
3.1.1.2.2.1 Con1: Impossible to work inside the chassis to change configuration, upgrade, service, etc.
3.1.2 Next-generation solution
3.1.2.1 Takes the best of current AC'd and LC'd GPU form factors.
3.1.2.1.1 Familiarity and the same ease of use as standard PCIe GPU cards
3.1.2.1.1.1 GPUs have water blocks with QDC couplings. Install and remove GPU modules as you would a standard PCIe card.
3.1.2.1.2 Benefits of LC performance
3.1.2.1.3 Acoustics: More productive and less stressed workers
Conclusion: Call to Action
4.1 Look for GPU servers/workstations that:
4.1.2 Business values:
4.1.2.2 Employees: Reduce wait times and frustration caused by slow systems.
Silent operation reduces stress levels and increases productivity.
4.1.2.3 Operations: Faster throughput, remote collaborative work
4.1.2.1 Customers: Happier with quicker, more reliable results
4.1.2.4 Infrastructure: Fast implementation, high uptime, minimal downtime
4.1.1 Technical features
4.1.1.1. LC
4.1.1.2. Modular GPU
4.1.1.3. QDC coupling
4.1.1.4. Prevents thermal throttling
4.1.1.5. Whisper-quiet
4.1.1.6. Serviceability
4.1.1.7. Expandability
4.1.1.8. Ease of deployment
Introduction
Problem
Solution
Conclusion/Call to action