Heterogeneous Computing (HC) (Fields (Frameworks and tools for…

- - - - GPU and CPU
  - - - Single-Chip Heterogeneous Processor (SCHP)
      - Accelerated Processing Unit (APU)
      - Fused or Integrated Systems
    - - Discrete Systems
- - - - PLASMA, which enables writing portable Single-Instruction, Multiple-Data (SIMD) programs. PLASMA uses an Intermediate Representation (IR) that provides succinct, clean abstractions (i.e., free from the details of any particular SIMD architecture) so that programs can be compiled for different PUs. A runtime then automatically multithreads these programs and executes them on different PUs, such as a GPU, a CPU, and a Cell BE; for example, a for loop in a kernel can be split across a CPU and a GPU. The runtime takes care of load balancing and distributed memory, and ensures that data are moved from the CPU to the GPU, or vice versa, before any computation.
    - - Runtime framework for executing workloads represented as parallel-operator directed acyclic graphs (PO-DAGs) on HCSs. They identify four criteria, viz., suitability, locality, availability, and criticality, which are important to consider when dividing the workload to achieve good performance.
    - - MapCG, which facilitates portability between a CPU and a GPU at the source-code level. Without modification, a program can be compiled and executed on either a CPU or a GPU using a MapReduce programming model. Using OpenCL allows a single kernel code version to serve both PUs instead of a separate kernel version for each.
    - - ClusterJaMP, a framework for HC in the context of OpenMP adapted to Java. They propose an array package that provides replicated and partitioned arrays, with which a parallel-for loop can be distributed over PUs. At the beginning of a program, they execute a microbenchmark and a few bandwidth tests, and use this information to estimate the relative performance of the PUs. Based on this estimate, their technique dynamically changes the number and type (GPU or CPU) of PUs to reach the communication/computation ratio that gives the best possible performance.
    - - Harmony, which uses performance estimates to schedule applications in an HCS. They propose online monitoring of kernels and describe a dependence-driven scheduling scheme that analyzes how applications share data and selects PUs based on which applications can run without blocking.
    - - Runtime system for automatically dividing an Accelerated OpenMP region across different PUs. The runtime also handles data transfer to the PUs for input and output. Accelerated OpenMP uses the hetero clause as a compiler directive, with which a programmer can control whether heterogeneous computing is used and, if so, how many loop iterations are assigned to a CPU. They propose enhancing this directive by also providing information on the type of scheduler (static or dynamic), the ratio of performance of the PUs, and a factor termed div
    - - MATE-CG, for accelerating MapReduce applications on parallel heterogeneous environments. Apart from allowing CPU-only and GPU-only execution, the MATE-CG runtime also supports dividing the work between a CPU and a GPU. The amount of data processed by the CPU and the GPU is set by a partitioning parameter, which can be chosen at runtime using an autotuning approach based on the iterative nature of data-intensive applications. By collecting profiling data over the first few phases, the value of this parameter can be found with small overhead.
  - - - Dynamic or static scheduling
      - Basis of workload partitioning
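
The static and dynamic scheduling ideas that recur in the branches above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any framework listed here: the function names (`cpu_fraction_from_throughput`, `static_partition`, `dynamic_schedule`) and the throughput figures are hypothetical, and the dynamic scheduler is a deterministic simulation rather than real CPU/GPU execution.

```python
import heapq


def cpu_fraction_from_throughput(cpu_items_per_s, gpu_items_per_s):
    """Share of work for the CPU, proportional to measured throughput.

    The two throughputs stand in for profiling data (hypothetical values).
    """
    total = cpu_items_per_s + gpu_items_per_s
    if total <= 0:
        raise ValueError("throughputs must sum to a positive value")
    return cpu_items_per_s / total


def static_partition(n_items, cpu_fraction):
    """Static scheduling: split the iteration space once, before execution."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must lie in [0, 1]")
    split = round(n_items * cpu_fraction)
    return range(0, split), range(split, n_items)


def dynamic_schedule(n_items, chunk_size, speeds):
    """Dynamic scheduling (simulated): the next chunk goes to whichever
    processing unit becomes free first.

    speeds maps a PU name to an assumed throughput in items per second.
    Returns a dict mapping each PU name to its list of (start, stop) chunks.
    """
    assignments = {name: [] for name in speeds}
    # Min-heap of (time at which the PU is free, PU name).
    free_at = [(0.0, name) for name in sorted(speeds)]
    heapq.heapify(free_at)
    start = 0
    while start < n_items:
        stop = min(start + chunk_size, n_items)
        t, name = heapq.heappop(free_at)
        assignments[name].append((start, stop))
        heapq.heappush(free_at, (t + (stop - start) / speeds[name], name))
        start = stop
    return assignments


if __name__ == "__main__":
    frac = cpu_fraction_from_throughput(1.0, 4.0)
    cpu_range, gpu_range = static_partition(100, frac)
    print("static:", len(cpu_range), "CPU items,", len(gpu_range), "GPU items")
    chunks = dynamic_schedule(100, 10, {"cpu": 1.0, "gpu": 4.0})
    print("dynamic:", {name: len(c) for name, c in chunks.items()})
```

The static split fixes the ratio up front, so it is only as good as the throughput estimate; the dynamic variant needs no estimate for its decisions but pays a scheduling cost per chunk, which is why chunk size matters in practice.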