Parallel Computing in bioinformatics - Coggle Diagram
Can help deliver answers in real time for point-of-care testing
Types of parallelism
Cluster
Using multiple cores (across computers)
Ad hoc
- a bunch of PCs connected over Ethernet
Cluster specific
- high density and fast interconnect, e.g. Spartan
Highly specialized
- high density, low power, low latency and very fast interconnect, e.g. IBM Blue Gene
Break tasks into subtasks -
either dependent or independent (embarrassingly parallel)
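A minimal sketch of the independent (embarrassingly parallel) case, assuming the job is to compute GC content for each sequence in a list. The task and the names `gc_fraction` and `gc_all` are hypothetical, picked for illustration. No subtask needs anything from another, so each one can go to whichever worker is free.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in one sequence (one independent subtask)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_all(seqs, use_processes=True):
    """Run one subtask per sequence; results come back in input order.

    Worker processes can use several cores for CPU-bound work.
    Threads are offered only as a lightweight alternative for small inputs.
    """
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls() as pool:
        return list(pool.map(gc_fraction, seqs))
```

Calling `gc_all(["GGCC", "ATAT"], use_processes=False)` exercises the same code path with threads. Dependent subtasks would not fit this pattern, because they need an ordering step before they can be dispatched.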
Symmetric Multiprocessing (SMP)
Using multiple cores on same computer (threads)
Tools that use multithreading -
BWA, Bowtie 2, SAMtools
Standard platforms for using SMP -
POSIX threads, OpenMP, Unix shell
In Unix,
piping (|)
runs commands in parallel implicitly (all processes in a pipeline are started at the same time, so the operating system can schedule them on different cores)
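The same idea can be sketched from Python, assuming a Unix system with the standard `sort` and `uniq` commands on the PATH. The function name `sort_and_count` is hypothetical. Both child processes are started before any data flows, which is what the shell does for `sort | uniq -c`.

```python
import os
import subprocess


def sort_and_count(text: str):
    """Rough equivalent of `sort | uniq -c`; assumes non-empty input lines."""
    env = dict(os.environ, LC_ALL="C")  # byte-wise ordering, for reproducible output
    sorter = subprocess.Popen(["sort"], stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE, text=True, env=env)
    # uniq reads straight from sort's stdout: this connection is the pipe.
    counter = subprocess.Popen(["uniq", "-c"], stdin=sorter.stdout,
                               stdout=subprocess.PIPE, text=True, env=env)
    sorter.stdout.close()  # parent drops its copy so sort sees SIGPIPE if uniq exits
    sorter.stdin.write(text)
    sorter.stdin.close()   # signal end of input to sort
    out, _ = counter.communicate()
    sorter.wait()
    rows = []
    for line in out.splitlines():
        count, value = line.split(None, 1)
        rows.append((int(count), value))
    return rows
```

In this particular pair, `sort` has to read all of its input before it can emit anything, so the overlap is limited. Stages that stream line by line benefit more from running side by side.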
Sub-shells via process substitution, <(...)
- instead of uncompressing a file to disk first and then feeding the whole thing in as input, the data can be uncompressed and fed in on the fly, saving disk space
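A minimal Python sketch of the same on-the-fly idea, assuming a gzip-compressed FASTA file. The names `count_fasta_records` and `demo`, and the file name `reads.fa.gz`, are hypothetical. The file is decompressed in chunks as it is read, so no uncompressed copy is written to disk.

```python
import gzip
import os
import tempfile


def count_fasta_records(path: str) -> int:
    """Count FASTA header lines in a .gz file without unpacking it to disk."""
    n = 0
    with gzip.open(path, "rt") as handle:  # decompresses chunk by chunk as we read
        for line in handle:
            if line.startswith(">"):
                n += 1
    return n


def demo() -> int:
    """Write a tiny gzipped FASTA to a temp directory and count its records."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "reads.fa.gz")
        with gzip.open(path, "wt") as out:
            out.write(">r1\nACGT\n>r2\nGGCC\n")
        return count_fasta_records(path)
```

In the shell, a comparable streaming form would be something like `grep -c '>' <(zcat reads.fa.gz)`, where the decompressed data exists only in the pipe.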
Sometimes threads or processes need to talk to each other = Inter-Process Communication (IPC)
time-stamped files
pipes, sockets and message passing
shared memory
signals
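Two of the mechanisms listed above can be sketched with threads, assuming a producer that hands reads to a consumer. All names here (`producer`, `consumer`, `count_bases`) are hypothetical. The queue plays the role of message passing, and the dictionary guarded by a lock stands in for shared memory. Communication between separate processes would use OS-level pipes, sockets or shared memory segments instead.

```python
import queue
import threading

_DONE = object()  # sentinel message meaning "no more reads"


def producer(reads, channel):
    """Send each read as a message, then signal the end of the stream."""
    for read in reads:
        channel.put(read)
    channel.put(_DONE)


def consumer(channel, totals, lock):
    """Receive messages and update state shared with the main thread."""
    while True:
        read = channel.get()
        if read is _DONE:
            break
        with lock:  # only one thread may touch the shared dict at a time
            totals["reads"] += 1
            totals["bases"] += len(read)


def count_bases(reads):
    channel = queue.Queue(maxsize=4)  # bounded: producer waits if consumer lags
    totals = {"reads": 0, "bases": 0}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=producer, args=(reads, channel)),
        threading.Thread(target=consumer, args=(channel, totals, lock)),
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return totals
```

With a single consumer the lock is not strictly needed. It is kept to show the pattern that becomes necessary once several threads write to the same data.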
Dedicated pipeline system -
Make, Snakemake, Nextflow, Cromwell
GNU Parallel, Makefile
For usage - lecture 21, slides 22-24, 27, 32, 36
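A rough sketch of what a pipeline system does with dependent subtasks, assuming a hypothetical four-step workflow. The step names in `EXAMPLE_DEPS` and the function `parallel_batches` are illustrative and not taken from any of the tools above. The standard-library `graphlib` module (Python 3.9+) orders the steps and reports which ones are ready to run at the same time.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
EXAMPLE_DEPS = {
    "trim": set(),
    "index": set(),
    "align": {"trim", "index"},
    "call": {"align"},
}


def parallel_batches(deps):
    """Group steps into batches; steps within one batch are independent."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are done
        batches.append(ready)
        sorter.done(*ready)  # mark as finished so dependents become ready
    return batches
```

Real pipeline tools add much more on top of this, such as skipping steps whose outputs are already up to date and submitting jobs to a cluster.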
Single Instruction, Multiple Data (SIMD)
one instruction operating simultaneously on different data - vectorizing code -
parallelism within a core
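A small sketch of vectorizing code, assuming NumPy is installed and the task is counting mismatches between two equal-length sequences. Both function names are hypothetical. The vectorized version expresses the work as one array comparison, which NumPy runs in compiled code. Whether actual SIMD instructions are used depends on the NumPy build and the CPU.

```python
import numpy as np


def hamming_loop(a: str, b: str) -> int:
    """Scalar version: one comparison per loop iteration."""
    return sum(1 for x, y in zip(a, b) if x != y)


def hamming_vectorized(a: str, b: str) -> int:
    """Vectorized version: one comparison applied across all positions."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    # View each sequence as an array of byte values (assumes ASCII input).
    va = np.frombuffer(a.encode("ascii"), dtype=np.uint8)
    vb = np.frombuffer(b.encode("ascii"), dtype=np.uint8)
    return int(np.count_nonzero(va != vb))
```

Both functions give the same count for equal-length inputs. The difference is that the second hands the whole array to a single operation instead of stepping through it in Python.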
GPUs can do MIMD (Multiple Instruction, Multiple Data)
- different operations on different parts of a vector array
Libraries -
NumPy, GSL, BLAS
Tools -
HMMER (sequence alignment),
FASTA 35+, SWIFT (full local, global, semi-global alignment),
BWA, Bowtie (short read alignment)