Parallel Computing in bioinformatics
Can help get answers in real time for point-of-care testing
Types of parallelism
Cluster
Symmetric Multiprocessing (SMP)
Single Instruction, Multiple Data (SIMD)
one instruction operating simultaneously on different data - vectorizing code - parallelism within a core
Using multiple cores on same computer (threads)
Using multiple cores (across computers)
ad hoc - bunch of PCs over ethernet
Cluster specific - high density and fast interconnect. E.g. Spartan
Highly specialized - high density, low power, low latency and very fast interconnect. E.g. IBM Blue Gene
Break tasks into subtasks - either dependent or independent (embarrassingly parallel)
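A minimal sketch of the independent (embarrassingly parallel) case, assuming a hypothetical per-read task `gc_content` and made-up reads. Each read is processed without reference to any other, so the work can simply be handed to a pool of workers. A thread pool is used here for simplicity; for CPU-bound pure-Python work a process pool would normally be swapped in.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(seq):
    """Fraction of G/C bases in one sequence (hypothetical example task)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def gc_content_parallel(seqs, workers=4):
    """Apply gc_content to every sequence independently.

    pool.map returns results in input order, even if the
    subtasks finish in a different order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gc_content, seqs))


# Made-up reads, for illustration only.
reads = ["ACGT", "GGCC", "ATAT"]
print(gc_content_parallel(reads))
```

Dependent subtasks cannot be mapped this way, because one subtask's input is another's output and they have to be ordered first.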
Tools that use multithreading - BWA, bowtie2, samtools
To use SMP - standard platforms = POSIX threads, OpenMP, Unix Shell
Sometimes processes (or threads) need to talk = Inter-Process Communication (IPC)
time stamped files
pipes, sockets and message passing
shared memory
signals
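As a hedged illustration of message passing, the sketch below has one thread hand work to another through a queue instead of both touching the same variables. The names `producer`, `consumer`, `read_lengths` and the `DONE` end-of-stream marker are assumptions made for this example. Separate processes would use pipes, sockets or shared memory instead, but the pattern is the same.

```python
import queue
import threading

DONE = None  # sentinel meaning "no more messages" (an assumption of this sketch)


def producer(items, channel):
    """Send each item as a message, then signal the end of the stream."""
    for item in items:
        channel.put(item)
    channel.put(DONE)


def consumer(channel, results):
    """Receive messages until the sentinel arrives; record each read length."""
    while True:
        item = channel.get()
        if item is DONE:
            break
        results.append(len(item))


def read_lengths(reads):
    channel = queue.Queue()  # thread-safe message channel
    results = []
    workers = [
        threading.Thread(target=producer, args=(reads, channel)),
        threading.Thread(target=consumer, args=(channel, results)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


print(read_lengths(["ACGT", "GG", "ATATAT"]))
```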
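GPUs are mostly SIMD-like (SIMT) - the same operation applied to different parts of a vector array; MIMD (different operations on different data) is what multi-core CPUs and clusters provide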
Libraries - NumPy, GSL, BLAS
Tools - HMMER (sequence alignment), FASTA 35+, SWIFT (full local, global, semi-global alignment), BWA, bowtie (short read alignment)
For usage - see lecture 21, slides 22-24, 27, 32, 36
Dedicated pipeline system - Make, Snakemake, Nextflow, Cromwell
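What these systems have in common is that steps declare what they depend on, and the system works out the order and which steps can run side by side. A minimal sketch of that idea using Python's standard `graphlib`, assuming a made-up three-stage pipeline (the step names are hypothetical, and real pipeline tools do far more than this):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
pipeline = {
    "align_sample_a": {"index_reference"},
    "align_sample_b": {"index_reference"},
    "call_variants": {"align_sample_a", "align_sample_b"},
}


def parallel_batches(graph):
    """Group steps into batches; steps in the same batch are independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished, unlocking later steps
    return batches


for number, batch in enumerate(parallel_batches(pipeline), start=1):
    print(number, batch)
```

Here the two alignment steps land in the same batch, so they could run in parallel, while variant calling has to wait for both.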
In Unix, piping (|) gives implicit parallelism - all processes separated by pipes are started simultaneously and run concurrently, and the OS can schedule them on different cores
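A rough Python equivalent of a two-stage shell pipeline, assuming two small hypothetical stages (`UPPER_STAGE` and `COUNT_STAGE`) run as child Python processes. Both children are started before any data flows, and the first one's output is connected directly to the second one's input, which is the arrangement the shell sets up for `a | b`. This is a sketch for small inputs, not a general pipeline runner.

```python
import subprocess
import sys

# Two hypothetical pipeline stages, each a tiny stand-alone Python program.
UPPER_STAGE = "import sys\nfor line in sys.stdin: sys.stdout.write(line.upper())"
COUNT_STAGE = "import sys\nprint(sum(line.count('G') for line in sys.stdin))"


def run_pipeline(text):
    """Feed text through UPPER_STAGE | COUNT_STAGE and return the G count."""
    first = subprocess.Popen(
        [sys.executable, "-c", UPPER_STAGE],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    second = subprocess.Popen(
        [sys.executable, "-c", COUNT_STAGE],
        stdin=first.stdout, stdout=subprocess.PIPE, text=True,
    )
    first.stdout.close()   # drop the parent's copy of the connecting pipe
    first.stdin.write(text)
    first.stdin.close()    # end of input for the first stage
    output, _ = second.communicate()
    first.wait()
    return int(output)


print(run_pipeline("acgtg\nggg\n"))
```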
Sub-shells via process substitution, <(...) - instead of uncompressing a file first and passing the whole uncompressed copy as input, it can be uncompressed and fed in on the fly - thus saving disk space
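The same on-the-fly idea can be sketched in Python with the standard `gzip` module: lines are decompressed as they are read, so no uncompressed copy is ever written to disk. The file name `reads.fastq.gz` and the helper `count_reads` are hypothetical, and the sketch assumes a well-formed FASTQ file with four lines per record.

```python
import gzip
import os
import tempfile


def count_reads(path):
    """Count FASTQ records in a gzipped file, streaming line by line."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)  # decompressed lazily, one line at a time
    return n_lines // 4  # assumes 4 lines per FASTQ record


# Build a tiny made-up gzipped FASTQ file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reads.fastq.gz")
    with gzip.open(path, "wt") as out:
        out.write("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
    print(count_reads(path))
```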
GNU parallel, makefile