Parallel Computing: Multicores, Playstation 3, Reconfigurable Hardware - Oliver Sinnen Department of Electrical and Computer Engineering
Parallel computing – motivation

Parallel computing:
● Multiple processing elements collaborate in the execution of one task or problem

Motivation:
● Higher performance than single-processor systems
Using parallel systems
● "OK, what is the problem? Let's just use several processors for one task!"
● Unfortunately not trivial
  – Divide the task/application into sub-tasks
    ● How many? What size?
  – Distribute them across processing elements
    ● How?
  – Execute them
    ● In which order? How can they communicate?

Content
● Current system trends
● Background on Parallel Computing
● Research @ ECE
  – Task Scheduling
  – Reconfigurable computing
  – OpenMP and tasks
  – Visualisation tools
  – Object Oriented parallelisation
Current system trends

Parallel systems so far
Until now:
● (Very) large systems
  – IBM Blue Gene/L with more than 100,000 processors (world's fastest computer 2004–2008)
● Mid-sized shared-memory parallel systems
  – dozens of processors, shared memory
● Clusters of PCs
  – Independent PCs connected in a network – "low cost"
Current system trends

Parallel computing – why now?
● Parallel computing has been around for decades
● Processor technology is reaching its physical limits – clock frequency has not improved significantly in recent years!
Multicores everywhere
● Multiple processors on one chip
● Current processors have two or more cores
  – Intel Core Duo, AMD Opteron/Phenom, IBM Power6, Sony Cell, Sun Niagara T1/T2
● x86 8-cores (!) next year
● More cores soon
[Die photo of AMD 'Barcelona' – first "real" x86 quad core, launched 10.9.2007]

Current system trends
Gaming consoles
Gaming consoles go parallel as well
● XBOX360: 3-core PowerPC
● Playstation 3
  – Processor: Cell Broadband Engine
    ● 3.2 GHz
    ● PowerPC architecture
    ● Plus 8 Synergistic Processing Elements (SPE)
  – Main memory: 256 MB
  – Runs Linux (Ubuntu)

Current system trends

Cell
[Die photo of the Cell processor]
Current system trends

Cell block diagram
● PPE: power processor element (PowerPC instruction set)
● SPE: synergistic processor element
  – DMA: direct memory access
  – LS: local store memory
  – SXU: execution unit

Current system trends

Playstation 3 cluster @ ECE
● 8 Playstation 3s
● Connected through Ethernet
● Used in 2007 to teach SoftEng710
● Currently used in a part 4 project: "Implementing the FDTD Method on the IBM Cell Microprocessor to Model Indoor Wireless Propagation", supervisor Dr Neve
Current system trends

Special purpose hardware
● Using special purpose hardware for acceleration
  – Not new, e.g. graphics cards
  – But GPUs (Graphics Processing Units) are now used for other computation
    ● E.g. NVIDIA Compute Unified Device Architecture (CUDA)
● Co-processors
  – ClearSpeed – PCI Express board with two special processors (CSX600) accelerating scientific applications
● Acceleration technologies/co-processors supported by processor manufacturers
  – e.g. AMD Torrenza initiative
● Highest flexibility: reconfigurable hardware
  – Reconfigurable acceleration devices on an FPGA basis
Current system trends

Reconfigurable hardware @ ECE
● 2 XD1000 development systems
● Normal PCs with two processor sockets
  – one used by a processor
  – one used by an FPGA (!)
● CPU: AMD Opteron 248 @ 2.2 GHz
● FPGA: Altera Stratix II
● RAM: 4 GB (CPU), 4 GB (FPGA)
● OS: Linux (Fedora)
Current system trends

My research
● New hardware => new forms of parallelism

Research focus
● Fundamental problems of parallel computing
  – task scheduling
● Visualisation tools for the parallelisation process

New forms of concurrency exploitation
● Desktop parallelisation
● Reconfigurable Computing
Background

Challenges of parallel programming
[Figure: sequential programming contrasted with parallel programming]
Background

Parallelisation example
program/task: d = a² + a + 1

decomposition into sub-tasks:
A: a = 1
B: b = a + 1
C: c = a * a
D: d = b + c

dependence analysis:
[Task graph: B and C depend on A; D depends on B and C]
Background

Parallelisation example
sub-tasks:
A: a = 1
B: b = a + 1
C: c = a * a
D: d = b + c

scheduling on 2 processors (P1, P2):
[Gantt chart: the four sub-tasks scheduled on P1 and P2]
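To make the example concrete, here is a minimal Java sketch (not from the original slides) of sub-tasks B and C, which both depend only on A, running concurrently on a two-thread pool:

import java.util.concurrent.*;

// Hypothetical sketch of the example d = a*a + a + 1 decomposed into
// sub-tasks A..D; B and C are independent, so they may run in parallel.
public class ParallelisationExample {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2); // P1, P2
        final int a = 1;                               // task A: a = 1
        Future<Integer> b = pool.submit(() -> a + 1);  // task B: b = a + 1
        Future<Integer> c = pool.submit(() -> a * a);  // task C: c = a * a
        int d = b.get() + c.get();                     // task D: d = b + c
        System.out.println("d = " + d);                // prints: d = 3
        pool.shutdown();
    }
}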
Background

Parallelisation example
with OpenMP task directives:

#pragma omp parallel tasks
{
    #pragma omp task A 2
    {
        for (i = 0; i < N; i++) { ... }
    }
    ...
}
Research

Dependence visualisation

Background: Types of data dependence
● Different types of dependence:
  – flow dependence – read after write
    ● e.g. line 2 reads a, which was written in line 1
  – antidependence – write after read
    ● e.g. line 4 writes to v after line 3 reads it
  – output dependence – write after write
    ● e.g. lines 2 and 4 both write to v
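As a hypothetical illustration (the slide's own code example is not shown here), a four-line snippet consistent with the references above could be:

int a = 5;              // line 1: writes a
int v = a + 1;          // line 2: reads a (flow dependence on line 1), writes v
System.out.println(v);  // line 3: reads v
v = 2 * a;              // line 4: writes v (antidependence on line 3, output dependence on line 2)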
Dependence visualisation

Background: Dependences in loops
for (i = 2; i < ...; i++) { ... }
● Loops usually have high computational load
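The body of the loop above is not legible; as a hypothetical Java example of the kind of loop-carried dependence such a loop can contain:

int n = 100;            // example size (assumed)
int[] a = new int[n];
a[0] = 0; a[1] = 1;
for (int i = 2; i < n; i++) {
    a[i] = a[i - 2] + 1;  // loop-carried flow dependence of distance 2:
                          // iteration i reads what iteration i-2 wrote
}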
Dependence visualisation

Eclipse Plugin for Java
● On-the-fly dependence analysis
  – Java parser
  – Accurate dependence tests
● Visualisation of the dependences of the loop enclosing the cursor
● All data dependence types, in different colours
● Interaction with graph and code
● Future: support for dependence elimination/transformations
Task scheduling
General

Task Scheduling

Example
[Figure: an example task graph and its execution on 2 processors]
Task Scheduling

Scheduling constraints
start time: ts(n); finish time: tf(n); processor assignment: proc(n)

Constraints:
● Processor constraint:
  proc(ni) = proc(nj) ⇒ ts(ni) ≥ tf(nj) or ts(nj) ≥ tf(ni)
● Precedence constraint:
  for all edges eji of E (from nj to ni):
  proc(ni) ≠ proc(nj) ⇒ ts(ni) ≥ tf(nj) + c(eji)
  proc(ni) = proc(nj) ⇒ ts(ni) ≥ tf(nj)
● i.e. local communication is considered to be cost-free

Task Scheduling

Task Scheduling problem
Given: a task graph G = (V,E,w,c) and p processors
Objective: find a valid schedule for the |V| tasks on the p processors that has the earliest finish time
● Valid schedule: a schedule that adheres to the dependence constraints of the graph
● NP-hard problem, which in practice means exponential runtime for exact solutions
⇒ Approximation algorithms
● List scheduling
● Clustering
● Duplication scheduling
● Genetic algorithms
Task Scheduling

Task scheduling book
O. Sinnen, "Task Scheduling for Parallel Systems", John Wiley, 2007
● Parallel computing intro
● Graph models
● Task scheduling fundamentals
● Algorithms
● Advanced task scheduling
● Heterogeneous systems
● Realistic parallel system models
Task scheduling
Communication contention

Task Scheduling

Classic system model
[Figure: system model, e.g. 8 processors]
Properties:
● Dedicated system
● Dedicated processors
● Zero-cost local communication
● Communication subsystem
● Concurrent communication
● Fully connected
Task Scheduling

Communication contention
[Figure: contention example contrasted with the classic model]
● End-point contention
  – at the network interface
● Most networks are not fully connected
● Network contention
Task Scheduling

Contention aware scheduling
● Target system represented as a network graph
● Integration of edge scheduling into task scheduling
[Figure: schedules without contention and with contention]
Task Scheduling

Contention and task duplication
● Duplicating tasks
  – execute the same task on more than one processor
● Especially beneficial under communication contention
● Developed concept and algorithms
[Gantt charts: example schedules of tasks A–D on P1 and P2, without duplication and with duplication]
Task scheduling
Optimal algorithms

Task Scheduling

Using A*
Best-first state space search algorithm
● A state s represents a partial solution to the problem
● New states are created by expanding a state s with all possible choices
● Only the most promising state is expanded at a time – chosen by the cost function f(s)
● Cost function f(s)
  – an underestimate of the minimum cost f*(s) of the final solution – the tighter, the better
● If f(s) ≤ f*(s) for every s (i.e. f(s) is admissible)
  – then the final solution is optimal
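A skeleton of such a best-first search in Java (an illustrative sketch under assumed State and f(s) definitions, not the actual implementation from this research):

import java.util.*;

// States are partial schedules; f(s) must never overestimate the
// optimal schedule length reachable from s (admissibility).
interface State {
    double f();            // admissible cost estimate, f(s) <= f*(s)
    boolean isComplete();  // are all tasks scheduled?
    List<State> expand();  // schedule one more ready task, in all ways
}

class AStarSearch {
    static State search(State initial) {
        PriorityQueue<State> open =
            new PriorityQueue<>(Comparator.comparingDouble(State::f));
        open.add(initial);
        while (!open.isEmpty()) {
            State s = open.poll();         // most promising state first
            if (s.isComplete()) return s;  // optimal, since f is admissible
            open.addAll(s.expand());
        }
        return null;                       // no schedule found
    }
}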
Task Scheduling

A* for task scheduling
● State => partial schedule
● Cost function f(s) => underestimate of the schedule length
● A state is expanded by scheduling one more node
[Figure: state tree]
Task Scheduling

A* for task scheduling
● Proposed a significantly better cost function f(s)
● Proposed new pruning techniques
● Extensive experimental evaluation with interesting and surprising results
● Proposed processor normalisation
[Figure: schedule normalisation]

Reconfigurable Computing
Reconfigurable computing
● Special purpose devices can lead to extremely high performance
  – Concurrency can be much higher

Problem
● Programming or configuring them is significantly more difficult
  – than parallel programming (!)

Idea (hope?)
● Use high-level languages and automatic tools
Reconfigurable computing

Java to Hardware Compilation
● High-level language to hardware
  – i.e. FPGAs
● Compilation to VHDL
  – then synthesis of the VHDL, i.e. configuring the FPGA

Intention
● Acceleration of Java programs
● In combination with a general purpose processor
Reconfigurable computing

Java to Hardware Compilation
● Project bridging Parallel Computing, Embedded Systems, Software Engineering

Test platform
● XD1000 systems
● Offer tight integration of CPU and FPGA
● Unfortunately
  – more difficult to use than expected

Parallel Programming
Tasks in OpenMP
Parallel Programming

Using OpenMP directives
OpenMP
● Open standard for shared-memory programming
● Compiler directives used with FORTRAN, C/C++, Java
● Thread based

Examples (in C):

#pragma omp parallel for
for (i = 0; i < N; i++) { ... }

#pragma omp parallel sections
{
    #pragma omp section
    { ... }
    #pragma omp section
    { ... }
}
Parallel Programming

Tasks/task directives
Introduction of new directives: tasks/task
● Like sections, but with finer granularity
● Dependences and computation weights can be specified

#pragma omp parallel tasks
{
    #pragma omp task A 1
    {
        ...
    }
    #pragma omp task B 2 dependsOn(A)
    {
        ...
    }
    ...
}

Tasks/task are transformed into sections/section with the aid of task scheduling
Parallel Programming

JompX
Source-to-source compiler
● Java/OpenMP + task directives => Java/OpenMP
● Parsing => Scheduling => Generation

Code with tasks directives:

//omp parallel tasks
{
    //omp task A 2
    { Block_Code_A }
    //omp task B 4 dependsOn(A)
    { Block_Code_B }
    //omp task C 2 dependsOn(A)
    { Block_Code_C }
    //omp task D 3 dependsOn(A)
    { Block_Code_D }
    //omp task E 6 dependsOn(B)
    { Block_Code_E }
    //omp task F 7 dependsOn(C,D)
    { Block_Code_F }
    //omp task G 5 dependsOn(B,E,F)
    { Block_Code_G }
}

[Task graph representation and schedule of the task graph on P1, P2]

Generated code with sections directives:

boolean taskADone = false;
boolean taskBDone = false;
boolean taskCDone = false;
boolean taskDDone = false;
boolean taskFDone = false;

//omp parallel sections
{
    //omp section
    {
        Block_Code_A;
        taskADone = true;
        Block_Code_D;
        taskDDone = true;
        Block_Code_C;
        taskCDone = true;
        while (!taskBDone) {}
        Block_Code_E;
        while (!taskFDone) {}
        Block_Code_G;
    }
    //omp section
    {
        while (!taskADone) {}
        Block_Code_B;
        taskBDone = true;
        while (!taskCDone) {}
        while (!taskDDone) {}
        Block_Code_F;
        taskFDone = true;
    }
}

The generated code busy-waits on boolean flags to enforce the dependences between the two sections.
Parallel Programming

Task Graph visualisation in the Eclipse IDE
[Screenshot – left: annotated Java code; right: visualisation of the dependence structure]
Object Oriented Parallelisation

Parallel Iterator
● Desktop programs must be parallelised
  – otherwise there is no speedup from modern processors!
● Most programs are Object Oriented (OO)
● The lion's share of the computational load is in loops
● Iterators are used in OO loops
=> Parallel version of iterators
Object Oriented Parallelisation

Parallel Iterator
[Figure: elements 0–7 of a collection being handed out to several worker threads]
Sequential use of an iterator:

Iterator it = collection.iterator();
while (it.hasNext()) {
    Element e = it.next();
    computeElement(e);
}
Object Oriented Parallelisation

Parallel Iterator Problem
[Animation: several threads interleave hasNext() and next() calls on a shared iterator]
● Between one thread's hasNext() and its next(), another thread may take the last element – the check and the retrieval are not atomic
Object Oriented Parallelisation

Parallel Iterator
Collection collection = ...;
ParIterator it = ParIterator.create(collection);
// each thread does this
while (it.hasNext()) {
    Image image = it.next();
    resize(image);
}
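A minimal sketch of how a thread-safe iterator of this kind could work internally (hypothetical; the actual ParIterator API and implementation may differ) is to fuse the check and the retrieval into one atomic claim:

import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;

// Each call to tryNext() atomically claims the next element, so the
// hasNext()/next() race of a shared java.util.Iterator cannot occur.
class SimpleParIterator<E> {
    private final List<E> elements;
    private final AtomicInteger cursor = new AtomicInteger(0);

    SimpleParIterator(Collection<E> c) { this.elements = new ArrayList<>(c); }

    // Returns the next unclaimed element, or null when all are taken.
    E tryNext() {
        int i = cursor.getAndIncrement();
        return i < elements.size() ? elements.get(i) : null;
    }
}

Each worker thread then simply loops: for (E e; (e = it.tryNext()) != null; ) process(e);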
Conclusions

Parallel and Reconfigurable Computing Lab
http://www.ece.auckland.ac.nz/~sinnen/lab.html

Research in
● Fundamental problems of parallel computing
  – task scheduling
● Visualisation tools for the parallelisation process

New forms of concurrency exploitation
● Reconfigurable Computing
● Desktop parallelisation
  – Object Oriented Parallelisation

Conclusions
And most importantly:
● There is summer in Europe and I am going there in one month on Research & Study leave!