Adaptive Lightweight Scheduling for Improving Bulk-Synchronous MPI Application Scalability on Multi-core Clusters
Vivek Kale
02 February 2012, 10:30 - 11:30 Room/Building: 455/PCRI-N
Abstract:
Several prior studies have shown
that operating system (OS) services, which induce system noise,
can be a fundamental and significant
problem in scaling bulk-synchronous MPI applications
to a very large number of nodes of a multi-core cluster.
One solution for mitigating noise impact on
bulk-synchronous MPI applications is to turn off certain OS
services on the machine. However, this is typically infeasible
because these system services may be essential, or even
required, for some applications. Furthermore, the end user
typically has no direct control over these system services.
Thus, we need an application-level solution.
Building upon previous work that demonstrated the utility
of lightweight scheduling, we discuss advanced techniques
for adaptive lightweight scheduling and provide insights
into our advanced scheduler through experiments on two
high-performance clusters with very different noise signatures.
In doing so, we show larger performance gains with our adaptive
lightweight scheduler than with our original lightweight
scheduler.
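
To illustrate why system noise becomes more harmful as the node count grows, here is a minimal sketch of a simplified noise model. It is not the scheduler discussed in the talk. The function name and all parameter values (`work`, `noise_prob`, `noise_delay`) are assumptions chosen for illustration: each rank is assumed to be interrupted independently with a small probability per step, and the step ends only when the slowest rank reaches the barrier.

```python
def expected_step_time(ranks, work=1.0, noise_prob=0.001, noise_delay=0.5):
    """Expected duration of one bulk-synchronous step under a toy noise model.

    Assumptions (illustrative only): every rank computes for `work` seconds,
    and independently, with probability `noise_prob`, a rank is delayed by
    `noise_delay` seconds. Because all ranks wait at a barrier, the step
    lasts as long as the slowest rank.
    """
    # Probability that at least one of the ranks is hit by noise in this step.
    p_any_hit = 1.0 - (1.0 - noise_prob) ** ranks
    # The step takes `work` if no rank is hit, `work + noise_delay` otherwise.
    return work + noise_delay * p_any_hit


if __name__ == "__main__":
    for ranks in (1, 64, 1024, 16384):
        print(ranks, round(expected_step_time(ranks), 4))
```

Under these assumptions, a delay that is rare on any single rank becomes almost certain to hit at least one rank per step at large scale, so every step pays close to the full delay. This is the effect that an application-level scheduler would aim to absorb.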