Fourth International Workshop on
Practical Aspects of High-level Parallel Programming (PAPP 2007)
part of
The International Conference on Computational Science
May 27-30, 2007, University of Beijing, China
 
Accepted papers

[1] I. Jo, H. Han, H.Y. Yeom, and O. Kwon. Universal Execution of Parallel Processes: Penetrating NATs over the Grid. International Conference on Computational Science (ICCS 2007), LNCS. Springer, 2007.

Today, clusters of computers are important hardware resources for executing parallel processes. Many research institutes place their clusters on private networks for ease of administration, and it is very hard to run large-scale parallel programs without involving such clusters. Thus, many parallel programming systems use a user-level proxy or gateway to exchange messages between clusters on private networks. We adopt a novel kernel-level method to overcome major drawbacks of user-level methods, such as the high overhead of protocol processing and context switching. We also apply our method to MPICH-G2, a very popular MPI implementation. Our experimental results show that the kernel-level method processes messages very efficiently, which results in shorter execution times for parallel processes.
[2] C. Dittamo, A. Cisternino, and M. Danelutto. Parallelization of C# programs through annotations. International Conference on Computational Science (ICCS 2007), LNCS. Springer, 2007.

In this paper we discuss how the extensible metadata supported by virtual machines such as the JVM and CLR can be used to specify the parallelization aspect of annotated programs. Our study focuses on annotated CLR programs written using a variant of C#; we developed a meta-program that processes these sequential programs in their binary form and generates optimized parallel code. We illustrate the techniques used in the implementation of our tool and provide some experimental results that validate the approach.
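The idea of driving parallelization from annotations attached to otherwise sequential code can be sketched in a few lines. This is a hypothetical illustration in Python (the paper targets annotated C# processed at the binary level, and none of the names below are the paper's actual API): an annotation marks a function as safe to parallelize, and a meta-level runner consults that annotation to decide between parallel and sequential execution.

```python
# Hypothetical sketch of annotation-driven parallelization; the
# decorator and runner names are invented for illustration.
from concurrent.futures import ThreadPoolExecutor

def parallel(fn):
    """Annotation marking a per-element function as safe to parallelize."""
    fn.__parallel__ = True
    return fn

def run(fn, data, workers=4):
    """Meta-level runner: parallelize only if the function is annotated."""
    if getattr(fn, "__parallel__", False):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(fn, data))
    return [fn(x) for x in data]  # sequential fallback for unannotated code

@parallel
def square(x):
    return x * x

print(run(square, range(5)))  # [0, 1, 4, 9, 16]
```

The key property, as in the paper, is that the annotated source stays sequential and readable; the parallel structure is introduced by a separate tool that interprets the annotations.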
[3] T. Gautier, J.L. Roch, and F. Wagner. Fine Grain Distributed Implementation of a Dataflow Language with Provable Performances. International Conference on Computational Science (ICCS 2007), LNCS. Springer, 2007.

Efficient execution of multithreaded iterative numerical computations requires carefully taking data dependencies into account. This paper presents an original way to express and schedule general dataflow multithreaded computations. We propose a distributed dataflow stack implementation which efficiently supports work stealing and achieves provable performance on heterogeneous grids. It exhibits properties such as non-blocking local stack accesses and runtime generation of optimized one-sided data communications.
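The work-stealing discipline the abstract refers to can be illustrated with the classic double-ended queue arrangement. The following is a minimal single-threaded sketch (not the paper's distributed stack): each worker owns a deque and pushes and pops tasks at one end, while idle workers steal from the opposite end, so thieves rarely contend with the owner's hot end.

```python
# Minimal illustration of the work-stealing deque discipline
# (a sketch only; the paper's implementation is a distributed,
# non-blocking dataflow stack).
from collections import deque

class WorkStealingDeque:
    def __init__(self, tasks=()):
        self._dq = deque(tasks)

    def push(self, task):          # owner: push at the bottom
        self._dq.append(task)

    def pop(self):                 # owner: pop the most recently pushed task
        return self._dq.pop() if self._dq else None

    def steal(self):               # thief: take the oldest task
        return self._dq.popleft() if self._dq else None

owner = WorkStealingDeque([1, 2, 3, 4])
print(owner.pop())    # 4  (owner works depth-first on fresh tasks)
print(owner.steal())  # 1  (thief takes work from the other end)
```

Owners working depth-first while thieves steal the oldest, typically largest, tasks is what keeps stolen work coarse-grained and contention low.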
[4] K. Kakehi, K. Matsuzaki, and K. Emoto. Efficient Parallel Tree Reductions on Distributed Memory Environments. International Conference on Computational Science (ICCS 2007), LNCS. Springer, 2007.

A new approach for fast parallel reductions on trees over distributed memory environments is proposed. Our approach uses serialized trees as the data representation, and we develop an efficient algorithm whose cost is independent of the tree's depth. A prototype implementation confirms the efficacy of the approach.
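The core intuition can be sketched as follows (an assumption-laden simplification, not the paper's algorithm): once a tree's values are serialized into an array, an associative reduction can be split into chunks whose partial results combine in any order, so the cost depends on the number of elements and processors rather than on the tree's depth.

```python
# Sketch: reducing a serialized tree by chunks, with an associative
# operator. Each chunk's partial reduction could run on a separate
# processor; the function names are illustrative only.
from functools import reduce

def chunked_reduce(op, xs, chunks=4):
    n = max(1, len(xs) // chunks)
    # partial reductions over contiguous segments (parallelizable)
    partials = [reduce(op, xs[i:i + n]) for i in range(0, len(xs), n)]
    # combine the partial results
    return reduce(op, partials)

serialized = [3, 1, 4, 1, 5, 9, 2, 6]   # values of some tree, serialized
print(chunked_reduce(lambda a, b: a + b, serialized))  # 31
```

The real difficulty, which the paper addresses, is handling operators and tree shapes where a plain left-to-right segment reduction is not directly valid; the serialized representation is what makes a depth-independent decomposition possible.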
[5] K. Matsuzaki. Efficient Implementation of Tree Accumulations on Distributed-Memory Parallel Computers. International Conference on Computational Science (ICCS 2007), LNCS. Springer, 2007.

In this paper, we develop an efficient implementation of two tree accumulations. In this implementation, we divide a binary tree based on the idea of m-bridges to obtain high locality, and represent local segments as serialized arrays to obtain high sequential performance. We furthermore develop a cost model for our implementation. Experimental results show good performance.
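To fix intuitions, here is a naive specification of upward accumulation, one of the standard tree accumulations, in which every node receives the reduction of its subtree. This recursive version only states what is computed; the paper's contribution is an implementation that divides the tree into m-bridge segments stored as serialized arrays, for locality and sequential speed.

```python
# Naive upward accumulation over a binary tree, given as a
# specification only. Trees are (value, left, right) tuples; a bare
# value is a leaf. The representation is illustrative, not the
# paper's array-based one.
def upwards(tree):
    """Annotate each node with the sum of its subtree."""
    if not isinstance(tree, tuple):
        return (tree, None, None)          # leaf: its own value
    v, left, right = tree
    al, ar = upwards(left), upwards(right)
    return (v + al[0] + ar[0], al, ar)

t = (1, (2, 4, 5), 3)          # a small binary tree
print(upwards(t)[0])           # 15: the sum over the whole tree
```

Run naively in parallel, this recursion costs time proportional to the tree's depth; the m-bridge decomposition is what yields balanced, local work regardless of tree shape.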
[6] A. Al Zain, K. Hammond, P. Trinder, S. Linton, H.W. Loidl, and M. Costanti. SymGrid-Par: Designing a Framework for Executing Computational Algebra Systems on Computational Grids. International Conference on Computational Science (ICCS 2007), LNCS. Springer, 2007.

SymGrid-Par is a new framework for executing large computer algebra problems on computational Grids. We present the design of SymGrid-Par, which supports multiple computer algebra packages, and hence provides the novel possibility of composing a system using components from different packages. Orchestration of the components on the Grid is provided by a Grid-enabled parallel Haskell (GpH). We present a prototype implementation of a core component of SymGrid-Par, together with promising measurements of two programs on a modest Grid to demonstrate the feasibility of our approach.
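The orchestration idea, coordinating tasks from heterogeneous computer algebra packages through a single parallel coordination layer, can be sketched abstractly. This is a hypothetical Python illustration (SymGrid-Par's actual orchestration is written in Grid-enabled parallel Haskell, and the function names below are invented stand-ins, not real package bindings): tasks from different "packages" are wrapped uniformly and evaluated in parallel, and their results composed.

```python
# Hypothetical coordination sketch: the two worker functions are
# stand-ins for computations delegated to different computer algebra
# packages; the coordination layer runs them in parallel and
# composes the results.
from concurrent.futures import ThreadPoolExecutor

def package_a_factorial(n):    # stand-in for one algebra package's task
    out = 1
    for i in range(2, n + 1):
        out *= i
    return out

def package_b_fibonacci(n):    # stand-in for another package's task
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

tasks = [(package_a_factorial, 10), (package_b_fibonacci, 20)]
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(f, arg) for f, arg in tasks]
results = [f.result() for f in futures]
print(results)  # [3628800, 6765]
```

The point mirrored from the paper is that the composition logic never needs to know which package produced which result; a uniform task interface is what makes cross-package composition possible.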