Supercomputing 2011 Workshop
Support for Experimental Computer Science
Kate Keahey, Frédéric Desprez
- “Experimental Infrastructure for FutureGrid”, Warren Smith, Texas Advanced Computing Center (slides)
- “Reproducible Environment Creation”, Kate Keahey, Argonne National Laboratory (slides)
- “Towards better tools for experiments on distributed systems”, Lucas Nussbaum, University of Nancy (slides)
- “Taktuk: a versatile tool for application deployments on large and complex computing infrastructures”, Pierre Neyron, CNRS (slides)
- "Héméra: Scientific Challenges using Grid’5000", Christian Perez, INRIA (slides)
- "Experimental Infrastructure Management with Pegasus", Jens-Soenke Voekler, ISI (slides)
The increasing complexity of available infrastructures, with specific features (caches, hyper-threading, dual core, etc.) or with complex architectures (hierarchical, parallel, distributed, etc.), makes it extremely difficult to build analytical models that allow for satisfactory predictions. This raises the question of how to validate algorithms when a realistic analytic analysis is no longer possible. As in many other sciences, one answer is experimental validation. Nevertheless, experimentation in computer science is a difficult subject that today still opens more questions than it solves: What can an experiment validate? What is a “good experiment”? How does one build an experimental environment that allows for “good experiments”? In this talk we will provide some hints on this subject and show how some tools can help in performing “good experiments”, mainly in the context of parallel and distributed computing. More precisely, we will focus on four main experimental methodologies, namely in-situ (real-scale) experiments (with an emphasis on PlanetLab and Grid’5000), emulation (with an emphasis on Wrekavoc), benchmarking, and simulation (with an emphasis on SimGrid and GridSim). We will provide a comparison of these tools and methodologies from both a quantitative and a qualitative point of view.
Emmanuel Jeannot is a senior research scientist at INRIA (Institut National de Recherche en Informatique et en Automatique) and has been doing his research at INRIA Bordeaux Sud-Ouest and at the LaBRI laboratory since Sept. 2009. Before that he held the same position at INRIA Nancy Grand-Est. From Jan. 2006 to Jul. 2006, he was a visiting researcher at the ICL laboratory of the University of Tennessee. From Sept. 1999 to Sept. 2005, he was an assistant professor at the Université Henri Poincaré, Nancy 1. From 2000 to 2009, he did his research at the LORIA laboratory. He received his Master’s degree and his PhD in computer science (in 1996 and 1999, respectively), both from the École Normale Supérieure de Lyon, at the LIP laboratory. After his PhD, he spent one year as a postdoc at the LaBRI laboratory in Bordeaux. His main research interests are scheduling for heterogeneous environments and grids, data redistribution, algorithms and models for parallel machines, grid computing software, adaptive online compression, and programming models.
nanoHUB.org, developer of the open-source HUBzero platform, is a place for the nanotechnology research and education community to collaborate, and was used by over 188,000 people in the past 12 months. A powerful component of academic collaboration is the sharing of knowledge via publications such as journal papers, textbooks, and web sites. The HUBzero-powered nanoHUB is pioneering the academic publishing of software and has published 231 programs as of today. “Never-published research” is an oxymoron in academia. However, much of the body of research results is published only indirectly and in forms difficult for other researchers and educators to use computationally, for example as an image that depicts a set of data values as a graph. This talk will provide thoughts about extending publication toward all digital results of research in their native forms.
Dr. George B. Adams III joined the nanoHUB.org project in 2007 as Deputy Director. He helped nanoHUB grow from 49,000 to today’s 188,000 users per year and spin out the open-source HUBzero® platform. He began his career in 1983 as one of the initial five staff members of the Research Institute for Advanced Computer Science at the NASA Ames Research Center, where his work focused on high-performance computing for scientific applications. Today, in addition to his work with nanoHUB, Adams is directing the new startup ManufacturingHUB.org, working with the Council on Competitiveness and the Office of Science and Technology Policy to improve the competitiveness and innovative capacity of small- to medium-sized manufacturers.
FutureGrid supports experimental work in the cloud, grid, and HPC areas with distributed clusters. Last year, 118 projects were accepted, with 56 addressing computer science research and 9 addressing education and training, including several full-semester classes. The other projects were divided between interoperability, domain science, and technology evaluation. We give examples of projects and describe the activities supported. We describe software work that focused initially on core infrastructure but is now moving higher up the stack into, for example, cloud platforms.
Advancing the state of computer science experiments, both in scale and in complexity, requires a broad collection of software to meet various needs, ranging from the verification of testbed resources and the control of nodes to instrumentation and monitoring facilities and the use of emulators to alter experimental conditions.
In this talk, we will attempt to provide a map of the existing building blocks, focusing on those used on Grid'5000. We will then outline the future challenges and describe a path to a comprehensive software stack that will enable researchers to advance the state of their experiments.
Lucas Nussbaum has been an assistant professor at the University of Nancy since 2009. He does his research in the AlGorille team, a joint team of LORIA and INRIA Nancy - Grand Est. His main research focus is experimentation in the context of research on distributed systems, with work on emulation and on real-scale experiments. He has been involved in the design of the Grid'5000 testbed since 2005.
The prevalence of commercial cloud computing offerings provides many researchers in computer science with fairly easy access to large computing infrastructures. Unfortunately, while these infrastructures can be employed in many fields of computer science, they are not suitable for exploring many issues in the area of cloud computing, particularly its system software aspects. For such issues, experimental evaluation often requires "bare metal" access to test beds, which should be of non-trivial size to enable exploration of scalability issues. However, developing suitable experimental platforms for exploring these problems is expensive both in terms of time and money for individual research teams. This talk will describe a shared infrastructure that amortizes the cost of developing and maintaining such platforms across multiple research teams. In particular, the talk will include details regarding the hardware and software architecture adopted, observations that resulted from the experience of managing the infrastructure, and several case studies of research conducted on the platform.
Michael Kozuch is a Principal Engineer for Intel Corporation; his interests include computer architecture and system software. Michael is currently managing Intel's participation in the Open Cirrus program and investigating data center management through a research effort called Tashi. Michael received a Ph.D. in electrical engineering from Princeton University in 1997, and a B.S. in electrical engineering from Penn State in 1992. He currently works out of the Intel Science and Technology Center for Cloud Computing at Carnegie Mellon University.
Reproducibility, the ability of an experiment and its result to be accurately reproduced, is widely regarded as unsatisfactory in HPC and, more broadly, in distributed systems studies.
In this talk, we try to identify some concrete issues that limit reproducibility. The examples come mainly from the use of the Grid'5000 testbed since its beginning.
Finally, we argue that to enhance reproducibility, it is necessary to address issues simultaneously at several levels and in several contexts: testbeds, tools and software, methodology, education, access to experimental data, and the review process.
Olivier Richard has been an assistant professor at Université Joseph Fourier in Grenoble since 2000. He does his research in the Mescal team at the Grenoble Informatics Laboratory (LIG). His research interests are focused on the design of resource and job management systems for computing infrastructures. He also conducts studies on methods and tools to enhance experimental evaluation in this context. He has participated in the Grid'5000 project since its beginning.