Talk: Serendipity: How Supercomputing Technology is Enabling a Revolution in Artificial Intelligence

José E. Moreira

Thomas J. Watson Research Center, IBM Research

With the availability of both large compute power and large data sets, we have witnessed a revolution in machine learning technology, which has become a mainstream tool for both business and scientific applications. This revolution is likely to accelerate, as even more compute power is brought to bear, and deliver many of the promises of artificial intelligence. In this talk we will investigate how far the impacts of machine learning can go. We will cover the new Summit supercomputer, which brings unprecedented compute capabilities to both traditional high performance computing and artificial intelligence problems, analyzing the similarities as well as the differences in those two fields. We will also speculate about the future of machine learning and, in particular, its possible limitations. We will conclude with a discussion of one of the most important scientific questions of our time: Is consciousness computable?

Short Bio: José E. Moreira is a Distinguished Research Staff Member in the Scalable Systems Department at the Thomas J. Watson Research Center. He received a B.S. degree in physics and B.S. and M.S. degrees in electrical engineering from the University of Sao Paulo, Brazil, in 1987, 1988 and 1990, respectively. He also received a Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign in 1995. Since joining IBM at the Thomas J. Watson Research Center, he has worked on a variety of high-performance computing projects. He was system software architect for the Blue Gene/L supercomputer and chief architect of the Commercial Scale Out project. He currently leads the IBM Research work on the architecture of the Power processor. He is an author or coauthor of over 100 technical papers and 10 patents. Dr. Moreira is a member of the IEEE (Institute of Electrical and Electronics Engineers) and a Distinguished Scientist of the ACM (Association for Computing Machinery).

Talk: Scheduling Matters

Yves Robert

ENS Lyon, France and Univ. Tenn. Knoxville, USA

https://graal.ens-lyon.fr/~yrobert/

This talk will review a few scheduling algorithms to solve simple computational problems on large-scale platforms. Faults, energy/power shortages, I/O contention: the constraints are numerous and challenging. The talk will provide a few answers and discuss open research directions.

Short Bio: Yves Robert received the PhD degree from Institut National Polytechnique de Grenoble. He is currently a full professor in the Computer Science Laboratory LIP at ENS Lyon. He is the author of 7 books, 150 papers published in international journals, and 240 papers published in international conferences. He is the editor of 11 book proceedings and 13 journal special issues. He has advised 30 PhD students. His main research interests are scheduling techniques and resilient algorithms for large-scale platforms. Yves Robert has served on many editorial boards, including IEEE TPDS, JPDC and ACM TOPC. He is a Fellow of the IEEE. He was elected a Senior Member of Institut Universitaire de France in 2007 and renewed in 2012. He was awarded the 2014 IEEE TCSC Award for Excellence in Scalable Computing, and the 2016 IEEE TCPP Outstanding Service Award. He has held a Visiting Scientist position at the University of Tennessee Knoxville since 2011.

Talk: Extreme-Scale Earthquake Simulation on Sunway TaihuLight

Haohuan Fu

Tsinghua University, China

http://thuhpgc.org/index.php/Haohuan_Fu

This talk will first introduce and discuss the design philosophy of the Sunway TaihuLight system, and then describe our recent efforts to perform earthquake simulations on such a large-scale system. Our work in 2017 accomplished a complete redesign of AWP-ODC for Sunway architectures and achieves over 15% of the system's peak, better than the 11.8% achieved by similar software running on Titan, whose byte-to-flop ratio is 5 times better than TaihuLight's. The extreme cases demonstrate a sustained performance of over 18.9 Pflops, enabling the simulation of the Tangshan earthquake as an 18-Hz scenario with an 8-meter resolution. Our recent work further improves the simulation framework with capabilities to describe complex surface topography and to drive building damage prediction and landslide simulation, which are demonstrated with a case study of the Wenchuan earthquake with accurate surface topography and improved coda wave effects.

Short Bio: Haohuan Fu is the deputy director of the National Supercomputing Center in Wuxi, leading the research and development division. He is also an associate professor in the Ministry of Education Key Laboratory for Earth System Modeling and the Department of Earth System Science at Tsinghua University, where he leads the research group of High Performance Geo-Computing (HPGC). Fu has a PhD in computing from Imperial College London. Since joining Tsinghua in 2011, Dr. Fu has been working towards the goal of providing both the most efficient simulation platforms and the most intelligent data management and analysis platforms for geoscience applications. His research has, for example, led to efficient designs of atmospheric dynamic solvers for the Tianhe-1A, Tianhe-2, and Sunway TaihuLight supercomputers, as well as for reconfigurable computing platforms. The work based on the Sunway TaihuLight supercomputer scaled a fully-implicit solver to over 10 million cores and won the Gordon Bell Prize at SC16.

Talk: Big Data at Extreme-Scales: Addressing Computational Challenges in the 21st Century

Manish Parashar

Rutgers University, USA

http://parashar.rutgers.edu/

Data-related challenges are quickly dominating computational and data-enabled sciences and are limiting the potential impact of scientific application workflows enabled by current and emerging extreme-scale, high-performance, distributed computing environments. These data-intensive application workflows involve dynamic coordination, interactions and data coupling between multiple application processes that run at scale on different resources, as well as with services for monitoring, analysis, visualization and archiving. They present challenges due to increasing data volumes, complex data-coupling patterns, system energy constraints, increasing failure rates, etc. In this talk I will explore some of these challenges and investigate how solutions based on data sharing abstractions, managed data pipelines, data-staging services, and in-situ / in-transit data placement and processing can be used to help address them. This research is part of the DataSpaces project at the Rutgers Discovery Informatics Institute.

Short Bio: Manish Parashar is Distinguished Professor of Computer Science at Rutgers University. He is also the founding Director of the Rutgers Discovery Informatics Institute (RDI2). His research interests are in the broad areas of Parallel and Distributed Computing and Computational and Data-Enabled Science and Engineering. Manish is the founding chair of the IEEE Technical Consortium on High Performance Computing (TCHPC), Editor-in-Chief of the IEEE Transactions on Parallel and Distributed Systems, and serves on the editorial boards and organizing committees of a large number of journals and international conferences and workshops. He has over 350 publications, has deployed several software systems that are widely used, and has received a number of awards for his research and leadership. Manish is a Fellow of the AAAS, a Fellow of the IEEE/IEEE Computer Society, and an ACM Distinguished Scientist.

Keynotes