Summer 2012 Week 4: Addressing Failures in Exascale Computing
August 4, 2012 to August 11, 2012
Canyons Resort, 4000 Canyons Resort Drive, Park City, Utah
- Marc Snir (Argonne National Laboratory/University of Chicago)
- Robert Wisniewski (**Workshop conceived and developed by Robert Wisniewski while he was at IBM Research in conjunction with Marc Snir. Robert Wisniewski is currently at Intel Corporation. Workshop will be organized and led by Marc Snir in Park City.)
An exascale system will have far smaller and many more transistors than exist in current systems, resulting in more frequent hardware errors. This will be exacerbated by the use of low voltage gaps that are needed to reduce energy consumption. Additional software errors may occur as application codes, and the software layers that map applications to hardware, become more complex. The current global checkpoint approach to error recovery assumes that (1) all errors that corrupt the computation state are detected soon after they occur (before the next checkpoint) and (2) the mean time between unrecovered errors is on the order of days. These assumptions are unlikely to hold in the exascale era.
It is widely believed that a solution to resilience requires a coordinated effort by the designers of exascale hardware, exascale middleware, and possibly exascale application codes. A fundamental rethinking of the structure of large computation systems, and possibly of the meaning of "computation", may be required.
The complexity of the issues and the relative lack of communication between these communities justify a longer, focused workshop. The aim of this workshop is to bring together these communities to develop a common understanding of the issues and the possible solutions, and to lay the basis for a coordinated research plan for handling resilience in the exascale era and beyond.
In this workshop we will establish a common taxonomy to talk about errors and failures -- including a close examination of the notion of "successful computation". We shall examine the current state of the technology and the current approaches to resilience. We will then examine potential solutions from both a hardware and software perspective, focusing on a combined approach.
Our goal is to draft a white paper based on the themes and the discussions at the workshop.
- Sunday, the first day of the workshop, will focus on setting group goals and defining a taxonomy, so please try to arrive by Saturday evening.
- The next day will focus on presentations from experts in the fields about expected failures.
- Tuesday and Wednesday will focus on design, ideas, and cross-layer interaction to address the challenges.
- On Friday we will integrate and pull together the ideas.
- The latter part of the workshop, Friday afternoon and Saturday morning, will focus on drafting.
On-site ICiS staff member: TBD
Please plan to arrive on Saturday, August 4. The program will begin on Sunday morning, August 5, with a kick-off event and close on Saturday, August 11, by noon. Please check back for the agenda.
***Additional materials will also be posted at the internal svn repository:
The Canyons resort is a fully staffed, full service resort, which provides complimentary high-speed wireless Internet in all guest rooms, public areas, and function spaces. Other amenities include: use of lodge fitness center, heated outdoor pool and hot tub, and free underground heated parking.
ICiS sessions and lodging will be in the Silverado Lodge:
There are several restaurants located within walking distance: http://www.canyonsresort.com/dining.html
If you would like information on restaurants in Park City area, the front desk will have a more complete list at check-in.
Summer Activities at the Canyons
Summer Camps and Child Care
Summer camps for children 6 to 12 years of age Monday – Friday are available at the Canyons. Child Care for children 6 weeks to 6 years is also available Monday – Friday. Canyons Little Adventures Childcare Center is a state-licensed childcare facility. Visit the link below for more information on the programs offered.
Shuttle Service (All Resort Express Airport Shuttle)
All Resort Shuttle departs from Salt Lake City Airport every 30-40 minutes. The vans seat up to nine passengers and may make up to four stops at various destinations.
To be more eco-friendly, ICiS has negotiated an all-inclusive $43 rate for travel from and to the Silverado Lodge. All are encouraged to use this service. If you have other parties or family traveling with you, you will need to contact All-Resort to let them know. Please reference group # 8070 when making your reservation.
You must make these reservations at least 48 hours prior to your arrival. To make your reservations please use one of the following options:
Office: 435-649-3999 EXT 2
Fax: 435-649-3549
Web Portal: ICiS 2012 Special
MISSED ORIGINATION/CONNECTING FLIGHTS:
In the event of weather delays or missed connections, please call 800-457-9457 so that we may reschedule accordingly.
Individual reservation cancellations must be made 24 hours prior to your original pickup time and are subject to a 20% booking fee. Individual reservation cancellations received within 24 hours of your reservation are non-refundable. Event Shuttle cancellations must be made at least 72 hours prior to the start of the event. Shuttles cancelled within 72 hours of the start time will be charged the full amount contracted.
The following car rental companies have counters in Salt Lake City International Airport: Advantage, Alamo, Avis, Budget, Dollar, Enterprise, National, and Thrifty. Make your reservations via the company website or via a travel engine like Expedia.com, Travelocity.com, or Orbitz.com.
Since ICiS has a contract with All Resort Express Airport Shuttle to transport our attendees, rental vehicles may NOT be covered as part of your reimbursement. Please check with email@example.com if you plan to rent a vehicle.
Please note: Park City has a free local transit service.