Black Holes and Scientific Computing with Apple's New Mac Pro

Recently, Dr. Khanna and I were in a FaceTime session, and he was explaining how the 2013 Mac Pro, in concert with OpenCL, has greatly assisted his scientific work. That conversation led to this interview which, we hope, provides substantial insights into using these Apple tools for scientific computing.

TMO: First, give us a little background on yourself. Who are you, and what do you do?

GK: I’m a computational scientist with research interests in the area of black hole physics and gravitation. I received my Ph.D. from Penn State back in 2000 and have been actively working in my field for around 15 years now. Currently, I’m an Associate Professor in the Physics Department at the University of Massachusetts Dartmouth. I also serve as the Associate Director of the newly established Center for Scientific Computing and Visualization Research (CSCVR) on my university campus.

TMO: Tell us more about your interest in black hole physics and also computational science in general.

GK: Well, even as a kid, I somehow developed a strong interest in gravity, probably due to my father, who is also a physicist. It turns out that gravity, the force most familiar to all of us, is still perhaps the least understood force of Nature. Our current understanding of gravity was developed by Einstein nearly a century ago — in general relativity theory — which certainly has stood the test of time. However, to truly understand gravity and test our theories thoroughly, we must study phenomena wherein gravity is extremely strong. And that immediately brings us to black holes — perhaps the most intriguing and significant astrophysical objects in Nature — where gravity is at its most extreme.

Now, Einstein’s theory of gravity and black holes are mathematically very complex, so one can only do rather limited calculations with pencil and paper. Sooner or later, one needs to take advantage of modern computational technologies to make significant advances. This realization is what led me to computational science and high-performance computing well over a decade ago, and is also the reason why we developed the CSCVR on campus.

TMO: What is supercomputing/parallel computing and why is it so important to scientific research today?

GK: Supercomputing is truly difficult to define. The reason is that, with the rapid advances in computer technology, the bar for an entry-level supercomputer keeps getting higher every year. Indeed, the supercomputer of two decades ago is less powerful than today’s iPhone. So, I prefer the definition that a supercomputer is whatever yields performance much faster (say, by an order of magnitude or more) than a high-end, workstation-class single processor of today, and involves a parallel computing model. I find this a much more practical definition, and one that is more relevant to an application scientist.

Supercomputing is important to scientific research today because, over the past decade, computer simulation has joined experiment and theory as a third leg in almost every area of science and engineering research. Piggybacking on Moore’s Law, the capabilities of simulation have advanced very rapidly and continue to do so even today. Moreover, because today’s supercomputers are built from commodity parts via the cluster approach, they are rather low-cost and fairly well standardized at this stage.


Page 2 – How Scientific Computing Has Changed


The giant “Titan” supercomputer at the Oak Ridge National Laboratory: “18,688 CPUs paired with an equal number of GPUs to perform at a theoretical peak of 27 petaFLOPS” (Wikipedia).

TMO: How has scientific computing changed over the years? What are the challenges that accompanied these changes? Conversely, what hasn't changed so much, and what are the most frustrating things in that regard?

GK: The main driver for change in the scientific computing community is change in the computer hardware industry itself. Over the last decade, there has been a major shift in the design of computer processors, moving from single-core, high clock-frequency designs to multi-core designs with rather modest clock rates (a few GHz). This has to do with the so-called “power wall,” which is rather well known by now.

An extreme version of this approach can perhaps be seen in GPU computing, wherein one has hundreds to thousands of slower, simpler cores (typically sub-GHz) on a single GPU chip. The main challenge that came with this shift is the parallel programming model that accompanies multi- and many-core processors, one that most software developers, especially scientists, were unfamiliar with. That challenge continues even today.

One significant thing that has not changed over the years — partly due to the existence of large legacy code-bases and also simple human inertia — is the dominance of a chosen few programming languages. In particular, most scientific code continues to be developed in legacy programming languages like Fortran and C, and while these languages have adapted a bit to better suit the advances in computational technology over the years, there are likely several better alternatives. And, in my opinion, it is this issue that really limits how much computational science can benefit from recent and future advances in computer hardware.

Dr. Khanna at his desk at University of Massachusetts Dartmouth

TMO: What is the level of support and interest that Apple has provided to computational scientists over the years? How does Apple most show that support?

GK: Apple has provided support to computational scientists using the Mac and OS X over the years. They certainly develop their hardware (the Mac) and software (OS X) to reflect current, up-to-date technology. They make a special effort to ensure that application developers get the best tools they need to develop on the Mac via Xcode. Apple provides the support that enables the open-source community on OS X, which is extremely critical for the scientific community. And Apple often attempts to be disruptive by introducing radically new technologies like OpenCL, the new Mac Pro, and the recently introduced Swift programming language.

Apple used to interact closely and directly with scientists, but I have seen that wane over the last few years. I hope that trend reverses in the near future.


Page 3 – How OpenCL Helps With Scientific Computing


Dr. Khanna's tools for scientific computation.
The Mac Pro is rated at a maximum of 7 teraFLOPS.

TMO: How does Apple’s OpenCL relate to your work?

GK: OpenCL has certainly been a disruptive technology that Apple introduced. Let me give you a little background on this. As I mentioned earlier, there are now these multi- and many-core processor technologies, i.e., CPUs and GPUs respectively. When first introduced, each had its own vendor- and design-specific software development framework that had little in common with the others: the CUDA SDK for Nvidia’s GPUs; the ATI Stream SDK for ATI’s GPUs; the IBM Cell SDK for the Cell Broadband Engine; while multi-core CPUs (Intel, AMD) involved something totally different (OpenMP thread programming).

All these SDKs enabled general-purpose software development on their respective hardware, and offered programmability in the C programming language — but no Fortran. Yet the details involved were remarkably different for each architecture. Therefore, for a computational scientist with limited time and resources to spend on such specialized software engineering, it became exceedingly difficult to embrace these architectures and make effective use of them for advancing science.

However, starting in 2008 and under Apple's leadership, an open standard was developed to unify software development for all these different computer architectures under a single programming framework: the Open Computing Language (OpenCL). Apple released OpenCL 1.0 as part of OS X Snow Leopard in 2009. All major processor vendors (Nvidia, AMD, IBM, and Intel, for example) have adopted this standard and released OpenCL support for their hardware. It is widely believed that what the OpenGL standard did for 3D graphics hardware, OpenCL will do for general-purpose parallel computing. To summarize: OpenCL offers both performance and portability, making it a rather disruptive technology.

Black hole computations with awesome black Macs.

TMO: Many computational scientists use CUDA from NVIDIA. How does CUDA compare to OpenCL, and why do you prefer OpenCL?

GK: Yes, that’s an excellent question. Indeed, CUDA is quite similar to OpenCL, and in fact, many OpenCL features were borrowed from, and inspired by, similar CUDA features. Nvidia has contributed greatly to the OpenCL standard from the start. However, until recently, CUDA was proprietary, owned by Nvidia, and therefore ran only on Nvidia GPUs. OpenCL has been open and has had support from all key players in the computer industry from the beginning. In 2011, Nvidia decided to open-source CUDA, an excellent step in my opinion; however, its adoption and impact are yet to be seen.

The main reason I prefer OpenCL is that it is not tied to any specific vendor or platform. I can move transparently from one to another, based on whatever provides the best performance-to-cost ratio, while keeping my research codes more-or-less intact. In addition, there is evidence that open standards and technologies have longevity over proprietary ones, especially in the scientific computing community. Essentially, you don’t want to be in a position wherein your research is so intimately tied to a specific vendor that, if the vendor decides to discontinue development of the product you use or simply goes out of business, your research productivity is in serious jeopardy.


Page 4 – How OpenCL Works


If you can dream it, you can do it.

TMO: Tell us more about how OpenCL works.

GK: As I mentioned before, OpenCL is a framework for parallel programming across a very wide variety of computer hardware architectures. In essence, OpenCL adds to the C programming language the extensions necessary to allow for parallel computing on all these different processor architectures. In addition, it establishes numerical precision requirements to provide mathematical consistency across the different hardware and vendors – a matter that is of significant importance to the scientific computing community.
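
To make that concrete, here is what a trivial OpenCL kernel looks like. This vector-add example is purely illustrative (it is not drawn from Dr. Khanna's research codes), but it shows the handful of qualifiers and built-in functions that OpenCL adds to plain C:

    /* A toy OpenCL kernel: ordinary C plus a few OpenCL qualifiers and
       built-in functions. Every "work-item" executes this function in
       parallel, each one handling a single array element. */
    __kernel void vec_add(__global const float *a,
                          __global const float *b,
                          __global float *c)
    {
        size_t i = get_global_id(0);  /* index of this work-item */
        c[i] = a[i] + b[i];
    }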

Computational scientists would need to rewrite the performance-intensive routines in their codes as OpenCL kernels, which would then be executed on the compute device of their choice, such as CPUs, GPUs, and even unusual processors like DSP chips and FPGAs. The OpenCL API provides the programmer with various functions, from locating the OpenCL-enabled hardware on a system to compiling, submitting, queuing, and synchronizing the compute kernels on that hardware. Finally, it is the OpenCL runtime that actually executes the kernels and manages the needed data transfers in an efficient manner.
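
As a rough sketch of that workflow (again, not a fragment of any actual research code), the host side of an OpenCL 1.x program in C looks something like the following. It discovers a device, builds the toy vec_add kernel shown earlier, runs it, and reads back the result; error checking is omitted for brevity, and on OS X one includes <OpenCL/opencl.h> and links against the OpenCL framework:

    #include <stdio.h>
    #include <OpenCL/opencl.h>   /* on OS X; <CL/cl.h> on other platforms */

    /* Same toy kernel as shown earlier, supplied as a source string. */
    static const char *src =
        "__kernel void vec_add(__global const float *a,"
        "                      __global const float *b,"
        "                      __global float *c)"
        "{ size_t i = get_global_id(0); c[i] = a[i] + b[i]; }";

    int main(void)
    {
        enum { N = 1024 };
        float a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

        /* Locate an OpenCL platform and a GPU device on this system. */
        cl_platform_id plat; clGetPlatformIDs(1, &plat, NULL);
        cl_device_id dev;    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

        /* Create a context and a command queue for that device. */
        cl_context ctx     = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

        /* Compile the kernel source at runtime and fetch the kernel. */
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "vec_add", NULL);

        /* Copy data to the device, set kernel arguments, and enqueue. */
        cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof a, a, NULL);
        cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof b, b, NULL);
        cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);
        clSetKernelArg(k, 0, sizeof da, &da);
        clSetKernelArg(k, 1, sizeof db, &db);
        clSetKernelArg(k, 2, sizeof dc, &dc);
        size_t global = N;
        clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);

        /* Blocking read: waits for the kernel to finish, then copies back. */
        clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);
        printf("c[10] = %g (expected 30)\n", c[10]);
        return 0;
    }

The point of the design is that this same host code runs essentially unchanged whether the selected device is a CPU, a GPU, or something more exotic; only the device query changes.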

TMO: What is the advantage of using an advanced graphics processor like the FirePro, with thousands of cores, compared to, say, a discrete collection of networked servers using OpenMPI?

GK: For many applications, both approaches would certainly offer a significant speedup over using a single processor, of course. The main challenge with the cluster approach is that one has to worry about the separate servers communicating data over the network. That has to be programmed in explicitly, by hand.

Unless one is really lucky, scientific applications are not “embarrassingly parallel.” That is, they require some degree of communication among the different compute elements that are working on solving the problem in parallel. And that communication has to take place over a relatively slow network, which can often be a severe bottleneck. One has to make sure that the parallel algorithm one develops absolutely minimizes that network communication, and this can be rather challenging to do.

Such explicit communication is not an issue on a GPU like the FirePro, since all the compute cores access the same (video) memory. However, it is worth noting that scaling up is easier in a clustered environment: there, you can just add as many servers as you need. So, ideally, you want both, i.e., a cluster in which each server has a number of GPUs as accelerators. Indeed, that is the model that most large supercomputers are adopting today.
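
To give a flavor of the explicit, hand-coded communication described above, here is a rough, hypothetical sketch in C using MPI (the interface that OpenMPI implements) of the boundary, or “halo,” exchange that a clustered grid code has to perform at every step. The grid size and variable names are illustrative only:

    #include <mpi.h>

    #define NLOCAL 1024   /* interior grid points owned by this server (rank) */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* u[0] and u[NLOCAL+1] are "ghost" cells mirroring the neighbors' data. */
        double u[NLOCAL + 2] = {0.0};
        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* At every time step, boundary values must cross the network by hand:
           send my leftmost interior point to the left neighbor while receiving
           the right neighbor's value into my right ghost cell, and vice versa. */
        MPI_Sendrecv(&u[1],          1, MPI_DOUBLE, left,  0,
                     &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[NLOCAL],     1, MPI_DOUBLE, right, 1,
                     &u[0],          1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* ... update the interior points using the freshly received ghost cells ... */

        MPI_Finalize();
        return 0;
    }

On a single GPU this step simply disappears, because every compute core reads and writes the same device memory.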

TMO: How has OpenCL benefited your own work, in particular?

GK: Since my initial investment in OpenCL in 2009, I have moved transparently through several hardware generations and vendors, taking full advantage of their offerings based on overall performance and associated cost.

For example, over the past five years I have successfully run my research codes on x86 CPUs, Nvidia GPUs, IBM processors and, most recently, on AMD/ATI GPUs and even Accelerated Processing Units (APUs). And my ability to do this so easily has truly boosted my research productivity, in the sense of being able to perform more accurate and longer computations of black hole systems than would otherwise have been possible. Moreover, this approach has also helped keep my research costs low — which is actually quite critical in this era of declining support for basic research from federal agencies.

TMO: Apple released a radically different design in the new Mac Pro in late 2013. How does that hardware complement what is going on with the software side? Is this Mac suitable for computational scientists?

GK: Absolutely, the new Mac Pro is an OpenCL monster! It almost seems like they designed and built the machine precisely for the work I do. It's a dream machine for anyone who has invested deeply into OpenCL.

Last year, with help from a highly skilled computer technician here, I was trying to build a low-cost, very-small-footprint Linux machine using commodity parts that could accommodate two or more high-end AMD GPUs for OpenCL computing.

We had a rather difficult time coming up with a design that worked well — the multiple GPUs had a tendency to overheat each other due to restricted airflow. In fact, we even burned out a few cards through prolonged use. We ultimately did manage to build a system that works, but it ended up being much larger than we had hoped and rather loud because of the many fans it required to stay cool. It’s not a system that would be tolerable in a quiet, small office environment.

This is exactly the issue that Apple has tackled beautifully with the new Mac Pro with its integrated thermal core. So, you see what I mean when I say that they designed the machine precisely for the work I do.


Page 5 – The Advantage of the Mac Pro


You can never have enough computing power. Ever.

TMO: In light of the above, can you reflect on the convenience, affordability (and politics) of having a discrete Mac Pro on the desk instead of a rack of 1U departmental servers, all networked and cooled?

GK: A very significant point indeed. First off, the Mac Pro is actually quite competitively priced, given its internal components. And as I said in my earlier remarks, the footprint of the machine and how quiet it is — even under full load — make it ideal for an office environment.

In addition, the advantages that OS X offers with regard to the convenience of having the full suite of professional applications like Office, Adobe’s Creative Suite etc. alongside the UNIX underneath can never be overstated.

An additional aspect to appreciate is the matter of having stable and well-tested drivers for all the hardware that the Mac Pro comes with. This was a serious struggle for us when we built that Linux system I mentioned previously. Installing the proper video-card drivers for the GPUs and then OpenCL opened up a Pandora’s Box of sorts. We eventually did manage to find a set of beta drivers and OpenCL libraries that worked satisfactorily for my code and our chosen GPUs and flavor of Linux, but it was not trivial. I cannot think of a better scientific workstation than the Mac Pro given all these factors.

And of course, the infrastructure for a set of clustered servers is a very serious investment in itself. One needs adequate power supply, cooling and dedicated physical space. These are all challenging undertakings in a university environment. In addition, one often needs a specialized high-speed interconnect for low-latency and high-bandwidth communication between the servers, and that adds significantly to the overall cost. And then there's the time to get all that up and running.

In contrast, I do a lot of serious work with one small black cylinder on my desk. (Well, two actually.)

TMO: Apple claims that this new Mac Pro is capable of up to 7 teraflops of computational power. That's surely with dual AMD FirePro D700s and OpenCL as you described above. Can you break down the relative contributions of the Xeon CPUs and the various GPUs: D300, D500, and D700? How many teraflops would one expect from the least expensive configuration?

GK: Sure. The Xeon CPUs contribute at most a few hundred gigaflops, so they don’t add much to the total. Each D700 contributes just under 3.5 teraflops, each D500 around 2.2, and each D300 about 2.0 teraflops. Therefore, even with the least expensive configuration (two D300s at roughly 2.0 teraflops each, plus a few hundred gigaflops from the CPU), one can expect something in the ballpark of 4 teraflops from a 2013 Mac Pro.

TMO: Finally, how do you see things evolving in the near future with scientific computing?

GK: Ah, so, you want me to peer into my crystal ball. That is indeed something that is remarkably difficult to do in the context of the computer industry.

In my view, in the short term, GPUs will certainly continue to dominate in performance, and perhaps ultimately evolve to merge with the CPU into some sort of heterogeneous processor, similar to what Intel has done with the MacBook Air’s processor and what AMD is doing with its Fusion APUs. That offers various significant advantages.

I also think OpenCL is here to stay, and its adoption and development will continue on a strong sustained path. So, I think that the Mac and OS X will continue to play an important role for computational scientists in the future.

At some point, supercomputing is likely to be strongly impacted by developments coming from an entirely different segment of the computer industry: smartphones, where Apple leads as well. The reason is that supercomputers are extremely power-hungry and in order to keep overall costs down, they need to evolve to become a lot more power efficient.

Smartphones already do that quite well and are expected to do even better in the future, as consumers demand more performance from their phones and yet longer battery life. Therefore, one can imagine a large supercomputer built from millions of the Apple A7 processors that power the iPhone and the iPad. That would be a very interesting and disruptive development indeed!

_____________________

Dr. Khanna has asked us to add that his research work is supported by the National Science Foundation.

Black hole art via Shutterstock.

Titan supercomputer via Wikipedia.
