Multi-core performance boost: Does it stop at four cores?

Several years ago, I asked a colleague whether the QNX Neutrino SMP kernel could scale beyond 8 processors. Can't remember the answer (I think it's 32), but I do remember my colleague commenting, "But it doesn't really matter, since, practically speaking, 4 processors is the upper limit for SMP." (This was back in the day when SMP systems typically consisted of multiple discrete processors on a board, rather than multiple processing cores on a chip.)

Fast forward to last Tuesday, when Sandia National Laboratories (SNL) published the results of a study on multi-core scalability. According to SNL's announcement, multi-core systems show a significant increase in speed when scaling from 2 to 4 cores, an insignificant increase when scaling from 4 to 8 cores, and a decrease in speed when scaling beyond 8 cores.

Why such poor results beyond 4 cores? The culprit, according to SNL, is insufficient memory bandwidth: too many cores "asking for memory through the same pipe."
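To make the "same pipe" argument concrete, here is a toy model of my own, not SNL's methodology. It assumes every core wants a fixed amount of memory bandwidth and that all cores share one pipe of fixed capacity. The function name and the numbers (2.5 GB/s per core, 10 GB/s shared) are hypothetical, chosen only so that the pipe saturates at 4 cores.

```python
def modeled_speedup(cores, per_core_demand_gbs, shared_bandwidth_gbs):
    """Speedup over a single core when all cores share one memory pipe.

    Toy model (an assumption, not SNL's method): each core runs at full
    speed only if the pipe can serve its bandwidth demand; otherwise all
    cores slow down in proportion to the shortfall.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # Aggregate bandwidth the cores would consume if nothing limited them.
    demand = cores * per_core_demand_gbs
    # Fraction of that demand the shared pipe can serve, capped at 100%.
    served = min(1.0, shared_bandwidth_gbs / demand)
    return cores * served


# Hypothetical figures: 2.5 GB/s per core, 10 GB/s shared pipe.
for n in (1, 2, 4, 8, 16):
    print(n, "cores ->", modeled_speedup(n, 2.5, 10.0), "x")
```

In this sketch the speedup climbs linearly up to 4 cores and then goes flat, because every extra core just splits the same bandwidth more thinly. It does not reproduce the slowdown SNL reports beyond 8 cores; that would need some extra cost for contention, which the model leaves out.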

In a blog entry posted earlier today, Clay Breshears of Intel says he isn't surprised, since the problem of having multiple cores share a single memory pipe is well known. He questions, however, whether the poor results could also be the outcome of how SNL's software algorithm was implemented.

SNL doesn't get into details about their methodology, so indeed, it would be interesting to see if different implementations of the same algorithm yield different results.

How about it, SNL?


Stay said...

Did they test with a NUMA architecture?

Did they test with the i7, whose triple-channel memory has a copy bandwidth of 12 GB/s?

Paul N. Leroux said...

Hi Stay. From what I can tell, they didn't test with a NUMA architecture. Also, I think they used AMD processors, but maybe I'm misreading the article. Hopefully, SNL will publish a whitepaper on their methodology and hardware -- that would clear up some questions. Wish they had an online forum or blog; that would help, too.