Several years ago, I asked a colleague whether the QNX Neutrino SMP kernel could scale beyond 8 processors. Can't remember the answer (I think it's 32), but I do remember my colleague commenting, "But it doesn't really matter, since, practically speaking, 4 processors is the upper limit for SMP." (This was back in the day when SMP systems typically consisted of multiple discrete processors on a board, rather than multiple processing cores on a chip.)
Fast forward to last Tuesday, when Sandia National Laboratories (SNL) published the results of a study on multi-core scalability. According to SNL's announcement, multi-core systems show a significant increase in speed when scaling from 2 to 4 cores, an insignificant increase when scaling from 4 to 8 cores, and an outright decrease when scaling beyond 8 cores.
Why such poor results beyond 4 cores? The culprit, according to SNL, is insufficient memory bandwidth: too many cores "asking for memory through the same pipe."
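SNL's numbers are consistent with a simple contention model: speedup grows roughly linearly until the cores' combined demand saturates the shared memory pipe, then declines as extra cores only add arbitration overhead. Here's a minimal sketch of that model; the function name and the demand and contention parameters are illustrative assumptions of mine, not figures from the study:

```python
def modeled_speedup(cores, per_core_demand=0.25, contention=0.05):
    """Toy model of SMP speedup behind a shared memory pipe.

    per_core_demand: fraction of total memory bandwidth one core consumes
    contention: per-core arbitration penalty once the pipe is saturated
    (both values are hypothetical, chosen to echo the 4-to-8-core knee)
    """
    saturation = 1.0 / per_core_demand  # cores the pipe can feed at full rate
    if cores <= saturation:
        # Below saturation, speedup scales with core count
        return float(cores)
    # Past saturation, additional cores contribute no extra throughput,
    # only contention on the shared pipe -- so speedup declines
    return saturation - contention * (cores - saturation)

for n in (2, 4, 8, 16):
    print(n, modeled_speedup(n))
```

Under these (made-up) parameters, the curve peaks at 4 cores and slopes downward afterward, which is the shape SNL describes.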
In a blog entry posted earlier today, Clay Breshears of Intel says he isn't surprised, since the problem of multiple cores sharing a single memory pipe is well known. He questions, however, whether the poor results might also stem from how SNL implemented its software algorithm.
SNL doesn't get into details about its methodology, so indeed, it would be interesting to see whether different implementations of the same algorithm yield different results.
How about it, SNL?