55

Is it possible to write code (or complete software, rather than a piece of code) that won't work properly when run on a CPU that has fewer than N cores? Without checking explicitly and failing on purpose:

IF (noOfCores < 4) THEN don't run properly on purpose

I'm looking at a game's (Dragon Age: Inquisition) minimum system requirements, and it states a minimum of a four-core CPU. Many players say it does NOT run on two-core CPUs, not even on Intel Core i3s with two physical cores plus two more logical cores from Hyper-Threading. And it's NOT a problem of computing power.

From my understanding, that cannot be done, since threads are completely isolated from the CPU by the OS.

Just to clear things up:

I am NOT asking "Can I find out the number of CPU cores from code, and fail on purpose?" Such code would be ill-intentioned (it would force you to buy a more expensive CPU to run a program without needing its computational power). I am asking whether your code could, say, have four threads and fail when two of those threads are run on the same physical core (without explicitly checking system information and purposely failing).

In short, can there be software that requires multiple cores, without needing additional computing power that comes from multiple cores? It would just require N separate physical cores.

Peter Mortensen
uylmz
  • 1
    This might be a duplicate of http://stackoverflow.com/questions/150355/programmatically-find-the-number-of-cores-on-a-machine – Tommy Andersen Jan 07 '15 at 12:17
  • 11
    If you read my question carefully you will see they are not asking the same thing. – uylmz Jan 07 '15 at 12:19
  • 21
    Since the number of cores can be retrieved, it can be compared to N, and if that comparison evaluates to true, the code can do whatever the hell it wants, including but not limited to behaving in ways not advertised. What's your question? –  Jan 07 '15 at 12:36
  • 3
    Are you sure the problem is really and directly related to the number of cores? Maybe the mentioned game is partially based on a feature only (correctly) provided by CPU with at least 4 cores? – mgoeminne Jan 07 '15 at 12:50
  • With your edit, it's not entirely clear what you *are* asking. – Simon B Jan 07 '15 at 12:51
  • @mgoeminne I am sure. The spec requirements do not state a particular CPU model; they just require 4 cores. Copied from the official site: AMD quad core CPU @ 2.5 GHz / Intel quad core CPU @ 2.0 GHz – uylmz Jan 07 '15 at 12:53
  • 1
    There are different techniques to checking which cpu core a thread is running on, and thereafter comparing between the threads, take a look at the answers to this question: http://stackoverflow.com/questions/6026896/how-to-know-on-which-physical-processor-and-on-which-physical-core-my-code-is-ru/6030091 I am not entirely certain that this is what you are after though. – Tommy Andersen Jan 07 '15 at 12:55
  • 1
    Hmm.. in case of Dragon Age, my first guess would be the same as Phillips. An interesting question then would be - what happens when you run the game on 4 core pc with 2 cores maxed out on something else? – UldisK Jan 07 '15 at 13:01
  • 1
    I tried to clear the question up a bit. It seems some of my statements made it hard to understand what I'm trying to ask specifically. – uylmz Jan 07 '15 at 13:03
  • 1
    @Reek You should try asking the crazies over on http://codegolf.stackexchange.com/, I'm sure someone can cook up an interesting piece of code for you. ;) – user50849 Jan 07 '15 at 15:28
  • 1
    That suggestion may be even more weird than the synchronous threads thing, but maybe Dragon Age is using some very special virtualization features which are only there in quad core CPUs (I don't know any, though), or needs to have some virtualized and therefore split memory/execution sections which are only achievable with quad cores / not achievable with [most] dual cores. Coming to mind here is the requirement for much L1/L2/L3 cache, which is higher in most quad cores as far as I know. – Sebb Jan 07 '15 at 19:57
  • 25
    Note that "minimum system requirements" are often "minimum system requirements to run with acceptable performance", especially with games. It is very possible that Dragon Age could, in theory, run on a single core box, but if you did so, it would show massive frame drops. So they require this number of cores not to force you to buy hardware, but to avoid quality complaints from users of lower-end hardware. – Gort the Robot Jan 07 '15 at 20:24
  • 1
    Funny, I've been playing Dragon Age: Inquisition on a first-generation i3 for over 30 hours now, without "massive frame drops". Got a new graphics card for it, though. – Michael Borgwardt Jan 07 '15 at 22:30
  • 1
    It could also be a marketing decision. If you support low-end CPUs the Game will most likely look very bad on these machines and raise many support questions. If you say in the beginning - the game only runs on 4 Cores, you don't have to answer any support questions for older hardware, and there won't be as many bad reviews complaining about bad performance, since the game won't even start. So there could be an explicit check in the game, where it fails on purpose rather than running badly – Falco Jan 08 '15 at 10:14
  • I'm pretty sure a hypothetical single-core terahertz (10^12 instructions per second) [MMIX](http://en.wikipedia.org/wiki/MMIX) would run your thing faster than your quad-core gaming machine. – Basile Starynkevitch Jan 08 '15 at 11:44
  • 3
    @Sebb: I think you're onto something: if 4 physical cores does correlate with having more cache then 2 physical/4 logical, then the game could naturally be choking on 2x2 machines without hitting their processing power limits because it's missing cache all the time. The test would be to find a CPU with 2x2 cores and loads of cache, or 4 cores and little cache, and see what happens. – Steve Jessop Jan 08 '15 at 12:38
  • Newer nVidia graphic drivers have ridiculously poor performance on dual core machines. Can you validate type of GFX used by people who tried to run it on a dual-core and failed? – Agent_L Jan 08 '15 at 16:55
  • FYI, SQL Server doesn't (or used to not) run if there isn't a *power of two* cores (2, 4, 8). Yeah. Seriously. – Paul Draper Jan 08 '15 at 23:42

12 Answers

45

It may be possible to do this "by accident" with careless use of core affinity. Consider the following pseudocode:

  • start a thread
  • in that thread, find out which core it is running on
  • set its CPU affinity to that core
  • start doing something computationally intensive / loop forever

If you start four of those on a two-core CPU, then either something goes wrong with the core affinity setting or you end up with two threads hogging the available cores and two threads that never get scheduled. At no point has it explicitly asked how many cores there are in total.
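
For concreteness, here is a minimal sketch of that pseudocode against the Win32 API. The thread count and worker body are illustrative; GetCurrentProcessorNumber and SetThreadAffinityMask are the actual calls:

    #include <windows.h>
    #include <thread>
    #include <vector>

    void worker() {
        // Ask which core this thread happens to be running on right now...
        DWORD core = GetCurrentProcessorNumber();
        // ...and pin the thread to that core for the rest of its life.
        SetThreadAffinityMask(GetCurrentThread(), DWORD_PTR(1) << core);
        for (;;) {
            // Computationally intensive work / loop forever, as in the pseudocode.
        }
    }

    int main() {
        std::vector<std::thread> threads;
        for (int i = 0; i < 4; ++i)         // four workers; core count is never queried
            threads.emplace_back(worker);
        for (auto& t : threads) t.join();   // never returns; the workers spin forever
    }

On a two-core machine, nothing stops two or more of these workers from pinning themselves to the same core and contending with each other from then on — exactly the accidental failure mode described above, without ever asking how many cores exist.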

(If you have long-running threads, setting CPU affinity generally improves throughput.)

The idea that game companies are "forcing" people to buy more expensive hardware for no good reason is not very plausible. It can only lose them customers.

Edit: this post has now got 33 upvotes, which is quite a lot given that it's based on educated guesswork!

It seems that people have got DA:I to run, badly, on dual-core systems: http://www.dsogaming.com/pc-performance-analyses/dragon-age-inquisition-pc-performance-analysis/ That analysis mentions that the situation greatly improves if hyperthreading is turned on. Given that HT adds no more instruction issue units or cache, but merely allows one thread to run while another is in a cache stall, that strongly suggests the problem is linked purely to the number of threads.

Another poster claims that changing the graphics drivers works: http://answers.ea.com/t5/Dragon-Age-Inquisition/Working-solution-for-Intel-dual-core-CPUs/td-p/3994141 ; given that graphics drivers tend to be a wretched hive of scum and villainy, this isn't surprising. One notorious set of drivers had a "correct&slow" versus "fast&incorrect" mode that was selected if called from QUAKE.EXE. It's entirely possible that the drivers behave differently for different numbers of apparent CPUs. Perhaps (back to speculation) a different synchronisation mechanism is used. Misuse of spinlocks?

"Misuse of locking and synchronisation primitives" is a very, very common source of bugs. (The bug I'm supposed to be looking at at work while writing this is "crash if changing printer settings at same time as print job finishes").

Edit 2: comments mention the OS attempting to avoid thread starvation. Note that the game may have its own internal quasi-scheduler for assigning work to threads, and there will be a similar mechanism in the graphics card itself (which is effectively a multitasking system of its own). The chances of a bug in one of those, or in the interaction between them, are quite high.

www.ecsl.cs.sunysb.edu/tr/ashok.pdf (2008) is a graduate thesis on better scheduling for graphics cards which explicitly mentions that they normally use first-come-first-served scheduling, which is easy to implement in non-preemptive systems. Has the situation improved? Probably not.

pjc50
  • 1
    Yeah there are two parts to answering this question: CPU affinity allows one to code something that would make this a technical requirement in Windows, the alternative answer is realtime systems can very definitely require such things. +1 for being the only person to mention CPU affinity which is really the most likely culprit for what is being asked here. – Jimmy Hoffa Jan 07 '15 at 17:24
  • 3
    What can go wrong if you set the affinity to current core? With preemptive multitasking the waiting thread _will_ be scheduled unless current one has maximum possible priority ("realtime" in Windows). I'd see another scenario: each of the 4 threads are assigned statically-defined affinity of 1,2,4,8, in which case the latter two threads will never be scheduled (although I'm not sure if setting affinity to effective zero is going to succeed). – Ruslan Jan 07 '15 at 18:33
  • @Ruslan Maybe trying to set invalid affinity will crash the application in the first place? – Luaan Jan 08 '15 at 16:29
  • 1
    @Luaan well that's not _that_ risky operation to lead to crash. Maximum what I'd expect is an error returned by the OS. I've just checked, in Linux I get "Invalid argument" error. Don't know what Windows would say. – Ruslan Jan 08 '15 at 16:37
  • @Ruslan Every major OS for certainly over a decade has included code to avoid thread starvation (usually by boosting the priority of a thread after it hasn't run for long enough). – Voo Jan 08 '15 at 19:05
  • @Ruslan You're right, I was thinking more in line with exceptions and stuff - it's one of those operations where you probably don't even notice it didn't work if you're just ignoring the return value. – Luaan Jan 09 '15 at 09:31
  • @Voo While that's definitely true, it fails for such expert configuration when you don't actually understand what you're doing. Having a producer and a consumer on threads with different priority can still result in a "deadlock" on a single-core system, for example. While this is in fact the intended behaviour, most people think process and thread priorities work differently, and they're going to have issues like this. If you let the OS do its work, it's going to do it well. The trouble starts when you think you can do better :D – Luaan Jan 09 '15 at 09:32
  • @Luaan Apart from explicitly setting the thread priority to disable priority boosts from the OS (Windows apparently doesn't increase the priority for threads that have >15 to begin with) - how'd you do that? It might take rather long to get things done, but I don't see how you'd get a deadlock – Voo Jan 09 '15 at 11:13
  • @Voo "Rather long" is a bit of an understatement - no matter the workload, there's a minimum of about 2 minutes for the forced priority change if you set the priorities "just right". So yeah, not an infinite deadlock (but then again, you never wait without timeouts anyway, right? :)), but good enough to make the application unusable. http://blog.codinghorror.com/thread-priorities-are-evil/ is a great article about some of the issues. Check the links too, it's all pretty awesome, especially the article by Joe Duffy. – Luaan Jan 09 '15 at 13:58
  • "Has the situation improved? Probably not." As someone who did graduate research on GPU task scheduling, your guess was at least still correct as of a couple of years ago. :) Indeed, as of that time, NVidia chips wouldn't start scheduling thread blocks from a second kernel until all blocks of the previous kernel had been scheduled. In order to run two kernels simultaneously, it was necessary to split the kernel launches into smaller slices such that the slices and alternate launches. I'm not as familiar with AMD, though, so I can't speak to that. My research was in CUDA. – reirab Jan 09 '17 at 21:02
34

It could be necessary to have 4 cores because the application runs four tasks in parallel threads and expects them to finish almost simultaneously.

When every thread is executed by a separate core and all threads have the exact same computational workload, they are quite likely (but far from guaranteed) to finish at roughly the same time. But when two threads run on one core, the timing will be a lot less predictable because the core will switch context between the two threads all the time.

Bugs which occur because of unexpected thread timing are referred to as "race conditions".

In the context of game development, one plausible architecture with this kind of problem could be one where different features of the game are simulated in real-time by different CPU threads. When each feature runs on its own core, they are all simulated at roughly the same speed. But when two features run on one core, both will be simulated only half as fast as the rest of the game world, which could cause all kinds of weird behaviors.

Note that a software architecture which depends on independent threads running with specific timings is extremely fragile and a sign of very bad understanding of concurrent programming. There are features available in practically all multithreading APIs to synchronize threads explicitly to prevent these kinds of problems.
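
As a concrete illustration of such a feature, here is a minimal sketch (not taken from any particular game; the thread count and tick loop are illustrative) using a C++20 barrier to keep four simulation threads in lockstep, instead of hoping they stay in step because each has its own core:

    #include <barrier>
    #include <thread>
    #include <vector>

    int main() {
        constexpr int kFeatures = 4;          // illustrative: one thread per game feature
        std::barrier sync_point(kFeatures);   // all threads meet here once per tick

        std::vector<std::jthread> sims;
        for (int i = 0; i < kFeatures; ++i) {
            sims.emplace_back([&sync_point] {
                for (int tick = 0; tick < 1000; ++tick) {
                    // ...per-feature simulation work for this tick goes here...
                    sync_point.arrive_and_wait();   // nobody starts tick N+1 until
                }                                   // everyone has finished tick N
            });
        }
    }   // std::jthread joins automatically

With the barrier, running two of these threads on one core merely slows every tick down uniformly; it no longer lets one feature's simulation drift ahead of another's.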

Philipp
  • This is in the vein with the answers I was expecting. – uylmz Jan 07 '15 at 13:06
  • 11
    But any game has a fragile dependence on being able to complete all the computation for the next frame in time to render it with reasonable frequency. Even if your 4 threads are synchronized correctly, it may be impossible to render in a timely fashion, and there's no benefit in a game which is computationally correct but unplayable due to lag and stutter. – Useless Jan 07 '15 at 17:52
  • 1
    @Useless: That's not really true. You can for example buffer frames or simulation data to hide any stutter, and there are concurrent designs that are more consistent. Getting all your processing done in X time and requring exact synchronization of that processing are different matters. – DeadMG Jan 07 '15 at 18:37
  • Not to mention that a game that renders perfectly isn't much use either if it makes too many computational mistakes. – Blrfl Jan 07 '15 at 19:14
  • 23
    "a software architecture which depends on independent threads running with specific timings is extremely fragile" Which is exactly why I can't imagine a game that doesn't run at all with 2 cores, but reliably works with 4 cores. Even with 4 cores, the timing will be unpredictable, so the race condition would occur too, even if less frequently. – svick Jan 07 '15 at 19:50
  • 1
    It's risky for user code - you can't predict available computation power because ANY other code can use "your" cores. – Ginden Jan 07 '15 at 19:55
  • 8
    @svick of course. But the question asks "is it possible?" not "is it sane?" – user253751 Jan 08 '15 at 03:20
  • 1
    It's not about predictability. There are timers that programs can use to make processing steps predictable. It's about whether all processing can be completed before the timers expire. A game that requires 4 2GHz cores can be run with similar performance on a single core 10GHz CPU (why not 8? Because you need to provide some overhead for thread switching). – slebetman Jan 08 '15 at 03:36
  • 5
    Any code with this kind of "race conditions" is flat-out **broken**, no matter how many cores you run it on. (Especially since there's absolutely no guarantee as to what *else* is running on the system.) I seriously doubt this to be the cause, given how easily it would trip the game even on a hexacore system... – DevSolar Jan 08 '15 at 16:04
  • 1
    @Useless: A game that cannot run at a reduced framerate is a game that cannot be usefully de-bugged. – Jack Aidley Jan 08 '15 at 16:05
  • @Useless There's no hard dependency on finishing before the next display frame. You can just not update the display until the next computational step is complete. The game would slow down, but not skip ahead. That's still playable, and sometimes beneficial to the player since he has more time to react in real time games. The worst that could happen is that you're playing a rhythm game and the slowdown comes and goes inconsistently (which would be odd for a rhythm game). You only have real time requirements in a networked game, but network latency is a bigger issue there. – Doval Jan 08 '15 at 17:31
16

It is unlikely that these "minimum requirements" represent something below which the game will not run. Far more likely is that they represent something below which the game will not run with acceptable performance. No game company wants to deal with lots of customers complaining about crappy performance when they are running it on a single core 1 Ghz box, even if the software could technically run. So they probably deliberately design to fail hard on boxes with fewer cores than would give them acceptable performance.

One important metric in game performance is the frame rate. Typically they run at either 30 or 60 frames per second. This means that the game engine has to render the current view from the game state in a fixed amount of time. To achieve 60 fps, it has just a bit more than 16 msecs to do this. Games with high-end graphics are extremely CPU bound and so there's a huge give-and-take between trying to push higher quality (which takes more time) and the need to stay in this time budget. Thus, the time budget for each frame is extremely tight.
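
In code, that budget looks roughly like this (a sketch; the update and render calls are hypothetical placeholders):

    #include <chrono>

    int main() {
        using clock = std::chrono::steady_clock;
        constexpr std::chrono::microseconds kFrameBudget(16'667);   // 1/60 second

        for (int frame = 0; frame < 3; ++frame) {   // illustrative loop
            auto frame_start = clock::now();
            // update_world();   // hypothetical simulation step
            // render_frame();   // hypothetical draw step
            if (clock::now() - frame_start > kFrameBudget) {
                // Budget blown: the player sees a dropped or late frame.
            }
        }
    }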

Because the time budget is tight, the developer ideally wants exclusive access to one or more cores. They also likely want their rendering work on a core of its own, exclusively, as it's what has to get done within that time budget, while other work, like calculating the world state, happens in a separate process where it won't intrude.

You could, in theory, cram all this onto a single core, but then everything becomes much harder. Suddenly you have to make sure all that game state stuff happens fast enough, and allows your rendering to happen. You can't just make them two software threads because there's no way to make the OS understand "thread A must complete X amount of work in 16 msecs regardless of what thread B does".

Game developers have zero interest in making you buy new hardware. The reason they have system requirements is that the cost of supporting lower end machines is not worth it.

Gort the Robot
  • While this is true, it happens that you can buy dual-core hardware that is powerful enough that it can achieve more in a given time frame than the quad core hardware described in the minimum specs. Why would the vendor not list such hardware as acceptable, a decision which can only lose them sales? – Jules Jan 08 '15 at 11:16
  • 4
    The thing to compare isn't 2 vs. 4 cores. It's essentially 1 vs. 3 cores, as CPU#0 will be pretty much pegged by the graphics driver and DPCs. There are also significant cache and migration effects if you oversubscribe a CPU with several kinds of tasks in a typical modern game's job system. The requirement is there because Frostbite (DA:I's engine) is designed from the ground up with very careful tuning that requires a particular number of cores. – Lars Viklund Jan 08 '15 at 13:30
  • 6
    @LarsViklund It sounds like you know more details than anyone else here. Having you considered putting an answer together? – Gort the Robot Jan 08 '15 at 14:53
  • 1
    "It is unlikely that these "minimum requirements" represent something below which the game will not run. Far more likely is that they represent something below which the game will not run with acceptable performance." -- Intel's G3258 is a very powerful dual core processor widely used by gamers that is capable of running games equal or more resource intensive than Dragon Age Inquisition, but many players report the game does not run on it. – uylmz Jan 08 '15 at 15:46
  • @Reek I don't know that I'd describe the G3258 as 'very powerful.' It's current, but it's towards the low-end of current desktop chips. It's dual core, no HT, only 3 MB cache. – reirab Jan 08 '15 at 16:51
  • @StevenBurnap Had the original question been about the things which make the performance requirements of games different from a lot of other computing to the degree that such a minimum number of hardware threads requirement is warranted, then an answer expanded from my comment with a bit of research may have been appropriate. The current shape of the question is largely uninteresting to answer, as it's just about gathering fodder for some argument about whether it's possible to have a hard failure by not having enough hardware threads. The mass of other irrelevant answers also deters. – Lars Viklund Jan 08 '15 at 19:52
  • 2
    @Reek I am doubtful that an end user can easily tell how resource intensive a particular game is as compared to another. – Gort the Robot Jan 08 '15 at 20:41
  • @LarsViklund If you share your knowledge Q&A style with another, more appropriate question, we would be happy to learn from it :) – uylmz Jan 09 '15 at 13:34
9

Three realtime threads that never sleep and one other thread. If there are fewer than four cores, the fourth thread never runs. If the fourth thread needs to communicate with one of the realtime threads for the realtime thread to finish, the code will not finish with fewer than four cores.
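
A sketch of that scenario on Windows follows (the thread bodies are illustrative, and as comments below note, modern schedulers may eventually boost a starved thread, so treat this as the in-principle shape rather than a guaranteed hang):

    #include <windows.h>
    #include <atomic>
    #include <thread>

    std::atomic<bool> data_ready{false};

    void realtime_spinner() {
        // Elevate this thread as far as user mode easily allows; with
        // REALTIME_PRIORITY_CLASS set on the process (needs privileges)
        // it would become a true realtime thread.
        SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
        while (!data_ready.load()) {
            // Spin: never sleeps, never yields.
        }
    }

    int main() {
        std::thread a(realtime_spinner), b(realtime_spinner), c(realtime_spinner);
        std::thread fourth([] {
            // On a machine with fewer than four cores, this thread can be
            // starved while the three spinners monopolize every core.
            data_ready.store(true);
        });
        a.join(); b.join(); c.join(); fourth.join();
    }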

Obviously if realtime threads are waiting on something that doesn't allow them to sleep (such as a spinlock) the program designer screwed up.

Joshua
  • 1
    Arguably, when a user application requests realtime threads in the first place, the designer screwed up :D – Luaan Jan 08 '15 at 16:31
  • 2
    I've done it. Half a million lines of code. One case using about 300 lines. Realtime thread spends most of its time waiting for input so it can timestamp the input and hand it to a lesser priority thread. – Joshua Jan 08 '15 at 16:33
  • 2
    @Luaan For most applications I'd agree with you, but games are a different beast, as are embedded applications. In both of those cases, concern for playing nice with other concurrent applications mostly goes out the window in favor of performance. – reirab Jan 08 '15 at 16:54
  • While it wouldn't be particularly efficient, this scenario would not lead to any deadlocks - priority inversion would take care of it (assuming any halfway decent scheduler in any major OS of the last decade) – Voo Jan 08 '15 at 19:10
  • You're assuming that priority inversion knows about hand-rolled spinlocks. Also, Windows (as is effectively specified by the question) doesn't know what priority inversion is. – Joshua Jan 08 '15 at 19:27
  • 2
    @Joshua *> Windows doesn't know what priority inversion is.* What? http://support.microsoft.com/kb/96418, http://msdn.microsoft.com/en-us/library/windows/desktop/ms684831.aspx. Also, priority inversion is the term that describes the *issue*, not a solution (@Voo). – Bob Jan 09 '15 at 04:54
  • @reirab Embedded applications, sure. But you still need to know what you're doing, of course. For games, it's still a bad practice. There's always the applications that take too much exclusive access to some piece of the system - all those games that suddenly throw an error message you can't even see and left with (almost) no way to kill the application, because it decided to always stay on top... And of course, Windows gives priority to foreground applications anyway - you're only fighting other badly written applications, not the well behaving residents. Leading to more bad apps :D – Luaan Jan 09 '15 at 09:39
  • @Bob Right you are, got a bit confused there, obviously what I meant was priority boosting :-) And boosting the priority of starved threads obviously also works with handrolled spin locks - it doesn't matter to it at all. – Voo Jan 09 '15 at 11:17
  • @Bob: It turns out the NT solution is an inadequate solution to priority inversion when real-time threads are involved. The 95 solution is the real fix. – Joshua Jan 09 '15 at 16:19
  • @Joshua While I cannot find any more recent sources (that article specifically applies to NT 3.5/4.0 and 95, which are both two decades old), I would hope that modern Windows does something better. Actually, I'm surprised that NT did random boosting while 95 actually implemented priority inheritance. – Bob Jan 10 '15 at 01:33
  • Ok so maybe it's fixed now, but nobody's got documentation on it yet. Still won't see through hand-rolled spinlock. – Joshua Jan 10 '15 at 03:09
3

First of all, software threads have nothing to do with hardware threads, and the two are often mixed up. Software threads are pieces of code that can be dispatched and run on their own within the process context. Hardware threads are mostly managed by the OS and, for regular programs, are dispatched to the processor's cores. These hardware threads are dispatched based on load; the hardware thread dispatcher acts more or less like a load balancer.

However, when it comes to gaming, especially high-end gaming, the hardware threads are sometimes managed by the game itself, or the game instructs the hardware thread dispatcher what to do. That is because not every task or group of tasks has the same priority, unlike in a normal program. Because Dragon Age comes from a high-end game studio using high-end game engines, I can imagine that it uses "manual" dispatch, and then the number of cores becomes a minimum system requirement. Any program would crash if I sent a piece of code to the 3rd physical core on a machine with only 1 or 2 cores.

  • This. Remember that saying "check no of cores" means that a company is making its software product in a specific way to force users to buy more expensive hardware (which would be ill-intended). – uylmz Jan 07 '15 at 13:11
  • @Reek There's valid reasons for checking the number of cores. The most common one is probably to create one thread per core so you can parallelize computations. – Doval Jan 07 '15 at 13:26
  • @Doval Without failing on purpose, yes there are valid reasons. – uylmz Jan 07 '15 at 13:30
  • 2
    These problems have existed for as long as there has been PC gaming. In the beginning we had the 486DX and 486SX, later the MMX and non-MMX Pentium, Core and non-Core, and today we have n-core requirements. This is one of the reasons why consoles still exist. – dj bazzie wazzie Jan 07 '15 at 14:02
  • 4
    Do you have a reference for games taking over CPU scheduling themselves? As far as I was aware, this is not directly possible under Windows, at least not in a way that would fail in the way you suggest. – Jules Jan 08 '15 at 11:13
  • @Jules: I know the id Tech 5 engine does it; John Carmack explained in a presentation of the id Tech 5 engine that certain tasks run only on a single core, without interference from other tasks of the game. Windows itself doesn't provide an interface to manage the cores, but there are plenty of libraries available. For example, Apple's GCD is available as XDispatch for Windows. – dj bazzie wazzie Jan 08 '15 at 12:38
  • @Jules As DJ bazzie wazzie says it does happen. It only makes sense on a 3 core minimum system as you have to leave 1 core free to be used by the OS and anything running in the background. – Tonny Jan 08 '15 at 13:11
  • 2
    @djbazziewazzie actually Windows does provide an API to do just that, i.e. set a thread to always use the same core; this is called thread affinity. It does not allow you to manually select which piece of code runs where and when, and cannot cause a system failure as you suggest (the system will ignore a request to set affinity to a non-existent core, and just keep scheduling the thread on any core when it becomes available). I'm pretty sure this is what id Tech uses, and it doesn't really amount to "managing the hardware threads itself". – Jules Jan 08 '15 at 14:01
  • 1
    @djbazziewazzie You also appear to misunderstand the point of Grand Central Dispatch, which does *not* give developers more control over how their code is scheduled to a core; in fact, its purpose is the precise opposite: taking the choice of how many threads to create and which code should run on which thread out of the hands of applications so that it can be optimized for the available hardware at a system-wide level. Dependency on having a certain number of cores is exactly the kind of problem GCD was designed to prevent. – Jules Jan 08 '15 at 14:13
  • I wasn't saying that GCD does what is needed for gaming, only that it can bypass Windows' standard behavior by instructing the library what to do. In the presentation of id Tech 5 they clearly said that (pre)rendering done by the CPU is sent to a single core while other tasks such as physics and AI run on another core. I know that the id Tech 5 engine can run on a single core (it runs the game like a spreadsheet), but that doesn't mean it won't do some instructions when they are available. Coming back to the point where multi-core becomes a hardware requirement. – dj bazzie wazzie Jan 08 '15 at 14:24
  • @djbazziewazzie Setting a thread to only ever be scheduled on a particular core is very different from setting a core to never schedule any thread other than a given one (i.e. a realtime thread.) The former (setting a thread affinity) is common. The latter is extremely rare and usually (intentionally) not possible from user land. – reirab Jan 08 '15 at 17:00
  • @djbazziewazzie GCD is nothing more than a threadpool - we had those for a very long time and in no way bypasses any standard protocol (what protocol anyhow?). The reason to set thread-affinity is to make optimal use of your caches and minimize interconnect bandwidth (well and to hide bugs in your multi-threaded code), that's it. – Voo Jan 08 '15 at 19:17
1

Since it is possible to use virtualization to present more virtual cores than there are physical ones, and the software would not know it is running virtualized and would instead believe it really has that many physical cores, I would say such software is not possible.

That is to say, it is not possible to write software that will always stop on fewer than N cores.

As others have pointed out, there are software solutions that can potentially check, especially if the OS and code being used have little protection against race conditions when N processes run on fewer than N processors. The real trick is code that fails when you have fewer than N processors, but doesn't fail when you do have N processors and an OS that may assign work to fewer than N of them.

Lawtonfogle
1

It could be that there are three threads doing something (generating backgrounds or generating NPC movement) and passing events to a fourth, which is supposed to aggregate/filter the events and update the view model. If the fourth thread doesn't get all the events (because it's not scheduled on a core) then the view model doesn't get updated correctly. This may only happen sporadically, but those cores need to be available at any point. This might explain why you're not seeing high CPU usage all the time, but the game is failing to work properly anyway.

TMN
  • 1
    In such a scenario the game would also fail randomly when background services were scheduled to run, which is quite frequently on most pcs. – Jules Jan 08 '15 at 11:19
1

I think Joshua is heading down the right path, just not to its conclusion.

Suppose you have an architecture where there are three threads that are written to do as much as they can; when they finish what they are doing, they do it again. To keep performance up, these threads do not release control for anything, because they don't want to risk the lag from the Windows task scheduler. As long as there are 4 or more cores this works fine; it fails badly if there aren't.

In general this would be bad programming, but games are another matter: when faced with a choice between a design that's inferior on all hardware and a design that is superior on sufficiently good hardware but fails on inferior hardware, game developers usually choose to require the hardware.

Loren Pechtel
  • It's usually not possible to write a thread that will not relinquish control to other threads. All modern non-RTOS operating systems use preemptive multitasking, which intentionally makes it impossible for a (user mode) thread to not release control of a given core. Kernel threads, of course, are a different matter. – reirab Jan 08 '15 at 17:05
  • @reirab Boost it's priority. – Loren Pechtel Jan 09 '15 at 02:27
    @Loren Doesn't change the fact that the scheduler still does its work, meaning you have to share time with other threads of the same priority, and the scheduler boosts the priority of starved threads. You can't do that on normal OSes, and even if you could, games certainly wouldn't be an acceptable application for doing so. – Voo Jan 09 '15 at 15:24
1

Is it possible to write code (or complete software, rather than a piece of code) that won't work properly when run on a CPU that has less than N number of cores?

Absolutely. The use of real-time threads would be a good example of a situation in which this is not only possible, but the desired way (and often, the only correct way) to get the job done. However, real-time threads are usually limited to the OS kernel, typically for drivers which need to be able to guarantee that a hardware event of some sort is handled within some defined period of time. You should not have real-time threads in normal user applications, and I'm not sure that it's even possible to have one in a Windows user-mode application. Generally, operating systems make it intentionally impossible to do this from user land precisely because it does allow a given application to take over control of the system.

Regarding user-land applications: Your assumption that checking for a given number of threads in order to run is necessarily malicious in intent is not correct. For instance, you could have 2 long-running, performance-intensive tasks that need a core to themselves. Regardless of CPU core speed, sharing a core with other threads could be a serious and unacceptable performance degradation due to cache thrashing along with the normal penalties incurred by thread switching (which are pretty substantial.) In this case, it would be perfectly reasonable, especially for a game, to set each of these threads to have an affinity only on one particular core for each of them and then set all of your other threads to not have affinity on those 2 cores. In order to do this, though, you'd have to add a check that the system has more than 2 cores and fail if it doesn't.
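
A hedged sketch of the partitioning described above, against the Win32 API (the two "heavy" tasks and the exact masks are illustrative; GetSystemInfo and SetThreadAffinityMask are the actual calls, and native_handle() here assumes MSVC, where it yields a Win32 HANDLE):

    #include <windows.h>
    #include <thread>

    int main() {
        SYSTEM_INFO si;
        GetSystemInfo(&si);
        if (si.dwNumberOfProcessors < 3) return 1;   // the up-front check the answer mentions

        std::thread heavy1([] { /* performance-critical task (placeholder) */ });
        std::thread heavy2([] { /* performance-critical task (placeholder) */ });

        // Give each heavy task a core of its own...
        SetThreadAffinityMask(heavy1.native_handle(), DWORD_PTR(1) << 0);
        SetThreadAffinityMask(heavy2.native_handle(), DWORD_PTR(1) << 1);

        // ...and keep everything else (here, just the main thread) off those two cores.
        DWORD_PTR all = (DWORD_PTR(1) << si.dwNumberOfProcessors) - 1;
        SetThreadAffinityMask(GetCurrentThread(), all & ~DWORD_PTR(3));

        heavy1.join();
        heavy2.join();
    }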

reirab
0

Any code using spinlocks with any noticeable amount of lock contention will perform terribly (to an extent where -- for an application like a game -- you can say "doesn't work") if the number of threads exceeds the number of cores.

Imagine for example a producer thread submitting tasks to a queue which serves 4 consumer threads. There are only two cores:

  • The producer tries to obtain the spinlock, but it is held by a consumer running on the other core. The two cores are running in lockstep while the producer spins, waiting for the lock to be released. This is already bad, but not as bad as it will get.
  • Unluckily, the consumer thread is at the end of its time quantum, so it is preempted, and another consumer thread is scheduled. It tries to get hold of the lock, but of course the lock is taken, so now two cores are spinning and waiting for something that cannot possibly happen.
  • The producer thread reaches the end of its time slice and is preempted; another consumer wakes up. Again, two consumers are waiting for a lock to be released, and it just won't happen before two more time quanta have passed.
  • [...] Finally the consumer that was holding the spinlock releases it. The lock is immediately taken by whoever is spinning on the other core. There is a 75% chance (3 to 1) that it's another consumer thread. In other words, it's 75% likely that the producer is still being stalled. Of course this means that the consumers stall, too. Without the producer submitting tasks, they have nothing to do.

Note that this works in principle with any kind of lock, not just spinlocks -- but the devastating effect is much more prominent with spinlocks because the CPU keeps burning cycles while it achieves nothing.
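
For reference, the kind of spinlock in question is only a few lines; a minimal sketch using C++11 atomics:

    #include <atomic>

    class Spinlock {
        std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
    public:
        void lock() {
            while (flag_.test_and_set(std::memory_order_acquire)) {
                // Burn cycles. If the lock holder has been preempted, this
                // core achieves nothing until the holder runs again.
            }
        }
        void unlock() { flag_.clear(std::memory_order_release); }
    };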

Now imagine that in addition to the above some programmer had the brilliant idea to use a dedicated thread with affinity set to the first core, so RDTSC will give reliable results on all processors (it won't anyway, but some people think so).

Damon
  • That is why good spinlocks downgrade to other lock types after a short time, and even better ones do so more quickly if past usages of the same lock have had to downgrade. – Ian Jan 10 '15 at 18:03
-1

Windows has built-in functionality for this: the GetLogicalProcessorInformation function in the Windows API. You can call it from your program to get information about physical cores, logical cores, and hyperthreading.
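
A minimal sketch of calling it (error handling omitted; the function and the RelationProcessorCore relationship are real Win32, the rest is illustrative):

    #include <windows.h>
    #include <iostream>
    #include <vector>

    int main() {
        // The first call fails by design and reports the buffer size we need.
        DWORD bytes = 0;
        GetLogicalProcessorInformation(nullptr, &bytes);

        std::vector<SYSTEM_LOGICAL_PROCESSOR_INFORMATION> info(
            bytes / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));
        GetLogicalProcessorInformation(info.data(), &bytes);

        int physical_cores = 0;
        for (const auto& entry : info)
            if (entry.Relationship == RelationProcessorCore)
                ++physical_cores;   // one entry per physical core

        std::cout << "physical cores: " << physical_cores << '\n';
    }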

So the answer to your question would be: Yes.

Peter Mortensen
Pieter B
  • 3
    I am not asking "Can I find out the number of cores from code?" ... Such code would be ill-intentioned (it forces you to buy a more expensive CPU to run a program - without the need of computational power). – uylmz Jan 07 '15 at 12:28
  • 3
    This function gives much more information than just a raw "number of cores". With this information you can deduce physical cores, logical cores and more. If you can deduce that, then you can write software to use this information, in a good or bad way (crash the program when you see 4 cores but less than 4 physical cores). – Pieter B Jan 07 '15 at 12:32
  • 1
    This may work in Windows, but what about OSX/Linux/iOS/Android/etc.? While it is referencing a game as an instance where this behavior is seen (and the natural correlation would be Windows = Gaming), it doesn't seem to be a game specific request. – Robert Jan 07 '15 at 15:50
  • For a game like Dragon Age, the systems in question are Windows/XBox/PS4. – Gort the Robot Jan 07 '15 at 22:30
  • Linux has `/proc/cpuinfo` and `sysconf(_SC_NPROCESSORS_ONLN)` (the latter being mentioned in POSIX). Using the info to enforce a minimum performance threshold is still pretty bad form, though. – cHao Jan 08 '15 at 14:05
-1

If I understand what you're asking, it's possible, but it's a very, very bad thing.

The canonical example of what you're describing would be maintaining a counter which is incremented by multiple threads. This requires almost nothing in terms of computing power but requires careful coordination among the threads. As long as only one thread at a time does an increment (which is actually a read followed by an addition followed by a write), its value will always be correct. This is because one thread will always read the correct "previous" value, add one and write the correct "next" value. Get two threads into the action at the same time and both will read the same "previous" value, get the same result from the increment and write the same "next" value. The counter will effectively have been incremented only once even though two threads think they each did it.

This dependency between timing and correctness is what computer science calls a race condition.

Race conditions are often avoided by using synchronization mechanisms to make sure threads wanting to operate on a piece of shared data have to get in line for access. The counter described above might use a read-write lock for this.
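
A minimal sketch of that lost-update race, with an atomic counter alongside as one standard fix (thread and iteration counts are illustrative; the unsynchronized increment is formally undefined behavior in C++, which is the point):

    #include <atomic>
    #include <iostream>
    #include <thread>

    int counter = 0;                    // shared, unsynchronized: the broken version
    std::atomic<int> safe_counter{0};   // atomic increment: one correct alternative

    int main() {
        auto work = [] {
            for (int i = 0; i < 100000; ++i) {
                ++counter;          // read + add + write: concurrent updates can be lost
                ++safe_counter;     // atomic read-modify-write: updates are never lost
            }
        };
        std::thread a(work), b(work);
        a.join();
        b.join();
        // counter typically prints less than 200000; safe_counter is always 200000.
        std::cout << counter << " vs " << safe_counter << '\n';
    }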

Without access to the internal design of Dragon Age: Inquisition, all anyone can do is speculate about why it behaves the way it does. But I'll have a go based on some things I've seen done in my own experience:

It might be that the program is based around four threads that have been tuned so everything works when the threads run mostly-uninterrupted on their own physical cores. The "tuning" could come in the form of rearranging code or inserting sleeps in strategic places to mitigate race-condition-induced bugs that cropped up during development. Again, this is all conjecture, but I've seen race conditions "resolved" that way more times than I care to count.

Running a program like that on anything less capable than the environment for which it was tuned introduces timing changes, caused by the code not running as quickly or, more likely, by context switches. Context switches happen in physical ways (i.e., the CPU's physical cores switching between the work their logical cores are holding) and logical ways (i.e., the OS assigning work to the cores), but either is a significant divergence from the "expected" execution timing. That can bring out bad behavior.

If Dragon Age: Inquisition doesn't take the simple step of making sure there are enough physical cores available before proceeding, that's EA's fault. They're probably spending a small fortune fielding support calls and emails from people who tried to run the game on too little hardware.

Blrfl
  • 1
    Some players say it's caused by DRM running on 2 cores while the actual game runs on 2 as well; when the DRM and game threads run on the same core, it gets messed up. But this doesn't sound right to me; it may be a little story made up by a player who doesn't know much about SW or HW architecture. – uylmz Jan 07 '15 at 16:43
  • 4
    race conditions really haven't much to do with core count, -1... a single core machine with multiple virtual threads can have race conditions totally dependent on the runtime's time slicing technique, or a many core system could avoid all race conditions dependent on how strict it is with membar operations... – Jimmy Hoffa Jan 07 '15 at 17:27
  • 1
    @Reek: Without intimate knowledge of how the program works, anything's a guess. Two cores to do just the DRM seems a little excessive to me. – Blrfl Jan 07 '15 at 17:35
  • 1
    @JimmyHoffa: I disagree. A race condition is still a race condition even when it's not causing undesired behavior. Core count _can_ influence whether or not that behavior happens, which is what the questioner asked, but I didn't cite it as the sole variable. – Blrfl Jan 07 '15 at 17:51