Is there such a thing as "too many threads"?

Question

I wasn't sure if here or SO was the right place to ask this, but here goes anyway.

I want to improve a system that is currently running. It has services and many standalone apps, but none of them are properly coordinated. I want to build a mini app to demonstrate the improvements the system would gain if we built these things into a new "framework", if you will.

The idea is to create a dashboard application that manages and reports on a service running in the background. The service will run, let's say, every 3 minutes and will call a class (e.g. FileProcessor) that in turn calls each type of processor's Run method. This Run method will launch a thread for a Runnable class. I want this Runnable class to hold properties of the file it is or was processing: the cause of an error if one occurred, an option to rerun the method, and its current status (i.e. at what point in processing it is).

These "runnables" will sometimes interact directly with a file in a directory, and at other times be driven by a record in the database requesting that a file be processed.

Now having some background, I'd like to ask the following:

Imagine the direct interaction with the files in a directory (e.g. changing an extension) is done immediately in a "batch mode". Let's say five files were found in C:\Program Files\My App\Files\*.xyz, and this process (directly after a directory search) renames them all to C:\Program Files\My App\Files\*.zyx one at a time, with something like the following:

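// di is a DirectoryInfo for the folder being scanned
foreach (FileInfo fi in di.GetFiles("*.xyz", SearchOption.TopDirectoryOnly))
{
    // Path.ChangeExtension swaps only the extension and keeps the directory and file name
    fi.MoveTo(Path.ChangeExtension(fi.FullName, ".zyx"));
}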

In contrast, each record retrieved from the database will be given to a separate thread to process. Let's say 10 files request processing: 10 threads of this Runnable type will be launched, each holding this status information etc. These threads will be added to a list that keeps track of the processes so that this information can be passed to the application.

Renaming files is simple and quick, so I believe it belongs in a single "method call", whereas the files that request processing can be anything from 20 to 700 lines, where values are drawn from each line and inserted into a MySQL database. For this reason I want to allow all files to "start at the same time", so that one or two 700-line files do not block or delay the other twenty 15-line files that could all have been done by the time a single 700-line file is processed.

Now basically I'd like to know if someone with more knowledge and experience than me can say if this is a good idea or way of approaching the solution. Perhaps I'm missing something or my design could be off.

P.S. This service and application will run on a server that is quite capable.

Thank you in advance for any direction and/or advice.

Are you looking for an answer about threads in the abstract or threads in C# in particular? — Blrfl, Jul 20 '18 at 21:42
@Blrfl This is sort of both. It is in abstract a question of my design and the use of threading in C# to accomplish the needed goal(s) — JDProwler, Jul 20 '18 at 22:09
More clarification if you don't mind: What requirements and observations of the single-threaded performance of this process led you to believe that a parallel solution is the right thing? — Blrfl, Jul 21 '18 at 12:35

score 10 · Accepted Answer · edited Jul 20 '18 at 21:05


Threads do have significant costs. VERY roughly, imagine 100K bytes per thread (they each need a stack, for one thing), and each one places a slight burden on operating system components (e.g. the scheduler) that have to manage them all.

Threads DO present a very simple model for managing async tasks. I'm a big fan of that approach.

But if you are going to use a lot of threads, please consider using thread pools as a way to re-use the underlying thread objects (you can still have lots of runnables; most of them are just queued rather than running).
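As a rough illustration of that point, here is a minimal sketch that queues one work item per file onto the .NET thread pool and records which pooled thread handled it. The class name `ThreadPoolSketch`, the method `ProcessAll`, and the `Thread.Sleep` stand-in for real work are assumptions made for the example, not part of the asker's system.

```csharp
using System.Collections.Concurrent;
using System.Threading;

public static class ThreadPoolSketch
{
    // Queues one work item per file name and blocks until every item has run.
    // Returns a map of file name -> managed ID of the pool thread that ran it.
    public static ConcurrentDictionary<string, int> ProcessAll(string[] fileNames)
    {
        var processedBy = new ConcurrentDictionary<string, int>();

        using (var done = new CountdownEvent(fileNames.Length))
        {
            foreach (string name in fileNames)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        // Hypothetical placeholder for parsing the file and inserting rows.
                        Thread.Sleep(10);
                        processedBy[name] = Thread.CurrentThread.ManagedThreadId;
                    }
                    finally
                    {
                        done.Signal(); // always count the item as finished
                    }
                });
            }

            done.Wait(); // wait for all queued items, however few threads ran them
        }

        return processedBy;
    }
}
```

With many file names the returned map will typically show far fewer distinct thread IDs than files, because the pool reuses a small set of threads instead of creating one per item.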

And - since you are using C#, async tasks (https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/) are a more efficient strategy to consider.
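A minimal sketch of the async approach, assuming each job boils down to reading a text file line by line. `AsyncSketch`, `CountLinesAsync` and `CountAllAsync` are hypothetical names, and counting lines stands in for the real parse-and-insert work.

```csharp
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncSketch
{
    // Reads one file line by line. The stream is opened with useAsync: true,
    // so the awaits request asynchronous I/O instead of parking a thread.
    public static async Task<int> CountLinesAsync(string path)
    {
        int count = 0;
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, useAsync: true))
        using (var reader = new StreamReader(stream))
        {
            while (await reader.ReadLineAsync() != null)
            {
                count++; // a real processor would parse the line and insert a row here
            }
        }
        return count;
    }

    // Starts every file at once and completes when all of them have finished.
    // Results come back in the same order as the input paths.
    public static async Task<int[]> CountAllAsync(string[] paths)
    {
        Task<int>[] tasks = paths.Select(p => CountLinesAsync(p)).ToArray();
        return await Task.WhenAll(tasks);
    }
}
```

Because all tasks are started before any is awaited, a 700-line file does not hold up the short ones, and no dedicated thread is created per file.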

Often though - simplicity of implementation matters more than efficiency (up to a point). What you described with a thread-pool (to throttle actual thread count) may work fine.
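One way the throttled design could look, sketched under assumptions: `FileJob` is a hypothetical stand-in for the question's Runnable (file name, status, error), `ThrottledRunner` is an invented helper, and a `SemaphoreSlim` caps how many jobs run at once. The actual file processing is passed in as a delegate because its details are not given in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public enum JobStatus { Pending, Running, Done, Failed }

// Hypothetical stand-in for the question's "Runnable": holds the file name,
// the current status, and the error (if any) so a dashboard can report on it.
public sealed class FileJob
{
    private volatile JobStatus _status = JobStatus.Pending;

    public FileJob(string fileName) { FileName = fileName; }

    public string FileName { get; }
    public JobStatus Status => _status;
    public Exception Error { get; private set; }

    public async Task RunAsync(Func<string, Task> work, SemaphoreSlim gate)
    {
        await gate.WaitAsync(); // wait for a free slot
        try
        {
            _status = JobStatus.Running;
            await work(FileName);
            _status = JobStatus.Done;
        }
        catch (Exception ex)
        {
            Error = ex; // keep the cause for later reporting or a rerun
            _status = JobStatus.Failed;
        }
        finally
        {
            gate.Release();
        }
    }
}

public static class ThrottledRunner
{
    // Runs every job, but never more than maxConcurrent at the same time.
    public static async Task<List<FileJob>> RunAllAsync(
        IEnumerable<string> fileNames, Func<string, Task> work, int maxConcurrent)
    {
        List<FileJob> jobs = fileNames.Select(n => new FileJob(n)).ToList();
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            await Task.WhenAll(jobs.Select(j => j.RunAsync(work, gate)));
        }
        return jobs;
    }
}
```

The returned list plays the role of the tracking list from the question: each entry exposes its status and error, while the semaphore, not the list, decides how many jobs are active.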

edited Jul 20 '18 at 21:05

Robert Harvey

answered Jul 20 '18 at 20:35

Lewis Pringle

Thanks for the response. So the approach that I want to take is adequate, correct? But the thread pool, will this then replace the `List` that I wanted to use to manage the tasks that are running? Or is a thread pool just a "number of allowed threads allocated to a single application"? I need to do some research on this. I am quite capable of a few things in C#, but I still lack quite a bit, and especially in the juicy areas x_X – JDProwler Jul 20 '18 at 21:15
This is a good read on thread pools [https://softwareengineering.stackexchange.com/questions/173575/what-is-a-thread-pool] – JDProwler Jul 20 '18 at 21:23
https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/threading/how-to-use-a-thread-pool appears to describe using threadpools in C#. Basically a threadpool is just a List - like you said, but pre-allocated (maybe 10-20? as many as you want). And you queue work items (runnables) to the threadpool, and it handles them in order. – Lewis Pringle Jul 20 '18 at 22:59
