well, it really depends on what you are developing. the answer can range from "it's insignificant" to "it's absolutely critical, and we expect everyone on the team to have a good understanding and use of parallel implementations".
for most cases, a solid understanding and use of locks, threads, and tasks/task pools is a good start when parallelism is required. (the specifics vary by language and library.)
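as a minimal sketch of that starting point, using python's stdlib `threading` and `concurrent.futures` (the function and numbers here are made up for illustration - the point is the shape: a task pool, plus a lock held only long enough to publish results):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# shared state guarded by a lock: the classic starting point
total = 0
lock = threading.Lock()

def add_squares(n):
    global total
    partial = sum(i * i for i in range(n))  # do the real work outside the lock
    with lock:                              # hold the lock only to publish
        total += partial

with ThreadPoolExecutor(max_workers=4) as pool:
    for n in (100, 200, 300):
        pool.submit(add_squares, n)
# leaving the pool's context waits for all submitted tasks to finish

print(total)  # → 11930100
```

the same structure shows up under different names in most languages/libraries (executors, thread pools, futures), so the concepts transfer even when the api doesn't.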
add to that the differences in design you must make - for nontrivial multiprocessing, you must often learn several new programming models or parallelization strategies. in that case, the time to learn, to fail enough times to build a solid understanding, and to update existing programs can take a team a year (or more). once you have reached that point, you will (hopefully!) not perceive or approach problems and implementations as you do today (provided you have not already made that transition).
another obstacle is that you're effectively optimizing a program for a certain execution. if you're not given much time to optimize, you won't benefit as much as you should. high-level (or obvious) parallelization can improve your program's perceived speed with fairly little effort, and that's as far as many teams go today: "we've parallelized the really obvious parts of the app" - and that's fine in some cases. will the benefit of taking the low-hanging fruit and using simple parallelization be proportionate to the number of cores? often, yes, with two to four logical cores, but not so often beyond that. in many cases that's an acceptable return, given the time investment. this parallel model is many people's introduction to implementing good uses of parallelism. it is commonly implemented using parallelized iteration, explicit tasks, simple threads, or multitasking.
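the "low-hanging fruit" version usually amounts to taking an existing loop over independent items and handing it to a pool. a sketch of that, assuming python's `concurrent.futures` (the `checksum` function and the data are invented stand-ins for per-item work):

```python
from concurrent.futures import ThreadPoolExecutor

def checksum(blob):
    # stand-in for independent per-item work (parsing, hashing, i/o, ...)
    return sum(blob) % 251

blobs = [bytes(range(i, i + 10)) for i in range(0, 50, 10)]

# serial version: the "obvious" loop
serial = [checksum(b) for b in blobs]

# the low-hanging fruit: same loop, parallelized with a pool.
# with cpython threads this mostly helps i/o-bound work; swap in
# ProcessPoolExecutor for cpu-bound work.
with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(checksum, blobs))

print(parallel == serial)  # → True
```

note that nothing about the program's design changed - which is exactly why this approach is cheap, and also why it stops paying off past a few cores.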
what you learn using these trivial parallel models will not be ideal in all complex parallel scenarios; effectively applying complex parallel designs requires a much different understanding and approach. these simple models are often detached from, or have only trivial interaction with, the other components of the system. as well, many implementations of these trivial models do not scale to genuinely complex parallel systems - a bad complex parallel design can take as long to execute as the simple model (for illustration: it executes only twice as fast as the single-threaded version while utilizing 8 logical cores). the most common causes are using/creating too many threads and high levels of synchronization interference. in general, this is termed parallel slowdown, and it's quite easy to encounter if you approach all parallel problems as simple problems.
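to make the synchronization-interference case concrete, here is a hedged sketch (python stdlib only; the workload is invented). both versions compute the same answer, but the first takes the lock once per item, so eight workers mostly queue up behind each other - the shape of parallel slowdown - while the second accumulates privately and synchronizes once:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

N_WORKERS, N_ITEMS = 8, 10_000
lock = threading.Lock()

def contended(counter, items):
    # anti-pattern: lock once per item. the workers serialize on the
    # lock, so the "parallel" version can run no faster (or slower)
    # than the single-threaded loop.
    for x in items:
        with lock:
            counter[0] += x

def partitioned(counter, items):
    # fix: accumulate in a thread-private value, synchronize once
    local = sum(items)
    with lock:
        counter[0] += local

items = list(range(N_ITEMS))
chunks = [items[i::N_WORKERS] for i in range(N_WORKERS)]

for fn in (contended, partitioned):
    counter = [0]
    with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
        for chunk in chunks:
            pool.submit(fn, counter, chunk)
    assert counter[0] == sum(items)  # both are correct; only one scales
```

correctness is identical either way, which is part of why the contended version survives code review - the damage only shows up in profiles and scaling curves.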
so, let's say you really should utilize efficient multithreading in your programs (the minority, in today's climate): you'll need to employ the simple model effectively, learn the complex model, and then relearn how you approach program flow and interaction. the complex model is where your program should ultimately end up, because that's where hardware is today, and where the most dominant improvements will be made.
the execution of simple models can be envisioned like a fork, while complex models operate more like an ecosystem. i think an understanding of simple models, including general locking and threading, is or soon will be expected of intermediate developers in domains that use them. understanding complex models is still a bit unusual today (in most domains), but i think the demand will increase quite quickly. as developers, many more of our programs should support these models, and most of us are quite far behind in understanding and implementing these concepts. since logical processor count is one of the most important areas of hardware improvement, the demand for people who understand and can implement complex parallel systems will surely increase.
finally, there are a lot of people who think the solution is just "add parallelization". oftentimes, it's better to make the existing implementation faster - much easier and more straightforward in many cases. many programs in the wild have never been optimized; some people just assumed the unoptimized version would be eclipsed by hardware someday soon. improving the design or algorithms of an existing program is also an important skill when performance matters - throwing more cores at a problem is not necessarily the best or simplest solution.
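a toy example of that trade-off (the problem and names here are invented for illustration): parallelizing the quadratic version of a pair-sum search buys at best a cores-times speedup, while the linear rewrite wins outright as the input grows - no threads required:

```python
data = [i % 100 for i in range(500)]

def has_pair_sum_naive(xs, target):
    # O(n^2): tempting to "just parallelize" the outer loop,
    # for at best an 8x speedup on 8 cores
    return any(xs[i] + xs[j] == target
               for i in range(len(xs)) for j in range(i + 1, len(xs)))

def has_pair_sum_fast(xs, target):
    # O(n): the algorithmic fix beats any core count as n grows
    seen = set()
    for x in xs:
        if target - x in seen:
            return True
        seen.add(x)
    return False

assert has_pair_sum_naive(data, 150) == has_pair_sum_fast(data, 150)
```

only once the algorithm is as good as it reasonably gets is parallelization the right next lever to pull.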
when targeting modern PCs, most of us who need to implement good parallel systems will not need to go beyond multithreading, locking, parallel libraries, a book's worth of reading, and a lot of experience writing and testing programs (which basically means significantly restructuring how you approach writing programs).