I will offer a slightly different take on this. A core library is, in many cases, an excellent idea!
If you have two separate projects, they should be in two separate code repositories. But suppose they depend on common functionality. Take packet processing applications as an example; the common functionality may include:
- Memory allocators
- Address resolution protocol
- AVL tree
- Serialization code for binary protocols
- Dynamic array
- Linux kernel style hash list with singly linked head and doubly linked middle nodes (see the sketch after this list)
- Hash table
- TCP/IP header processing code
- Regular linked list with doubly linked head and doubly linked middle nodes
- Logging library
- Miscellaneous (trust me, you need this for small and trivial stuff, or the number of different modules will grow towards 100!)
- Packet capture library
- Packet I/O interface library
- Packet data structure
- Blocking queue for inter-thread communication
- Random number generators
- Red-black tree
- Some kind of timer implementation
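To make the hash list item concrete, here is a minimal sketch, modeled on the kernel's hlist_head/hlist_node idea; the names and code are my own illustration, not taken from any particular library. The bucket head is a single pointer, which keeps large bucket arrays small, while each node stores the address of the pointer that points to it, so a node can be removed without knowing which bucket it lives in:

```c
#include <stddef.h>

struct hlist_node {
    struct hlist_node *next;    /* next node, or NULL at the end of the chain */
    struct hlist_node **pprev;  /* address of the pointer that points to us */
};

struct hlist_head {
    struct hlist_node *first;   /* singly linked head: one pointer per bucket */
};

/* Insert a node at the front of a bucket. */
static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Remove a node without needing a pointer to its bucket head. */
static inline void hlist_del(struct hlist_node *n)
{
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
}
```

The reason for the asymmetric design is that a hash table has many buckets but each node belongs to only one, so halving the per-bucket overhead matters more than simplifying the node.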
Now, different packet processing applications may need different subsets of these. Should you implement one core library in one source code repository, or should you have 18 different repositories, one per module? Remember that these modules have inter-dependencies; for example, most of them may depend on the miscellaneous module.
I will claim that having one core library is the best approach. It avoids the overhead of maintaining many source code repositories, and it reduces dependency hell: a particular version of the memory allocator may need a particular version of the miscellaneous module. What if the memory allocator version 1.7 depends on miscellaneous 2.5, and the AVL tree version 1.2 depends on miscellaneous 2.6? You may not be able to link miscellaneous 2.5 and 2.6 into your program at the same time.
So, go ahead and implement the following structure:
- Core library repository
- Project #1 repository
- Project #2 repository
- ...
- Project #N repository
I have seen that switching to this kind of structure from a structure like this:
- Project #1 repository
- Project #2 repository
- ...
- Project #N repository
has led to reduced maintenance effort and increased code sharing through mechanisms other than copy-paste.
I have also seen projects using the following structure:
- Memory allocators repository
- Address resolution protocol repository
- AVL tree repository
- Serialization code for binary protocols repository
- Dynamic array repository
- Linux kernel style hash list with singly linked head and doubly linked middle nodes repository
- Hash table repository
- TCP/IP header processing code repository
- Regular linked list with doubly linked head and doubly linked middle nodes repository
- Logging library repository
- Miscellaneous repository (trust me, you need this for small and trivial stuff, or the number of different modules will grow towards 100!)
- Packet capture library repository
- Packet I/O interface library repository
- Packet data structure repository
- Blocking queue for inter-thread communication repository
- Random number generators repository
- Red-black tree repository
- Some kind of timer implementation repository
- Project #1 repository
- Project #2 repository
- ...
- Project #N repository
...and there, dependency hell and the proliferation of repositories have been genuine problems.
Now, should you use an existing open source library instead of writing your own? You need to consider:
- Licensing problems. Sometimes even the mere requirement to credit the author in the shipped documentation may be too much, as 20 libraries will usually have 20 distinct authors.
- Support for different operating systems and their versions
- Dependencies of the particular library
- Size of the particular library: is it too large for the provided functionality? Does it provide too many features?
- Is static linking possible? Is dynamic linking desirable?
- Is the interface of the library what you want? Note that in some cases writing a wrapper to provide the desired interface may be easier than rewriting the entire component yourself (see the sketch after this list).
- ...and many, many other things I have not mentioned in this list
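As an example of the wrapper point above: a thin wrapper can hide a third-party library behind exactly the interface your projects want, so swapping the library out later touches one file. Below is a rough sketch around libpcap; the cap_* names are invented here for illustration, only the pcap_* calls are the real libpcap API, and error reporting is kept minimal:

```c
#include <pcap/pcap.h>

struct cap_handle {
    pcap_t *p;  /* underlying libpcap handle */
};

/* Open a live capture on an interface; returns 0 on success, -1 on failure. */
int cap_open(struct cap_handle *h, const char *ifname)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    h->p = pcap_open_live(ifname, 65535 /* snaplen */, 1 /* promiscuous */,
                          100 /* read timeout, ms */, errbuf);
    return h->p != NULL ? 0 : -1;
}

/* Fetch one packet; returns the captured length, 0 on timeout, -1 on error. */
int cap_next(struct cap_handle *h, const unsigned char **data)
{
    struct pcap_pkthdr *hdr;
    int ret = pcap_next_ex(h->p, &hdr, data);
    if (ret == 1)
        return (int)hdr->caplen;
    return ret == 0 ? 0 : -1;
}

void cap_close(struct cap_handle *h)
{
    pcap_close(h->p);
}
```

The projects only ever see cap_open/cap_next/cap_close, so replacing libpcap with, say, a raw-socket or netmap backend later does not ripple through the code base.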
I usually apply the rule that anything under 1000 lines of code, and not requiring expertise beyond the programmer's, should be implemented on your own. Note: the 1000 lines include unit tests, so I certainly won't advocate writing 1000 lines of code on your own if they require 10,000 additional lines of unit tests. For my packet processing programs, this means the only external components I have used are:
- Everything provided by a standard Linux distribution, because that is so many lines of code that it makes no sense to reimplement Linux. Parts of it would also be beyond my expertise level.
- Bison/flex, because LALR parsing is beyond my expertise level and over 1000 lines of code. I could certainly write a recursive descent parser on my own, but Bison/flex are handy enough that I prefer them.
- Netmap, because it's over 1000 lines and beyond my expertise level
- The skip-list-based timer implementation from DPDK, because it's beyond my expertise level even though it is under 1000 lines of code (I also have alternative timer implementations that do not use skip lists)
Some things I have implemented on my own, because they are simple, include:
- MurMurHash
- SipHash
- Mersenne Twister
...because custom implementations of these can allow heavy inlining, leading to improved performance.
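To illustrate the inlining point: if the hash lives in your own header as a static inline function, the compiler can inline it straight into hot hash table lookups instead of paying a call into an external library. The sketch below shows only the 64-bit finalizer step of MurmurHash3 (a full hash of arbitrary-length keys adds the block-mixing loop around it); flow_bucket is a hypothetical caller, not part of any real API:

```c
#include <stdint.h>
#include <stddef.h>

/* 64-bit finalizer (fmix64) step of MurmurHash3; placed in a header as
 * static inline so callers can inline it completely. */
static inline uint64_t fmix64(uint64_t k)
{
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

/* Hypothetical use: map a 64-bit flow identifier into a power-of-two table. */
static inline size_t flow_bucket(uint64_t flow_id, size_t table_size_pow2)
{
    return (size_t)fmix64(flow_id) & (table_size_pow2 - 1);
}
```

With an external, dynamically linked hash library, the same lookup would cost a function call per packet; inlined like this, the mixing steps fold directly into the lookup loop.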
I don't do cryptography; if I did, I would add some kind of crypto library to the list, because hand-written crypto algorithms may be susceptible to cache timing attacks even when thorough unit testing shows they produce the same results as the official implementations.