I have a system that sometimes needs to use a pretrained machine learning model. That model is about 10 GB on disk, and when loaded it uses about 10 GB of RAM.
Loading it from disk takes a nontrivial amount of time, so in general I don't want to do it too often, and certainly not on every function call against it.
Right now, I am using a lazy-loading(-ish) pattern: the first time a function call needs the model, it is loaded and then stored in a global variable, roughly like the sketch below.
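For concreteness, this is a minimal sketch of the current approach, assuming Python; `_load_model` is a placeholder for however the real 10 GB model actually gets loaded:

```python
import threading

_model = None
_model_lock = threading.Lock()


def _load_model():
    """Placeholder for the real (slow, ~10 GB) model load from disk."""
    ...


def get_model():
    """Return the model, loading it and caching it in a global on first use."""
    global _model
    with _model_lock:
        if _model is None:
            _model = _load_model()
        return _model
```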
This is nice because on some runs of my system the model is never needed, so loading it lazily saves a couple of minutes on those runs.
However, at other times my system runs as a long-lived process (exposed via a web API). In those cases I don't want to be using up 10 GB of RAM the whole time: it might be days (or weeks) between calls to the API methods that rely on that model, then the model might be used 1000 times over an hour, and then sit unused again for days.
There are other programs (and other users) on this machine, so I don't want this one program hogging resources it isn't actually using.
So my idea is that if no API calls have used the model for a certain amount of time, I will trigger some code to unload it (letting the memory be garbage collected), leaving it to be lazy-loaded again the next time it is needed. Something like the sketch below.
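A minimal sketch of what I have in mind, again assuming Python; the timeout value, `_load_model`, and the background "reaper" thread are illustrative choices rather than a settled design:

```python
import gc
import threading
import time

IDLE_TIMEOUT_SECONDS = 30 * 60  # assumption: unload after 30 idle minutes

_model = None
_last_used = 0.0
_lock = threading.Lock()


def _load_model():
    """Placeholder for the real (slow, ~10 GB) model load from disk."""
    ...


def get_model():
    """Lazy-load the model and record when it was last used."""
    global _model, _last_used
    with _lock:
        if _model is None:
            _model = _load_model()
        _last_used = time.monotonic()
        return _model


def _unload_when_idle():
    """Background thread: drop the cached model after a period of no use."""
    global _model
    while True:
        time.sleep(60)
        with _lock:
            idle = time.monotonic() - _last_used
            if _model is not None and idle > IDLE_TIMEOUT_SECONDS:
                _model = None
                gc.collect()  # encourage the interpreter to release the memory


threading.Thread(target=_unload_when_idle, daemon=True).start()
```

The core of the idea is just: record the last-use time on every access, and have something periodically check it and drop the cached reference.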
- Is this a sensible plan?
- Is it a well-known pattern?
- Or is it not required at all, and I should just trust my OS to swap that memory out to disk?
This is related to "Is there a name for the counterpart of the lazy loading pattern?". However, that question seems unclear as to whether it is actually just asking about memory management patterns in general.