
I have a system that sometimes needs to use a pretrained machine learning model. The model is about 10 GB on disk, and uses about 10 GB of RAM when loaded.

Loading it from disk takes a nontrivial amount of time, so in general I don't want to do it too often, and certainly not on every function call against it.

Right now I am using a lazy-loading(-ish) pattern: the first time a function call is made against the model, it is loaded and stored in a global variable.

This is nice because on some runs of my system the model is never needed, so loading it lazily saves a couple of minutes on those runs.

However, at other times my system runs as a long-running process (exposed via a web API). In those cases I don't want to be using up 10 GB of RAM all the time: it might be days (or weeks) between uses of the API methods that rely on that model, then it might be used 1,000 times over an hour, and then sit unused for days again.

There are other programs (and other users) on this system, so I don't want to hog all the resources for this one program while they are not being used.

So my idea is that after a certain amount of time with no API calls using the model, I will trigger some code to unload it (letting the memory be garbage-collected), leaving it to be lazy-loaded again the next time it is needed.
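Concretely, the idea above might be sketched like this in Python (`load_model` is a stand-in for whatever actually reads the model from disk; the class and parameter names are illustrative, not from any particular library):

```python
import gc
import threading
import time

class IdleUnloadingModel:
    """Lazily loads a model on first use; unloads it after an idle timeout.

    `load_model` is a hypothetical stand-in for the real (slow) model load.
    """

    def __init__(self, load_model, idle_seconds=3600.0):
        self._load_model = load_model
        self._idle_seconds = idle_seconds
        self._model = None
        self._last_used = 0.0
        self._lock = threading.Lock()
        self._timer = None

    def get(self):
        with self._lock:
            if self._model is None:
                self._model = self._load_model()  # slow path: load from disk
            self._last_used = time.monotonic()
            self._schedule_unload()
            return self._model

    def _schedule_unload(self):
        # Reset the idle countdown on every access.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self._idle_seconds, self._maybe_unload)
        self._timer.daemon = True
        self._timer.start()

    def _maybe_unload(self):
        with self._lock:
            if time.monotonic() - self._last_used >= self._idle_seconds:
                self._model = None   # drop the only strong reference
                gc.collect()         # encourage prompt release of the big buffers
```

Note the lock: in a web API the timer callback and request handlers run on different threads, so the load/unload decisions need to be serialized.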

  • Is this a sensible plan?
  • Is it a well-known pattern?
  • Or is it not required at all, and I should just trust my OS to swap it out to disk?

This is related to Is there a name for the counterpart of the lazy loading pattern? However, that question seems unclear as to whether it is actually just asking about memory management patterns in general.

  • You appear to be describing a cache. No idea if cache is an "official" pattern or not, but there's no end of information on the net on how to implement caches. – David Arno Feb 15 '18 at 09:32
  • In particular, this is probably handled by the OS virtual memory system. I would suggest measuring the performance of "do nothing, let the OS do its job" before coming up with a caching scheme yourself. – Caleth Feb 15 '18 at 09:36
  • Why is using up 10 GB of RAM all the time a problem? Does it interfere with other processes? – Pieter B Feb 15 '18 at 11:50
  • 10 GB is nothing nowadays. And unless you're able to optimize your disk usage by splitting the data between files, or just using a file cursor to read only what you need (as an RDBMS does), or other disk-related tricks, caching the whole thing in memory is just the simplest approach. Unless for some reason you know your model risks growing toward 100 GB or 1 TB of RAM. – Walfrat Feb 16 '18 at 07:57
  • @Caleth that sounds like an answer, do you want to make it one? (In particular, it sounds like you are saying my 3rd point was roughly correct.) – Frames Catherine White Feb 19 '18 at 03:40

3 Answers


The normal approach to caching is to put things into the cache when you need them, and to remove something when you try to put a new thing into the cache but don't have enough memory to do so.

You then have a variety of ways to specify which of the things in the cache you want to remove first: oldest, least recently used, biggest, etc.

You don't say which language you are using, but I would be surprised if there were not already some caching libraries available for you to use.
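In Python, for instance, the third-party `cachetools` library provides ready-made caches with size- and time-based eviction. A minimal hand-rolled, single-entry sketch of the time-expiry idea (all names here are illustrative, and `loader` stands in for the real slow model load):

```python
import time

class ExpiringRef:
    """One-slot cache: reloads the value when the cached copy is older than ttl_seconds."""

    def __init__(self, loader, ttl_seconds):
        self._loader = loader
        self._ttl = ttl_seconds
        self._value = None
        self._loaded_at = None

    def get(self):
        now = time.monotonic()
        if self._value is None or now - self._loaded_at > self._ttl:
            self._value = self._loader()   # slow path: (re)load
            self._loaded_at = now
        return self._value

    def drop(self):
        """Explicitly release the cached value so the GC can reclaim it."""
        self._value = None
```

Note that a lazily checked expiry like this only releases the old value on the next access; to actually free RAM during a long idle period you would call `drop()` from a background timer.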

Ewan
  • Caching isn't applicable to a single object though, is it? – Frames Catherine White Feb 16 '18 at 12:53
  • what do you mean? – Ewan Feb 16 '18 at 13:05
  • As in my ML model is a single large object. It is basically a huge array, that I need to run matrix multiply my input with. Either I have the whole thing loaded, or I have none of it loaded. (For sake of scoping this question, it is useless to load part of it) – Frames Catherine White Feb 16 '18 at 15:58
  • Wait, are you saying you want to unload it just to make your program use less memory, not because you need the memory for something else? – Ewan Feb 16 '18 at 16:00
  • Yes, correct. My program is not the only program running on the system. I don't want it to hog resources it doesn't need – Frames Catherine White Feb 16 '18 at 16:01
  • Shrug, well, you can just use cache time expiry. Only having one object in the cache is unusual, but it doesn't change the functionality at all. – Ewan Feb 16 '18 at 16:03
  • You would have to be careful not to end up with two instances of the object: one in the cache and a copy the code is working on. – Ewan Feb 16 '18 at 16:04

Many garbage-collected languages, including Java and .NET, have the concept of a WeakReference<T>. This allows the code that loads the data to hold a reference whose target can be garbage collected. As long as something in your code has a strong reference to the data, it remains in memory.

This allows a fairly flexible way to lazy load data on demand, but normal memory pressure will purge the things that aren't being used any longer.

A typical use case would be something along these lines:

get the value from the weak reference
if the value is null:
    load the data
    set the weak reference
return the data

The exact mechanism depends on your language. The point is that you don't have to build your own garbage collection mechanism, and data can be unloaded to make room for new data.

However, there are many alternatives to explore:

  • Flyweight pattern with instance pooling
    • Allows fewer object references in memory since you likely are only referencing a handful of individual records at once
    • Makes sense when object creation is expensive but setting data on an existing object is fast, which can be true of any object when there are enough of them in RAM at once.
  • Caching
  • Manual garbage collection
Berin Loritsch

You don't need to do anything.

As long as your system has sufficient virtual memory (i.e. a page file or swap), the OS will swap memory that hasn't been accessed in a long time out to disk.

You are basically describing implementing this by hand, which on any normal system should not be required.

(This answer expands on the comment by @Caleth)

  • Can you simply memmap the model read-only, or at least most of it? If so, you can avoid almost all need for additional backing-memory. – Deduplicator May 23 '18 at 10:09
  • @Deduplicator for the purposes of this question, which is in the abstract, no. – Frames Catherine White May 23 '18 at 10:26
  • It might even make sense to switch to a different deep-learning network that allows the file/data to be on a SSD or NVMe, and to perform the inference directly streaming from the SSD or NVMe while not using much of the DRAM on the computer. Production (productization and deployment) optimizations are the reasons why there are competitions between deep-learning frameworks, and why people find it advantageous to switch from one to another. If you decided that it is not a premature optimization, then you should have confidence in your decisions. – rwong May 23 '18 at 11:46