To keep it simple: I have a service that stores data in a persistent volume. Pods created on demand by an API use a subset of the stored data to perform an operation and generate new data that also has to be stored. Both the supplied data and the resulting data can be several gigabytes in size.
The containers that perform those operations should be agnostic of (unaware of) the underlying Kubernetes architecture. My question is: what is the best practice for supplying the pods with their data and for storing the resulting data?
For supplying the data I have several ideas:
- Mount the entire persistent volume into the main container. I dislike that the process can access all the data and that the container has to know the exact directory structure.
- Mount the entire persistent volume into an init container and copy only the files the main process needs to a shared emptyDir (see the sketch after this list). This might work, but can I limit the files/directories the init container can read? If that isn't possible, the init container still has to be aware of the exact structure of the storage.
- Create an API that an init container can call to pull the data into a shared emptyDir. This feels a bit old-school to me and sidesteps Kubernetes in an anti-pattern kind of way, but the advantage is that I can centralize the data reading/writing logic.
- Is it possible to mount only a subset of the directories of a persistent volume? (If so, the sketch after this list is roughly what I have in mind.)
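A minimal, untested sketch of what I am imagining for ideas 2 and 4 combined: the init container mounts only one subdirectory of the PV via `subPath` and copies the needed files into an emptyDir, and the main container only ever sees the emptyDir. The PVC name `data-pvc`, the path `jobs/job-123/input` and the image `my-worker` are hypothetical placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker
spec:
  volumes:
    - name: data-pv
      persistentVolumeClaim:
        claimName: data-pvc              # hypothetical PVC backed by the persistent volume
    - name: workdir
      emptyDir: {}
  initContainers:
    - name: fetch-input
      image: busybox
      # copy only the files the main process needs into the shared emptyDir
      command: ["sh", "-c", "cp -r /input/. /work/"]
      volumeMounts:
        - name: data-pv
          mountPath: /input
          subPath: jobs/job-123/input    # only this subdirectory is visible
          readOnly: true
        - name: workdir
          mountPath: /work
  containers:
    - name: main
      image: my-worker                   # hypothetical image, unaware of the PV
      volumeMounts:
        - name: workdir
          mountPath: /work               # sees only its own input data
```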
For storing the data I have these ideas, mirroring the supply options above:
- Let the main container write the data directly to the PV.
- Run a side-car with the entire PV mounted that, after the main container has finished, writes the data to the correct directory (see the sketch after this list).
- Use a side-car to send the data to an API that writes it to the PV.
- Again, mount only a subset of the directories of the PV into the side-car?
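And a rough sketch of the matching side-car for storing the results, again with hypothetical names and paths: the main container drops its output plus a marker file into the shared emptyDir, and the side-car waits for the marker and copies the output into the right subdirectory of the PV via `subPath`. I am deliberately ignoring the Job/restart-policy wiring and the side-car's own termination here:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker
spec:
  volumes:
    - name: data-pv
      persistentVolumeClaim:
        claimName: data-pvc              # hypothetical PVC
    - name: workdir
      emptyDir: {}
  containers:
    - name: main
      image: my-worker                   # hypothetical image
      # run-job is a stand-in for the actual workload; it writes its results
      # to the shared emptyDir and leaves a marker file when it is done
      command: ["sh", "-c", "run-job --out /work/output && touch /work/.done"]
      volumeMounts:
        - name: workdir
          mountPath: /work
    - name: store-output
      image: busybox
      # wait for the marker, then copy the results into the PV
      command: ["sh", "-c", "until [ -f /work/.done ]; do sleep 2; done; cp -r /work/output/. /results/"]
      volumeMounts:
        - name: workdir
          mountPath: /work
        - name: data-pv
          mountPath: /results
          subPath: jobs/job-123/output   # side-car sees only this subdirectory
```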
What do you think are the best/cleanest solutions?