
I have a data streaming platform (NiFi) where I need to transfer tables of data between databases on given schedules. I want to be able to calculate in advance the optimum batch/schedule size I should use.

Assume I have three databases with different numbers of tables, each table with a specific row count and row size. The schedules are 30 minutes, 2 hours and 8 hours. Let's assume I have three maps holding the table row counts and sizes, one per flow.

Is there an algorithm which, over a 24-hour period, will plot the peaks and lows of the number of 10,000-row data packets processed by the platform in any given 30-minute period? I realise that when the 30-minute, 2-hour and 8-hour schedules align I'll have peaks.
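To make the question concrete, here's a rough sketch of the kind of simulation I have in mind. The table maps and row counts are made up for illustration; the idea is to split 24 hours into 48 half-hour slots and count how many 10,000-row packets each flow releases in each slot:

```python
import math

# Hypothetical table maps: table name -> row count (sizes omitted for brevity)
flow_30m = {"orders": 50_000, "sessions": 20_000}
flow_2h = {"customers": 200_000}
flow_8h = {"audit_log": 1_000_000}

PACKET_ROWS = 10_000  # one packet = 10,000 rows


def packets(flow):
    """Total 10,000-row packets released each time this flow fires."""
    return sum(math.ceil(rows / PACKET_ROWS) for rows in flow.values())


# 24 hours split into 48 buckets of 30 minutes each
buckets = []
for slot in range(48):
    total = packets(flow_30m)  # the 30-minute flow fires every slot
    if slot % 4 == 0:          # the 2-hour flow fires every 4th slot
        total += packets(flow_2h)
    if slot % 16 == 0:         # the 8-hour flow fires every 16th slot
        total += packets(flow_8h)
    buckets.append(total)

peak = max(buckets)  # highest packet count in any 30-minute window
```

With these made-up numbers the peaks land at slots 0, 16 and 32, exactly where all three schedules align. Plotting `buckets` against slot index would give the peaks-and-lows chart I'm after.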

The second question is about trying to balance those peaks. Let's assume the 30-minute and 2-hour schedules have priority. Rather than releasing all the tables of the 8-hour schedule in one go, is there an algorithm that will say how long the total data of the 8-hour schedule will take to process, so it can be spread out?
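For this part, the simplest estimate I can think of is total bytes divided by a measured sustained throughput. The row count, row size and throughput below are made-up placeholders:

```python
import math

# Hypothetical figures for the 8-hour flow
rows_8h = 1_000_000
avg_row_bytes = 200
throughput_bytes_per_sec = 500_000  # assumed sustained platform throughput

total_bytes = rows_8h * avg_row_bytes
duration_sec = total_bytes / throughput_bytes_per_sec
duration_min = duration_sec / 60

# Number of 30-minute windows the 8-hour load could be smeared across
windows_needed = math.ceil(duration_sec / (30 * 60))
```

If `windows_needed` is greater than 1, the 8-hour flow's tables could be released in that many chunks across consecutive 30-minute windows instead of all at once, leaving headroom for the priority flows.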

Any advice would be appreciated.

  • Have you tried transferring data continuously so there aren't any peaks? – user253751 Dec 17 '20 at 21:53
  • I guess I'm trying to plan the optimal schedules before we deploy or run any data across the platform. I suspect tracking a running-average value and checking whether the incoming queue is larger will be an approach. I guess someone will have had the same issue, and perhaps the algorithm/model exists in a library which I could reuse. – emeraldjava Dec 18 '20 at 08:08

0 Answers