I have a reporting system that receives time-series data from numerous meters (referred to here as raw_data).
I need to generate several reports based on different combinations of the incoming raw_data,
e.g.:
report1 = raw_data_1 + raw_data34 - raw_data15
report2 = ...
There are also several higher-order reports that depend on other reports, e.g.:
report67 = report3 + report5, etc.
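One way to think about the formulas above is as a dependency graph: raw inputs and lower-order reports feed higher-order ones. Here is a minimal plain-Python sketch of that idea (all formula definitions and values are hypothetical, chosen only to mirror the examples above); each report declares its dependencies, and a memoized resolver computes everything in dependency order so report67 is built automatically once report3 and report5 exist.

```python
# Hypothetical formulas: each report maps to (dependencies, combine function).
# The dependency names may be raw inputs or other reports.
formulas = {
    "report1": (["raw_data_1", "raw_data34", "raw_data15"],
                lambda a, b, c: a + b - c),
    "report3": (["raw_data_1"], lambda a: a * 2),
    "report5": (["raw_data15"], lambda a: a + 1),
    "report67": (["report3", "report5"], lambda a, b: a + b),  # higher-order
}

def evaluate(raw_data, formulas):
    """Resolve all reports in dependency order; raw inputs seed the cache."""
    cache = dict(raw_data)

    def resolve(name):
        if name not in cache:                 # not a raw input, not yet computed
            deps, fn = formulas[name]
            cache[name] = fn(*(resolve(d) for d in deps))
        return cache[name]

    return {name: resolve(name) for name in formulas}

# Hypothetical scalar inputs; in the real system each would be a time series.
raw = {"raw_data_1": 10, "raw_data34": 4, "raw_data15": 3}
print(evaluate(raw, formulas))
```

With real data the scalars would be per-timestamp series (or DataFrame columns), but the resolution order works the same way.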
In these reports I aggregate the data at every time granularity: hour, day, month, and year. These reports currently run once a day.
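The multi-granularity aggregation can be done in a single pass over the data rather than one scan per time unit. A small sketch (plain Python, hypothetical data shape of `(timestamp, value)` pairs) that truncates each timestamp to every granularity at once:

```python
from collections import defaultdict
from datetime import datetime

def aggregate_all_units(readings):
    """Sum readings at hour/day/month/year granularity in one pass."""
    keys = {
        "hour":  lambda t: t.strftime("%Y-%m-%d %H"),
        "day":   lambda t: t.strftime("%Y-%m-%d"),
        "month": lambda t: t.strftime("%Y-%m"),
        "year":  lambda t: t.strftime("%Y"),
    }
    totals = {unit: defaultdict(float) for unit in keys}
    for ts, value in readings:
        for unit, key in keys.items():   # one record updates every granularity
            totals[unit][key(ts)] += value
    return totals

# Hypothetical sample readings.
readings = [
    (datetime(2023, 5, 1, 10, 15), 2.0),
    (datetime(2023, 5, 1, 11, 30), 3.0),
    (datetime(2023, 6, 2, 9, 0), 5.0),
]
print(aggregate_all_units(readings)["month"])
```

In Spark the equivalent would be grouping on truncated timestamp columns, but the single-pass structure is the point here.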
Currently each report is processed one by one in a loop, which is not efficient.
I am looking for a way to combine the operations for all the reports and feed the whole raw_data set to them in a single Spark job.