We are running jobs whose parameters come from a web page and that are executed against large files on a Spark cluster. After processing, we want to display the data back to the user; the results are written to text files using
rdd.saveAsTextFile(path)
We have a session id that serves as a common root for the output folders: each job writes into a randomly named folder that sits under a directory derived from the user's session id.
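For illustration, the path construction looks roughly like the sketch below (the `savePerJob` helper, the `baseDir` parameter, and the folder layout are our own conventions, not anything Spark prescribes):

```scala
import org.apache.spark.rdd.RDD
import java.util.UUID

// Hypothetical layout: <baseDir>/<sessionId>/<random job folder>/part-0000N
def savePerJob(rdd: RDD[String], baseDir: String, sessionId: String): String = {
  val jobFolder = UUID.randomUUID().toString    // random folder per job
  val path = s"$baseDir/$sessionId/$jobFolder"  // session id is the common root
  rdd.saveAsTextFile(path)                      // Spark writes part-files into this folder
  path                                          // this is the "pointer" we need to track
}
```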
What is a good way to keep track of pointers to the different output files and to send pages of results back to the front end? In other words, we want a per-session list of files so we can drive both a monitoring (summary) page and a detail page that shows the contents of each file. A sketch of one approach we are considering follows.
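One direction we have in mind is to enumerate the part-files under the session root with the Hadoop `FileSystem` API and hand that manifest to the front end (sketch only; `listSessionOutputs` and the layout are assumptions from the sketch above):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Build a manifest: job folder name -> paths of its part-files,
// for every job written under <baseDir>/<sessionId>.
def listSessionOutputs(baseDir: String, sessionId: String): Map[String, Seq[String]] = {
  val fs = FileSystem.get(new Configuration())
  val root = new Path(s"$baseDir/$sessionId")
  fs.listStatus(root)
    .filter(_.isDirectory)                       // one subfolder per job
    .map { jobDir =>
      val parts = fs.listStatus(jobDir.getPath)
        .filter(_.getPath.getName.startsWith("part-"))  // skip _SUCCESS etc.
        .map(_.getPath.toString)
        .toSeq
      jobDir.getPath.getName -> parts
    }
    .toMap
}
```

Is scanning the filesystem like this reasonable, or is there a better-established pattern (e.g. recording the paths in a database as jobs complete)?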