We have an activity metrics page where users can select a date period and see other user's aggregated activity (by action) and optionally filter everything by 4 or 5 fields. Actions happen sequentially, but one of the fields is "Tags" and the user may change old action's tags anytime. The data is in a Cassandra 3.7 with the partition key being company_id, action_year, action_week. For each week we have about 70k actions (there are 20 columns with long or int data for each action, each action with the partition key plus action_timestamp and action_key as row key).
PRIMARY KEY ((company_id, action_year, action_week), action_date, action_key)
) WITH CLUSTERING ORDER BY (action_date ASC, action_key ASC)
With a first version we are querying the full actions for a period and doing all the aggregations and filtering in memory. When the user selects a couple of weeks, the whole request takes like 10 or 15 secs. And we are expecting to scale to thousands of users requesting these analytics that should work as near real time analytics.
We thought of moving the filtering to C* using "allow filtering", but the WHERE clause seems very limited. And we are also worried about the frequent updates for the labels.
What other options do we have? We thought of Druid but maybe it's too much for what we need. Spark maybe? Are we not using C* right and we may need to cache full weeks elsewhere?