We provide a real-time analytics dashboard for marketers that helps them invest their $$$ budgets wisely. We aggregate some 8 billion events a day in real time, and our previous solution could not handle this load: the dashboard took forever to load, and Kafka consumer lag was our daily and nightly headache. Product constantly demanded new features and, guess what, we just couldn't deliver them! Moreover, we faced dangerous failures and the risk of losing significant amounts of data, something we obviously couldn't afford.
We started looking for new infrastructure. We tried different databases and technologies, including Cassandra, MongoDB, Redis and Druid, but none of them gave us the solution we needed.
Join me on our journey, and I will show you our current solution, which implements real-time aggregation on MemSQL integrated with batch processing on Apache Spark. The new architecture not only solved our pain points but also allowed us to aggregate 10x the amount of data with much faster response times and to keep up with product demands, and it was cheaper to run in production.
Some databases one might try
- Amazon Redshift