Berlin (June 1, 2016), by Hagen Rother (Lead Architect, LiquidM) – Detailed reporting is a crucial factor in mobile advertising. In this article we describe how LiquidM implemented its reporting system with Druid.

We are in the ad tech business. To simplify reality a bit: every single banner you see was run through an auction. Think eBay, but for “show an advertisement right now, on this specific device, to this specific user”. An auction runs for a maximum of 100ms. Just to put that in perspective, at the speed of light a ping across the Atlantic takes ~80ms. If you want to participate in these auctions, you need responsive software, you need to geo-distribute, and you need to be able to handle an enormous amount of data. LiquidM provides a white-labeled solution for participating in such auctions, so you don’t have to worry about all the heavy lifting.

When I joined LiquidM, we were handling about 3,000 such auction requests per second. Today, we have scaled to more than 500,000 req/s. Every single one of them is at least one event. Events we get money for. Events with 50+ dimensions. And every single one of them, in every single combination, is relevant to our customers in some form of aggregated report. In real time, over a three-month sliding window. We use Kafka and Gobblin to create a raw log in HDFS. That’s great for long-term archival, but we also need snappy reporting, e.g. we need to pace budgets based on actual views and clicks. That’s not a small task. The first generation was MySQL; it didn’t scale on writes. The second attempt was MongoDB, but it didn’t scale on indexing and replication. It also wasn’t reliable, and we never managed to rebuild the current state from the raw log. We needed to be able to fix reporting glitches by reindexing the filtered raw log. ASAP, of course.

If you have read the lambda architecture papers: that is exactly what you need in this scenario. We use Druid to implement it. I would not be surprised if we were the first to run it in production outside of Metamarkets. We bet the company on it, it’s been a wild ride, we are still here, and we are growing fast. I can strongly recommend it. The footnotes on that statement are getting shorter with every release.

DevOps view

It’s all about scaling. For big data, you simply have to scale horizontally: you buy lots of boxes. We are small from the perspective of the Internet, yet even we have more than 100 dedicated servers. That opens a can of worms. You need to think about replication, and it must not source from nodes that are already overloaded. Not fun if you ever encounter that in production.

But usually you don’t run into such issues, because you hit the single-writer anti-pattern first. My aggregated event stream can exceed 10 Gbit/s; how am I possibly going to write all of that with a single writer?

Pre-indexing is also doomed to fail. The cardinality of the search options easily overwhelms your physical writing capability, both in bandwidth and in storage cost. Not fun if you ever encounter that in production.

Finally, if you want to bottleneck on RAM, try copying your data into memory rather than mmap’ing it.

Druid’s architecture is a work of art. It tackles every single one of these challenges and gives you total flexibility, to the point that you can map request latency to ISP pricing for a given events/s × query window. Make scaling a management decision and automate the ordering and setup. It’s fairly straightforward.

Let’s go through the anti-patterns and see how Druid fares:

Horizontal scaling
Druid can scale horizontally both on the intake and on the query side. As always, if you understand the details you can fine-tune, but even for a novice, simply adding CPUs will increase concurrent query performance.
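As a minimal sketch of what that looks like from the outside, the snippet below fires a batch of identical topN queries at a Druid broker concurrently; the broker address, datasource, and column names (“auctions”, “country”, “impressions”) are made up for illustration, not our actual schema. The same script simply finishes faster the more historical CPU sits behind the broker.

```python
# Fire concurrent topN queries at a Druid broker (illustrative names only).
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BROKER = "http://localhost:8082/druid/v2/"  # assumed broker address

QUERY = {
    "queryType": "topN",
    "dataSource": "auctions",
    "dimension": "country",
    "metric": "impressions",
    "threshold": 10,
    "granularity": "all",
    "intervals": ["2016-05-01/2016-06-01"],
    "aggregations": [
        {"type": "longSum", "name": "impressions", "fieldName": "impressions"}
    ],
}

def run_query(_):
    # POST the native query as JSON and return the parsed result.
    req = urllib.request.Request(
        BROKER,
        data=json.dumps(QUERY).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

start = time.time()
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(run_query, range(64)))
print(f"64 queries in {time.time() - start:.2f}s")
```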

Source-Replication
We have a set of active nodes. If queries saturate them, we need to add more servers. If those new servers source their data from the existing nodes, well, those are already overloaded to begin with. Druid avoids this problem by introducing the concept of deep storage: serving the database files themselves and serving queries against them are kept separate. We recommend HDFS as deep storage; there is S3 if you are on AWS. Deep storage is pluggable, so you can adjust Druid to your setup.

Single-Writer
Druid uses the concept of segments. Each segment is essentially an independent database containing a subset of your total data. Queries in Druid are all designed to be aggregatable across such segments. There is a whole range of sharding options in Druid, which allows both an arbitrary number of writers and an arbitrary number of concurrent query processors (called historical nodes). It also allows for sliding windows of data: we offer three months to our customers, and that is just a simple configuration in Druid.
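As a hedged illustration of how such a sliding window can be expressed, Druid retention rules let the coordinator keep only recent segments loaded on the historical nodes while everything older stays in deep storage. The coordinator address and datasource name below are placeholders.

```python
# Keep the last three months loaded, drop older segments from the cluster.
import json
import urllib.request

COORDINATOR = "http://localhost:8081"  # assumed coordinator address
DATASOURCE = "auctions"                # assumed datasource name

rules = [
    # Load segments younger than three months, two replicas on the default tier.
    {"type": "loadByPeriod", "period": "P3M",
     "tieredReplicants": {"_default_tier": 2}},
    # Older segments fall through to this rule and are unloaded from the
    # historicals; they remain in deep storage and can be reloaded later.
    {"type": "dropForever"},
]

req = urllib.request.Request(
    f"{COORDINATOR}/druid/coordinator/v1/rules/{DATASOURCE}",
    data=json.dumps(rules).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
urllib.request.urlopen(req)
```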

Pre-Indexing
Druid is column-oriented. While the cardinality of each column is still something to watch out for, it means that pretty much any query comes back almost instantly. And when you do have to think about cardinality, tricks like HyperLogLog are readily available. You can also decrease the time resolution of your data to reach a database size you are willing to support.
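To sketch both levers at once, here is an example query (datasource and column names are made up) that combines a coarser day granularity with a hyperUnique aggregator, Druid’s HyperLogLog-based approximate distinct count; the latter assumes the column was ingested as a hyperUnique metric.

```python
# Timeseries query with coarse granularity and an approximate distinct count.
import json
import urllib.request

BROKER = "http://localhost:8082/druid/v2/"  # assumed broker address

query = {
    "queryType": "timeseries",
    "dataSource": "auctions",
    "granularity": "day",  # coarser time resolution keeps the result set small
    "intervals": ["2016-03-01/2016-06-01"],
    "aggregations": [
        {"type": "longSum", "name": "impressions", "fieldName": "impressions"},
        # HyperLogLog sketch instead of materializing every distinct user id.
        {"type": "hyperUnique", "name": "unique_users", "fieldName": "unique_users"},
    ],
}

req = urllib.request.Request(
    BROKER,
    data=json.dumps(query).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.loads(urllib.request.urlopen(req).read()))
```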

Mmap’ing (or the lack of it)
Mmap is one of those kernel features every developer should know about. In essence, it uses the kernel’s paging machinery for your data. Given a database size (see above), mmap allows me to balance between RAM, SSD, HDD, and now NVMe. The latter we are just introducing into our cluster; even the very first such node was already noticeable to users (no benchmarking necessary).
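For readers who have not run into mmap before, here is a generic sketch of the difference (this is about the kernel feature, not Druid’s internals): reading a file copies it into process memory whether you need it or not, while mapping it lets the kernel fault pages in only as they are touched.

```python
# Copying vs. mapping a file; the file name is a placeholder.
import mmap

PATH = "segment.bin"  # any large local file

# Copying: the whole file now occupies process RAM, needed or not.
with open(PATH, "rb") as f:
    copied = f.read()
print(f"copied {len(copied)} bytes into memory")

# Mapping: only the pages we actually touch get faulted into memory;
# the rest stays on disk and can be evicted under memory pressure.
with open(PATH, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    header = mapped[:4096]  # touches a single page
    print(f"first page starts with {header[:8]!r}")
    mapped.close()
```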

Summary

While Druid is pretty young compared to other databases, it has already been used in production for several years. The plethora of daemons might be overwhelming at first, but the net result is pretty much infinite scaling. Make your database performance a business decision (granularity × time window × cardinality / hardware); with very little effort you can put a price point on each of these factors and leave the decision to the product people. And that’s pretty much DevOps nirvana.
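As a purely illustrative back-of-envelope version of that business decision: plug in an event rate, an assumed rollup factor, an assumed row size, and the retention window, and you get a rough storage figure to price. Apart from the 500,000 req/s mentioned above, every number here is an assumption.

```python
# Back-of-envelope sizing; all factors except the event rate are made up.
events_per_sec = 500_000      # from the article
seconds_per_day = 86_400
rollup_factor = 20            # assumed raw events per rolled-up row
bytes_per_row = 200           # assumed average segment row size
window_days = 90              # the three-month sliding window

rows_per_day = events_per_sec * seconds_per_day / rollup_factor
storage_bytes = rows_per_day * bytes_per_row * window_days
print(f"~{storage_bytes / 1e12:.1f} TB kept loaded for the window")
```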

About LiquidM:
At LiquidM, we are software and technology experts who are passionate about bringing the world’s best products to the rapidly growing mobile advertising industry. We are focused on building the first and only white-labeled Mobile Advertising Management Platform.
Built from the ground up for mobile, LiquidM’s white-labeled Mobile Advertising Management Platform (MAMP) allows mobile media buyers to optimize their campaigns across the full range of premium to performance advertising. LiquidM’s modular, cloud-based SaaS replaces Build-Your-Own (BYO) or inadequate point solutions with a standardized, open platform that is customizable to individual needs.