Real-time bidding, a form of programmatic ad buying, is becoming an increasingly popular practice in the world of advertising technology. Digital media holding company CPX Interactive needed a better way to ingest and analyze data points during real-time bidding to produce accurate and valuable bids. This is how the company used a distributed, in-memory database to solve that problem.
Industry: Digital Media
Location: New York, NY
Business Opportunity or Challenge Encountered
During real-time bidding, a dynamic auction process, high volumes of data consisting of billions of data points must be ingested and analyzed in real time to produce accurate and valuable bids. Such was the challenge for CPX Interactive (CPXi), a digital media holding company that includes bRealTime, AdReady, Consumed Media and Hatched.at. bRealTime provides programmatic solutions for both supply and demand side partners. AdReady provides programmatic technology and managed media services to brands, agencies and performance marketers. Consumed Media creates content opportunities for today’s evolving digital media ecosystem. Hatched.at serves as an internal incubator to solve tomorrow’s advertising challenges.
CPXi maintains 250 billion rows of real-time and historical data for its real-time bidding operations. The company provides multi-screen messaging that leverages display, social, mobile and video advertising to serve billions of managed impressions daily.
While real-time bidding results in faster bids and more targeted ads, many digital media companies struggle to take full advantage of it because of the sheer volume and velocity of the data involved. For CPXi, collecting and processing billions of impression records every day was a slow and unreliable process. In-stream responses to live ad serving were expensive and server-intensive to implement. Real-time reporting in the user interfaces used to optimize campaigns caused multi-second wait times for users.
CPXi also used an infrastructure consisting of several data stores. This architecture resulted in a lengthy Extract, Transform and Load (ETL) process that could last from 12 to 24 hours. The data aged as it was processed through the different data stores and caused the bidding process to become less accurate as the data lost relevancy. In addition, ETL was computationally expensive, requiring extra machines and storage to run this lengthy process. Ultimately, the system was not adequate for true, real-time bidding. It was unable to provide the performance and scalability required to meet CPXi’s bidding requirements.
How This Business Opportunity or Challenge was Met
To address these needs, CPXi deployed a distributed, in-memory database to achieve the speed, scale and simplicity afforded by its policy-driven data tiering. The solution, MemSQL, combines in-memory performance with the compression and analytical capabilities of a column store to provide a true real-time bidding solution. This new, consolidated tiered storage architecture with a unified Structured Query Language (SQL) interface shortens CPXi’s ETL process and reduces the complexity of its database infrastructure by allowing data to flow easily between the row and column stores.
The solution enables instant access to real-time and historical data through a SQL interface and uses a horizontally scalable, distributed architecture that runs on commodity hardware.
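The tiering idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for a SQL database, and the table names (impressions_recent, impressions_history), the columns and the helper functions are hypothetical. It does not reflect CPXi's schema or how MemSQL implements its row and column stores.

```python
import sqlite3


def create_tables(conn):
    # Two hypothetical tables stand in for the two storage tiers:
    # "recent" plays the role of the row store, "history" the column store.
    conn.executescript("""
        CREATE TABLE impressions_recent  (ts INTEGER, campaign_id INTEGER, bid_price REAL);
        CREATE TABLE impressions_history (ts INTEGER, campaign_id INTEGER, bid_price REAL);
    """)


def ingest(conn, rows):
    # Front-line ingest: new records land in the "recent" tier first.
    with conn:
        conn.executemany("INSERT INTO impressions_recent VALUES (?, ?, ?)", rows)


def tier_older_than(conn, cutoff_ts):
    # Move records older than cutoff_ts into the "history" tier in a single
    # transaction, so a record is never visible in both tiers or in neither.
    with conn:
        conn.execute(
            "INSERT INTO impressions_history "
            "SELECT ts, campaign_id, bid_price FROM impressions_recent WHERE ts < ?",
            (cutoff_ts,),
        )
        cur = conn.execute("DELETE FROM impressions_recent WHERE ts < ?", (cutoff_ts,))
    return cur.rowcount  # number of records moved


def bids_by_campaign(conn):
    # One SQL query spans both tiers, so reporting sees new and old data together.
    return conn.execute("""
        SELECT campaign_id, COUNT(*), AVG(bid_price)
        FROM (
            SELECT campaign_id, bid_price FROM impressions_recent
            UNION ALL
            SELECT campaign_id, bid_price FROM impressions_history
        )
        GROUP BY campaign_id
        ORDER BY campaign_id
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    ingest(conn, [(100, 1, 0.50), (200, 1, 1.50), (300, 2, 2.00)])
    print(tier_older_than(conn, 250))
    print(bids_by_campaign(conn))
```

In this sketch the move between tiers is an explicit function call, whereas the article describes the real system's tiering as policy-driven. The point is only that a single SQL interface over both tiers removes the need for a separate overnight ETL step before data can be queried.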
CPXi currently runs six leaves and one aggregator on Amazon Elastic Compute Cloud (Amazon EC2), eliminating the expensive machines previously required to run costly ETL. With a lean system in the cloud, the company has bolstered the speed and scale of its operation. CPXi ingests billions of records a day and can execute complex algorithms on both brand-new and historical data in real time.
Measurable/Quantifiable and “Soft” Benefits from This Initiative
With the new in-memory implementation, CPXi can leverage all of its data assets and gain a more expansive view of bids as they flow through the system. “We no longer have data that is left unused,” said Gil Resh, Senior Vice President of Product and Technology at CPXi. The in-memory database technology “gives us the speed and capacity to do a front-line ingest into row store and then a real-time transfer of the data to our column store for analysis.”
Because both historical and real-time data can be stored and analyzed, CPXi can build more accurate data models around real-time bidding. The new system reduced CPXi’s ETL processing time from between 12 and 24 hours to just seconds. In addition, switching from Hadoop to MemSQL allowed CPXi to cut 50 percent of its EC2 instances and 50 TB of storage in Amazon Cloud, saving hundreds of thousands of dollars per year.
Switching from Hadoop to in-memory technology enabled the company to eliminate six EC2 boxes and 50 TB of storage in Amazon Cloud, according to Sara Robertson, Vice President of Technology at CPXi. “On top of the cost savings, we’re now able to run aggregation jobs all day long and see results immediately, instead of having to run things overnight and wait until the next day to see if something worked or not,” Robertson explained.
Lessons Learned
Along with real-time data management, the new approach helps CPXi meet its skilled staffing requirements. As CPXi’s team continues to grow, new database administrators can apply their knowledge of SQL without having to learn new languages. This helps decrease the time and cost of onboarding new employees and ensures continuity of knowledge within the team. CPXi is building a data-driven culture by making data accessible, usable and valuable to everyone through a familiar SQL interface.
Source: MemSQL