The team behind TimescaleDB, a database optimized for complex queries on time-series data, says it can solve some pressing challenges in IoT data analysis.
We’ve written frequently about how data integration is one of the most pressing challenges and largest expenses for IoT projects. Complex queries on time-series and geospatial data are also crucial for IoT use cases such as predictive maintenance, energy management, and logistics.
But when it comes to IoT analytics, businesses often turn to in-memory relational database systems or in-memory data grids. Both are designed for speed, though in terms of infrastructure, the RDBMS is designed to scale up (add more resources to an existing node) while the data grid is designed to scale out (add more nodes).
Generally, the scale-out architecture is useful for handling spikes in data-analysis workloads, such as transaction processing or web traffic. But data grids have limited support for SQL, the de facto standard for advanced data analysis.
With IoT, “the volumes are much greater than we ever had,” said Mike Freedman, a professor of computer science at Princeton University. “And so currently you’re basically asked to make a trade-off. You could scale something like a NoSQL database, but then there’s not that many questions that you could ask of your data. Or you could use a traditional relational database. And you can ask a lot of questions but you can’t scale.”
Another challenge that emerges with IoT projects is data silos. Often an IoT use case such as predictive maintenance depends on the ability to compare real-time data with historical data. At a more basic level, consider fitness trackers. There’s tremendous value in fitness tracker data beyond merely measuring steps, heart rate, and other indicators. Software teams might want to uncover bugs; product teams may want to improve the device. That presents a challenge for companies that didn’t think about data integration and access from the start.
“In many cases there are situations that, because of the way people structured their data, even if they want to use it more than they use it now, they are inhibited in their ability to do so,” Freedman said.
So what’s the solution?
At Strata + Hadoop World on March 15, Freedman will speak about TimescaleDB, a database he helped develop that tackles the challenges of scaling SQL and breaking down data silos.
TimescaleDB, an open-source database built on PostgreSQL, presents data as a single virtual “hypertable” while splitting it into chunks that can be spread across servers. The database, which has integrations with Kafka, Tableau, Pentaho, and other tools, scales by partitioning hypertables along two dimensions: a time interval and a “partitioning key” tied to a primary data index, such as device identifiers for sensor data, locations, customers, or users.
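As a rough illustration, creating such a hypertable uses TimescaleDB’s create_hypertable function; the table and column names below are hypothetical, not from the article:

```sql
-- Hypothetical schema: one row per sensor reading
CREATE TABLE sensor_data (
  time        TIMESTAMPTZ      NOT NULL,
  device_id   TEXT             NOT NULL,
  temperature DOUBLE PRECISION,
  error_code  INTEGER
);

-- Turn the plain PostgreSQL table into a hypertable, chunked by time
-- and split across 4 space partitions keyed on device_id
SELECT create_hypertable('sensor_data', 'time', 'device_id', 4);
```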
Standard SQL queries are optimized to touch only the relevant chunks. JOINs can then link the time-series data from the sensors to ordinary relational tables, and an extension adds support for querying geospatial data.
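For instance, a query might join recent sensor readings against a regular PostgreSQL table of device metadata; a sketch, continuing the hypothetical schema above and assuming a devices table:

```sql
-- Enrich recent hypertable readings with relational device metadata
SELECT s.time, s.temperature, d.model, d.firmware_version
FROM sensor_data s
JOIN devices d ON d.device_id = s.device_id
WHERE s.time > now() - interval '5 minutes';
```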
Why scale SQL queries for time-series data? Timescale, the company behind the database (formerly iobeam), offers the example of a smart home, where time-series data from device telemetry can be effectively stored and queried in real time. Questions might include: How many errors were generated in the last 24 hours, by firmware version? How has the latest software update affected battery life? Answering such questions can improve device performance and customer retention.
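The first of those questions might translate into plain SQL along these lines (again with hypothetical table and column names):

```sql
-- Error counts over the last 24 hours, broken down by firmware version
SELECT d.firmware_version, count(*) AS errors
FROM sensor_data s
JOIN devices d ON d.device_id = s.device_id
WHERE s.error_code IS NOT NULL
  AND s.time > now() - interval '24 hours'
GROUP BY d.firmware_version
ORDER BY errors DESC;
```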
In another example, a company used TimescaleDB to track a fleet of trucks and compare their actual locations to where the trucks were supposed to be, which is useful in cases such as traffic congestion management and supply-chain analytics. The data analysis was performed in a single query, using a JOIN between TimescaleDB’s time-series data and standard relational tables.
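The article doesn’t reproduce that query, but with a geospatial extension such as PostGIS, a comparison of actual versus planned truck positions might be sketched like this (the tables and columns are hypothetical):

```sql
-- Hypothetical fleet-tracking query: trucks currently more than
-- 500 meters from their planned route (requires PostGIS)
SELECT t.truck_id,
       max(ST_Distance(t.location::geography, r.route::geography)) AS meters_off_route
FROM truck_locations t               -- hypertable of GPS fixes over time
JOIN planned_routes r ON r.truck_id = t.truck_id
WHERE t.time > now() - interval '15 minutes'
  AND NOT ST_DWithin(t.location::geography, r.route::geography, 500)
GROUP BY t.truck_id;
```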
Because most businesses already have databases and other tools compatible with SQL, companies “don’t need to write special software” or “learn a whole new set of things,” Freedman said. Another big argument for SQL is that “it’s the de facto language for creating data.”
And now, he says, companies can ask complex questions of fast-moving machine data at scale.