By optimizing for GPUs, SQream’s DB update offers up to 15 times faster queries, the company claims.
As more organizations employ graphics processing units (GPUs) as an alternative to x86 processors for crunching massive amounts of data, databases optimized for GPUs are starting to gain traction. The race is now on to see which of these databases can accelerate the processing of massive data sets best.
SQream has released an update to SQream DB that the company claims offers up to 15 times faster queries for multi-table joins and count-distinct operations, as well as nearly twice-as-fast load times. SQream DB 3.0 also includes a new version of an Apache Spark connector, which has been specifically optimized for two-way interconnect using a communication protocol native to SQream DB.
The rate at which joins can be made is critical when using SQL to query massive amounts of data in near real-time, says SQream CEO Ami Gal. Those queries typically need to be processed in milliseconds, which is difficult to achieve and maintain when relying on traditional x86 processors.
Gal also notes organizations today waste a large amount of time and effort simply loading data into a Big Data platform. The latest update to SQream DB makes it possible to load between 3 and 6TB of data per hour, says Gal.
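At the stated ingest rates, load times scale linearly with data volume. A quick back-of-the-envelope calculation (the 3 and 6TB/hour figures come from the article; the 12TB dataset size is an assumed example):

```python
# Back-of-the-envelope load times at the stated ingest rates.
# The 3 and 6 TB/hour figures are from the article; the 12 TB
# dataset size is an assumed example.
dataset_tb = 12

for rate_tb_per_hour in (3, 6):
    hours = dataset_tb / rate_tb_per_hour
    print(f"{rate_tb_per_hour} TB/h -> {hours:.1f} hours to load {dataset_tb} TB")
```

At the slower rate the example dataset loads in 4 hours; at the faster rate, in 2.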
SQream DB 3.0 addresses that issue by making it possible to load and transform volumes of data faster using compressed Parquet files. A SQream DB External Table syntax adds flexibility not found with traditional flat-file bulk loads.
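The article doesn't show the syntax, but the idea is to expose Parquet files as a queryable table and transform the data on the way in. As an illustrative sketch only (the table, column, and path names are invented, and the exact SQream DB external-table DDL may differ from this approximation):

```sql
-- Illustrative sketch only: names and exact DDL are assumptions,
-- not SQream DB's documented syntax.
CREATE EXTERNAL TABLE ext_events (
    event_id   BIGINT,
    user_id    BIGINT,
    event_time DATETIME
)
USING FORMAT PARQUET
WITH PATH '/data/events/*.parquet';

-- Load and transform in one step by selecting from the external table,
-- rather than bulk-loading flat files first and filtering afterward.
INSERT INTO events
SELECT event_id, user_id, event_time
FROM ext_events
WHERE event_time >= '2019-01-01';
```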
“Compression is all about matching the right type of compression to the right data set or type of content,” says Gal.
The latest version of SQream DB also includes a set of Dynamic Workload Management tools, which enable on-the-fly changes to resource allocation. Workloads can now be prioritized in order of urgency and importance rather than on a first-come, first-served basis.
A database optimized for GPUs from vendors such as NVIDIA makes it easier to process large numbers of queries in parallel. Reliance on GPUs to analyze massive amounts of data has increased significantly in public clouds: in addition to storing massive amounts of data inexpensively, renting expensive GPUs by the hour tends to cost less on a public cloud when a Big Data analytics application runs only intermittently. SQream DB can, however, also be deployed in an on-premises environment, which may prove less expensive for longer-running applications. To facilitate usage in both scenarios, SQream DB 3.0 can now be deployed as a Docker container image.
In general, Gal notes, CIOs and chief data officers (CDOs) are starting to collaborate more. Availability of data science skills is limited, and organizations don't want to hire data scientists who command six-figure salaries only to see them spend their time loading data. Many of those tasks can be handled by traditional IT staff.
It may be a while before databases optimized for GPUs are common across the enterprise. But as organizations embrace Big Data to drive digital business processes infused with artificial intelligence (AI) models that require access to massive amounts of data in near real-time, it’s now only a matter of time before databases that make more efficient use of expensive GPU resources become more widely employed.