Big Data Application with TIBCO
TIBCO’s Accelerator for Apache Spark consists of 40 out-of-the-box building blocks to speed implementations of Spark. In this video, Hayden Schultz, global architect for TIBCO, explains how the accelerator can work with big data and machine learning, and how it speeds time to value for business customers.
To learn more, visit www.tibco.com or contact TIBCO here.
Transcript
One way of thinking about this is that while the accelerator is a new thing that we're giving out, the various components are not really new, in the sense that any of our customers could have taken our products and built systems on top of big data clusters, built systems on top of Spark. They do that without us. What this does is let someone who's new to it use these components. We have an example customer: what they're doing with StreamBase and Spark is that their currency trading algorithms are written in StreamBase.
What they want to do is back testing, which means they have a large amount of historical data and they want to know whether their new algorithms perform better than their old algorithms. Now, if they knew what the new currency data was going to be like, that would be easy; it would be no challenge at all. But they don't, so what they do is store everything in their big data cluster: they store all the financial trades that happened, and they store all of the raw FX quotes that happened every day.
To train their new algorithm, evaluate it, and compare it against their old algorithm, they take six months of data and partition it into one-day chunks, which turns out to be something like 136 different eight-hour chunks of data. Then they run the new algorithm, with StreamBase running as a worker inside a Spark cluster.
They take 136 simultaneous partitions of data, run them all in their cluster, and end up training on six months' worth of data in under an hour.
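The partition-and-run-in-parallel pattern described here can be sketched in plain Python. This is only a conceptual illustration, not TIBCO's implementation: `day_chunks` and `backtest_one_day` are hypothetical stand-ins for the real per-day data partitions and the StreamBase algorithm that Spark would fan out across the cluster.

```python
from concurrent.futures import ProcessPoolExecutor
from datetime import date, timedelta

def day_chunks(start, end):
    """Split the half-open range [start, end) into one-day partitions,
    analogous to the customer's per-day back-testing chunks."""
    days = []
    d = start
    while d < end:
        days.append(d)
        d += timedelta(days=1)
    return days

def backtest_one_day(day):
    """Hypothetical stand-in for replaying one day of historical quotes
    through the trading algorithm; returns a (day, pnl) pair."""
    return (day, 0.0)  # a real version would compute P&L for that day

if __name__ == "__main__":
    # Roughly six months of data, one chunk per day.
    chunks = day_chunks(date(2016, 1, 1), date(2016, 7, 1))
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(backtest_one_day, chunks))
    print(f"back-tested {len(results)} day-sized partitions")
```

On a real deployment, Spark (rather than a local process pool) schedules these partitions across the cluster, which is what compresses days of sequential back testing into under an hour.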
Adrian: Interesting.
Hayden: When they started doing this, it used to take them many, many hours; they would typically think of it in days. Now it's under an hour. They can do a lot more experimentation, they can evaluate different changes, and they can get new versions of their algorithm to investors. The accelerator is a sample solution; it assumes you are using the TIBCO products, but you can swap other components in and out if you want.
What we start with is StreamBase for capturing the data. One typical example is that you have a Kafka bus as the initial source of the data, or a JMS bus, or just a socket or web services. There's a large number of adapters we use to connect to the data source. Those are very simple StreamBase applications that connect to the data source and then write it directly into HDFS, or possibly write it out using Flume. Then it's in the big data system. Once you have a large portfolio of data to look at, you can run your data analytics on it. The data scientist then comes in and, in the TIBCO stack, they would use Spotfire.
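A capture adapter of this kind boils down to draining a message source and appending batches to distributed storage. The sketch below is a toy, not a StreamBase application: the in-memory `messages` iterable stands in for a Kafka or JMS bus, and the local sink file stands in for HDFS.

```python
import json
from pathlib import Path

def capture(messages, sink_path, batch_size=100):
    """Toy capture adapter: drain a message source (standing in for a
    Kafka/JMS bus) and append JSON records in batches to a sink file
    (standing in for an HDFS path). Returns the number of records written."""
    batch = []
    written = 0
    with Path(sink_path).open("a") as out:
        for msg in messages:
            batch.append(msg)
            if len(batch) >= batch_size:
                out.writelines(json.dumps(m) + "\n" for m in batch)
                written += len(batch)
                batch.clear()
        if batch:  # flush any remaining tail records
            out.writelines(json.dumps(m) + "\n" for m in batch)
            written += len(batch)
    return written
```

Batching the writes matters because HDFS favors large sequential appends over many tiny ones; that is the kind of plumbing the pre-built adapters handle for you.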
They'll look at the data, connecting from Spotfire to the Spark system using a Spark connector that is newly certified by Databricks. They can run Spark SQL commands, and they can run R commands directly from Spotfire. They can even use R, or the TIBCO Enterprise Runtime for R, to analyze the data as well. In the case of the Accelerator for Apache Spark, what we do is have Spotfire prepare the data and understand the relationships, and once that's done, it uses Sparkling Water, an H2O layer on top of Apache Spark.
That trains a machine learning model. The machine learning model is then saved; it's trained and saved inside the big data cluster, basically serialized out to HDFS.
Adrian: If I were to sum it up, and if I were talking to a potential customer of yours, it sounds like the important thing is that with the accelerator, what you're accelerating is the time to value. It's an overused term, but it's the time to deliver something that's usable, because you have this template.
Hayden: Exactly. If you look at why new projects fail at these large customers, at these large companies, a lot of projects fail because it takes too much investment on the company's part before they start showing some return. The idea here is, "We're going to give you something where the front end is already built and all of the plumbing is done; all you do is work on the business logic."
Adrian: For someone that hasn't been working with you, or someone that hasn't been looking at this, how can they get started?
Hayden: They download it from the TIBCO Accelerator download site. They get the full source to everything. It's a totally open product; do whatever you want with it.
Resources
- TIBCO Blog: Learn more about TIBCO Accelerators for Apache Spark.
- Use Case: TIBCO technologies and big data systems like Hadoop and Spark can be combined to act in real time when significant patterns are detected.
Connect with TIBCO
More about TIBCO
- TIBCO Software takes businesses to their digital destinations by interconnecting everything in real time and providing augmented intelligence for everyone, from business users to data scientists. This combination delivers faster answers, better decisions, and smarter actions.
- The TIBCO StreamBase® Complex Event Processing (CEP) platform is a high-performance system for rapidly building applications that analyze and act on real-time streaming data. Using StreamBase CEP, you can rapidly build real-time systems and deploy them at a fraction of the cost and risk of other alternatives.