Big Data Application with TIBCO
TIBCO’s Accelerator for Apache Spark consists of 40 out-of-the-box building blocks to speed implementations of Spark. In this video, Hayden Schultz, global architect for TIBCO, explains how the accelerator can work with big data and machine learning, and how it speeds time to value for business customers.
To learn more, visit www.tibco.com or contact TIBCO here.
Transcript
One way of thinking about this is that while the accelerator is a new thing that we're giving out, the various components are not really new, in the sense that any of our customers could have taken our products and built systems on top of big data clusters, built systems on top of Spark. They do that without us. What this does is let someone who's new to it use these components. We have an example customer: what they're doing with StreamBase and Spark is that their currency trading algorithms are written in StreamBase.
What they want to do is back testing, which means they have a large amount of historical data and they want to know whether their new algorithms perform better than their old algorithms. Now, if they knew what the new currency data was going to be like, that would be easy; it would be no challenge at all. But they don't, so what they do is store everything in their big data cluster: they store all the financial trades that happened, and they store all of the raw FX quotes that happened every day.
To train their new algorithm, evaluate it, and compare it against their old algorithm, they take six months of data and partition it into one-day chunks, which turns out to be something like 136 different eight-hour chunks of data. Then they run the new algorithm, with StreamBase running as a worker inside a Spark cluster.
They take 136 simultaneous partitions of data, run them all in their cluster, and end up training on six months' worth of data in under an hour.
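The partition-and-run-in-parallel pattern described here can be sketched in plain Python. This is only a conceptual illustration, not TIBCO's implementation: `day_chunks` and `backtest_one_day` are hypothetical stand-ins for the real per-day data partitions and the StreamBase algorithm that Spark would fan out across the cluster.

```python
from concurrent.futures import ProcessPoolExecutor
from datetime import date, timedelta

def day_chunks(start, end):
    """Split the half-open range [start, end) into one-day partitions,
    analogous to the customer's per-day back-testing chunks."""
    days = []
    d = start
    while d < end:
        days.append(d)
        d += timedelta(days=1)
    return days

def backtest_one_day(day):
    """Hypothetical stand-in for replaying one day of historical quotes
    through the trading algorithm; returns a (day, pnl) pair."""
    return (day, 0.0)  # a real version would compute P&L for that day

if __name__ == "__main__":
    # Roughly six months of data, one chunk per day.
    chunks = day_chunks(date(2016, 1, 1), date(2016, 7, 1))
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(backtest_one_day, chunks))
    print(f"back-tested {len(results)} day-sized partitions")
```

On a real deployment, Spark (rather than a local process pool) schedules these partitions across the cluster, which is what compresses days of sequential back testing into under an hour.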
Adrian: Interesting.
Hayden: When they started doing this, it used to take them many, many hours; they would typically think of it in days. Now it's under an hour. They can do a lot more experimentation, they can evaluate different changes, and they can get new versions of their algorithm to investors. The accelerator is a sample solution; it assumes you are using the TIBCO products, but you can swap other components in and out if you want.
What we start with is StreamBase for capturing the data. One typical example is that you have a Kafka bus as the initial source of the data, or a JMS bus, or just a socket or web services. There's a large number of adapters we use to connect to the data source. Those are very simple StreamBase applications that connect to the data source and then write it directly into HDFS, or possibly write it out using Flume. Then it's in the big data system. Once you have a large portfolio of data to look at, you can run your data analytics on it. The data scientist then comes in and, in the TIBCO stack, they would use Spotfire.
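A capture adapter of this kind boils down to draining a message source and appending batches to distributed storage. The sketch below is a toy, not a StreamBase application: the in-memory `messages` iterable stands in for a Kafka or JMS bus, and the local sink file stands in for HDFS.

```python
import json
from pathlib import Path

def capture(messages, sink_path, batch_size=100):
    """Toy capture adapter: drain a message source (standing in for a
    Kafka/JMS bus) and append JSON records in batches to a sink file
    (standing in for an HDFS path). Returns the number of records written."""
    batch = []
    written = 0
    with Path(sink_path).open("a") as out:
        for msg in messages:
            batch.append(msg)
            if len(batch) >= batch_size:
                out.writelines(json.dumps(m) + "\n" for m in batch)
                written += len(batch)
                batch.clear()
        if batch:  # flush any remaining tail records
            out.writelines(json.dumps(m) + "\n" for m in batch)
            written += len(batch)
    return written
```

Batching the writes matters because HDFS favors large sequential appends over many tiny ones; that is the kind of plumbing the pre-built adapters handle for you.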
They'll look at the data, connecting from Spotfire to the Spark system using a Spark connector that is newly certified by Databricks. They can run Spark SQL commands, and they can run R commands directly from Spotfire. They can even use R, or the TIBCO Enterprise Runtime for R, to analyze the data as well. In the case of the Accelerator for Apache Spark, what we do is have Spotfire prepare the data and understand the relationships, and once that's done, it uses Sparkling Water, an H2O layer on top of Apache Spark.
That trains a machine learning model. The machine learning model is then saved; it's trained and saved inside the big data cluster, basically serialized out to HDFS.
Adrian: If I were to sum it up, and if I were talking to a potential customer of yours, it sounds like the important thing is that with the accelerator, what you're accelerating is the time to value. It's an overused term, but it's the time to deliver something that's usable, because you have this template.
Hayden: Exactly. If you look at why new projects fail at these large customers, at these large companies, a lot of projects fail because it takes too much investment on the company's part before they start showing some return. The idea here is, "We're going to give you something where the front end is already built and all of the plumbing is done; all you do is work on the business logic."
Adrian: For someone that hasn't been working with you, or someone that hasn't been looking at this, how can they get started?
Hayden: They download it from the TIBCO Accelerator download site. They get the full source to everything. It's a totally open product; do whatever you want with it.
Resources
- TIBCO Blog: Learn more about TIBCO Accelerators for Apache Spark.
- Use Case: TIBCO technologies and big data systems like Hadoop and Spark can be combined to act in real time when significant patterns are detected.
Connect with TIBCO
More about TIBCO
- TIBCO Software takes businesses to their digital destinations by interconnecting everything in real time and providing augmented intelligence for everyone, from business users to data scientists. This combination delivers faster answers, better decisions, and smarter actions.
- The TIBCO StreamBase® Complex Event Processing (CEP) platform is a high-performance system for rapidly building applications that analyze and act on real-time streaming data. Using StreamBase CEP, you can rapidly build real-time systems and deploy them at a fraction of the cost and risk of other alternatives.