For complex stock market analytics, a U.S. financial regulator used Apache open-source software and Cloudera on the Amazon Web Services (AWS) cloud. The result: faster analysis and millions in savings.
Name of Organization: Financial Industry Regulatory Authority
Industry: Government
Location: Washington, DC USA
Business Opportunity or Challenge Encountered:
The Financial Industry Regulatory Authority (FINRA) is an independent regulator authorized by Congress to promote investor protection and market integrity. The agency oversees every brokerage firm and broker doing business with the U.S. public and monitors trading on the U.S. stock markets.
Stock market analytics are crucial to FINRA, which monitors approximately 30 billion market events every day, a figure that spikes to 75 billion on busy market days. “Our data fluctuates constantly,” says Matt Cardillo, senior director at FINRA. Market regulation “is a big data problem and it always has been.”
The agency oversees more than 4,000 brokerage firms and more than 370,000 registered securities representatives. Every day, it performs complex stock market analytics to detect fraud and noncompliance, sifting through buy orders, order routings, cancellations, trades (more than 13.9 billion were recorded on just one day in August), and quotes, all of which flow into a repository holding more than five petabytes of data. The agency seeks to reconstruct trades from broker-dealers and exchanges by integrating datasets from multiple feeds through scanning and pattern matching.
FINRA receives data feeds from securities firms and various exchanges. The first step is to validate and integrate that data. The data must then be carefully managed and versioned so that it is available for stock market analytics, including ad hoc access by analysts and analysis based on an extensive library of surveillance patterns.
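As a rough illustration of the validation step, the Python sketch below splits incoming feed records into valid and rejected sets. The field names (`event_id`, `order_id`, `event_type`, `timestamp`) and the list of known event types are hypothetical assumptions, not FINRA's actual schema or rules; at FINRA's scale this work runs on Hadoop rather than in a single process.

```python
from dataclasses import dataclass

# Hypothetical schema for illustration only; not FINRA's actual record layout.
REQUIRED_FIELDS = ("event_id", "order_id", "event_type", "timestamp")
KNOWN_EVENT_TYPES = {"new", "route", "cancel", "execute", "quote"}


@dataclass
class Rejection:
    record: dict
    reason: str


def validate_feed(records):
    """Split feed records into (valid, rejected) lists."""
    valid, rejected = [], []
    for rec in records:
        # A field counts as missing if absent or empty; 0 is a legitimate value.
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) in (None, "")]
        if missing:
            rejected.append(Rejection(rec, "missing " + ", ".join(missing)))
        elif rec["event_type"] not in KNOWN_EVENT_TYPES:
            rejected.append(Rejection(rec, f"unknown event_type {rec['event_type']!r}"))
        else:
            valid.append(rec)
    return valid, rejected


feed = [
    {"event_id": "e1", "order_id": "A", "event_type": "new", "timestamp": 1},
    {"event_id": "e2", "order_id": "A", "event_type": "route"},
    {"event_id": "e3", "order_id": "A", "event_type": "bogus", "timestamp": 2},
]
valid, rejected = validate_feed(feed)
print(len(valid), [r.reason for r in rejected])
```

Keeping each rejected record together with a reason, rather than silently dropping it, is a design choice of this sketch: it leaves a trail that could be used to follow up with the submitting firm.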
When a stock spikes before a deal is announced, FINRA investigators need to react quickly. “We analyze market data, we do that with our surveillance program, which essentially are algorithms that are combing through the data, and looking for things that are out of range of what we would consider normal,” Cardillo explains. “We have exceptions and alerts that kick out. We have analytics on top of that that help confirm to our users that there is a problem.”
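FINRA's surveillance algorithms are not public, but the general idea Cardillo describes, flagging values "out of range of what we would consider normal," can be sketched with a simple trailing z-score test. The monitored quantity (price), the 20-observation window, and the three-standard-deviation threshold are all illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Alert:
    index: int      # position of the flagged observation in the series
    price: float
    z_score: float


def out_of_range_alerts(prices, window=20, threshold=3.0):
    """Flag prices more than `threshold` sample standard deviations away
    from the mean of the preceding `window` prices (illustrative rule)."""
    history = deque(maxlen=window)
    alerts = []
    for i, price in enumerate(prices):
        # Only score once a full trailing window of history is available.
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0:
                z = (price - mu) / sigma
                if abs(z) > threshold:
                    alerts.append(Alert(i, price, z))
        history.append(price)
    return alerts


# A quiet series followed by a sudden jump, like a spike before a deal announcement.
prices = [10.0, 10.1] * 10 + [15.0]
for alert in out_of_range_alerts(prices):
    print(alert)
```

A real surveillance pattern would combine many such signals and, as Cardillo notes, layer further analytics on top of the alerts to confirm that a genuine problem exists.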
While most of FINRA’s analyses are lookbacks over past weeks, months, or quarters, the agency also seeks greater real-time awareness. “Things are moving toward a real-time scenario,” Cardillo says. The time frame for lookbacks has been dramatically compressed, from 90 days to 90 seconds.
The agency needed to redevelop its platform to give analysts more confidence in what they were looking at and to reduce the manual effort required for analysis. It also wanted users to have direct access to data, rather than having that access intermediated by a technology group.
How This Business Opportunity or Challenge Was Met:
Two years ago, FINRA began an effort to move the agency’s platform from an in-house data center to a cloud-based environment that could support rapid access to and viewing of key data. The agency also sought self-service tools, so that business users could run analytics without having to route requests through the agency’s IT departments.
FINRA adopted Amazon Web Services (AWS) for its new infrastructure, incorporated Cloudera, and used a host of open-source technologies to reduce costs. In a major upgrade to the Order Audit Trail System (OATS), FINRA uses Apache Hadoop to link 30 billion events per day into a holistic picture of market activity. This vast volume of data is made available to users through the new FastOLA application, which uses Hadoop and HBase to answer in minutes user queries that formerly took hours.
Apache Hadoop is integral to data validation and integration as well as to analytics. Hadoop integrates validated data and creates an order lifecycle graph for each new order in the National Market System (NMS) and Over-the-Counter (OTC) equity markets. After these graphs are first created, the database is progressively updated as new and revised data arrives from firms and exchanges.
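The idea of an order lifecycle graph that is progressively updated as new and revised data arrives can be sketched in memory as follows. The `OrderEvent` fields, the use of a `version` number to mark a firm's revised report, and the `routed_to` link between a parent order and the order it was routed to are hypothetical simplifications, not FINRA's actual data model; FINRA builds these graphs with Hadoop across billions of events.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrderEvent:
    event_id: str                    # hypothetical unique ID for each reported event
    order_id: str                    # order this event belongs to
    event_type: str                  # e.g. "new", "route", "cancel", "execute"
    version: int = 1                 # a higher version is a revised report
    routed_to: Optional[str] = None  # child order ID for "route" events


class LifecycleGraph:
    """Order events linked into lifecycles; revisions replace older versions."""

    def __init__(self):
        self._events = {}  # event_id -> latest version of that event

    def apply(self, event):
        """Insert a new event, or replace it only if this is a newer revision."""
        current = self._events.get(event.event_id)
        if current is None or event.version > current.version:
            self._events[event.event_id] = event

    def lifecycle(self, root_order_id):
        """Return all events reachable from the root order by following routes."""
        by_order = defaultdict(list)
        for event in self._events.values():
            by_order[event.order_id].append(event)
        seen, stack, result = set(), [root_order_id], []
        while stack:
            order_id = stack.pop()
            if order_id in seen:
                continue
            seen.add(order_id)
            # Lexicographic event_id order keeps output deterministic here;
            # a real system would order events by timestamp.
            for event in sorted(by_order[order_id], key=lambda e: e.event_id):
                result.append(event)
                if event.event_type == "route" and event.routed_to:
                    stack.append(event.routed_to)
        return result


graph = LifecycleGraph()
graph.apply(OrderEvent("e1", "A", "new"))
graph.apply(OrderEvent("e2", "A", "route", routed_to="B"))
graph.apply(OrderEvent("e3", "B", "new"))
graph.apply(OrderEvent("e4", "B", "execute"))
graph.apply(OrderEvent("e4", "B", "cancel", version=2))  # firm revises event e4
print([(e.order_id, e.event_type) for e in graph.lifecycle("A")])
```

Keying each event by ID and keeping only its highest version makes updates idempotent: replaying an older report leaves the graph unchanged.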
Hadoop, along with Hive, is also the foundation for executing FINRA’s large library of surveillance patterns. These patterns analyze trade sequences and relationships over specific time periods to identify instances of non-compliance or market manipulation. Finally, HBase supports extremely rapid access to the order lifecycle graph database for the most common analyst queries.
Measurable/Quantifiable and “Soft” Benefits From This Initiative:
Using HBase to access the order lifecycle graph database has cut query response times by orders of magnitude: queries that took hours now take seconds. This makes for a far more interactive experience, letting analysts iterate rapidly and converge on answers that would have been prohibitively slow to reach with the prior system.
Redeveloping its market regulation systems on a Big Data/cloud platform is projected to bring a net cost benefit of $10 million to $20 million annually. Part of this benefit comes from lower operational infrastructure costs, because the specialized data appliances previously in use carried significant hardware and ongoing support requirements. The new platform is also increasing operational efficiency.
“As we make this move to the cloud, we will be a lot more agile in terms of responding to changes in the market,” says Cardillo. “We have our surveillance programs that kick out these alerts and exceptions and we want to make it as seamless as possible, perform the analytics on top of these alerts. Gone will be the days when there will be a bunch of intermediate technologists they have to call.”
(Sources: FINRA, Cloudera)