
Big Data App Store: Democratizing Data-Science with On-Demand Analytics

Web 2.0, the Internet as we know it today, was not built to handle the growing volumes of data and increasingly large data sets known as “Big Data.” Further complicating matters, many data sets are either unstructured (80% by one estimate) or stored in distributed resources with different schemas, formats, and structures. As a result, we are not realizing all of the advantages possible with Big Data and high-performance computing. Data-driven enterprises, from academia to government to private industry, need tools that seamlessly integrate intra- and inter-enterprise data sets and extract for analysis the “nuggets” of data (patterns and behaviors) that previously went undiscovered because of format or location.

Researchers at ORNL have created a suite of what they call “knowledge catalytic” tools to seamlessly integrate intra- and inter-enterprise data sources, extract these nuggets, and create actionable information for decision making. The tools combine techniques of the emerging Web 3.0, such as semantic interfaces, with powerful models and algorithms for manipulating data and with high-performance parallel computing. Together these speed up both the pattern discovery (feature extraction) process and the pattern recognition (learning) process across disparate data sources. The tools can be viewed as an app store of algorithms that can be combined to create virtual databases for data-driven discoveries and decisions.

Based on algorithms that scale to multiple platforms, the various tools include a fuzzy logic toolkit for matching schema attributes; graph-processing tools (“crosswalk keys”) for field-level linkages across databases and data sources; tools for transforming unstructured data into structured forms that can then be analyzed; tools for automated clustering of data elements across different data sources; and models and algorithms for sensing connections, anomalies, and emergent behaviors. Two key elements are (1) visual display capabilities for analyzing various potential links between data/data sets and (2) a knowledge recorder tool that captures search information and analyst comments, which ultimately will save enterprise resources and time by avoiding analytic silos and re-searching/rediscovering what other analysts have already learned from data.
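To make the idea of matching schema attributes across data sources concrete, here is a minimal sketch in Python. It is not the ORNL toolkit: it swaps in plain string similarity from the standard library's `difflib` in place of fuzzy logic, and the field names, the `normalize` and `match_attributes` helpers, and the 0.7 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a field name and drop separators, so 'Cust_ID' becomes 'custid'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 string-similarity score between two field names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_attributes(left, right, threshold=0.7):
    """Map each field in `left` to its most similar field in `right`.

    A pair is kept only if its score reaches `threshold` (an assumed,
    illustrative cutoff); fields with no good candidate are left out.
    """
    matches = {}
    for a in left:
        best = max(right, key=lambda b: similarity(a, b))
        score = similarity(a, best)
        if score >= threshold:
            matches[a] = (best, round(score, 2))
    return matches


# Hypothetical schemas from two unrelated systems.
crm_fields = ["Cust_ID", "FirstName", "zip"]
billing_fields = ["customer_id", "first_name", "postal_code", "zip_code"]

matches = match_attributes(crm_fields, billing_fields)
for source_field, (target_field, score) in matches.items():
    print(f"{source_field} -> {target_field} (score {score})")
```

Name similarity alone is a weak signal: a field such as "zip" may fall below the cutoff even when a plausible counterpart exists, which is one reason a real matcher would also weigh data types and sample values.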

In the era of Big Data, not everyone can afford an operational information technology department or expensive data scientists. The ORNL approach targets customers who have data and cannot afford Big Data systems but could benefit from the insights those systems provide. The app-store model lets people rent computational tools and analytics on demand, paying only for what they need.

Computational Sciences and Engineering Division
Oak Ridge National Laboratory
Technology Commercialization Manager, Technology Commercialization
Oak Ridge National Laboratory
Phone: 865.241.3808