Data analysis and data mining are a subset of business intelligence (BI), which also incorporates data warehousing, database management systems, and Online Analytical Processing (OLAP).
The technologies are frequently used in customer relationship management (CRM) to analyze patterns and query customer databases. Large quantities of data are searched and analyzed to discover useful patterns or relationships, which are then used to predict future behavior.
Some estimates indicate that the amount of new information doubles every three years. To deal with the mountains of data, the information is stored in a repository of data gathered from various sources, including corporate databases, summarized information from internal systems, and data from external sources. Properly designed and implemented, and regularly updated, these repositories, called data warehouses, allow managers at all levels to extract and examine information about their company, such as its products, operations, and customers' buying habits.
With a central repository to hold the massive amounts of data, organizations need tools that can help them extract the most useful information from it. A data warehouse can bring together data in a single format, supplemented by metadata, through a set of input mechanisms known as extraction, transformation, and loading (ETL) tools. These and other BI tools enable organizations to make informed business decisions quickly, based on sound analysis of the data.
Analysis of the data includes simple query and reporting functions, statistical analysis, more complex multidimensional analysis, and data mining (also known as knowledge discovery in databases, or KDD). Online analytical processing (OLAP) is most often associated with multidimensional analysis, which requires powerful data manipulation and computational capabilities.
With more data being produced each year, BI has become a hot topic. The increasing focus on BI has led a number of large organizations to increase their presence in the space, resulting in consolidation around some of the largest software vendors in the world. Among the notable purchases in the BI market were Oracle's purchase of Hyperion Solutions; Open Text's acquisition of Hummingbird; IBM's purchase of Cognos; and SAP's acquisition of Business Objects.
Definition
The purpose of gathering corporate information together in a single structure, typically an organization's data warehouse, is to facilitate analysis, so that information collected from a variety of business activities may be used to enhance the understanding of underlying trends in the business. Analysis of the data can include simple query and reporting functions, statistical analysis, more complex multidimensional analysis, and data mining. OLAP, one of the fastest growing areas, is most often associated with multidimensional analysis. According to The BI Verdict (formerly The OLAP Report), an OLAP application is characterized by "fast analysis of shared multidimensional information."
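Multidimensional analysis can be illustrated with a minimal Python sketch. It is not how any particular OLAP product works; the fact rows, the dimension names (region, product, quarter), and the helper functions roll_up and slice_cube are all hypothetical, chosen only to show a measure being summarized along different dimensions.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales).
FACTS = [
    ("East", "Widget", "Q1", 100),
    ("East", "Widget", "Q2", 150),
    ("East", "Gadget", "Q1", 80),
    ("West", "Widget", "Q1", 120),
    ("West", "Gadget", "Q2", 60),
]
# Assumed dimension order, matching the first three columns of each row.
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, keep):
    """Sum the sales measure over every dimension not listed in `keep`."""
    indexes = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in indexes)
        totals[key] += row[-1]  # the last column is the measure
    return dict(totals)


def slice_cube(facts, dimension, value):
    """Keep only the rows where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]


by_region = roll_up(FACTS, ["region"])
by_region_quarter = roll_up(FACTS, ["region", "quarter"])
q1_by_product = roll_up(slice_cube(FACTS, "quarter", "Q1"), ["product"])
```

Here by_region answers "total sales per region", by_region_quarter drills down one level, and q1_by_product first slices the data to a single quarter before summarizing. Real OLAP servers typically precompute or index such aggregates rather than scanning rows on each request.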
Data warehouses are usually separate from production systems, as the production data is added to the data warehouse at intervals that vary according to business needs and system constraints. Raw production data must be cleaned and qualified before it is loaded, so the warehouse data often differs from the operational data from which it was extracted. The cleaning process may actually change field names and data characters in the data record to make the revised record compatible with the warehouse data rule set. This is the province of ETL.
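The cleaning step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any specific ETL tool: the field names in FIELD_MAP, the date format, and the functions transform and etl are all hypothetical.

```python
from datetime import datetime

# Hypothetical mapping from production field names to warehouse field names.
FIELD_MAP = {"cust_nm": "customer_name", "ord_dt": "order_date", "amt": "amount"}


def transform(record):
    """Rename fields and normalize values to fit the (assumed) warehouse rules."""
    cleaned = {}
    for source_name, value in record.items():
        target_name = FIELD_MAP.get(source_name)
        if target_name is None:
            continue  # drop fields the warehouse does not model
        cleaned[target_name] = value.strip() if isinstance(value, str) else value
    # Normalize the date to ISO 8601 and the amount to a float.
    parsed = datetime.strptime(cleaned["order_date"], "%m/%d/%Y")
    cleaned["order_date"] = parsed.date().isoformat()
    cleaned["amount"] = float(cleaned["amount"])
    return cleaned


def etl(source_rows):
    """Extract rows, transform each one, and load the valid ones into a list."""
    warehouse, rejected = [], []
    for row in source_rows:
        try:
            warehouse.append(transform(row))
        except (KeyError, ValueError):
            rejected.append(row)  # keep rows that fail the rules for review
    return warehouse, rejected


production_rows = [
    {"cust_nm": " Acme Corp ", "ord_dt": "03/15/2024", "amt": "1200.50", "tmp": "x"},
    {"cust_nm": "Globex", "ord_dt": "not a date", "amt": "75"},
]
warehouse, rejected = etl(production_rows)
```

The first row is loaded with renamed fields, trimmed text, and a normalized date; the second is set aside because its date cannot be parsed. The loaded record no longer matches the operational record field for field, which is the difference described above.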
A data warehouse also contains metadata (structure and sources of the raw data, essentially, data about data), the data model, rules for data aggregation, replication, distribution and exception handling, and any other information necessary to map the data warehouse, its inputs, and its outputs. As the complexity of data analysis grows, so does the amount of data being stored and analyzed; ever more powerful and faster analysis tools and hardware platforms are required to maintain the data warehouse.
A successful data warehousing strategy requires a powerful, fast, and easy way to develop useful information from raw data. Data analysis and data mining tools use quantitative analysis, cluster analysis, pattern recognition, correlation discovery, and associations to analyze data with little or no IT intervention. The resulting information is then presented to the user in an understandable form, processes collectively known as BI. Managers can choose between several types of analysis tools, including queries and reports, managed query environments, and OLAP and its variants (ROLAP, MOLAP, and HOLAP). These are supported by data mining, which develops patterns that may be used for later analysis, and completes the BI process.
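Association discovery, one of the techniques named above, can be illustrated with a small Python sketch. The basket data and the functions pair_support and confidence are hypothetical; the measures shown (support and confidence) are common in association analysis, but this is a simplified illustration rather than a full mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records: each set is one customer transaction.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]


def pair_support(baskets):
    """Fraction of transactions that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Of the transactions containing `antecedent`, the share that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


support = pair_support(BASKETS)
bread_to_milk = confidence(BASKETS, "bread", "milk")
```

A pair with high support and high confidence suggests a relationship, such as customers who buy one product tending to buy another, that an analyst can then examine with queries, reports, or OLAP.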