Data warehouses aren’t exactly a new concept. They’ve been bringing value to organisations since the 1970s. What was thought to be a trend in the previous decades is now a mainstay, and it doesn’t look like data warehouses are going away any time soon.
This three-part series will take you through the basics of data warehousing and how to build one that fits your organisation. We approach this topic with two perspectives: first, in detail and then circling back to the reasons for having a data warehouse for a more holistic view.
Part one discusses the components of a data warehouse. Part two deals with modern advances in data warehousing such as cloud-based technologies, big data and modern unstructured data sources. The last instalment details the considerations involved in choosing the components of a data warehouse so that it works for your company.
In this first instalment, read on to learn what a data warehouse is and how each component brings value to your company of components can bring value to your company.
A data warehouse (DW) is an information management system created to support business intelligence activities. Data often contains patterns that aren’t apparent on the surface. DWs are used to create analyses on centralised, company-wide data to expose these trends. Business managers can then take advantage of these insights to support strategic decisions. The result are business strategies that are backed by science and observed data, rather than the gut feel of a selected few.
DWs are designed to accommodate efficient querying, even with large quantities of historical data accumulated over time. As a result, end users are encouraged to explore their data and perform analyses that can unearth insights that would normally go unnoticed. Roughly put, a data warehouse is a decision-support database, maintained separately from an organisation’s operational or transactional systems.
Database vs. ERP vs. Data Warehouse
A database is an organised collection of structured data stored in tables with fixed columns. The information collated by your departments’ transactional systems will most likely be stored in a operational database.
Databases are designed to optimise storage. They are typically normalised to reduce the repetition of the same data in different rows or records. Each line of business will have its preferred software or system. The underlying databases are often isolated from one another despite storing common information.
In the early 90s, enterprise resource planning (ERP) systems gained popularity, promising the integration and automation of a company’s data. ERPs consolidate input from departments including accounting, manufacturing, supply chain, sales, marketing and human resources. With the compiled central database, leaders could see the interconnections of the departments. As a result, managers introduce more holistic process improvements. ERP systems unified people and processes across an organisation through technology.
Enterprise Resource Planning (ERP) Diagram
Data warehouses also integrate data from disparate sources, standardising them to allow aggregation, comparisons and analysis across an organisation. IT practitioners consider a different set of priorities when designing data warehouses vs ERPs. Since DWs are used to create reports and queries efficiently, information is not normalised. Unlike a transactional database, end-users of a data warehouse rarely modify the records’ contents and exclusively view the data.
Notably, modern DWs can store unstructured data. With improved analytics capabilities, analysts can now use text-heavy information to provide insight for decision making. Part 2 of this series discusses unstructured data in more detail.
Databases and ERPs are not obsolete. They remain vital to an organisation, performing functions that data warehouses cannot. Databases and ERPs serve different purposes – efficiently storing transactional data – whereas data warehouses were built for querying and analysis.
A data warehouse is not a single tool that you select. Instead, it is a blend of components, technologies, and methods. The combination of which, when chosen wisely, facilitates business intelligence.
Below are the major components of a data warehouse. Prominent technologies and related tools are mentioned.
1. Data Management
The Load Manager is responsible for moving your data from source to target storage. It must robustly connect with various data sources like legacy database management systems, flat files, SaaS systems, message queues, unstructured data, and other data warehouses. The Load Manager must copy the information into a destination system, often in a different context from that of the source.
Data must be consistent and standardised before performing any analysis on it. Otherwise, the results may lead to incorrect conclusions. Because information comes from disparate sources, it may be necessary to manipulate your data to ensure consistency before loading it into the target data storage. Transformations can aggregate, join and denormalise your data for such purposes.
Data wrangling is more commonly known as Extract – Transform – Load (ETL or ELT for some). Popular ETL tools include Mattillion, Azure Data Factory and SAS Data Management.
2. Data Storage/ Warehouse Manager
Information derived from your various sources is consolidated into Data Storage. Storing both your structured and unstructured data in a centralised repository is always a good idea, whether or not you’ve modelled your data for a specific use case or if you are only storing it for future analysis. Creating indices and performing back ups are key functions of a Data Storage.
Some data storage solutions include relational databases ( Greenplum, MySQL, SQL Server, PostgreSQL, AWS RDS and AWS Redshift etc.), Big data solutions such as Hadoop platform or , NoSQL Database – MongoDB, Cassandra, MapR DB, etc.
3. Data Analysis
Once a subset is prepared the data can then be mathematically modelled and qualitative analyses can be performed.
Some examples are Python, R, PowerBI, Elastic Dashboards, as well as many others Scala, Apache Spark, RStudio, Stata, JMP, KNIME Analytics Platform, , SAS Base, SAS/STAT, SAS Enterprise Miner
4. End-User Access Tools / Presentation
Your data and models don’t mean much unless the necessary stakeholders understand it. This is why choosing the proper reporting tools, dashboards, web application, and executive information systems tools is crucial to your data warehouse’s success.
Some of the most popular reporting tools such as PowerBI, Elastic, SAS and many more.
5. Hardware & Environment Setup
To run all the software, you will need access to appropriate hardware. In previous years, data warehouses have become more accessible to smaller organisations because of cloud computing’s lower capital outlay. Regardless of your architecture: on-premise, cloud, or hybrid, hardware for operating your data warehouse components plays a significant role in the success of your data warehousing project.
Many companies offer end-to-end platforms and integration services, which make life simpler. This article, however, was written to let you know that you do have options. We hope that this article helps you understand the different components needed to build a data warehouse and allow you to make best decisions for your organisation.
While the main functions of a data warehouse have not changed over the years, modern data warehouse technologies are constantly being developed to improve data analytics and make it accessible to more businesses.
Keep tuned in for the next part of this series. Part two talks about cloud-based data warehouses and the analytics using unstructured data. It also discusses the criteria you should use to select the appropriate vendor for your use case.
We use analytics to discover stories, patterns and trends in your data which will help you formulate ideas and create positive outcomes for the future. Leave your details below so we can help find the best solution for your organisation. We’ll call you.