“Data warehousing is a business analyst's dream—all the information about the organization's activities gathered in one place, open to a single set of analytical tools. But how do you make the dream a reality (Wall & Scott, N.d.)?” There are the basics of building a data warehouse such as “understanding the business processes, the goals, determining the business objectives and then after constructing a data model, figuring out the data sources”, the decision on how to handle growth within the organization should be the next set of determining factors (Wall & Scott, N.d.).
When a company grows the storage of data must also increase or “expand”. Prior to expanding or “merging data” one must “explore new ways to grow, cut costs and increase profits”. Below are more details for a successful “warehouse construction”.
External feeds are very important but is usually neglected in the data warehouse. The “external data type is external to your company and is usually not in the data warehouse because no one has requested it yet (Mould, 2013)”.
This type of feed is can be based on “statistics and trends”. They also “can have a major impact upon ones business, so they should be tracked over time” (Mould, 2013). This is the area for increasing profits as external feeds allow for “surveys and ROI analysis” which allow the benefits of using this data (score uplift) outweigh the cost (Mould, 2013)”.
External data also allows for the company “to ensure one is making the right business decisions as it allows the company to see the big picture (Hammergren, N.d.)”. Not only is the big picture an important factor within a data warehouse so is having “clean data”.
Ensure Only Clean Data
There are a few ways to ensure for data accuracy and one is a little acronym named “ETL” which stand for “Extract/Cleans -Transform-Load” (ETL, N.d.).
The extraction is the part that pulls and technically “cleans” the data. Prior to merging the data it is usually reviewed to ensure there are no duplication's. When one extracts the data the following is performed:
Update notification – notification a record has been changed
Incremental extract - identify which records have been modified
Full extract – keeps a copy of the original to determine or identify changes
Making identifiers unique
Standardizes null values
Validates address fields
Applies the rules
Converts measured data
Disable constraints and indexes – pre load
Enable constraints and indexes – post load (ETL, N.d.)”.
So how often should one update the data in the data warehouse? “You must update your data warehouse on a regular basis to ensure that the information derived from it is current. This process of updating the data is called the refresh process (Defining, N.d.)”.
The ETL is usually put on a scheduled time and can be completed after hours, in the middle of the night or on weekends – this all depends on the company’s urgency of the data (10, N.d.). “Once all of this data has been loaded into the data warehouse, the materialized views must be updated to reflect the latest data (10, N.d.)”. While timing and scheduling the “refresh” is crucial the sequence and “parallel” is even more important to a data warehouse.
Parallel and Sequential Order
Order is key in a data warehouse as a system is literal. It only does as it is told. If a specific table is needed to pull data prior to another table the sequence is key and must be orderly. “Choose the partitioning strategy of the materialized view in sync with a table's partitioning strategy (10, N.d.)”. The following is an example of “refreshing of a materialized view and parallel updates” (10, N.d.).
1. “Table that will be the basis for the materialized view including partitioning by range
2. Load the initial table data
3. Create materialized view
4. Load a separate table to be exchanged with the new partition.
5. Create and load a separate table to be exchanged with a partition in the materialized view
a. INSERT /*+ PARALLEL
6. Gather statistics
7. Exchange the partitions
8. Inform the database that the materialized view is fresh again (10, N.d.)”.
The information given is a quick snapshot of how one can explore new ways to grow, cut costs, and increase profits in building a data warehouse. There are many techniques and rules to review but given the demonstration should allow one to determine how important data can be to an organization; how frequent to refresh the data in the data warehouse; and how external resources can help ones company.