Data warehouses, ETL, diverse data types, and big-data governance.
This chapter covers broader data environments that combine many sources for analysis and reporting. The ISC exam focuses on the governance and control challenges that arise as scale, variety, and transformation complexity increase.
Large data environments can improve analysis while weakening auditability if lineage, transformation logic, access, and data definitions are not governed. The exam issue is whether the organization can rely on the data after it has been moved, reshaped, and combined.

The table below summarizes what to verify for each data environment issue and the trap to avoid.

| Data environment issue | What to verify | Common ISC trap |
|---|---|---|
| Warehouse, lake, or mart | Purpose, structure, users, and level of data curation. | Assuming every centralized store has the same control quality. |
| ETL process | Extraction completeness, transformation accuracy, load controls, and exception handling. | Trusting analytics output without testing transformation logic. |
| Data type | Whether structured, semi-structured, or unstructured data can be validated and governed. | Applying relational database controls to every data source. |
| Big data governance | Ownership, lineage, quality, retention, access, and monitoring at scale. | Valuing volume over reliability and accountability. |
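
The data-type row above notes that semi-structured data still has to be validated, even though relational constraints are not available to do it. The sketch below illustrates one way such a check might look for a JSON-lines feed. The field names, the `REQUIRED_FIELDS` schema, and the sample feed are hypothetical and chosen only for illustration. Records that fail validation are routed to an exception list instead of being silently dropped.

```python
import json

# Hypothetical schema for illustration: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"invoice_id": str, "amount": (int, float), "currency": str}


def validate_record(raw_line):
    """Return (record, None) when the line is valid, else (None, reason)."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return None, "not valid JSON"
    if not isinstance(record, dict):
        return None, "not a JSON object"
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            return None, f"missing field: {field}"
        value = record[field]
        # bool is a subclass of int in Python, so booleans are rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            return None, f"wrong type for field: {field}"
    return record, None


def split_feed(lines):
    """Separate accepted records from exceptions, keeping the line number of each reject."""
    accepted, exceptions = [], []
    for line_number, line in enumerate(lines, start=1):
        record, reason = validate_record(line)
        if reason is None:
            accepted.append(record)
        else:
            exceptions.append((line_number, reason))
    return accepted, exceptions


sample_feed = [
    '{"invoice_id": "A-1", "amount": 120.5, "currency": "USD"}',
    '{"invoice_id": "A-2", "currency": "USD"}',
    '{"invoice_id": "A-3", "amount": "75", "currency": "USD"}',
    "not json at all",
]
accepted, exceptions = split_feed(sample_feed)
print(len(accepted))  # 1
print(exceptions)
```

Keeping the line number and reason for every reject gives a reviewer an exception log to follow up on, which is the kind of evidence the ETL row asks for under exception handling.
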

The five steps below give a sequence for reviewing a data environment.

| Step | ISC question to ask | Control implication |
|---|---|---|
| 1. Identify the source systems | Which operational systems, external feeds, or files provide the data? | Reliability begins with knowing where the data originated. |
| 2. Trace extraction and transformation | What logic extracts, cleans, joins, aggregates, or reshapes the data? | Transformation errors can create persuasive but unreliable analytics. |
| 3. Verify loading and reconciliation | How are completeness, accuracy, exception handling, and rejected records checked? | Load controls help prove that the intended population reached the target environment. |
| 4. Assess access and ownership | Who can define, change, query, export, or govern the data store? | Big data environments need accountability as well as storage capacity. |
| 5. Evaluate reporting use | Is the output used for monitoring, audit evidence, management reporting, or decision support? | The level of control needed depends on how much reliance is placed on the output. |
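
Step 3 can be made concrete with a record-count and control-total comparison. The following is a minimal sketch, assuming each row is a dictionary with an `amount` field and that rejected records are captured separately. The function name `reconcile_load` and the sample rows are hypothetical. The idea is that every source record should be accounted for as either loaded or rejected, both by count and by amount.

```python
from decimal import Decimal


def reconcile_load(source_rows, loaded_rows, rejected_rows, amount_field="amount"):
    """Compare record counts and control totals between source and target."""

    def total(rows):
        # Decimal avoids floating-point drift when summing monetary amounts.
        return sum((Decimal(str(row[amount_field])) for row in rows), Decimal("0"))

    count_gap = len(source_rows) - (len(loaded_rows) + len(rejected_rows))
    amount_gap = total(source_rows) - (total(loaded_rows) + total(rejected_rows))
    return {
        "source_count": len(source_rows),
        "loaded_count": len(loaded_rows),
        "rejected_count": len(rejected_rows),
        "count_gap": count_gap,
        "amount_gap": amount_gap,
        "reconciled": count_gap == 0 and amount_gap == 0,
    }


source = [
    {"id": 1, "amount": "100.00"},
    {"id": 2, "amount": "250.10"},
    {"id": 3, "amount": "49.90"},
]
loaded = [{"id": 1, "amount": "100.00"}, {"id": 2, "amount": "250.10"}]
rejected = [{"id": 3, "amount": "49.90"}]

result = reconcile_load(source, loaded, rejected)
print(result["reconciled"])  # True: 3 source rows = 2 loaded + 1 rejected

# If the reject is not captured, the gap shows up in both count and amount.
unexplained = reconcile_load(source, loaded, [])
print(unexplained["count_gap"], unexplained["amount_gap"])  # 1 49.90
```

A matching count with a non-zero amount gap would point to altered values, not missing rows, which is why the sketch checks both.
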

The checkpoints below should be cleared before relying on output.

| Checkpoint | Ask before relying on output | Control effect |
|---|---|---|
| Source lineage | Can each data element be traced to an originating system, file, feed, or external source? | Lineage supports completeness, accountability, and error investigation. |
| Transformation control | Are extraction, cleansing, joins, aggregations, and business rules approved and tested? | Transformation logic can create unreliable output from reliable source data. |
| Reconciliation | Do load totals, rejected records, exception logs, and refresh timing tie back to source expectations? | Reconciliation helps prove the intended population reached the warehouse or lake. |
| Data ownership | Who defines fields, approves changes, manages quality, and resolves conflicts? | Unclear ownership leaves definitions, quality issues, and conflicts unresolved. |
| Reliance level | Is the data used for audit evidence, monitoring, management reporting, or exploratory analysis? | Higher reliance requires stronger governance and testing. |
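
The source lineage and data ownership checkpoints can be illustrated with a simple catalog check. This is a sketch under stated assumptions: the `FieldLineage` structure, the field names, and the catalog entries are hypothetical, and a real environment would normally hold this metadata in a data catalog tool instead of in code. The check flags any reported field that lacks a recorded source, transformation, or owner.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldLineage:
    """Hypothetical lineage entry for one field used in a report."""
    field_name: str
    source_system: Optional[str]
    transformation: Optional[str]
    owner: Optional[str]


def find_lineage_gaps(catalog):
    """Return {field_name: [missing attributes]} for entries with incomplete lineage."""
    gaps = {}
    for entry in catalog:
        missing = [
            attr
            for attr in ("source_system", "transformation", "owner")
            if not getattr(entry, attr)  # None or empty string counts as missing
        ]
        if missing:
            gaps[entry.field_name] = missing
    return gaps


catalog = [
    FieldLineage("net_revenue", "ERP general ledger", "gross revenue less returns", "Controller"),
    FieldLineage("customer_segment", "CRM export", "rule-based bucketing", None),
    FieldLineage("churn_score", None, None, "Analytics lead"),
]

gaps = find_lineage_gaps(catalog)
print(gaps)
# {'customer_segment': ['owner'], 'churn_score': ['source_system', 'transformation']}
```

Fields that appear in the result are candidates for follow-up before the output is used for audit evidence or management reporting, where the reliance level is higher.
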