This course introduces the concepts of database technology used in Business Intelligence. More precisely, this includes multidimensional databases and data warehouses, as well as ETL (Extraction, Transformation and Load) processes and basic concepts of dashboarding. Necessary techniques will be presented for designing, implementing, exploiting, and maintaining data warehouses, paying special attention to spatio-temporal data.
A particular focus will be given on the problems posed by heterogeneous data integration and data quality. The students will learn how to define, measure and maintain data quality in the context of data warehousing. Classical notions of data warehousing and OLAP are developed: ETL, architecture, conceptual and logical design, query processing and optimization. At the end of this course, the student will know how to efficiently design, construct and query a data warehouse to create effective visualizations.
Teachers
Others
Petar Jovanovic (
)
Weekly hours
Theory
1.9
Problems
0
Laboratory
1.9
Guided learning
0
Autonomous learning
96
Objectives
Be able to model multidimensional data warehouses and visually analyze their data
Related competences:
CB7,
CB8,
CB9,
CTR3,
CEC2,
CEE5.1,
CEE5.3,
CG3,
CB6,
Be able to apply specific physical design techniques for decisional systems
Related competences:
CB7,
CB8,
CB9,
CTR3,
CEC2,
CEE5.1,
CEE5.3,
CG3,
Be able to design and implement data migration processes (i.e., ETL)
Related competences:
CB6,
CB7,
CB8,
CB9,
CTR3,
CEC2,
CEE5.1,
CEE5.3,
CG3,
Contents
Introduction
Comparison of operational and decisional systems; Metadata
Data warehousing architectures
Corporate Information Factory; DW 2.0
Database physical desing for analytical queries
Star-join and join indexes; Bitmaps; Materialized views; Spatio-temporal data
Extraction, Transformation and Load
Data quality; Schema and Data Integration; ETL management
Visualization and descriptive analytics
Key Performance Indicators; Dashboarding
Activities
ActivityEvaluation act
Theoretical lectures
In these activities, the lecturer will introduce the main theoretical concepts of the subject. Besides lecturing, cooperative learning techniques will be used. These demand the active participation of the students, and consequently will be evaluated. Objectives:123 Contents:
The student will be asked to practice the different concepts introduced in the theoretical lectures. This includes problem solving either on the computer or on paper. Objectives:123 Contents:
Theory: Inverted class techniques will be used, which require that the student work on the provided multimedia materials before the class. Then, theory lectures comprise the teacher's complementary explanations and problem solving.
Laboratory: Some representative tools will be used for the application of theoretical concepts (e.g., Indyco Builder, PotgreSQL, Oracle, Pentaho Data Integration, Tableau). The course includes continuous hands-on through a course project, divided into three logical blocks: data warehouse modelling, data integration and migration (ETL), and descriptive visualisation, in which the students will work in teams. There will be three project deliverables outside the class hours, while in the class the students will be as well individually assessed about the knowledge acquired during each project block.
Evaluation methodology
Final Mark = min(10 ; 60%E + 40%L + 10%P)
L = Weighted average of the marks of the three lab deliverables
E = Final exam
P = Participation in the class
Transforming Data With Intelligence (former Data Warehouse Institute) http://tdwi.org
MSCA-ITN-Erasmus Joint Doctorate on Data Engineering for Data Science https://deds.ulb.ac.be
Previous capacities
Basic knowledge on relational databases and SQL.
Specifically, it will be assumed knowledge on:
- UML class diagrams
- Relational algebra
- SQL queries
- Relational views
- B-tree operations (i.e., insertion and splits)
- Basic concepts on physical query optimization