Database and Data Mining

IST-4-DBM1 - ECTS 3.0

Abstract Part 1: Databases

This course introduces the most important concepts concerning the design and the use of databases. It covers the main issues in data modeling when considering the relational data model (structure, languages like relational algebra and relational calculus) and the key features of attribute-oriented models with respect to constructor-oriented models (like, for instance, Extended Entity-Relationship models). The fundamental concepts like the constraints and the normal forms will be explained and their impact on the quality of designed databases will be discussed. Finally, practical aspects of relational database design and use will be discussed. It includes a presentation of the SQL query language but also a short introduction to data warehousing and On Line Analytical Processing queries (OLAP). To be concrete, the students will have to understand the design of a real relational database and to practice database querying with a professional DataBase Management System (DBMS).

Note: Slides are taken and partially modified from https://www.db-book.com/

Schedule

Date Day Type Content Material Comment
21/09 Thursday CM Introduction to DB Slides  
27/09 Wednesday CM Relational Languages + ER Slides Slides  
28/09 Thursday TP Entity-Relationship Model (TP) Class Exercises and Extra Exercises  
05/10 Thursday TP/CM Entity-Relationship Practice and SQL (TP) Slides and Exercises Video
17/10 Tuesday CM Normal Forms Slides Sudeepa Roy Both decks are mandatory
19/10 Tuesday TP SQL Practice Repo Basic With Codespace Video Soon
23/10 Monday CM Transaction (Notes) Slides  

The course exam consists into

  • 1h test including 10 questions and 3 exercises (ER Modelling, Relational Algebra, SQL)
  • a small project to perform in pairs . Each project should have a presentation and 2-5 pages project report (mandatory) and a presentation.

Final grade is the weighted average between exam (0.4) and project (0.6)

Project Submission and Setup

Examples

Project Checklist

  • Identify the domain of choice, e.g., internet memes
  • formulate some questions in natural language that you would like to answer
    • submit the questions for approval
  • design the database and perform data entry in Postgres
  • ER Diagram and comments about the design decision (using appropriate terminology)
  • Implement the questions in SQL, analyse the query plan in relational algebra, experiment alternative implemetations
  • Basic queries should cover standard RA operators, Joins, and aggregation functions
  • Advanced features nice to show example of
    • Indexes
    • Transactions

NB: Full-stack application is not required, but encouraged.

Project Grading

  • 1-15
    • Natural Language Queries
    • ER Diagram Design
    • SQL Implementation
      • DDL + Data Entry
      • Basic SQL Usage (no joins) + Relational Algebra (should also go in the presentation)
    • Intermediate SQL usages
      • Joins and Aggregations
      • Integrity Constratints (check or queries) + Ad-hoc data to test
      • Indexes and Transaction
  • 15-20
    • 10 minutes Presentation ( you both have to speak) + 5 minutes Questions (Individual)
      • Use of appropriate terminology (also when speaking)
      • Clarity of the explanation
      • Appropriateness of the answer
    • What to put in the report
      • a general presentation
      • design decisions
      • ER + Implementation
      • problems that you solved
    • what to put in the presentation (4-8 slides):
      • a general presentation
      • ER + Comments of entity
      • Example of SQL + RA
      • Demo is welcomed
  • yes, you can decide not to present (max grade is 15)
  • up to 3 extra points can be assigned to reward intercation, support to other projects, initiative in project execution ()

Proposed Dataset

Suggested Material

Pre Lab Installations (Courtesy of Mohamed Ragab)

Lab Repository

  1. Install PostgreSQL
    1. On Docker ``
    2. on Windows
    3. on linux
  2. Install the psycopg2 Python Package to connecto to PostgreSQL
    • use th command
      1
      
      pip install psycopg2
      
      in your command line.
  3. Accessing PostgreSQL from Command Line:
    • Add the PostgreSQL installation \/home\ and \/bin\ directories to the enviroment variables.
    • use the command
      1
      
      psql -U postgres postgres
      
      to connect to the the by-default created database \postgres\ with the user \postgres.
    • Enter your set \postgres\ password (i.e password of the default password that you have been asked at the time of installtion).
  4. Connect To a PostgreSQL Database Server
    • show you how to connect to the PostgreSQL using psql tool and pgAdmin 4 GUI tool. o

Data Mining

Abstract

This course introduces the basics notions of datamining. It surveys techniques for data preprocessing, i.e., cleansing, integration, data reduction. Then, it jumps into foundational aspects of data mining like: Frequent Patterns mining, classification, clustering, outliers detection. The course examination consists of both a project and an exam with open and multiple choice questions and exercises.

Schedule

Date Lecture Chapters Slides Notes
14/11/2023 Introduction to Data and Data Mining Chp 1, 2 Slides 1 Slides 2 skip part on data viz  
16/11/2023 Data Pre Processing Chp 3 Slides external repo
21/11/2023 Frequent Patterns Chp 6 Slides external repo
23/11/2023 Classification Chp 8 Slides blogpost
24/11/2023 Clustering Chp 10 Slides  
27/11/2023 Project Work Self-Chp 12 Slides  
28/11/2023 Project Work      
30/11/2023 Project Work      
12/12/2023 Exam   Example  
12/12/2023 Project Presentation + Greetings      
         

Example Projects

Project Checklist

-REGISTER YOUR GROUP

  • Identify the domain of choice, e.g., dataset presented above
    • YOU DO NOT HAVE TO PICK ONLY ONE DATASET
    • BUT BE CAREFUL BECAUSE YOU MIGHT HAVE TO CLEAN MORE THAN ONE
    • YOU CAN SUBMIT THE TAKS FOR APPROVAL
  • you can pick out of problems (preprocessing + 2)
    • cleansing/preprocessing (mandatory)
    • frequent pattern mining
    • classification
    • clustering
  • Explain why why you’re happy with the results.
    • evaluate the associations rules
    • evaluate the classification
    • evaluate the clustering proceedure
  • present a notebook with comments and discussion of your result (github)

For visualization

https://streamlit.io/

Suggested Material