Database and Data Mining
IST-4-DBM1 - ECTS 3.0
Abstract Part 1: Databases
This course introduces the most important concepts concerning the design and the use of databases. It covers the main issues in data modeling when considering the relational data model (structure, languages like relational algebra and relational calculus) and the key features of attribute-oriented models with respect to constructor-oriented models (like, for instance, Extended Entity-Relationship models). The fundamental concepts like the constraints and the normal forms will be explained and their impact on the quality of designed databases will be discussed. Finally, practical aspects of relational database design and use will be discussed. It includes a presentation of the SQL query language but also a short introduction to data warehousing and On Line Analytical Processing queries (OLAP). To be concrete, the students will have to understand the design of a real relational database and to practice database querying with a professional DataBase Management System (DBMS).
Note: Slides are taken and partially modified from https://www.db-book.com/
Schedule
Date | Day | Type | Content | Material | Comment |
---|---|---|---|---|---|
21/09 | Thursday | CM | Introduction to DB | Slides | |
27/09 | Wednesday | CM | Relational Languages + ER | Slides Slides | |
28/09 | Thursday | TP | Entity-Relationship Model (TP) | Class Exercises and Extra Exercises | |
05/10 | Thursday | TP/CM | Entity-Relationship Practice and SQL (TP) | Slides and Exercises | Video |
17/10 | Tuesday | CM | Normal Forms | Slides Sudeepa Roy | Both decks are mandatory |
19/10 | Tuesday | TP | SQL Practice | Repo Basic With Codespace | Video Soon |
23/10 | Monday | CM | Transaction (Notes) | Slides |
The course exam consists into
- 1h test including 10 questions and 3 exercises (ER Modelling, Relational Algebra, SQL)
- a small project to perform in pairs . Each project should have a presentation and 2-5 pages project report (mandatory) and a presentation.
Final grade is the weighted average between exam (0.4) and project (0.6)
Project Submission and Setup
Examples
Project Checklist
- Identify the domain of choice, e.g., internet memes
- formulate some questions in natural language that you would like to answer
- submit the questions for approval
- design the database and perform data entry in Postgres
- ER Diagram and comments about the design decision (using appropriate terminology)
- Implement the questions in SQL, analyse the query plan in relational algebra, experiment alternative implemetations
- Basic queries should cover standard RA operators, Joins, and aggregation functions
- Advanced features nice to show example of
- Indexes
- Transactions
NB: Full-stack application is not required, but encouraged.
Project Grading
- 1-15
- Natural Language Queries
- ER Diagram Design
- SQL Implementation
- DDL + Data Entry
- Basic SQL Usage (no joins) + Relational Algebra (should also go in the presentation)
- Intermediate SQL usages
- Joins and Aggregations
- Integrity Constratints (check or queries) + Ad-hoc data to test
- Indexes and Transaction
- 15-20
- 10 minutes Presentation ( you both have to speak) + 5 minutes Questions (Individual)
- Use of appropriate terminology (also when speaking)
- Clarity of the explanation
- Appropriateness of the answer
- What to put in the report
- a general presentation
- design decisions
- ER + Implementation
- problems that you solved
- what to put in the presentation (4-8 slides):
- a general presentation
- ER + Comments of entity
- Example of SQL + RA
- Demo is welcomed
- 10 minutes Presentation ( you both have to speak) + 5 minutes Questions (Individual)
- yes, you can decide not to present (max grade is 15)
- up to 3 extra points can be assigned to reward intercation, support to other projects, initiative in project execution ()
Proposed Dataset
-
Meme Datasets:
- Know Your Meme Scraped
- DBPedia Enrichment
- Google Vision Dataset
- Instances from Image Flip
- Industrial ERP (shared by email) (Courtesy of LEM International)
- Hackathon Dataset (Courtesy of Prof Alex Nolte)
Suggested Material
Pre Lab Installations (Courtesy of Mohamed Ragab)
- Install PostgreSQL
- On Docker ``
- on Windows
- Download and Install PostgreSQL for all platforms
- Follow this tutorial for more details (Windows installation).
- on linux
- Install the psycopg2 Python Package to connecto to PostgreSQL
- use th command
in your command line.1
pip install psycopg2
- use th command
- Accessing PostgreSQL from Command Line:
- Add the PostgreSQL installation \/home\ and \/bin\ directories to the enviroment variables.
- use the command
to connect to the the by-default created database \postgres\ with the user \postgres.1
psql -U postgres postgres
- Enter your set \postgres\ password (i.e password of the default password that you have been asked at the time of installtion).
- Connect To a PostgreSQL Database Server
- show you how to connect to the PostgreSQL using psql tool and pgAdmin 4 GUI tool. o
Data Mining
Abstract
This course introduces the basics notions of datamining. It surveys techniques for data preprocessing, i.e., cleansing, integration, data reduction. Then, it jumps into foundational aspects of data mining like: Frequent Patterns mining, classification, clustering, outliers detection. The course examination consists of both a project and an exam with open and multiple choice questions and exercises.
Schedule
Date | Lecture | Chapters | Slides | Notes |
---|---|---|---|---|
14/11/2023 | Introduction to Data and Data Mining | Chp 1, 2 | Slides 1 Slides 2 skip part on data viz | |
16/11/2023 | Data Pre Processing | Chp 3 | Slides | external repo |
21/11/2023 | Frequent Patterns | Chp 6 | Slides | external repo |
23/11/2023 | Classification | Chp 8 | Slides | blogpost |
24/11/2023 | Clustering | Chp 10 | Slides | |
27/11/2023 | Project Work | Self-Chp 12 | Slides | |
28/11/2023 | Project Work | |||
30/11/2023 | Project Work | |||
12/12/2023 | Exam | Example | ||
12/12/2023 | Project Presentation + Greetings | |||
Example Projects
Project Checklist
- Identify the domain of choice, e.g., dataset presented above
- YOU DO NOT HAVE TO PICK ONLY ONE DATASET
- BUT BE CAREFUL BECAUSE YOU MIGHT HAVE TO CLEAN MORE THAN ONE
- YOU CAN SUBMIT THE TAKS FOR APPROVAL
- you can pick out of problems (preprocessing + 2)
- cleansing/preprocessing (mandatory)
- frequent pattern mining
- classification
- clustering
- Explain why why you’re happy with the results.
- evaluate the associations rules
- evaluate the classification
- evaluate the clustering proceedure
- present a notebook with comments and discussion of your result (github)
For visualization
Suggested Material
- Professor’s Choice: Data Mining: Concepts and Techniques, 3rd ed.
- Slides and Lecture Notes https://wiki.illinois.edu//wiki/display/cs412/2.+Course+Syllabus+and+Schedule
- Exercise (we will do them) http://www.cse.msu.edu/~ptan/dmbook/software/
- https://dataminingbook.info/book_html/
- http://www.mmds.org/#support