Students

I have worked with, or am currently working with, the following students.


PhD


Gianluca Rossi

Université Claude Bernard Lyon 1

Co-Supervised with Angela Bonifati

Hygraph


Mauro Fama

INSA Lyon

Co-Supervised with Angela Bonifati

To Be Defined


Samuele Langhi

Université Claude Bernard Lyon 1

Co-Supervised with Angela Bonifati

Towards Streaming Consistency Management

Stream processing is designed to query unbounded, time-ordered data flows in real time while guaranteeing low latency and high throughput. Despite its roots in the database community, recent research on data stream management has focused on performance and neglected data quality. Moreover, traditional quality management techniques are not always feasible in streaming, since the data should “keep moving.” Nevertheless, data quality issues such as timeliness and completeness become progressively more important as stream processing is applied beyond analytics. For this reason, we advocate an approach that not only allows data quality intervention as before, but also tracks the consistency of records across streaming queries through provenance annotations. These annotations are attached to the input and propagated to the query results, on which a degree of inconsistency is computed.


Mohamed Ragab

University of Tartu

Benchranking

Leveraging Big Data (BD) frameworks to process large-scale RDF datasets can optimize query performance. Configuring these frameworks significantly impacts results, and benchmarking different configurations offers best practices for optimal settings. However, current benchmarking efforts often lack quantitative ranking techniques and are limited to descriptive or diagnostic analytics. This paper addresses this gap by proposing “Bench-ranking” criteria for prescriptive analytics, using ranking functions to evaluate performance across multiple dimensions. We validate these criteria through an empirical study with RDF datasets on Apache Spark-SQL, offering clear insights for practitioners to make informed decisions in complex BD environments.


Kristo Raun

University of Tartu

Co-Supervised with Ahmed Awad

Adaptive Out-of-order Handling in Streaming Conformance Checking

This thesis addresses challenges in streaming conformance checking for big data environments, where event streams are high in volume and prone to imperfections like out-of-order arrivals. The contributions include:

  • Trie Data Structure: Introduces a trie-based method for efficient real-time conformance checking.
  • I Will Survive (IWS) Algorithm: A low-latency algorithm built on the trie structure for near real-time processing of ongoing events.
  • C-3PA Algorithm: Enhances IWS to handle incomplete and evolving traces, providing confidence estimates and warm-start capabilities.
  • Adaptive Handling of Out-of-Order Events: Proposes a mechanism to dynamically adjust to stream imperfections, ensuring robust checking.

This work is the first to address out-of-order events in streaming conformance checking, offering a comprehensive solution for complex digital infrastructures.

Master


Sadig Eyvazov

University of Tartu

Co-Supervised with Mohamed Ragab

Large RDF Graph Processing on top of Spark

Name: Sadig Eyvazov
Year: 2020
Duration: 12 months
Institution: University of Tartu
Project: Large RDF Graph Processing on top of Spark
Abstract: In recent years, we have witnessed an uncontrollable growth of data generated by machines or humans… (full abstract omitted for brevity)


Carlos Ramos

University of Tartu

Quarser a Graph-Aware JSON-LD Parser

Name: Carlos Ramos
Year: 2020
Duration: 12 months
Institution: University of Tartu
Project: Quarser a Graph-Aware JSON-LD Parser
Abstract: The continuous growth of the Web of Data has fueled the interest of performing analytical operations over Knowledge Graphs (KGs)… (full abstract omitted for brevity)


Philippe Scorsolini

Politecnico di Milano

Co-Supervised with Emanuele Della Valle

An Infrastructural View of Cascading Stream Reasoning Using Micro-Services

Name: Philippe Scorsolini
Year: 2018
Duration: 12 months
Institution: Politecnico di Milano
Project: An Infrastructural View of Cascading Stream Reasoning Using Micro-Services
Abstract: In the Web of Data, in which Data are increasing in Volume, Variety, and Velocity… (full abstract omitted for brevity)


Mario Scrocca

Politecnico di Milano

Co-Supervised with Emanuele Della Valle

Towards Observability with (RDF) Trace Stream Processing

Name: Mario Scrocca
Year: 2017
Duration: 12 months
Institution: Politecnico di Milano
Project: Towards Observability with (RDF) Trace Stream Processing
Abstract: Distributed software systems and cloud-based micro-service solutions are getting momentum… (full abstract omitted for brevity)


Yehia Abosedira

Politecnico di Milano

Co-Supervised with Emanuele Della Valle

Towards streaming data on the Web - vocabularies, catalogs and applications

Name: Yehia Abosedira
Year: 2017
Duration: 12 months
Institution: Politecnico di Milano
Project: Towards streaming data on the Web - vocabularies, catalogs and applications
Abstract: The data on the web are continuously evolving. The popularity of social media, internet of things… (full abstract omitted for brevity)


Bachelor


Tarmo Pungas

University of Tartu

Contextual captioning of internet meme templates

Name: Tarmo Pungas
Year: 2022
Duration: 6 months
Institution: University of Tartu
Project: Contextual captioning of internet meme templates


Tõnis Hendrik Hlebnikov

University of Tartu

Towards a Knowledge Graph of Internet Memes

Name: Tõnis Hendrik Hlebnikov
Year: 2021
Duration: 6 months
Institution: University of Tartu
Project: Towards a Knowledge Graph of Internet Memes
Abstract: In this thesis, the notion of considering internet memes as rich units of information is presented… (full abstract omitted for brevity)


Jonathan Karu

University of Tartu

WebSourcing - Connecting Kafka to the Web

Name: Jonathan Karu
Year: 2020
Duration: 6 months
Institution: University of Tartu
Project: WebSourcing - Connecting Kafka to the Web
Abstract: Apache Kafka is an open-source stream-processing framework which is quickly becoming an industry standard… (full abstract omitted for brevity)


Fred Boldin

University of Tartu

Kypher - Towards Continuous OpenCypher

Name: Fred Boldin
Year: 2020
Duration: 6 months
Institution: University of Tartu
Project: Kypher - Towards Continuous OpenCypher
Abstract: With the surge of data caused by the creation of the internet, the internet of things, and the growth of computing power and storage… (full abstract omitted for brevity)


PFE


Toufiq Houda

Hubspot

Confidential

From 2024-02-06 to 2024-06-28


Ngo Ngoc Minh

Onepoint

Confidential

From 2024-02-05 to 2024-08-09


Mousset Maxime

ARHIS COMPETENCY CENTER

Confidential

From 2024-04-15 to 2024-09-13


Ducange Jules

Onepoint

Confidential

From 2024-02-19 to 2024-08-16


Candellieri Nicola

Descartes Underwriting

Confidential

From 2024-04-08 to 2024-09-27


Astete Hernandez Abner Abdiel

Université Claude Bernard Lyon 1

Causal Graphs to Predict and Explain Air Pollution Patterns

From 2024-03-11 to 2024-07-26


Roux Matthieu

Datadog Paris

Confidential

From 2023-02-20 to 2023-08-18


Montgomery Mathieu

Datadog France SAS

Confidential

From 2022-02-28 to 2022-08-05


PSAT


Ngoc Minh NGO, Quoc Viet PHAM, and Minh Duc PHUNG

INSA Lyon

LikeWines: Predictive and Comparative Analysis of Wine Quality


Jules Ducange and Erwan Soulier

INSA Lyon

Framing Internet Memes using LLMs