students
I have worked with and am working with
PhD
Université Claude Bernard Lyon 1
Co-Supervised with Angela Bonifati
Hygraph
INSA Lyon
Co-Supervised with Angela Bonifati
To Be Defined
Université Claude Bernard Lyon 1
Co-Supervised with Angela Bonifati
Towards Streaming Consistency Management
Stream processing is designed to query unbounded, time-ordered data flows in real time while guaranteeing low latency and high throughput. Despite its roots in the database community, the most recent research on data stream management has focused on performance and neglected the role of data quality. Moreover, quality management techniques are not always feasible in streaming since the data should “keep moving.” Nevertheless, data quality issues, e.g., timeliness and completeness, become progressively more important as stream processing is applied beyond the context of analytics. For this reason, we advocate an approach that not only allows data quality intervention, as in prior work, but also tracks the consistency of records across streaming queries through provenance annotations. These annotations are associated with the input and propagated to the query results, on which a degree of inconsistency is calculated.
University of Tartu
Benchranking
Leveraging Big Data (BD) frameworks to process large-scale RDF datasets can optimize query performance. Configuring these frameworks significantly impacts results, and benchmarking different configurations offers best practices for optimal settings. However, current benchmarking efforts often lack quantitative ranking techniques and are limited to descriptive or diagnostic analytics. This paper addresses this gap by proposing “Bench-ranking” criteria for prescriptive analytics, using ranking functions to evaluate performance across multiple dimensions. We validate these criteria through an empirical study with RDF datasets on Apache Spark-SQL, offering clear insights for practitioners to make informed decisions in complex BD environments.
University of Tartu
Co-Supervised with Ahmed Awad
Adaptive Out-of-order Handling in Streaming Conformance Checking
This thesis addresses challenges in streaming conformance checking for big data environments, where event streams are high in volume and prone to imperfections like out-of-order arrivals. The contributions include:
- Trie Data Structure: Introduces a trie-based method for efficient real-time conformance checking.
- I Will Survive (IWS) Algorithm: A low-latency algorithm built on the trie structure for near real-time processing of ongoing events.
- C-3PA Algorithm: Enhances IWS to handle incomplete and evolving traces, providing confidence estimates and warm-start capabilities.
- Adaptive Handling of Out-of-Order Events: Proposes a mechanism to dynamically adjust to stream imperfections, ensuring robust checking.
This work is the first to address out-of-order events in streaming conformance checking, offering a comprehensive solution for complex digital infrastructures.
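To give a rough idea of why a trie suits streaming conformance checking, here is a minimal sketch in Python. It is not the thesis's implementation: it only performs exact prefix matching against a set of allowed traces, and it does not reproduce IWS, C-3PA, confidence estimates, or out-of-order handling. The names `TrieNode`, `build_trie`, and `PrefixChecker`, as well as the example traces, are hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, List


class TrieNode:
    """One node per activity prefix of the allowed (model) traces."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_complete = False  # True if some allowed trace ends at this node


def build_trie(model_traces: Iterable[List[str]]) -> TrieNode:
    """Insert every allowed trace into a trie; shared prefixes share nodes."""
    root = TrieNode()
    for trace in model_traces:
        node = root
        for activity in trace:
            node = node.children.setdefault(activity, TrieNode())
        node.is_complete = True
    return root


class PrefixChecker:
    """Tracks one running case and reports whether its prefix still conforms."""

    def __init__(self, root: TrieNode) -> None:
        self.node = root
        self.conforms = True

    def observe(self, activity: str) -> bool:
        """Consume one event in constant time; once deviated, stay deviated."""
        if self.conforms:
            next_node = self.node.children.get(activity)
            if next_node is None:
                self.conforms = False
            else:
                self.node = next_node
        return self.conforms


# Hypothetical allowed behaviour: <a, b, c> and <a, c>.
root = build_trie([["a", "b", "c"], ["a", "c"]])
checker = PrefixChecker(root)
for event in ["a", "b", "c"]:
    checker.observe(event)
# The running case conforms and matches a complete allowed trace.
assert checker.conforms and checker.node.is_complete
```

Each incoming event costs a single dictionary lookup, which is what makes the structure attractive for low-latency checking. A case that deviates is simply flagged here, whereas the algorithms listed above go further and handle incomplete, evolving, and out-of-order traces.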
Master
University of Tartu
Co-Supervised with Mohamed Ragab
Large RDF Graph Processing on top of Spark
Name: Sadig Eyvazov
Year: 2020
Duration: 12 months
Institution: University of Tartu
Project: Large RDF Graph Processing on top of Spark
Abstract: In recent years, we have witnessed an uncontrollable growth of data generated by machines or humans… (full abstract omitted for brevity)
University of Tartu
Quarser a Graph-Aware JSON-LD Parser
Name: Carlos Ramos
Year: 2020
Duration: 12 months
Institution: University of Tartu
Project: Quarser a Graph-Aware JSON-LD Parser
Abstract: The continuous growth of the Web of Data has fueled the interest of performing analytical operations over Knowledge Graphs (KGs)… (full abstract omitted for brevity)
Politecnico di Milano
Co-Supervised with Emanuele Della Valle
An Infrastructural View of Cascading Stream Reasoning Using Micro-Services
Name: Philippe Scorsolini
Year: 2018
Duration: 12 months
Institution: Politecnico di Milano
Project: An Infrastructural View of Cascading Stream Reasoning Using Micro-Services
Abstract: In the Web of Data, in which Data are increasing in Volume, Variety, and Velocity… (full abstract omitted for brevity)
Politecnico di Milano
Co-Supervised with Emanuele Della Valle
Towards Observability with (RDF) Trace Stream Processing
Name: Mario Scrocca
Year: 2017
Duration: 12 months
Institution: Politecnico di Milano
Project: Towards Observability with (RDF) Trace Stream Processing
Abstract: Distributed software systems and cloud-based micro-service solutions are getting momentum… (full abstract omitted for brevity)
Politecnico di Milano
Co-Supervised with Emanuele Della Valle
Towards streaming data on the Web - vocabularies, catalogs and applications
Name: Yehia Abosedira
Year: 2017
Duration: 12 months
Institution: Politecnico di Milano
Project: Towards streaming data on the Web - vocabularies, catalogs and applications
Abstract: The data on the web are continuously evolving. The popularity of social media, internet of things… (full abstract omitted for brevity)
Bachelor
University of Tartu
Contextual captioning of internet meme templates
Name: Tarmo Pungas
Year: 2022
Duration: 6 months
Institution: University of Tartu
Project: Contextual captioning of internet meme templates
University of Tartu
Towards a Knowledge Graph of Internet Memes
Name: Tõnis Hendrik Hlebnikov
Year: 2021
Duration: 6 months
Institution: University of Tartu
Project: Towards a Knowledge Graph of Internet Memes
Abstract: In this thesis, the notion of considering internet memes as rich units of information is presented… (full abstract omitted for brevity)
University of Tartu
WebSourcing - Connecting Kafka to the Web
Name: Jonathan Karu
Year: 2020
Duration: 6 months
Institution: University of Tartu
Project: WebSourcing - Connecting Kafka to the Web
Abstract: Apache Kafka is an open-source stream-processing framework which is quickly becoming an industry standard… (full abstract omitted for brevity)
University of Tartu
Kypher - Towards Continuous OpenCypher
Name: Fred Boldin
Year: 2020
Duration: 6 months
Institution: University of Tartu
Project: Kypher - Towards Continuous OpenCypher
Abstract: With the surge of data caused by the creation of the internet, the internet of things, growth of computing power and storage… (full abstract omitted for brevity)
PFE
Hubspot
Confidential
From 2024-02-06 to 2024-06-28
Onepoint
Confidential
From 2024-02-05 to 2024-08-09
ARHIS COMPETENCY CENTER
Confidential
From 2024-04-15 to 2024-09-13
Onepoint
Confidential
From 2024-02-19 to 2024-08-16
Descartes Underwriting
Confidential
From 2024-04-08 to 2024-09-27
Université Claude Bernard Lyon 1
Causal Graphs to Predict and Explain Air Pollution Patterns
From 2024-03-11 to 2024-07-26
Datadog Paris
Confidential
From 2023-02-20 to 2023-08-18
Datadog France SAS
Confidential
From 2022-02-28 to 2022-08-05
PSAT
INSA Lyon
LikeWines: Predictive and Comparative Analysis of Wine Quality
INSA Lyon