Results of KROWN: Knowledge Graph Construction Benchmark

  1. Van Assche, Dylan (1, 2, 3)
  2. Chaves-Fraga, David (4)
  3. Dimou, Anastasia (5)
  1. IDLab
  2. Ghent University, Ghent, Belgium (ROR: https://ror.org/00cv9y106)
  3. IMEC
  4. Universidade de Santiago de Compostela
  5. KU Leuven, Leuven, Belgium (ROR: https://ror.org/05f950310)

Publisher: Zenodo

Publication year: 2024

Type: Dataset

License: CC BY 4.0

Abstract

In this Zenodo repository, we present the results of using KROWN to benchmark popular RDF graph materialization systems such as RMLMapper, RMLStreamer, Morph-KGC, SDM-RDFizer, and Ontop (in materialization mode).

What is KROWN 👑?

KROWN 👑 is a benchmark for materialization systems that construct Knowledge Graphs from (semi-)heterogeneous data sources using declarative mappings such as RML. Many benchmarks already exist for virtualization systems, e.g., GTFS-Madrid-Bench, NPD, and BSBM, which focus on complex queries over a single declarative mapping. However, materialization systems are not affected by complex queries, since their input consists only of the dataset and the mappings used to generate a Knowledge Graph. Some specialized datasets, e.g., GENOMICS, benchmark specific limitations of materialization systems, such as duplicated or empty values in the data, but they do not cover all aspects of materialization systems. Therefore, it is hard to compare materialization systems with each other in general, which is where KROWN 👑 comes in!

Results

The raw results are available as ZIP archives; the analysis of the results is available in the spreadsheet results.ods.

Evaluation setup

We generated several scenarios using KROWN's data generator and executed each of them 5 times with KROWN's execution framework. All experiments were performed on Ubuntu 22.04 LTS machines (Linux 5.15.0, x86_64), each with an Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz, 48 GB of RAM, and 2 GB of swap memory. The output format of each materialization system was set to N-Triples.

Materialization systems

We selected the most popular maintained materialization systems for constructing RDF graphs to perform our experiments with KROWN:

  1. RMLMapper
  2. RMLStreamer
  3. Morph-KGC
  4. SDM-RDFizer
  5. OntopM (Ontop in materialization mode)

Note: KROWN is flexible and allows adding any other materialization system; see the documentation of KROWN's execution framework for more information.
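The evaluation procedure above (every scenario executed 5 times per system) can be illustrated with a small timing harness. This is a minimal sketch, not KROWN's execution framework, and it only measures wall-clock time. It assumes each materialization system can be invoked as a command-line process. The names `time_runs` and `summarize` are hypothetical, the command below is a placeholder (a no-op Python process standing in for a real system), and reporting the median is our assumption rather than how results.ods aggregates the runs.

```python
import statistics
import subprocess
import sys
import time
from typing import Sequence


def time_runs(command: Sequence[str], runs: int = 5) -> list[float]:
    """Run `command` `runs` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the process exits with a non-zero status,
        # so a failed materialization run is never recorded as a valid timing
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: list[float]) -> dict[str, float]:
    """Aggregate repeated runs; the median is less sensitive to a single slow outlier."""
    return {
        "median_s": statistics.median(durations),
        "mean_s": statistics.mean(durations),
        # stdev needs at least two samples
        "stdev_s": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical placeholder command; replace it with the invocation of an actual system
    placeholder = [sys.executable, "-c", "pass"]
    print(summarize(time_runs(placeholder, runs=5)))
```

Running each repetition as a separate process means every run starts cold, which matches the idea of repeating an execution rather than looping inside a single warmed-up process.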
Scenarios

We consider the following scenarios:

  1. Raw data: number of rows, columns, and cell size.
  2. Duplicates & empty values: percentage of the data containing duplicates or empty values.
  3. Mappings: Triples Maps (TM), Predicate Object Maps (POM), and Named Graph Maps (NG).
  4. Joins: relations (1-N, N-1, N-M), conditions, and duplicates during joins.

Note: KROWN is flexible and allows adding any other scenario; see the documentation of KROWN's data generator for more information.

The table below lists all parameter values we used to configure our scenarios:

| Scenario | Parameter values |
|---|---|
| Raw data: rows | 10K, 100K, 1M, 10M |
| Raw data: columns | 1, 10, 20, 30 |
| Raw data: cell size | 500, 1K, 5K, 10K |
| Duplicates: percentage | 0%, 25%, 50%, 75%, 100% |
| Empty values: percentage | 0%, 25%, 50%, 75%, 100% |
| Mappings: TMs + 5 POMs | 1, 10, 20, 30 TMs |
| Mappings: 20 TMs + POMs | 1, 3, 5, 10 POMs |
| Mappings: NG in Subject Map (SM) | 1, 5, 10, 15 NGs |
| Mappings: NG in POM | 1, 5, 10, 15 NGs |
| Mappings: NG in SM/POM | 1/1, 5/5, 10/10, 15/15 NGs |
| Joins: 1-N relations | 1-1, 1-5, 1-10, 1-15 |
| Joins: N-1 relations | 1-1, 5-1, 10-1, 15-1 |
| Joins: N-M relations | 3-3, 3-5, 5-3, 10-5, 5-10 |
| Joins: join conditions | 1, 5, 10, 15 |
| Joins: join duplicates | 0, 5, 10, 15 |