    Repositories list

    • MatchGPT

      Public
      This repository contains code and extensive prompt examples to reproduce and extend the experiments in our papers "Using ChatGPT for Entity Matching" and "Entity Matching using Large Language Models".
      Jupyter Notebook
      114900 Updated Oct 18, 2024
    • This repository contains code and comprehensive examples to replicate and build upon the experiments presented in our paper “Fine-tuning Large Language Models for Entity Matching”. It provides resources for implementing fine-tuning techniques on large language models specifically for entity matching tasks.
      Jupyter Notebook
      1700 Updated Sep 13, 2024
    • wdc-pave

      Public
      Web Data Commons - Using LLMs for Product Attribute Value Extraction and Normalization
      Python
      1800 Updated Jul 3, 2024
    • wdc-page

      Public
      This repository contains the source files of the Web Data Commons website and is used to maintain the site. The Web Data Commons project extracts structured data from the Common Crawl.
      HTML
      1100 Updated Jul 3, 2024
    • SC-Block

      Public
      SC-Block is a supervised contrastive blocking method that combines supervised contrastive learning, which positions records in an embedding space, with nearest-neighbour search for building the candidate set.
      Python
      BSD 3-Clause "New" or "Revised" License
      2800 Updated Jun 10, 2024
    • wdc-sotab

      Public
      Jupyter Notebook
      0410 Updated Jun 3, 2024
    • TabAnnGPT

      Public
      This repository contains the code for the experiments run in the papers "Column Type Annotation using ChatGPT" and "Column Property Annotation using Large Language Models".
      Jupyter Notebook
      2900 Updated May 28, 2024
    • Attribute Value Extraction using Large Language Models
      Python
      Apache License 2.0
      72200 Updated May 24, 2024
    • wdc-smb

      Public
      This repository contains the code and data download links to reproduce building the WDC SMB Benchmark.
      BSD 3-Clause "New" or "Revised" License
      0000 Updated Dec 11, 2023
    • Product Information Extraction using ChatGPT
      Jupyter Notebook
      0200 Updated Oct 4, 2023
    • This repository contains code and data download scripts for the paper "Using schema.org annotations for training and maintaining product matchers" by Ralph Peeters, Anna Primpeli, Benedikt Wichtlhuber and Christian Bizer.
      Jupyter Notebook
      BSD 3-Clause "New" or "Revised" License
      31510 Updated Aug 29, 2023
    • This repository contains the code and data download links to reproduce building the WDC Products Benchmark.
      Python
      BSD 3-Clause "New" or "Revised" License
      31200 Updated Jul 13, 2023
    • Jupyter notebooks used to create the schema.org subsets from the MD and JSON-LD corpus for the WDC 2020 structured data extraction.
      Python
      1300 Updated Feb 28, 2023
    • Java framework used by the Web Data Commons project to extract Microdata, Microformats and RDFa data, Web graphs, and HTML tables from the web crawls provided by the Common Crawl Foundation.
      Java
      1800 Updated Dec 13, 2022
    • This repository contains code and data download scripts for the paper "Intermediate Training of BERT for Product Matching" by Ralph Peeters, Christian Bizer and Goran Glavaš.
      Python
      BSD 3-Clause "New" or "Revised" License
      1135020 Updated Dec 8, 2022
    • Java project for profiling the results of the yearly Web Data Commons extraction of structured data with RDFa, Microdata, Microformats, and embedded JSON-LD annotations.
      Java
      0000 Updated Oct 17, 2022
    • winter

      Public
      WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.
      Java
      Apache License 2.0
      32805 Updated Sep 1, 2022
    • Unsupervised Bootstrapping of Active Learning for Entity Resolution
      Jupyter Notebook
      1601 Updated Mar 25, 2022
    • Code for profiling entity matching tasks using the dimensions described in the following paper: Primpeli, Anna, and Christian Bizer. "Profiling entity matching benchmark tasks." Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2020.
      Python
      0600 Updated Mar 8, 2022
    • This repository contains the code to reproduce the experiments of the poster "Supervised Contrastive Learning for Product Matching".
      Python
      BSD 3-Clause "New" or "Revised" License
      143640 Updated Feb 11, 2022
    • This repository contains the code and data for reproducing the results of the paper "Active Learning for Multi-Source Entity Matching: How do the Characteristics of the Task Impact Performance?".
      Python
      1200 Updated Nov 30, 2021
    • ALMSER-GB

      Public
      This repository contains the code and data for reproducing the results of the paper "Graph-boosted Active Learning for Multi-Source Entity Resolution" presented at ISWC2021.
      Jupyter Notebook
      2400 Updated Oct 1, 2021
    • Code and data to reproduce the results of Stephan Waitz's master's thesis "Combining Deep Learning and Active Learning for Entity Resolution".
      HTML
      2000 Updated Sep 2, 2021
    • jointbert

      Public
      This repository contains the code and data download links to reproduce the experiments of the PVLDB paper "Dual-Objective Fine-Tuning of BERT for Entity Matching" by Ralph Peeters and Christian Bizer.
      Python
      BSD 3-Clause "New" or "Revised" License
      61400 Updated Jun 7, 2021
    • This repository contains the code and data download links to reproduce the building process of the 2021 Schema.org Table Corpus.
      Python
      BSD 3-Clause "New" or "Revised" License
      2300 Updated May 12, 2021
    • A Search Join is a join operation that extends a user-provided table with additional attributes drawn from a large corpus of heterogeneous data originating from the Web or corporate intranets.
      Java
      3100 Updated May 11, 2021
    • This repository contains code and data download instructions for the workshop paper "Improving Hierarchical Product Classification using Domain-specific Language Modelling" by Alexander Brinkmann and Christian Bizer.
      Python
      MIT License
      21700 Updated Apr 30, 2021