Publications

Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation

Published in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2024

Abstract

The goal of screening prioritisation in systematic reviews is to identify relevant documents with high recall and rank them in early positions for review. This saves reviewing effort if paired with a stopping criterion, and speeds up review completion if performed alongside downstream tasks. Recent studies have shown that neural models show good potential for this task, but their time-consuming fine-tuning and inference discourage their widespread use for screening prioritisation. In this paper, we propose an alternative approach that still relies on neural models, but leverages dense representations and relevance feedback to enhance screening prioritisation, without the need for costly model fine-tuning and inference. This method exploits continuous relevance feedback from reviewers during document screening to efficiently update the dense query representation, which is then applied to rank the remaining documents to be screened. We evaluate this approach across the CLEF TAR datasets for this task. Results suggest that the investigated dense query-driven approach is more efficient than directly using neural models and shows promising effectiveness compared to previous methods developed on the considered datasets. Our code is available at https://github.com/ielab/dense-screening-feedback.
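The abstract does not spell out the exact update rule, but a common instantiation of "updating the dense query representation from relevance feedback" is a Rocchio-style update over document embeddings. The sketch below illustrates that idea; the function names and hyperparameters (`alpha`, `beta`, `gamma`) are illustrative, not the paper's actual implementation:

```python
import numpy as np

def update_query(query_emb, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style update of a dense query embedding from screening feedback.

    query_emb:    (d,) current query vector
    relevant:     (n_r, d) embeddings of documents judged relevant so far
    nonrelevant:  (n_n, d) embeddings of documents judged non-relevant so far
    """
    new_q = alpha * query_emb
    if len(relevant):
        new_q = new_q + beta * relevant.mean(axis=0)
    if len(nonrelevant):
        new_q = new_q - gamma * nonrelevant.mean(axis=0)
    return new_q

def rank_remaining(query_emb, doc_embs):
    # Score unscreened documents by dot product and return them best-first.
    scores = doc_embs @ query_emb
    return np.argsort(-scores)
```

Because the update is a couple of vector operations over pre-computed embeddings, re-ranking after each batch of judgements is cheap compared with re-running neural inference, which is the efficiency argument the abstract makes.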

Recommended citation: Xinyu Mao, Shengyao Zhuang, Bevan Koopman and Guido Zuccon. 2024. Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024). https://arxiv.org/pdf/2407.00635

A Reproducibility Study of Goldilocks: Just-Right Tuning of BERT for TAR

Published in Proceedings of the 46th European Conference on Information Retrieval (ECIR), 2024

Abstract

Screening documents is a tedious and time-consuming aspect of high-recall retrieval tasks, such as compiling a systematic literature review, where the goal is to identify all relevant documents for a topic. To help streamline this process, many Technology-Assisted Review (TAR) methods leverage active learning techniques to reduce the number of documents requiring review. BERT-based models have shown high effectiveness in text classification, leading to interest in their potential use in TAR workflows. In this paper, we investigate recent work that examined the impact of further pre-training epochs on the effectiveness and efficiency of a BERT-based active learning pipeline. We first report that we could replicate the original experiments on two specific TAR datasets, confirming some of the findings: importantly, that further pre-training is critical to high effectiveness, but that the number of further pre-training epochs must be selected carefully. We then investigate the generalisability of the pipeline on a different TAR task, that of medical systematic reviews. In this context, we show that there is no need for further pre-training if a domain-specific BERT backbone is used within the active learning pipeline. This finding provides practical implications for using the studied active learning pipeline within domain-specific TAR tasks.
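The active learning loop underlying such TAR pipelines can be sketched as follows: repeatedly rank the unlabelled pool with a relevance model trained on the judgements so far, send the top batch for review, and fold the new labels back in. In this sketch a nearest-centroid scorer over fixed embeddings stands in for the fine-tuned BERT classifier of the studied pipeline; the function name and batching scheme are illustrative:

```python
import numpy as np

def tar_active_learning(doc_embs, oracle_labels, seed_idx, batch_size=2, n_rounds=3):
    """Minimal continuous-active-learning loop for TAR.

    Each round, a simple relevance scorer (here: dot product with the
    centroid of documents labelled relevant, standing in for a trained
    classifier) ranks the unlabelled pool, and the top batch is "reviewed"
    (labels read from oracle_labels). Returns documents in review order.
    """
    review_order = list(seed_idx)
    labelled = set(seed_idx)
    for _ in range(n_rounds):
        pos = [i for i in labelled if oracle_labels[i] == 1]
        unlabelled = [i for i in range(len(doc_embs)) if i not in labelled]
        if not pos or not unlabelled:
            break
        centroid = doc_embs[pos].mean(axis=0)
        scores = doc_embs[unlabelled] @ centroid
        batch = [unlabelled[j] for j in np.argsort(-scores)[:batch_size]]
        review_order.extend(batch)  # reviewer judges this batch next
        labelled.update(batch)
    return review_order
```

In the pipeline the paper studies, the per-round scorer is a BERT classifier re-trained on the accumulated labels, which is why the cost of (further) pre-training matters so much to the overall workflow.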

Recommended citation: Xinyu Mao, Bevan Koopman and Guido Zuccon. 2024. A Reproducibility Study of Goldilocks: Just-Right Tuning of BERT for TAR. In Proceedings of the 46th European Conference on Information Retrieval (ECIR 2024). https://arxiv.org/pdf/2401.08104

Robustness of Neural Rankers to Typos: A Comparative Study

Published in Proceedings of the 26th Australasian Document Computing Symposium (ADCS), 2022

Abstract

Recent advances in passage retrieval have seen the introduction of pre-trained language models (PLMs) based neural rankers. While generally very effective, little attention has been paid to the robustness of these rankers. In this paper, we study the effectiveness of state-of-the-art PLM rankers in the presence of typos in queries, as an indication of the rankers’ robustness. As PLM rankers, we consider the two most promising directions explored in previous work: dense retrievers and sparse retrievers. We find that both types of rankers are very sensitive to queries with typos. We then apply an existing augmentation-based typo-aware training technique with the aim of creating typo-robust dense and sparse retrievers. We find that this simple technique only works for dense retrievers, while it hurts effectiveness when used on sparse retrievers.
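Augmentation-based typo-aware training works by injecting synthetic typos into training queries so the ranker sees noisy inputs during fine-tuning. A minimal sketch of such a typo generator is below; the specific edit operations and the function name are illustrative, not the exact scheme used in the paper:

```python
import random

def typo_variant(query, rng):
    """Produce one typo'd variant of a query by applying a random character
    edit (deletion, swap, or substitution) to a randomly chosen word."""
    words = query.split()
    i = rng.randrange(len(words))
    w = words[i]
    if len(w) < 2:  # too short to edit safely
        return query
    op = rng.choice(["delete", "swap", "substitute"])
    j = rng.randrange(len(w) - 1)
    if op == "delete":
        w = w[:j] + w[j + 1:]
    elif op == "swap":
        w = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    else:
        w = w[:j] + rng.choice("abcdefghijklmnopqrstuvwxyz") + w[j + 1:]
    words[i] = w
    return " ".join(words)
```

During training, each query would be replaced by such a variant with some probability, so the model learns representations that are stable under character-level noise.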

Recommended citation: Shengyao Zhuang, Xinyu Mao and Guido Zuccon. 2022. Robustness of Neural Rankers to Typos: A Comparative Study. In Proceedings of the 26th Australasian Document Computing Symposium (ADCS 2022). https://ielab.io/publications/adcs2022-typos/adcs2022-comparative-study.pdf

Preserving the Privacy and Cybersecurity of Home Energy Data

Published in Emerging Trends in Cybersecurity Applications, 2022

Abstract

The field of energy data presents many opportunities for applying the principles of privacy and cybersecurity. In this chapter, we focus on home electricity data and the possible use and misuse of this data for attacks and corresponding protection mechanisms. If an attacker can deduce sufficiently precise information about a house location and its occupancy at given times, this may present a physical security threat. We review previous literature in this area. We then obtain hourly solar generation data from over 2300 houses and develop an attack to identify the location of the houses using historical weather data. We discuss common use cases of home energy data and suggest defences against the proposed attack using privacy and cryptographic techniques.
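One plausible instantiation of the described attack is to correlate a house's hourly solar generation with historical irradiance at candidate weather stations: the best-matching station approximates the house's location. The sketch below illustrates this idea only; the data layout and station names are hypothetical, not the chapter's actual method:

```python
import numpy as np

def locate_house(generation, station_irradiance):
    """Guess a house's location from its solar generation trace.

    generation:          (T,) hourly solar generation for one house
    station_irradiance:  dict mapping station name -> (T,) hourly irradiance

    Returns the station whose irradiance correlates most strongly with the
    house's generation, together with the correlation coefficient.
    """
    best_name, best_r = None, -2.0
    for name, irradiance in station_irradiance.items():
        r = np.corrcoef(generation, irradiance)[0, 1]
        if r > best_r:
            best_name, best_r = name, r
    return best_name, best_r
```

This also illustrates why the defences the chapter discusses (e.g. perturbing or aggregating published energy data) matter: even coarse hourly traces carry a location fingerprint.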

Recommended citation: Richard Bean, Yanjun Zhang, Ryan KL Ko, Xinyu Mao and Guangdong Bai. 2023. Preserving the Privacy and Cybersecurity of Home Energy Data. Emerging Trends in Cybersecurity Applications. Springer. https://pure.rug.nl/ws/portalfiles/portal/563519192/978_3_031_09640_2.pdf#page=328