Sitemap
A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.
Pages
Posts
Future Blog Post
Published:
This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.
Blog Post number 4
Published:
This is a sample blog post. Lorem ipsum… I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 3
Published:
This is a sample blog post. Lorem ipsum… I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 2
Published:
This is a sample blog post. Lorem ipsum… I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 1
Published:
This is a sample blog post. Lorem ipsum… I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Awards
Dean’s commendation for academic excellence
For academic achievement in Semester 2, 2022, The University of Queensland
ADCS Travel Grant
$515 Travel Grant for ADCS 2022, Adelaide, Australia
UQ Earmarked PhD Scholarship
UQ Earmarked Scholarship covering tuition fees and a living stipend
Best Student Paper Award
Best Student Paper Award at ADCS 2022 for the paper “Robustness of Neural Rankers to Typos: A Comparative Study”
Publications
Preserving the Privacy and Cybersecurity of Home Energy Data
Published in Emerging Trends in Cybersecurity Applications, 2022
Abstract
The field of energy data presents many opportunities for applying the principles of privacy and cybersecurity. In this chapter, we focus on home electricity data and the possible use and misuse of this data for attacks and corresponding protection mechanisms. If an attacker can deduce sufficiently precise information about a house’s location and its occupancy at given times, this may present a physical security threat. We review previous literature in this area. We then obtain hourly solar generation data from over 2300 houses and develop an attack to identify the location of the houses using historical weather data. We discuss common use cases of home energy data and suggest defences against the proposed attack using privacy and cryptographic techniques.
Recommended citation: Richard Bean, Yanjun Zhang, Ryan KL Ko, Xinyu Mao and Guangdong Bai. 2023. Preserving the Privacy and Cybersecurity of Home Energy Data. Emerging Trends in Cybersecurity Applications. Springer. https://pure.rug.nl/ws/portalfiles/portal/563519192/978_3_031_09640_2.pdf#page=328
Robustness of Neural Rankers to Typos: A Comparative Study
Published in Proceedings of the 26th Australasian Document Computing Symposium (ADCS), 2022
Abstract
Recent advances in passage retrieval have seen the introduction of neural rankers based on pre-trained language models (PLMs). While generally very effective, little attention has been paid to the robustness of these rankers. In this paper, we study the effectiveness of state-of-the-art PLM rankers in the presence of typos in queries, as an indication of the rankers’ robustness. As PLM rankers, we consider the two most promising directions explored in previous work: dense retrievers vs. sparse retrievers. We find that both types of rankers are very sensitive to queries with typos. We then apply an existing augmentation-based typo-aware training technique with the aim of creating typo-robust dense and sparse retrievers. We find that this simple technique only works for dense retrievers, while it hurts effectiveness when used on sparse retrievers.
Recommended citation: Shengyao Zhuang, Xinyu Mao and Guido Zuccon. 2022. Robustness of Neural Rankers to Typos: A Comparative Study. In Proceedings of the 26th Australasian Document Computing Symposium (ADCS 2022). https://ielab.io/publications/adcs2022-typos/adcs2022-comparative-study.pdf
A Reproducibility Study of Goldilocks: Just-Right Tuning of BERT for TAR
Published in Proceedings of the 46th European Conference on Information Retrieval (ECIR), 2024
Abstract
Screening documents is a tedious and time-consuming aspect of high-recall retrieval tasks, such as compiling a systematic literature review, where the goal is to identify all relevant documents for a topic. To help streamline this process, many Technology-Assisted Review (TAR) methods leverage active learning techniques to reduce the number of documents requiring review. BERT-based models have shown high effectiveness in text classification, leading to interest in their potential use in TAR workflows. In this paper, we investigate recent work that examined the impact of further pre-training epochs on the effectiveness and efficiency of a BERT-based active learning pipeline. We first report that we could replicate the original experiments on two specific TAR datasets, confirming some of the findings: importantly, that further pre-training is critical to high effectiveness, but requires attention in terms of selecting the correct training epoch. We then investigate the generalisability of the pipeline on a different TAR task, that of medical systematic reviews. In this context, we show that there is no need for further pre-training if a domain-specific BERT backbone is used within the active learning pipeline. This finding provides practical implications for using the studied active learning pipeline within domain-specific TAR tasks.
Recommended citation: Xinyu Mao, Bevan Koopman and Guido Zuccon. 2024. A Reproducibility Study of Goldilocks: Just-Right Tuning of BERT for TAR. In Proceedings of the 46th European Conference on Information Retrieval (ECIR 2024). https://arxiv.org/pdf/2401.08104
Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation
Published in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2024
Abstract
The goal of screening prioritisation in systematic reviews is to identify relevant documents with high recall and rank them in early positions for review. This saves reviewing effort if paired with a stopping criterion, and speeds up review completion if performed alongside downstream tasks. Recent studies have shown that neural models have good potential on this task, but their time-consuming fine-tuning and inference discourage their widespread use for screening prioritisation. In this paper, we propose an alternative approach that still relies on neural models, but leverages dense representations and relevance feedback to enhance screening prioritisation, without the need for costly model fine-tuning and inference. This method exploits continuous relevance feedback from reviewers during document screening to efficiently update the dense query representation, which is then applied to rank the remaining documents to be screened. We evaluate this approach across the CLEF TAR datasets for this task. Results suggest that the investigated dense query-driven approach is more efficient than directly using neural models and shows promising effectiveness compared to previous methods developed on the considered datasets. Our code is available at https://github.com/ielab/dense-screening-feedback.
Recommended citation: Xinyu Mao, Shengyao Zhuang, Bevan Koopman and Guido Zuccon. 2024. Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024). https://arxiv.org/pdf/2407.00635
Teaching
DATA7001 Introduction to Data Science
Postgraduate course, The University of Queensland, 2023
Check course profile
INFS7410 Information Retrieval and Web Search
Postgraduate course, The University of Queensland, 2023
Check course profile
INFS7410 Information Retrieval and Web Search
Postgraduate course, The University of Queensland, 2024
Check course profile