Publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
- Ambiguity-Aware Abductive Learning. Hao-Yuan He, Hui Sun, Zheng Xie, and Ming Li. In Proceedings of The 41st International Conference on Machine Learning, Vienna, Austria, 2024
Abductive Learning (ABL) is a promising framework for integrating sub-symbolic perception and logical reasoning through abduction. In this framework, the abduction process provides supervision for the perception model from the background knowledge. Nevertheless, this process naturally contains uncertainty, since the knowledge base may be satisfied by numerous potential candidates. This implies that the result of the abduction process, i.e., a set of candidates, is ambiguous: both correct and incorrect candidates are mixed in this set. Prior abductive learning methods select the candidate with minimal inconsistency with the knowledge base. However, this overlooks the ambiguity in the abduction process and is prone to error when it fails to identify the correct candidates. To address this, we propose Ambiguity-Aware Abductive Learning (A3BL), which evaluates all potential candidates and their probabilities, thus preventing the model from falling into sub-optimal solutions. Both experimental results and theoretical analyses prove that A3BL markedly enhances ABL by efficiently exploiting the ambiguous abduced supervision.
@inproceedings{he2024a3bl,
  address   = {Vienna, Austria},
  author    = {He, Hao-Yuan and Sun, Hui and Xie, Zheng and Li, Ming},
  booktitle = {Proceedings of The 41st International Conference on Machine Learning},
  cat       = {Neuro-Symbolic Learning},
  pages     = {18019--18042},
  publisher = {PMLR},
  title     = {Ambiguity-Aware Abductive Learning},
  volume    = {235},
  year      = {2024}
}
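The core idea in the abstract above can be sketched in a few lines: rather than training on the single abduced candidate with minimal inconsistency, weight all candidates in the ambiguous set by the perception model's own probabilities. This is a minimal illustration of that idea only; the function and variable names are illustrative, not from the paper's code.

```python
import math

def nll(probs, label):
    """Negative log-likelihood of one candidate label under the model."""
    return -math.log(probs[label])

def a3bl_loss(probs, candidates):
    """Expected loss over all abduced candidates, weighted by model probability.

    probs      : the model's class-probability vector for one instance
    candidates : labels that satisfy the knowledge base (the ambiguous set)
    """
    # Renormalise the model's probability mass over the candidate set.
    mass = sum(probs[c] for c in candidates)
    weights = [probs[c] / mass for c in candidates]
    return sum(w * nll(probs, c) for w, c in zip(weights, candidates))

def min_inconsistency_loss(probs, candidates):
    """Prior-art style baseline: train only on the single most likely candidate."""
    best = max(candidates, key=lambda c: probs[c])
    return nll(probs, best)
```

When the candidate set is a singleton the two losses coincide; when it is ambiguous, the weighted loss keeps gradient signal flowing to every candidate the knowledge base admits instead of committing to one.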
- Probabilistic Instance-Dependent Label Refinement for Noisy Label Learning. Hao-Yuan He, Yu Liu, Ren-Biao Liu, Zheng Xie, and Ming Li. Machine Learning, Forthcoming
Label refinement methods are designed to improve the quality of training labels by incorporating model predictions into the original training labels. By adjusting the combination coefficient of the noisy label, the impact of noise is reduced, which in turn makes the training process more robust. However, previous label refinement methods are unable to model instance-dependent noise, which is the most realistic type of noise. To address this limitation, we propose a simple approach, probabilistic instance-dependent label refinement (denoted as π-LR). Inspired by the fact that humans are more likely to make mistakes when annotating confusing instances, we propose to estimate the probability of whether a sample is confusing, which can be beneficial for modeling noise generation. Our approach utilizes this concept by assigning a confusing probability η_i to each instance x_i from a probabilistic perspective. This provides a clear understanding of how instance-dependent noise affects true labels. Empirical evaluations show that π-LR enhances the robustness of the model in the presence of label noise and outperforms all compared methods on both realistic and synthetic label noise datasets.
@article{he2024piLR,
  author  = {He, Hao-Yuan and Liu, Yu and Liu, Ren-Biao and Xie, Zheng and Li, Ming},
  cat     = {Weakly Supervised Learning},
  journal = {Machine Learning},
  title   = {Probabilistic Instance-Dependent Label Refinement for Noisy Label Learning},
  year    = {Forthcoming}
}
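The refinement step described in the abstract above amounts to an instance-wise interpolation between the given (possibly noisy) label and the model's prediction, governed by the estimated confusing probability η_i. A minimal sketch of that interpolation, with illustrative names (the paper's actual estimation of η_i is more involved):

```python
def refine_label(noisy_onehot, model_probs, eta):
    """Blend a noisy label with the model's prediction for one instance.

    noisy_onehot : the given (possibly corrupted) one-hot label
    model_probs  : the model's predicted class distribution
    eta          : estimated probability that this instance is confusing;
                   eta near 0 trusts the given label, eta near 1 trusts the model
    """
    return [(1.0 - eta) * y + eta * p for y, p in zip(noisy_onehot, model_probs)]
```

Because both inputs are distributions, the refined target remains a valid distribution for any eta in [0, 1].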
- Reduced Implication-bias Logic Loss for Neuro-Symbolic Learning. Hao-Yuan He, Wang-Zhou Dai, and Ming Li. Machine Learning, 2024
Integrating logical reasoning and machine learning by approximating logical inference with differentiable operators is a widely used technique in the field of Neuro-Symbolic Learning. However, some differentiable operators could introduce significant biases during backpropagation, which can degrade the performance of Neuro-Symbolic systems. In this paper, we demonstrate that the loss functions derived from fuzzy logic operators commonly exhibit a bias, referred to as Implication Bias. To mitigate this bias, we propose a simple yet efficient method to transform the biased loss functions into Reduced Implication-bias Logic Loss (RILL). Empirical studies demonstrate that RILL outperforms the biased logic loss functions, especially when the knowledge base is incomplete or the supervised training data is insufficient.
@article{he2024RILL,
  author    = {He, Hao-Yuan and Dai, Wang-Zhou and Li, Ming},
  cat       = {Neuro-Symbolic Learning},
  doi       = {10.1007/s10994-023-06436-4},
  journal   = {Machine Learning},
  pages     = {3357--3377},
  publisher = {Springer},
  title     = {Reduced Implication-bias Logic Loss for Neuro-Symbolic Learning},
  volume    = {113},
  year      = {2024}
}
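The implication bias discussed in the abstract above can be seen numerically with one common differentiable operator, the Reichenbach fuzzy implication I(a, b) = 1 - a + a·b (an illustrative choice here, not necessarily the one analysed in the paper):

```python
def implication(a, b):
    """Reichenbach fuzzy implication for truth degrees a, b in [0, 1]."""
    return 1.0 - a + a * b

def logic_loss(a, b):
    """Loss encouraging the rule a -> b to hold; simplifies to a * (1 - b)."""
    return 1.0 - implication(a, b)

# The bias: the loss vanishes whenever the antecedent a is 0, so gradient
# descent can "satisfy" a rule vacuously by driving the body to false
# instead of learning to make the head b true.
```

This is exactly the failure mode the abstract points at: the biased loss rewards never firing the rule, which is especially harmful when the knowledge base or the supervision is already sparse.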
- Weakly Supervised AUC Optimization: A Unified Partial AUC Approach. Zheng Xie, Yu Liu, Hao-Yuan He, Ming Li, and Zhi-Hua Zhou. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
Since acquiring perfect supervision is usually difficult, real-world machine learning tasks often confront inaccurate, incomplete, or inexact supervision, collectively referred to as weak supervision. In this work, we present WSAUC, a unified framework for weakly supervised AUC optimization problems, which covers noisy label learning, positive-unlabeled learning, multi-instance learning, and semi-supervised learning scenarios. Within the WSAUC framework, we first frame the AUC optimization problems in various weakly supervised scenarios as a common formulation of minimizing the AUC risk on contaminated sets, and demonstrate that the empirical risk minimization problems are consistent with the true AUC. Then, we introduce a new type of partial AUC, specifically, the reversed partial AUC (rpAUC), which serves as a robust training objective for AUC maximization in the presence of contaminated labels. WSAUC offers a universal solution for AUC optimization in various weakly supervised scenarios by maximizing the empirical rpAUC. Theoretical and experimental results under multiple settings support the effectiveness of WSAUC on a range of weakly supervised AUC optimization tasks.
@article{xie2024weakly,
  address   = {Los Alamitos, CA, USA},
  author    = {Xie, Zheng and Liu, Yu and He, Hao-Yuan and Li, Ming and Zhou, Zhi-Hua},
  cat       = {Weakly Supervised Learning},
  doi       = {10.1109/TPAMI.2024.3357814},
  issn      = {1939-3539},
  journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  number    = {7},
  pages     = {4780--4795},
  publisher = {IEEE Computer Society},
  title     = {Weakly Supervised AUC Optimization: A Unified Partial AUC Approach},
  volume    = {46},
  year      = {2024}
}
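The base quantity the WSAUC framework optimises is the pairwise empirical AUC: the fraction of (positive, negative) score pairs ranked correctly. The paper's rpAUC restricts this quantity to make it robust to contaminated labels; the construction below shows only the standard empirical AUC it builds on, with illustrative names.

```python
def empirical_auc(pos_scores, neg_scores):
    """Fraction of correctly ordered (positive, negative) score pairs.

    Ties count as half-correct, matching the usual AUC convention.
    """
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return correct / (len(pos_scores) * len(neg_scores))
```

Framing AUC optimization as minimizing the risk of misordered pairs is what lets the various weakly supervised scenarios (noisy, positive-unlabeled, multi-instance, semi-supervised) share one formulation over contaminated sets.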
- A Learnability Analysis on Neuro-Symbolic Learning. Hao-Yuan He and Ming Li. Manuscript, 2025. Under review
This paper analyzes the learnability of neuro-symbolic (NeSy) tasks within hybrid systems. We show that the learnability of NeSy tasks can be characterized by their derived constraint satisfaction problems (DCSPs). Specifically, a task is learnable if the corresponding DCSP has a unique solution; otherwise, it is unlearnable. For learnable tasks, we establish error bounds by exploiting the clustering property of the hypothesis space. Additionally, we analyze the asymptotic error for general NeSy tasks, showing that the expected error scales with the disagreement among solutions. Our results offer a principled approach to determining learnability and provide insights into the design of new algorithms.
@article{he2025AnalysisOnNeSy,
  author  = {He, Hao-Yuan and Li, Ming},
  cat     = {Neuro-Symbolic Learning},
  journal = {Manuscript},
  note    = {Under review},
  title   = {A Learnability Analysis on Neuro-Symbolic Learning},
  year    = {2025}
}
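The learnability criterion in the abstract above — a NeSy task is learnable iff its derived constraint satisfaction problem has a unique solution — can be illustrated on a toy scale by brute-force enumeration. The DCSP below is a made-up example for illustration, not a construction from the paper.

```python
from itertools import product

def count_solutions(domains, constraint):
    """Count assignments (one value per variable) satisfying `constraint`."""
    return sum(1 for assignment in product(*domains) if constraint(assignment))

def is_learnable(domains, constraint):
    """Toy version of the criterion: learnable iff exactly one solution."""
    return count_solutions(domains, constraint) == 1

# Example: two variables over {0, 1, 2}. The constraint a + b == 0 pins
# down a unique assignment (0, 0), while a + b == 2 admits three
# assignments, so the corresponding task would be unlearnable.
```

With multiple solutions, distinct labelings explain the supervision equally well, which is exactly the disagreement among solutions that the paper's asymptotic error bound scales with.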