References

A

Aha,1991

Aha, D.W., Kibler, D. & Albert, M.K. Instance-based learning algorithms. Mach Learn 6, 37–66 (1991). https://doi.org/10.1007/BF00153759

Amarel,1960s

S. Amarel, "Program Synthesis as a Theory Formation Task: Problem Representations and Solution Methods" in Machine Learning: An Artificial Intelligence Approach: Volume II, Morgan Kaufmann, pp. 499-569, 1986. This paper, written in the 1960s, was reprinted in this collection.

C

Chapelle,2006

Chapelle, Olivier, Bernhard Schölkopf, and Alexander Zien (eds.). Semi-Supervised Learning. MIT Press, 2006.

Columbia,2003

Columbia Accident Investigation Board (CAIB). (2003). Report Volume I. National Aeronautics and Space Administration.

Crawford,1994

J. Crawford and A. Baker, "Experimental Results on the Application of Satisfiability Algorithms to Scheduling Problems", Proc. American Assoc. Artificial Intelligence (AAAI 94), pp. 1092-1097, 1994.

F

Fisher,2012

Danyel Fisher, Rob DeLine, Mary Czerwinski, and Steven Drucker. 2012. Interactions with big data analytics. interactions 19, 3 (May + June 2012), 50–59. https://doi.org/10.1145/2168931.2168943

Fu,2017

Fu, W., & Menzies, T. (2017). Easy over Hard: A Case Study on Deep Learning. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2017). Association for Computing Machinery, New York, NY, USA, 49–60. https://doi.org/10.1145/3106237.3106256

H

Hindle,2012

A. Hindle, E. T. Barr, Z. Su, M. Gabel, and P. Devanbu, “On the naturalness of software,” in Proc. 34th Int. Conf. Softw. Eng. (ICSE), Piscataway, NJ, USA: IEEE Press, 2012, pp. 837–847.

Hou,2024

X. Hou, Y. Zhao, Y. Liu, Z. Yang, K. Wang, et al. 2024. Large Language Models for Software Engineering: A Systematic Literature Review. ACM Trans. Softw. Eng. Methodol. 33, 8 (Sept. 2024).

Hutson,2018

Matthew Hutson. 2018. Artificial intelligence faces reproducibility crisis. Science 359, 725–726 (2018). DOI: 10.1126/science.359.6377.725

J

Johnson,1984

W. B. Johnson and J. Lindenstrauss, “Extensions of Lipschitz mappings into a Hilbert space,” Contemporary Math., vol. 26, pp. 189–206, 1984.

K

Kocaguneli,2013

E. Kocaguneli, T. Menzies, J. Keung, D. Cok, and R. Madachy, “Active learning and effort estimation: Finding the essential content of software effort estimation data,” IEEE Trans. Softw. Eng., vol. 39, no. 8, pp. 1040–1053, Aug. 2013.

Kohavi,1997

R. Kohavi and G. H. John, "Wrappers for Feature Subset Selection", Artificial Intelligence, vol. 97, no. 1–2, pp. 273-324, 1997.

L

Lin,2015

Z. Lin and J. Whitehead, “Why power laws? An explanation from fine-grained code changes,” in Proc. IEEE/ACM 12th Work. Conf. Mining Softw. Repositories, Piscataway, NJ, USA: IEEE Press, 2015, pp. 68–75.

Lustosa,2025

A. Lustosa and T. Menzies. 2025. Less Noise, More Signal: DRR for Better Optimizations of SE Tasks. arXiv:2503.21086. https://arxiv.org/abs/2503.21086

M

Majumder,2018

Majumder, S., Balaji, N., Brey, K., Fu, W., & Menzies, T. (2018). 500+ times faster than deep learning. In Proceedings of the 15th International Conference on Mining Software Repositories. ACM.

Menzies,2007a

T. Menzies, D. Owen, and J. Richardson. 2007. The Strangest Thing About Software. Computer 40, 1 (2007), 54–60.

Menzies,BL

T. Menzies. 2025. BareLogic Python Source Code. https://github.com/timm/barelogic/blob/main/src/bl.py

Menzies,MOOT

T. Menzies. 2025. MOOT = Many multi-objective optimization tests. https://github.com/timm/moot/tree/master/optimize

Menzies,2008

T. Menzies, B. Turhan, A. Bener, G. Gay, B. Cukic, and Y. Jiang, “Implications of ceiling effects in defect predictors,” in Proc. 4th Int. Workshop Predictor Models Softw. Eng., 2008, pp. 47–54.

Menzies,2025a

T. Menzies, "Retrospective: Data Mining Static Code Attributes to Learn Defect Predictors," IEEE Transactions on Software Engineering, vol. 51, no. 3, pp. 858-863, March 2025, doi: 10.1109/TSE.2025.3537406.

N

Nair,2018

Nair, V., Menzies, T., Siegmund, N., & Apel, S. (2018). Using bad learners to find good configurations. Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 448-459.

O

Ostrand,2004

Thomas J. Ostrand, Elaine J. Weyuker, and Robert M. Bell. 2004. Where the bugs are. SIGSOFT Softw. Eng. Notes 29, 4 (July 2004), 86–96. https://doi.org/10.1145/1013886.1007524

P

Pearson,1902

Pearson, K. (1901). On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine, Series 6, 2(11), 559-572. https://doi.org/10.1080/14786440109462720

Peters,2015

F. Peters, T. Menzies, and L. Layman, “LACE2: Better privacy-preserving data sharing for cross project defect prediction,” in Proc. IEEE/ACM 37th IEEE Int. Conf. Softw. Eng., vol. 1, 2015, pp. 801–811.

S

Settles,2009

B. Settles. 2009. Active learning literature survey. Technical Report 1648. University of Wisconsin-Madison, Department of Computer Sciences.

Strubell,2018

Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and Policy Considerations for Deep Learning in NLP. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3645-3650.

T

Tawosi,2023

V. Tawosi, R. Moussa, and F. Sarro. 2023. Agile Effort Estimation: Have We Solved the Problem Yet? IEEE Trans. Softw. Eng. 49, 4 (2023), 2677–2697.

Tu,2022

H. Tu and T. Menzies, “Frugal: unlocking semi-supervised learning for software analytics,” in Proc. ASE, Piscataway, NJ, USA: IEEE Press, 2022, pp. 394–406.

V

Van Engelen,2020

Van Engelen, Jesper E., and Holger H. Hoos. "A survey on semi-supervised learning." Machine Learning 109.2 (2020): 373-440.

W

Williams,2002

R. Williams, C. P. Gomes and B. Selman, "Backdoors to Typical Case Complexity", Proc. IJCAI, 2003, [online] Available: www.cs.cornell.edu/gomes/papers/backdoors.pdf.

X

Xu,2015

Tianyin Xu, Long Jin, Xuepeng Fan, Yuanyuan Zhou, Shankar Pasupathy, and Rukma Talwadker. 2015. Hey, you have given me too many knobs!: understanding and dealing with over-designed configuration in system software. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). Association for Computing Machinery, New York, NY, USA, 307–319. https://doi.org/10.1145/2786805.2786852

Xu,2021

Z. Xu et al., “A comprehensive comparative study of clustering-based unsupervised defect prediction models,” J. Syst. Softw., vol. 172, 2021, Art. no. 110862.

Z

Zeming,2024

L. Zeming et al. 2024. A Survey on Knowledge Distillation for Large Language Models. ACM Trans. Intell. Syst. Technol. (2024).