A supervised machine learning tool to predict the bactericidal efficiency of nanostructured surface

Chen, Yaxi; Chen, Hongyi; Harker, Anthony; Liu, Yuanchang; Huang, Jie

doi:10.1186/s12951-024-02974-8

Research
Open access
Published: 03 December 2024

A supervised machine learning tool to predict the bactericidal efficiency of nanostructured surface

Yaxi Chen¹,
Hongyi Chen²,
Anthony Harker³,
Yuanchang Liu¹ &
…
Jie Huang¹

Journal of Nanobiotechnology volume 22, Article number: 748 (2024) Cite this article

1225 Accesses
1 Altmetric
Metrics details

Abstract

The emergence and rapid spread of multidrug-resistant bacterial strains is a growing concern of public health. Inspired by the natural bactericidal surfaces of lotus leaves and shark skin, increasing attention has been focused on the use of mechano-bactericidal methods to create surfaces with antibacterial and/or bactericidal effects. There have been several studies exploring the bactericidal effect of nanostructured surfaces under various combinations of parameters. However, the correlation and synergies between these factors still need to be clarified. Recently machine learning (ML), which enables prediction or decision-making based on data, has been used in the field of biomaterials with promising results. In this study, we explored ML in nanotechnology to investigate the antimicrobial potential of nanostructured surfaces. A dataset of nanostructured surfaces and their antimicrobial properties was built by extracting the published literature. Based on the literature review and the distribution of our dataset, 70% bactericidal efficiency was selected as a practical benchmark for our classification model that balances stringent bactericidal performance with achievable targets in diverse conditions. Subsequently, we developed an ML classification model, which demonstrated an 81% accuracy in its predictive capability. A regression model was further developed to predict the value of bactericidal efficiency for nanostructured surfaces. Feature importance analysis of the ML models suggested that nanotopographical features have a greater influence on bactericidal properties than material properties, thus providing insight into the principles of the mechano-bactericidal effect of nanostructured surfaces. Overall, this ML model tool could help researchers to effectively select and design the parameters of the surface structure prior to experimentation, thereby improving the timeliness and reducing the number of experiments and the associated costs.

Graphical Abstract

Introduction

In healthcare environments, infections can lead to increased morbidity, mortality, and healthcare costs, highlighting the need for stringent infection control practices [1]. Effective infection control measures are essential to prevent the spread of harmful pathogens and reduce the risk of nosocomial infections [2]. Within the context of bacterial contamination in medical and surgical fields, biofilm formation (surface-attached, structured microbial communities containing bacteria [3]) poses a significant challenge, as it can lead to the destruction of adjacent tissue, poor vascularization, implant loosening, detachment, and even dislocations [4]. These complications often arise during surgical procedures when implants become particularly vulnerable to bacterial contamination from the skin and mucous membranes. This susceptibility allows for the rapid progression of device-associated infections, as planktonic bacteria initially adhere to the implant interface and eventually evolve into biofilms, further exacerbating the problem. The use of coatings on implant surfaces containing antibiotics or other bactericidal agents, such as heavy metals like silver, copper, or zinc, has gained popularity as a method to prevent microbial colonization on implants [5]. However, the emergence of antibiotic-resistant bacteria and the limitations of conventional disinfection methods have led to a growing need for alternative solutions to address bacterial contamination effectively [6].

Over the past decade, researchers have gradually begun to focus on the use of mechano-physical methods to create surfaces with antibacterial and/or bactericidal effects. These mechano-bactericidal surfaces were bioinspired, and they were originally found on lotus leaves, shark skin, gecko skin, cicada wings, and dragonfly wings [7, 8]. The denticles (diamond-shaped scales covering the outer surface) and arrangement on shark skin create a superhydrophobic surface that exhibits good anti-fouling properties by creating a surface that is not conducive to bacterial adhesion. In contrast, the highly ordered array of nanopillars or nanocones of different sizes, heights, and spatial distribution on the wings of the cicada is capable of killing bacteria upon contact, creating a bactericidal effect [5, 9, 10]. It was proposed that the mechano-bactericidal activity of nanostructured surfaces is the result of the mechanical interaction between the bacteria and the nanopatterns, which is governed by surface geometry. This creates the possibility of drug-free bactericidal surfaces [11].

One of the generally accepted theories of the bactericidal mechanism of nanostructured surfaces is a biophysical interaction between the bacterial cell wall and the nanopatterns. The geometry (e.g., diameters, height) and spacing of nanopatterns influence how the cell wall is ruptured: Dense and blunt nanopillars tend to overstretch the membrane of adhered bacteria, whereas sharp ones tend to directly impale the cells [5]. In addition, similar nanostructured surfaces may have different bactericidal efficiencies for different bacteria because the membrane structures of Gram-negative and Gram-positive bacteria are considerably different: Gram-positive bacteria have an inner membrane and a thick peptidoglycan layer (20–100 nm), while Gram-negative bacteria have an outer and inner membrane and an intermediate peptidoglycan layer (only a few nanometres thick). Besides cell wall rupture, an alternative mode of Gram-negative cell wall disruption involves the separation of the cytoplasmic membrane from the cell wall, which may be mediated by adhesive extracellular polymeric substances (EPSs) and triggered by the forces originating from the movement of bacterial cells trapped on the crest of nanopillars [12]. There is now growing evidence that other factors may also be involved in the bactericidal mechanism of nanostructured surfaces. For example, it has been discovered that mechanical injury is not sufficient to kill the bacteria immediately due to the survival of the inner plasma membrane. Instead, such sublethal mechanical injury leads to apoptosis-like death in affected bacteria [10].

To date, these antibacterial and/or bactericidal nanostructured surfaces have been fabricated from a wide range of materials including silicon [13], metals (e.g., titanium [14], gold [15], and ZnO [16]), and polymers [17, 18]. The fabrication methods for these synthetic nanostructured surfaces often allow close control over the parameters that define the nanostructured surfaces (the height, spacing, and diameter of the nanopatterns) and lead to bactericidal efficiencies that frequently exceed those of natural nanostructured surfaces [19]. The key factors affecting mechano-bactericidal properties of the nanostructured surface can be divided into two aspects: [1] Topography: shape and size of nanopattern, surface roughness, and wettability [2, 5] Materials: elasticity, flexibility, hydrophobicity, and lipophilicity. The bactericidal efficiency of nanostructured surfaces varies depending on the specific type of bacteria they are targeting. Additionally, one crucial inquiry concerning the bactericidal properties of nanopatterns involves determining the ideal design parameters that can maximize their bactericidal effectiveness while minimizing any potential negative impacts such as cytotoxicity [20]. However, the factors that determine the precise bactericidal activity of nanostructured surfaces are intricate and remain largely unexplored, giving rise to numerous contentious conclusions. For example, wettability is an important parameter for the early attachment of bacteria to a surface, however, the exact relationship between hydrophobicity and bacterial adhesion remains debatable. It has been shown that E. coli adhesion levels are highest on moderately hydrophobic surfaces (WCA = 95°) and lowest on hydrophilic (WCA < 30°) and superhydrophobic (WCA > 120°) surfaces [21]. Whereas antifouling surfaces usually prevent the adhesion of bacteria through their superhydrophobic properties, the same is true for many nanostructured surfaces. Another example is the currently prevailing postulation suggesting that the extent of membrane stretch increases with increasing diameters of nanopillars, but that the effect is minimal when the diameter exceeds 20 nm [22]. However, there are also results from in silico studies showing that reducing the diameters from 30 to 10 nm can increase the strain on the bacterial envelope by 25% [23]. These show that individual parameters of nanostructured surface are often entangled with other parameters, which makes the principle of bactericidal activity ambiguous.

There have been several studies exploring the bactericidal effect of nanostructured surfaces under various combinations of parameters. Some researchers conducted a comprehensive analysis to examine the relationship between the bactericidal activity of bacterial species and the various parameters tested on nanostructured surfaces [24]. However, their analysis was limited to a single-factor examination, and although it revealed the impact of different factors on bactericidal performance, it did not elucidate the direct correlation and synergistic effects among these factors. Understanding the impact of these factors on bactericidal efficiency is of great significance because it can help researchers better design nanostructured surfaces with targeted bactericidal effects. Machine learning (ML) is a branch of artificial intelligence (AI) that involves the development of algorithms and models capable of learning from and making predictions or decisions based on data [25]. Employing data analysis techniques based on ML principles could be a useful solution to this problem. Therefore, ML models are increasingly being utilized to address a diverse range of challenges in biomedical science and material science research [26]. For example, ML models have been used for predicting the cytotoxicity of nanomaterials [27] and predicting nanoparticle delivery to tumours [28]. ML methods use mathematical and statistical tools to identify and utilize the connections within the data, allowing for the creation of intricate models to describe the system [29].

Although ML has demonstrated its potential for widespread application, only a few studies applied ML to the development of nanosurfaces. To the best of our knowledge, our study is the first to use ML techniques to predict the antimicrobial properties of nanostructured surfaces. We aim to use ML to aid the prediction of the bactericidal efficiency of nanostructured surfaces and determine the optimum parameters for nanostructured surfaces to achieve superior mechano-bactericidal properties. An essential advantage of utilizing ML is that it enables experimenters to precisely define the range of experimental parameters, thereby substantially reducing the time, resources, and laboratory animal usage involved in these endeavours. Furthermore, we focus on the feature importance of the ML model and interpretation, and explore the mechanisms of bactericidal activity on nanostructured surfaces.

Results

Data acquisition

After a comprehensive review of 2,919 publication literature, 45 papers were selected and considered relevant to this research. 293 different nanostructured surfaces were studied in terms of substrate material, nanostructure shape and size, and surface hydrophobicity. The raw dataset is provided in Table S5. Data distribution of experiment parameters in the database was visualized by histograms and kernel density estimation (KDE) plots (Fig. S1). As depicted in the figure, some outliers existed in the database. For example, most nanopatterns are found in the height range 0–6500 nm, but a few reached 32,000 nm.

Titanium and silicon were the main choices of substrate materials for the fabrication of nanostructures. In contrast, the dataset is more evenly distributed among the bacterial species, centred on E. coli, P. aeruginosa, and S. aureus (Fig. 1). Of these, 121 were studies of Gram-positive bacteria and 173 were studies of Gram-negative bacteria. The nanopattern is also more evenly distributed in terms of shape, consisting mainly of pillar, but also partly of tube, cone, wire, spike, etc. There are 192 surfaces that are hydrophilic with a WCA ≤ 90° and 102 hydrophobic surfaces with a WCA > 90°. Details of the dataset can be found in the supplementary information.

Data pre-processing

The primary dataset comprised 293 rows and 12 columns (11 inputs, 1 output). The input data consisted of diameter (nm), height (nm), spacing (nm), aspect ratio, surface roughness (nm), water contact angle (WCA) (°) reported in numeric values. Variables with nominal values included materials, shape of nanopatterns, bacteria Strain, Gram-stain type motility, and shape of bacteria as summarized in Tables 1, 2 and 3.

Input transformation

For materials of nanostructured surfaces, a simplified classification has been made due to the wide variety contained, e.g. Ti, Ti6Al4V, TiOH and TiO₂ are classified as Ti-based.

Table 1 Summary of the primary and final input variables of materials parameters in data pre-processing

Full size table

For nanotopogrpahy, the features such as diameter, height, spacing and aspect ratio are a good representation of the shape of the nanopattern, thus these features have been retained and the shape of the nanopattern has been eliminated. Surface roughness has approximately 90% or more missing values and was therefore excluded. Diameter, height, spacing, aspect ratio, and WCA all had less than 30% missing values and were retained for the next data imputation process.

Table 2 Summary of the primary and final input variables of nanotopography parameters in data pre-processing

Full size table

Similarly, the Gram-stain type, motility and shape are representative of the bacterial membrane structure, therefore these three features are selected as input and the name of bacterial species is eliminated.

Table 3 Summary of the primary and final input variables of bacteria features in data pre-processing

Full size table

Output transformation

We chose 70% as a threshold for our classification model building. This threshold is not arbitrarily set but is a reflection of a consensus within the nanobactericidal surface research community. We specifically referenced several articles that included nanobactericidal surfaces with more than five different parameters rather than a single morphology [30,31,32,33,34,35,36,37,38]. The distribution of bactericidal efficiency in these experiments was relatively uniform from 0 to 100%, with efficacious surfaces concentrated in the range of 60–80%, with 70% emerging as a practical benchmark that balances stringent bactericidal performance with achievable targets in diverse conditions. Thus, for regression models we kept the percentage of bactericidal efficiency as output features; for binary classification models we simplified the numeric bactericidal efficiency to 2 classes, i.e. whether it is a successful bactericidal surface.

Classification model building

Model selection was critical for the accuracy of ML prediction, and we have chosen seven state-of-the-art algorithmic models for predicting the bactericidal efficiency, which included K-nearest neighbor (KNN), support vector machine (SVM), extreme gradient boost (XGBoost), gradient boosting machine (GBM), random forest (RF), multilayer perceptron (MLP) for classification modelling and ridge regression (RR), XGBoost, GBM, KNN for regression modelling [30,31,32,33]. A brief summary is illustrated in Fig. 2 and explained in Table 4.

Table 4 Enumeration dataset parameters for Ti-based nanostructured surfaces targeting Gram-negative bacteria

Full size table

Preliminary modelling

After the initial screening, the missing values were imputed, using 5 different imputation strategies: None, Leave empty, Mean, KNN and RF (Explained in detail in the method section). Performances of different data imputation methods were compared, as shown in Fig. 3. It can be seen from the plots that different data imputation methods did affect model performance. Of the three active filling blank methods, RF performed the best, with the highest accuracy and F1 scores. The ‘None’ group had a high precision, which means the high credibility of a claim that a case is positive. However, it has a relatively low recall, which indicates some false positives. While the ‘leave empty’ group was more evenly split across all indicators. Further comparison of the results of their 10-fold cross-validation revealed that the mean accuracy of the different imputations showed little difference, stabilising at around 78%. Therefore, the ‘None’ group, the ‘leave empty’ group and the RF group were retained for the model building to further compare the impact of the data imputation methods on the performance of the models.

After data transformation the following three datasets were obtained for the model building step: Dataset I (n = 294, Leave empty group); Dataset II (n = 294, RF group); Dataset III (n = 140, None group). To further build a regression model to predict the bactericidal efficiency of successfully bactericidal surfaces, we extracted data for the RF group with a bactericidal efficiency greater than 70% as Dataset IV (n = 105).

Classification model building

Following preliminary modelling, we trained various classification models, and all model parameters were tuned to the best combination. By traversing all the model parameters, the best combination of parameters is selected (see Table S1). Model performance results are summarized in Fig. 4 and Table S3. The results suggest that the XGBoost and GBM models exhibit overall higher accuracy and less fluctuation, which indicated a more stable performance compared to the other algorithms employed (KNN, SVM, and MLP). It is quite interesting to note that most of the models built are high-accuracy but low-recall systems, returning very few results, but most of its predicted labels are correct when compared to the training labels. In comparison, XGBoost-I, II and GBM-III show high accuracy rates of 0.76, 0.78 and 0.93 respectively, and relatively high precision and recall.

We then compared the 10-fold validation results of the XGBoost and GBM models (Fig. S2). The GBM-III and XGBoost-III models have the highest average accuracy of 0.81 and 0.80 respectively, while XGBoost-III has smaller variation, representing greater precision. Therefore, the GBM-III model had the best overall performance, with an average accuracy of 0.81.

To further test the performance of the model with different data imputation methods, we compared the confusion matrixes to assess the performance of XGBoost models (XGBoost-I, II, III). The confusion matrices for XGBoost-I and II are identical (Fig. S3), indicating that using RF as a data imputation in this study is a non-inferior approach.

Subsequently, we utilised four new enumeration datasets (Ti-based nanostructured surfaces against Gram-negative bacteria, Ti-based nanostructured surfaces against Gram-positive bacteria, Si-based nanostructured surfaces against Gram-positive bacteria and Si-based nanostructured surfaces against Gram-negative bacteria with 829,448 datapoints in each dataset) to gain further insights into the nanostructured parameters and bactericidal efficiency of the nanostructure parameters and bactericidal efficiency. Based on the GBM-III models, we used the enumerated dataset to create a bactericidal efficiency map (Fig. 5). According to the figure, most of the high bactericidal efficiency surfaces, both Ti-based and Si-based materials, have polar WCAs, i.e., superhydrophilic and superhydrophobic. The nanostructured surfaces are overall more efficient in bactericidal activities for Gram-negative bacteria than for Gram-positive bacteria. In addition, the diameter of highly bactericidal surfaces is typically less than 200 nm.

Feature importance analysis and model interpretation

Overview of feature importance

Interpreting the model provides valuable insights into its learning characteristics. Feature importance learnt by the GBM-III model was plotted to represent the ML’s interpretation of the correlation between different features and bactericidal efficiency. The feature importance of the XGBoost-I, III; models were also analysed and used to compare the differences between the conclusions drawn under the different algorithms. The feature importance analysis for both models yielded similar conclusions (Fig. 6), showing that the top four importance rankings for both models were WCA, height, diameter and aspect ratio, all of which are features of nanotopography. This suggests that nanotopography is indeed the main factor dominating the bactericidal activity of nanostructured surfaces, which is also consistent with the mechano-bactericidal concept mentioned previously. For WCA, the feature importance is 20.8%, 27.7%, and 20.6% in the XGBoost-I, III; and GBM-III models, respectively. Although the majority of surfaces in the dataset were hydrophilic, the least-tested hydrophobic surfaces have shown higher success rates than their hydrophilic counterparts. The possible reason is that hydrophobic and hydrophilic surfaces have different mechanisms of bacterial inhibition, as mentioned previously, one preventing bacteria from adhering and the other killing them when they do, but the different inhibition mechanisms achieve the same purpose.

Model interpretation for topographical features

Figure 7 shows the Shapley additive explanations (SHAP) of topographical features. SHAP values is a unified framework to interpret ML predictions proposed by Lundberg and Lee [30], to describe how much each feature contributes to the predictions. In this ML model, the SHAP and feature values of the WCA are evenly distributed on the x-axis (Fig. 7a), while it can be concluded from the distribution of high feature value points that high WCA has a certain positive effect on bactericidal efficiency. Figure 7b elaborates on the variability in the impact of WCA on the model’s output across different samples. The analysis highlights that WCA values contributing positively to the model’s output predominantly fall within the ranges of 0–10 degrees or 160–180 degrees, as indicated by the red zones in the plot. These ranges correspond to surfaces that are extremely hydrophilic or hydrophobic, respectively, both of which are considered beneficial for bactericidal activity. Conversely, WCA values situated around the median, predominantly encapsulated within the blue zones of the plot, are associated with a negative impact on the output value. This suggests that surfaces with median WCA values may represent a less effective or undesirable range for bactericidal applications, indicating a complex relationship between surface wettability and bactericidal efficiency that is dependent on the extremity of the hydrophilic or hydrophobic nature of the surface.

Height and diameter are directly related to the bacteria-nanopattern contact area, while the tip size of the nanopattern is very important as it is the first point of contact between the bacteria and the surface [43]. The ML model shows that both diameter and height are positively correlated with bactericidal efficiency. Some studies based on analytical models support our conclusions, which suggest that a larger radius provides a wider contact area, driving the suspended region of the membrane to attempt to accommodate the change in the perimeter by stretching and eventually rupturing [23, 44]. However, smaller tip radius could induces higher pressure on the bacterial membrane, enhancing the bactericidal effect of the nanostructured surface [5].

The SHAP values for aspect ratio indicate that high aspect ratios have a positive effect on bactericidal efficiency. This is in line with Linklater et al. study [22], which demonstrated that the flexibility of a high aspect ratio structure enhances the elastic energy storage of the nanostructure and releases this energy through bending when in contact with bacteria, thereby increasing the bactericidal activity of the nanostructured surface.

Model interpretation for material properties and bacterial species

It is noteworthy that the material properties of the nanostructured surface account for a small proportion of the feature importance. This corresponds to the mechanisms revealed from some experimental approaches, i.e. the mechano-bactericidal mechanism on nanostructured surfaces is independent of chemical effects, as the functionality (bactericidal ability) was shown to persist across materials [7]. However, recent studies have suggested that biological and chemical processes also play a synergistic role in the bactericidal activity of nanostructured surfaces [45,46,47]. For example, Jenkins et al. proposed a synergistic ROS-mediated mechanism of mechano-bactericidal activity, which involves chemistry at the bacterial level, in contrast to the purely mechano-bactericidal model currently proposed [46].

Furthermore, the species of bacteria as a biological factor is not of high importance in the ML model, a possible reason is the limited dataset, which focuses on only a few specific bacteria. Whereas it is now generally accepted that Gram-negative bacteria are more vulnerable to the bactericidal effects of nanostructures than Gram-positive bacteria because of the differences between their bacterial membrane structures. In the SHAP dependence analysis (Fig. 7c and d), we posit that Gram-positive bacteria demonstrate increased sensitivity to hydrophilic surfaces with nanostructured spacing below 250 nm. While the SHAP dependence plot distribution for Gram-negative bacteria in relation to WCA and spacing appears relatively dispersed.

Individual data points analysis and comparative analysis

To enhance the comprehension of why certain features exhibit a more pronounced impact than others within our dataset, we employed an analysis of individual SHAP value plots corresponding to specific data points. We selected three representative data points for this analysis, two of which are presented below, with the remaining details provided in Fig. S5 (Tables 5 and 6).

Table 5 Typical data points selected for the individual data points analysis

Full size table

Table 6 Machine learning models frequently used in the biomedical field

Full size table

Case 1: Silicon-based nano pillar against P. Aeruginosa

Figure 8 illustrates that ‘Height’ has a significant positive SHAP value, indicating that as the height of the nanostructures increases, it contributes more to the model’s prediction of bactericidal efficiency against P.aeruginosa cells. This aligns with the conclusion in this study [12], which suggests that higher nanostructures on surfaces lead to a decrease in bacterial adhesion due to reduced contact area between the bacteria and the substratum.

In contrast, ‘Material’ has a minor impact on the output value, which is consistent with the previous reports stating that the nanoscale topography influences bacterial attachment behaviour, orientation, and the expression of attachment organelles (fimbriae), with a preference for certain substratum types [49].

The importance of height in these figures supports the notion that the physical dimensions of surface nanoarchitecture and material stiffness are critical factors in the adhesion and potential killing of bacterial cells.

Case 2: Titanium-based nano tube against P. Aeruginosa

In this case, the dimensions, specifically the diameter and height, of the nanostructures used in the dataset are significantly smaller relative to the overall range observed. In Fig. 9, although the ‘GS’ feature exerts a significant positive effect on the output value, the adverse impacts attributable to both ‘Diameter’ and ‘Height’ on the bactericidal effectiveness of the nanostructures culminate in a final model output of zero. The study that includes this case involved assessing the bactericidal efficiency of nanostructures with identical structural parameters against various bacterial strains. Notably, the nanostructures demonstrated enhanced effectiveness in eliminating Gram-positive bacteria.

Furthermore, the positive impact associated with ‘GS’ indicates that the model identifies the presence of Gram-negative bacteria as a factor reducing the likelihood of poor bactericidal performance, which is in alignment with the conclusion of the study [48]. While the SHAP value analysis for ‘WCA’, suggests a negligible role of this feature in bactericidal efficiency. The implication is that surfaces do not exhibit extreme hydrophilicity, therefore having a relatively minor impact. The insights from the model support the observation that sharp, elongated nanostructures can disrupt bacterial cells non-selectively, whereas shorter, blunt structures might necessitate more precise interactions to overcome the defences of different bacterial species, reflecting their adaptation to the ecological niches they inhabit [30].

In addition, we performed a comparison of the SHAP values for both the XGBoost and MLP algorithms by examining them in each case, as illustrated in the accompanying Figs. 8 and 9 and Fig. S4. The consistency of the results across these scenarios underscores the robustness and interpretative capability of our model.

Regression model building

Based on the results of the classification model, a regression model was further developed for nanostructured surfaces with bactericidal efficiency greater than 70%. Figure 8 shows the distribution of bactericidal efficiency in the dataset and the range of data targeted by the classification/regression model.

By traversing all the model parameters, the best combination of parameters is selected (see Table S2). The performance results are summarised in Fig. 9 and Table S4. As mentioned above, lower RMSE and MAE values indicate better predictive performance, while higher $\:{R}^{2}$ values indicate a better fit of the model to the data and a better overall adaptation to the data. Of the four models, the XGBoost regression model had an outstanding performance with the lowest RMSE and MAE and the highest $\:{R}^{2}$ (50%). The relatively low $\:{R}^{2}$ values observed in the table may be attributed to the limited amount of data available for analysis (Figs. 10, 11, and 12).

The regression model showed consistent performance on both the training and test sets, with all predictions within a relative error of ± 20%, except for one data from the test set (Fig. 10). This demonstrates the model’s ability to withstand overfitting trends and enhances its potential for real-world applications.

Discussion

In this study, we implemented an ML model from data collection to model validation, to predict the bactericidal effects induced by nanostructured surface in in-vitro systems. We propose the concept of drawing patterns from existing studies on the bactericidal properties of nanostructured surface to guide future research. In the conventional materials research process, the selection of material candidates and topologies is entirely dependent on the developer’s intuition, and these materials need to be prepared and tested successively; this leads to inefficiencies, high error rates, and high costs [50]. Indeed, the further development of our model can help researchers in the field of nanostructured surfaces to save experimental resources and time costs. By analysing the feature importance derived from our model, researchers can gain insights into which parameters most significantly impact bactericidal efficiency. This can help in designing experiments that specifically target these influential factors, thereby optimizing the allocation of resources towards experiments that are most likely to yield significant findings. The predictive model can be used to generate hypotheses regarding the underlying mechanisms driving bactericidal effectiveness. For example, if the model identifies a specific nanotopography parameter as particularly influential, experimental work can be tailored to investigate how this parameter affects bacteria at the molecular or cellular level, leading to a deeper understanding of the interaction mechanisms.

As new materials are developed, our model provides a benchmark for predicting their potential bactericidal efficiency based on their properties and the bacteria type they are intended to target. This predictive capacity can streamline the initial screening process for novel materials, prioritizing those with the highest predicted efficacy for further experimental validation. Furthermore, by outlining the predictive relationships between various parameters and bactericidal efficiency, our model serves as a platform for fostering collaboration between materials scientists, microbiologists, and nanotechnology researchers. This interdisciplinary approach can lead to the development of more comprehensive experimental studies that consider a wider range of variables and their interactions.

The availability of data in materials science and other fields has long been a key issue, thus the recent concerted efforts have been focused on finding ML algorithms applicable to small datasets [51]. However, objectively comparing nanostructured surfaces with different parameters becomes more challenging due to the lack of standardized test methods to assess their performance [52]. While several valid methods may be available, provided they provide relevant information for the intended application, the lack of standard test conditions hinders direct comparisons between new techniques. Ideally, these standard conditions should be established and expanded to discover and verify the true effectiveness and utility of materials. Several ongoing initiatives are developing frameworks, methods and criteria for assessing the quality of reported data, based on the FAIR (Findable, Accessible, Interoperable, Reusable) data principles [53]. This may overcome the lack of data in the materials field and allow different studies to be compared more directly.

This research pragmatically proposes a different approach by developing prediction tools using well-established ML algorithms but comparing the impact of different data imputation methods on model performance. The rational data imputation method here allows us to apply the ML model to a larger dataset, thus improving the accuracy of the model.

Our dataset is drawn from a variety of published research, including different experimental setups being used by different research groups. This diversity of experimental setups enhances the generalization capabilities of our ML models.

The goal of our future work is to attempt to quantify the features of biological processes between nanostructured surfaces and bacteria in order to optimize the feature engineering of the ML model.

Conclusion

In this study, we have successfully applied data extracted from the literature to build effective ML models to predict the antimicrobial properties of nanostructured surfaces, with gradient boosting machine (GBM) being the best model (The accuracy rate reached 81%). A regression model was further developed for nanostructured surfaces with bactericidal efficiency greater than 70% to further predict the value of bactericidal efficiency, with XGBoost being the best model. The feature importance analysis and the interpretation of the ML models suggest that the WCA is the most important feature, followed by nanotopography (e.g., diameter, height). We then analysed the interrelated features by calculating SHAP values to further understand the mechanism of mechano-bactericidal properties, and results showed that WCA, diameter, height and aspect ratio are positively correlated with bactericidal efficiency, while spacing is the opposite. We also highlight the need for standardized measurements to evaluate the antimicrobial properties of nanostructured surfaces, allowing more consistent metadata. Overall, this is the first study of applying ML algorithms in the field of nanostructured surface, which assists researchers in taking advantage of developments in the field of artificial intelligence, thereby improving timeliness, and reducing the number of experiments performed and the associated costs.

Methods

Data Collection

A literature search was conducted to identify studies examining the effects of nanostructured surfaces on bacterial elimination or inhibition. The collected articles were published up to March 2023, encompassed various keywords, including “antibacterial,” “antimicrobial,” “bactericidal,” “nanopattern,” “nanopatterned topography,” “nanostructure,” “nanostructured surface,” “nano-coating,” and more. The initial screening of articles was based on the title and abstract, followed by a secondary screening after thoroughly reading the full text.

Inclusion criteria

To be included in the analysis, studies had to meet both of the following requirements:

Fabrication of nanoscale structures on surfaces.
Conducting bacterial viability tests on these nanostructured surfaces.

Exclusion criteria

Studies were typically excluded from the analysis based on the following factors:

Absence of bacterial viability quantification or unspecified method.
Absence of surface topography quantification or unspecified method.
Unrelated studies on the bactericidal activity of nanostructured surfaces.
Presence of antibiotics or chemically active coatings on surfaces that contribute to bactericidal activity.
Experiments are carried out under dynamic conditions, such as fluid flow, agitation, and so on.

Data extraction

Each paper was reviewed with a focus on:

Surface: materials, surface wettability, and surface roughness.
Nanopattern: fabrication method, nanopattern tip dimension (width or diameter), and height, spacing.
Bacteria viability: species, Gram-stain type, motility, and shape.

The above variables were acquired as input attributes for the prediction of the bactericidal efficiency of the investigated nanostructured surface. For the evaluation of bactericidal efficacy, we have comprehensively integrated quantitative evaluations derived from a diverse range of detection methods and techniques. The bacterial efficiency of most studies is calculated based on indicators such as bacterial viability, zone of inhibition (ZOI), and optical density (OD).