Exploring Machine Learning on Geochemistry Data For Efficient Prediction of Metal Concentrations in Copper Deposits
Date
2024-01-25
Authors
Publisher
Namibia University of Science and Technology
Abstract
Naturally occurring ores of metals like copper often occur in compound form with other useful
metals such as silver, lead and zinc. Because of the cost, mining companies find it difficult to pay
for the analysis of every metal in their samples and end up analysing only one or a few, leaving
the concentrations of other associated metals in the deposit unmeasured. Additionally,
analysing several metals per sample takes time, and the longer turnaround for laboratory
results can negatively affect production. This research used a
geochemistry dataset comprising 3,282 samples from the Kombat Copper deposit area in
Namibia to predict copper (Cu) concentrations from zinc (Zn) and lead (Pb) concentrations. In
addition to the metal concentrations, the dataset included sample coordinates and grid-name
features. Four machine learning algorithms were used: Random Forest (RF), K-Nearest
Neighbour (KNN), Decision Tree (DT), and Support Vector Machine (SVM). These models were
chosen because they were the most commonly employed for similar purposes in the literature
reviewed. Because the learning task was a regression problem, the primary metric used to
assess the models and draw performance conclusions was the regression score
(R-squared), which quantifies how well a model explains the variance in the data.
The R-squared score represents the proportion of variance in the dependent variable (target) that can
be predicted from the independent variables (features). It typically ranges from 0 to 1, where 1
indicates a perfect fit. In addition, Mean Squared Error (MSE), Root Mean Squared Error (RMSE),
Mean Absolute Error (MAE), Adjusted R-squared, and explained variance metrics were also
examined. Based on the R-squared metric, the KNN model outperformed the other three models,
explaining 57% of the variance in the dependent variable (a score of 0.57). KNN was
followed by RF with a score of 0.55, DT with 0.49, and SVM with 0.44. The KNN model therefore
appeared to be the best choice among the four models for making predictions on this dataset.
Further optimisation of the models improved their prediction accuracy; the KNN model remained
the best performer, reaching an R-squared of 0.70 (70%) with n-estimators set at 4 and the test
size set to 10%. Predicting metal contents from geochemistry data with machine learning can
help mining companies reduce costs by supplementing lab-based analyses with model-based
predictions in determining grades.
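As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch in Python of how R-squared, MSE, RMSE, MAE, Adjusted R-squared, and explained variance can be computed from observed and predicted values. It is not the thesis code: the function name `regression_metrics` is hypothetical, the sample values are made up rather than taken from the Kombat dataset, and `n_features=2` assumes Zn and Pb are the only predictors.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Compute common regression metrics (hypothetical helper, pure Python)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Residual and total sums of squares.
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    mse = ss_res / n
    mae = sum(abs(r) for r in residuals) / n

    # R-squared: share of target variance accounted for by the predictions.
    r2 = 1.0 - ss_res / ss_tot

    # Adjusted R-squared penalises the score for the number of features.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

    # Explained variance ignores any constant bias in the residuals.
    mean_res = sum(residuals) / n
    var_res = sum((r - mean_res) ** 2 for r in residuals) / n
    explained_variance = 1.0 - var_res / (ss_tot / n)

    return {
        "r2": r2,
        "adjusted_r2": adj_r2,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "explained_variance": explained_variance,
    }


# Made-up Cu values (observed vs. predicted), for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]

metrics = regression_metrics(y_true, y_pred, n_features=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

R-squared and explained variance coincide in this example because the residuals average to zero; they diverge when predictions are systematically biased.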
Description
Keywords
machine learning, geochemistry, copper deposits
Citation
Joel, L. (2024). Exploring machine learning on geochemistry data for efficient prediction of metal concentrations in copper deposits [Master’s thesis, Namibia University of Science and Technology].