Exploring Machine Learning on Geochemistry Data For Efficient Prediction of Metal Concentrations in Copper Deposits

dc.contributor.author: Joel, Lydia
dc.date.accessioned: 2024-09-16T11:55:51Z
dc.date.available: 2024-09-16T11:55:51Z
dc.date.issued: 2024-01-25
dc.description.abstract: Naturally occurring copper ore bodies often contain other useful metals such as silver, lead and zinc. Because of the cost, mining companies find it difficult to pay for the analysis of every metal in their samples and end up analysing only one or a few, leaving the concentrations of other associated metals in the deposit unmeasured. Analysing several metals per sample also takes time, and the longer turnaround for laboratory results can negatively affect production. The research used a geochemistry dataset of 3,282 samples from the Kombat Copper deposit area in Namibia to predict copper (Cu) concentrations from zinc (Zn) and lead (Pb) concentrations. In addition to the metal concentrations, the dataset included sample coordinates and grid names as features. Four machine learning algorithms were used: Random Forest (RF), K-Nearest Neighbour (KNN), Decision Tree (DT), and Support Vector Machine (SVM). These models were chosen because they were commonly employed for similar purposes in the literature reviewed. The learning task was a regression problem; therefore, the primary metric used to assess the models and draw performance conclusions was the regression score (R-squared), which quantifies how well a model explains the variance in the data. The R-squared score represents the proportion of variance in the dependent variable (target) that can be predicted from the independent variables (features). It typically ranges from 0 to 1, where 1 indicates a perfect fit. In addition, mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), adjusted R-squared, and explained variance were also examined. Based on R-squared, the KNN model outperformed the other three models, explaining 57% (0.57) of the variance in the dependent variable. KNN was followed by RF with a score of 0.55, DT with 0.49 and SVM with 0.44. KNN therefore appeared to be the best choice among the four models for making predictions on this dataset. Further optimisation of the models improved their prediction accuracy, with KNN still performing best at an R-squared of 0.70 (70%), with n-estimators set at 4 and the test size set to 10%. Predicting metal contents from geochemistry data with machine learning can help mining companies reduce costs by supplementing lab-based analyses with model-based predictions when determining grades.
dc.identifier.citation: Joel, L. (2024). Exploring machine learning on geochemistry data for efficient prediction of metal concentrations in copper deposits [Master’s thesis, Namibia University of Science and Technology].
dc.identifier.uri: http://hdl.handle.net/10628/1024
dc.language.iso: en
dc.publisher: Namibia University of Science and Technology
dc.subject: machine learning
dc.subject: geochemistry
dc.subject: copper deposits
dc.title: Exploring Machine Learning on Geochemistry Data For Efficient Prediction of Metal Concentrations in Copper Deposits
dc.type: Thesis
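
The evaluation described in the abstract (a K-Nearest Neighbour regressor scored with R-squared, adjusted R-squared, MSE, RMSE, MAE and explained variance on a 10% test split) can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption for illustration: the data are synthetic rather than the Kombat dataset, the function names `knn_predict` and `regression_metrics` are hypothetical, `k = 4` is an arbitrary neighbour count, and the thesis itself most likely used a machine learning library rather than hand-written code.

```python
import math
import random


def knn_predict(train_X, train_y, query, k=4):
    """Average the targets of the k training rows closest to `query` (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def regression_metrics(y_true, y_pred, n_features):
    """Return the regression metrics named in the abstract as a dict."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    mse = ss_res / n
    r2 = 1.0 - ss_res / ss_tot
    mean_res = sum(residuals) / n
    var_res = sum((r - mean_res) ** 2 for r in residuals) / n
    return {
        "r2": r2,
        "adjusted_r2": 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1),
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(r) for r in residuals) / n,
        "explained_variance": 1.0 - var_res / (ss_tot / n),
    }


# Synthetic stand-in data (NOT the thesis data): Cu depends on Zn and Pb plus noise.
rng = random.Random(0)
features, target = [], []
for _ in range(300):
    zn, pb = rng.uniform(0, 10), rng.uniform(0, 10)
    features.append((zn, pb))
    target.append(0.5 * zn + 0.3 * pb + rng.gauss(0, 0.5))

# Hold out the last 10% of rows as the test set.
split = int(len(features) * 0.9)
train_X, test_X = features[:split], features[split:]
train_y, test_y = target[:split], target[split:]

predictions = [knn_predict(train_X, train_y, row, k=4) for row in test_X]
metrics = regression_metrics(test_y, predictions, n_features=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under this sketch, an R-squared of 0.57 would mean the residual sum of squares is 43% of the total sum of squares on the held-out samples. Adjusted R-squared applies a penalty for the number of features, which matters little here with two features but more when coordinates and grid names are added.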

Files

Original bundle
Name: Lydia Joel_222085010_Dissertation.pdf
Size: 4.62 MB
Format: Adobe Portable Document Format