Synthetic compressional sonic log (DT) generation using Machine Learning in the Volve Field

Published: July 15, 2025

Read this full work on Python in Plain English

This project focuses on leveraging Machine Learning to generate synthetic Compressional Sonic Logs (DT) in the challenging yet data-rich Volve Field. The goal was to accurately predict missing DT log data, a crucial parameter for comprehensive subsurface characterization, using readily available well log measurements.

📝 Project Overview

Compressional Sonic Logs (DT) are indispensable for both Petrophysicists and Geophysicists, providing critical insights into porosity, lithology, fluid content, and enabling accurate seismic interpretation and modeling. However, these logs are not always available for every well due to various operational or cost constraints. This project aimed to bridge these data gaps using a robust, data-driven approach.

I used the publicly available Volve Field dataset, which offers a rare opportunity to apply data science techniques to a real-world oil and gas problem. My workflow was built end-to-end in Dataiku DSS and used its AutoML capabilities to streamline the process from raw data ingestion to model deployment and visualization.

⚙️ Methodology

The project was structured into three main phases:

Pre-training:
- Imported and stacked individual well datasets, ensuring proper well identification for visualization.
- Conducted comprehensive Exploratory Data Analysis (EDA), including correlation matrices and bivariate analysis, to understand data relationships and identify potential outliers.
- Performed data cleaning, flagging outliers with a statistical rule (values more than 1.5 × IQR outside the quartiles) combined with geophysical domain expertise.
- Split the cleaned data into training and blind-test sets, and created a separate dataset of wells with no DT log for prediction.
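As a rough sketch of the cleaning and splitting steps above: the snippet below applies the 1.5 × IQR rule to one log, sets aside rows with no DT, and holds out a whole well as the blind test. It is written in pandas for illustration and is not the Dataiku recipe used in the project. The column names, the helper `iqr_mask`, the well labels, and all values are assumptions. In the project, statistical flags were also combined with domain expertise rather than applied blindly.

```python
import numpy as np
import pandas as pd


def iqr_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.between(q1 - k * iqr, q3 + k * iqr)


# Toy stacked log table; wells and values are invented for illustration.
logs = pd.DataFrame({
    "WELL": ["A"] * 5 + ["B"] * 3 + ["C"] * 2,
    "GR":   [45, 50, 55, 60, 400, 48, 52, 58, 47, 53],
    "DT":   [80, 82, 85, 88, 84, 81, 83, 86, np.nan, np.nan],
})

# 1) Drop rows whose GR reading falls outside the 1.5 x IQR fences.
clean = logs[iqr_mask(logs["GR"])]

# 2) Rows without a measured DT become the prediction set.
to_predict = clean[clean["DT"].isna()]
labelled = clean[clean["DT"].notna()]

# 3) Hold out one entire well as the blind test (hypothetical choice).
blind_well = "B"
train = labelled[labelled["WELL"] != blind_well]
blind = labelled[labelled["WELL"] == blind_well]

print(len(train), len(blind), len(to_predict))
```

Splitting by well rather than by random rows keeps neighbouring depth samples from the same well out of both the training and the test set.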
Training:
- Developed predictive models using Dataiku’s AutoML Prediction feature, with Root Mean Square Error (RMSE) as the optimization metric because it is expressed in the same units as DT and penalizes large errors in predicted magnitude.
- Ran three training cases with 4, 5, and 6 input features (drawn from GR, CALI, PEF, RT, NPHI, and RHOB) to assess the impact of the feature set on model performance.
- Evaluated three algorithms: Random Forest, XGBoost, and LightGBM.
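The feature-set comparison above can be approximated outside Dataiku with scikit-learn. This is a minimal sketch under several assumptions, not the AutoML configuration used in the project. The data are synthetic, only Random Forest is fitted, the hold-out is a simple tail split rather than a blind well, and the 5-feature case is left out because its composition is not stated here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-in logs; ranges are loosely log-like, not Volve data.
logs = {
    "GR": rng.uniform(20, 150, n),
    "CALI": rng.uniform(8, 10, n),
    "PEF": rng.uniform(2, 6, n),
    "RT": rng.lognormal(1.0, 0.8, n),
    "NPHI": rng.uniform(0.05, 0.45, n),
    "RHOB": rng.uniform(2.0, 2.7, n),
}
# Invented relationship used only to give the model something to learn.
dt = (60 + 120 * logs["NPHI"] - 15 * (logs["RHOB"] - 2.3)
      + 0.05 * logs["GR"] + rng.normal(0, 2, n))

feature_sets = {
    "4 features": ["GR", "RT", "NPHI", "RHOB"],
    "6 features": ["GR", "CALI", "PEF", "RT", "NPHI", "RHOB"],
}

split = 450  # first 450 rows train, remaining 150 act as the hold-out
results = {}
for name, cols in feature_sets.items():
    X = np.column_stack([logs[c] for c in cols])
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[:split], dt[:split])
    pred = model.predict(X[split:])
    results[name] = float(np.sqrt(mean_squared_error(dt[split:], pred)))

for name, rmse in results.items():
    print(f"{name}: hold-out RMSE = {rmse:.2f}")
```

XGBoost and LightGBM both ship scikit-learn compatible regressors, so the same loop could be extended to them by swapping in another estimator.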
Post-training:
- Deployed the best-performing model (Random Forest) to predict DT logs for wells with absent sonic data.
- Generated custom visualizations using a Python-based custom plugin to display synthetic DT logs alongside available input logs.
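A log-track display of this kind can be sketched with matplotlib. The code below is not the custom Dataiku plugin. It only illustrates the usual layout of side-by-side tracks sharing an inverted depth axis. The depth range, curve names, and curve values are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
depth = np.linspace(3000, 3100, 200)  # hypothetical measured depth in metres

# Invented curves standing in for two input logs and the predicted DT.
tracks = {
    "GR (gAPI)": 60 + 20 * np.sin(depth / 8) + rng.normal(0, 3, depth.size),
    "NPHI (v/v)": 0.25 + 0.08 * np.sin(depth / 8) + rng.normal(0, 0.01, depth.size),
    "Synthetic DT (µs/ft)": 85 + 12 * np.sin(depth / 8) + rng.normal(0, 1, depth.size),
}

# One track per curve, all sharing the same depth axis.
fig, axes = plt.subplots(1, len(tracks), sharey=True, figsize=(7, 6))
for ax, (label, curve) in zip(axes, tracks.items()):
    ax.plot(curve, depth, linewidth=0.8)
    ax.set_xlabel(label)
    ax.xaxis.set_label_position("top")  # log plots label tracks at the top
    ax.xaxis.tick_top()
    ax.grid(True, alpha=0.3)

axes[0].set_ylabel("Depth (m)")
axes[0].invert_yaxis()  # depth increases downward
fig.tight_layout()
```

Calling `fig.savefig(...)` with a file name of your choice writes the figure to disk. For a blind well, the measured DT could be overlaid on the last track for comparison.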

✅ Key Results

This experiment demonstrated highly promising results, validating the effectiveness of Machine Learning for synthetic DT log generation:

Superior Model Performance: The Random Forest algorithm consistently outperformed XGBoost and LightGBM across all feature sets, proving to be the most accurate and reliable model for this specific application.

Model Performance Table

Optimal Feature Subset: Notably, even with a concise set of four features (Gamma Ray, Resistivity, Neutron Porosity, and Density), the Random Forest model achieved excellent accuracy, with an RMSE of 4.09 µs/ft and a MAPE of 3.3% on the blind test. This indicates that a basic set of input logs can still yield robust predictions, which could help optimize data acquisition strategies.
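For readers less familiar with the two metrics quoted above, here is a small sketch of how RMSE and MAPE are computed from measured and predicted DT. The function names and the three sample values are made up for illustration and have nothing to do with the reported 4.09 µs/ft and 3.3%.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean square error, in the same units as the inputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100)


# Invented DT values in µs/ft.
measured = [80.0, 100.0, 120.0]
predicted = [84.0, 97.0, 120.0]

print(f"RMSE = {rmse(measured, predicted):.2f} µs/ft")
print(f"MAPE = {mape(measured, predicted):.2f} %")
```

RMSE reports the typical error size in µs/ft and weights large misses more heavily, while MAPE expresses the error relative to the measured value.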

Complete Model Evaluation Table

Feature Importance: Across all models, Neutron Porosity (NPHI) consistently ranked as the most important feature, followed by Density (RHOB), Gamma Ray (GR), and Resistivity (RT).
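A ranking like this can be read from a fitted tree ensemble. The sketch below uses scikit-learn's impurity-based `feature_importances_` on synthetic data, which may differ from how Dataiku computes importance. NPHI ranks first here only because the invented target was built to depend mostly on it, so the output says nothing about the Volve data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 500
names = ["GR", "RT", "NPHI", "RHOB"]

# Synthetic stand-in logs (not Volve data).
gr = rng.uniform(20, 150, n)
rt = rng.lognormal(1.0, 0.8, n)
nphi = rng.uniform(0.05, 0.45, n)
rhob = rng.uniform(2.0, 2.7, n)

# Invented target dominated by NPHI, with a smaller RHOB effect and noise.
dt = 60 + 150 * nphi - 10 * (rhob - 2.3) + rng.normal(0, 2, n)

X = np.column_stack([gr, rt, nphi, rhob])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, dt)

# Impurity-based importances sum to 1 across features.
ranking = sorted(zip(names, model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name:5s} {score:.3f}")
```

Impurity-based importances can be distorted when inputs are strongly correlated, as well logs often are, so permutation importance is a common cross-check.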

Feature Importance

Visual Validation: For the blind wells (wells with existing DT logs that were not used in training), the synthetic DT logs showed close agreement with the measured DT logs. For the wells with no DT data, where no direct comparison is possible, the predictions followed the expected geological trends. In both cases the model captured the major geological features, supporting the physical plausibility of the predictions. The visual comparison also showed minimal differences in prediction quality between models trained with 4, 5, or 6 features, further reinforcing the efficiency of the 4-feature model.

DT prediction for well 15/9-F-11 A as blind well

DT prediction for well 15/9-F-1 C as well without DT log

This project not only delivered accurate synthetic DT logs for wells with absent data or critical data gaps, but also underscored the potential of combining robust ML techniques with geoscience domain knowledge.


Ichsan Hibatullah
