Knowledge Extraction from Drilling Data Using Machine Learning Techniques

Research output: Doctoral Thesis



The goal of the work presented in this thesis is to provide a framework for knowledge extraction from rig sensor data and from daily morning reports. The extracted knowledge is the basis for answering the key question in all optimization efforts: how can drilling performance be improved? Improving the drilling performance of new wells cannot be achieved without measuring the performance of wells that have already been drilled, which in turn requires a comprehensive analysis of what has been done at the rig site. Although the basic sequence of events required to drill a well is known in advance, the durations, exact order and actual times of occurrence of these events are not. Given the amount and complexity of drilling data collected at the rig site, it is becoming difficult for drilling engineers to recognize and extract all drilling patterns and activities from that data.

The proposed framework uses machine learning techniques to extract knowledge from drilling data. This knowledge represents a full description of what has been done at the rig site: the trends, patterns, and usual and unusual drilling events that took place while the well was drilled. The framework is based on the idea that no single technique outperforms all others over the full range of problems. It therefore combines various machine learning techniques and exploits all available drilling data to provide a valuable and accurate outcome.

The proposed approach is a five-step procedure: drilling data representation, feature space construction, drilling pattern recognition, classifier combination, and drilling activity breakdown. In the first step, the sensor and textual data needed for building the classification models are collected and transformed into a compact representation. In the second step, a set of different kinds of features is extracted from the transformed data, and a subset of the most informative features is then selected. This subset contains textual, statistical and symbolic features. In the third step, different base classifiers are trained, each with specific kinds of features. Finally, the results of the base classifiers are combined to provide more accurate results, which form the final output of the proposed framework.
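
The idea of training base classifiers on different kinds of features and then combining their outputs can be illustrated with a minimal sketch. The abstract does not state which combination rule the thesis uses, so majority voting is assumed here purely for illustration. The activity labels, feature names (`remark`, `hookload`, `depth_trend`), thresholds and toy decision rules are likewise hypothetical and stand in for trained models.

```python
from collections import Counter
from statistics import mean

# Hypothetical activity labels, chosen for illustration only.
DRILLING, TRIPPING = "drilling", "tripping"


def textual_classifier(sample):
    """Toy stand-in for a classifier trained on textual features
    (here: a free-text remark from a daily report)."""
    return DRILLING if "drill" in sample["remark"].lower() else TRIPPING


def statistical_classifier(sample):
    """Toy stand-in for a classifier trained on statistical features
    (here: mean of a sensor window; the 100.0 threshold is made up)."""
    return DRILLING if mean(sample["hookload"]) < 100.0 else TRIPPING


def symbolic_classifier(sample):
    """Toy stand-in for a classifier trained on symbolic features
    (here: a trend string, 'i' = increasing, 'd' = decreasing, 'f' = flat)."""
    trend = sample["depth_trend"]
    return DRILLING if trend.count("i") > len(trend) / 2 else TRIPPING


BASE_CLASSIFIERS = [textual_classifier, statistical_classifier, symbolic_classifier]


def combine_by_majority(votes):
    """Return the most frequent label among the base classifier votes.
    On a tie, the label that was seen first wins."""
    return Counter(votes).most_common(1)[0][0]


def classify(sample):
    """Run every base classifier on its own feature kind and combine the votes."""
    votes = [classifier(sample) for classifier in BASE_CLASSIFIERS]
    return combine_by_majority(votes)


if __name__ == "__main__":
    example = {
        "remark": "Drilled ahead in the 8.5 in section",
        "hookload": [80.0, 85.0, 90.0],
        "depth_trend": "ffdd",
    }
    print(classify(example))
```

In this sketch the symbolic classifier disagrees with the other two, and the majority vote still yields a single label. This is the kind of robustness that motivates combining classifiers instead of relying on one technique.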


Original language: English
  • Thonhauser, Gerhard, Assessor A (internal)
  • Schmaranz, Klaus, Assessor B (external)
State: Published - 2014