Decision trees partition the input space into hyper-rectangles. Regression trees are decision trees that fit a regression model to the data in the hyper-rectangle described by a leaf of the tree. Model trees are a mix of regression trees and classification trees. This paper contributes a new method for estimating model trees. The method uses two types of internal nodes: one type partitions the input space, as found in decision trees, and the other type is a regression model of the data in the input space, as done in regression trees.
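To make the description above concrete, here is a minimal sketch of a one-split tree on a single input, with a least-squares line fitted to the data on each side of the split. It is not the method proposed in the paper: it has only a partitioning node and two leaves, and it does not show the second type of internal node. The function names, the `min_leaf` parameter, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y ~ a + b*x; returns the array [a, b]."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def sse(x, y, coef):
    """Sum of squared residuals of the line `coef` on (x, y)."""
    resid = y - (coef[0] + coef[1] * x)
    return float(resid @ resid)

def fit_split_with_linear_leaves(x, y, min_leaf=3):
    """Try every threshold between sorted neighbours, fit a line on each
    side, and keep the split with the smallest total squared error.
    `min_leaf` (hypothetical) is the smallest number of points per side."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = None
    for i in range(min_leaf, len(x) - min_leaf + 1):
        if x[i] == x[i - 1]:
            continue  # no threshold can separate equal inputs
        threshold = 0.5 * (x[i - 1] + x[i])
        left = fit_line(x[:i], y[:i])
        right = fit_line(x[i:], y[i:])
        err = sse(x[:i], y[:i], left) + sse(x[i:], y[i:], right)
        if best is None or err < best[0]:
            best = (err, threshold, left, right)
    return best[1], best[2], best[3]

def predict(x, threshold, left, right):
    """Route each input to its side of the split and apply that side's line."""
    coef = np.where(x[:, None] < threshold, left, right)  # shape (n, 2)
    return coef[:, 0] + coef[:, 1] * x

# Synthetic, noise-free data with a different linear trend on each side of x = 5.
x = np.linspace(0.0, 10.0, 41)
y = np.where(x < 5.0, 1.0 + 2.0 * x, 20.0 - x)

threshold, left, right = fit_split_with_linear_leaves(x, y)
print("threshold:", threshold)
print("left leaf [a, b]:", left)
print("right leaf [a, b]:", right)
```

In one dimension the two regions are intervals; with several inputs, repeated axis-aligned splits of this kind produce the hyper-rectangles mentioned above.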
Model trees offer the data analyst a means to represent more global linear trends in the data than is possible with regression trees alone. Interest in model trees therefore depends on whether an analyst’s data can be expected to have a combination of global and local linear trends, and on whether the analyst needs to represent this combination in a model.
This paper shows that some previously unobserved global effects arise in the well-studied University of California at Irvine (UCI) machine learning repository datasets. The usual trade-off of additional computational cost for more expressive model trees also applies.
No access to the proposed method is provided, so the paper’s audience is limited to researchers active in regression methodologies. Data analysts interested in the model tree method contributed by this paper will need to wait for its future transfer to one of the commercial or open-source statistical and machine learning platforms.