Introduction Cannabis is Europe's most commonly used illicit drug. Some users do not develop dependence or other problems, whereas others do.
Many factors are associated with the occurrence of cannabis-related disorders. This makes it difficult to identify key risk factors and markers to profile at-risk cannabis users using traditional hypothesis-driven approaches.
This study therefore demonstrates the use of a data-mining technique, binary recursive partitioning, by building a classification tree to profile at-risk users. Methods Fifty-nine variables on cannabis use and drug market experiences were extracted from an internet-based survey dataset collected in four European countries (Czech Republic, Italy, Netherlands and Sweden), n = 2617.
These 59 potential predictors of problematic cannabis use were used to partition individual respondents into subgroups with low and high risk of having a cannabis use disorder, based on their responses to the Cannabis Abuse Screening Test. Both a generic model for the four countries combined and four country-specific models were constructed.
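To illustrate the general idea of binary recursive partitioning described above, the following is a minimal sketch in Python, not the software or settings used in the study. It assumes Gini impurity as the splitting criterion and a fixed maximum depth; both are illustrative choices, as are all function names. The toy rows (days of use in the last 12 months, age at first use) and their low/high risk labels are invented for demonstration and are not survey data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Search every feature and midpoint threshold for the binary split
    with the lowest weighted impurity. Returns (feature, threshold, score)
    or None when no feature has two distinct values."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


def grow(rows, labels, depth=0, max_depth=2):
    """Recursively partition the data; leaves hold the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or gini(labels) == 0.0:
        return majority
    split = best_split(rows, labels)
    if split is None or split[2] >= gini(labels):
        return majority  # no split improves purity
    j, t, _ = split
    low = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    high = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "low": grow([r for r, _ in low], [y for _, y in low], depth + 1, max_depth),
        "high": grow([r for r, _ in high], [y for _, y in high], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the splits down to a leaf and return its class label."""
    while isinstance(tree, dict):
        branch = "low" if row[tree["feature"]] <= tree["threshold"] else "high"
        tree = tree[branch]
    return tree


# Invented toy data: (days of use in last 12 months, age at first use).
# Label 1 = screened as high risk, 0 = low risk (hypothetical values).
rows = [(10, 18), (30, 16), (50, 20), (120, 17), (150, 19),
        (220, 15), (250, 18), (300, 16), (340, 21), (360, 17)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

tree = grow(rows, labels)
```

On this toy data the tree splits once, on the first feature, because that single split already separates the two classes. The threshold it finds is a property of the invented rows and should not be read as a finding. A real analysis would also need pruning or cross-validation to guard against overfitting, which this sketch omits.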
Results Of the 59 variables included in the first analysis step, only three were required to construct a generic partitioning model that classified high-risk cannabis users with 65-73% accuracy. In the generic model for the four countries combined, the highest risk of cannabis use disorder was seen in participants reporting cannabis use on more than 200 days in the last 12 months.
Compared with the generic model, the country-specific models yielded modest, non-significant improvements in classification accuracy, with the exception of Italy (p = 0.01).