We address the task of automatic detection of lesions caused by multiple myeloma (MM) in femurs or other long bones from CT data. Such detection is already an important part of multiple myeloma diagnosis and staging.
However, it has so far been performed mostly manually, which is very time-consuming. We formulate the detection as a multiple instance learning (MIL) problem, where instances are grouped into bags and only bag labels are available.
In our case, instances are regions in the image and bags correspond to images. This has the advantage of requiring only subject-level annotation (ground truth), which is much easier to obtain than voxel-level manual segmentation.
We consider a generalization of the standard MIL formulation where we introduce a threshold on the number of required positive instances in positive bags. This corresponds better to the classification procedure used by the radiology experts and is more robust with respect to false positive instances.
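The difference between the standard and the generalized bag-labeling rule can be illustrated with a minimal sketch. The function name `bag_label`, the parameter `threshold`, and the "at least `threshold` positive instances" reading of the rule are illustrative assumptions, not notation taken from the method itself; the sketch also assumes hard 0/1 instance predictions.

```python
from typing import Sequence


def bag_label(instance_labels: Sequence[int], threshold: int = 1) -> int:
    """Label a bag (image) from hard 0/1 instance (region) predictions.

    The bag is positive when at least `threshold` instances are positive.
    With threshold=1 this reduces to the standard MIL assumption
    (a single positive instance makes the bag positive).
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    n_positive = sum(1 for y in instance_labels if y == 1)
    return int(n_positive >= threshold)


# Hypothetical healthy subject with one spurious (false positive) region.
healthy_bag = [0, 0, 1, 0, 0, 0]
# Hypothetical patient with several positive regions.
patient_bag = [1, 0, 1, 1, 0, 1]

standard_healthy = bag_label(healthy_bag, threshold=1)
generalized_healthy = bag_label(healthy_bag, threshold=3)
generalized_patient = bag_label(patient_bag, threshold=3)
```

Under the standard rule a single false positive region turns the healthy bag positive, whereas a threshold above one leaves it negative while still flagging the bag with several positive regions. The threshold value 3 is arbitrary here and chosen only for illustration.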
We extend several existing MIL algorithms to solve the generalized case by estimating the threshold during learning. We compare the proposed methods with the baseline method on a dataset of 220 subjects.
We show that the generalized MIL formulation outperforms standard MIL methods for this task. For the task of distinguishing between healthy controls and MM patients with infiltrations, our best method makes almost no mistakes, with a mean AUC of 0.982 and an F1 score of 0.965.
We outperform the baseline method significantly in all conducted experiments.