Feature selection is an important problem in statistical machine learning, and a common method for dimensionality reduction that encourages model interpretability. Classical feature selection asks for a subset of features that are most informative for the entire data set. Instancewise feature selection instead asks, for each instance, which subset of features is most informative for the model being explained.
Mutual information plays an important role in feature selection. Because mutual information measures statistical dependence, ideally one selects the features most dependent on the response variable. In practice, however, mutual information is difficult to estimate. Several approaches exist for approximating it, such as variational lower bounds and squared-loss mutual information. To obtain an empirical estimate of the approximated mutual information, one can use parametric methods such as neural networks or nonparametric methods such as kernel methods.
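As a minimal sketch of the classical setting, the snippet below ranks features by a simple nonparametric (histogram plug-in) estimate of mutual information with the response. The synthetic data, bin count, and function names are illustrative assumptions, not part of the methods described above.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    # Plug-in estimate of I(X;Y) in nats from a 2D histogram
    # (an illustrative nonparametric estimator, not the methods above).
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 4))
# Response depends only on features 0 and 2.
y = X[:, 0] + X[:, 2] + 0.1 * rng.normal(size=n)

scores = [mutual_information(X[:, j], y) for j in range(4)]
top2 = sorted(np.argsort(scores)[-2:].tolist())
print(top2)  # indices of the two most dependent features
```

Selecting the top-scoring features recovers the two that actually drive the response; more refined estimators (variational lower bounds, kernel-based methods) replace the histogram step while keeping the same ranking-and-selection scheme.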
Our research covers both classical and instancewise feature selection. We have developed both parametric and nonparametric methods for feature selection. Our ultimate goal is to provide better interpretations of complex models, such as deep neural networks and models for high-dimensional data.