Machine learning models have been shown to be vulnerable to adversarial examples: maliciously perturbed inputs that are almost indistinguishable from the original samples to a human observer, yet cause models to make incorrect decisions. This vulnerability implies a security risk in applications with real-world consequences, such as self-driving cars, robotics, financial services, and criminal justice; it also highlights fundamental differences between human learning and existing machine-based systems. The study of this phenomenon proceeds in two directions: evaluating the adversarial robustness of a model by designing new algorithms for generating adversarial examples, and improving adversarial robustness by designing mechanisms to detect and defend against adversarial examples.
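As a minimal illustration of the phenomenon (not a method from this work), the classic fast gradient sign method can be sketched on a toy linear classifier: a perturbation that is small in every coordinate can still move the classifier's score by a large amount. All names below (`w`, `b`, `x`, `epsilon`) are illustrative assumptions.

```python
import numpy as np

# Toy linear binary classifier f(x) = sign(w.x + b).
rng = np.random.default_rng(0)
w = rng.normal(size=20)   # classifier weights (assumed fixed and known)
b = 0.0
x = 0.1 * w               # a clean input classified as +1

def predict(x):
    return 1 if w @ x + b >= 0 else -1

# For a linear model, the loss for the true label +1 decreases along +w,
# so an attacker steps against it: x_adv = x - epsilon * sign(w).
# epsilon bounds the per-coordinate (L_inf) size of the perturbation.
epsilon = 0.3
x_adv = x - epsilon * np.sign(w)

# Each coordinate changes by at most epsilon, yet the score w.x drops
# by epsilon * sum(|w_i|), which grows with the input dimension.
print("clean score:", w @ x, "adversarial score:", w @ x_adv)
```

The key point the sketch makes is dimensional: a perturbation invisible per-coordinate accumulates across many coordinates into a large change in the model's decision score.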
Our research covers both directions. First, we focus on a probabilistic framework for generating adversarial examples on discrete data, and a query-efficient decision-based algorithm for generating adversarial examples on continuous data. Second, we propose a framework that applies tools developed for model interpretation to the detection of adversarial examples, achieving performance superior to existing detection methods across various attacks.
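To fix intuition for the decision-based setting (a generic sketch, not the algorithm proposed in this work): the attacker sees only the model's hard label, so every step costs queries. A basic primitive in such attacks is a binary search between a clean input and any misclassified point to locate a nearby point on the decision boundary using few label queries. The classifier, points, and query budget below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=10)

def label(x):
    # Stand-in black-box model: returns only the top-1 label (0 or 1).
    return int(w @ x >= 0)

x_clean = -w   # classified 0 (score = -||w||^2 < 0)
x_start = w    # classified 1, i.e. already adversarial w.r.t. x_clean

# Binary search over the segment between the two points; each iteration
# spends exactly one label query and halves the interval.
lo, hi = 0.0, 1.0
for _ in range(30):
    mid = (lo + hi) / 2
    x_mid = (1 - mid) * x_clean + mid * x_start
    if label(x_mid) == label(x_start):
        hi = mid   # still adversarial: move toward the clean input
    else:
        lo = mid
x_adv = (1 - hi) * x_clean + hi * x_start

# x_adv keeps the adversarial label but sits essentially on the boundary,
# much closer to x_clean than the starting point was.
print("distance:", np.linalg.norm(x_adv - x_clean))
```

Full decision-based attacks iterate primitives like this one, trading query budget for smaller perturbations; query efficiency is precisely about minimizing how many such label calls are needed.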