In general, machine learning algorithms require large amounts of training data, and companies often do not have enough to achieve the desired accuracy. To address this, a company may want to train its model jointly with others without compromising the confidentiality of its own data.
There are various approaches that allow machine learning models to be trained across several data sources without disclosing them. We have identified multi-party computation and federated machine learning as the most promising candidates for privacy-preserving training. We also take the protection of the trained model into consideration (differential privacy).
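To make the differential privacy idea concrete, here is a minimal sketch of the classic Laplace mechanism applied to a single mean query. The function name, bounds, and data are illustrative assumptions, not part of our production setup:

```python
import numpy as np

def laplace_mean(values, lower, upper, epsilon, rng):
    """Release an epsilon-differentially-private mean of bounded values.

    After clipping each value to [lower, upper], the sensitivity of the
    mean over n values is (upper - lower) / n; adding Laplace noise with
    scale sensitivity / epsilon protects any single contributor.
    """
    values = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(values)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return values.mean() + noise

# Hypothetical example: a private average over four data points.
rng = np.random.default_rng(0)
private_avg = laplace_mean([41.0, 45.0, 39.0, 44.0],
                           lower=0.0, upper=100.0, epsilon=1.0, rng=rng)
```

Smaller epsilon means more noise and stronger privacy; the same trade-off governs how much a trained model may leak about individual training records.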
To build expertise, we are running hands-on analyses and experiments focused on a real-life scenario (unbalanced, non-IID data):
- Secure multi-party computation for linear models
- Federated training of tree-based models (Gradient Boosted and CART decision trees)
- Federated training of neural networks (parameter server approach)
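The parameter-server approach in the last bullet can be sketched in a few lines: the server broadcasts the current weights, each client trains locally on its own data, and only the updated weights, weighted by client data size, travel back (federated averaging). Everything below is an assumed toy setup with a linear model; the function names and data are illustrative:

```python
import numpy as np

def local_train(w, X, y, lr=0.01, epochs=5):
    """One client's local step: gradient descent on least-squares loss
    for a linear model. Raw (X, y) never leaves the client."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 / len(y) * X.T @ (X @ w - y)
        w -= lr * grad
    return w

def federated_round(w_global, clients):
    """Parameter-server round: broadcast w_global, collect each client's
    locally trained weights, aggregate with a size-weighted average."""
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_train(w_global, X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.asarray(sizes, float))

# Unbalanced, non-IID toy data: two clients with different sizes
# and different feature distributions, sharing one true model.
rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
X1, X2 = rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (10, 2))
y1, y2 = X1 @ true_w, X2 @ true_w

w = np.zeros(2)
for _ in range(100):
    w = federated_round(w, [(X1, y1), (X2, y2)])
```

Only model parameters are exchanged in each round; in practice this exchange is further hardened, e.g. with secure aggregation or differentially private noise on the updates.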
The jointly trained model achieves considerably better accuracy thanks to more data and more features. This holds in the following scenarios where the data cannot be centralized:
- Company-internal: analysis of distributed, siloed data sources across jurisdictions with stringent privacy requirements (e.g. a cross-border mortgage default model)
- Cross-company: extended insights from analyzing combined data on joint customers (e.g. extended features for cross- and upselling, or consolidated predictive maintenance)
- Consortia: access to more data and features through secure consortium partnerships between enterprises and regulators (e.g. extended anti-money-laundering measures, payment fraud detection, fraud detection for insurance claims)
In parallel with our hands-on analysis, we are running workshops with clients from different industries to identify and sharpen use cases and to run proofs of value.