Creating and maintaining an automatic speech recognition (ASR) service, used for example in voicebots, involves collecting and annotating many hours of audio data. This can be tedious and expensive, especially for dialects such as Swiss German, for which little public data is available. It would therefore be sensible and cost-effective for organizations to join forces in a consortium to train, use and maintain an ASR system based on call center audio data from different enterprises.
Such a solution must be built on a solid business, technical and governance foundation and must be driven from the outset by a strong common vision. During roundtables, companies and suppliers confirmed a common interest in such a system. High-level collaboration patterns have been sketched; they include the important aspect of protecting the call center data sets. A strong privacy approach has to be in place for both the training and the use of the ASR. Such an approach can be applied to traditional ASR models (based on hidden Markov models) or to end-to-end ASR models (based on neural networks).
The development of a joint ASR within a consortium leads to new types of services and business models, ranging from a trained model that individual consortium participants can install, enhance and run locally to a full self-service platform. Of particular interest is the possibility of transferring this approach, and the expertise gained, to other use cases that require collaborative machine learning under strict privacy constraints.
Despite strong interest from major Swiss companies, we have decided to discontinue our Swiss German ASR initiative. The reason is that federated learning for end-to-end ASR models is not yet mature enough and requires further academic research. We are currently concentrating on other use cases in the area of collaborative machine learning that have a higher chance of being implemented. However, the vision of a Swiss German ASR is being pursued by two other initiatives, with which we are working closely: