Federated Learning Paper Read-Through: Collaborative Machine Learning without Centralized Training Data

Preface

This piece is arguably (?) the founding work of federated learning, and it helps a lot in untangling the field's overall lineage. As a complete beginner, I'll walk through this remarkable post from a beginner's perspective. Original link: ai.googleblog.com/2017/04/fed… The original has no explicit section headings, so I've split it into parts according to my own reading.

part1 introduction

Standard machine learning approaches require centralizing the training data on one machine or in a datacenter. And Google has built one of the most secure and robust cloud infrastructures for processing this data to make our services better. Now for models trained from user interaction with mobile devices, we're introducing an additional approach: Federated Learning.

Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. This goes beyond the use of local models that make predictions on mobile devices (like the Mobile Vision API and On-Device Smart Reply) by bringing model training to the device as well.

Standard machine learning depends on training data being centralized on a single machine or in a datacenter. Google has built some of the most secure and robust cloud infrastructure for this, but for models trained on users' interactions with their mobile devices, a new idea is needed: Federated Learning.

Federated learning trains on local data on the user's own device, with no need to upload that data to the cloud, while still collaborating with other devices to train a shared global model.

As I see it, the starting point of federated learning is to work around legal and regulatory limits on how vendors may use user data, and its main target today is mobile devices. Judging from my notes on the previous paper (juejin.cn/post/721955…), mobile devices bring plenty of problems:

  1. Local data differs greatly from device to device and is far from independent and identically distributed (non-IID).
  2. Device capabilities vary widely; both network speed and compute power affect training scheduling.
  3. And, naturally, security issues.

part2 mechanism

It works like this: your device downloads the current model, improves it by learning from data on your phone, and then summarizes the changes as a small focused update. Only this update to the model is sent to the cloud, using encrypted communication, where it is immediately averaged with other user updates to improve the shared model. All the training data remains on your device, and no individual updates are stored in the cloud.

Having read the previous article, the mechanism should already feel familiar: a user's device downloads the current model, uses local data to compute a model update, encrypts it and sends it to the server, where it is merged with other users' updates to build the next version of the model.
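
To make this round structure concrete for myself, here is a minimal numpy sketch of one communication round for a toy linear model. Everything here (the names `client_update` and `server_round`, the toy gradient step, the data-size weighting) is my own illustration under assumptions, not Google's implementation; encryption, device sampling, and real-world scale are all omitted.

```python
import numpy as np

def client_update(global_weights, local_X, local_y, lr=0.1, epochs=5):
    """Runs on the phone: start from the current global model, take a few
    gradient steps on data that never leaves the device, and return only
    the resulting weight delta (the "small focused update")."""
    w = global_weights.copy()
    for _ in range(epochs):  # several local steps per round (the key idea
        residual = local_X @ w - local_y  # behind Federated Averaging, below)
        grad = local_X.T @ residual / len(local_y)
        w -= lr * grad
    return w - global_weights

def server_round(global_weights, client_datasets):
    """Runs in the cloud: average the clients' updates, weighted by how
    much data each client holds, and apply them to the shared model."""
    deltas = [client_update(global_weights, X, y) for X, y in client_datasets]
    sizes = np.array([len(y) for _, y in client_datasets], dtype=float)
    avg_delta = sum(s * d for s, d in zip(sizes / sizes.sum(), deltas))
    return global_weights + avg_delta
```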

Federated Learning allows for smarter models, lower latency, and less power consumption, all while ensuring privacy. And this approach has another immediate benefit: in addition to providing an update to the shared model, the improved model on your phone can also be used immediately, powering experiences personalized by the way you use your phone.

We're currently testing Federated Learning in Gboard on Android, the Google Keyboard. When Gboard shows a suggested query, your phone locally stores information about the current context and whether you clicked the suggestion. Federated Learning processes that history on-device to suggest improvements to the next iteration of Gboard’s query suggestion model.

Federated learning allows for smarter models (?), lower latency (?), and less power consumption (?), all while satisfying privacy requirements. The approach has another notable benefit: while you help improve the shared model, the improved model on your phone also benefits you right away. What puzzles me about these claims:

  1. Why would the model be "smarter"? Mechanically this seems no different from conventional machine learning; it just trains on user devices because policy forbids centralizing the data.
  2. "Lower latency" also seems questionable: the large differences between devices would, if anything, add latency to training.
  3. I don't believe the total power consumed by a huge number of mobile devices training a model is less than what a single high-performance machine would use on the same data. This could itself be a research direction.

part3 detail

To make Federated Learning possible, we had to overcome many algorithmic and technical challenges. In a typical machine learning system, an optimization algorithm like Stochastic Gradient Descent (SGD) runs on a large dataset partitioned homogeneously across servers in the cloud. Such highly iterative algorithms require low-latency, high-throughput connections to the training data. But in the Federated Learning setting, the data is distributed across millions of devices in a highly uneven fashion. In addition, these devices have significantly higher-latency, lower-throughput connections and are only intermittently available for training.

These bandwidth and latency limitations motivate our Federated Averaging algorithm, which can train deep networks using 10-100x less communication compared to a naively federated version of SGD. The key idea is to use the powerful processors in modern mobile devices to compute higher quality updates than simple gradient steps. Since it takes fewer iterations of high-quality updates to produce a good model, training can use much less communication. As upload speeds are typically much slower than download speeds, we also developed a novel way to reduce upload communication costs up to another 100x by compressing updates using random rotations and quantization. While these approaches are focused on training deep networks, we've also designed algorithms for high-dimensional sparse convex models which excel on problems like click-through-rate prediction.
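
The "random rotations and quantization" trick can be sketched in a few lines. This is a hedged illustration assuming a seed shared between client and server; the function names `shared_rotation`, `compress`, and `decompress` are mine, and the real scheme uses structured (Hadamard-style) rotations that are far cheaper than the dense orthogonal matrix I build here via QR.

```python
import numpy as np

def shared_rotation(dim, seed):
    """A random orthogonal matrix derived from a seed both sides know.
    (Illustration only: a dense dim x dim matrix would be too costly
    in practice, which is why structured rotations are used.)"""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def compress(update, seed, bits=4):
    """Client side: rotate to spread the signal evenly across coordinates,
    then uniformly quantize each coordinate to 2**bits levels."""
    rotated = shared_rotation(len(update), seed) @ update
    lo, hi = rotated.min(), rotated.max()
    scale = max(hi - lo, 1e-12)  # avoid division by zero
    codes = np.round((rotated - lo) / scale * (2**bits - 1)).astype(np.uint8)
    return codes, lo, hi  # a few bits per coordinate instead of 32

def decompress(codes, lo, hi, seed, bits=4):
    """Server side: undo the quantization, then invert the rotation
    (its transpose, since the matrix is orthogonal)."""
    rotated = codes.astype(float) / (2**bits - 1) * (hi - lo) + lo
    return shared_rotation(len(codes), seed).T @ rotated
```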

Deploying this technology to millions of heterogeneous phones running Gboard requires a sophisticated technology stack. On-device training uses a miniature version of TensorFlow. Careful scheduling ensures training happens only when the device is idle, plugged in, and on a free wireless connection, so there is no impact on the phone's performance.

[Image: the model personalizes while the phone sleeps and charges]

Federated learning still builds on stochastic gradient descent, and this passage echoes the device-imbalance problem raised earlier. The bandwidth and latency limits motivated another piece of work from the same team, the Federated Averaging algorithm, which will be the subject of my next read-through. Compared with a naively federated version of SGD, it can cut communication by 10-100x. Since upload speeds are generally much slower than download speeds, they also developed a compression scheme (random rotations plus quantization) to reduce upload cost by up to another 100x. Finally, computation runs only while the user's phone is idle, charging, and on Wi-Fi.
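
That last scheduling constraint is simple enough to state as code. A toy sketch with hypothetical names (`DeviceState`, `eligible_for_training`); the real Android-side scheduler is of course more involved.

```python
from dataclasses import dataclass

@dataclass
class DeviceState:
    is_idle: bool
    is_charging: bool
    on_unmetered_wifi: bool

def eligible_for_training(state: DeviceState) -> bool:
    # Train only when the user cannot notice the cost: all three of the
    # conditions quoted above must hold at once.
    return state.is_idle and state.is_charging and state.on_unmetered_wifi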

The system then needs to communicate and aggregate the model updates in a secure, efficient, scalable, and fault-tolerant way. It's only the combination of research with this infrastructure that makes the benefits of Federated Learning possible.

The key requirements for federated learning infrastructure are: 1. security; 2. efficiency; 3. scalability; 4. fault tolerance.

Federated learning works without the need to store user data in the cloud, but we're not stopping there. We've developed a Secure Aggregation protocol that uses cryptographic techniques so a coordinating server can only decrypt the average update if 100s or 1000s of users have participated — no individual phone's update can be inspected before averaging. It's the first protocol of its kind that is practical for deep-network-sized problems and real-world connectivity constraints. We designed Federated Averaging so the coordinating server only needs the average update, which allows Secure Aggregation to be used; however the protocol is general and can be applied to other problems as well. We're working hard on a production implementation of this protocol and expect to deploy it for Federated Learning applications in the near future.

For security, they designed a cryptographic protocol, Secure Aggregation: the coordinating server can only decrypt the averaged update once hundreds or thousands of users have participated, so no individual phone's update can be inspected before averaging. It is the first protocol of its kind that is practical for deep-network-sized models under real-world connectivity constraints.
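
The core trick behind such a protocol, pairwise masks that cancel in the sum, fits in a short sketch. This shows only the cancellation idea under simplifying assumptions (my own function name, trusted seed, no dropouts); the actual Secure Aggregation protocol derives masks from cryptographic key exchange and can recover when users drop out, none of which appears here.

```python
import numpy as np

def pairwise_masks(n_users, dim, seed=0):
    """Each pair (i, j) with i < j agrees on a random mask; user i adds it,
    user j subtracts it, so every mask cancels in the server's sum."""
    rng = np.random.default_rng(seed)
    masks = [np.zeros(dim) for _ in range(n_users)]
    for i in range(n_users):
        for j in range(i + 1, n_users):
            m = rng.standard_normal(dim)
            masks[i] += m
            masks[j] -= m
    return masks

# Demo: the server sees only masked updates, yet their sum is exact.
updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
masks = pairwise_masks(n_users=3, dim=2)
masked = [u + m for u, m in zip(updates, masks)]
assert np.allclose(sum(masked), sum(updates))  # masks cancel pairwise
```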

Our work has only scratched the surface of what is possible. Federated Learning can't solve all machine learning problems (for example, learning to recognize different dog breeds by training on carefully labeled examples), and for many other models the necessary training data is already stored in the cloud (like training spam filters for Gmail). So Google will continue to advance the state-of-the-art for cloud-based ML, but we are also committed to ongoing research to expand the range of problems we can solve with Federated Learning. Beyond Gboard query suggestions, for example, we hope to improve the language models that power your keyboard based on what you actually type on your phone (which can have a style all its own) and photo rankings based on what kinds of photos people look at, share, or delete.

Applying Federated Learning requires machine learning practitioners to adopt new tools and a new way of thinking: model development, training, and evaluation with no direct access to or labeling of raw data, with communication cost as a limiting factor. We believe the user benefits of Federated Learning make tackling the technical challenges worthwhile, and are publishing our work with hopes of a widespread conversation within the machine learning community.

The rest is the usual closing boilerplate, blah blah.