Innovation Background
Microcontrollers are miniature computers that can run simple commands, and they underpin billions of connected devices, from Internet of Things (IoT) gadgets to automotive sensors. However, these cheap, low-power microcontrollers have extremely limited memory and no operating system, which makes it challenging to train artificial intelligence models on such "edge devices" that work independently of central computing resources.
Training a machine learning model on an intelligent edge device allows it to adapt to new data and make better predictions. For example, training a model on a smart keyboard lets the keyboard continually learn from the user's writing. However, training requires so much memory that it is typically done on powerful computers in a data center before the model is ever deployed to the device. That is more costly and raises privacy concerns, because user data has to be sent to a central server.
Innovation Process
To address these problems, researchers at MIT and the MIT-IBM Watson AI Lab developed a new technique that enables on-device training using less than a quarter of a megabyte of memory. Other training solutions designed for connected devices can use more than 500 megabytes of memory, far exceeding the 256-kilobyte capacity of most microcontrollers (there are 1,024 kilobytes in a megabyte).
The intelligent algorithms and framework the researchers developed reduce the amount of computation required to train a model, which makes the process faster and more memory-efficient. Their technique can train a machine learning model on a microcontroller in a matter of minutes.
The technique also protects privacy by keeping data on the device, which can be especially useful when the data is sensitive, as in medical applications. It also makes it possible to tailor a model to the needs of individual users. Moreover, compared with other training methods, the framework preserves or even improves model accuracy.
The research lets IoT devices not only perform inference but also continually update AI models with newly collected data, paving the way for lifelong on-device learning. The low resource footprint makes deep learning more accessible and extends its reach, especially to low-power edge devices.
Lightweight Training
One common type of machine learning model is the neural network. Loosely modeled on the human brain, these models contain layers of interconnected nodes, or neurons, that process data to complete a task, such as recognizing people in photos. A model must first be trained, which involves showing it millions of examples so it can learn the task. As it learns, the model increases or decreases the strength of the connections between neurons, which are known as weights.
A model may undergo hundreds of updates as it learns, and the intermediate activations must be stored during each round. In a neural network, activations are the intermediate results produced by the middle layers. Because there may be millions of weights and activations, training a model requires far more memory than running a pre-trained one.
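To make the memory argument concrete, here is a minimal NumPy sketch, not the researchers' code: the two-layer network, shapes, and learning rate are illustrative assumptions. It contrasts inference, which can discard intermediate results as it goes, with one training step, which must keep the hidden activations until the backward pass has used them.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((64, 32)) * 0.1   # layer-1 weights
W2 = rng.standard_normal((32, 1)) * 0.1    # layer-2 weights

def inference(x):
    # Each intermediate result can be dropped as soon as the next layer
    # has consumed it, so peak memory stays small.
    h = np.maximum(x @ W1, 0.0)             # ReLU hidden layer
    return h @ W2

def training_step(x, y, lr=0.01):
    # The pre-activation `z1` and activation `h` must stay in memory until
    # the backward pass below has used them; this extra storage is why
    # training needs far more memory than inference.
    z1 = x @ W1
    h = np.maximum(z1, 0.0)                      # stored activation
    pred = h @ W2
    err = pred - y                               # d(loss)/d(pred) for a squared-error loss
    grad_W2 = h.T @ err                          # uses the stored `h`
    grad_W1 = x.T @ ((err @ W2.T) * (z1 > 0))    # uses the stored `z1`
    return W1 - lr * grad_W1, W2 - lr * grad_W2  # updated weights

x = rng.standard_normal((8, 64))            # a tiny batch of inputs
y = rng.standard_normal((8, 1))             # dummy targets
W1, W2 = training_step(x, y)                # one round of weight updates
```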
The researchers employed two algorithmic solutions to make the training process more efficient and less memory-intensive. The first, called sparse update, uses an algorithm that identifies the most important weights to update in each round of training. The algorithm freezes the weights one at a time until it sees accuracy drop to a set threshold, then it stops. The remaining weights are updated, and the activations corresponding to the frozen weights do not need to be stored in memory.
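The sketch below illustrates the sparse-update idea in a simplified form. The freezing order, the `importance` scoring, and the `evaluate_accuracy` helper are hypothetical placeholders; the researchers' actual selection procedure may differ.

```python
def select_weights_to_update(layer_names, importance, evaluate_accuracy,
                             accuracy_threshold):
    """Freeze weight tensors one at a time, least important first, until
    accuracy falls to the threshold; everything still trainable at that
    point is what gets updated on the device.

    `importance` and `evaluate_accuracy` are hypothetical callbacks: the
    first scores a tensor's contribution, the second measures model
    accuracy with a given set of tensors frozen.
    """
    frozen = set()
    for name in sorted(layer_names, key=importance):
        frozen.add(name)
        if evaluate_accuracy(frozen) < accuracy_threshold:
            frozen.remove(name)   # this tensor matters: keep it trainable
            break                 # stop freezing here
    # Activations that feed only frozen tensors never have to be stored.
    return [name for name in layer_names if name not in frozen]
```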
The second solution involves quantized training and simplifying the weights, which are typically 32 bits. Through a process called quantization, the algorithm rounds the weights so they occupy only eight bits, which reduces the amount of memory needed for both training and inference. Inference is the process of applying a model to a dataset and generating predictions. The algorithm then applies a technique called quantization-aware scaling (QAS), which acts like a multiplier that adjusts the ratio between weights and gradients, avoiding the drop in accuracy that quantized training could otherwise cause.
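Here is a minimal sketch of both ideas, assuming symmetric per-tensor int8 quantization. The QAS rule shown, dividing the gradient by the quantization scale so the weight-to-gradient ratio stays roughly unchanged, is an illustrative stand-in rather than the exact formula from the paper.

```python
import numpy as np

def quantize_int8(w):
    """Round a float32 weight tensor to int8 values plus one scale factor,
    cutting per-weight storage from 32 bits to 8."""
    scale = max(float(np.max(np.abs(w))) / 127.0, 1e-8)
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def qas_scale_gradient(grad, scale):
    """Illustrative QAS-style multiplier (an assumption, not the paper's
    exact rule): the weights were divided by `scale` when quantized, so
    dividing the gradient by the same factor keeps the ratio between
    weights and gradients roughly what it was before quantization."""
    return grad / scale

# Usage sketch: quantize, take a gradient computed elsewhere, rescale, update.
w = np.random.randn(256).astype(np.float32)
w_q, s = quantize_int8(w)
grad = np.random.randn(256).astype(np.float32) * 0.01   # placeholder gradient
step = 0.01 * qas_scale_gradient(grad, s)
w_q = np.clip(np.round(w_q - step), -128, 127).astype(np.int8)
```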
The researchers developed a system called a tiny training engine that can run these algorithmic innovations on simple microcontrollers that lack an operating system. The system changes the order of steps in the training process so that more of the work is completed at the compilation stage, before the model is deployed on the edge device.
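The following is only a conceptual sketch of that compile-time idea; the data structures, kernel names, and the function `compile_training_graph` are invented for illustration and do not reproduce the actual tiny training engine.

```python
from dataclasses import dataclass

@dataclass
class Step:
    kernel: str      # name of a precompiled kernel, e.g. "conv_int8_grad"
    inputs: tuple    # ids of the buffers this step reads
    output: int      # id of the buffer this step writes

def compile_training_graph(forward_steps, trainable_outputs):
    """Offline pass on a host machine: derive backward steps only for the
    tensors that will actually be trained and emit one static schedule,
    so the microcontroller never has to build graphs or run autodiff."""
    backward_steps = [
        Step(kernel=s.kernel + "_grad",
             inputs=s.inputs + (s.output,),
             output=s.output)
        for s in reversed(forward_steps)
        if s.output in trainable_outputs          # prune frozen parts
    ]
    return list(forward_steps) + backward_steps   # fixed runtime schedule

def run_on_device(schedule, buffers, kernels):
    """Runtime loop on the device: execute the precompiled schedule with
    buffers whose sizes were all fixed at compile time."""
    for step in schedule:
        kernels[step.kernel](buffers, step.inputs, step.output)
```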
A Successful Speedup
Their optimization requires only 157 kilobytes of memory to train a machine learning model on a microcontroller, whereas other techniques designed for lightweight training still need 300 to 600 megabytes.
They tested the framework by training a computer vision model to detect people in images. After only 10 minutes of training, it had learned to complete the task successfully. Their method can train a model more than 20 times faster than other approaches.
Now that they have demonstrated the success of these techniques on computer vision models, the researchers want to apply them to language models and to other types of data, such as time-series data. They also hope to use what they have learned to shrink larger models without sacrificing accuracy, which could help reduce the carbon footprint of training large-scale machine learning models.
Key Innovation Points
The researchers developed a system called a tiny training engine that can run these algorithmic innovations on simple microcontrollers that lack an operating system. The system changes the order of steps in the training process so that more of the work is completed at the compilation stage, before the model is deployed on the edge device.
Innovation Value
The intelligent algorithms and framework the researchers developed reduce the amount of computation required to train a model, which makes the process faster and more memory-efficient. Their technique can train a machine learning model on a microcontroller in a matter of minutes.
On-device learning is the next major advance the researchers are working toward on the path to a connected, intelligent edge.