Machine learning

Machine learning is about designing algorithms that automatically extract valuable information from data. The emphasis here is on “automatic”, i.e., machine learning is concerned with general-purpose methodologies that can be applied to many datasets while producing something meaningful. There are three concepts at the core of machine learning: data, a model, and learning.

Since machine learning is inherently data driven, data is at the core of machine learning. The goal of machine learning is to design general-purpose methodologies to extract valuable patterns from data, ideally without much domain-specific expertise. For example, given a large corpus of documents (e.g., books in many libraries), machine learning methods can be used to automatically find relevant topics that are shared across documents (Hoffman et al., 2010). To achieve this goal, we design models that are typically related to the process that generates data, similar to the dataset we are given. For example, in a regression setting, the model would describe a function that maps inputs to real-valued outputs. To paraphrase Mitchell (1997): A model is said to learn from data if its performance on a given task improves after the data is taken into account. The goal is to find good models that generalize well to yet unseen data, which we may care about in the future. Learning can be understood as a way to automatically find patterns and structure in data by optimizing the parameters of the model.

While machine learning has seen many success stories, and software is readily available to design and train rich and flexible machine learning systems, we believe that the mathematical foundations of machine learning are important in order to understand the fundamental principles upon which more complicated machine learning systems are built.
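
To make the three concepts concrete, here is a minimal sketch in the regression setting, using only the Python standard library. Everything specific in it is an assumption made for illustration and does not come from the text: the hypothetical data-generating process y = 2x + 1 with Gaussian noise, the function names, the squared-error measure of performance, and plain gradient descent as the optimizer. The data are noisy samples, the model is a function with two parameters that maps an input to a real-valued output, and learning adjusts those parameters so that performance improves on data the model has not seen.

```python
import random


def make_data(n, seed):
    """Data: noisy samples from a hypothetical process y = 2x + 1 (an assumption)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def predict(params, x):
    """Model: a function that maps an input to a real-valued output."""
    w, b = params
    return w * x + b


def mean_squared_error(params, xs, ys):
    """Performance measure: average squared difference between prediction and target."""
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, lr=0.1, steps=500):
    """Learning: optimize the parameters (w, b) by gradient descent on the training error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


train_x, train_y = make_data(200, seed=0)
test_x, test_y = make_data(200, seed=1)  # held-out data, not used for fitting

initial = (0.0, 0.0)
learned = fit(train_x, train_y)

print("test error before learning:", mean_squared_error(initial, test_x, test_y))
print("test error after learning: ", mean_squared_error(learned, test_x, test_y))
```

The error on the held-out samples should be much lower after fitting than before. This matches the paraphrase of Mitchell (1997): performance on the task improves once the data is taken into account, and the improvement carries over to unseen data.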
Understanding these principles can facilitate creating new machine learning solutions, understanding and debugging existing approaches, and learning about the inherent assumptions