How does Quora use machine learning in 2017? - Quora

235 阅读6分钟
原文链接: www.quora.com

Back in 2015, Xavier Amatriain, our VP of Engineering wrote a great answer on how we use Machine Learning at Quora: How does Quora use machine learning in 2015? Since then, the usage of Machine Learning at Quora has grown a lot. We have not only gone deeper with bigger and better models for existing Machine Learning applications, but we also have expanded the set of areas where we use Machine Learning. In this answer, I'll try and paint a picture from the ground-up on how Quora is using Machine Learning in 2017.

Machine Learning use cases

I'll walk through different parts of the product and talk about how we use Machine Learning in all of these parts.

1. Seeking information

The main format of knowledge sharing on Quora is questions and answers. This starts with a user having a question, or an “information need”, that they want to satisfy. After a user asks a new question on Quora, we have a set of Machine Learning systems doing question understanding, i.e. extracting information from the question in a way that helps make the rest of flow easier for us. I'll describe a few of these question understanding systems.
 
We care a lot about the quality of content, and it all starts with question quality. We have an ML system that takes a question, does question quality classification and helps us distinguish between high-quality and low-quality questions. Along with question quality, we also determine a few different question types, that help us determine how we should treat the question later on in the flow.
 
Finally, we also do question-topic labeling, where we determine what topics the question is about. While most topic modeling applications deal with a large document text and a smallish topic ontology, we work with a short question text and more than a million potential topics to tag on the question, which makes it a much more challenging problem to solve.

In all the question understanding models, we use features derived from the question and its context, e.g. the user who asked the question, the locale where the question was asked etc.
 
Another way to satisfy the user's information need is by letting them search for existing questions that answer what they are looking for. We have two main search systems: Ask Bar search, which powers the top-of-the-page search bar on the Quora homepage, and Full-text search, which is a deeper search that you can access by clicking on the “Search” option in the Ask Bar results. These search systems use different ranking algorithms that differ in terms of search speed, relevance and the breadth and depth of the results they return.

2. Getting answers to questions

The output of the question understanding systems forms an important input to the next step in the lifecycle of a question: getting answers from experts. Here too, we have Machine Learning systems that help us solve this problem better.

Request Answers (previously known as Ask To Answer) is a feature of Quora that allows users to send requests to other users asking them to write an answer to a particular question. We frame Request Answers as a Machine Learning problem. We covered the details of the system in this blog post: Ask To Answer as a Machine Learning Problem.
 
Outside of A2A, the main way we match unanswered questions to expert answer-writers is via the Quora homepage feed. Ranking questions on the feed is a very important problem for us. We take into account the question properties as described above, user properties (more on that below), and a whole set of other raw and derived features as inputs to the ranking model, to generate a feed that is topical, relevant and personalized for you. E.g. here is a screenshot of my feed from a few days ago.

3. Reading content

As you can see in my feed above, the feed not only consists of questions that you can write answers to, it also consists of answers worth reading. Ranking answers on the feed is another ML problem that is very important for us. Question ranking and answer ranking on the feed use similar underlying systems but have very different objectives, and as a result, use a different set of features in their underlying models. Another place where we use Machine Learning to rank answers worth reading are the Email Digests that we send to our users. All of these ranking problems are powered by fairly advanced ML systems that use multiple models and many different features to come up with the final ranking.
 
Once a user finds a question that's interesting to them, we want to make sure they have a great reading experience on Quora. Ranking answers for questions is an important ML application for us that ensures that the most relevant answers are ranked on the top for a given question. We talk about the ML behind answer ranking in detail here: A Machine Learning Approach to Ranking Answers on Quora. Staying on answers, we also do comment ranking to ensure that you see the most relevant answer comments on the top. All of these ranking systems go beyond simple upvotes and downvotes and use features derived from the users who upvoted or downvoted the content, the content quality, the engagement activity etc. to come up with the final ranking.
 
We also want to make sure that after you read the answers to a particular question, you have good avenues to find related content and continue your reading experience. One such product feature powered by Machine Learning is related questions. We show related questions on the question page and they help users navigate the Quora question space more easily. Also helping you navigate Quora as a reader are ranking systems like Related topics (shown on the topic page such as this one) and Trending topics (shown on the home page). On the home page, we also show panels for topics to follow and users to follow, both of which are recommendation systems personalized based on what we know about you.

A very important element of the above ML systems is personalization. Personalization involves making the product and the underlying systems relevant to each individual user of Quora. A big component of making the ML systems personalized is user understanding signals. As part of user understanding, we observe and derive various traits of users such as the topics they like/dislike, their expertise in different areas and their social network properties. We also have various “user-entity” affinity systems, e.g. user-topic affinity, user-user affinity etc. All of these personalization signals form important inputs not just for the “Reading” applications in this section, but also for matching questions to expert answer-writers and other use cases.

4. Maintaining high content quality

One of the things critical to a great user experience on Quora is the content quality. We want to make sure our questions, answers, topics and other content start off high quality and remain high quality throughout their lifetime. In order to do this, we have a set of Machine Learning systems working hard to maintain content quality. Here are a few of them:

  • Duplicate question detection: This involves detecting different questions that have the same intent and merging them into a single canonical question. We have talked in detail about our explorations on the duplicate questions problem here, and even released a duplicate questions dataset and started a Kaggle competition for you to play with it.
  • Abusive content detection: We have a policy at Quora, “Be Nice, Be Respectful”, but that's always challenging to maintain in online communities. We use Machine Learning in combination with human reviewers to help identify offensive or hurtful content so that we can better protect our users and make sure they have a great Quora experience.
  • Spam Detection: Spam detection is an important problem for most popular user-generated content apps, and we are no different. We have a few different ML systems working in combination to tackle spam content and users who post them.

There are many other ML systems for maintaining quality, but I won't go into them in the interest of space.

5. Ad Optimization

In 2016, we also kicked off our monetization efforts. Currently, we show ads that are relevant to the intent of the question on the question page. We use Machine Learning to do Ads CTR prediction, which ensures that the ads that we show are relevant to the users and high value-for-money for the advertisers. Our Machine Learning efforts in the monetization space are still young, and in the next few months and years, we will expand our use of ML here a lot.

We also have other Machine Learning systems than the ones listed above, but I won't go into them in order to avoid making this answer too long.

Models and Libraries

We empower our teams to use the best models and tools for the job, but also make sure we maintain standardization and reuse across these tools. Here are some of the models we use (in no particular order):

  • Logistic Regression
  • Elastic Nets
  • Gradient Boosted Decision Trees
  • Random Forests
  • (Deep) Neural Networks
  • LambdaMART
  • Matrix Factorization (SVD, BPR, Weighted ALS etc.)
  • Vector models and other NLP techniques
  • k-means and other clustering approaches
  • and others

We also support a wide set of open source and internal libraries for getting the job done such as Tensorflow, sklearn, xgboost, lightgbm, RankLib, nltk, QMF (Quora's own matrix factorization library), along with a few other internal ones.

ML Platform

Another exciting development for us since 2015 is the formation of a new ML Platform team. The goal of the ML Platform team is to make working on Machine Learning easier for the ML engineers in the rest of the company, both on the offline (model training) and the online (model serving) side of things. On the online side, the ML Platform owns the systems that help ML engineers to build and deploy high performance, cost efficient, realtime machine learning systems with high reliability and availability. On the offline side, the ML Platform team enables ML engineers to build data pipelines, do feature generation and train models in a fast, standardized and reusable way.
 
Having a dedicated platform team supporting Machine Learning at Quora has helped us accelerate our ML development speed to be much faster than before. It also sets us up well as we scale up our systems to handle bigger and bigger data every day. We will share more details about the ML Platform team and its roadmap in a future answer.
 
I hope this answer gave you a good picture of how Quora is using Machine Learning in 2017. If any of the above excites you, you should know that we are hiring for different kinds of Machine Learning roles! Please check out our careers page for open positions.