2018 Top Conference Paper Collection (Part 2): SIGIR (International ACM SIGIR Conference on Res

SIGIR 2018

Dates: July 8-12

Location: Ann Arbor, USA

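SIGIR (International ACM SIGIR Conference on Research and Development in Information Retrieval) is the premier international conference for presenting new techniques and results in information retrieval. It was first held in 1978 and is sponsored by ACM.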

SIGIR 2018 received 409 submissions and accepted 86, an acceptance rate of about 21%.

Best Paper

《Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems》

Rocío Cañamares, Pablo Castells

【Abstract】The use of IR methodology in the evaluation of recommender systems has become common practice in recent years. IR metrics have been found however to be strongly biased towards rewarding algorithms that recommend popular items – the same bias that state of the art recommendation algorithms display. Recent research has confirmed and measured such biases, and proposed methods to avoid them. The fundamental question remains open though whether popularity is really a bias we should avoid or not; whether it could be a useful and reliable signal in recommendation, or it may be unfairly rewarded by the experimental biases. We address this question at a formal level by identifying and modeling the conditions that can determine the answer, in terms of dependencies between key random variables, involving item rating, discovery and relevance. We find conditions that guarantee popularity to be effective or quite the opposite, and for the measured metric values to reflect a true effectiveness, or qualitatively deviate from it. We exemplify and confirm the theoretical findings with empirical results. We build a crowdsourced dataset devoid of the usual biases displayed by common publicly available data, in which we illustrate contradictions between the accuracy that would be measured in a common biased offline experimental setting, and the actual accuracy that can be measured with unbiased observations.

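【Summary】Using IR methodology to evaluate recommender systems has become common practice in recent years. However, IR metrics have been found to be strongly biased towards rewarding algorithms that recommend popular items, and state-of-the-art recommendation algorithms display the same bias. Recent research has confirmed and measured these biases and proposed methods to avoid them. The fundamental question remains open: is popularity really a bias to be avoided, is it a useful and reliable signal for recommendation, or is it unfairly rewarded by experimental biases? We address this question at a formal level by identifying and modeling the conditions that determine the answer, expressed as dependencies between key random variables involving item rating, discovery and relevance. We find conditions under which popularity is guaranteed to be effective, or quite the opposite, and conditions under which the measured metric values reflect true effectiveness or deviate from it qualitatively. We confirm the theoretical findings with empirical results. We build a crowdsourced dataset free of the biases usually found in common public data, and use it to illustrate contradictions between the accuracy measured in a common biased offline experimental setting and the actual accuracy measured with unbiased observations.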
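To make "recommending by popularity" and an IR-style metric concrete, here is a minimal sketch. It assumes that popularity means the number of observed ratings per item and that the set of relevant items is known; the toy data and the names `popularity_ranking` and `precision_at_k` are illustrative and not taken from the paper.

```python
from collections import Counter


def popularity_ranking(ratings):
    """Rank items by how many ratings they received (ties broken by item id)."""
    counts = Counter(item for _user, item in ratings)
    return sorted(counts, key=lambda item: (-counts[item], item))


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked items that belong to the relevant set."""
    top_k = ranking[:k]
    return sum(1 for item in top_k if item in relevant) / k


# Hypothetical (user, item) rating observations.
ratings = [
    ("u1", "a"), ("u2", "a"), ("u3", "a"),
    ("u1", "b"), ("u2", "b"),
    ("u3", "c"),
]
ranking = popularity_ranking(ratings)
score = precision_at_k(ranking, relevant={"a", "c"}, k=2)
```

Whether such a score reflects true effectiveness depends on how the observed ratings were collected, which is the question the paper analyzes.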

SIGKDD 2018

Dates: August 19-23

Location: London, UK

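The ACM SIGKDD international conference is the premier annual conference in data mining research, organized by ACM's Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). Most topics at KDD are interdisciplinary and widely applicable, and the conference attracts experts and scholars from statistics, machine learning, databases, the World Wide Web, bioinformatics, multimedia, natural language processing, human-computer interaction, social network computing, high-performance computing, big data mining, and many other fields.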

Research Track Best Paper

《Adversarial attacks on classification models for Graphs》

Daniel Zügner, Amir Akbarnejad, Stephan Günnemann

【Abstract】Deep learning models for graphs have achieved strong performance for the task of node classification. Despite their proliferation, currently there is no study of their robustness to adversarial attacks. Yet, in domains where they are likely to be used, e.g. the web, adversaries are common. Can deep learning models for graphs be easily fooled? In this work, we introduce the first study of adversarial attacks on attributed graphs, specifically focusing on models exploiting ideas of graph convolutions. In addition to attacks at test time, we tackle the more challenging class of poisoning/causative attacks, which focus on the training phase of a machine learning model. We generate adversarial perturbations targeting the node’s features and the graph structure, thus, taking the dependencies between instances into account. Moreover, we ensure that the perturbations remain unnoticeable by preserving important data characteristics. To cope with the underlying discrete domain we propose an efficient algorithm Nettack exploiting incremental computations. Our experimental study shows that accuracy of node classification significantly drops even when performing only few perturbations. Even more, our attacks are transferable: the learned attacks generalize to other state-of-the-art node classification models and unsupervised approaches, and likewise are successful even when only limited knowledge about the graph is given.

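【Summary】Deep learning models for graphs perform well on node classification. Although they are widely used, their robustness to adversarial attacks has not yet been studied. Yet adversaries are common in the domains where these models are likely to be used, such as the web. Can deep learning models for graphs be easily fooled? In this paper we present the first study of adversarial attacks on attributed graphs, focusing on models that exploit graph convolutions. Besides test-time attacks, we tackle the more challenging class of poisoning/causative attacks, which target the training phase of a machine learning model. We generate adversarial perturbations targeting node features and graph structure, thereby taking the dependencies between instances into account. Moreover, we keep the perturbations unnoticeable by preserving important data characteristics. To cope with the underlying discrete domain, we propose Nettack, an efficient algorithm that exploits incremental computations. Our experiments show that node classification accuracy drops significantly even with only a few perturbations. Furthermore, our attacks are transferable: the learned attacks generalize to other state-of-the-art node classification models and to unsupervised approaches, and they succeed even when only limited knowledge about the graph is available.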

Research Track Best Student Paper

《XiaoIce Band: A Melody and Arrangement Generation Framework for Pop Music》

Hongyuan Zhu, Qi Liu, Nicholas Jing Yuan, Chuan Qin, Jiawei Li, Kun Zhang, Guang Zhou, Furu Wei, Yuanchun Xu, Enhong Chen

【Abstract】With the development of knowledge of music composition and the recent increase in demand, an increasing number of companies and research institutes have begun to study the automatic generation of music. However, previous models have limitations when applying to song generation, which requires both the melody and arrangement. Besides, many critical factors related to the quality of a song such as chord progression and rhythm patterns are not well addressed. In particular, the problem of how to ensure the harmony of multi-track music is still underexplored. To this end, we present a focused study on pop music generation, in which we take both chord and rhythm influence of melody generation and the harmony of music arrangement into consideration. We propose an end-to-end melody and arrangement generation framework, called XiaoIce Band, which generates a melody track with several accompany tracks played by several types of instruments. Specifically, we devise a Chord based Rhythm and Melody Cross-Generation Model (CRMCG) to generate melody with chord progressions. Then, we propose a Multi-Instrument Co-Arrangement Model (MICA) using multi-task learning for multi-track music arrangement. Finally, we conduct extensive experiments on a real-world dataset, where the results demonstrate the effectiveness of XiaoIce Band.

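【Summary】With advances in the understanding of music composition and the recent growth in demand, more and more companies and research institutes have begun to study automatic music generation. However, previous models are limited when applied to song generation, which requires both melody and arrangement. In addition, many key factors that affect song quality, such as chord progression and rhythm patterns, are not well addressed. In particular, how to ensure the harmony of multi-track music remains underexplored. To this end, we present a focused study on pop music generation that takes into account both the influence of chords and rhythm on melody generation and the harmony of the arrangement. We propose an end-to-end melody and arrangement generation framework called XiaoIce Band, which generates a melody track together with several accompanying tracks played by different types of instruments. Specifically, we devise a Chord based Rhythm and Melody Cross-Generation Model (CRMCG) to generate melodies with chord progressions. We then propose a Multi-Instrument Co-Arrangement Model (MICA) that uses multi-task learning for multi-track arrangement. Finally, extensive experiments on a real-world dataset demonstrate the effectiveness of XiaoIce Band.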

ADS Track Best Paper

《Real-time Personalization using Embeddings for Search Ranking at Airbnb》

Mihajlo Grbovic, Haibin Cheng

【Abstract】Search Ranking and Recommendations are fundamental problems of crucial interest to major Internet companies, including web search engines, content publishing websites and marketplaces. However, despite sharing some common characteristics a one-size-fits-all solution does not exist in this space. Given a large difference in content that needs to be ranked, personalized and recommended, each marketplace has a somewhat unique challenge. Correspondingly, at Airbnb, a short-term rental marketplace, search and recommendation problems are quite unique, being a two-sided marketplace in which one needs to optimize for host and guest preferences, in a world where a user rarely consumes the same item twice and one listing can accept only one guest for a certain set of dates. In this paper we describe Listing and User Embedding techniques we developed and deployed for purposes of Real-time Personalization in Search Ranking and Similar Listing Recommendations, two channels that drive 99% of conversions. The embedding models were specifically tailored for Airbnb marketplace, and are able to capture guest’s short-term and long-term interests, delivering effective home listing recommendations. We conducted rigorous offline testing of the embedding models, followed by successful online tests before fully deploying them into production.

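【Summary】Search ranking and recommendation are fundamental problems of great interest to major Internet companies, including web search engines, content publishing websites and marketplaces. Yet despite some shared characteristics, no one-size-fits-all solution exists in this space. Because the content to be ranked, personalized and recommended differs widely, each marketplace faces somewhat unique challenges. At Airbnb, a short-term rental marketplace, search and recommendation problems are quite unique: it is a two-sided marketplace that must optimize for both host and guest preferences, in a world where a user rarely consumes the same item twice and a listing can accept only one guest for a given set of dates. In this paper we describe the listing and user embedding techniques we developed and deployed for real-time personalization in Search Ranking and Similar Listing Recommendations, two channels that drive 99% of conversions. The embedding models were tailored specifically to the Airbnb marketplace and capture guests' short-term and long-term interests, delivering effective home listing recommendations. We tested the embedding models rigorously offline and then ran successful online tests before fully deploying them into production.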
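As a rough illustration of how listing embeddings could feed a "similar listings" recommendation, the sketch below ranks listings by cosine similarity to a query listing. The abstract does not state the similarity measure, so cosine similarity is an assumption here, and the two-dimensional vectors, listing ids and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def similar_listings(query_id, embeddings, k=2):
    """Return the ids of the k listings whose embeddings are closest to the query."""
    query = embeddings[query_id]
    scored = [
        (cosine(query, vec), listing_id)
        for listing_id, vec in embeddings.items()
        if listing_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [listing_id for _score, listing_id in scored[:k]]


# Toy embeddings; real models would use many more dimensions.
embeddings = {
    "A": [1.0, 0.0],
    "B": [0.9, 0.1],
    "C": [0.0, 1.0],
    "D": [-1.0, 0.0],
}
neighbours = similar_listings("A", embeddings, k=2)
```

In a production setting the exhaustive scan would typically be replaced by an approximate nearest-neighbour index; the linear scan is used here only to keep the example self-contained.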

ADS Track Best Student Paper

《ActiveRemediation: The Search for Lead Pipes in Flint, Michigan》

Jacob Abernethy, Alex Chojnacki, Arya Farahi, Eric Schwartz, Jared Webb

【Abstract】We detail our ongoing work in Flint, Michigan to detect pipes made of lead and other hazardous metals. After elevated levels of lead were detected in residents’ drinking water, followed by an increase in blood lead levels in area children, the state and federal governments directed over $125 million to replace water service lines, the pipes connecting each home to the water system. In the absence of accurate records, and with the high cost of determining buried pipe materials, we put forth a number of predictive and procedural tools to aid in the search and removal of lead infrastructure. Alongside these statistical and machine learning approaches, we describe our interactions with government officials in recommending homes for both inspection and replacement, with a focus on the statistical model that adapts to incoming information. Finally, in light of discussions about increased spending on infrastructure development by the federal government, we explore how our approach generalizes beyond Flint to other municipalities nationwide.

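【Summary】We detail our ongoing work in Flint, Michigan to detect pipes made of lead and other hazardous metals. After elevated lead levels were detected in residents' drinking water, followed by a rise in blood lead levels among local children, the state and federal governments allocated over $125 million to replace water service lines, the pipes that connect each home to the water system. Because accurate records are lacking and determining the material of buried pipes is costly, we propose a number of predictive and procedural tools to aid the search for and removal of lead infrastructure. Alongside these statistical and machine learning approaches, we describe our interactions with government officials in recommending homes for inspection and replacement, with a focus on a statistical model that adapts to incoming information. Finally, in light of discussions about increased federal spending on infrastructure, we explore how our approach generalizes beyond Flint to other municipalities nationwide.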

ICLR 2018

Dates: April 30 - May 3

Location: Vancouver, Canada

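ICLR, the International Conference on Learning Representations, held its first edition only in 2013. Although this annual conference reached just its sixth edition this year, it is already widely recognized by researchers as a top deep learning venue and has been called the "uncrowned king" of deep learning conferences.

ICLR was founded by Yann LeCun, Yoshua Bengio and others. The conference pioneered an open review process, but this year it dropped open review in favor of double-blind review.

About 2,000 people attended this year: 47.8% came from the United States, 15.9% from Canada, 8.6% from the United Kingdom, and only 3.8% from China. Of the 1,003 submissions, 2.3% were accepted as oral papers, 31.4% as posters, and 9% to the workshop track; 51% were rejected and 6.2% were withdrawn.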

Best Papers

《On the convergence of Adam and Beyond》

Sashank J. Reddi, Satyen Kale & Sanjiv Kumar

【Abstract】Several recently proposed stochastic optimization methods that have been successfully used in training deep networks such as RMSPROP, ADAM, ADADELTA, NADAM are based on using gradient updates scaled by square roots of exponential moving averages of squared past gradients. In many applications, e.g. learning with large output spaces, it has been empirically observed that these algorithms fail to converge to an optimal solution (or a critical point in nonconvex settings). We show that one cause for such failures is the exponential moving average used in the algorithms. We provide an explicit example of a simple convex optimization setting where ADAM does not converge to the optimal solution, and describe the precise problems with the previous analysis of ADAM algorithm. Our analysis suggests that the convergence issues can be fixed by endowing such algorithms with “long-term memory” of past gradients, and propose new variants of the ADAM algorithm which not only fix the convergence issues but often also lead to improved empirical performance.

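【Summary】Several recently proposed stochastic optimization methods that have been used successfully to train deep networks, such as RMSProp, Adam, Adadelta and Nadam, rely on gradient updates scaled by the square roots of exponential moving averages of past squared gradients. In many applications, for example learning with large output spaces, these algorithms have been observed to fail to converge to an optimal solution (or to a critical point in nonconvex settings). We show that one cause of such failures is the exponential moving average used in the algorithms. We give an explicit example of a simple convex optimization problem on which Adam does not converge to the optimal solution, and describe the precise problems in the previous analysis of Adam. Our analysis suggests that the convergence issues can be fixed by endowing such algorithms with a "long-term memory" of past gradients, and we propose new variants of Adam that not only fix the convergence issues but often also improve empirical performance.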
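The "long-term memory" idea can be illustrated with a minimal sketch. It assumes a one-dimensional problem, no bias correction, and a 1/√t step-size decay; the function name `adam_like` and all hyperparameter values are illustrative choices, not the paper's experimental setup. With `amsgrad=True` the second-moment estimate is replaced by its running maximum (the variant usually referred to as AMSGrad), so the scaling term never shrinks after a large gradient has been seen.

```python
import math


def adam_like(grad_fn, x0, steps, lr=0.1, beta1=0.9, beta2=0.999,
              eps=1e-8, amsgrad=False):
    """Minimise a 1-D function given its gradient, Adam-style.

    amsgrad=False: scale by the exponential moving average v of squared gradients.
    amsgrad=True:  scale by the running maximum of v ("long-term memory").
    """
    x, m, v, v_hat = x0, 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1.0 - beta1) * g        # first-moment average
        v = beta2 * v + (1.0 - beta2) * g * g    # second-moment average
        v_hat = max(v_hat, v) if amsgrad else v  # keep the largest v seen so far
        x -= (lr / math.sqrt(t)) * m / (math.sqrt(v_hat) + eps)
    return x


# Toy convex objective f(x) = (x - 3)^2 with gradient 2 (x - 3).
x_final = adam_like(lambda x: 2.0 * (x - 3.0), x0=0.0, steps=2000, amsgrad=True)
```

This toy quadratic is not the counterexample from the paper; it only shows where the running maximum enters the update.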

《Spherical CNNs》

Taco S. Cohen, Mario Geiger, Jonas Köhler, Max Welling

【Abstract】Convolutional Neural Networks (CNNs) have become the method of choice for learning problems involving 2D planar images. However, a number of problems of recent interest have created a demand for models that can analyze spherical images. Examples include omnidirectional vision for drones, robots, and autonomous cars, molecular regression problems, and global weather and climate modelling. A naive application of convolutional networks to a planar projection of the spherical signal is destined to fail, because the space-varying distortions introduced by such a projection will make translational weight sharing ineffective.

In this paper we introduce the building blocks for constructing spherical CNNs. We propose a definition for the spherical cross-correlation that is both expressive and rotation-equivariant. The spherical correlation satisfies a generalized Fourier theorem, which allows us to compute it efficiently using a generalized (non-commutative) Fast Fourier Transform (FFT) algorithm. We demonstrate the computational efficiency, numerical accuracy, and effectiveness of spherical CNNs applied to 3D model recognition and atomization energy regression.

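【Summary】Convolutional neural networks (CNNs) have become the method of choice for learning problems involving 2D planar images. However, a number of problems of recent interest have created demand for models that can analyze spherical images, including omnidirectional vision for drones, robots and autonomous cars, molecular regression problems, and global weather and climate modelling. Naively applying convolutional networks to a planar projection of the spherical signal is bound to fail, because the space-varying distortions introduced by the projection make translational weight sharing ineffective. This paper introduces the building blocks for constructing spherical CNNs. We propose a definition of the spherical cross-correlation that is both expressive and rotation-equivariant. The spherical correlation satisfies a generalized Fourier theorem, which lets us compute it efficiently with a generalized (non-commutative) Fast Fourier Transform (FFT) algorithm. We demonstrate the computational efficiency, numerical accuracy and effectiveness of spherical CNNs on 3D model recognition and atomization energy regression.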

《Continuous adaptation via meta-learning in nonstationary and competitive environments》

Maruan Al-Shedivat, Trapit Bansal, Yura Burda, Ilya Sutskever, Igor Mordatch, Pieter Abbeel

【Abstract】The ability to continuously learn and adapt from limited experience in nonstationary environments is an important milestone on the path towards general intelligence. In this paper, we cast the problem of continuous adaptation into the learning-to-learn framework. We develop a simple gradient-based meta-learning algorithm suitable for adaptation in dynamically changing and adversarial scenarios. Additionally, we design a new multiagent competitive environment, RoboSumo, and define iterated adaptation games for testing various aspects of continuous adaptation. We demonstrate that meta-learning enables significantly more efficient adaptation than reactive baselines in the few-shot regime. Our experiments with a population of agents that learn and compete suggest that meta-learners are the fittest.

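【Summary】The ability to learn continuously and adapt from limited experience in nonstationary environments is an important milestone on the path towards general intelligence. In this paper we cast continuous adaptation into the learning-to-learn framework. We develop a simple gradient-based meta-learning algorithm suited to adaptation in dynamically changing and adversarial scenarios. We also design a new multi-agent competitive environment, RoboSumo, and define iterated adaptation games for testing various aspects of continuous adaptation. We show that meta-learning adapts significantly more efficiently than reactive baselines in the few-shot regime. Our experiments with a population of agents that learn and compete suggest that meta-learners are the fittest.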

COLT 2018

Best Paper

《Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations》

Yuanzhi Li, Tengyu Ma and Hongyang Zhang.

【Abstract】We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations.

Concretely, we show that given $\tilde{O}(dr^2)$ random linear measurements of a rank $r$ positive semidefinite matrix $X^\star$, we can recover $X^\star$ by parameterizing it by $UU^\top$ with $U \in \mathbb{R}^{d \times d}$ and minimizing the squared loss, even if $r \ll d$. We prove that starting from a small initialization, gradient descent recovers $X^\star$ in $\tilde{O}(\sqrt{r})$ iterations approximately. The results solve the conjecture of Gunasekar et al. [16] under the restricted isometry property.

The technique can be applied to analyzing neural networks with one-hidden-layer quadratic activations with some technical modifications.

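【Summary】We show that gradient descent provides an implicit regularization effect when training over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations.

Concretely, we show that given $\tilde{O}(dr^2)$ random linear measurements of a rank-$r$ positive semidefinite matrix $X^\star$, we can recover $X^\star$ by parameterizing it as $UU^\top$ with $U \in \mathbb{R}^{d \times d}$ and minimizing the squared loss, even if $r \ll d$. We prove that, starting from a small initialization, gradient descent recovers $X^\star$ in approximately $\tilde{O}(\sqrt{r})$ iterations. This result solves the conjecture of Gunasekar et al. under the restricted isometry property.

With some technical modifications, the technique can also be applied to analyzing neural networks with one hidden layer and quadratic activations.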
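The recovery problem described in this abstract can be written out explicitly. In the sketch below, the measurement matrices $A_i$, the observations $y_i$, the number of measurements $m$, the step size $\eta$ and the $1/(2m)$ normalization are notational assumptions made for illustration; only $X^\star$, $U$, $d$ and $r$ come from the abstract.

```latex
% Linear measurements of the unknown rank-r PSD matrix X^\star
y_i = \langle A_i, X^\star \rangle, \qquad i = 1, \dots, m, \qquad m = \tilde{O}(d r^2)

% Over-parameterized squared loss: U is d x d even though rank(X^\star) = r \ll d
\min_{U \in \mathbb{R}^{d \times d}} \; f(U)
  = \frac{1}{2m} \sum_{i=1}^{m} \bigl( y_i - \langle A_i, U U^\top \rangle \bigr)^2

% Plain gradient descent started from a small initialization U_0
U_{t+1} = U_t - \eta \, \nabla f(U_t)
```

The loss has many global minimizers because $U$ has far more parameters than the measurements can pin down; the claim of the paper is that gradient descent from a small initialization nevertheless reaches one that recovers $X^\star$.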

Best Student Papers

《Reducibility and Computational Lower Bounds for Problems with Planted Sparse Structure》

Matthew Brennan, Guy Bresler and Wasim Huleihel.

【Abstract】Recently, research in unsupervised learning has gravitated towards exploring statistical-computational gaps induced by sparsity. A line of work initiated in [BR13a] has aimed to explain these gaps through reductions from conjecturally hard problems in complexity theory. However, the delicate nature of average-case reductions has limited the development of techniques and often led to weaker hardness results that only apply to algorithms robust to different noise distributions or that do not need to know the parameters of the problem. We introduce several new techniques to give a web of average-case reductions showing strong computational lower bounds based on the planted clique conjecture. Our new lower bounds include:

• Planted Independent Set: We show tight lower bounds for detecting a planted independent set of size $k$ in a sparse Erdős-Rényi graph of size $n$ with edge density $\tilde{\Theta}(n^{-\alpha})$.

• Planted Dense Subgraph: If $p > q$ are the edge densities inside and outside of the community, we show the first lower bounds for the general regime $q = \tilde{\Theta}(n^{-\alpha})$ and $p - q = \tilde{\Theta}(n^{-\gamma})$ where $\gamma \ge \alpha$, matching the lower bounds predicted in [CX16]. Our lower bounds apply to a deterministic community size $k$, resolving a question raised in [HWX15].

• Biclustering: We show lower bounds for the canonical simple hypothesis testing formulation of Gaussian biclustering, slightly strengthening the result in [MW15b].

• Sparse Rank-1 Submatrix: We show that detection in the sparse spiked Wigner model is often harder than biclustering, and are able to obtain tight lower bounds for these two problems with different reductions from planted clique.

• Sparse PCA: We give a reduction between sparse rank-1 submatrix and sparse PCA to obtain tight lower bounds in the less sparse regime $k \gg \sqrt{n}$, when the spectral algorithm is optimal over the natural SDP. We give an alternate reduction recovering the lower bounds of [BR13a, GMZ17] in the simple hypothesis testing variant of sparse PCA. We also observe a subtlety in the complexity of sparse PCA that arises when the planted vector is biased.

• Subgraph Stochastic Block Model: We introduce a model where two small communities are planted in an Erdős-Rényi graph of the same average edge density and give tight lower bounds yielding different hard regimes than planted dense subgraph.

Our results demonstrate that, despite the delicate nature of average-case reductions, using natural problems as intermediates can often be beneficial, as is the case in worst-case complexity. Our main technical contribution is to introduce a set of techniques for average-case reductions that: (1) maintain the level of signal in an instance of a problem; (2) alter its planted structure; and (3) map two initial high-dimensional distributions simultaneously to two target distributions approximately under total variation. We also give algorithms matching our lower bounds and identify the information-theoretic limits of the models we consider.

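【Summary】Recently, research in unsupervised learning has gravitated towards exploring statistical-computational gaps induced by sparsity. A line of work initiated in [BR13a] aims to explain these gaps through reductions from conjecturally hard problems in complexity theory. However, the delicate nature of average-case reductions has limited the development of techniques and has often led to weaker hardness results that apply only to algorithms that are robust to different noise distributions or that do not need to know the parameters of the problem. We introduce several new techniques that yield a web of average-case reductions showing strong computational lower bounds based on the planted clique conjecture. Our new lower bounds include:

• Planted Independent Set: tight lower bounds for detecting a planted independent set of size $k$ in a sparse Erdős-Rényi graph of size $n$ with edge density $\tilde{\Theta}(n^{-\alpha})$.

• Planted Dense Subgraph: if $p > q$ are the edge densities inside and outside the community, the first lower bounds for the general regime $q = \tilde{\Theta}(n^{-\alpha})$ and $p - q = \tilde{\Theta}(n^{-\gamma})$ where $\gamma \ge \alpha$, matching the lower bounds predicted in [CX16]. The lower bounds apply to a deterministic community size $k$, resolving a question raised in [HWX15].

• Biclustering: lower bounds for the canonical simple hypothesis testing formulation of Gaussian biclustering, slightly strengthening the result in [MW15b].

• Sparse Rank-1 Submatrix: detection in the sparse spiked Wigner model is often harder than biclustering, and tight lower bounds are obtained for both problems with different reductions from planted clique.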

《Logistic Regression: The Importance of Being Improper》

Dylan Foster, Satyen Kale, Haipeng Luo, Mehryar Mohri and Karthik Sridharan.

【Abstract】Learning linear predictors with the logistic loss—both in stochastic and online settings—is a fundamental task in machine learning and statistics, with direct connections to classification and boosting. Existing “fast rates” for this setting exhibit exponential dependence on the predictor norm, and Hazan et al. (2014) showed that this is unfortunately unimprovable. Starting with the simple observation that the logistic loss is 1-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm. This provides a positive resolution to a variant of the COLT 2012 open problem of McMahan and Streeter (2012) when improper learning is allowed. This improvement is obtained both in the online setting and, with some extra work, in the batch statistical setting with high probability. We also show that the improved dependence on predictor norm is near-optimal.

Leveraging this improved dependency on the predictor norm yields the following applications: (a) we give algorithms for online bandit multiclass learning with the logistic loss with an $\tilde{O}(\sqrt{n})$ relative mistake bound across essentially all parameter ranges, thus providing a solution to the COLT 2009 open problem of Abernethy and Rakhlin (2009), and (b) we give an adaptive algorithm for online multiclass boosting with optimal sample complexity, thus partially resolving an open problem of Beygelzimer et al. (2015) and Jung et al. (2017). Finally, we give information-theoretic bounds on the optimal rates for improper logistic regression with general function classes, thereby characterizing the extent to which our improvement for linear classes extends to other parametric and even nonparametric settings.

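【Summary】Learning linear predictors with the logistic loss, in both stochastic and online settings, is a fundamental task in machine learning and statistics, with direct connections to classification and boosting. Existing "fast rates" for this setting depend exponentially on the predictor norm, and Hazan et al. (2014) showed that this cannot be improved. Starting from the observation that the logistic loss is 1-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents this lower bound, with a regret bound whose dependence on the predictor norm improves doubly exponentially. This gives a positive resolution to a variant of the COLT 2012 open problem of McMahan and Streeter (2012) when improper learning is allowed. The improvement holds in the online setting and, with some extra work, in the batch statistical setting with high probability. We also show that the improved dependence on the predictor norm is near-optimal.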

COLING 2018

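Dates: August 22-25

Location: New Mexico, USA

The International Conference on Computational Linguistics (COLING) is another major conference in computational linguistics and natural language processing. It is held every two years.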

Best error analysis

《SGM: Sequence Generation Model for Multi-label Classification》

Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu and Houfeng Wang

【Abstract】Multi-label classification is an important yet challenging task in natural language processing. It is more complex than single-label classification in that the labels tend to be correlated. Existing methods tend to ignore the correlations between labels. Besides, different parts of the text can contribute differently for predicting different labels, which is not considered by existing models. In this paper, we propose to view the multi-label classification task as a sequence generation problem, and apply a sequence generation model with a novel decoder structure to solve it. Extensive experimental results show that our proposed methods outperform previous work by a substantial margin. Further analysis of experimental results demonstrates that the proposed methods not only capture the correlations between labels, but also select the most informative words automatically when predicting different labels.

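【Summary】Multi-label classification is an important yet challenging task in natural language processing. It is more complex than single-label classification because the labels tend to be correlated. Existing methods tend to ignore the correlations between labels. Moreover, different parts of the text contribute differently to predicting different labels, which existing models do not consider. This paper proposes to view multi-label classification as a sequence generation problem and applies a sequence generation model with a novel decoder structure to solve it. Extensive experiments show that the proposed methods outperform previous work by a substantial margin. Further analysis of the results shows that the proposed methods not only capture the correlations between labels but also automatically select the most informative words when predicting different labels.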
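Viewing multi-label classification as sequence generation requires turning each unordered label set into a target sequence that a decoder can emit one label at a time. The sketch below shows one way to do this; ordering labels by training-set frequency and wrapping them in `<bos>`/`<eos>` tokens are assumptions of this example, and the function names and toy labels are hypothetical.

```python
from collections import Counter

BOS, EOS = "<bos>", "<eos>"


def build_label_order(label_sets):
    """Order labels from most to least frequent (ties broken alphabetically)."""
    freq = Counter(label for labels in label_sets for label in labels)
    return sorted(freq, key=lambda label: (-freq[label], label))


def to_sequence(labels, order):
    """Turn a label set into a decoder target sequence with start/end markers."""
    rank = {label: i for i, label in enumerate(order)}
    return [BOS] + sorted(labels, key=rank.__getitem__) + [EOS]


def to_label_set(sequence):
    """Recover the predicted label set from a generated sequence."""
    return {token for token in sequence if token not in (BOS, EOS)}


# Toy training label sets.
train = [{"sports", "news"}, {"news"}, {"news", "politics"}, {"sports"}]
order = build_label_order(train)
target = to_sequence({"politics", "news"}, order)
```

Because each label is generated conditioned on the labels already emitted, a decoder trained on such targets can model correlations between labels, which is the motivation given in the abstract.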

Best linguistic analysis

《Distinguishing affixoid formations from compounds》

Josef Ruppenhofer, Michael Wiegand, Rebecca Wilm and Katja Markert

【Abstract】We study German affixoids, a type of morpheme in between affixes and free stems. Several properties have been associated with them – increased productivity; a bleached semantics, which is often evaluative and/or intensifying and thus of relevance to sentiment analysis; and the existence of a free morpheme counterpart – but not been validated empirically. In experiments on a new data set that we make available, we put these key assumptions from the morphological literature to the test and show that despite the fact that affixoids generate many low-frequency formations, we can classify these as affixoid or non-affixoid instances with a best F1-score of 74%.

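【Summary】This paper studies German affixoids, a type of morpheme that lies between affixes and free stems. Several properties have been associated with them: increased productivity; a bleached semantics that is often evaluative and/or intensifying and therefore relevant to sentiment analysis; and the existence of a free morpheme counterpart. These properties have not been validated empirically, however. In experiments on a new dataset, we put these key assumptions from the morphological literature to the test and show that, although affixoids generate many low-frequency formations, these can be classified as affixoid or non-affixoid instances with a best F1-score of 74%.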

Best NLP engineering experiment

《Authorless Topic Models: Biasing Models Away from Known Structure》

Laure Thompson and David Mimno

【Abstract】Most previous work in unsupervised semantic modeling in the presence of metadata has assumed that our goal is to make latent dimensions more correlated with metadata, but in practice the exact opposite is often true. Some users want topic models that highlight differences between, for example, authors, but others seek more subtle connections across authors. We introduce three metrics for identifying topics that are highly correlated with metadata, and demonstrate that this problem affects between 30 and 50% of the topics in models trained on two real-world collections, regardless of the size of the model. We find that we can predict which words cause this phenomenon and that by selectively subsampling these words we dramatically reduce topic-metadata correlation, improve topic stability, and maintain or even improve model quality.

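【Summary】Most previous work on unsupervised semantic modeling in the presence of metadata has assumed that the goal is to make latent dimensions more correlated with metadata, but in practice the exact opposite is often true. Some users want topic models that highlight differences between, for example, authors, while others seek more subtle connections across authors. We introduce three metrics for identifying topics that are highly correlated with metadata, and show that this problem affects between 30% and 50% of the topics in models trained on two real-world collections, regardless of model size. We find that we can predict which words cause this phenomenon and that selectively subsampling these words dramatically reduces topic-metadata correlation, improves topic stability, and maintains or even improves model quality.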

Best position paper

《Arguments and Adjuncts in Universal Dependencies》

Adam Przepiórkowski and Agnieszka Patejuk

【Abstract】The aim of this paper is to argue for a coherent Universal Dependencies approach to the core vs. non-core distinction. We demonstrate inconsistencies in the current version 2 of UD in this respect – mostly resulting from the preservation of the argument–adjunct dichotomy despite the declared avoidance of this distinction – and propose a relatively conservative modification of UD that is free from these problems.

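【Summary】This paper argues for a coherent Universal Dependencies (UD) approach to the core vs. non-core distinction. We demonstrate inconsistencies in this respect in the current version 2 of UD, which result mostly from preserving the argument-adjunct dichotomy despite the declared avoidance of this distinction, and we propose a relatively conservative modification of UD that is free from these problems.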

Best reproduction paper

《Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering》

Wuwei Lan and Wei Xu

【Abstract】In this paper, we analyze several neural network designs (and their variations) for sentence pair modeling and compare their performance extensively across eight datasets, including paraphrase identification, semantic textual similarity, natural language inference, and question answering tasks. Although most of these models have claimed state-of-the-art performance, the original papers often reported on only one or two selected datasets. We provide a systematic study and show that (i) encoding contextual information by LSTM and inter-sentence interactions are critical, (ii) Tree-LSTM does not help as much as previously claimed but surprisingly improves performance on Twitter datasets, (iii) the Enhanced Sequential Inference Model (Chen et al., 2017) is the best so far for larger datasets, while the Pairwise Word Interaction Model (He and Lin, 2016) achieves the best performance when less data is available. We release our implementations as an open-source toolkit.

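【Summary】This paper analyzes several neural network designs (and their variations) for sentence pair modeling and compares their performance extensively across eight datasets covering paraphrase identification, semantic textual similarity, natural language inference, and question answering. Although most of these models claimed state-of-the-art performance, the original papers often reported results on only one or two selected datasets. We provide a systematic study and show that (i) encoding contextual information with an LSTM and modeling inter-sentence interactions are critical, (ii) Tree-LSTM helps less than previously claimed but surprisingly improves performance on Twitter datasets, and (iii) the Enhanced Sequential Inference Model (Chen et al., 2017) is the best so far on larger datasets, while the Pairwise Word Interaction Model (He and Lin, 2016) performs best when less data is available. We release our implementations as an open-source toolkit.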

Best resource paper

《AnlamVer: Semantic Model Evaluation Dataset for Turkish – Word Similarity and Relatedness》

Gökhan Ercan and Olcay Taner Yıldız

【Abstract】In this paper, we present AnlamVer, which is a semantic model evaluation dataset for Turkish designed to evaluate word similarity and word relatedness tasks while discriminating those two relations from each other. Our dataset consists of 500 word-pairs annotated by 12 human subjects, and each pair has two distinct scores for similarity and relatedness. Word-pairs are selected to enable the evaluation of distributional semantic models by multiple attributes of words and word-pair relations such as frequency, morphology, concreteness and relation types (e.g., synonymy, antonymy). Our aim is to provide insights to semantic model researchers by evaluating models in multiple attributes. We balance dataset word-pairs by their frequencies to evaluate the robustness of semantic models concerning out-of-vocabulary and rare words problems, which are caused by the rich derivational and inflectional morphology of the Turkish language.

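【Summary】This paper presents AnlamVer, a semantic model evaluation dataset for Turkish designed to evaluate word similarity and word relatedness tasks while distinguishing the two relations from each other. The dataset consists of 500 word pairs annotated by 12 human subjects, and each pair has two distinct scores, one for similarity and one for relatedness. Word pairs are selected so that distributional semantic models can be evaluated by multiple attributes of words and word-pair relations, such as frequency, morphology, concreteness and relation type (e.g., synonymy, antonymy). Our aim is to give semantic model researchers insights by evaluating models across multiple attributes. We balance the word pairs by frequency in order to evaluate the robustness of semantic models to out-of-vocabulary and rare-word problems, which are caused by the rich derivational and inflectional morphology of Turkish.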

Best survey paper

《A Survey on Open Information Extraction》

Christina Niklaus, Matthias Cetto, André Freitas and Siegfried Handschuh

【Abstract】We provide a detailed overview of the various approaches that were proposed to date to solve the task of Open Information Extraction. We present the major challenges that such systems face, show the evolution of the suggested approaches over time and depict the specific issues they address. In addition, we provide a critique of the commonly applied evaluation procedures for assessing the performance of Open IE systems and highlight some directions for future work.

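【Summary】We give a detailed overview of the approaches proposed to date for the task of Open Information Extraction. We present the major challenges such systems face, show how the proposed approaches have evolved over time, and describe the specific issues they address. In addition, we critique the evaluation procedures commonly used to assess Open IE systems and highlight some directions for future work.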

Most reproducible

《Design Challenges and Misconceptions in Neural Sequence Labeling》

Jie Yang, Shuailong Liang and Yue Zhang

【Abstract】We investigate the design challenges of constructing effective and efficient neural sequence labeling systems, by reproducing twelve neural sequence labeling models, which include most of the state-of-the-art structures, and conduct a systematic model comparison on three benchmarks (i.e. NER, Chunking, and POS tagging). Misconceptions and inconsistent conclusions in existing literature are examined and clarified under statistical experiments. In the comparison and analysis process, we reach several practical conclusions which can be useful to practitioners.

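【Summary】We investigate the design challenges of building effective and efficient neural sequence labeling systems by reproducing twelve neural sequence labeling models, which include most of the state-of-the-art structures, and by conducting a systematic model comparison on three benchmarks (NER, chunking, and POS tagging). Misconceptions and inconsistent conclusions in the existing literature are examined and clarified through statistical experiments. The comparison and analysis lead to several practical conclusions that can be useful to practitioners.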