NAACL 2015
TechTalks from event: NAACL 2015
3C: Machine Learning for NLP
-
When and why are log-linear models self-normalizing?Several techniques have recently been pro- posed for training self-normalized discriminative models. These attempt to find parameter settings for which unnormalized model scores approximate the true label probability. However, the theoretical properties of such techniques (and of self-normalization generally) have not been investigated. This paper examines the conditions under which we can expect self-normalization to work. We characterize a general class of distributions that admit self-normalization, and prove generalization bounds for procedures that minimize empirical normalizer variance. Motivated by these results, we describe a novel variant of an established procedure for training self-normalized models. The new procedure avoids computing normalizers for most training examples, and decreases training time by as much as factor of ten while preserving model quality.
-
Deep Multilingual Correlation for Improved Word EmbeddingsWord embeddings have been found useful for many NLP tasks, including part-of-speech tagging, named entity recognition, and parsing. Adding multilingual context when learning embeddings can improve their quality, for example via canonical correlation analysis (CCA) on embeddings fromtwo languages. In this paper, we extend this idea to learn deep non-linear transformations of word embeddings of the two languages, using the recently proposed deep canonical correlation analysis. The resulting embeddings, when evaluated on multiple word and bigram similarity tasks, consistently improve over monolingual embeddings and over embeddings transformed with linear CCA.
-
Disfluency Detection with a Semi-Markov Model and Prosodic FeaturesWe present a discriminative model for detecting disfluencies in spoken language transcripts. Structurally, our model is a semi-Markov conditional random field with features targeting characteristics unique to speech repairs. This gives a significant performance improvement over standard chain-structured CRFs that have been employed in past work. We then incorporate prosodic features over silences and relative word duration into our semi-CRF model, resulting in further performance gains; moreover, these features are not easily replaced by discrete prosodic indicators such as ToBI breaks. Our final system, the semi-CRF with prosodic information, achieves an F-score of 85.4, which is 1.3 F1 better than the best prior reported F-score on this dataset.
- All Sessions
- Best Paper Plenary Session
- Invited Talks
- Tutorials
- 1A: Semantics
- 1B: Tagging, Chunking, Syntax and Parsing
- 1C: Information Retrieval, Text Categorization, Topic Modeling
- 2A: Generation and Summarization
- 2B: Language and Vision (Long Papers)
- 2C: NLP for Web, Social Media and Social Sciences
- 3A: Generation and Summarization
- 3B: Information Extraction and Question Answering
- 3C: Machine Learning for NLP
- 4A: Dialogue and Spoken Language Processing
- 4B: Machine Learning for NLP
- 4C: Phonology, Morphology and Word Segmentation
- 5A: Semantics
- 5B: Machine Translation
- 5C: Morphology, Syntax, Multilinguality, and Applications
- 6A: Generation and Summarization
- 6B: Discourse and Coreference
- 6C: Information Extraction and Question Answering
- 7A: Semantics
- 7B: Information Extraction and Question Answering
- 7C: Machine Translation
- 8A: NLP for Web, Social Media and Social Sciences
- 8B: Language and Vision
- 9A: Lexical Semantics and Sentiment Analysis
- 9B: NLP-enabled Technology
- 9C: Linguistic and Psycholinguistic Aspects of CL
- 8C: Machine Translation
- Opening remarks