TechTalks from event: NAACL 2015
3A: Generation and Summarization
How to Make a Frenemy: Multitape FSTs for Portmanteau Generation
A portmanteau is a type of compound word that fuses the sounds and meanings of two component words; for example, frenemy (friend + enemy) or smog (smoke + fog). We develop a system, including a novel multitape FST, that takes an input of two words and outputs possible portmanteaux. Our system is trained on a list of known portmanteaux and their component words, and achieves 45% exact matches in cross-validated experiments.
Aligning Sentences from Standard Wikipedia to Simple Wikipedia
This work improves monolingual sentence alignment for text simplification, specifically for text in standard and simple Wikipedia. We introduce a method that improves over past efforts by using a greedy (vs. ordered) search over the document and a word-level semantic similarity score based on Wiktionary (vs. WordNet) that also accounts for structural similarity through syntactic dependencies. Experiments show improved performance on a hand-aligned set, with the largest gain coming from structural similarity. Resulting datasets of manually and automatically aligned sentence pairs are made available.
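The greedy (unordered) search mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so the one-to-one constraint, the `threshold` parameter, and the function names are all assumptions, and plain Jaccard word overlap stands in for the Wiktionary-based similarity score.

```python
def jaccard(a, b):
    """Stand-in similarity: word-set overlap (NOT the Wiktionary-based score)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def greedy_align(standard, simple, similarity=jaccard, threshold=0.3):
    """Greedily pair sentences by descending similarity, ignoring document order.

    Assumption: each sentence is aligned at most once (one-to-one).
    Returns a list of (standard_index, simple_index, score) tuples.
    """
    scored = [
        (similarity(s, t), i, j)
        for i, s in enumerate(standard)
        for j, t in enumerate(simple)
    ]
    # Highest-scoring candidate pairs first.
    scored.sort(key=lambda item: item[0], reverse=True)

    used_standard, used_simple, pairs = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # all remaining candidates score lower
        if i in used_standard or j in used_simple:
            continue  # one side is already aligned
        pairs.append((i, j, score))
        used_standard.add(i)
        used_simple.add(j)
    return pairs


standard = ["the cat sat on the mat", "dogs are loyal animals"]
simple = ["dogs are loyal", "the cat sat on a mat"]
alignment = greedy_align(standard, simple)
```

Because the candidates are ranked globally rather than scanned in document order, the sketch can recover crossing alignments such as the reordered pair in the toy example, which an order-preserving search would miss.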
Inducing Lexical Style Properties for Paraphrase and Genre Differentiation
We present an intuitive and effective method for inducing style scores on words and phrases. We exploit the signal in a phrase's rate of occurrence across stylistically contrasting corpora, making our method simple to implement and efficient to scale. We show strong results both intrinsically, by correlation with human judgements, and extrinsically, in applications to genre analysis and paraphrasing.
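One simple way to turn a word's rate of occurrence across two contrasting corpora into a score is a smoothed log-ratio of relative frequencies. The abstract does not state the formula, so the log-ratio, the add-one smoothing, and the name `style_scores` below are assumptions made for illustration only.

```python
import math
from collections import Counter


def style_scores(corpus_a, corpus_b, smoothing=1.0):
    """Score each word by the log-ratio of its smoothed rate in A versus B.

    Positive scores lean toward the style of corpus_a, negative toward corpus_b.
    This is a hypothetical scoring rule, not the method from the talk.
    """
    counts_a = Counter(tok for sent in corpus_a for tok in sent.lower().split())
    counts_b = Counter(tok for sent in corpus_b for tok in sent.lower().split())
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)

    scores = {}
    for word in vocab:
        # Additive smoothing keeps the ratio finite for words unseen in one corpus.
        rate_a = (counts_a[word] + smoothing) / (total_a + smoothing * len(vocab))
        rate_b = (counts_b[word] + smoothing) / (total_b + smoothing * len(vocab))
        scores[word] = math.log(rate_a / rate_b)
    return scores


formal = ["we hereby request your assistance", "we request a response"]
casual = ["hey can you help", "can you reply"]
scores = style_scores(formal, casual)
```

In this toy example, `scores["request"]` is positive (more typical of the formal corpus) and `scores["hey"]` is negative. Since the sketch needs only token counts, it scales linearly with corpus size, which fits the abstract's point about simplicity and efficiency.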