NAACL 2015
TechTalks from event: NAACL 2015
2C: NLP for Web, Social Media and Social Sciences
-
TopicCheck: Interactive Alignment for Assessing Topic Model Stability
Content analysis, a widely-applied social science research method, is increasingly being supplemented by topic modeling. However, while the discourse on content analysis centers heavily on reproducibility, computer scientists often focus more on scalability and less on coding reliability, leading to growing skepticism about the usefulness of topic models for automated content analysis. In response, we introduce TopicCheck, an interactive tool for assessing topic model stability. Our contributions are threefold. First, from established guidelines on reproducible content analysis, we distill a set of design requirements on how to computationally assess the stability of an automated coding process. Second, we devise an interactive alignment algorithm for matching latent topics from multiple models, and enable sensitivity evaluation across a large number of models. Finally, we demonstrate that our tool enables social scientists to gain novel insights into three active research questions.
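The abstract does not spell out TopicCheck's alignment algorithm. As a rough illustration of the underlying idea (matching latent topics across multiple model runs), the sketch below pairs topics from two runs by cosine similarity over their word distributions, using a simple greedy one-to-one matching. The sparse word-to-probability topic representation, the greedy strategy, and the names `cosine` and `greedy_align` are assumptions for illustration, not TopicCheck's actual method.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: each topic is a sparse word -> probability map.
Topic = Dict[str, float]


def cosine(a: Topic, b: Topic) -> float:
    """Cosine similarity between two sparse word distributions."""
    dot = sum(p * b.get(w, 0.0) for w, p in a.items())
    norm_a = math.sqrt(sum(p * p for p in a.values()))
    norm_b = math.sqrt(sum(p * p for p in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_align(
    run_a: List[Topic], run_b: List[Topic], threshold: float = 0.0
) -> List[Tuple[int, int, float]]:
    """Pair topics from two model runs one-to-one, most similar pairs first.

    Pairs whose similarity is not above `threshold` are skipped, so topics
    with no close counterpart in the other run stay unmatched.
    """
    candidates = [
        (cosine(ta, tb), i, j)
        for i, ta in enumerate(run_a)
        for j, tb in enumerate(run_b)
    ]
    # Highest similarity first; indices break ties deterministically.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    used_a: set = set()
    used_b: set = set()
    pairs: List[Tuple[int, int, float]] = []
    for sim, i, j in candidates:
        if sim <= threshold or i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        pairs.append((i, j, sim))
    # Return pairs ordered by the topic index in run_a.
    return sorted(pairs)


if __name__ == "__main__":
    run1 = [{"vote": 0.6, "election": 0.4}, {"game": 0.7, "team": 0.3}]
    run2 = [{"team": 0.5, "game": 0.5}, {"election": 0.5, "vote": 0.5}]
    for i, j, sim in greedy_align(run1, run2):
        print(f"run1 topic {i} <-> run2 topic {j} (cosine {sim:.2f})")
```

Repeating this over many runs and counting how often each topic finds a close match gives one crude signal of stability. A greedy pass is used here only because it is short; an optimal assignment (for example, the Hungarian algorithm) could replace it.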
-
Inferring latent attributes of Twitter users with label regularization
Inferring latent attributes of online users has many applications in public health, politics, and marketing. Most existing approaches rely on supervised learning algorithms, which require manual data annotation and therefore are costly to develop and adapt over time. In this paper, we propose a lightly supervised approach based on label regularization to infer the age, ethnicity, and political orientation of Twitter users. Our approach learns from a heterogeneous collection of soft constraints derived from Census demographics, trends in baby names, and Twitter accounts that are emblematic of class labels. To counteract the imprecision of such constraints, we compare several constraint selection algorithms that optimize classification accuracy on a tuning set. We find that, using no user-annotated data, our approach is within 2% of a fully supervised baseline for three of four tasks. Using a small set of labeled data for tuning further improves accuracy on all tasks.
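Label regularization, in its general form, trains a classifier so that its average prediction over a group of unlabeled users matches a known label proportion for that group, instead of fitting per-user labels. The sketch below is a minimal binary version under stated assumptions: logistic regression, one KL-divergence penalty per group, an L2 term, and plain gradient descent. The group layout, the toy data, and the names `loss_and_grad` and `train` are hypothetical; this illustrates the general technique, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: a group is (feature vectors of its unlabeled users,
# target share of label 1), e.g. a share derived from Census data.
Group = Tuple[List[List[float]], float]

EPS = 1e-12


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def kl_bernoulli(t: float, q: float) -> float:
    """KL(t || q) between Bernoulli(t) and Bernoulli(q)."""
    q = min(max(q, EPS), 1.0 - EPS)
    loss = 0.0
    if t > 0.0:
        loss += t * math.log(t / q)
    if t < 1.0:
        loss += (1.0 - t) * math.log((1.0 - t) / (1.0 - q))
    return loss


def loss_and_grad(
    w: List[float], groups: List[Group], l2: float
) -> Tuple[float, List[float]]:
    """Sum of per-group KL penalties plus an L2 term, and its gradient."""
    loss = 0.5 * l2 * dot(w, w)
    grad = [l2 * wi for wi in w]
    for xs, t in groups:
        probs = [sigmoid(dot(w, x)) for x in xs]
        q = sum(probs) / len(probs)  # model's expected share of label 1
        loss += kl_bernoulli(t, q)
        qc = min(max(q, EPS), 1.0 - EPS)
        dkl_dq = -t / qc + (1.0 - t) / (1.0 - qc)
        for x, p in zip(xs, probs):
            # Chain rule: dq/dw is the group mean of p * (1 - p) * x.
            scale = dkl_dq * p * (1.0 - p) / len(xs)
            for k, xk in enumerate(x):
                grad[k] += scale * xk
    return loss, grad


def train(groups: List[Group], dim: int, lr: float = 0.5,
          steps: int = 500, l2: float = 0.01) -> List[float]:
    """Plain gradient descent on the label-regularization objective."""
    w = [0.0] * dim
    for _ in range(steps):
        _, g = loss_and_grad(w, groups, l2)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


if __name__ == "__main__":
    # Features are [bias, indicator]; no individual user carries a label.
    group_a = ([[1.0, 1.0]] * 4, 0.9)  # users with the indicator: ~90% label 1
    group_b = ([[1.0, 0.0]] * 4, 0.2)  # users without it: ~20% label 1
    w = train([group_a, group_b], dim=2)
    print("P(label 1 | indicator)    =", round(sigmoid(w[0] + w[1]), 2))
    print("P(label 1 | no indicator) =", round(sigmoid(w[0]), 2))
```

Because each toy group here is homogeneous, the learned probabilities land near the 0.9 and 0.2 targets but slightly inside them, since the L2 term shrinks the weights. With real data, groups overlap and mix users, and a constraint selection step such as the one the abstract describes would decide which groups to keep.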
-
A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
We present a novel response generation system that can be trained end to end on large quantities of unstructured Twitter conversations. A neural network architecture is used to address sparsity issues that arise when integrating contextual information into classic statistical models, allowing the system to take into account previous dialog utterances. Our dynamic-context generative models show consistent gains over both context-sensitive and non-context-sensitive Machine Translation and Information Retrieval baselines.