NAACL HLT 2016
TechTalks from event: NAACL HLT 2016
"How can NLP help cure cancer?"Cancer inflicts a heavy toll on our society. One out of seven women will be diagnosed with breast cancer during their lifetime, a fraction of them contributing to about 450,000 deaths annually worldwide. Despite billions of dollars invested in cancer research, our understanding of the disease, treatment, and prevention is still limited. Majority of cancer research today takes place in biology and medicine. Computer science plays a minor supporting role in this process if at all. In this talk, I hope to convince you that NLP as a field has a chance to play a significant role in this battle. Indeed, free-form text remains the primary means by which physicians record their observations and clinical findings. Unfortunately, this rich source of textual information is severely underutilized by predictive models in oncology. Current models rely primarily only on structured data. In the first part of my talk, I will describe a number of tasks where NLP-based models can make a difference in clinical practice. For example, these include improving models of disease progression, preventing over-treatment, and narrowing down to the cure. This part of the talk draws on active collaborations with oncologists from Massachusetts General Hospital (MGH). In the second part of the talk, I will push beyond standard tools, introducing new functionalities and avoiding annotation-hungry training paradigms ill-suited for clinical practice. In particular, I will focus on interpretable neural models that provide rationales underlying their predictions, and semi-supervised methods for information extraction.
How Will Deep Learning Change Computational Linguistics?
What are the big problems in NLP historically, now, and in the future? (What do we need to solve, regardless of the approach for solving it?) What current NLP problems has DL solved, or where has DL made an important contribution towards improving the state of the art? Does DL guide NLP towards new problems? Do we already have examples? Do you want to speculate? (Have a new hammer, looking for un-hammered nails.) Does DL change our methodology profoundly, or is it just another machine learning method? Is there a greater danger of overfitting because of the massive tuning required? Given the computational requirements, are off-the-shelf tools incorporating DL practical? Is the use of off-the-shelf word embeddings the major contribution of DL? Is every task that previously used bag-of-words features now required to also use word embedding features? Is linguistics obsolete because DL will find better representations on its own? Or should DL be combined with traditional representations of latent linguistic structure? What is the best way to do that: hybrid architectures, hybrid training objectives, hand-designed input representations, or something else? Is DL mostly good for supervised mapping of input to output where very large training sets are available? Or can it also help with semi-supervised learning and unsupervised structure discovery? What are the best approaches to interpretability (explaining why a DL system made a particular decision)? What are the best approaches to understanding the latent representations and figuring out what the system is missing and how to fix that? How much do architectures and parameters need to be task-specific? How much can researchers reuse architectures, and learning algorithms reuse parameters, across tasks? A DL design that looks nice on paper often doesn't work right away. What are best practices for achieving good performance? Do experienced researchers not have this problem because they know more tricks of the trade and have better intuitions about hyperparameters? Or does every paper involve 6 months of fiddling around on a dev set until it works? Is it worth doing automatic tuning of hyperparameters, e.g., Bayesian optimization?
Evaluating Natural Language Generation Systems
Natural Language Generation (NLG) systems have different characteristics from other NLP systems, which affects how they are evaluated. In particular, it can be difficult to meaningfully evaluate NLG texts by comparing them against gold-standard reference texts, because (A) there are usually many possible texts which are acceptable to users and (B) some NLG systems produce texts which are better (as judged by human users) than human-written corpus texts. Partly for these reasons, the NLG community places much more emphasis on human-based evaluations than most areas of NLP. I will discuss the various ways in which NLG systems are evaluated, focusing on human-based evaluations. These typically either measure the success of generated texts at achieving a goal (e.g., measuring how many people change their behaviour after reading behaviour-change texts produced by an NLG system) or ask human subjects to rate various aspects of generated texts (such as readability, accuracy, and appropriateness), often on Likert scales. I will use examples from evaluations I have carried out, and highlight some of the lessons I have learnt, including the importance of reporting negative results, the difference between laboratory and real-world evaluations, and the need to look at worst-case as well as average-case performance. I hope my talk will be interesting and relevant to anyone who is interested in the evaluation of NLP systems.