TechTalks from event: KDD 2010
KDD Keynote Talks
Data Mining in the Online Services Industry
The online services industry is growing rapidly, with the worldwide online ad market projected to grow from $48 billion in 2011 to $67 billion in 2013, of which 47% will come from display advertising and 53% from search advertising. The Online Services Division (OSD) within Microsoft is a leader in the consumer cloud space today, with a strong portfolio of three mutually reinforcing businesses: Search, Portal, and Advertising. They are supported by a shared foundational asset of Intent & Knowledge Stores and a shared technology platform for large-scale data and high-performance systems. MSN (Portal) and Bing (Search) generate the content, traffic, and data that make for an exciting, fertile environment for large-scale data mining practice and system development. Our advertisers are thus given more valuable targeting opportunities and better ROI, which in turn provide better economics and usability data and allow for higher-quality services for our advertisers and a better experience for our users. The ability to transform data into meaningful, actionable insight is an important source of competitive advantage for OSD. The data mining initiatives within the division continue to strive for excellence around the following goals: actionable insights through deep data analysis, data mining and data modeling at scale and with speed, increased productivity from deployed large-scale data systems and tools, improved product and service development and decision making gained from effective measurement and experimentation, and a mature data culture in product teams that makes the above possible. With many technical and data challenges ahead of us, we are committed to using our huge data asset well to understand the needs, intent, and behavior of our users so that we can serve them better.
Computational Social Science
Research and applications in knowledge discovery and data mining increasingly address some of the most fundamental questions of social science: What determines the structure and behavior of social networks? What influences consumer and voter preferences? How does participation in social systems affect behaviors such as fraud, technology adoption, or resource allocation? Often for the first time, these questions are being examined by analyzing massive data sets that record the behavior and interactions of individuals in physical and virtual worlds. A new kind of scientific endeavor - computational social science - is emerging at the intersection of social science and computer science. The field draws from a rich base of existing theory from psychology, sociology, economics, and other social sciences, as well as from the formal languages and algorithms of computer science. The result is an unprecedented opportunity to revolutionize the social sciences, expand the reach and impact of computer science, and enable decision-makers to understand the complex systems and social interactions that we must manage in order to address fundamental challenges of economic welfare, energy production, sustainability, health care, education, and crime. Computational social science suggests an impressive array of new tasks and technical challenges to researchers and practitioners of KDD. These include modeling complex systems with temporal, spatial, and relational dependence; identifying cause and effect rather than mere association; modeling systems with feedback; and conducting analyses in ways that protect the privacy of individuals. Many of these challenges interact in fundamental ways that are both surprising and encouraging. Together, they point to an exciting new future for knowledge discovery and data mining.
The quantification of advertising and lessons from building a business based on large scale data mining
As electronic communication, media, and commerce increasingly permeate every aspect of modern life, real-time personalization of consumer experience through data mining becomes practical. Effective classification, prediction, and change modeling of consumer interests, behaviors, and purchasing habits using machine learning and statistical methods drive efficiency, insights, and consumer relevance that were never before possible. The internet has brought on a rapid evolution in advertising. Everything about behavior on the internet can be quantified, and responses to behavior can occur in real time. This dynamic interaction with the user has created opportunities to better understand the way in which individuals move from awareness of a product to considering a purchase, through to intent and ultimately a sale for the marketer. When a marketer can answer the question "did those TV ads cause consumers to switch shampoo brands?", they can model behavior change and adjust marketing strategies accordingly. Underpinning this shift in how the world's trillion dollar marketing budget is spent is transactional data on an unprecedented scale, creating new challenges for software that must interpret this stream and make real-time decisions tens, even hundreds of thousands of times every second. I will explore advances in modeling media consumption, advertising response, and the real-time evaluation of media opportunities through reference to Quantcast, a business launched in September 2006 which today interprets in excess of 10 billion new digital media consumption records every day.
We will examine the challenges of applying machine learning to non-search advertising and, in doing so, explore the creation of business environments (organization, infrastructure, tools, processes, and cost considerations) in which scientists can quickly develop new petabyte-scale algorithmic approaches, migrate them rapidly to real-time production, and deliver fully customized experiences for marketers, publishers, and consumers alike.