The mission of Microsoft's Research in Software Engineering (RiSE) group is to advance the state of the art in Software Engineering and to bring those advances to Microsoft's businesses.
It is well known that a large proportion of the time spent on data mining and, especially, data science projects goes to data acquisition, integration, transformation, cleansing and other highly tedious tasks. These tasks are tedious largely because they are repetitive and, hence, automatable. Consequently, progress in automating them can dramatically reduce the cost and duration of data-oriented projects. Recently, inductive programming in general, and the learning of declarative rules and programs from a few user interaction examples in particular, has shown great potential for this automation. The release of FlashFill as a plug-in inductive programming tool for Microsoft Excel and of ConvertFrom-String as a PowerShell command on Windows 10 demonstrates that inductive programming research has matured to the point where commercial applications are feasible.
The aim of this workshop is to gather practitioners and researchers around the use of inductive programming techniques, programming by example and other learning techniques to automate the data wrangling process.
We welcome regular papers, demo papers about benchmarks or tools, and position papers, and encourage discussions over a broad list of topics (not exhaustive):
Full Paper Submissions | August 12, 2016 |
Full Paper Notification | September 13, 2016 |
Camera-ready for accepted papers | September 20, 2016 |
Workshop date | December 12, 2016 |
Paper submissions should be limited to a maximum of eight (8) pages, in the IEEE 2-column format, including the bibliography and any possible appendices. Submissions longer than 8 pages will be rejected without a review. All papers must be formatted according to the IEEE Computer Society proceedings manuscript style, following IEEE ICDM 2016 submission guidelines.
All submissions will be triple-blind reviewed by the Program Committee on the basis of technical quality, relevance to data mining, originality, significance, and clarity. Author names and affiliations must not appear in the submissions, and bibliographic references must be adjusted to preserve author anonymity. Authors of accepted papers will be asked to give a presentation (short or long) at the workshop. Accepted papers will be published in the IEEE ICDM 2016 Workshops Proceedings volume by IEEE Computer Society Press, and will also be included in the IEEE Xplore Digital Library. After the workshop, contributing authors will be invited to submit a paper to a special issue (journal to be announced).
Manuscripts must be submitted electronically through the IEEE ICDM CyberChair system. We do not accept email submissions.
Luc De Raedt | Katholieke Universiteit Leuven, Belgium |
Peter Flach | University of Bristol, United Kingdom |
José Hernández-Orallo | Technical University of Valencia, Spain |
Bongshin Lee | Microsoft Research, Redmond, USA |
Ute Schmid | Otto-Friedrich-Universität Bamberg, Germany |
Mary Roth | IBM Research, San Jose, CA, USA |
Armando Solar-Lezama | Massachusetts Institute of Technology, USA |
Rishabh Singh | Microsoft Research, Redmond, USA |
Gemma C. Garriga | Allianz SE, Munich, Germany |
Janis Voigtländer | University of Bonn, Germany |
Ricardo Aler Mur | Universidad Carlos III de Madrid, Spain |
Umair Z. Ahmed | Indian Institute of Technology, Kanpur, India |
ML services are quickly becoming a commodity, and they will be taken for granted by developers and computer users alike in the near future. The building blocks for ML as a ubiquitous service are already in place, almost always in the form of remote APIs that provide a first level of abstraction over ML problem-solving and, especially, obviate scalability and resource allocation issues. But that's not enough: those building blocks still leak implementation details that are inessential to the application developer who needs to provide domain-specific solutions. We need to ascend a couple of rungs on the abstraction ladder and provide domain-specific languages that describe ML solutions without nitty-gritty details unrelated to the problem at hand, offering non-experts the possibility of automating their ML solutions. In this talk, we'll discuss our experience designing and developing BigML's data wrangling and ML workflow DSLs, Flatline and WhizzML, and how they generalize to similar ML services and APIs.
Dr. Charles Parker is the Vice President of Machine Learning algorithms at BigML. He holds a Ph.D. in computer science from Oregon State University. He was previously a research associate at the Eastman Kodak Company where he applied machine learning to image, audio, video, and document analysis. He also worked as a research analyst for Allston Holdings, a proprietary stock trading company, developing statistically-based trading strategies for U.S. and European futures markets. His current work for BigML is in the areas of Deep Learning and Bayesian Parameter Optimization.
9:00 - 10:30 | Maritime Domain Data Mining Session |
10:30 - 11:00 | Coffee Break |
11:00 - 12:00 | WhizzML: Designing and developing BigML's data wrangling, Charles Parker |
12:00 - 12:20 | Using Machine Learning to accelerate Data Wrangling, Shilpi Ahuja, Mary Roth, Rashmi Gangadharaiah, Peter Schwarz, and Rafael Zujur |
12:20 - 12:40 | Toward Representation Independent Analytics Over Structured Data, Jose Picado, Yodsawalai Chodpathumwan, Arash Termehchy, Alan Fern, and Yizhou Sun |
12:40 - 13:00 | Mining the Dark Web: drugs and fake ids, Andres Baravalle, Mauro Sanchez Lopez, and Sin Wee Lee |
13:00 - 14:30 | Lunch Break |