Small Data Can Play a Big Role in AI
February 17, 2020
For every big data set (with one billion columns and rows) fueling an AI or advanced analytics initiative, a typical large organization may have a thousand small data sets that go unused. Examples abound: marketing surveys of new customer segments, meeting minutes, spreadsheets with fewer than 1,000 columns and rows. As small-data techniques advance, their greater efficiency, accuracy, and transparency will increasingly be put to work across industries and business functions. Think drug discovery, industrial image retrieval, the design of new consumer products, the detection of defective factory machine parts, and much more. But competitive advantage will come not from automation, but from the human factor. For example, as AI plays an ever bigger role in employee skills training, its ability to learn from smaller data sets will enable expert employees to embed their expertise in the training systems, continually improving them and efficiently transferring their skills to other workers. People who are not data scientists could be transformed into AI trainers, enabling companies to apply and scale the vast reserves of untapped expertise unique to their organizations.
More than three quarters of large companies today have a “data-hungry” AI initiative under way — projects involving neural networks or deep-learning systems trained on huge repositories of data. Yet, many of the most valuable data sets in organizations are quite small: Think kilobytes or megabytes rather than exabytes. Because this data lacks the volume and velocity of big data, it’s often overlooked, languishing in PCs and functional databases and unconnected to enterprise-wide IT innovation initiatives.
But as a recent experiment we conducted with medical coders demonstrates, emerging AI tools and techniques, coupled with careful attention to human factors, are opening new possibilities to train AI with small data and transform processes.
For every big data set (with one billion columns and rows) fueling an AI or advanced analytics initiative, a typical large organization may have a thousand small data sets that go unused. Examples abound: marketing surveys of new customer segments, meeting minutes, spreadsheets with fewer than 1,000 columns and rows. In our experiment, the small data consisted of annotations added to medical charts by a team of medical coders: just tens of annotations on each of several thousand charts.
Medical coders analyze individual patient charts and translate complex information about diagnoses, treatments, medications, and more into alphanumeric codes. These codes are submitted to billing systems and health insurers for payment and reimbursement and play a critical role in patient care.
Coders in our experiment, all of whom were registered nurses, were already accustomed to drawing on an AI system for assistance. The AI scanned charts and identified links…