From the course: Cognitive Technologies: The Real Opportunities for Business

Unsupervised learning

- In this lecture, we're going to cover unsupervised learning, and Eric Nyberg is back to help. Unsupervised learning discovers patterns in data even though no explicit feedback or labeled examples are provided, as they are in supervised learning. A common unsupervised learning task is clustering: given a large collection of things, discover a way of grouping the items into subsets that share important similarities. When someone says, "You know, there are two types of people," they've just performed clustering, using unsupervised learning to identify the two types. Let's look at an example of unsupervised learning used in customer segmentation. Say my online retail business has five million customers. I want to design promotions that will appeal to them, but I can't design five million different promotions. My task is to find the four main types of customers I have, according to their buying preferences, so that I can develop four different promotions. This works by defining a way of measuring the distance, or difference, between items in your data set, or between customers in my example. An algorithm then defines clusters in which the items in each cluster are closer to each other than to items in other clusters. The question for you, Eric: what are some of the learning issues involved in unsupervised learning?
- Well, I think you have the same challenges that we discussed earlier in feature engineering, because in order to place the items into clusters, the algorithm has to decide which features of those items should be used to do the clustering. And depending on what features are available, it may be hard to distinguish between two items and to decide which cluster they belong in.
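To make the distance idea concrete, here is a minimal sketch of the customer segmentation example. It assumes k-means as the clustering algorithm and Euclidean distance as the measure; the lecture names neither, so both are illustrative choices. The two features (a purchase-frequency score and a basket-size score, each on a 0 to 10 scale) and the twelve customers are hypothetical toy data.

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors (an assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(points, k, max_iterations=20):
    """Group points into k clusters. Returns (centroids, labels)."""
    # Deterministic start: take the first point, then repeatedly add the
    # point that is farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(distance(p, c) for c in centroids))
        centroids.append(farthest)

    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: distance(p, centroids[i])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clusters are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels

# Hypothetical customers: (purchase-frequency score, basket-size score).
customers = [
    (1, 1), (1, 2), (2, 1),  # infrequent buyers, small baskets
    (1, 9), (2, 9), (1, 8),  # infrequent buyers, large baskets
    (9, 1), (8, 2), (9, 2),  # frequent buyers, small baskets
    (9, 9), (8, 9), (9, 8),  # frequent buyers, large baskets
]

centroids, labels = k_means(customers, k=4)
for customer, label in zip(customers, labels):
    print(customer, "-> cluster", label)
```

Which features go into each tuple is exactly the feature engineering question raised above: change the features, or put them on very different scales, and the same algorithm may group the customers differently.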
- Got it. Now, typical applications of clustering using unsupervised learning include: customer segmentation, which we talked about; social network analysis, such as recommending new friends on Facebook after discovering what groups of people you seem to belong to; defining product baskets, where we discover products likely to be purchased together so we can place them near each other in a store or recommend them to buyers; topic analysis or concept discovery, using clustering to analyze news stories and group them into the most common topics; and anomaly detection, where you can use unsupervised learning to discover the most common patterns of sensor data produced by a manufacturing process, for instance, and then look for outlying data that might suggest something is going wrong in your process. Now, is just clustering data without input from the user useful enough?
- Well, David, that's a great question. I think that clustering is often the first level of analysis that we do in unsupervised processing of data objects. And I think that sometimes if you give a little more input to the program, like telling it how many clusters you're interested in, or maybe giving it some hints about what features you might want to use to calculate the distance between objects, you can actually make the process of clustering go a little more quickly than in the completely unsupervised case. It may make sense to do some semi-supervised learning, where you have a human intervene and start to improve the clustering algorithm by giving it some feedback on the number of clusters or on the features that are used for clustering.
- So that's a great point to bring up. You raised the idea of semi-supervised learning, which is unsupervised learning with some human guidance to improve its performance.
- Exactly, great.
- So let's wrap this one up. Unsupervised learning discovers patterns in data. A common unsupervised learning task is clustering.
Applications of clustering include customer segmentation, anomaly detection, social network analysis, and anything involving the discovery of patterns in large amounts of data.
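The anomaly detection application can be sketched in a few lines. This is only an illustration under stated assumptions: instead of clustering, it uses a simple robust rule (the median as the "common pattern" and the median absolute deviation as the typical spread), and the sensor readings, the function name `find_outliers`, and the threshold of 5 are all hypothetical.

```python
import statistics

def find_outliers(readings, threshold=5.0):
    """Return (index, value) pairs for readings far from the typical value.

    A reading counts as outlying when its distance from the median exceeds
    `threshold` times the median absolute deviation (MAD). Both the rule and
    the default threshold are assumptions made for this sketch.
    """
    center = statistics.median(readings)
    deviations = [abs(r - center) for r in readings]
    mad = statistics.median(deviations)
    if mad == 0:
        # Most readings are identical, so anything different stands out.
        return [(i, r) for i, r in enumerate(readings) if r != center]
    return [
        (i, r)
        for i, (r, d) in enumerate(zip(readings, deviations))
        if d > threshold * mad
    ]

# Hypothetical temperature readings from a manufacturing process.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 35.0, 20.1, 19.7, 20.0]
outliers = find_outliers(readings)
print(outliers)
```

No labels saying which readings are "bad" are supplied; the typical pattern is learned from the data itself, which is what makes this an unsupervised approach.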
