January: Progress Update

nchau4
Feb 3, 2022

Another important skill that I have learnt on my journey to become a data scientist is clustering. Cluster analysis is the grouping of objects based on their characteristics such that there is high intra-cluster similarity (objects within the same cluster are alike) and low inter-cluster similarity (objects in different clusters are unlike each other).
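To make the intra-cluster versus inter-cluster idea concrete, here is a minimal sketch that uses Euclidean distance as a stand-in for dissimilarity (a small distance means high similarity). The two toy clusters and the helper names `mean_intra_distance` and `mean_inter_distance` are my own assumptions for illustration, not part of any library.

```python
import math
from itertools import combinations, product


def mean_intra_distance(cluster):
    """Average distance between every pair of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def mean_inter_distance(cluster_a, cluster_b):
    """Average distance between points taken from two different clusters."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: two well-separated groups of 2-D points.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5)]
cluster_b = [(8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

intra_a = mean_intra_distance(cluster_a)
intra_b = mean_intra_distance(cluster_b)
inter_ab = mean_inter_distance(cluster_a, cluster_b)

print(f"within A: {intra_a:.2f}, within B: {intra_b:.2f}, between A and B: {inter_ab:.2f}")
```

In a good clustering the within-cluster distances should come out much smaller than the between-cluster distance, which is what this toy data is built to show.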

Objects are assigned to clusters using criteria such as the smallest distances between points, the density of data points, graph structure, or various statistical distributions. Cluster analysis has wide applicability, including unsupervised machine learning (a type of machine learning that searches for patterns in a data set with no pre-existing labels and minimal human intervention), data mining, statistics, graph analytics, image processing, and numerous physical and social science applications.
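As an illustration of the "smallest distance" criterion, here is a rough sketch of k-means, one common distance-based method. The post does not tie itself to a particular algorithm, so treat this as one possible example; the function name `kmeans`, the toy points, and the hand-picked starting centroids are all assumptions.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point takes the index of its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # Nothing moved, so the assignment is stable.
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data with no labels attached.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

No labels are supplied anywhere: the groups emerge only from the distances, which is what makes this unsupervised. In practice a library implementation would usually be a better choice than a hand-rolled loop like this one.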

Data scientists and others use clustering to gain insights from data by observing which groups the data points fall into once a clustering algorithm has been applied. Clustering can also be used for anomaly detection, to find data points that are not part of any cluster. Furthermore, it is used to identify groups of similar objects in datasets with two or more variables.
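The anomaly detection idea can be sketched very simply: once cluster centres are known, a point that is far from all of them does not belong to any cluster. The centroids, the threshold `max_distance`, and the function name `find_anomalies` below are assumptions picked for this toy example; real data would need a threshold chosen with more care.

```python
import math


def find_anomalies(points, centroids, max_distance):
    """Return the points whose nearest centroid is farther than max_distance."""
    return [
        p for p in points
        if min(math.dist(p, c) for c in centroids) > max_distance
    ]


# Hypothetical cluster centres and data; (5.0, 0.0) sits far from both centres.
centroids = [(1.0, 1.5), (8.5, 8.5)]
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (5.0, 0.0)]

anomalies = find_anomalies(points, centroids, max_distance=2.0)
print(anomalies)  # [(5.0, 0.0)]
```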
