Spatial, textual and visual data are ubiquitous and are massively generated by cameras, satellites, sensors and humans. Clustering large and semi-structured data is a very popular pattern recognition problem in the Big Data era, that aims to support data management and grouping into topics. Our research focus is to identify groups of textual and/or visual content that pose a high semantic relevance within each group and low similarity between different groups of documents, i.e. news articles, short texts, social images, satellite images, etc. The estimation of the number of topics to detect is a computationally complex problem that requires effective parallel programming methods and efficient computational resources, such as cloud and High Performance Computing environments.