Open Access Pub publishes peer-reviewed, free-to-read open-access articles. Showing articles matching "Data mining".
Jun 2023 DOI 10.14302/issn.2692-1537.ijcv-23-4586
The goal is to perform a text-mining analysis of the scientific publications on Covid-19 conspiracy theories and to find out which journals publish on the topic and which aspects of it are being studied. For this purpose, all publications available in the National Center for Biotechnology Information (NCBI) database were consulted, since they are peer-reviewed papers. Only the abstracts of these papers were analyzed, using artificial intelligence techniques, to determine, for example, whether the subject's importance varies with the journals in which it has been published and, above all, what relationships can be extracted from the information they contain. In addition, a "Net Prevalence per Covid19" index was defined; in countries with a high value of this index, larger campaigns should be sponsored to counter the misinformation generated around Covid-19, although this suggestion should be verified in future publications. The main challenge was to unify the abstracts, and for this purpose an AI-based text summarizer was used. The results indicate, through word frequencies, which topics are trending, with the conspiracy discourse focusing on Covid-19 vaccines; however, further work is needed to refine this methodology and unify the results.
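The abstract does not say which summarizer or frequency tool was used. As a rough illustration only, the sketch below counts word frequencies across a set of abstracts and builds a simple frequency-based extractive summary that keeps the sentences whose words are most frequent in the text. The stopword list, the tokenizer, and the function names `word_frequencies` and `summarize` are hypothetical choices, not the authors' pipeline.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately small stopword list (an assumption; real
# pipelines use a much fuller list).
STOPWORDS = {
    "the", "and", "of", "to", "in", "a", "an", "for", "on", "about",
    "with", "is", "are", "was", "were", "that", "this", "by", "as",
}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep word tokens that are not stopwords."""
    return [
        w for w in re.findall(r"[a-z0-9-]+", text.lower())
        if w not in STOPWORDS and len(w) > 2
    ]


def word_frequencies(abstracts: List[str]) -> Counter:
    """Count how often each non-stopword appears across all abstracts."""
    counts: Counter = Counter()
    for abstract in abstracts:
        counts.update(tokenize(abstract))
    return counts


def summarize(text: str, n_sentences: int = 1) -> str:
    """Extractive summary: keep the sentences whose words are most frequent in the text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(tokenize(text))

    def score(sentence: str) -> float:
        tokens = tokenize(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Re-sort the chosen indices so the summary keeps the original order.
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    abstracts = [
        "Covid-19 vaccines and conspiracy theories.",
        "Conspiracy theories about vaccines spread online.",
    ]
    print(word_frequencies(abstracts).most_common(3))
    print(summarize(" ".join(abstracts), n_sentences=1))
```

Averaging rather than summing the word scores is a design choice: a summed score would simply favour the longest sentences.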
Mar 2023 DOI 10.14302/issn.2768-0207.jbr-23-4478
Spatial data mining (SDM) is the search for important relationships and characteristics that may exist in spatial databases. This paper compares object clustering algorithms for spatial data mining in order to identify the most efficient one. To this end, it compares the k-means, Partitioning Around Medoids (PAM), and Clustering Large Applications based on RANdomized Search (CLARANS) algorithms in terms of computing time. Experimental results indicate that CLARANS is very efficient and effective.
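The abstract reports timings but not how the algorithms work. As a hedged sketch of the randomized search that gives CLARANS its name, the code below treats each set of k medoids as a node, samples random neighbours that differ in exactly one medoid, moves whenever the PAM cost (total distance from each point to its nearest medoid) drops, and restarts `numlocal` times. Euclidean distance, the parameter defaults, and the toy points are assumptions for illustration, not the configuration used in the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cost(points: Sequence[Point], medoids: Sequence[int]) -> float:
    """PAM/CLARANS objective: sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def clarans(points: Sequence[Point], k: int, numlocal: int = 2,
            maxneighbor: int = 20, seed: int = 0) -> Tuple[List[int], float]:
    """Return (medoid indices, cost) of the best local minimum found."""
    if not 0 < k < len(points):
        raise ValueError("need 0 < k < number of points")
    rng = random.Random(seed)
    best: List[int] = []
    best_cost = math.inf
    for _ in range(numlocal):
        # Random starting node: k distinct point indices.
        current = rng.sample(range(len(points)), k)
        current_cost = cost(points, current)
        tried = 0
        while tried < maxneighbor:
            # A neighbour differs from `current` in exactly one medoid.
            slot = rng.randrange(k)
            candidate = rng.choice([i for i in range(len(points)) if i not in current])
            neighbour = current.copy()
            neighbour[slot] = candidate
            neighbour_cost = cost(points, neighbour)
            if neighbour_cost < current_cost:
                current, current_cost = neighbour, neighbour_cost
                tried = 0  # moved to a better node: restart the neighbour count
            else:
                tried += 1
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return sorted(best), best_cost


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    medoids, total = clarans(pts, k=2)
    print(medoids, round(total, 3))
```

Unlike PAM, which evaluates every possible medoid swap at each step, CLARANS checks only a random sample of neighbours, which is where its speed advantage on larger data sets comes from.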
Mar 2021 DOI 10.14302/issn.2768-0207.jbr-21-3455
In recent years, the drive to collect and analyze data has grown. Time-stamping a data set is an important part of analysis and data mining, as it can yield more useful information. Various techniques have been designed for mining time-series data; sequential pattern mining, for example, seeks relationships between occurrences of sequential events and determines whether they occur in any specific order. Many algorithms based on the Apriori approach have been proposed for this type of data. In this paper we compare two basic sequential pattern algorithms: the Generalized Sequential Pattern (GSP) algorithm and Sequential PAttern Discovery using Equivalence classes (SPADE). Both are based on the Apriori algorithm. Experimental results show that SPADE takes less time than GSP.
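To make the vertical id-list idea behind SPADE concrete, the sketch below stores, for each pattern, the (sequence id, event id) pairs where the pattern ends, and grows patterns by temporal joins on those lists instead of rescanning the database level by level as GSP does. It is a simplified depth-first variant: it assumes one item per event and extends each prefix with frequent single items rather than joining equivalence-class atoms as the full SPADE algorithm does. The function names and the toy database are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]
# Vertical format: (sequence id, event id) pairs where a pattern's last item occurs.
IdList = List[Tuple[int, int]]


def vertical_format(db: Dict[int, List[str]]) -> Dict[str, IdList]:
    """Build one id-list per item (assumes a single item per event)."""
    idlists: Dict[str, IdList] = defaultdict(list)
    for sid, events in db.items():
        for eid, item in enumerate(events):
            idlists[item].append((sid, eid))
    return dict(idlists)


def support(idlist: IdList) -> int:
    """Support counts distinct sequences, not occurrences."""
    return len({sid for sid, _ in idlist})


def temporal_join(prefix: IdList, item: IdList) -> IdList:
    """Keep occurrences of `item` that follow some occurrence of `prefix` in the same sequence."""
    earliest: Dict[int, int] = {}
    for sid, eid in prefix:
        earliest[sid] = min(eid, earliest.get(sid, eid))
    return [(sid, eid) for sid, eid in item if sid in earliest and eid > earliest[sid]]


def mine_sequences(db: Dict[int, List[str]], min_sup: int) -> Dict[Pattern, int]:
    """Return every frequent sequential pattern with its support."""
    frequent_items = {i: ids for i, ids in vertical_format(db).items()
                      if support(ids) >= min_sup}
    results: Dict[Pattern, int] = {}

    def extend(pattern: Pattern, idlist: IdList) -> None:
        results[pattern] = support(idlist)
        for item, item_ids in frequent_items.items():
            joined = temporal_join(idlist, item_ids)
            if support(joined) >= min_sup:  # Apriori (anti-monotone) pruning
                extend(pattern + (item,), joined)

    for item, ids in sorted(frequent_items.items()):
        extend((item,), ids)
    return results


if __name__ == "__main__":
    db = {1: ["a", "b", "c"], 2: ["a", "c"], 3: ["b", "c"], 4: ["a", "b"]}
    for pattern, sup in sorted(mine_sequences(db, min_sup=2).items()):
        print(pattern, sup)
```

On this toy database with `min_sup=2`, the sketch finds a, b, and c (support 3 each) and a→b, a→c, and b→c (support 2 each).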
Jul 2020 DOI 10.14302/issn.2641-5526.jmid-20-3424
In the last decade, the amount of data collected in various computer science applications has grown considerably. These large volumes of data need to be analysed in order to extract useful hidden knowledge. This work focuses on association rule extraction, one of the most popular techniques in data mining. Nevertheless, the number of extracted association rules is often very high, and many of them are redundant. In this paper, we propose an algorithm for mining closed itemsets that builds an IT-tree (itemset-tidset tree). This algorithm is compared with the DCI (direct counting & intersect) algorithm in terms of minimum support and computing time. CHARM is not memory-efficient: it needs to store all closed itemsets in memory, and the lower min-sup is, the more frequent closed itemsets there are, so the amount of memory CHARM uses increases.
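The sketch below illustrates what a closed itemset is, using the same vertical itemset-tidset representation that an IT-tree is built from. It is not CHARM: it enumerates frequent itemsets depth-first by intersecting tidsets (Eclat-style) and then keeps an itemset only if its closure, the set of items shared by every transaction that contains it, equals the itemset itself. CHARM reaches the same result more efficiently by pruning with tidset equality and containment properties during the search. Function names and the toy transactions are assumptions for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def vertical_layout(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """Map each item to its tidset (ids of the transactions that contain it)."""
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def closed_itemsets(transactions: List[Set[str]], min_sup: int) -> Dict[Itemset, int]:
    """Return each frequent closed itemset with its support."""
    if min_sup < 1:
        raise ValueError("min_sup must be at least 1")
    frequent: Dict[Itemset, Set[int]] = {}

    def dfs(prefix: Itemset, prefix_tids: Set[int],
            candidates: List[Tuple[str, Set[int]]]) -> None:
        for i, (item, tids) in enumerate(candidates):
            new_tids = prefix_tids & tids  # tidset of prefix + item
            if len(new_tids) < min_sup:
                continue  # prefix + item is infrequent, so are all its supersets
            itemset = prefix | {item}
            frequent[itemset] = new_tids
            dfs(itemset, new_tids, candidates[i + 1:])

    all_tids = set(range(len(transactions)))
    dfs(frozenset(), all_tids, sorted(vertical_layout(transactions).items()))

    closed: Dict[Itemset, int] = {}
    for itemset, tids in frequent.items():
        # Closure: items shared by every transaction containing the itemset.
        closure = frozenset.intersection(*(frozenset(transactions[t]) for t in tids))
        if closure == itemset:
            closed[itemset] = len(tids)
    return closed


if __name__ == "__main__":
    db = [set("ACTW"), set("CDW"), set("ACTW"), set("ACDW"), set("ACDTW"), set("CDT")]
    result = closed_itemsets(db, 3)
    for itemset, sup in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        print("".join(sorted(itemset)), sup)
```

With `min_sup=3`, this toy database has 19 frequent itemsets but only 7 closed ones (C, CW, CD, CT, ACW, CDW, ACTW), which is why closed itemsets yield a more compact, less redundant rule base.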
Apr 2020 DOI 10.14302/issn.2641-5526.jmid-20-3302
Data mining is the process of exploring large data sets to find patterns that support decision-making. One of the techniques used in decision-making is classification. Data classification is a form of data analysis used to extract models describing important data classes. There are many classification algorithms, each of which assigns objects to predefined classes. The decision tree is one important such technique: it builds a tree structure by incrementally breaking a data set down into smaller subsets. Decision trees can be built with popular algorithms such as ID3, C4.5, and CART. The present study considers the ID3 and C4.5 algorithms for building a decision tree using the "entropy" and "information gain" measures, which are the basic components behind the construction of a classifier model.
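The entropy and information-gain measures can be computed in a few lines. The sketch below, on a hypothetical toy attribute and class column, computes the entropy of a label set, the information gain of splitting on an attribute (the ID3 criterion), and the gain ratio, which divides information gain by the entropy of the attribute's own value distribution (the split information) and is the criterion C4.5 uses. Function names and data are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def partition(values: Sequence[Hashable],
              labels: Sequence[Hashable]) -> Dict[Hashable, List[Hashable]]:
    """Group the labels by the attribute value of each example."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return groups


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """ID3 criterion: entropy before the split minus weighted entropy after it."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(values, labels).values())
    return entropy(labels) - remainder


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5 criterion: information gain divided by the split information."""
    split_info = entropy(values)  # entropy of the attribute's value distribution
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    outlook = ["sunny"] * 5 + ["overcast"] * 4 + ["rain"] * 5
    play = ["no", "no", "no", "yes", "yes"] + ["yes"] * 4 + ["yes", "yes", "yes", "no", "no"]
    print(f"H(play)        = {entropy(play):.3f}")
    print(f"Gain(outlook)  = {information_gain(outlook, play):.3f}")
    print(f"Ratio(outlook) = {gain_ratio(outlook, play):.3f}")
```

For this toy column the script prints about 0.940, 0.247, and 0.156. The gain ratio's normalization penalizes attributes with many distinct values, which plain information gain tends to favour.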