Use Apriori, Genetic Algorithm and Fuzzy Logic to Foretell the Most Common Amino Acid Sequence

Authors

1. Krupali Patel, S P University, Student, India
2. Pravinbhai Patel, S P University, Postdoctoral Researcher, India

Abstract

Data mining is the practice of discovering connections between seemingly unrelated pieces of information, here biological data. Rapid progress in genomics and proteomics in recent years has produced an abundance of biological data, so categorising biological sequences and structures according to their essential properties and functions is a pressing issue in biological data processing. Many methods from the literature have been used to generate frequent patterns in a wide range of contexts. However, the performance of these methods has degraded, which limits their usefulness. In this work, we use two different methods to compare the common patterns they produce and to optimise the data, an approach we find to be of great value. Contaminated protein sequences are the root cause of several human illnesses, and our method is designed to extract the amino acids that are both hidden and most dominant in a sequence. We address this problem by combining the Apriori algorithm, a genetic algorithm, and strong association rules for pattern prediction, and then apply fuzzy logic to optimise the data and identify interesting common patterns in the protein sequence database. These common patterns are useful to the pharmaceutical industry.

Keywords

Genetic Algorithms, Protein structure analysis, Association methods, Fuzzy Systems, Mining biological data

Conclusion

A literature survey examines the existing system to determine whether it needs to be reengineered. In the existing system, frequent items are predicted using only the Apriori algorithm and association rules. A common pattern is accepted as valid only if it meets a minimum threshold of ninety percent.
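The thresholded Apriori step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each protein sequence is treated as an unordered set of single-letter amino acid codes, that the ninety percent figure is applied as a minimum support, and that the toy `sequences` list stands in for a real protein sequence database.

```python
from itertools import combinations


def apriori(transactions, min_support=0.9):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset
        return sum(itemset <= t for t in tsets) / n

    # Level 1: frequent single items
    items = {item for t in tsets for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy data: ten short sequences of single-letter residue codes
sequences = ["MKVL", "MKVA", "MKVG", "MKVL", "MKVT",
             "MKVS", "MKVL", "MKVA", "MKVD", "MKAL"]
frequent = apriori(sequences, min_support=0.9)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), s)
```

Treating a sequence as a set ignores residue order and repetition. A study that cares about contiguous motifs would need a different transaction encoding, for example overlapping k-mers.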

The proposed approach uses four techniques to predict the common pattern of amino acids: the partition method, the Apriori algorithm, the genetic algorithm, and fuzzy logic. The fuzzy logic module takes input from both the genetic algorithm module and the Apriori module and produces the required knowledge in the form of common patterns. It is also essential to optimise the procedure that is followed. These common patterns are of value to companies in the pharmaceutical sector. The main objective is to predict which combinations of amino acids would be most useful in developing treatments for disease.
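One way the fuzzy logic module could combine the outputs of the other two modules is sketched below. Everything in it is an assumption made for illustration: the source does not specify the membership functions, the rule base, or the genetic algorithm's fitness measure, so the `ramp` membership function, its breakpoints, the minimum-based fuzzy AND, and the example scores are all hypothetical.

```python
def ramp(x, low, high):
    """Membership in a 'high' fuzzy set: 0 below low, 1 above high, linear between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def interestingness(support, fitness):
    # Fuzzy AND (minimum t-norm) of "support is high" and "fitness is high".
    # The breakpoints are illustrative assumptions, not values from the study.
    return min(ramp(support, 0.5, 0.9), ramp(fitness, 0.4, 0.8))


# Hypothetical outputs of the Apriori module (support) and the
# genetic algorithm module (fitness) for the same candidate patterns
apriori_support = {"MKV": 0.9, "MK": 1.0, "KL": 0.4}
ga_fitness = {"MKV": 0.8, "MK": 0.6, "KL": 0.9}

ranked = sorted(
    ((p, interestingness(apriori_support[p], ga_fitness[p])) for p in apriori_support),
    key=lambda pair: pair[1],
    reverse=True,
)
for pattern, degree in ranked:
    print(pattern, round(degree, 2))
```

Under these assumptions a pattern scores highly only when both modules rate it highly, so a pattern with strong fitness but low support, such as `KL` here, is ranked last.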

In future work, the Eclat algorithm could be used in place of the Apriori algorithm to improve the accuracy with which the common pattern is predicted. In addition, the protein sequences associated with diseases such as HIV/AIDS, influenza, dengue fever, viral fever, and swine flu could be compared to find similarities among them. Medicines for viral infections that are designed around such a shared pattern may be more effective at treating the conditions they target.
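For comparison with Apriori, the Eclat idea can be sketched as follows. Eclat stores, for each item, the set of transaction ids that contain it, and grows itemsets depth-first by intersecting those sets instead of rescanning the database. The sketch makes the same illustrative assumptions as before (sequences as unordered sets of residue codes, a 0.9 minimum support, hypothetical toy data) and does not show that Eclat improves accuracy; on the same data and threshold it returns the same itemsets as Apriori and differs only in how they are found.

```python
def eclat(transactions, min_support=0.9):
    """Return {itemset: support} using vertical tid-set intersections."""
    n = len(transactions)

    # Vertical layout: item -> set of ids of the transactions containing it
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in set(t):
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs already frequent together with prefix
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids) / n
            # Intersect with later items to grow the itemset depth-first
            nxt = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) / n >= min_support:
                    nxt.append((other, shared))
            extend(itemset, nxt)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items()
         if len(tids) / n >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


# Hypothetical toy data: ten short sequences of single-letter residue codes
sequences = ["MKVL", "MKVA", "MKVG", "MKVL", "MKVT",
             "MKVS", "MKVL", "MKVA", "MKVD", "MKAL"]
eclat_result = eclat(sequences, min_support=0.9)
for itemset, s in sorted(eclat_result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), s)
```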

References

1. Jiawei Han, Hong Cheng, Dong Xin and Xifeng Yan, “Frequent pattern mining: current status and future directions”, Data Mining and Knowledge Discovery (2007) 15:55-86.
2. Lakshmi Priya G. and Shanmugasundaram Hariharan, “A Study on Predicting Patterns over the Protein Sequence Datasets Using Association Rule Mining”, Journal of Engineering Science and Technology, Vol. 7, No. 5 (2012) 563-573.
3. Davnah Urbach and Jason H. Moore, “Data Mining and the Evolution of Biological Complexity”, BioData Mining 2011, 4:7.

Author Contribution

Krupali Patel participated in the conception and execution of the study, the analysis and interpretation of the data, and the drafting of the paper. Pravinbhai Patel supervised the work on this project.

Funding

No funding was provided to the author(s) of this article during its research, writing, or publishing.

Software Information

RapidMiner was used for this work.

Conflict of Interest

Each author confirms that they have no competing interests.

Acknowledgements

I owe a great debt of gratitude to Pravinbhai, my primary supervisor, who gave me invaluable direction throughout this endeavour. I would also like to thank the friends and family members who helped me during this process and provided invaluable feedback and insights.

Data availability

Data sharing is not relevant to this topic since no datasets were created or analyzed over the course of this investigation.