Pan-Genome Analysis of Bacillus licheniformis to Find Patterns in Industrially Important Genes
Abstract
Bacillus licheniformis is found in soil and on the feathers of terrestrial birds. Under aerobic conditions B. licheniformis produces bacteriocins, and under anaerobic conditions it produces lichenin. Bacillus is regarded as a safe genus for producing commercially important heterologous proteins, and this includes many B. licheniformis strains. This study presents a pan-genome analysis of the bacterium from the perspective of targeted industrially important genes and proteins. Genomic data for 177 B. licheniformis strains and the sequences of their industrially significant proteins were downloaded from the RefSeq and UniProtKB databases, respectively. Bioinformatics analysis (standalone BLAST), supported by Python scripts and other tools, was performed on all strains to find patterns (gene presence/absence frequency polymorphism) in the targeted industrially important genes. A heat map was constructed to show how the targeted genes relate to one another and how the strains relate to one another, based on trends in gene presence and absence. The pan-genome analysis of 39 B. licheniformis genomes predicted that the bacterium has an open pan-genome with a gene count of 165,775 and a core-genome gene count of 124,882-120,026. The pan-genome cluster count was 7233-8151 and the core-genome cluster count was 3012-3075. Genes were separated into metabolic and non-metabolic groups according to clusters of orthologous genes (COG) categories. The largest part of the core genome consisted of genes with metabolic functions (45.29%), while 11% of genes had non-metabolic functions and were involved in housekeeping processes. The core genome therefore showed strong conservation of metabolic genes. We also found that the ribosomal proteins in the bacterial genome fall in COG category J, which covers translation, ribosomal structure and biogenesis.
To investigate the evolutionary relatedness of the bacterial genomes, a phylogenetic tree was constructed, which showed 14 separate groups. Groups A and B were closely related to each other, as were groups D and E, while groups D and E were distantly related to groups A and B. According to the heat map, groups A and B contained the largest number of genes, with the exception of the beta-mannosidase genes. This phylogenetic study gives insight into how the species evolves through genetic changes.
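The presence/absence pattern search summarized in the abstract can be illustrated with a minimal sketch. It assumes that the BLAST results for each strain are available in the standard 12-column tabular format (`-outfmt 6`), with the targeted protein as the query. The identity and e-value cutoffs, the function names, and the gene and strain labels are hypothetical and chosen for illustration only; they are not the settings or scripts used in the study.

```python
from typing import Dict, Iterable, List, Set, Tuple


def parse_hits(lines: Iterable[str],
               min_identity: float = 40.0,   # hypothetical cutoff
               max_evalue: float = 1e-5      # hypothetical cutoff
               ) -> Set[str]:
    """Return the query IDs that have at least one hit passing both cutoffs.

    Each line is expected in BLAST tabular format (outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    found: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        qseqid, pident, evalue = fields[0], float(fields[2]), float(fields[10])
        if pident >= min_identity and evalue <= max_evalue:
            found.add(qseqid)
    return found


def presence_absence(hits_by_strain: Dict[str, Iterable[str]],
                     genes: List[str],
                     **cutoffs: float) -> Dict[str, Dict[str, int]]:
    """Build a strain x gene matrix with 1 = present and 0 = absent."""
    matrix: Dict[str, Dict[str, int]] = {}
    for strain, lines in hits_by_strain.items():
        found = parse_hits(lines, **cutoffs)
        matrix[strain] = {gene: int(gene in found) for gene in genes}
    return matrix


def gene_frequency(matrix: Dict[str, Dict[str, int]],
                   genes: List[str]) -> Dict[str, float]:
    """Fraction of strains in which each gene is present."""
    n = len(matrix)
    if n == 0:
        return {gene: 0.0 for gene in genes}
    return {gene: sum(row[gene] for row in matrix.values()) / n for gene in genes}


def split_core_accessory(matrix: Dict[str, Dict[str, int]],
                         genes: List[str]) -> Tuple[List[str], List[str]]:
    """Genes present in every strain (core) versus all remaining genes."""
    core = [g for g in genes if matrix and all(row[g] for row in matrix.values())]
    accessory = [g for g in genes if g not in core]
    return core, accessory


if __name__ == "__main__":
    # Toy input: two hypothetical strains and two hypothetical target genes.
    toy_hits = {
        "strain_1": ["gene_A\tcontig1\t98.5\t300\t4\t0\t1\t300\t10\t310\t1e-150\t550"],
        "strain_2": ["gene_A\tcontig7\t97.0\t300\t9\t0\t1\t300\t55\t355\t1e-140\t530",
                     "gene_B\tcontig2\t88.0\t410\t49\t0\t1\t410\t5\t415\t1e-120\t480"],
    }
    targets = ["gene_A", "gene_B"]
    pa = presence_absence(toy_hits, targets)
    print(pa)
    print(gene_frequency(pa, targets))
    print(split_core_accessory(pa, targets))
```

The resulting 0/1 matrix is the kind of table a heat map with clustered rows and columns can be drawn from. Treating "present in every strain" as the definition of core is a simplification made here for the sketch.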