Virtual screening of natural compounds as combinatorial agents from indian medicinal plants against estrogen positive breast cancer

Facing worldwide challenges associated with multifactorial etiology of breast cancer, designing of combinatorial therapies using natural compounds is currently the emergent way of treating several cancers including breast cancer in a synergistic way, which may mitigate several problems associated with multiple receptor targeting. In this research, Estrogen receptor positive breast cancer was taken as prototype and several key receptors associated with this particular disease were targeted by virtual screening of natural compounds found in Indian originated medicinal plants using Computer aided Drug Designing (CADD) strategies. We found the combination of Carpusin, Paulownin Cornigerine, Nororientaline, Oryzalexin B, Romucosine H and Colchicine as effective against six potential receptors i.e. FGFR2, ESR1, PIK3CA, PIK3CB, PIK3CD and AR in Estrogen receptor positive breast cancer with their binding energies in the range of ∆G ≤ -8.0 Kcal/mol as well as significant number of common amino acid binding residues as compared with binding sites of receptors. Thus this research holds significant implications for the designing of combinatorial therapeutic agents against breast cancer which can be further tested in-vitro and in-vivo to prove their synergistic efficiency.


INTRODUCTION
The occurrence of cancer is related to several factors to which human expose, the complexity of which is an arduous challenge in the field of epidemiology due to which many epidemiologists around the world have been understanding and determining cancer's multifactorial etiology. [1] The complex interplay of several genetic and non-genetic factors have been clearly assessed in several researches that predispose to development of cancer. In this research, breast cancer was taken as a prototype due to its prevalence at a large scale worldwide and in major areas of India. Breast cancer has been described as multifactorial in occurrence in several epidemiological studies in which 90% cases occur due to the involvement of non-genetic factors, like environmental factors, major biological, chemical carcinogenic agents. [2]- [4] To date, many researches have shown that a dysregulation of several molecular and signaling pathways occur due to the multifactorial effects pf breast cancer. Nevertheless, several potential therapies are needed for targeting multiple receptors and pathways to serve as an effective treatment against its proliferation and metastasis. [5] Breast Cancer being an important concern of health and cause of deaths due to cancer among women worldwide has shown increased trends of prevalence and incidence, resulting in a high mortality rate globally. There has been a sharp increase in breast cancer cases in recent past. [6] [7]. According to 'Global Burden of Disease Cancer Collaboration', 2017, the data showed an increment by 33% overall cancer cases from 2005-2015 worldwide. [8] [9]. The total cancer cases are likely to go up to 1,148,757 cases in 2020. [6] The age standardized prevalence data of cancer cases shows 97 per one lakh individuals according to recent trends. According to a report by WHO, the present data of 6% total deaths due to cancer in India might be harmful in near future which is estimated to show an increment up to 9 million deaths by 2030. [9]- [11] Improvements in cancer therapies have shown a remarkable success in recent past. Modern therapies have been developed which varies from DNA therapies based on DNA-excision repairing and modifying cellapoptotic pathways etc. to other potential treatments like hormone-based therapies like tamoxifen (Estrogen synthesis blocking agent) has been developed for breast cancer. Monoclonal antibody based treatment, Immune-based therapies (ICIs, CART therapies) etc. are other potential therapies. [12] [13] Drug delivery systems like nanoparticles have also been developed in a past decade for targeted delivery in several cancers including breast cancer to avoid by-stander killing of tumor cells. [14] [15]. They include Doxorubicin, ThermoDox (Temperature based release of doxorubicin), SPIONs (Supermagnetic iron oxide nanoparticles), Nanotherm etc. [16] [17] In the recent Covid-19 pandemic due to novel coronavirus (SARS COV-2), started out around December 2019, there has been a declination in cancer research worldwide and a high chances of increasing comorbidities in patients during this era. Cancer patients have been found vulnerable to Covid-19 infection according to some Chinese studies. Moreover consequences like delayed treatment, distraction in healthcare for cancer patients during pandemic conditions pose fatal consequences for cancer patients. [18]. Remarkable efforts have been made uptil now for overcoming the therapeutic and diagnostic challenges in this vulnerable population. According to recent safe guidelines of Canadian Association of Breast Imaging, delaying of breast cancer imaging and screening is needed to be avoid along with prevention of infection to the patients in Covid-19 pandemic. [18] [19] Combinatorial therapies for cancer have been designed in many researches for overcoming challenges of cross-talk inhibition, signaling feedback etc. The rationale behind using multiple drugs is to suppress multiple receptors and pathways in a particular tumor to eradicate it in a synergistic way as compared with monotherapies. [20] [21]. Uptil now, different combinations of drugs including natural compound-based drugs, synthetic drugs, monoclonal antibodies etc. have been designed. [9] [22]. Examples include, USFDA approved Ipilimumab and nivolumab for unresectable melanoma. [23]. Combinatorial therapies have also been designed in Breast cancer. Eg. Combinations of Gemcitabine and Paclitaxel with Cyclophosphamide and Epirubicin in breast cancer. [24] , combination of Fulvestrant and Anastrozole in hormone receptor positive breast cancer. [25].
Natural compounds has attained enormous benefits in therapeutics like stability, enhancement in efficacy, reduced toxicity, bioavailability, etc. In cancer treatment, many natural compound-based medicines have been designed as monotherapeutics and in combinations as well as in combination with synthetic compounds. [26]- [28]. Examples-Cyclophosphamide and Paclitaxel (Synthetic) in combination with curcumin (Natural), has been used in MDA-MB-231, MCF-12F and MCF-7 cancer cell lines. [29]- [33]. Combinations of natural compounds in breast cancer have been documented in several researches. [34] [31].
Nanoparticles in conjugation with natural compounds and their combinations have been tailored in different cancers including breast cancer, to act as an efficient targeted drug delivery tool to enhance the specificity and preventing bystander killing. [14]- [17].
Breast Cancer as a prototype Breast Cancer has been found as the most frequently diagnosed cancer in India. The incidence rate has been estimated to 25.8/100000 individuals. In this project, estrogen receptor positive breast cancer was taken as a prototype. Estrogen receptor positive breast cancer accounts for approximately 80% breast cancer cases. Uptil now, various therapies have been designed for ER positive BC. [35]. They include, endocrine therapies under which GnRHa (Gonadotropin releasing hormone agonists), SERMSs (Selective estrogen receptor modulators, eg. Tamoxifen), SERDs (Selective estrogen receptor down-regulators, eg. Fulvestrant) in combination with AIs (Aromatase inhibitors) are given as combinatorial drug therapies. [36] [37].
Although a vast majority of receptors are associated with the expression of ER(+)BC, but in this research, some potential receptors of ER(+)BC in human were screened using a database which have been found to play an important role in its progression. Here, FGFR2 gene (encode protein from fibroblast growth receptor family), ESR1gene (encode estrogen receptor and a transcription factor), AR gene (encode androgen receptor), PIK3CA, PIK3CB and PIK3CD genes (encoding three catalytic isoforms i.e. p110α, p110β and p110δ of phosphatidylinositol 4,5-bisphosphate 3-kinase respectively) were taken into consideration.
FGFRs play an important role in the progression of several cancers whose structures composed of different domains like 3 immunoglobulin-like domain, cytoplasmic tyrosine kinase domain, extra-cellular domain. [38]. FGFR2 has been reported as a crucial biomarker in breast cancer. In a research, its differently expressed genes (namely CXCL8, BMP7, CD44, MMP9) have been found between FGFR2 silenced breast cancer cell lines. FGFR2 also plays an important role in the progression of ER(+)BC. N550K and M5381 mutations of FGFR2 has been observed to offer resistance to CDK4/6, SERDs inhibitors in ER(+)BC patients.
[38] [39] ESR1 gene that encodes for estrogen receptor (ER) is an important biomarker in ER(+)BC which localizes to nucleus and forms homodimer or heterodimer with ESR2. [40]. In a research, patients with circulating ESR1 mutations have been found resistant to endocrine therapy which is given in ER BC treatment. In another research, these ESR1 mutations have been validated using ct-DNA (circulating tumor DNA) detection method in ER(+)BC. [41] [42] Androgen receptor (encoded by AR gene) which have 3 functional domains and function as a transcription factor activated by steroid hormone act as a potential prognostic biomarker in breast cancers including ER(+)BC. [43]. Overexpression of AR as a key molecule has been seen in aromatase inhibitors (eg. exemestane) resistant ER(+)BC. Therefore, AR antagonism may serve as a promising strategy in ER(+)BC which may in turn target several AR associated pathways like ERK1/2 signaling pathway, PI3K pathways etc. [44], [45] The pathways of Phosphoinositide 3-kinase PIK (composed of a regulatory and catalytic subunits) shows an emerging prevalence and are responsible for frequent alterations in breast cancers including ER(+)BC. [46]. The three genes i.e. PIK3CA, PIK3CB and PIK3CD encode three catalytic isoforms i.e. p110α, p110β and p110δ respectively and treatments using RNAi has been designed to target these isoforms.
[47]- [49]. PIK3CA mutations have been reported crucial in fulvestrant resistant ER(+)BC using circulating tumor DNA (ct DNA), therefore PIK3CA has been found as an important regulator in progression and migration in ER(+)BC. [50]- [53] In-silico approach in drug development Currently, In-silico approach has emerged as a promising strategy which ensures fruitful insights in the various fields like drug designing, target profiling, drug repurposing, polypharmacology, etc. [54]. Conventional pipeline remain centered on time-taking experiments and high delivery cost. Computer aided drug discovery (CADD) on other hand has exhibited as an effective technology for significant drug designing at a faster and cheaper rate in several diseases including cancers. [55]- [57]. Virtual screening, High throughput screening, Molecular docking etc. are emergent techniques for screening of drugs against particular protein receptors involved in various diseases as well as their structural and binding analysis. [58], [59]. Many useful tools have been developed for virtual screening and CADD. Autodock 4 [60], Autodock Vina [60], PyRx [61], PyMol [62], etc. are some of the open source online softwares available for studying the affinities between ligands and proteins and their binding poses, crystallographic analysis, molecular dynamics evaluation etc.
There are many databases available for ligand library screening like ZINC 15, PubChem, etc. IMPPAT (Indian Medicinal Plants, Phytochemistry and Therapeutics) is an online curated database (developed by The Institute of Mathematical Sciences, IMSc, India) which has many useful in-built webservers like admetSAR, FAF-Drugs4, FAF-QED, Open Babel, RDKit, etc. There is a huge library of 1742 medicinal plants of Indian origin, 9596 phytochemicals present in this database which have been used in traditional Indian medicines since time immemorial. [63] 2. MATERIALS AND METHODS

Receptors selection
510 genes associated with ER(+)BC were identified using DisGeNET, a curated database for genes associated with human diseases and variants. [64]. Then those 510 genes were filtered on the basis of Genedisease association score (Scoregda ≥ 0.1), probability of being loss of function intolerant (pLI ≥ 0.99), total no. of PMIDs supporting association (N. PMIDs ≥ 7) and identified top hits were further evaluated.

Protein-Protein Interaction study
Types of protein-protein interactions and their interaction scores between the selected receptors were calculated using STRING (11.0), a database for identification of proteins and their functional interactions.
[65]. The strength of associations among them was evaluated from the number of nodes, PPI enrichment value, number of edges, average node degree, average local clustering coefficient and expected number of edges.

Binding site analysis
Binding sites of receptors were identified using Discovery Studio Visualizer (DSV), a software for analyzing sequences, modeling molecular structures and other data. [66], [67]. It locates all surrounding amino acid residues of a bounded ligand to the protein within a distance under defined threshold and forms a 2-D image which depicts all the amino acid residues and type of bonds through which they were attached to the ligand. In this research, high resolution crystal structures of proteins with their previously experimentally identified inhibitors/drugs were used and their drug sites were evaluated using DSV to which our concerned ligand was supposed to bind.

Protein preparation
Before docking of concerned ligands with selected proteins, proteins were prepared by deleting the hetatoms (co-crystallized water molecules, nonstandard residues, nonpolar hydrogens, small molecules and lone pairs) using DSV. Later gasteiger charges and polar hydrogens were added and merged using PyRx. Here, a file in pdbqt format is generated which is required for docking via Autodock.
6. Ligand library preparation IMPPAT database was used for literature analysis of natural compounds using in-built webservers admetSAR, FAF-Drugs4, FAF-QED, RDKit. Initially all 9596 phytochemicals were primarily screened and filtered on the basis of zero Lipinski RO5 violations, zero Oral PhysChem Score (Traffic lights) which indicates good oral bioavailability, good GlaxoSmithKline's (GSK's) 4/400 score which indicates log P (octanol-water partition coefficient) ≤ 4 and Mw (Molecular weight) ≤ 400 Dalton, Pfizer's 3/75 rule's good values which indicates small molecules with values of TPSA (Topological polar surface area) > 75 and logP <3 Å 2 indicating more drug likeliness and less toxicity, Egan rule's good values (−1.0 ≤logP ≤5.8 and TPSA ≤130 Å 2 ) which indicates good oral bioavailability, veber rule's good value (TPSA<140 and number of rotatable bonds ≤ 10) which satisfies good oral bioavailability, good QEDw (Quantitative estimate of druglikeness) score ( ≥ 0.8) which evaluates the weighted geometric mean of log P, number of hydrogen donors (≤5) and acceptors (≤10), molecular weight, No. Of rotatable bonds, TPSA, number of structural alerts and number of aromatic rings. QEDw ≥ 0.8 signifies high druggability of compounds. Later, the primarily screened compounds were further secondarily screened on the basis of good PK properties like positive for BBB (Blood Brain Barrier) permeability, positive for HIA (Human intestinal absorption), good solubility. After screening, ligands were prepared finally for docking after energy minimization for 200 conjugate gradient steps with an UFF force field using PyRx.

Molecular docking
After secondary screening and preparation of ligands and receptors, they were subjected to structurebased drug designing through molecular docking using PyRx (with in-built Autodock tool). After selecting the binding sites manually, a grid box of dimension 60×60×60 Å was generated with grid spacing 0.375 Å which confines the active sites of protein to which the concerned ligand was supposed to bind within a specific shortened area. Then, after completion of docking within the specific sites, the binding energies i.e. ∆G (in Kcal/mol) were noted and the compounds were selected on the basis of top hits/ autodock vina good scores. It is considered that lower the ∆G, more effective the binding is (in the range of ∆G ≤ -8.0 Kcal/mol).

Screening of drug likeliness and ADMET properties evaluation
The drug likeliness was analyzed using admetSAR open-source tool. The docked ligands having good ∆G were again filtered through extensive ADMET properties such as good PK and PD properties like positive BBB permeability, positive HIA, positive calcium carbonate permeability, Non-inhibition of P-glycoprotein protein substrate and inhibitors, Non-inhibition of CYP 450 (Cytochrome P450) substrate and inhibitors (including 2C9, 2D6, 3A4, 1A2, 2C9, 2D6, 2C19, 3A4), low CYP 450 inhibitory promiscuity (CYP IP), Nonames test toxicity. P-glycoproteins and CYP enzymes substrate play a fundamental role in metabolism of drugs, so their inhibition affects the drug likeliness property of drugs. After assessing binding scores, important ADMET properties, obtained drugs were subjected to binding amino acid residues evaluation using PyRx and PyMol and proposed as combinatorial therapeutic agents.

Binding amino acid residues evaluation
After molecular docking and extensive ADMET analysis, the binding amino acid residues of finally selected docked ligands and the type of bonds through which they are attached were observed by generating a 2-D image using DSV. Later PyMol was used for depicting the bounded ligands in the binding pocket of receptors.
Results and discussion In this research, natural compounds from Indian medicinal plants were screened against potent receptors of ER(+)BC. Further, they were assessed on the basis of good ADMET properties. The 7 top natural compounds having excellent ADMET properties were found to have multi-targeting effects with efficient binding energies and similar binding residues against the binding pockets of selected receptors in ER(+)BC. (Work flow in figure: 1).

Receptors selection
The associated genes for ER(+)BC were screened using DisGenet in which we found total 510 associated genes, out of which 6 top genes (FGFR2, ESR1, PIK3CA, PIK3CB, PIK3CD, AR) were filtered on the basis of parameters like pLI, Scoregda, N.PMIDs as mentioned previously. These genes are basically responsible for translation of mentioned receptors/proteins. Moreover, it was found that these six genes were of 'kinase' and 'nuclear receptor' protein class and have a large number of other associated diseases.  (Table 1). So, on the basis of above parameters, these 6 receptors were found potent to regulate the biological behavior of ER(+)BC, therefore proposed for further evaluation.

Protein-protein interactions
Types and strength of protein-protein associations between the selected 6 receptors were assessed using STRING (11.0) database ( Figure 2) which defines PPI (protein-protein interaction) with confidence ranges for obtained data scores (low confidence scores <0.4; medium confidence: 0.4-0.7; high confidence: >0.7). All 6 were found to have firm interactions among themselves with significant interaction scores as mentioned in Table 2. Moreover, 6 nodes, 10 edges, average node degree = 3.33, average local clustering coefficient = 0.806, 2 expected number of edges and PPI enrichment p-value = 3.48e-05 were found. These values are strong enough to elucidate the significant interactions among receptors. Therefore, targeting these receptors was considered very useful in targeting associated pathways simultaneously for an effective multi-targeted effect of drugs. These receptors were further evaluated for molecular docking simulations.

Protein retrieval and binding sites evaluation
Binding sites or inhibitory sites of proteins after retrieving their crystal structures from PDB in complex with different experimentally identified inhibitors were assessed through DSV. The surrounding amino acid residues (which are considered as binding sites) of bounded ligand were noted from DSV which generates a 2-D image of bounded inhibitor/ligand and its surrounding amino acid residues within a distance of threshold value. As shown in Figure: 3, FGFR2, ESR1, PIK3CA, PIK3CB, PIK3CD, AR were found to have 10, 15, 11, 10, 11, 10 residues respectively.

Ligands screening
In this research, natural compounds from medicinal plants of Indian origin were assessed through IMPPAT database of 9596 phytochemicals. Initially they were filtered using FAF-Drugs4 and FAF-QED open-source cheminformatics platforms on the basis of LRo5 violations, Oral PhysChem score (traffic lights), GSK's 4/400, Pfizer's 3/75, Veber's rule, Egan's rule, QEDw as mentioned previously. At this stage, we found 99 natural compounds satisfying above mentioned properties. Then out of them, 42 Natural compounds were filtered on the basis of PK and PD properties (BBB permeability preferentialy) using admetSAR open source tool. (Table: 4-6)

Molecular docking evaluations
In this research, Structure-based Drug Designing (SBDD) through molecular docking was undertaken using PyRx (with in-built Autodock). After molecular docking of 42 ligands at the selected binding sites. Here, 25 natural compounds out of 42 were found to have promising binding energies in the range of ∆G ≤ -8.0 kcal/mol with RMSD lower bound and RMSD upper bound values = 0. These 25 selected compounds were further evaluated through extensive ADMET analysis using admetSAR. (Table: 3-6)

Prediction of ADMET properties
After getting the screening done on the basis of ∆G (in the range ≤ -8 kcal/mol), 23 natural compounds were assessed for ADMET properties for identification of likelihood of compounds using admetSAR online server on the basis of CYP450 inhibitory promiscuity, ames toxicity, as mentioned before. Here, top 7 natural compounds were found to possess mentioned desired properties (preferentially CYP inhibitory promiscuity) as shown in table 6. On other hand, compounds like 4-Hydroxysesamin (CID: CID:16745513), Pinoresinol (CID:17750970), Irilone (CID:5281779) among 23 compounds were refuted besides having very good ∆G score, due to not fulfilling the criteria of CYP inhibition. (Table: 3-6). Then these 7 natural compounds were finally selected and subjected for binding amino acid residues evaluation using DSV and PyMol, within the particular domain (kinase or nuclear receptor domain) of concerned receptor. (Figure: 4-9)

Molecular docking studies of receptors
Having good binding efficiencies (indicating good stability and longer half-life) with 6 receptors, promising ADMET properties and significant number of common interacting amino acid residues as compared with binding sites of receptors, 7 natural compounds compounds whose 2-D structures were taken from PubChem as shown in Figure 6, were finally proposed on the basis of receptor based drug screening or Structure-based Drug Designing (SBDD). Here, we found that among top 7 compounds, some of them showed highest binding energies (as shown in table 3), ∆G (in the range ≤ -8 kcal/mol), as well as significant number of common amino acid residues with binding AA residues of protein. (Figure: 4,5,7-9). Besides this, it was also observed that some ligands like Paulownin-CID:3084131 which showed ∆G of -9.7 Kcal/mol with PIK3CD receptor and ∆G of -8.7Kcal/mol with PIK3CA receptor; Cornigerine-CID:100188 which showed ∆G of -8.3 Kcal/mol with PIK3CD receptor were refuted as no common amino residues were found. The various types of molecular interactions (H-bonds, π-sigma, alkyl, π-alkyl, π-Sulfur, etc.) among the ligand and binding residues were also studied as shown in Tables (7)(8)(9)(10)(11)(12) . Finally proposed and their combinations are believed to be promising in inhibiting the potential receptors of ER(+)BC in a multi-targeted way. Furthermore, their synergistic effects can be evaluated further in in-vitro studies on ER(+)BC cell lines.

3.Conclusion
In this research, virtual screening of natural compounds found in Indian medicinal plants was done against the key receptors of Estrogen positive breast cancer. Ligand library was created using IMPPAT database followed by molecular docking simulations and ADMET predictions using PyRx and admetSAR tools respectively. We hereby conclude that the combination of Carpusin-CID:134369, Paulownin-CID:3084131, Cornigerine-CID:100188, Nororientaline-CID:440632, Oryzalexin B-CID:176496, Romucosine H-CID:5320994 and Colchicine-CID:6167 are potential natural compounds against key receptors in clinical expression of Estrogen positive breast cancer. Among them, Carpusin, Paulownin, Oryzalexin B and colchicine showed excellent ADMET properties. These seven compounds which possess good PK, PD, Physiochemical properties as well as high drug likeliness shows efficient binding with these key receptors. Therefore, they can be further explored for their synergistic efficiency in molecular dynamics studies to prove their potential to act as combinatorial therapeutic agents against estrogen receptor positive breast cancer.
LIST OF FIGURES: 1. Figure 1: Figure Table 2: Protein-Protein interaction score obtained from STRING (11.0) database. 3. Table 3: Binding energy scores of 7 top ligands with selected receptors. Scores highlighted in red were selected as best ligands with top binding energies with significant number of common interacting amino acid residues with receptor' binding sites. 4. Table 4: Druggability properties of top 7 selected natural compounds. Computed via FAF-Drugs4 and FAF-QED. 5. Table 5: Physicochemical properties of top 7 selected ligands. Computed via FAF-Drugs4 and RDKit 6. Table 6: ADMET analysis of top 7 selected ligands. Computed via admetSAR. 7. Table 7: Types of molecular interactions of top 3 ligands with FGFR2. Binding amino acid residues highlighted in red indicate similarity with the binding sites of receptor. 8. Table 8: Types of molecular interactions of top 3 ligands with ESR1. Binding amino acid residues highlighted in red indicate similarity with the binding sites of receptor. 9. Table 9: Types of molecular interactions of top 3 ligands with PIK3CA. Binding amino acid residues highlighted in red indicate similarity with the binding sites of receptor. 10. Table 10: Types of molecular interactions of top 1 ligands with PIK3CB. Binding amino acid residues highlighted in red indicate similarity with the binding sites of receptor. 11. Table 11: Types of molecular interactions of top 2 ligands with PIK3CD. Binding amino acid residues highlighted in red indicate similarity with the binding sites of receptor.