[...] and the GP41 N terminus as potential drug targets. In the analysis of factors that impact HIV-1 genomic diversity, we focused on protein multimerization, immunological constraints and HIV-human protein interactions. We found that amino acid diversity was higher in monomeric proteins than in multimeric proteins, and that diversified positions were preferentially located within human CD4 T cell and antibody epitopes. Moreover, intrinsically disordered regions in HIV-1 proteins coincided with high levels of amino acid diversity, facilitating a large number of interactions between HIV-1 and human proteins.

Conclusions
This first large-scale analysis provided a detailed mapping of HIV genomic diversity and highlighted drug-target regions conserved across different groups, subtypes and CRFs. Our findings suggest that, in addition to the impact of protein multimerization and immune selective pressure on HIV-1 diversity, HIV-human protein interactions are facilitated by high variability within intrinsically disordered structures.

Electronic supplementary material
The online version of this article (doi:10.1186/s12977-015-0148-6) contains supplementary material, which is available to authorized users.

[...] and [...] is the nucleotide (NT) or amino acid (AA) form of the position at the ith sequence in the dataset D, [...] represents the Kronecker symbol, [...] is identical to [...]. [...] is defined as the average genetic diversity of all positions: [...]. Suppose two sequence datasets D1 and D2, aligned with the same reference genome, have the number of sequences [...]. A [...] test was performed to compare the distributions of genetic diversity, and a significant difference was identified if the p-value was lower than 0.05 [65]. Our MATLAB implementation of the genomic diversity analysis is available in Additional file 3.

Acknowledgements
We thank Fossie Ferreira, Jasper Edgar Neggers, Soraya Maria Menezes and Tim Dierckx for technical assistance and valuable contributions to our analysis.
This work was supported by the National Natural Science Foundation of China [81130015]; the National Basic Research Program of China [2014CB910500]; the Fonds voor Wetenschappelijk Onderzoek - Flanders (FWO) [PDO/11 to K.T., G069214N]; and the European Community's Seventh Framework Programme (FP7/2007-2013) under the project Collaborative HIV and Anti-HIV Drug Resistance Network (CHAIN) [223131].

Additional files

Additional file 1 (2.5M, pdf): Figures and tables.
Figure S1. Gene maps and protein structures of HIV-1 and HIV-2.
Figure S2. Distribution plots of nucleotide and AA diversity among HIV types, groups and subtypes.
Figure S3. Distribution plots of AA diversity between HIV-1 subtype B/C and the other HIV groups/subtypes.
Figure S4. Global distribution of HIV-1 genomic diversity.
Figure S5. AA diversity along the full-length HIV genome.
Figure S6. Global distribution of HIV-1 genomic diversity.
Figure S7. Average AA diversity of HIV-1 protein clusters and number of HIV-human protein interactions.
Figure S8. AA composition of HIV-1 subtype B genome, HIV-1 peptide-derived regions and sequences of HIV-derived peptide inhibitors.
Figure S9. Average AA diversity of peptide-derived regions in HIV-1 subtype B.
Figure S10. Solvent accessible surface area of peptide-derived regions in the HIV-1 subtype B genome.
Figure S11. Protein intrinsic disorder scores of peptide-derived regions in the HIV-1 subtype B genome.
Figure S12. Protein structure of the HIV-1 GP120-CD4-Fab 48d complex (PDB: 2B4C, 3U4E) and mapped GP120 peptide-derived inhibitors.
Figure S13. GP41 structure and GP41-derived peptide inhibitors.
Figure S14. HIV-1 Integrase tetramer and Integrase-derived peptide inhibitors.
Figure S15. HIV-1 RT structure and RT-derived peptide inhibitors.
Figure S16. HIV-1 Protease homodimer structure and protease-derived peptide inhibitors.
Figure S17. HIV-1 Tat structure and Tat-derived peptide inhibitors.
Figure S18. HIV-1 Vpr structure and Vpr-derived peptides.
Figure S19. HIV-1 Rev tetramer structure and Rev-derived peptide inhibitors.
Figure S20. Structure of HIV-1 Capsid monomer and Capsid-derived peptide inhibitors.
Figure S21. HIV-1 Vif structure and Vif-derived peptide inhibitors.
Figure S22. Distribution plots of AA diversity between the consensus and the circulating genomes, within circulating genomes.
Figure S23. Prediction similarities of the consensus and the 9 protein secondary structure prediction methods.
Figure S24. Prediction similarities of the consensus and 17 methods for protein intrinsic disorder prediction.

Additional file 2 (588K, pdf): Tables.
Table S1. Average amino acid diversity of HIV monomeric and multimeric proteins.
Table S2. Summary of average AA diversity, average dN, average dS and average dN/dS in the HIV-1 subtype A1, B, C and CRF01_AE genomes.
Table S3. Statistics of dN/dS, dN and dS distributions in the monomeric and multimeric protein groups of the HIV-1 subtype A1, B, C and CRF01_AE genomes.
Table S4. Summary of 121 peptide inhibitors derived from HIV-1 proteins.
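
The methods fragment above describes a per-position genetic diversity built on the Kronecker symbol over the sequences of a dataset D, which is then averaged over all positions. The exact formula did not survive in this text, so the following Python sketch rests on an assumption: diversity at a position is taken as the fraction of sequence pairs whose NT or AA form differs there (one minus the Kronecker symbol, averaged over all unordered pairs). The function names `position_diversity` and `average_diversity` are hypothetical, and this is not the authors' MATLAB implementation from Additional file 3.

```python
from collections import Counter


def position_diversity(column):
    """Fraction of unordered sequence pairs that differ at one alignment column.

    ASSUMPTION: this pairwise-mismatch definition is illustrative only; the
    normalization used in the original analysis is not recoverable here.
    """
    n = len(column)
    if n < 2:
        return 0.0
    counts = Counter(column)
    # Pairs for which the Kronecker symbol equals 1 (identical residues).
    identical_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = n * (n - 1) // 2
    return 1.0 - identical_pairs / total_pairs


def average_diversity(alignment):
    """Mean of the per-position diversity over all columns of an alignment.

    `alignment` is a list of equal-length strings (NT or AA sequences that
    were aligned to the same reference). Gap handling is not addressed.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if length == 0:
        raise ValueError("sequences must not be empty")
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    per_position = [
        position_diversity([seq[k] for seq in alignment]) for k in range(length)
    ]
    return sum(per_position) / length


if __name__ == "__main__":
    toy_alignment = ["MKV", "MRV", "MKI"]  # made-up sequences, not HIV data
    print(average_diversity(toy_alignment))
```

Counting identical pairs per residue keeps the cost linear in the number of sequences per column, which matters for datasets with thousands of genomes. Comparing two datasets D1 and D2 would then amount to computing the per-position values for each and applying a statistical test to the two distributions; the specific test is not named in the surviving text and is left open here.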