Liquid–liquid phase separation sequence-based predictors

LLPS often involves sequence regions with distinctive functional characteristics, as well as prion-like domains (PLDs) and RNA-binding domains. The range of biological mechanisms involved, the limited knowledge of these mechanisms, and the strong context dependence of LLPS make it challenging to predict the propensity of a protein to drive LLPS. Despite recent advances in the field, only a few LLPS-specific predictors have been developed to relate protein sequence properties to the capability to drive LLPS. The state-of-the-art sequence-based LLPS predictors are summarized in Table 2, together with the protein characteristics that each one identifies in the context of LLPS.

Table 2
| Predictor | Published | Description and type of data |
| --- | --- | --- |
| PSPer[1] | 2019 | PSPer is trained to identify prion-like RNA-binding phase-separating proteins (PSPs). It focuses on this particular class of LLPS proteins and gives an overall score for a protein depending on the presence of the corresponding features. The method is trained on an experimental dataset of FUS-like PSPs, and the biophysical characteristics of each region (PLD, RNA-binding domain, RNA-recognition motif, disordered regions and additional domains) are implemented in a probabilistic model. Training also included a negative dataset of ordered proteins, so the method is expected to perform better on disordered proteins that drive LLPS.[1] |
| PLAAC[2] | 2014 | PLAAC predicts regions with prion-like amino acid composition, which are usually enriched in polar residues, using a hidden Markov model (HMM). It was developed before the involvement of PLDs in LLPS was recognized, and consequently it is not trained to identify the majority of phase-separating regions.[2] |
| PScore[3] | 2017 | PScore is a statistical scoring algorithm that predicts pi-pi interactions. It compares the pi-pi interactions predicted for the target protein with those of all proteins in the PDB to assign a phase-separation propensity score.[3] |
| catGRANULE[4] | 2016 | catGRANULE was originally trained on yeast proteins, but it has been shown to be useful for predicting human phase-separating proteins.[5] The algorithm uses sequence composition statistics to distinguish proteins that localize to yeast granules from the rest of the yeast proteome. The features used to weight the residues are disorder and nucleic-acid-binding propensities, as well as properties of some amino acids.[4] |
| PSPredictor[6] | 2019 | PSPredictor is a machine learning approach for predicting proteins that phase separate, trained on a set of experimentally validated protein sequences from the LLPSDB database.[6] |
| PSAP[7] | 2021 | PSAP is a random forest classifier that predicts the probability that a protein mediates phase separation. It is trained on a set of 90 high-confidence human proteins that drive LLPS.[7] |
| FuzDrop[8] | 2020 | FuzDrop predicts droplet-promoting regions and droplet-driving proteins. The algorithm was trained on a dataset of drivers collected from different public databases, and its output is a per-residue probability of droplet formation.[8] |
| ParSe[9] | 2022 | ParSe v2 explores the possibility that protein-mediated phase separation can be predicted from sequence-based calculations of hydrophobicity, α-helix propensity, and a model of the polymer scaling exponent (νmodel). The algorithm was trained on a curated dataset of homotypic phase-separating intrinsically disordered sequences that were experimentally verified to phase-separate in vitro.[9] |
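
Several of the predictors in Table 2 score a protein from properties computed directly from its sequence, such as hydrophobicity, aromatic content, or polar-residue enrichment. As a minimal sketch of what such sequence-derived features can look like, the following Python computes the fraction of aromatic residues, the fraction of polar uncharged residues, and a sliding-window mean of the Kyte–Doolittle hydropathy scale. The function names, the residue groupings, the window size, and the toy sequence are illustrative assumptions. None of the listed predictors is implemented this way, and the output is not a phase-separation score.

```python
from typing import Dict, List, Set

# Kyte-Doolittle hydropathy scale (higher value = more hydrophobic).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Illustrative residue groupings (assumptions, not taken from any predictor).
AROMATIC: Set[str] = set("FWY")
POLAR_UNCHARGED: Set[str] = set("STNQ")


def fraction(seq: str, residues: Set[str]) -> float:
    """Fraction of positions in seq whose residue belongs to `residues`."""
    if not seq:
        raise ValueError("sequence is empty")
    return sum(1 for aa in seq if aa in residues) / len(seq)


def windowed_hydropathy(seq: str, window: int = 9) -> List[float]:
    """Mean Kyte-Doolittle hydropathy of every full window along seq."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    # Raises KeyError for residues outside the 20 standard amino acids.
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def sequence_features(seq: str, window: int = 9) -> Dict[str, float]:
    """Collect a few simple composition features for one sequence."""
    seq = seq.upper()
    profile = windowed_hydropathy(seq, window)
    return {
        "length": len(seq),
        "aromatic_fraction": fraction(seq, AROMATIC),
        "polar_fraction": fraction(seq, POLAR_UNCHARGED),
        "mean_hydropathy": sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq),
        "min_window_hydropathy": min(profile),
    }


if __name__ == "__main__":
    toy = "GYGQSSYSSYGQSQNTGYGT"  # made-up low-complexity toy sequence
    for name, value in sequence_features(toy).items():
        print(name, value)
```

A trained predictor would combine features of this kind, or much richer ones, with a statistical or machine learning model fitted to experimentally validated proteins; that step is deliberately left out here.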

LLPS simulations

Another important computational resource in the field of LLPS is the theoretical simulation of proteins, particularly intrinsically disordered proteins (IDPs), that drive LLPS. These simulations complement experiments and provide insight into the molecular mechanisms by which individual proteins drive LLPS. A review by Dignon et al.[10] discussed how simulations can be applied to interpret experimental results, to explain phase behavior, and to provide predictive frameworks for designing proteins with tunable phase-transition properties. The main challenge is the compromise between the resolution of the model and its computational efficiency, since all-atom simulations of large systems involving IDPs are still difficult to perform. Moreover, the molecular interactions among IDPs in the droplet state are still poorly understood, and combining experimental data with simulations is indispensable for elucidating them. Improvements in sampling and simulation methods over the next few years may help clarify these mechanisms.[11]
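
The trade-off between resolution and efficiency can be made concrete with a rough count of pairwise interactions. The sketch below compares a one-bead-per-residue coarse-grained representation with an all-atom representation of the same set of protein chains. The value of 16 atoms per residue, the function names, and the omission of solvent, interaction cutoffs, and neighbour lists are simplifying assumptions made here for illustration; actual simulation cost depends strongly on those details.

```python
from typing import Tuple

# Rough average number of atoms per residue, hydrogens included (assumption).
ATOMS_PER_RESIDUE = 16


def n_pairs(n_particles: int) -> int:
    """Number of unique particle pairs, n(n-1)/2."""
    if n_particles < 0:
        raise ValueError("n_particles must be non-negative")
    return n_particles * (n_particles - 1) // 2


def particle_counts(
    n_chains: int,
    residues_per_chain: int,
    atoms_per_residue: int = ATOMS_PER_RESIDUE,
) -> Tuple[int, int]:
    """Return (coarse-grained beads, all-atom particles), protein only."""
    beads = n_chains * residues_per_chain
    atoms = beads * atoms_per_residue
    return beads, atoms


def pair_ratio(
    n_chains: int,
    residues_per_chain: int,
    atoms_per_residue: int = ATOMS_PER_RESIDUE,
) -> float:
    """All-atom pair count divided by coarse-grained pair count."""
    beads, atoms = particle_counts(n_chains, residues_per_chain, atoms_per_residue)
    if beads < 2:
        raise ValueError("need at least two beads to form a pair")
    return n_pairs(atoms) / n_pairs(beads)


if __name__ == "__main__":
    # Hypothetical system: 100 chains of 150 residues each.
    beads, atoms = particle_counts(100, 150)
    print("coarse-grained beads:", beads)
    print("all-atom particles:  ", atoms)
    print("pair-count ratio:    ", round(pair_ratio(100, 150), 1))
```

Under these assumptions the number of particle pairs grows roughly with the square of the atoms-per-residue factor, before explicit solvent is even considered, which illustrates why lower-resolution models are attractive for systems large enough to represent a droplet.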

See also

References

  1. Orlando, Gabriele; Raimondi, Daniele; Tabaro, Francesco; Codicè, Francesco; Moreau, Yves; Vranken, Wim F (2019-04-17). "Computational identification of prion-like RNA-binding proteins that form liquid phase-separated condensates". Bioinformatics. 35 (22): 4617–4623. doi:10.1093/bioinformatics/btz274. ISSN 1367-4803. PMID 30994888.
  2. Lancaster, A. K.; Nutter-Upham, A.; Lindquist, S.; King, O. D. (2014-05-13). "PLAAC: a web and command-line application to identify proteins with prion-like amino acid composition". Bioinformatics. 30 (17): 2501–2502. doi:10.1093/bioinformatics/btu310. ISSN 1367-4803. PMC 4147883. PMID 24825614.
  3. Vernon, Robert McCoy; Chong, Paul Andrew; Tsang, Brian; Kim, Tae Hun; Bah, Alaji; Farber, Patrick; Lin, Hong; Forman-Kay, Julie Deborah (2018-02-09). Shan, Yibing (ed.). "Pi-Pi contacts are an overlooked protein feature relevant to phase separation". eLife. 7: e31486. doi:10.7554/eLife.31486. ISSN 2050-084X. PMC 5847340. PMID 29424691.
  4. Bolognesi, Benedetta; Gotor, Nieves Lorenzo; Dhar, Riddhiman; Cirillo, Davide; Baldrighi, Marta; Tartaglia, Gian Gaetano; Lehner, Ben (2016-06-28). "A Concentration-Dependent Liquid Phase Separation Can Cause Toxicity upon Increased Protein Expression". Cell Reports. 16 (1): 222–231. doi:10.1016/j.celrep.2016.05.076. ISSN 2211-1247. PMC 4929146. PMID 27320918.
  5. Ambadipudi, Susmitha; Biernat, Jacek; Riedel, Dietmar; Mandelkow, Eckhard; Zweckstetter, Markus (2017-08-17). "Liquid–liquid phase separation of the microtubule-binding repeats of the Alzheimer-related protein Tau". Nature Communications. 8 (1): 275. Bibcode:2017NatCo...8..275A. doi:10.1038/s41467-017-00480-0. ISSN 2041-1723. PMC 5561136. PMID 28819146.
  6. Sun, Tanlin; Li, Qian; Xu, Youjun; Zhang, Zhuqing; Lai, Luhua; Pei, Jianfeng (2019-11-15). "Prediction of liquid-liquid phase separation proteins using machine learning". bioRxiv: 842336. doi:10.1101/842336. S2CID 209574590.
  7. Mierlo, Guido van; Jansen, Jurriaan R. G.; Wang, Jie; Poser, Ina; Heeringen, Simon J. van; Vermeulen, Michiel (2021-02-02). "Predicting protein condensate formation using machine learning". Cell Reports. 34 (5): 108705. doi:10.1016/j.celrep.2021.108705. hdl:2066/231424. ISSN 2211-1247. PMID 33535034. S2CID 231804701.
  8. Hardenberg, Maarten; Horvath, Attila; Ambrus, Viktor; Fuxreiter, Monika; Vendruscolo, Michele (2020-12-29). "Widespread occurrence of the droplet state of proteins in the human proteome". Proceedings of the National Academy of Sciences. 117 (52): 33254–33262. doi:10.1073/pnas.2007670117. ISSN 0027-8424. PMC 7777240. PMID 33318217.
  9. Ibrahim, Ayyam; Khaodeuanepheng, Nathan; Amarasekara, Dhanush; Correia, John; Lewis, Karen; Fitzkee, Nicholas; Hough, Loren; Whitten, Steven (2023-01-01). "Intrinsically disordered regions that drive phase separation form a robustly distinct protein class". Journal of Biological Chemistry. 299 (1): 102801. doi:10.1016/j.jbc.2022.102801. ISSN 0021-9258. PMC 9860499. PMID 36528065.
  10. Dignon, Gregory L; Zheng, Wenwei; Mittal, Jeetain (2019-03-01). "Simulation methods for liquid–liquid phase separation of disordered proteins". Current Opinion in Chemical Engineering. Frontiers of Chemical Engineering: Molecular Modeling. 23: 92–98. doi:10.1016/j.coche.2019.03.004. ISSN 2211-3398. PMC 7426017. PMID 32802734.
  11. Shea, Joan-Emma; Best, Robert B; Mittal, Jeetain (2021-04-01). "Physics-based computational and theoretical approaches to intrinsically disordered proteins". Current Opinion in Structural Biology. Theory and Simulation/Computational Methods ● Macromolecular Assemblies. 67: 219–225. doi:10.1016/j.sbi.2020.12.012. ISSN 0959-440X. PMC 8150118. PMID 33545530.