- Stable release: 3.8.1 / October 27, 2012
- Written in: C (core); bindings for Java, Lua, and Ruby
- Platform: Unix, Mac OS X, Microsoft Windows
- Type: Distributed system simulator, network simulator
- License: GNU Lesser General Public License
SimGrid is a toolkit that provides core functionalities for the simulation of distributed applications in heterogeneous distributed environments. The project's specific goal is to facilitate research on large-scale parallel and distributed systems such as grids, peer-to-peer (P2P) systems, and clouds. Its use cases range from evaluating heuristics and prototyping applications to developing and tuning real applications.
In 1999 Henri Casanova joined the AppLeS research group in the Computer Science and Engineering Department at the University of California, San Diego (UCSD), as a post-doc. The AppLeS group, led by Francine Berman, focused mostly on practical scheduling algorithms for parallel scientific applications on heterogeneous, distributed computing platforms. Shortly after joining the group, Henri found that he needed to run simulations instead of, or in addition to, real-world experiments. At that time Arnaud Legrand, a first-year graduate student at Ecole Normale Superieure de Lyon, France, spent two months of the summer in the AppLeS group as a visiting student. That summer he worked with Henri on a research project, as part of which he implemented an ad hoc simulator.
After Arnaud left UCSD, Henri realized that every researcher in the AppLeS group would most likely need to run simulations eventually, and that they would all end up rewriting the same code at some point. He took apart the simulator Arnaud had developed, packaged it as a more generic simulation framework with a simple API, and called it SimGrid v1.0 (a.k.a. SG). This version was simple and, in retrospect, a bit naive. However, it was surprisingly useful for studying "centralized" scheduling (e.g., off-line scheduling of a DAG on a heterogeneous set of distributed compute nodes). SimGrid v1.0 was described in "SimGrid: A Toolkit for the Simulation of Application Scheduling", by Henri Casanova, in Proceedings of CCGrid 2001. Henri became the first user of SimGrid and used it in several research projects from then on.
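To make "centralized" off-line scheduling concrete, here is a minimal Python sketch of the kind of heuristic SimGrid v1.0 was used to evaluate. It performs greedy earliest-finish-time list scheduling of a DAG on hosts of different speeds. It does not use SG's API. The function name `schedule_dag`, the dictionary-based inputs, and the decision to ignore communication costs are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def schedule_dag(costs, deps, speeds):
    """Greedy earliest-finish-time list scheduling of a DAG (off-line).

    costs:  task -> amount of work
    deps:   task -> set of predecessor tasks (missing entry = no predecessors)
    speeds: host -> work units per second
    Returns (placement, makespan), where placement maps
    task -> (host, start, finish). Communication costs are ignored.
    """
    graph = {task: deps.get(task, set()) for task in costs}
    host_free = {host: 0.0 for host in speeds}  # time at which each host is idle
    placement = {}
    for task in TopologicalSorter(graph).static_order():
        # A task may start only once all of its predecessors have finished.
        ready = max((placement[p][2] for p in graph[task]), default=0.0)
        best = None
        for host, speed in speeds.items():
            start = max(ready, host_free[host])
            finish = start + costs[task] / speed
            if best is None or finish < best[2]:
                best = (host, start, finish)
        # Commit the task to the host that finishes it earliest.
        placement[task] = best
        host_free[best[0]] = best[2]
    makespan = max(finish for _, _, finish in placement.values())
    return placement, makespan


costs = {"a": 4.0, "b": 6.0, "c": 2.0, "d": 4.0}
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
speeds = {"fast": 2.0, "slow": 1.0}
placement, makespan = schedule_dag(costs, deps, speeds)
for task, (host, start, finish) in placement.items():
    print(f"{task}: {host} [{start}, {finish}]")
print("makespan:", makespan)
```

Tasks are only appended to a host's queue, so idle gaps are never back-filled. Insertion-based variants of list scheduling address that, at the cost of more bookkeeping.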
By 2001 Arnaud was engaged in his Ph.D. thesis work and had started studying "decentralized" scheduling heuristics, that is, heuristics in which scheduling decisions are made by more or less autonomous agents that typically have only partial knowledge of the application and/or the computing platform. Although simulating decentralized scheduling with SimGrid v1.0 was possible (one Ph.D. student at UCSD in fact did it), it was extremely cumbersome and limited in scope. So Arnaud built a layer on top of SG, which he called MSG (for Meta-SimGrid). MSG added threads and introduced the concept of independently running simulated processes that perform computation and communication tasks, possibly asynchronously. MSG was described in "MetaSimGrid: Towards Realistic Scheduling Simulation of Distributed Applications", by Arnaud Legrand and Julien Lerouge, LIP Research Report. This resulted in the following layered architecture:
    (user code)
    -----------
    | MSG |   |
    |------   |
    |    SG   |
    -----------
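MSG's central idea was independently running simulated processes that compute and communicate in virtual time. A small process-oriented discrete-event engine can illustrate it. The sketch below is hypothetical and is not MSG's API:

- each process is a Python generator that yields made-up actions (`exec`, `send`, `recv`);
- message transfer times are passed in directly rather than derived from a platform model.

```python
import heapq
from collections import defaultdict, deque


class Engine:
    """Tiny process-oriented discrete-event engine (hypothetical, not MSG).

    Each simulated process is a generator that yields actions:
      ("exec", seconds)                  compute for `seconds` of virtual time
      ("send", mailbox, payload, delay)  payload lands in `mailbox` after
                                         `delay`; the sender waits that long
      ("recv", mailbox)                  block until a payload is available,
                                         then resume with it
    """

    def __init__(self):
        self.now = 0.0
        self.log = []                        # (time, process, note)
        self._events = []                    # heap of (time, seq, callback)
        self._seq = 0                        # tie-breaker: FIFO among equal times
        self._mail = defaultdict(deque)      # mailbox -> undelivered payloads
        self._blocked = defaultdict(deque)   # mailbox -> waiting (name, gen)

    def _after(self, delay, callback):
        heapq.heappush(self._events, (self.now + delay, self._seq, callback))
        self._seq += 1

    def spawn(self, name, gen):
        self._after(0.0, lambda: self._resume(name, gen, None))

    def _resume(self, name, gen, value):
        try:
            action = gen.send(value)
        except StopIteration:
            self.log.append((self.now, name, "done"))
            return
        kind = action[0]
        if kind == "exec":
            self._after(action[1], lambda: self._resume(name, gen, None))
        elif kind == "send":
            _, box, payload, delay = action
            self._after(delay, lambda: self._arrive(box, payload))
            self._after(delay, lambda: self._resume(name, gen, None))
        elif kind == "recv":
            box = action[1]
            if self._mail[box]:
                payload = self._mail[box].popleft()
                self._after(0.0, lambda: self._resume(name, gen, payload))
            else:
                self._blocked[box].append((name, gen))
        else:
            raise ValueError(f"unknown action: {action!r}")

    def _arrive(self, box, payload):
        # Hand the payload straight to a blocked receiver, or queue it.
        if self._blocked[box]:
            name, gen = self._blocked[box].popleft()
            self._resume(name, gen, payload)
        else:
            self._mail[box].append(payload)

    def run(self):
        # Process events in virtual-time order until none are left.
        while self._events:
            self.now, _, callback = heapq.heappop(self._events)
            callback()


def master(eng):
    # Ship a task worth 2 s of computation; the transfer itself takes 1 s.
    yield ("send", "worker", 2.0, 1.0)
    reply = yield ("recv", "master")
    eng.log.append((eng.now, "master", reply))


def worker(eng):
    task = yield ("recv", "worker")
    yield ("exec", task)
    yield ("send", "master", f"done after {task} s", 0.5)


eng = Engine()
eng.spawn("master", master(eng))
eng.spawn("worker", worker(eng))
eng.run()
print(eng.now)  # 3.5
```

The run proceeds as follows:

- The task reaches the worker at t = 1.
- Computing it ends at t = 3.
- The reply arrives at t = 3.5.

Because only one callback runs at a time, the simulation stays sequential and deterministic even though the processes are written as if they ran concurrently.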
With Henri and some of his students using SG and Arnaud using MSG, the project started to have a (tiny) user base. It was time to be more ambitious and to address one of SG's key limitations: its inability to simulate multi-hop network communications realistically. In the summer of 2002 Loris Marchal, a first-year graduate student at Ecole Normale Superieure, came to UCSD to work with Henri. That summer, building on results from the TCP modeling literature, he implemented a macroscopic network model as part of SG. This model dramatically increased the realism of SimGrid simulations and was initially described in "A Network Model for Simulation of Grid Applications", by Henri Casanova and Loris Marchal, LIP Research Report. By the end of 2002 the work at UCSD and at Ecole Normale had been merged into what became SimGrid v2, as described in "Scheduling Distributed Applications: the SimGrid Simulation Framework", by Arnaud Legrand, Loris Marchal, and Henri Casanova, in Proceedings of CCGrid 2003.
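The report's model is not reproduced here. The sketch below illustrates the macroscopic (flow-level) approach under two assumptions:

- bandwidth is shared among concurrent flows according to max-min fairness, computed by progressive filling (one standard technique from that literature);
- a transfer costs its latency plus its size divided by its allocated rate.

The function names are hypothetical. Refinements found in practical models, such as weighting flows by round-trip time, are deliberately left out.

```python
def max_min_rates(capacity, routes, tol=1e-9):
    """Max-min fair bandwidth sharing, computed by progressive filling.

    capacity: link -> bandwidth (e.g., bytes per second)
    routes:   flow -> list of links the flow crosses (assumed non-empty)
    Returns:  flow -> allocated rate
    """
    remaining = dict(capacity)
    active = {flow: set(links) for flow, links in routes.items()}
    rate = {flow: 0.0 for flow in routes}
    while active:
        # Count the not-yet-frozen flows on every link they use.
        users = {}
        for links in active.values():
            for link in links:
                users[link] = users.get(link, 0) + 1
        # Grow all active flows equally until the tightest link is full.
        inc = min(remaining[link] / n for link, n in users.items())
        for flow in active:
            rate[flow] += inc
        for link, n in users.items():
            remaining[link] -= inc * n
        # Flows crossing a saturated link cannot grow any further.
        saturated = {link for link in users if remaining[link] <= tol * capacity[link]}
        active = {f: links for f, links in active.items() if not links & saturated}
    return rate


def transfer_time(size, latency, rate):
    """Flow-level estimate: pay the latency once, then stream at `rate`.
    Assumes the allocation stays fixed for the whole transfer."""
    return latency + size / rate


capacity = {"L1": 10.0, "L2": 4.0}
routes = {"a": ["L1"], "b": ["L1", "L2"], "c": ["L2"]}
rates = max_min_rates(capacity, routes)
print(rates)                                  # {'a': 8.0, 'b': 2.0, 'c': 2.0}
print(transfer_time(100.0, 0.5, rates["a"]))  # 13.0
```

In the example:

- Flow `b` is limited by `L2`, which it shares with `c`, so both get 2.
- Flow `a` then takes the 8 units of `L1` that `b` leaves unused.

A multi-hop route is therefore throttled by its most contended link rather than by a single end-to-end bandwidth figure.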
SimGrid v2, with its much improved features and capabilities, attracted a larger user base, and many friends and collaborators of Arnaud and Henri started using it for their research. Among them was Martin Quinson, then a Ph.D. student at Ecole Normale Superieure, who was working on distributed resource monitoring systems. As part of his Ph.D., Martin attempted to develop a network topology discovery tool and quickly found that this was difficult and required prototyping in simulation. Faced with the prospect of first implementing a throw-away prototype in simulation and then reimplementing everything for production, Martin started working on a framework that could compile the same code either in "simulation mode" or in "real-world mode". He found this ability invaluable when developing distributed systems, and built his framework, called GRAS, on top of MSG (for simulation mode) and on top of the socket layer (for real-world mode). GRAS is described in "GRAS: A Research & Development Framework for Grid and P2P Infrastructures", by Martin Quinson, in Proceedings of PDCS 2006. This led to the following layered software architecture:
    (user code for either SG, MSG or GRAS)
    -----------------------------
    |    |    |    GRAS API     |
    |    |    -------------------
    |    |    |GRAS S | |GRAS R |
    |    |    --------- ---------
    |    |   MSG      | |sockets|
    |    -------------| ---------
    |        SG       |
    -------------------
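The "write once, run in either mode" idea behind GRAS can be sketched as application code that talks only to a transport interface, with two interchangeable implementations:

- `SimTransport` is backed by a simulator and charges a virtual clock latency plus size over bandwidth for each message.
- `SocketTransport` sends the same bytes through a local `socket.socketpair()`.

The Python below is a hypothetical illustration, not the GRAS API. The class names and the latency and bandwidth parameters are assumptions.

```python
import socket
from collections import deque


class SimTransport:
    """Simulation mode: bytes go through an in-memory queue, and a virtual
    clock is charged latency + size / bandwidth for each message."""

    def __init__(self, latency=0.01, bandwidth=1e6):
        self.clock = 0.0
        self.latency = latency        # seconds per message
        self.bandwidth = bandwidth    # bytes per second
        self._queue = deque()

    def send(self, data: bytes) -> None:
        self.clock += self.latency + len(data) / self.bandwidth
        self._queue.append(data)

    def recv(self) -> bytes:
        return self._queue.popleft()


class SocketTransport:
    """Real-world mode: the same calls go over a connected local socket pair.
    Messages are length-prefixed so that recv() returns exactly one message."""

    def __init__(self):
        self._tx, self._rx = socket.socketpair()

    def send(self, data: bytes) -> None:
        self._tx.sendall(len(data).to_bytes(4, "big") + data)

    def recv(self) -> bytes:
        size = int.from_bytes(self._recv_exact(4), "big")
        return self._recv_exact(size)

    def _recv_exact(self, n: int) -> bytes:
        buf = b""
        while len(buf) < n:
            chunk = self._rx.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("socket closed before the message was complete")
            buf += chunk
        return buf

    def close(self) -> None:
        self._tx.close()
        self._rx.close()


def echo_round(transport, payload: bytes) -> bytes:
    """Application logic, written once: it never asks which mode it runs in."""
    transport.send(payload)
    return transport.recv()


sim = SimTransport(latency=0.01, bandwidth=1000.0)
print(echo_round(sim, b"hello"), sim.clock)
real = SocketTransport()
print(echo_round(real, b"hello"))
real.close()
```

The real-mode class writes and then reads back on the same thread, so it only suits small messages that fit in the socket buffer. A real deployment would place the two ends in different processes or on different hosts.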
At this point, with more users running more complex simulations, it became clear that the SG foundation inherited from SimGrid v1 was too limiting in terms of scalability and performance. In 2005 Arnaud took the bull by the horns and replaced SG with a new simulation engine called SURF, removing the SG API in the process. Users reported speedups of up to three orders of magnitude when moving from SG to SURF. Furthermore, SURF is much more extensible than SG ever was, which has allowed the simulation models used by SimGrid to evolve. Although it made sense at the time to reimplement GRAS on top of SURF, this was never done, owing to the "too many things to do, not enough time" syndrome. Martin also added a layer called AMOK on top of GRAS to implement high-level services needed by many distributed applications, leading to the following overall layered architecture:
    (user code for either MSG or GRAS, with or without AMOK)
                         -------
                         | AMOK|
    ------------------------
    |    |    GRAS API     |
    |    -------------------
    |    |GRAS S | |GRAS R |
    |    --------- ---------
    |   MSG      | |sockets|
    |------------| ---------
    |    SURF    |
    --------------
This architecture became SimGrid v3. One development worth mentioning is SimDAG, written by Christophe Thiery during an internship with Martin Quinson. Many users had asked for functionality similar to what the SG API provided in SimGrid v1 and v2, so that they could study centralized scheduling without the full power of the MSG API. SimDAG provides an API specifically for this purpose and was integrated in SimGrid v3.1, leading to the following layered architecture:
    (user code for either SimDag, MSG or GRAS)
                                -------
                                | AMOK|
    -------------------------------
    |      |    |    GRAS API     |
    |      |    -------------------
    |      |    |GRAS SG| |GRAS RL|
    |      |    --------- ---------
    |SimDag|   MSG      | |sockets|
    |-------------------| ---------
    |        SURF       |
    ---------------------
SimGrid 3.2, the publicly available version at the time this account was written, implements the above architecture and also provides a (partial) port to the Windows operating system.
Ongoing Work
As the project advances, it has become increasingly clear that an intermediate layer is needed between the base simulation engine, SURF, and the higher-level APIs. In the architecture shown above, MSG plays the role of an intermediate layer between SURF and GRAS, but it is also a high-level API in its own right, which is poor design. During an internship with Arnaud, Bruno Donassolo developed an intermediate layer called SIMiX, and both GRAS and MSG are being rewritten on top of it.
Another development is SMPI, a framework for running unmodified MPI applications either in simulation mode or in real-world mode (a sort of GRAS for MPI). SMPI is being developed by Mark Stillwell, who works with Henri, and its development is greatly simplified by the SIMiX layer. Finally, and somewhat unrelated, Malek Cherier, who works with Martin, is developing Java bindings for the MSG API. The current software architecture thus looks as follows:
    (user code for either SimDAG, MSG, GRAS, or MPI)
    ----------------------------------
    |      | |jMSG| |AMOK|           |
    |      | ------ ------           |
    |SimDag|  MSG  |  GRAS  |  SMPI  |
    |      |-------------------------|
    |      |         SIMiX           |
    ----------------------------------
    |              SURF              |
    ----------------------------------
Note that GRAS and SMPI can also run on top of sockets and MPI, respectively; this is not shown in the figure.
While the above developments are about adding simulation functionality, a large part of the research effort in the SimGrid project relates to simulation models. These models are implemented in SURF, and Arnaud has refactored SURF to make it more easily extensible so that one can experiment with different models, in particular different network models. Pedro Velho, who works with Arnaud, is currently experimenting with several new network models. Also, Kayo Fujiwara, who works with Henri, has interfaced SURF with (a patched version of) the GTNetS packet-level simulator.
At the time of writing, the architecture in the CVS tree is as follows:
    -----------------------------------
    |      | |jMSG| |AMOK|            |
    |      | ------ ------            |
    |SimDag|  MSG  |  GRAS   |  SMPI  |
    |      |       | ------- |        |
    |      |       | |SMURF| |        |
    |      |--------------------------|
    |      |          SIMiX           |
    -----------------------------------
    |          SURF interface         |
    -----------------------------------
    |    SURF kernel     | |  GTNetS  |
    | (several models)   | |          |
    ---------------------- ------------
Note that GRAS and SMPI can also run on top of sockets and MPI, respectively; this is not shown in the figure.
The primary short-term goal is to develop a distributed version of SIMiX to make simulations more scalable in terms of memory. This can be done by using the GRAS "real-world" functionality to run SIMiX across multiple hosts, so that simulations are no longer limited by the memory of a single host. The simulation itself would remain centralized and sequential, meaning that only one simulated process would run at a time. Bruno Donassolo is currently working on this idea, which is called SMURF.
Longer-term plans include:
- Further development of AMOK
- A component for visualizing simulations
- Model checking in GRAS
- Truly parallel simulation
One constant challenge in this project is its duality: it is a useful tool for scientists (hence our efforts on APIs, portability, documentation, etc.), but it is also a scientific project in its own right (so that we can publish papers).
- Casanova, Henri (May 2001). "A Toolkit for the Simulation of Application Scheduling". First IEEE International Symposium on Cluster Computing and the Grid (CCGrid'01). Brisbane, Australia. pp. 430–441. doi:10.1109/CCGRID.2001.923223.
- "SimGrid download page". Retrieved November 27, 2012.
- "Official SimGrid Page". Retrieved November 27, 2012.
- SimGrid - official project homepage
- USS SimGrid (ANR 08 SEGI 022 Project)
- SONGS Simulation Of Next Generation Systems (ANR 11 INFRA XXX Project)
Casanova, H., Legrand, A. and Quinson, M. SimGrid: a Generic Framework for Large-Scale Distributed Experiments. 10th IEEE International Conference on Computer Modeling and Simulation, 2008.
Refereed Papers about SimGrid
- Pedro Velho. Accurate and Fast Simulations of Large-Scale Distributed Computing Systems. PhD thesis, University Joseph Fourier, Grenoble, June 2011.
- Pierre-Nicolas Clauss, Mark Stillwell, Stéphane Genaud, Frédéric Suter, Henri Casanova, and Martin Quinson. Single Node On-Line Simulation of MPI Applications with SMPI. In International Parallel & Distributed Processing Symposium, Anchorage (AK), United States, May 2011. IEEE.
- Cristian Rosa, Stephan Merz, and Martin Quinson. SimGrid MC: Verification Support for a Multi-API Simulation Platform. In 31st Formal Techniques for Networked and Distributed Systems (FORTE 2011), Reykjavik, Iceland, pages 274-288, June 2011.
- Henri Casanova, Frédéric Desprez, and Frédéric Suter. Minimizing Stretch and Makespan of Multiple Parallel Task Graphs via Malleable Allocations. In Proceedings of the 39th International Conference on Parallel Processing (ICPP'10), San Diego, CA, pages 71–80, September 2010.
- Bruno Donassolo, Henri Casanova, Arnaud Legrand, and Pedro Velho. Fast and Scalable Simulation of Volunteer Computing Systems Using SimGrid. In Proceedings of the Workshop on Large-Scale System and Application Performance (LSAP), Chicago, IL, June 2010.
- Sascha Hunold, Ralf Hoffmann, and Frédéric Suter. Jedule: A Tool for Visualizing Schedules of Parallel Applications. In Proceedings of the 1st International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI'10), San Diego, CA, pages 169-178, September 2010.
- Cristian Rosa, Stephan Merz, and Martin Quinson. A Simple Model of Communication APIs—Application to Dynamic Partial-order Reduction. In 10th International Workshop on Automated Verification of Critical Systems—AVOCS 2010, Düsseldorf, Germany, pages 137-151, September 2010.
- Pedro Velho and Arnaud Legrand. Accuracy Study and Improvement of Network Simulation in the SimGrid Framework. In Proceedings of the 2nd International Conference on Simulation Tools and Techniques (SIMUTools'09), Rome, Italy, March 2009.
- Henri Casanova, Arnaud Legrand, and Martin Quinson. SimGrid: a Generic Framework for Large-Scale Distributed Experiments. In proceedings of the 10th IEEE International Conference on Computer Modeling and Simulation (UKSim), Cambridge, UK, April 2008.
- Kayo Fujiwara and Henri Casanova. Speed and Accuracy of Network Simulation in the SimGrid Framework. In Proceedings of the First International Workshop on Network Simulation Tools (NSTools), Nantes, France, October 2007.
- Kayo Fujiwara. Cost and Accuracy of Packet-Level vs. Analytical Network Simulations: An Empirical Study. Master's thesis, Department of Information and Computer Sciences, University of Hawaiʻi at Manoa, April 2007.
- Arnaud Legrand, Martin Quinson, Kayo Fujiwara, and Henri Casanova. The SimGrid Project - Simulation and Deployment of Distributed Applications. In Proceedings of the IEEE International Symposium on High Performance Distributed Computing (HPDC-15), Paris, France, pages 385-386, May 2006. Note: Poster.
- Martin Quinson. GRAS: A Research & Development Framework for Grid and P2P Infrastructures. In Proceedings of the 18th IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS'06), Dallas, TX, November 2006. Note: Best paper.
- Arnaud Legrand, Loris Marchal, and Henri Casanova. Scheduling Distributed Applications: the SimGrid Simulation Framework. In Proceedings of the third IEEE International Symposium on Cluster Computing and the Grid (CCGrid'03), Tokyo, Japan, pages 138-145, May 2003.
- Henri Casanova and Loris Marchal. A Network Model for Simulation of Grid Applications. Research Report 2002-40, LIP, ENS Lyon, France, 2002.
- Arnaud Legrand and Julien Lerouge. MetaSimGrid: Towards Realistic Scheduling Simulation of Distributed Applications. Research Report 2002-28, LIP, ENS Lyon, France, 2002.
- Henri Casanova. SimGrid: A Toolkit for the Simulation of Application Scheduling. In Proceedings of the First IEEE International Symposium on Cluster Computing and the Grid (CCGrid'01), Brisbane, Australia, pages 430-441, May 2001. doi:10.1109/CCGRID.2001.923223.