In software engineering, software aging refers to the tendency of software to fail, or to cause a system failure, after running continuously for a certain time. As software ages it degrades and will eventually stop functioning as it should, so rebooting or reinstalling the software can be seen only as a short-term fix. A proactive fault management method for dealing with software aging is software rejuvenation. This method can be classified as an environment diversity technique that is usually implemented through software rejuvenation agents (SRA).
Academic and industrial interest in the software aging phenomenon has increased. The main focus has been on understanding its effects through both empirical observation and theoretical analysis.
"Programs, like people, get old. We can't prevent aging, but we can understand its causes, take steps to limit its effects, temporarily reverse some of the damage it has caused, and prepare for the day when the software is no longer viable."
Memory bloating and leaking, data corruption, and unreleased file locks are particular causes of software aging.
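An unreleased lock is easy to produce by accident when an error path skips the release call. The minimal Python sketch below illustrates this; it assumes a `threading.Lock` as a portable stand-in for a file lock, and `state_lock`, `update_buggy`, and `update_fixed` are hypothetical names.

```python
import threading

# Assumption: a threading.Lock stands in for a file lock so the sketch runs on any OS.
state_lock = threading.Lock()


def update_buggy(value: int) -> None:
    """Hypothetical update routine that leaks the lock on an error path."""
    state_lock.acquire()
    if value < 0:
        raise ValueError("negative value")  # early exit: release() below never runs
    state_lock.release()


def update_fixed(value: int) -> None:
    """Same routine, but the with-statement releases the lock on every exit path."""
    with state_lock:
        if value < 0:
            raise ValueError("negative value")
```

After `update_buggy(-1)` raises, `state_lock.locked()` stays `True` for the rest of the process's life, and any later caller that tries to acquire it blocks; `update_fixed(-1)` raises the same error but leaves the lock free.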
Proactive management of software aging
Software failures are a more likely cause of unplanned system outages than hardware failures. This is because software exhibits an increasing failure rate over time due to data corruption, numerical error accumulation, and unbounded resource consumption. In both widely used and specialized software, rebooting is a common way to clear a problem, because aging stems from the complexity of software, which is never entirely free of errors. It is almost impossible to fully verify that a piece of software is bug-free; even high-profile software such as Windows and macOS must receive continual updates to improve performance and fix bugs. Software development also tends to be driven by the need to meet release deadlines rather than to ensure long-term reliability, so designing software that is immune to aging is difficult. Not all software ages at the same rate, since some users use a system more intensively than others.
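Numerical error accumulation can be illustrated with a small Python sketch. It assumes a hypothetical long-running process that advances an uptime counter by a fixed 0.1-second tick using ordinary floating-point addition. Because 0.1 has no exact binary representation, each addition rounds, and the rounding errors compound, so the counter's deviation from a correctly rounded sum (computed with `math.fsum`) grows with uptime.

```python
import math

TICK_S = 0.1  # hypothetical timer tick; 0.1 is not exactly representable in binary


def uptime_drift(ticks: int) -> float:
    """Return |naively accumulated counter - correctly rounded sum| after `ticks` increments."""
    counter = 0.0
    for _ in range(ticks):
        counter += TICK_S  # each addition rounds; the errors accumulate
    exact = math.fsum([TICK_S] * ticks)  # sum of the same values with a single rounding
    return abs(counter - exact)


if __name__ == "__main__":
    for ticks in (10, 10_000, 1_000_000):
        print(f"{ticks:>9} ticks: drift = {uptime_drift(ticks):.3e} s")
```

A common way to avoid this particular kind of aging is to count integer ticks and multiply only when a time value is needed, or to read a monotonic clock such as `time.monotonic()` instead of accumulating increments.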
Because aging inevitably leads to failures in software systems, software rejuvenation can be employed proactively to prevent crashes or degradation. This proactive technique was identified as a cost-effective solution during research on fault-tolerant software at AT&T Bell Laboratories in the 1990s. Software rejuvenation works by removing accumulated error conditions and freeing up system resources, for example by flushing operating system kernel tables, running garbage collection, or reinitializing internal data structures; perhaps the best-known rejuvenation method is simply rebooting the system.
Rejuvenation can be achieved through both simple and more complex techniques. The method most people are familiar with is a hardware or software reboot. A more technical example is the rejuvenation method used by the Apache web server, which implements one form of rejuvenation by killing and recreating its child processes after they have served a certain number of requests. Another technique is to restart virtual machines running in a cloud computing environment.
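The Apache-style approach of retiring a worker after a fixed number of requests can be sketched in Python. This is not Apache's code: `Worker`, `RecyclingDispatcher`, and `max_requests` are hypothetical names, and the "leak" is simulated by a list that every request appends to. (Apache httpd itself exposes this limit through the `MaxConnectionsPerChild` directive, named `MaxRequestsPerChild` before version 2.3.9.)

```python
class Worker:
    """A hypothetical request handler that leaks a little state per request."""

    def __init__(self, worker_id: int) -> None:
        self.worker_id = worker_id
        self.served = 0
        self._leaked = []  # stands in for memory a buggy handler never releases

    def handle(self, request: str) -> str:
        self.served += 1
        self._leaked.append(request)  # simulated leak: grows for the worker's lifetime
        return f"worker {self.worker_id} handled {request}"


class RecyclingDispatcher:
    """Replaces its worker after `max_requests` requests, discarding the leaked state."""

    def __init__(self, max_requests: int) -> None:
        if max_requests < 1:
            raise ValueError("max_requests must be at least 1")
        self.max_requests = max_requests
        self.workers_started = 0
        self.worker = self._spawn()

    def _spawn(self) -> Worker:
        self.workers_started += 1
        return Worker(self.workers_started)

    def dispatch(self, request: str) -> str:
        if self.worker.served >= self.max_requests:
            # Rejuvenation: drop the aged worker and start a fresh one.
            self.worker = self._spawn()
        return self.worker.handle(request)


if __name__ == "__main__":
    dispatcher = RecyclingDispatcher(max_requests=3)
    for i in range(7):
        print(dispatcher.dispatch(f"req-{i}"))
    print("workers started:", dispatcher.workers_started)
```

With `max_requests=3`, seven requests are spread over three workers, and no worker ever holds more than three leaked entries. The trade-off is the cost of starting a new worker, which is why the limit is usually set well above one.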
Some systems that have employed software rejuvenation methods include:
- Transaction processing systems
- Web servers
- Spacecraft systems
Research topics in software rejuvenation include:
- Design, implementation, and evaluation of rejuvenation mechanisms
- Modeling, analysis, and implementation of rejuvenation scheduling
- Software rejuvenation benchmarking
In systems with an operating system (OS), user programs must request memory blocks in order to perform an operation. After the operation (e.g., a subroutine) has completed, the program is expected to free all the memory blocks allocated for it so that they are available to other programs. In programming languages without a garbage collector (e.g., C and C++), it is up to the programmer to call the necessary memory-releasing functions, such as free() in C, and to account for all the unused data within the program. However, this does not always happen. Because of software bugs, a program might consume more and more memory, eventually causing the system to run out of memory. In low-memory conditions, the system usually runs more slowly because of intense swapping (thrashing), applications become unresponsive, and those that unexpectedly request large amounts of memory may crash. If the system runs out of both memory and swap space, even the OS might crash, causing the whole system to reboot.
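Python is garbage-collected, so it cannot reproduce a missing `free()` call directly. The closest analogue is a logical leak, in which a program keeps references to data it no longer needs, so the collector is never allowed to reclaim it. The sketch below assumes a hypothetical request handler that stores every payload in a module-level list, and it measures the resulting growth with the standard `tracemalloc` module.

```python
import tracemalloc

_request_log = []  # bug: entries are added but never removed


def handle_request(payload_size: int = 1024) -> None:
    """Hypothetical handler; the appended payload stays reachable forever."""
    payload = bytes(payload_size)
    _request_log.append(payload)


def traced_growth(requests: int) -> int:
    """Return how many traced bytes are still allocated after `requests` calls."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()  # (current, peak)
    for _ in range(requests):
        handle_request()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


if __name__ == "__main__":
    for n in (1_000, 5_000):
        print(f"after {n} requests: {traced_growth(n)} bytes still held")
```

The retained memory grows roughly linearly with the number of requests, which is the signature a long-running service shows when it ages because of a leak.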
Programs written in programming languages that use a garbage collector (e.g., Java) usually rely on it to avoid memory leaks. Thus the "aging" of these programs depends at least partially on the quality of the garbage collector built into the language's runtime environment.
Sometimes critical components of the OS itself can be a source of memory leaks and be the main culprit behind system stability issues. In Microsoft Windows, for example, the memory use of Windows Explorer plug-ins and long-lived processes such as services can impact the reliability of the system to the point of making it unusable. A reboot might be needed to make the system work again.
Software rejuvenation helps with memory leaks because it forces all the memory used by an application to be released. The application is then restarted and begins with a clean slate.
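The effect of rejuvenation on a leak can be sketched in Python by modelling the application as an object that owns its leaked state, so that "restarting" it means discarding the old instance and creating a new one. `LeakyService` and `measure_rejuvenation` are hypothetical names. A real restart would end the operating-system process, which returns all of its memory at once; here CPython's reference counting frees the discarded instance and its cache, and `tracemalloc` shows the drop.

```python
import tracemalloc
from typing import Tuple


class LeakyService:
    """Hypothetical service whose cache grows without bound (the aging defect)."""

    def __init__(self) -> None:
        self._cache = []

    def serve(self) -> None:
        self._cache.append(bytes(1024))  # never evicted


def measure_rejuvenation(requests: int) -> Tuple[int, int]:
    """Return (traced bytes held before the restart, traced bytes held after it)."""
    tracemalloc.start()
    service = LeakyService()
    for _ in range(requests):
        service.serve()
    aged, _ = tracemalloc.get_traced_memory()
    service = LeakyService()  # "restart": the old instance and its cache are freed
    fresh, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return aged, fresh


if __name__ == "__main__":
    aged, fresh = measure_rejuvenation(1_000)
    print(f"before restart: {aged} bytes, after restart: {fresh} bytes")
```

Restarting does not fix the underlying bug: the new instance starts leaking again, which is why rejuvenation has to be repeated periodically.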
Two methods for implementing rejuvenation are:
- Time-based rejuvenation
- Prediction-based rejuvenation
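The two approaches can be contrasted in a short Python sketch. All function names are hypothetical. The time-based policy rejuvenates once a fixed interval has elapsed. The prediction-based policy fits a least-squares line to recent (time, memory usage) samples and rejuvenates when the projected time until a usage limit is reached falls below a safety margin. A linear trend is only one possible predictor, and the interval, limit, and margin are illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (time in seconds, memory usage in any fixed unit)


def time_based_due(uptime_s: float, interval_s: float) -> bool:
    """Time-based policy: rejuvenate whenever the fixed interval has elapsed."""
    return uptime_s >= interval_s


def seconds_until_exhaustion(samples: Sequence[Sample], limit: float) -> Optional[float]:
    """Fit usage = a + b*t by least squares and project when usage reaches `limit`.

    Returns None when there is too little data or usage is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    last_t = samples[-1][0]
    fitted_now = mean_u + slope * (last_t - mean_t)  # trend value at the latest sample
    return max(0.0, (limit - fitted_now) / slope)


def prediction_based_due(samples: Sequence[Sample], limit: float, margin_s: float) -> bool:
    """Prediction-based policy: rejuvenate if exhaustion is projected within `margin_s`."""
    remaining = seconds_until_exhaustion(samples, limit)
    return remaining is not None and remaining <= margin_s
```

For samples growing by 60 units per minute from 100 to 280, with a limit of 1000, the projection is 720 seconds to exhaustion. Roughly speaking, the time-based policy is simpler but ignores how fast the system is actually aging, while the prediction-based policy adapts to the observed trend at the cost of continuous monitoring.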
Garbage collection is a form of automatic memory management in which the system automatically recovers unused memory. For example, the .NET Framework manages the allocation and release of memory for software running under it. However, automatically tracking allocated objects takes time and is not perfect.
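The sketch below illustrates automatic memory recovery in Python rather than .NET. CPython frees most objects by reference counting and uses the cyclic garbage collector in the standard `gc` module to find groups of objects that refer only to each other. The sketch disables automatic collection, creates unreachable reference cycles from a hypothetical `Node` class, and then runs a collection manually; `gc.collect()` returns the number of unreachable objects it found.

```python
import gc


class Node:
    """Hypothetical object type used only to build reference cycles."""

    def __init__(self) -> None:
        self.partner = None


def create_unreachable_cycles(count: int) -> None:
    for _ in range(count):
        a, b = Node(), Node()
        a.partner, b.partner = b, a  # cycle: reference counts never drop to zero
    # After this function returns, no variable refers to any of the pairs.


def collect_cycles(count: int) -> int:
    """Create `count` garbage cycles and return how many objects gc.collect() found."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running early
    try:
        gc.collect()  # start from a clean slate
        create_unreachable_cycles(count)
        return gc.collect()
    finally:
        if was_enabled:
            gc.enable()
```

For 1,000 cycles the collector reports at least 2,000 unreachable objects (the `Node` instances, plus any per-instance dictionaries the interpreter tracks separately). Finding them requires traversing tracked objects, which is part of the time cost the paragraph above refers to.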
.NET-based web services manage several logical types of memory, such as the stack, the unmanaged heap, and the managed heap. As physical memory fills up, the OS writes rarely used parts of it to disk so that it can reallocate that memory to another application, a process known as paging or swapping. If the paged-out memory is needed again, it must be reloaded from disk. If several applications are all making large demands, the OS can spend much of its time merely moving data between main memory and disk, a condition known as disk thrashing. Because the garbage collector has to examine all of the allocations to decide which are in use, it may exacerbate this thrashing: extensive swapping can stretch garbage collection cycles from milliseconds to tens of seconds, causing noticeable pauses and usability problems.
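A back-of-envelope estimate shows how swapping can turn a millisecond-scale collection into one lasting tens of seconds. The Python function below is a hypothetical model, not a measurement: it assumes 4 KiB pages, that every swapped-out page of the heap must be read back from disk once during the collection, and that these reads happen one at a time with a fixed latency.

```python
PAGE_SIZE_BYTES = 4096  # assumed page size


def estimated_gc_pause_s(heap_bytes: int, swapped_fraction: float,
                         fault_latency_s: float, in_memory_scan_s: float) -> float:
    """Rough pause estimate: in-memory scan time plus one disk read per swapped-out page."""
    pages = heap_bytes / PAGE_SIZE_BYTES
    return in_memory_scan_s + pages * swapped_fraction * fault_latency_s
```

With illustrative values of a 512 MiB heap (131,072 pages), a 10 ms scan when everything is resident, 5% of pages swapped out, and 5 ms per page read (roughly a hard-disk random read), the estimate is 0.01 + 6,553.6 × 0.005 ≈ 32.8 seconds, compared with 0.01 seconds when nothing is swapped.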