A foundation model is an AI model that is trained on broad data such that it can be applied across a wide range of use cases.[1] Foundation models have transformed AI, powering prominent chatbots and generative AI.[1] The Stanford Institute for Human-Centered Artificial Intelligence's (HAI) Center for Research on Foundation Models (CRFM) created and popularized the term.[2]

Foundation models are general-purpose technologies that can support a diverse range of use cases. Building foundation models is often highly resource-intensive, with the most expensive models costing hundreds of millions of dollars to pay for the underlying data and compute.[3] Early examples of foundation models were pre-trained language models (LMs) like Google's BERT[4] and OpenAI's "GPT-n" series. Beyond text, foundation models have been developed across a range of modalities—including DALL-E and Flamingo[5] for images, MusicGen[6] for music, and RT-2[7] for robotic control. Foundation models constitute a broad shift in AI development: foundation models are being built for astronomy,[8] radiology,[9] robotics,[10] genomics,[11] music,[12] coding,[13] and mathematics.[14]

Definitions edit

The Stanford Institute for Human-Centered Artificial Intelligence's (HAI) Center for Research on Foundation Models (CRFM) coined the term "foundation model" in August 2021 to mean "any model that is trained on broad data (generally using self-supervision at scale) that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks".[15] This was based on their observation that preexisting terms, while overlapping, were not adequate, stating that "'(large) language model' was too narrow given [the] focus is not only language; 'self-supervised model' was too specific to the training objective; and 'pretrained model' suggested that the noteworthy action all happened after 'pretraining."[16] After considering many terms, they settled on "foundation model" to emphasize the intended function (i.e., amenability to subsequent further development) rather than modality, architecture, or implementation. The term “foundation model” was chosen over “foundational model”[17] because “foundational” implies that these models provide fundamental principles in a way that “foundation” does not.[18]

As governments regulate foundation models, new legal definitions have emerged.

Overall, while many of these definitions stick close to the original Stanford CRFM definition, they do introduce some subtle distinctions. For example, the U.S. definition is the sole definition to make reference to the size of a foundation model. In contrast, the E.U. definition includes mention of whether the model is designed for generality of output. Nonetheless, all definitions share that foundation models must be trained on a broad range of data with potential applications in many domains.

Personalizing foundation models edit

Since foundation models are pre-trained on a massive dataset, they are not capable of handling specific "personal" concepts that a user may be interested in. A series of methods were designed to augment a foundation model with personal, specific items without retraining the full model. For example, for few-shot image retrieval it was shown how to adapt a vision-language foundation model (CLIP) by adding new concept to its vocabulary.[19] For text-to-image generation, an approach called textual inversion[20] can be similarly used to teach the system new concept that can later be generated in conjunction with the concepts that the foundation model is already familiar with.

Opportunities and risks edit

A 2021 arXiv report listed foundation models' capabilities in regards to "language, vision, robotics, reasoning, and human interaction", technical principles, such as "model architectures, training procedures, data, systems, security, evaluation, and theory", their applications, for example in law, healthcare, and education and their potential impact on society, including "inequity, misuse, economic and environmental impact, legal and ethical considerations".[15]

An article about foundation models in The Economist notes that "some worry that the technology's heedless spread will further concentrate economic and political power".[21]

References edit

  1. ^ a b c Competition and Markets Authority (2023). AI Foundation Models: Initial Report. Available at: https://assets.publishing.service.gov.uk/media/65081d3aa41cc300145612c0/Full_report_.pdf
  2. ^ "Introducing the Center for Research on Foundation Models (CRFM)". Stanford HAI. Retrieved 11 June 2022.
  3. ^ Nestor Maslej, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Helen Ngo, Juan Carlos Niebles, Vanessa Parli, Yoav Shoham, Russell Wald, Jack Clark, and Raymond Perrault, “The AI Index 2023 Annual Report,” AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Stanford, CA, April 2023.
  4. ^ Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020). "A Primer in BERTology: What we know about how BERT works". arXiv:2002.12327 [cs.CL].
  5. ^ Tackling multiple tasks with a single visual language model, 28 April 2022, retrieved 13 June 2022
  6. ^ Copet, Jade; Kreuk, Felix; Gat, Itai; Remez, Tal; Kant, David; Synnaeve, Gabriel; Adi, Yossi; Défossez, Alexandre (7 November 2023), Simple and Controllable Music Generation, doi:10.48550/arXiv.2306.05284, retrieved 11 December 2023
  7. ^ "Speaking robot: Our new AI model translates vision and language into robotic actions". Google. 28 July 2023. Retrieved 11 December 2023.
  8. ^ Nguyen, Tuan Dung; Ting, Yuan-Sen; Ciucă, Ioana; O'Neill, Charlie; Sun, Ze-Chang; Jabłońska, Maja; Kruk, Sandor; Perkowski, Ernest; Miller, Jack (12 September 2023), AstroLLaMA: Towards Specialized Foundation Models in Astronomy, doi:10.48550/arXiv.2309.06126, retrieved 11 December 2023
  9. ^ Tu, Tao; Azizi, Shekoofeh; Driess, Danny; Schaekermann, Mike; Amin, Mohamed; Chang, Pi-Chuan; Carroll, Andrew; Lau, Chuck; Tanno, Ryutaro (26 July 2023), Towards Generalist Biomedical AI, doi:10.48550/arXiv.2307.14334, retrieved 11 December 2023
  10. ^ Ahn, Michael; Brohan, Anthony; Brown, Noah; Chebotar, Yevgen; Cortes, Omar; David, Byron; Finn, Chelsea; Fu, Chuyuan; Gopalakrishnan, Keerthana (16 August 2022), Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, doi:10.48550/arXiv.2204.01691, retrieved 11 December 2023
  11. ^ Zvyagin, Maxim; Brace, Alexander; Hippe, Kyle; Deng, Yuntian; Zhang, Bin; Bohorquez, Cindy Orozco; Clyde, Austin; Kale, Bharat; Perez-Rivera, Danilo (11 October 2022), GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics, doi:10.1101/2022.10.10.511571, retrieved 11 December 2023
  12. ^ Engineering, Spotify (13 October 2023). "LLark: A Multimodal Foundation Model for Music". Spotify Research. Retrieved 11 December 2023.
  13. ^ Li, Raymond; Allal, Loubna Ben; Zi, Yangtian; Muennighoff, Niklas; Kocetkov, Denis; Mou, Chenghao; Marone, Marc; Akiki, Christopher; Li, Jia (9 May 2023), StarCoder: may the source be with you!, doi:10.48550/arXiv.2305.06161, retrieved 11 December 2023
  14. ^ Azerbayev, Zhangir; Schoelkopf, Hailey; Paster, Keiran; Santos, Marco Dos; McAleer, Stephen; Jiang, Albert Q.; Deng, Jia; Biderman, Stella; Welleck, Sean (30 November 2023), Llemma: An Open Language Model For Mathematics, doi:10.48550/arXiv.2310.10631, retrieved 11 December 2023
  15. ^ a b Bommasani, Rishi; Hudson, Drew A.; Adeli, Ehsan; Altman, Russ; Arora, Simran; von Arx, Sydney; Bernstein, Michael S.; Bohg, Jeannette; Bosselut, Antoine; Brunskill, Emma; Brynjolfsson, Erik; Buch, Shyamal; Card, Dallas; Castellon, Rodrigo; Chatterji, Niladri; Chen, Annie; Creel, Kathleen; Davis, Jared Quincy; Demszky, Dora; Donahue, Chris; Doumbouya, Moussa; Durmus, Esin; Ermon, Stefano; Etchemendy, John; Ethayarajh, Kawin; Fei-Fei, Li; Finn, Chelsea; Gale, Trevor; Gillespie, Lauren; Goel, Karan; Goodman, Noah; Grossman, Shelby; Guha, Neel; Hashimoto, Tatsunori; Henderson, Peter; Hewitt, John; Ho, Daniel E.; Hong, Jenny; Hsu, Kyle; Huang, Jing; Icard, Thomas; Jain, Saahil; Jurafsky, Dan; Kalluri, Pratyusha; Karamcheti, Siddharth; Keeling, Geoff; Khani, Fereshte; Khattab, Omar; Koh, Pang Wei; Krass, Mark; Krishna, Ranjay; Kuditipudi, Rohith; Kumar, Ananya; Ladhak, Faisal; Lee, Mina; Lee, Tony; Leskovec, Jure; Levent, Isabelle; Li, Xiang Lisa; Li, Xuechen; Ma, Tengyu; Malik, Ali; Manning, Christopher D.; Mirchandani, Suvir; Mitchell, Eric; Munyikwa, Zanele; Nair, Suraj; Narayan, Avanika; Narayanan, Deepak; Newman, Ben; Nie, Allen; Niebles, Juan Carlos; Nilforoshan, Hamed; Nyarko, Julian; Ogut, Giray; Orr, Laurel; Papadimitriou, Isabel; Park, Joon Sung; Piech, Chris; Portelance, Eva; Potts, Christopher; Raghunathan, Aditi; Reich, Rob; Ren, Hongyu; Rong, Frieda; Roohani, Yusuf; Ruiz, Camilo; Ryan, Jack; Ré, Christopher; Sadigh, Dorsa; Sagawa, Shiori; Santhanam, Keshav; Shih, Andy; Srinivasan, Krishnan; Tamkin, Alex; Taori, Rohan; Thomas, Armin W.; Tramèr, Florian; Wang, Rose E.; Wang, William; Wu, Bohan; Wu, Jiajun; Wu, Yuhuai; Xie, Sang Michael; Yasunaga, Michihiro; You, Jiaxuan; Zaharia, Matei; Zhang, Michael; Zhang, Tianyi; Zhang, Xikun; Zhang, Yuhui; Zheng, Lucia; Zhou, Kaitlyn; Liang, Percy (18 August 2021). On the Opportunities and Risks of Foundation Models (Report). arXiv:2108.07258.
  16. ^ "Reflections on Foundation Models". Stanford HAI. 18 October 2021. Retrieved 22 May 2023.
  17. ^ Bommasani, Rishi; Liang, Percy (18 October 2021). "Reflections on Foundation Models". Stanford CRFM. Retrieved 11 December 2023.
  18. ^ Marcus, Gary (11 September 2021). "Has AI found a new Foundation?". The Gradient. Retrieved 11 December 2023.
  19. ^ Cohen, Niv; Gal, Rinon; Meirom, Eli A.; Chechik, Gal; Atzmon, Yuval (23 October 2022). ""This is My Unicorn, Fluffy": Personalizing Frozen Vision-Language Representations". Computer Vision – ECCV 2022. Lecture Notes in Computer Science. Vol. 13680. Berlin, Heidelberg: Springer-Verlag. pp. 558–577. arXiv:2204.01694. doi:10.1007/978-3-031-20044-1_32. ISBN 978-3-031-20043-4.
  20. ^ Gal, Rinon; Alaluf, Yuval; Atzmon, Yuval; Patashnik, Or; Bermano, Amit H.; Chechik, Gal; Cohen-Or, Daniel (2 August 2022). "An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion". arXiv:2208.01618 [cs.CV].
  21. ^ "Huge "foundation models" are turbo-charging AI progress". The Economist. ISSN 0013-0613. Retrieved 24 October 2022.