According to a business document dated Sept. 20, 2022, Meta CEO Mark Zuckerberg assembled his top lieutenants for a five-hour analysis of the firm’s processing power, focusing on its capability to perform cutting-edge artificial intelligence work.
The social media giant had a thorny problem: despite high-profile investments in AI research, it had been slow to adopt pricey AI-friendly hardware and software platforms for its core business, hampering its ability to keep up with innovation at scale even as it increasingly depended on AI to fuel its growth, according to the memo, company statements, and conversations with 12 people familiar with the changes, who spoke on condition of anonymity about internal competition.
“In terms of building AI, we have a large gap in our tooling, workflows, and procedures,” said the memo, which was authored by Santosh Janardhan, the new head of infrastructure. It was published on Meta’s internal discussion board in September and is currently being distributed.
To support AI work, Meta would have to “fundamentally alter our infrastructure’s design, our software networks, and our strategy for providing a stable platform,” the memo noted.
Meta has been working on a sizable initiative to get its AI network in shape for over a year. Details of the makeover, which included capacity constraints, leadership changes, and a shelved AI chip project, have not previously been revealed, despite the company’s public admission that it is “playing a little bit of catch-up” on AI hardware trends.
Asked about the memo and the restructuring, Meta spokesman Jon Carvill said the company “has an established history in developing and implementing cutting-edge technology at scale alongside substantial experience in AI research and engineering.”
“As we add innovative AI-powered encounters to our family of applications and consumer products, we’re optimistic about our ability to keep enhancing the capabilities of our infrastructure to suit both our immediate and long-term demands,” added Carvill. He would not say whether Meta had given up on its AI chip.
Janardhan and the other executives declined interview requests made through the company.
According to corporate reports, the redesign increased Meta’s capital expenditures by around $4 billion a quarter, nearly double its 2021 spending, and forced it to postpone or cancel previously scheduled data center developments in four locations.
Those investments came at a time when Meta was under severe financial pressure; since November, it has been laying off staff on a scale not seen since the dotcom bust.
Meanwhile, Microsoft-backed OpenAI’s ChatGPT, which became the fastest-growing consumer application in history after its Nov. 30 debut, has sparked an arms race among tech giants to release products using so-called generative AI, which, in addition to recognizing patterns in data like other AI, creates human-like written and visual content in response to prompts.
Five of the sources claimed that generative AI devours vast amounts of computer resources, intensifying the urgency of Meta’s capacity scramble.
Falling behind
Those five sources said that Meta’s belated adoption of the graphics processing unit, or GPU, for AI development was a major contributor to the problem.
GPU chips are well suited to artificial intelligence processing because they can perform many tasks at once, allowing them to work through billions of pieces of data quickly.
GPUs are also far more costly than other processors, with Nvidia Corp. controlling 80% of the market and holding a commanding lead in supporting software, according to the sources.
Nvidia did not respond to a request for comment for this story.
Instead, until last year, Meta ran AI workloads on its fleet of commodity central processing units (CPUs), the workhorse chip of the computer industry that has long populated data centers. However, AI workloads performed poorly on commodity CPUs.
Two of those people said that the company also began using a custom chip that it had designed in-house for inference, an AI process in which algorithms trained on vast quantities of data make decisions and generate responses to requests.
By 2021, the two sources claimed, the two-pronged strategy had proven to be slower and less effective than one based on GPUs, which were also more adaptable in running various models than Meta’s processor.
Meta declined to comment on the effectiveness of its AI processor.
Four of the sources claimed that as Zuckerberg steered the company towards the metaverse, a collection of digital worlds made possible by augmented and virtual reality, a capacity crunch was impeding its ability to use AI to counter threats like the emergence of social networking rival TikTok and Apple’s changes to ad privacy.
Peter Thiel, a former member of the Meta board, noticed the missteps and abruptly resigned in early 2022.
According to two people familiar with the situation, Thiel complained to Zuckerberg and his colleagues at a board meeting before he left that they were overly focused on the metaverse and complacent about Meta’s core social network business, which left the company exposed to the threat from TikTok.
Meta declined to comment on the exchange.
Catching up
Executives changed course and placed orders for billions of dollars’ worth of Nvidia GPUs in 2022 instead of deploying Meta’s own custom inference chip at scale as originally planned, a source said.
Meta declined to comment on the order.
By that time, Meta had already fallen behind rivals like Google, which had started deploying its own custom-built AI accelerator, called the TPU, in 2015.
Executives also began reorganizing Meta’s AI divisions that spring, naming Janardhan, the author of the September memo, as one of two new engineering chiefs.
According to their LinkedIn profiles and a person familiar with the departures, more than a dozen executives left Meta during the months-long upheaval, an almost complete turnover in the leadership of its AI infrastructure.
Meta began redesigning its data centers to accommodate the incoming GPUs, which draw more power and generate more heat than CPUs and must be packed closely together with specialized networking between them.
The facilities needed to be “entirely redesigned,” according to Janardhan’s memo and four people familiar with the project, the specifics of which have not previously been made public. They required 24 to 32 times the networking capacity and new liquid cooling systems to manage the clusters’ heat.
As the project got under way, Meta made internal plans to begin developing a new, more ambitious in-house chip that, like a GPU, would be able to handle both inference and model training. According to two sources, the project, which has not previously been reported, is expected to be completed in or around 2025.
Construction of data centers, which was put on hold while the company switched to the new designs, will resume later this year, according to Carvill, the Meta spokesman. He declined to comment on the chip project.
Trade-offs
Even as it expands its GPU capacity, Meta has so far made little splash, while rivals like Microsoft and Google promote the public launches of their own generative AI technologies.
Chief Financial Officer Susan Li acknowledged that the company is not currently allocating much of its compute to generative work, saying that “basically all of our AI capacity is going towards ads, feeds, and Reels,” Meta’s TikTok-like short video format that is popular with younger users.
Four of the sources said that Meta did not give generative AI products a high priority before the November launch of ChatGPT. Even though its research unit FAIR, or Facebook AI Research, has been publishing work on the technology since late 2021, the company was not focused on turning its renowned research into products.
That is changing as investor interest soars. In February, Zuckerberg announced a new top-level generative AI team that he said would “turbocharge” the company’s efforts in the field.
Meta expects to release a product this year, according to Chief Technology Officer Andrew Bosworth, who also said last month that generative AI was the area in which he and Zuckerberg were investing the most effort.
According to two people familiar with the new team, its work is still in its early stages and is focused on building a foundation model, a core program that can subsequently be fine-tuned and adapted for different products.
The company has been building generative AI products on several teams for more than a year, according to Carvill, the Meta spokesman. He acknowledged that the work has advanced since ChatGPT’s arrival.