Saturday, April 13, 2024

Meta quietly releases Llama 2 Long AI model



Meta Platforms showed off a bevy of new AI features for its consumer-facing services Facebook, Instagram and WhatsApp at its annual Meta Connect conference in Menlo Park, California, this week.

But the biggest news from Mark Zuckerberg’s company may have actually come in the form of a computer science paper published without fanfare by Meta researchers on the open-access, non-peer-reviewed website arXiv.org.

The paper introduces Llama 2 Long, a new AI model based on Meta’s open source Llama 2, released in the summer, that has undergone “continual pretraining from Llama 2 with longer training sequences and on a dataset where long texts are upsampled,” according to the paper’s authors.

As a result, Meta’s newly elongated AI model outperforms some of the leading competition in generating responses to long user prompts, including OpenAI’s GPT-3.5 Turbo with its 16,000-token context window, as well as Claude 2 with its 100,000-token context window.


How Llama 2 Long came to be

Meta researchers took the original Llama 2 in its different parameter sizes (parameters are the values the algorithm can change on its own as it learns, and Llama 2 comes in 7 billion, 13 billion, 34 billion, and 70 billion variants) and continued training it on data that included more long-text sources than the original Llama 2 training dataset: another 400 billion tokens’ worth, to be exact.

The researchers kept the original Llama 2’s architecture the same, and only made a “necessary modification to the positional encoding that is crucial for the model to attend longer.”

That modification was to the Rotary Positional Embedding (RoPE) encoding, a method used in the transformer models underlying LLMs such as Llama 2 (and Llama 2 Long). RoPE rotates the vectors a transformer uses to compare tokens by angles proportional to each token’s position in the sequence, so the attention score between two tokens depends on how far apart they are rather than on their absolute positions. This allows a model to produce accurate and helpful responses with less information (and thus less storage taken up) than other approaches.
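The rotation idea behind RoPE can be illustrated with a short sketch. This is a simplified, standard-library-only illustration, not Meta’s implementation: the function names `rope_rotate` and `dot`, the toy 4-dimensional vectors, and the default base of 10,000 are all assumptions chosen for the example. It shows that when two vectors are rotated by position-dependent angles, their dot product depends only on the distance between the positions.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles (RoPE-style sketch)."""
    dim = len(vec)
    assert dim % 2 == 0, "dimension must be even"
    out = []
    for i in range(0, dim, 2):
        # Each pair gets its own frequency; later pairs rotate more slowly.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical toy query and key vectors.
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same offset (3 positions apart) at two very different absolute positions.
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_b = dot(rope_rotate(q, 105), rope_rotate(k, 102))
print(abs(score_a - score_b) < 1e-9)
```

The two scores agree up to floating-point error because rotating both vectors by an extra 100 positions cancels out in the dot product; only the offset of 3 remains.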

The Meta researchers “decreased the rotation angle” of the RoPE encoding from Llama 2 to Llama 2 Long, which weakens the way attention fades with distance and so helps ensure that “distant tokens,” those far apart from one another in the sequence, can still contribute to the model’s output.
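A rough sketch of what a smaller rotation angle means in practice: in the usual RoPE formulation, the per-position angle of each dimension pair is controlled by a “base” hyperparameter, and raising the base shrinks the angles. The function name `per_position_angles`, the head dimension of 128, and the base values 10,000 and 500,000 are assumptions for illustration; this article does not state the values Meta used.

```python
def per_position_angles(dim, base):
    """Angle (in radians) that each dimension pair rotates per token position."""
    return [base ** (-i / dim) for i in range(0, dim, 2)]

HEAD_DIM = 128          # assumed head dimension, for illustration only
SMALL_BASE = 10_000.0   # assumed original base
LARGE_BASE = 500_000.0  # assumed larger base

angles_small = per_position_angles(HEAD_DIM, SMALL_BASE)
angles_large = per_position_angles(HEAD_DIM, LARGE_BASE)

# With the larger base, no pair rotates faster, and the slowest pairs rotate far less.
print(all(l <= s for l, s in zip(angles_large, angles_small)))
print(angles_large[-1] < angles_small[-1])
```

Because the angles grow more slowly with position, two tokens thousands of positions apart end up less rotated relative to each other, which is one way to read the paper’s description of a decreased rotation angle.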

Using reinforcement learning from human feedback (RLHF), a common AI training method in which a model is rewarded for responses that human reviewers prefer, along with synthetic data generated by Llama 2 Chat itself, the researchers were able to improve the model’s performance on common LLM tasks including coding, math, language understanding, common-sense reasoning, and answering a human user’s questions.

Graph of Llama 2 Long results taken from the paper “Effective Long-Context Scaling of Foundation Models,” dated September 27, 2023.

With such impressive results relative to regular Llama 2, Anthropic’s Claude 2 and OpenAI’s GPT-3.5 Turbo, it’s little wonder the open-source AI community on Reddit, Twitter and Hacker News has been expressing admiration and excitement about Llama 2 Long since the paper’s release earlier this week. It’s a big validation of Meta’s “open source” approach toward generative AI, and it indicates that open source can compete with the closed-source, “pay to play” models offered by well-funded startups.
