Friday September 20, 2024 13:30 - 14:30 MDT
Have you ever wanted to run a Llama 2 model in C++? In this talk, we'll dive into C++ techniques for Llama 2 model inference. We'll start with a model trained in PyTorch and explore different ways to implement the inference solution for the Llama 2 language model, focusing on keeping things simple and minimal.

Llama 2 is a cutting-edge language model that's making waves in the field of natural language processing. It can generate human-like text, understand complex language tasks, and is used in everything from chatbots to content creation. Its efficiency and accuracy are setting new industry standards, making it an invaluable tool for developers and researchers.

Inspired by the awesome llama.cpp and llama2.c projects, this talk aims to show a simple and educational approach. We'll hard-code the Llama 2 architecture and create the inference in pure C++ with no dependencies. Join us to learn how to run Llama 2 models efficiently using a streamlined and dependency-free C++ solution. If time allows, we will also explore additional C++ techniques for fast Llama 2 model inference.
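
To give a flavor of what "pure C++ with no dependencies" can look like, here is a minimal sketch of three building blocks that a hard-coded Llama 2 forward pass relies on: RMS normalization, a matrix-vector product, and softmax. This is not the speaker's code. The function names (rmsnorm, matvec, softmax), the row-major weight layout, and the default epsilon of 1e-5 are assumptions chosen for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// RMSNorm: out[i] = weight[i] * x[i] / sqrt(mean(x^2) + eps).
// The eps default is an assumption; real checkpoints carry their own value.
std::vector<float> rmsnorm(const std::vector<float>& x,
                           const std::vector<float>& weight,
                           float eps = 1e-5f) {
    float sum_sq = 0.0f;
    for (float v : x) sum_sq += v * v;
    const float scale =
        1.0f / std::sqrt(sum_sq / static_cast<float>(x.size()) + eps);
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = weight[i] * (x[i] * scale);
    }
    return out;
}

// Matrix-vector product. w is assumed row-major with shape (rows x cols),
// where cols == x.size(). Most of the inference time is spent here.
std::vector<float> matvec(const std::vector<float>& w,
                          const std::vector<float>& x,
                          std::size_t rows) {
    const std::size_t cols = x.size();
    std::vector<float> out(rows, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            acc += w[r * cols + c] * x[c];
        }
        out[r] = acc;
    }
    return out;
}

// In-place softmax. Subtracting the maximum first keeps exp() from
// overflowing on large scores.
void softmax(std::vector<float>& v) {
    if (v.empty()) return;
    float max_v = v[0];
    for (float e : v) {
        if (e > max_v) max_v = e;
    }
    float sum = 0.0f;
    for (float& e : v) {
        e = std::exp(e - max_v);
        sum += e;
    }
    for (float& e : v) e /= sum;
}
```

In a full implementation, these helpers would be called once per transformer layer: rmsnorm before the attention and feed-forward blocks, matvec for every weight projection, and softmax over the attention scores. The naive matvec loop is also a natural first target for the kind of optimizations the talk mentions.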

By the end of the talk, you'll have an understanding of what Llama 2 is and how it works. You'll also learn practical ways to implement the Llama 2 model inference in C++ and potential optimizations.
Speakers
Filipe Mulonde

Modelling Engineer (GPU), ARM
Filipe Mulonde is a GPU modelling engineer at ARM Holdings, where he works on the world's best-selling mobile GPUs. ARM is a global leader, producing technology that powers countless devices worldwide. Filipe holds a bachelor's degree in Software Engineering and a master's degree in Artificial...
Maple 3/4/5