Getting started with Generative Text and Fine-tuning LLMs in Hugging Face


This is the Git repository containing the files for the NLP from scratch “Getting started with Generative Text and Fine-tuning LLMs in Hugging Face” workshop, presented at the 8th Annual Toronto Machine Learning Summit (TMLS) on Thursday, July 11th, 2024.

Description

Talk Abstract:
If you’re new to working with LLMs hands-on in code, this is the session for you! In this introductory workshop, you’ll get hands-on with Hugging Face and the transformers library, generating text from LLMs and applying parameter-efficient fine-tuning (PEFT) methods to a generative text model.

Whether you are starting from near zero or have some prior knowledge of large language models, this workshop is your jumping-off point for working with LLMs.

What You’ll Learn:

  • Define large language models (LLMs) and the transformer architecture; understand the history of their development, key concepts, and high-level details of their structure and function
  • Be introduced to Hugging Face and the transformers library and see applications thereof in code, using LLMs for generative text
  • Define fine-tuning and understand the motivation for applying it to existing large language models for generative text
  • Apply fine-tuning to a generative text model using the Hugging Face transformers library and a text dataset
  • Be introduced to approaches for working with large language models efficiently on consumer hardware: parameter-efficient fine-tuning (PEFT) and model quantization
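As a rough illustration of the generative-text and PEFT topics listed above, here is a minimal sketch; it is not the workshop's own notebook code. The `gpt2` checkpoint, the LoRA hyperparameters (`r=8`, `lora_alpha=16`, `lora_dropout=0.05`), and the helper names `lora_extra_params`, `weight_memory_gb` and `generate_and_wrap` are illustrative assumptions. The two arithmetic helpers show why these techniques suit consumer hardware: LoRA trains two small low-rank matrices instead of a full weight matrix, and weight memory scales with the number of bits stored per parameter.

```python
# Minimal sketch: text generation with a small causal LM, then LoRA (a PEFT method).
# Assumptions (not from the workshop materials): the "gpt2" checkpoint, the LoRA
# hyperparameters, and all helper names below are illustrative choices only.


def lora_extra_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable params LoRA adds to one d_out x d_in weight: A (r x d_in) + B (d_out x r)."""
    return r * (d_in + d_out)


def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone (no activations, optimizer state, or overhead)."""
    return n_params * bits_per_param / 8 / 1e9


def generate_and_wrap(model_name: str = "gpt2", prompt: str = "Once upon a time"):
    # Heavy imports are deferred so the arithmetic helpers above work without them.
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
    from peft import LoraConfig, TaskType, get_peft_model

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # 1) Generative text via the high-level pipeline API.
    generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
    print(generator(prompt, max_new_tokens=30)[0]["generated_text"])

    # 2) PEFT: freeze the base weights and inject small trainable LoRA matrices.
    config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, lora_dropout=0.05)
    peft_model = get_peft_model(model, config)
    peft_model.print_trainable_parameters()
    return peft_model


if __name__ == "__main__":
    # A hypothetical 768x768 projection: full fine-tuning updates 589,824 weights,
    # while rank-8 LoRA trains only 8 * (768 + 768) = 12,288.
    print(lora_extra_params(768, 768, r=8))
    # Weights of a 7B-parameter model at 16-bit vs 4-bit precision: 14.0 GB vs 3.5 GB.
    print(weight_memory_gb(7e9, 16), weight_memory_gb(7e9, 4))
```

Calling `generate_and_wrap()` downloads the `gpt2` checkpoint from the Hugging Face Hub and needs the dependencies listed below. The memory helper counts weights only, so actual usage during fine-tuning will be noticeably higher.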

Dependencies

These notebooks can be run entirely in Google Colab. If you wish to run them locally in your own Python installation (or a virtualenv / conda environment), install the following dependencies:

pip install transformers datasets accelerate evaluate bitsandbytes peft huggingface_hub
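After installing, a quick way to confirm the environment is ready is to check that each package from the pip command above can be found. This is a small sketch using only the standard library; the `REQUIRED` tuple and `check_dependencies` name are illustrative, and it assumes each package's import name matches the name used in the pip command (true for all seven listed).

```python
# Sketch: report which workshop dependencies are importable in this environment.
import importlib.util

# Import names for the packages in the pip command (assumed to match their PyPI names).
REQUIRED = ("transformers", "datasets", "accelerate", "evaluate",
            "bitsandbytes", "peft", "huggingface_hub")


def check_dependencies(names=REQUIRED) -> dict:
    """Map each top-level module name to True if it can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:16} {'ok' if ok else 'MISSING'}")
```

Using `find_spec` rather than importing each package keeps the check fast and avoids side effects such as `bitsandbytes` probing for a GPU at import time.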

Files

Authors

Myles Harrison, Consultant & Trainer at NLP from scratch.