{"cells":[{"cell_type":"markdown","metadata":{"id":"fwukZZnNTYWE"},"source":["<a href=\"https://www.nlpfromscratch.com?utm_source=notebook&utm_medium=nb-header&utm_campaign=TMLS2024\"><center><img src=\"https://drive.google.com/uc?export=view&id=1-lt6Uft8lgBG9jPD0dO6w3dAcv_EUQRP\"></center></a>\n","\n","# Getting Started with Generative Text and Fine-tuning LLMs in Hugging Face 🤗\n","\n","<div width=\"100%\"><img src=\"https://raw.githubusercontent.com/nlpfromscratch/workshops/master/finetuning-TMLS-2024/speaker_card.png\"/></div>\n","\n","Copyright, NLP from scratch, 2024.\n","\n","[nlpfromscratch.com](https://www.nlpfromscratch.com)\n","\n","------------"]},{"cell_type":"markdown","metadata":{"id":"uuznFjxWVV0n"},"source":["## Introduction & Setup\n","In this notebook, we will explore Large Language Models (LLMs) for generative text, and show how they can be leveraged using the open source libraries from [Hugging Face](https://huggingface.co/).\n","\n","This notebook is best run in [Google Colab](https://colab.research.google.com/), where the majority of dependencies are already installed. However, if you wish to run the notebook locally, please follow the [directions for setting up a local environment](https://drive.google.com/file/d/1EV1seK-dUHRCzj2EDuu3ETAhUyjzOGRd/view?usp=drive_link) and you may then download the notebook as a `.ipynb` and run it in either Jupyter or JupyterLab.\n","\n","Since we will be using a GPU in this notebook for compute-intensive tasks, please ensure that if running on Colab the runtime type is set to GPU. 
In the menu in Colab, select *Runtime -> Change runtime type*, then select T4 GPU (if using Colab Free) or another GPU instance type if using Colab Pro.\n","\n","<center><img src=\"https://drive.google.com/uc?export=view&id=1t5rRQIHd12xVFXHRvzaDnjR2Vd-QJQPf\" width=\"50%\"/></center><br/>\n","\n","Though Google Colab comes with many useful data science libraries included by default (including PyTorch), the Hugging Face libraries are not, so we will first install those here using `pip`, as they will be used in the remainder of the notebook.\n","\n","- The `transformers` library, for general usage of transformer models\n","- The `datasets` library, for working with datasets hosted on Hugging Face\n","- The `accelerate` library, for using GPU for inference\n","- The `evaluate` library, for metrics for measuring model performance in training\n","- The `bitsandbytes` library, for model quantization\n","- The `peft` library, for efficient fine-tuning of models in the second half of the workshop\n","- The `huggingface_hub` library, for interacting with models on the Hugging Face hub\n","\n","We will also be using custom datasets from the NLP from scratch [GitHub repo](https://github.com/nlpfromscratch/datasets/) and so we will clone this repo to have these all available locally.\n","\n"]},{"cell_type":"code","execution_count":1,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"nI1qpfWOM_rQ","outputId":"096e647c-9c72-4d58-ac10-d861714d9d5d","executionInfo":{"status":"ok","timestamp":1720650018287,"user_tz":240,"elapsed":9331,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["Cloning into 'datasets'...\n","remote: Enumerating objects: 70, done.\u001b[K\n","remote: Counting objects: 100% (70/70), done.\u001b[K\n","remote: Compressing objects: 100% (62/62), done.\u001b[K\n","remote: Total 70 (delta 14), reused 61 (delta 8), pack-reused 0\u001b[K\n","Receiving objects: 100% 
(70/70), 34.61 MiB | 8.40 MiB/s, done.\n","Resolving deltas: 100% (14/14), done.\n","Updating files: 100% (27/27), done.\n"]}],"source":["!git clone https://github.com/nlpfromscratch/datasets.git"]},{"cell_type":"code","execution_count":2,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Hvc-b3aG53dv","outputId":"3fac5673-7d20-4193-fb74-5cf5e86f0bac","executionInfo":{"status":"ok","timestamp":1720650111621,"user_tz":240,"elapsed":93337,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (4.41.2)\n","Collecting datasets\n","  Downloading datasets-2.20.0-py3-none-any.whl (547 kB)\n","\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m547.8/547.8 kB\u001b[0m \u001b[31m7.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[?25hCollecting accelerate\n","  Downloading accelerate-0.32.1-py3-none-any.whl (314 kB)\n","\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m314.1/314.1 kB\u001b[0m \u001b[31m12.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[?25hCollecting evaluate\n","  Downloading evaluate-0.4.2-py3-none-any.whl (84 kB)\n","\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m84.1/84.1 kB\u001b[0m \u001b[31m11.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[?25hCollecting bitsandbytes\n","  Downloading bitsandbytes-0.43.1-py3-none-manylinux_2_24_x86_64.whl (119.8 MB)\n","\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m119.8/119.8 MB\u001b[0m \u001b[31m6.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[?25hCollecting peft\n","  Downloading peft-0.11.1-py3-none-any.whl (251 kB)\n","\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m251.6/251.6 kB\u001b[0m \u001b[31m24.0 MB/s\u001b[0m eta 
\u001b[36m0:00:00\u001b[0m\n","\u001b[?25hRequirement already satisfied: huggingface_hub in /usr/local/lib/python3.10/dist-packages (0.23.4)\n","Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers) (3.15.4)\n","Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers) (1.25.2)\n","Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers) (24.1)\n","Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (6.0.1)\n","Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers) (2024.5.15)\n","Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers) (2.31.0)\n","Requirement already satisfied: tokenizers<0.20,>=0.19 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.19.1)\n","Requirement already satisfied: safetensors>=0.4.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.4.3)\n","Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers) (4.66.4)\n","Collecting pyarrow>=15.0.0 (from datasets)\n","  Downloading pyarrow-16.1.0-cp310-cp310-manylinux_2_28_x86_64.whl (40.8 MB)\n","\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m40.8/40.8 MB\u001b[0m \u001b[31m11.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[?25hRequirement already satisfied: pyarrow-hotfix in /usr/local/lib/python3.10/dist-packages (from datasets) (0.6)\n","Collecting dill<0.3.9,>=0.3.0 (from datasets)\n","  Downloading dill-0.3.8-py3-none-any.whl (116 kB)\n","\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m116.3/116.3 kB\u001b[0m \u001b[31m7.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[?25hRequirement already satisfied: pandas in 
/usr/local/lib/python3.10/dist-packages (from datasets) (2.0.3)\n","Collecting requests (from transformers)\n","  Downloading requests-2.32.3-py3-none-any.whl (64 kB)\n","\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m64.9/64.9 kB\u001b[0m \u001b[31m4.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[?25hCollecting xxhash (from datasets)\n","  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)\n","\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m21.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[?25hCollecting multiprocess (from datasets)\n","  Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)\n","\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m18.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[?25hRequirement already satisfied: fsspec[http]<=2024.5.0,>=2023.1.0 in /usr/local/lib/python3.10/dist-packages (from datasets) (2023.6.0)\n","Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets) (3.9.5)\n","Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate) (5.9.5)\n","Requirement already satisfied: torch>=1.10.0 in /usr/local/lib/python3.10/dist-packages (from accelerate) (2.3.0+cu121)\n","Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (4.12.2)\n","Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets) (1.3.1)\n","Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets) (23.2.0)\n","Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets) (1.4.1)\n","Requirement already satisfied: 
multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets) (6.0.5)\n","Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets) (1.9.4)\n","Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets) (4.0.3)\n","Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (3.3.2)\n","Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (3.7)\n","Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (2.0.7)\n","Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (2024.6.2)\n","Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (1.12.1)\n","Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (3.3)\n","Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (3.1.4)\n","Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.10.0->accelerate)\n","  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)\n","Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.10.0->accelerate)\n","  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)\n","Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch>=1.10.0->accelerate)\n","  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)\n","Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch>=1.10.0->accelerate)\n","  Using cached nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)\n","Collecting 
nvidia-cublas-cu12==12.1.3.1 (from torch>=1.10.0->accelerate)\n","  Using cached nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)\n","Collecting nvidia-cufft-cu12==11.0.2.54 (from torch>=1.10.0->accelerate)\n","  Using cached nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)\n","Collecting nvidia-curand-cu12==10.3.2.106 (from torch>=1.10.0->accelerate)\n","  Using cached nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)\n","Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch>=1.10.0->accelerate)\n","  Using cached nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)\n","Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch>=1.10.0->accelerate)\n","  Using cached nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)\n","Collecting nvidia-nccl-cu12==2.20.5 (from torch>=1.10.0->accelerate)\n","  Using cached nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)\n","Collecting nvidia-nvtx-cu12==12.1.105 (from torch>=1.10.0->accelerate)\n","  Using cached nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)\n","Requirement already satisfied: triton==2.3.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (2.3.0)\n","Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch>=1.10.0->accelerate)\n","  Downloading nvidia_nvjitlink_cu12-12.5.82-py3-none-manylinux2014_x86_64.whl (21.3 MB)\n","\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m21.3/21.3 MB\u001b[0m \u001b[31m65.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[?25hRequirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets) (2.8.2)\n","Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets) (2023.4)\n","Requirement already satisfied: tzdata>=2022.1 in 
/usr/local/lib/python3.10/dist-packages (from pandas->datasets) (2024.1)\n","Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas->datasets) (1.16.0)\n","Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.10.0->accelerate) (2.1.5)\n","Requirement already satisfied: mpmath<1.4.0,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.10.0->accelerate) (1.3.0)\n","Installing collected packages: xxhash, requests, pyarrow, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, dill, nvidia-cusparse-cu12, nvidia-cudnn-cu12, multiprocess, nvidia-cusolver-cu12, datasets, evaluate, bitsandbytes, accelerate, peft\n","  Attempting uninstall: requests\n","    Found existing installation: requests 2.31.0\n","    Uninstalling requests-2.31.0:\n","      Successfully uninstalled requests-2.31.0\n","  Attempting uninstall: pyarrow\n","    Found existing installation: pyarrow 14.0.2\n","    Uninstalling pyarrow-14.0.2:\n","      Successfully uninstalled pyarrow-14.0.2\n","\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n","cudf-cu12 24.4.1 requires pyarrow<15.0.0a0,>=14.0.1, but you have pyarrow 16.1.0 which is incompatible.\n","google-colab 1.0.0 requires requests==2.31.0, but you have requests 2.32.3 which is incompatible.\n","ibis-framework 8.0.0 requires pyarrow<16,>=2, but you have pyarrow 16.1.0 which is incompatible.\u001b[0m\u001b[31m\n","\u001b[0mSuccessfully installed accelerate-0.32.1 bitsandbytes-0.43.1 datasets-2.20.0 dill-0.3.8 evaluate-0.4.2 multiprocess-0.70.16 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.5.82 nvidia-nvtx-cu12-12.1.105 peft-0.11.1 pyarrow-16.1.0 requests-2.32.3 xxhash-3.4.1\n"]}],"source":["!pip install transformers datasets accelerate evaluate bitsandbytes peft huggingface_hub"]},{"cell_type":"markdown","metadata":{"id":"5dlwb33hLSmj"},"source":["## Working with Generative Text Models"]},{"cell_type":"markdown","metadata":{"id":"sHlaMxcb53dw"},"source":["In this section, we will start generating text with our first large language model, [GPT-2](https://huggingface.co/gpt2) and explore some of the parameters which affect the outputs from a generative text model.\n","\n","The GPT-2 (Generative Pre-trained Transformer 2) model was the last of the series of GPT models from OpenAI which was \"open\". Following its release in 2019, GPT-3 and subsequent models did not have their weights made available publicly (and in the case for more recent models such as GPT-4, nor the details of their training data and training process).\n","\n","We can easily work with GPT-2 in [Hugging Face](https://www.huggingface.co). 
The easiest way to get results as quickly as possible is to use a [pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines) to generate text, *i.e.* to perform inference.\n","\n","First, we import the `pipeline` function from the `transformers` library, then use it to create a pipeline instance, specifying the model we wish to use. In this case, we want to use GPT-2, which is hosted by Hugging Face themselves, not as part of a user repo, so the model identifier for it is just `gpt2`.\n","\n","Since pipelines can be used for a large variety of different tasks, we must also specify that the pipeline is for text generation.\n","\n","Finally, we check whether a GPU is available (it should be on Colab) and if so, set the model to use it. This requires importing [PyTorch](https://en.wikipedia.org/wiki/PyTorch) (`torch`), which is the first line of code."]},{"cell_type":"code","execution_count":3,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"NG4aOzFD8owq","outputId":"ec1e14ff-c334-42a3-a6a6-d3bbbd364c53","executionInfo":{"status":"ok","timestamp":1720650142764,"user_tz":240,"elapsed":31150,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: \n","The secret `HF_TOKEN` does not exist in your Colab secrets.\n","To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n","You will be able to reuse this secret in all of your notebooks.\n","Please note that authentication is recommended but still optional to access public models or datasets.\n","  warnings.warn(\n","Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. 
If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.\n","Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"]},{"output_type":"stream","name":"stdout","text":["[{'generated_text': 'I love applesauce! The only thing that bothers me when I buy applesauce is that it looks like a cake. And I find it a bit awkward to look at the cupcake I'}, {'generated_text': 'I love applesauce! For me, every time I make the cupcakes, I will freeze them, but only for fun. So in between these two amazing meals, I will keep making these'}, {'generated_text': \"I love applesauce! This is awesome: it's creamy, not as sticky or flaky, and tastes great in person.\\n\\nThe reason it's great is because I feel like I\"}]\n"]}],"source":["import torch\n","from transformers import pipeline\n","\n","# Check if GPU is available\n","device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n","\n","# Create a pipeline of the GPT-2 model\n","gpt2_pipeline = pipeline('text-generation', model='gpt2', device=device)\n","\n","# Create 3 output generations\n","outputs = gpt2_pipeline(\"I love applesauce!\", max_length=40, num_return_sequences=3)\n","\n","# Display all three outputs\n","print(outputs)"]},{"cell_type":"markdown","metadata":{"id":"RTXtOypX96kY"},"source":["We can see that even though we've only written a few lines of code, Hugging Face has pulled down over half a gigabyte of data! These are the [model weights for GPT-2](https://huggingface.co/gpt2/blob/main/pytorch_model.bin). 
For this part of the notebook, we are also using a smaller version of GPT - the full GPT-2 model, [GPT2-XL](https://huggingface.co/gpt2-xl) is ~6.5 GB!\n","\n","Let's take a look at what's in the pipeline - it will contain both a `tokenizer`, for breaking inputs up into the tokens that GPT-2 expects, as well as a `model`, in this case, our GPT-2 model:"]},{"cell_type":"code","execution_count":4,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":377},"id":"4lPEw0UU96Ti","outputId":"12493402-3890-426f-b3f4-a54b1c409be1","executionInfo":{"status":"ok","timestamp":1720650142764,"user_tz":240,"elapsed":23,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["transformers.models.gpt2.tokenization_gpt2_fast.GPT2TokenizerFast"],"text/html":["<div style=\"max-width:800px; border: 1px solid var(--colab-border-color);\"><style>\n","      pre.function-repr-contents {\n","        overflow-x: auto;\n","        padding: 8px 12px;\n","        max-height: 500px;\n","      }\n","\n","      pre.function-repr-contents.function-repr-contents-collapsed {\n","        cursor: pointer;\n","        max-height: 100px;\n","      }\n","    </style>\n","    <pre style=\"white-space: initial; background:\n","         var(--colab-secondary-surface-color); padding: 8px 12px;\n","         border-bottom: 1px solid var(--colab-border-color);\"><b>transformers.models.gpt2.tokenization_gpt2_fast.GPT2TokenizerFast</b><br/>def __call__(text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]=None, text_pair: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]]=None, text_target: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]=None, text_pair_target: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]]=None, add_special_tokens: bool=True, padding: Union[bool, str, 
PaddingStrategy]=False, truncation: Union[bool, str, TruncationStrategy]=None, max_length: Optional[int]=None, stride: int=0, is_split_into_words: bool=False, pad_to_multiple_of: Optional[int]=None, return_tensors: Optional[Union[str, TensorType]]=None, return_token_type_ids: Optional[bool]=None, return_attention_mask: Optional[bool]=None, return_overflowing_tokens: bool=False, return_special_tokens_mask: bool=False, return_offsets_mapping: bool=False, return_length: bool=False, verbose: bool=True, **kwargs) -&gt; BatchEncoding</pre><pre class=\"function-repr-contents function-repr-contents-collapsed\" style=\"\"><a class=\"filepath\" style=\"display:none\" href=\"#\">/usr/local/lib/python3.10/dist-packages/transformers/models/gpt2/tokenization_gpt2_fast.py</a>Construct a &quot;fast&quot; GPT-2 tokenizer (backed by HuggingFace&#x27;s *tokenizers* library). Based on byte-level\n","Byte-Pair-Encoding.\n","\n","This tokenizer has been trained to treat spaces like parts of the tokens (a bit like sentencepiece) so a word will\n","be encoded differently whether it is at the beginning of the sentence (without space) or not:\n","\n","```python\n","&gt;&gt;&gt; from transformers import GPT2TokenizerFast\n","\n","&gt;&gt;&gt; tokenizer = GPT2TokenizerFast.from_pretrained(&quot;openai-community/gpt2&quot;)\n","&gt;&gt;&gt; tokenizer(&quot;Hello world&quot;)[&quot;input_ids&quot;]\n","[15496, 995]\n","\n","&gt;&gt;&gt; tokenizer(&quot; Hello world&quot;)[&quot;input_ids&quot;]\n","[18435, 995]\n","```\n","\n","You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer, but since\n","the model was not pretrained this way, it might yield a decrease in performance.\n","\n","&lt;Tip&gt;\n","\n","When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`.\n","\n","&lt;/Tip&gt;\n","\n","This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. 
Users should\n","refer to this superclass for more information regarding those methods.\n","\n","Args:\n","    vocab_file (`str`, *optional*):\n","        Path to the vocabulary file.\n","    merges_file (`str`, *optional*):\n","        Path to the merges file.\n","    tokenizer_file (`str`, *optional*):\n","        Path to [tokenizers](https://github.com/huggingface/tokenizers) file (generally has a .json extension) that\n","        contains everything needed to load the tokenizer.\n","    unk_token (`str`, *optional*, defaults to `&quot;&lt;|endoftext|&gt;&quot;`):\n","        The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this\n","        token instead.\n","    bos_token (`str`, *optional*, defaults to `&quot;&lt;|endoftext|&gt;&quot;`):\n","        The beginning of sequence token.\n","    eos_token (`str`, *optional*, defaults to `&quot;&lt;|endoftext|&gt;&quot;`):\n","        The end of sequence token.\n","    add_prefix_space (`bool`, *optional*, defaults to `False`):\n","        Whether or not to add an initial space to the input. This allows to treat the leading word just as any\n","        other word. 
(GPT2 tokenizer detect beginning of words by the preceding space).</pre>\n","      <script>\n","      if (google.colab.kernel.accessAllowed && google.colab.files && google.colab.files.view) {\n","        for (const element of document.querySelectorAll('.filepath')) {\n","          element.style.display = 'block'\n","          element.onclick = (event) => {\n","            event.preventDefault();\n","            event.stopPropagation();\n","            google.colab.files.view(element.textContent, 34);\n","          };\n","        }\n","      }\n","      for (const element of document.querySelectorAll('.function-repr-contents')) {\n","        element.onclick = (event) => {\n","          event.preventDefault();\n","          event.stopPropagation();\n","          element.classList.toggle('function-repr-contents-collapsed');\n","        };\n","      }\n","      </script>\n","      </div>"]},"metadata":{},"execution_count":4}],"source":["# Check the class of the tokenizer in the pipeline\n","type(gpt2_pipeline.tokenizer)"]},{"cell_type":"code","execution_count":5,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":186},"id":"i83t1eMy90Nb","outputId":"4ba049a6-ba63-4fd7-86c3-14de162eb02f","executionInfo":{"status":"ok","timestamp":1720650142765,"user_tz":240,"elapsed":13,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["transformers.models.gpt2.modeling_gpt2.GPT2LMHeadModel"],"text/html":["<div style=\"max-width:800px; border: 1px solid var(--colab-border-color);\"><style>\n","      pre.function-repr-contents {\n","        overflow-x: auto;\n","        padding: 8px 12px;\n","        max-height: 500px;\n","      }\n","\n","      pre.function-repr-contents.function-repr-contents-collapsed {\n","        cursor: pointer;\n","        max-height: 100px;\n","      }\n","    </style>\n","    <pre style=\"white-space: initial; background:\n","         
var(--colab-secondary-surface-color); padding: 8px 12px;\n","         border-bottom: 1px solid var(--colab-border-color);\"><b>transformers.models.gpt2.modeling_gpt2.GPT2LMHeadModel</b><br/>def _wrapped_call_impl(*args, **kwargs)</pre><pre class=\"function-repr-contents function-repr-contents-collapsed\" style=\"\"><a class=\"filepath\" style=\"display:none\" href=\"#\">/usr/local/lib/python3.10/dist-packages/transformers/models/gpt2/modeling_gpt2.py</a>The GPT2 Model transformer with a language modeling head on top (linear layer with weights tied to the input\n","embeddings).\n","\n","\n","This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the\n","library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads\n","etc.)\n","\n","This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.\n","Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage\n","and behavior.\n","\n","Parameters:\n","    config ([`GPT2Config`]): Model configuration class with all the parameters of the model.\n","        Initializing with a config file does not load the weights associated with the model, only the\n","        configuration. 
Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.</pre>\n","      <script>\n","      if (google.colab.kernel.accessAllowed && google.colab.files && google.colab.files.view) {\n","        for (const element of document.querySelectorAll('.filepath')) {\n","          element.style.display = 'block'\n","          element.onclick = (event) => {\n","            event.preventDefault();\n","            event.stopPropagation();\n","            google.colab.files.view(element.textContent, 1165);\n","          };\n","        }\n","      }\n","      for (const element of document.querySelectorAll('.function-repr-contents')) {\n","        element.onclick = (event) => {\n","          event.preventDefault();\n","          event.stopPropagation();\n","          element.classList.toggle('function-repr-contents-collapsed');\n","        };\n","      }\n","      </script>\n","      </div>"]},"metadata":{},"execution_count":5}],"source":["# Check the class of the model in the pipeline\n","type(gpt2_pipeline.model)"]},{"cell_type":"markdown","metadata":{"id":"IlimROpL-60i"},"source":["Furthermore, we can check the number of parameters of any Hugging Face model by calling the `num_parameters` method of a model object. 
How many parameters (weights) does our GPT-2 model have?"]},{"cell_type":"code","execution_count":6,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":36},"id":"OiG0-igO-6Pz","outputId":"9a8598d6-1342-4fa8-d359-4ebd42b7bb43","executionInfo":{"status":"ok","timestamp":1720650142765,"user_tz":240,"elapsed":12,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["'124,439,808'"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"string"}},"metadata":{},"execution_count":6}],"source":["# Get the number of model parameters, format nicely with an f-string\n","f\"{gpt2_pipeline.model.num_parameters():,}\""]},{"cell_type":"markdown","metadata":{"id":"HgOr3FNG_TfC"},"source":["Here we can see our GPT-2 model has just over 124 million parameters. The pipeline is a very high level abstraction which encapsulates other classes in Hugging Face (in the case of generative text models, a tokenizer and the model itself)."]},{"cell_type":"markdown","metadata":{"id":"wYyLP27G_gwm"},"source":["### Generating Text\n","In this section, we will generate some text using the GPT-2 model, and also explore the different decoding methods for doing so, and the effect they have on outputs.\n","\n","First, let us generate text from the pipeline using the default behavior. 
To do this, we simply pass in a string of text and no other arguments:"]},{"cell_type":"code","execution_count":7,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"pkMbKvSE_69J","outputId":"d080ebcf-8f96-4156-b24c-ae69a97e453e","executionInfo":{"status":"ok","timestamp":1720650143847,"user_tz":240,"elapsed":1094,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"]},{"output_type":"stream","name":"stdout","text":["[{'generated_text': 'The rain in Spain falls mainly in the plain, with the country receiving up to 400,000 rainfall and the capital Madrid only reporting 2,000. A man runs his hands over a floodwater drain in Madrid.\\n\\nDespite the rainfall, which'}]\n"]}],"source":["my_input_string = \"The rain in Spain falls mainly in the plain\"\n","\n","# Generate output\n","output = gpt2_pipeline(my_input_string)\n","\n","# Display\n","print(output)"]},{"cell_type":"markdown","metadata":{"id":"ecRdP52YAGQH"},"source":["We can see that the model has actually generated a `list` of outputs, each of which is a dictionary. Let's take a look at the first output:"]},{"cell_type":"code","execution_count":8,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"f80Tt9cwANT6","outputId":"c5687a1d-b7c5-4c99-edb2-704f1263c64e","executionInfo":{"status":"ok","timestamp":1720650143847,"user_tz":240,"elapsed":8,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["{'generated_text': 'The rain in Spain falls mainly in the plain, with the country receiving up to 400,000 rainfall and the capital Madrid only reporting 2,000. 
A man runs his hands over a floodwater drain in Madrid.\\n\\nDespite the rainfall, which'}"]},"metadata":{},"execution_count":8}],"source":["output[0]"]},{"cell_type":"markdown","metadata":{"id":"eluR-uhpASA3"},"source":["This is just a dictionary with a single key, `generated_text`, which contains both the input we sent into the model, as well as the tokens the model predicted. We can display the output a little more nicely using the [Markdown](https://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html#IPython.display.Markdown) object from IPython (Jupyter), to render it inline like the rest of the text in our notebook here."]},{"cell_type":"code","execution_count":9,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":106},"id":"klR2eElYAsEF","outputId":"797aa68c-aff4-4bb8-9bfa-abca3cda2948","executionInfo":{"status":"ok","timestamp":1720650143847,"user_tz":240,"elapsed":7,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain, with the country receiving up to 400,000 rainfall and the capital Madrid only reporting 2,000. A man runs his hands over a floodwater drain in Madrid.\n\nDespite the rainfall, which"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}}],"source":["from IPython.display import Markdown\n","\n","display(Markdown(\"---\")) # dividing line\n","display(Markdown(output[0]['generated_text']))\n","display(Markdown(\"---\")) # dividing line"]},{"cell_type":"markdown","source":["\n","While pipelines are fine, the best practice is to instantiate the tokenizer and model separately. 
We initialize a tokenizer and model, and pass the outputs of the tokenizer to the model directly. To do this, we will be leveraging some of the [Auto Classes](https://huggingface.co/docs/transformers/model_doc/auto) in Hugging Face.\n","\n","Since we are doing text generation, *i.e.* [causal language modeling](https://huggingface.co/docs/transformers/tasks/language_modeling), we will be using the `AutoModelForCausalLM` class to create the GPT-2 model, and `AutoTokenizer` to create the tokenizer:"],"metadata":{"id":"iqSra9Ser9dG"}},{"cell_type":"code","execution_count":10,"metadata":{"id":"6YjdHo06GH_1","executionInfo":{"status":"ok","timestamp":1720650144792,"user_tz":240,"elapsed":950,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[],"source":["from transformers import AutoTokenizer, AutoModelForCausalLM\n","\n","# Instantiate the tokenizer\n","tokenizer = AutoTokenizer.from_pretrained(\"gpt2\")\n","\n","# Instantiate the model, using the EOS token as the PAD token to avoid warnings\n","model = AutoModelForCausalLM.from_pretrained(\"gpt2\", pad_token_id=tokenizer.eos_token_id).to(device)\n","\n","# Text input string\n","input_string = \"The rain in Spain falls mainly in the plain\""]},{"cell_type":"markdown","metadata":{"id":"Q-t6vXODTYXr"},"source":["Great, now we have the tokenizer, model, and input string. 
We pass the input string into the tokenizer to get back the token ids, as well as the attention mask for the transformer:"]},{"cell_type":"code","execution_count":11,"metadata":{"id":"OS1bCiCHTXSQ","executionInfo":{"status":"ok","timestamp":1720650144792,"user_tz":240,"elapsed":4,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[],"source":["# Encode the context that generation is conditioned on\n","model_inputs = tokenizer(input_string, return_tensors='pt').to(device)"]},{"cell_type":"code","execution_count":12,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"18GRfjvoTjjl","outputId":"b8960382-6078-4da0-8392-771767306d6f","executionInfo":{"status":"ok","timestamp":1720650144792,"user_tz":240,"elapsed":4,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["{'input_ids': tensor([[ 464, 6290,  287, 8602, 8953, 8384,  287,  262, 8631]],\n","       device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}\n"]}],"source":["# What is the result?\n","print(model_inputs)"]},{"cell_type":"markdown","metadata":{"id":"M6lehyjRTmYs"},"source":["We then pass this to the model method `generate`. Here we use the \"double-star\" syntax, where the dictionary that is passed in is \"unpacked\" by Python, so the function receives `input_ids` and `attention_mask` as separate keyword arguments taken from the associated values. 
Let's take a look at the result:"]},{"cell_type":"code","execution_count":13,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"ZvXdukKpRxNj","outputId":"f8b68476-875a-463b-9284-3da25ac1a82f","executionInfo":{"status":"ok","timestamp":1720650145504,"user_tz":240,"elapsed":715,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n","/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1168: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.\n","  warnings.warn(\n"]},{"output_type":"stream","name":"stdout","text":["tensor([[  464,  6290,   287,  8602,  8953,  8384,   287,   262,  8631,   286,\n","           262, 50206, 12010,    11,   475,   340,   318,   635,   287,   262]],\n","       device='cuda:0')\n"]}],"source":["# generate the output token ids\n","output = model.generate(**model_inputs)\n","\n","print(output)"]},{"cell_type":"markdown","metadata":{"id":"tRWp-4VUUO3V"},"source":["We can see that the result is just a list of integers. These are the token ids that were predicted by the model as the next most likely, based upon the tokenizer vocabulary. 
So we can convert these token ids back into text by passing them through the tokenizer as a final step:"]},{"cell_type":"code","execution_count":14,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":66},"id":"02Xae7mmUPff","outputId":"29daa73b-7430-4a83-92c4-b340e05421e6","executionInfo":{"status":"ok","timestamp":1720650145504,"user_tz":240,"elapsed":4,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain of the Canary Islands, but it is also in the"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}}],"source":["# Decode the tokens back to text using the tokenizer\n","output_string = tokenizer.decode(output[0])\n","\n","# Print the result\n","display(Markdown(\"---\")) # dividing line\n","display(Markdown(output_string))\n","display(Markdown(\"---\")) # dividing line"]},{"cell_type":"markdown","metadata":{"id":"SBAjuXYvUjLd"},"source":["And that's it! The whole text generation process goes like this:\n","1. Instantiate tokenizer and model\n","2. Pass input string to tokenizer to generate token ids and attention mask\n","3. Generate output token ids (predictions) from the model\n","4. 
Decode output token ids back into text using tokenizer\n","\n","We can visualize the whole process with the figure below:\n","\n","<center>\n","<img src=\"https://drive.google.com/uc?export=download&id=1-mYXn0obOPCjWg0wGHjEYlZL9b7wgUXh\" width=\"75%\"/>\n","</center>"]},{"cell_type":"markdown","source":[" Let's now move on to the different parameters we have at our disposal for controlling how a model generates text or, in the language of LLMs, different *decoding strategies*."],"metadata":{"id":"Zj0xMZypsHI3"}},{"cell_type":"markdown","metadata":{"id":"eEmAIq7ky7dj"},"source":["### Text Decoding Strategies\n","\n","As we will see in this section, there is some complexity to creating text outputs with generative language models. Creating new outputs from a given prompt is not as simple as entering the input and getting a predicted output. Generative text models have parameters which control the amount of variability in their outputs. This variability is desirable: it makes the outputs seem more realistic (as if written by a human), and it also increases the likelihood of reaching a novel result that the user finds pleasing and deems to be \"good\".\n","\n","First, we will consider the simplest (vanilla) text generation approaches, both to work our way up gradually and to contrast them with methods which introduce variety and \"creativity\". The two simplest decoding methods for text generation we will consider first are *greedy search* and *beam search*.\n"]},{"cell_type":"markdown","metadata":{"id":"9_RgA715ClwR"},"source":["#### Greedy Search\n","\n","Greedy search is the simplest text generation approach: in this case, no variety is introduced at all. Recall a text generation model takes a sequence of input tokens and its task is to predict the next token given the input. 
For greedy search, the next predicted token is always just that with the highest probability.\n","\n","<center>\n","<img src=\"https://drive.google.com/uc?export=download&id=182wa0OBECLTS218FjA3EZIu3iiQB1jX2\" width=\"75%\"/>\n","</center>\n","<caption><i> Greedy Search. Here, for the next two tokens the words \"plain\" and \"which\" are selected, as they have the highest individual probabilities. </i></caption>\n","\n","Mathematically speaking, given an input sequence of tokens $x_1, x_2, x_3, \\ldots$, the model seeks to produce an output $y_t$ at step $t$. Since generative text models (decoder models) are *autoregressive* and make predictions based upon previous predictions after the initial input, mathematically we can express the prediction task as:\n","\n","$ P(y_t|y_1, y_2, ..., y_{t-1},x)$\n","\n","Greedy search just takes the highest probability token for each prediction. Thus, for a vocabulary $V$ and the probabilities calculated by the model, this is expressed mathematically as:\n","\n","$y_t = \\arg\\max_{y \\in V}P(y|y_1,y_2,...,y_{t-1},x)$\n","\n","Let's take a look at this with GPT-2. To do this, we will play around with the [parameters](https://huggingface.co/docs/transformers/generation_strategies#customize-text-generation) we can pass to the call to `.generate` on our model in Hugging Face.\n","\n","This will be the same as what we covered in the section above, as the default behavior is to use greedy search:"]},{"cell_type":"code","source":["from transformers import AutoTokenizer, AutoModelForCausalLM\n","\n","# Instantiate the tokenizer\n","tokenizer = AutoTokenizer.from_pretrained(\"gpt2\")\n","\n","# add the EOS token as PAD token to avoid warnings\n","model = AutoModelForCausalLM.from_pretrained(\"gpt2\", pad_token_id=tokenizer.eos_token_id).to(device)\n","\n","# Text input string\n","input_string = \"The rain in Spain falls mainly in the plain\"\n","\n","# encode context the generation is conditioned on\n","model_inputs = tokenizer(input_string, 
return_tensors='pt').to(device)\n","\n","# Do greedy generation to generate the output token ids\n","greedy_output = model.generate(**model_inputs)\n","\n","print(greedy_output)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"AzaqSegbs_aK","executionInfo":{"status":"ok","timestamp":1720650146832,"user_tz":240,"elapsed":1331,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}},"outputId":"275ad088-8fd2-4041-9414-35c0209c5d41"},"execution_count":15,"outputs":[{"output_type":"stream","name":"stderr","text":["Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"]},{"output_type":"stream","name":"stdout","text":["tensor([[  464,  6290,   287,  8602,  8953,  8384,   287,   262,  8631,   286,\n","           262, 50206, 12010,    11,   475,   340,   318,   635,   287,   262]],\n","       device='cuda:0')\n"]}]},{"cell_type":"markdown","metadata":{"id":"YEgQirNNbTTL"},"source":["It should be noted that with greedy search, we will always be picking the most likely output tokens, and so the final result will be completely deterministic and the same each time. 
We can see this with the behavior of the model below by generating the same output over and over:"]},{"cell_type":"code","execution_count":16,"metadata":{"id":"5vM1XzhkbQCJ","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1720650147463,"user_tz":240,"elapsed":634,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}},"outputId":"e9c2aa42-c0a3-49ed-e3f7-3ed955242025"},"outputs":[{"output_type":"stream","name":"stderr","text":["Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"]}],"source":["# Initial generation\n","greedy_output = model.generate(**model_inputs)\n","output_string = tokenizer.decode(greedy_output[0])"]},{"cell_type":"code","execution_count":17,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"LO2cKTW4bunN","outputId":"caeef6bc-bb43-4e18-e09f-2488b3b8cf87","executionInfo":{"status":"ok","timestamp":1720650147463,"user_tz":240,"elapsed":7,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["The rain in Spain falls mainly in the plain of the Canary Islands, but it is also in the\n"]}],"source":["# Output\n","print(output_string)"]},{"cell_type":"code","execution_count":18,"metadata":{"id":"EyUTG0NmbyHs","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1720650147463,"user_tz":240,"elapsed":6,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}},"outputId":"5ea4c4ae-6ba8-4864-8c3b-ff17042ea62f"},"outputs":[{"output_type":"stream","name":"stderr","text":["Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"]}],"source":["# Second generation\n","greedy_output2 = model.generate(**model_inputs)\n","output_string2 = 
tokenizer.decode(greedy_output2[0])"]},{"cell_type":"code","execution_count":19,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"fZXq6De8b5IZ","outputId":"0ad9b67b-0688-4f03-ae1f-c2a6f5a303df","executionInfo":{"status":"ok","timestamp":1720650147463,"user_tz":240,"elapsed":5,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["The rain in Spain falls mainly in the plain of the Canary Islands, but it is also in the\n"]}],"source":["# Output\n","print(output_string2)"]},{"cell_type":"markdown","metadata":{"id":"s25JZeoCcBDO"},"source":["We can see that, for a given input, greedy search always produces the same output from the model. Let us now explore other approaches to generating the series of output tokens."]},{"cell_type":"markdown","metadata":{"id":"yEDuh9ysGIoy"},"source":["#### Beam Search\n","\n","Beam search is an improvement on greedy search which considers the most likely sequence of tokens *together*, based on their respective probabilities, as opposed to just taking the most probable individual token at each timestep.\n","\n","A *beam width* is specified, and at each step only that many of the highest-probability candidate sequences (the *beams*) are kept and extended. At the end, the sequence with the highest collective probability is selected, as opposed to just selecting the individual token with the highest probability at each step, as with greedy search.\n","\n","<center>\n","<img src=\"https://drive.google.com/uc?export=download&id=1875NvbUveU2ya3IdMpQfKfa8k5D9h5Tu\" width=\"75%\"/>\n","</center>\n","<caption><i> Beam Search. Here, for the next two tokens the words \"meadow\" and \"grasses\" are selected, as their joint probability of 0.36 (0.4 x 0.9) is greater than that of the tokens selected in greedy search, which is 0.33 (0.6 x 0.55). 
</i></caption>\n","\n","One point to note about beam search is that keeping a larger number of candidate sequences (*i.e.* increasing the beam width) can improve the quality of outputs, at the cost of increased computation.\n","\n","There is a \"law of diminishing returns\" with beam search: typically there is a saturation point beyond which increasing the beam size does not significantly change the most likely generated sequence, as the probabilities are dominated by the product of the most frequently occurring tokens in the sequence considered by beam search.\n","\n","Generally speaking, beam search can lead to repetitive outputs for open-ended generation. This is why sampling is often used alongside, or instead of, greedy and beam search."]},{"cell_type":"markdown","metadata":{"id":"Jb1xyHrOumWy"},"source":["To generate text with beam search in Hugging Face, we set the `num_beams` parameter to a value greater than 1 (a value of 1 would be equivalent to greedy search) and `early_stopping=True`, so generation finishes when all beams pass back an \"end of sequence\" (EOS) token.\n","\n","We have already created our tokenizer and model, so this can just be done in the call to `model.generate()`:"]},{"cell_type":"code","execution_count":20,"metadata":{"id":"gi98BPxVulNE","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1720650147802,"user_tz":240,"elapsed":342,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}},"outputId":"51048a3f-1bb3-4f50-ca2c-2d3df65316f3"},"outputs":[{"output_type":"stream","name":"stderr","text":["Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"]}],"source":["# Text input string\n","input_string = \"The rain in Spain falls mainly in the plain\"\n","\n","# Model input\n","model_inputs = tokenizer(input_string, return_tensors='pt').to(device)\n","\n","# Generate output with beam search\n","greedy_output = model.generate(**model_inputs, num_beams=10, 
early_stopping=True)\n","\n","# Decode the output\n","output_string = tokenizer.decode(greedy_output[0])"]},{"cell_type":"code","execution_count":21,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":46},"id":"1NKEP0qvvaUi","outputId":"1c70fb76-7a65-41e6-9a21-91046fc39200","executionInfo":{"status":"ok","timestamp":1720650147803,"user_tz":240,"elapsed":5,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain to the south of the city of Barcelona.\n\n"},"metadata":{}}],"source":["display(Markdown(output_string))"]},{"cell_type":"markdown","metadata":{"id":"gPHE6_KkvkFv"},"source":["We can see that beam search has returned quite a different result from that of greedy search, by looking over the collective probabilities of a number of predicted token possibilities, instead of just each following token."]},{"cell_type":"markdown","metadata":{"id":"iI2-0NaV53dx"},"source":["#### Sampling Strategies\n","\n","While the different search decoding strategies can produce different outputs from one another, each is still deterministic, and this can lead to either a.) poor outputs or b.) repeated identical outputs, the latter of which is not a desirable trait for end users.\n","\n","As such, there also exist different *sampling strategies* for introducing variability and novelty into the outputs of generative text models. The three main parameters available for different sampling strategies are *temperature*, *top-p,* and *top-k*.\n"]},{"cell_type":"markdown","metadata":{"id":"YJK4wlqFwrDA"},"source":["##### Temperature\n","\n","The temperature is a factor which rescales (sharpens or \"smooths out\") the output probabilities of predicted tokens. 
In practice, it is used to control the variability (or randomness, or \"creativity\") of the outputs of a model.\n","\n","Mathematically speaking, calculating the model probability for predicting any individual token as the next one, such that all probabilities lie between zero and one and sum to one, is attained using the softmax function:\n","\n","$ P(y_i) = \\frac{e^{z_i}}{\\sum_{j=1}^{N}e^{z_j}} $\n","\n","where:\n","- $P(y_i)$ is the probability of selecting the $i$th token.\n","- $z_i$ is the logit, the raw score or output, from the model for token $i$\n","- and $N$ is the total size of the vocabulary\n","\n","We introduce a new variable $\\tau$ for temperature and update the probability formula as below:\n","\n","$P(y_i) = \\frac{e^{z_i / \\tau}}{\\sum_{j=1}^{N}e^{z_j / \\tau}}$\n","\n","Given the above, if $\\tau = 1$, the formula for the probabilities, and thus the behavior of the model, is unchanged. It can be shown that as $\\tau \\to \\infty$, $P(y_i) \\to 1/N$ for all $i$ (since every exponent $z_i / \\tau$ tends to 0), and so every token becomes equally likely to be predicted. 
This results in a completely uniform distribution of probabilities across all possible tokens.\n","\n","On the other hand, as $\\tau \\to 0$, the probability for any given token can be represented by:\n","\n","$$\n","P(y_i)=\\begin{cases}\n","    1 & \\text{if $i$ is the most probable token}\\\\\n","    0 & \\text{otherwise}\n","  \\end{cases}\n","$$\n","\n","That is to say, the most likely token will have a probability of 1, and all others will have their probabilities set to 0, and the output of the model will be completely deterministic.\n","\n","To put it another way, setting a low value for temperature (close to 0) means that the most likely next tokens will always be returned, whereas setting higher values for temperature flattens the probabilities across the different possible tokens, resulting in increasingly random outputs for greater values of $\\tau$.\n","\n","This is visualized in the figure below:\n","\n","<center>\n","<img src=\"https://drive.google.com/uc?export=download&id=1EEM8re3w97gFONaeRf6Sp76TVmtzoZaC\" width=\"80%\"/>\n","</center>\n","<center><caption> Visualizing the effect of changing temperature on next token probabilities </caption></center>\n","\n","There is a balance to be struck, as too low a temperature will result in a model always returning the same output for a given input - that is, acting deterministically - whereas setting the temperature too high can result in garbled and incoherent outputs."]},{"cell_type":"markdown","metadata":{"id":"YwuaLcbX_UB-"},"source":["Now let's try experimenting with changing the temperature parameter for text generation using GPT-2. In Hugging Face, this is controlled by the `temperature` parameter, either in calls to a model pipeline or directly in the text generation call to `model.generate()`. 
We must also set the `do_sample=True` argument, to tell Hugging Face to use sampling and not to do greedy search.\n","\n","First, let's set a temperature very close to 0 (exactly 0 is undefined, since the logits are divided by $\\tau$), which will result in the most likely token always being chosen. Note that this is effectively equivalent to greedy search:"]},{"cell_type":"code","execution_count":22,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":130},"id":"AiAA5zi-_p4q","outputId":"6b276416-d73c-4066-8227-42e3775e7b5f","executionInfo":{"status":"ok","timestamp":1720650148496,"user_tz":240,"elapsed":697,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"]},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain of the Canary Islands, but it is also in the"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain of the Canary Islands, but it is also in the"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain of the Canary Islands, but it is also in the"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}}],"source":["# Text input string\n","input_string = \"The rain in Spain falls mainly in the plain\"\n","\n","# Generation: temperature ~= 0, effectively deterministic\n","model_inputs = tokenizer(input_string, return_tensors='pt').to(device)\n","zero_temp_output = model.generate(**model_inputs, temperature=0.00001, 
do_sample=True, num_return_sequences=3)\n","\n","# Iterate over outputs and display in markdown\n","display(Markdown(\"---\"))\n","\n","for output in zero_temp_output:\n","  output_string = tokenizer.decode(output)\n","  display(Markdown(output_string))\n","\n","display(Markdown(\"---\"))"]},{"cell_type":"markdown","metadata":{"id":"BqmxxbfIGiHk"},"source":["We see that the same output is returned as before, and we can run the above cell multiple times and always get back the same output. Now let's set the temperature to 1, which will leave the next token probabilities unchanged. In this case, we should be able to get different outputs:"]},{"cell_type":"code","execution_count":23,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":153},"id":"GA7GMbtpAUx8","outputId":"8a8980d3-bd98-4b18-d601-574ad063df47","executionInfo":{"status":"ok","timestamp":1720650148496,"user_tz":240,"elapsed":9,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"]},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain, as is expected in Europe.\n\nSyd"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain and the snow in the central and southern reaches. 
The"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain of El Fuego and is spread all over and"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}}],"source":["# Text input string\n","input_string = \"The rain in Spain falls mainly in the plain\"\n","\n","# Generation: temperature = 1, default behavior\n","model_inputs = tokenizer(input_string, return_tensors='pt').to(device)\n","temp1_output = model.generate(**model_inputs, temperature=1, do_sample=True, num_return_sequences=3)\n","\n","# Iterate over outputs and display in markdown\n","display(Markdown(\"---\"))\n","\n","for output in temp1_output:\n","  output_string = tokenizer.decode(output)\n","  display(Markdown(output_string))\n","\n","display(Markdown(\"---\"))"]},{"cell_type":"markdown","metadata":{"id":"FepkJD38AV_I"},"source":["Cool, those all seem like reasonable outputs, even though they are all different. We have introduced some variability into the model outputs which makes for novelty.\n","\n","Finally, let's really crank up the temperature! 
This will make all output tokens equally likely, resulting in very \"creative\" outputs:"]},{"cell_type":"code","execution_count":24,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":130},"id":"f8Xp8KRIAdan","outputId":"3022a179-1663-4619-f020-f7dd03bbcd57","executionInfo":{"status":"ok","timestamp":1720650148738,"user_tz":240,"elapsed":250,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"]},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain. Since Spain won WWII almost twice our nation got under"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plainlands outside Turreón with the rainfall often at its"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain (including the valleys from Isro & Galvesteria"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}}],"source":["# Text input string\n","input_string = \"The rain in Spain falls mainly in the plain\"\n","\n","# Generation: temperature = 1B, all tokens equally likely\n","model_inputs = tokenizer(input_string, return_tensors='pt').to(device)\n","high_temp_output = model.generate(**model_inputs, temperature=1.0e9, do_sample=True, num_return_sequences=3)\n","\n","# Iterate over outputs and display in markdown\n","display(Markdown(\"---\"))\n","\n","for output in high_temp_output:\n","  
output_string = tokenizer.decode(output)\n","  display(Markdown(output_string))\n","\n","display(Markdown(\"---\"))"]},{"cell_type":"markdown","metadata":{"id":"NwTTbc5WH7yc"},"source":["As we can see above, setting a high value for temperature results in more \"creative\" outputs but some of these are less coherent than those with lower temperature.\n","\n","Now let us consider further sampling strategies for introducing variability in model outputs whilst attempting to maintain the quality thereof."]},{"cell_type":"markdown","metadata":{"id":"iNbJNYEDz7Vp"},"source":["##### Top-p & Top-k sampling\n","\n","Unlike temperature, which rescales the calculated probabilities of the next token, *top-p* and *top-k* instead function by reducing the size of the set of possible tokens to choose from. Though they differ in how they are applied, they both restrict the set of possible next tokens to only the most likely ones, as determined by a specified cutoff, and then redistribute the probability mass amongst this smaller set. They are typically used in conjunction with temperature to produce varied but still comprehensible outputs."]},{"cell_type":"markdown","metadata":{"id":"IbO9-lrAn_UJ"},"source":["In *top-k* sampling, instead of calculating probabilities and sampling from all possible tokens, a cutoff integer value $k$ is specified, and only the top $k$ ranked tokens are used as the set of possible next tokens. The total probability (summing to 1) is redistributed amongst these top $k$ tokens.\n","\n","This is illustrated in the figure below. 
Instead of choosing from all possible next words, only the top 5 words would be considered, and the probabilities would be redistributed amongst them:\n","\n","<center>\n","<img src=\"https://drive.google.com/uc?export=download&id=1FGD7yGPCBrHixsUyMX2U7nQZRMCbAD26\" width=\"50%\"/></center>\n","<center><caption> Top-k sampling: only the most probable tokens above and including rank $k$ are kept </caption></center>"]},{"cell_type":"markdown","metadata":{"id":"pnJgAvpJoApB"},"source":["*Top-p*, or *nucleus sampling*, differs in that instead of specifying a rank $k$ and taking the most probable tokens at this rank or above, a probability threshold $p$ is specified, and only the smallest set of top tokens whose combined probability reaches this threshold is kept as the set of next possible tokens. This differs from top-k in that we don't specify the size of the set of next tokens, only the total probability.\n","\n","Coming back to our previous example, here using top-p, we wish only to keep tokens which have a combined probability equal to or above a threshold of 0.8. In this case the top four most likely next tokens meet this criterion (as $0.5 + 0.15 + 0.1 + 0.05 = 0.8$) so the total probability would be redistributed only amongst them:\n","\n","<center>\n","<img src=\"https://drive.google.com/uc?export=download&id=1FHlQL-kkIK_7-kO4sM-tmIyZxn8jdkLh\" width=\"50%\"/></center>\n","<center><caption> Top-p sampling: only the tokens with cumulative probability above the specified threshold $p$ are kept </caption></center>"]},{"cell_type":"markdown","metadata":{"id":"8JzBA8_RpFod"},"source":["In Hugging Face, top-k and top-p sampling can be used by specifying the arguments `top_k` and `top_p` respectively. `top_k` is an integer value, and `top_p` a floating point value between 0 and 1.\n","\n","Note that both of these only take effect when sampling is enabled with `do_sample=True`; without it, generation remains deterministic (greedy or beam search). They can also be combined with temperature. 
These allow returning multiple outputs with `num_return_sequences` as we've seen before:"]},{"cell_type":"code","execution_count":25,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":282},"id":"7OsY0Fxpp3Jd","outputId":"e8f55a1c-2bbe-4faa-ed26-199bfd7ba335","executionInfo":{"status":"ok","timestamp":1720650150033,"user_tz":240,"elapsed":1299,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n","Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"]},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Top-k, $k=30$:"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain, which has only just become a few days old."},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain near Barcelona. 
But the air quality here has suffered,"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain of Pernigena, but the area still has"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Top p, $p=0.5$:"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain of Barcelona, and is a very important part of the"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain of the Canary Islands, but the country is also known"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain of Valencia, but the rains also hit other parts of"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}}],"source":["# Text input string\n","input_string = \"The rain in Spain falls mainly in the plain\"\n","model_inputs = tokenizer(input_string, return_tensors='pt').to(device)\n","\n","# Generation - Top-k & Top-p\n","top_k_output = model.generate(**model_inputs, top_k=30, do_sample=True, num_return_sequences=3)\n","top_p_output = model.generate(**model_inputs, top_p=0.5, do_sample=True, num_return_sequences=3)\n","\n","# Top K\n","display(Markdown(\"---\"))\n","display(Markdown(\"Top-k, $k=30$:\"))\n","for output in top_k_output:\n","  output_string = tokenizer.decode(output)\n","  display(Markdown(output_string))\n","\n","# Low Top 
P\n","display(Markdown(\"---\"))\n","display(Markdown(\"Top p, $p=0.5$:\"))\n","for output in top_p_output:\n","  output_string = tokenizer.decode(output)\n","  display(Markdown(output_string))\n","display(Markdown(\"---\"))"]},{"cell_type":"markdown","metadata":{"id":"LiHUoVf5p16Q"},"source":["Top-p and top-k can be used in conjunction, to avoid very low-ranked words while still allowing for variability. In practice, this requires a fair bit of trial and error to find good values for $k$ and/or $p$, combined with temperature."]},{"cell_type":"code","execution_count":26,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":130},"id":"wYP6Io4AukbE","outputId":"3084cbfa-2ebb-4312-b034-5a4e7c9b2e77","executionInfo":{"status":"ok","timestamp":1720650150439,"user_tz":240,"elapsed":410,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"]},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain of La Liguori, but is also scattered"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain of the northern city of Alicante and is a common"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain falls mainly in the plain of Catalonia and has become so intense that a group of"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}}],"source":["# Putting it all together\n","outputs = 
model.generate(\n","    **model_inputs,\n","    do_sample=True,\n","    top_k=30,\n","    top_p=0.5,\n","    temperature=1.5,\n","    num_return_sequences=3,\n",")\n","\n","display(Markdown(\"---\"))\n","for output in outputs:\n","  output_string = tokenizer.decode(output)\n","  display(Markdown(output_string))\n","display(Markdown(\"---\"))"]},{"cell_type":"markdown","metadata":{"id":"HNOwYVPNS4pB"},"source":["Great! Now let's move on to the next section of the notebook, on fine-tuning models. We should restart the kernel of the notebook here to clear the RAM, then re-import the modules we need:"]},{"cell_type":"code","execution_count":27,"metadata":{"id":"RMsmhdVlTA7P","executionInfo":{"status":"ok","timestamp":1720650150439,"user_tz":240,"elapsed":19,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[],"source":["import torch\n","from IPython.display import Markdown\n","from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline\n","\n","# Check if GPU is available\n","device = \"cuda\" if torch.cuda.is_available() else \"cpu\""]},{"cell_type":"markdown","metadata":{"id":"5y4uwRzara2E"},"source":["## Fine-tuning Large Language Models\n","\n","Now that we have covered some of the fundamentals of working with pre-trained large language models for generative text, we will progress to the more advanced task of adapting pre-trained models to specific tasks or datasets to change their behavior.\n","\n","Adapting a pre-trained LLM for a new problem or specific dataset by updating its parameters through further training is referred to as [fine-tuning](https://en.wikipedia.org/wiki/Fine-tuning_(deep_learning)). Though the nomenclature has evolved and the terms are now sometimes used interchangeably, this technique is a type of [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning). 
Other methods for transfer learning do exist, and so the term \"fine-tuning\" should be used more specifically to refer to cases in which model weights are updated (or new model weights are added and optimized) against a new dataset or objective.\n","\n","In practice, the term *fine-tuning* is now more commonly used with respect to LLMs, while *transfer learning* refers to the larger group of approaches, usually within machine learning and deep learning more broadly, outside of language models.\n","\n","One of the most common applications for fine-tuning is to take a well-performing pre-trained language model (foundation model) as a base and adapt this to a new classification task. This is commonly done with the [BERT model](https://en.wikipedia.org/wiki/BERT_(language_model)) as a base, which has already learned powerful and meaningful representations of language, adapting it to other NLP tasks, for example, by attaching a classification \"head\", or even by having another simple classifier model take the outputs of the BERT model as inputs for a classification task.\n","\n","<center>\n","<img src=\"https://drive.google.com/uc?export=download&id=1FI4_Pq_MVTdjMDpMIB76qnbaBZ63AoEF\" width=\"75%\"/>\n","</center>\n","<center><caption> Adapting BERT to a new NLP task via fine-tuning </caption></center>\n","\n","There is an example of the above, for full fine-tuning of BERT (no frozen layers), in the [Hugging Face documentation](https://huggingface.co/docs/transformers/training#fine-tune-a-pretrained-model).\n","\n","In the remainder of this workshop, we will focus on fully fine-tuning the GPT-2 model we've been working with already as a \"hello world\" example, and see if we can change the behavior (outputs) of this generative text model with fine-tuning."]},{"cell_type":"markdown","metadata":{"id":"IOVGEeC8wEUV"},"source":["### Fine-tuning a model in Hugging Face using the Trainer API\n","\n","In this section, we will write our own code to train a model 
using the Hugging Face library directly. Essentially, HF acts as a higher-level API around PyTorch, handling all of the nitty-gritty lower-level details of training for us.\n","\n","Here, we will work with the [Yoda dataset](https://github.com/nlpfromscratch/datasets/tree/master/yoda) to teach a GPT model to speak like our favourite Jedi master. In this case, we will work directly with Hugging Face's `Trainer` class to fine-tune our model.\n","\n","<center>\n","<img src=\"https://drive.google.com/uc?export=download&id=1Gz5L_JNPApconeL-9IgBciEseRRKs6t-\" width=\"150px\"/>\n","</center>\n","\n","Now, we can load our base model that we wish to fine-tune. Doing this type of work is computationally demanding, so again, please make sure you are using a GPU runtime if you are running this notebook in Colab, or have sufficient computing resources (*i.e.* a GPU) if you are choosing to run the notebook locally."]},{"cell_type":"markdown","metadata":{"id":"JS8XCECYwEUW"},"source":["#### Loading the Data\n","\n","As with all machine learning, we first need data. The Yoda dataset is conveniently stored in the NLP from scratch [datasets repo](https://github.com/nlpfromscratch/datasets) on GitHub, which we already pulled down at the beginning of the notebook using `git`.\n","\n","Let's take a look at the data:"]},{"cell_type":"code","execution_count":28,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"D4Jadlo_sjEh","outputId":"748f7002-71d7-4813-f80b-05ddece3e31e","executionInfo":{"status":"ok","timestamp":1720650150439,"user_tz":240,"elapsed":18,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["README.md  yoda.csv\n"]}],"source":["# Take a look at the yoda data\n","!ls datasets/yoda/"]},{"cell_type":"markdown","metadata":{"id":"xRCm3ildyYty"},"source":["We can see there is one data file, `yoda.csv`. This contains all the lines spoken by Yoda in the Star Wars films. 
Let's take a look at some of the data we'll be using for fine-tuning a generative text model:"]},{"cell_type":"code","execution_count":29,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":206},"id":"kXCRPmjwyj58","outputId":"5a429e6b-73ac-43af-e206-ebced91d132c","executionInfo":{"status":"ok","timestamp":1720650150439,"user_tz":240,"elapsed":13,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["                                                text\n","0  The very Republic is threatened, if involved t...\n","1  Hard to see, the dark side is. Discover who th...\n","2  With this Naboo queen you must stay, Qui-Gon. ...\n","3                         May the Force be with you.\n","4      (Cont'd) Master Qui-Gon more to say have you?"],"text/html":["\n","  <div id=\"df-1c4a8465-04a2-4bba-bedd-27c4ff407f09\" class=\"colab-df-container\">\n","    <div>\n","<style scoped>\n","    .dataframe tbody tr th:only-of-type {\n","        vertical-align: middle;\n","    }\n","\n","    .dataframe tbody tr th {\n","        vertical-align: top;\n","    }\n","\n","    .dataframe thead th {\n","        text-align: right;\n","    }\n","</style>\n","<table border=\"1\" class=\"dataframe\">\n","  <thead>\n","    <tr style=\"text-align: right;\">\n","      <th></th>\n","      <th>text</th>\n","    </tr>\n","  </thead>\n","  <tbody>\n","    <tr>\n","      <th>0</th>\n","      <td>The very Republic is threatened, if involved t...</td>\n","    </tr>\n","    <tr>\n","      <th>1</th>\n","      <td>Hard to see, the dark side is. Discover who th...</td>\n","    </tr>\n","    <tr>\n","      <th>2</th>\n","      <td>With this Naboo queen you must stay, Qui-Gon. 
...</td>\n","    </tr>\n","    <tr>\n","      <th>3</th>\n","      <td>May the Force be with you.</td>\n","    </tr>\n","    <tr>\n","      <th>4</th>\n","      <td>(Cont'd) Master Qui-Gon more to say have you?</td>\n","    </tr>\n","  </tbody>\n","</table>\n","</div>\n","    <div class=\"colab-df-buttons\">\n","\n","  <div class=\"colab-df-container\">\n","    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-1c4a8465-04a2-4bba-bedd-27c4ff407f09')\"\n","            title=\"Convert this dataframe to an interactive table.\"\n","            style=\"display:none;\">\n","\n","  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n","    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n","  </svg>\n","    </button>\n","\n","  <style>\n","    .colab-df-container {\n","      display:flex;\n","      gap: 12px;\n","    }\n","\n","    .colab-df-convert {\n","      background-color: #E8F0FE;\n","      border: none;\n","      border-radius: 50%;\n","      cursor: pointer;\n","      display: none;\n","      fill: #1967D2;\n","      height: 32px;\n","      padding: 0 0 0 0;\n","      width: 32px;\n","    }\n","\n","    .colab-df-convert:hover {\n","      background-color: #E2EBFA;\n","      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n","      fill: #174EA6;\n","    }\n","\n","    .colab-df-buttons div {\n","      margin-bottom: 4px;\n","    }\n","\n","    [theme=dark] .colab-df-convert {\n","      background-color: #3B4455;\n","      fill: #D2E3FC;\n","    }\n","\n","    [theme=dark] .colab-df-convert:hover {\n","      background-color: #434B5C;\n","      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n","      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n","      fill: #FFFFFF;\n","    }\n","  
</style>\n","\n","    <script>\n","      const buttonEl =\n","        document.querySelector('#df-1c4a8465-04a2-4bba-bedd-27c4ff407f09 button.colab-df-convert');\n","      buttonEl.style.display =\n","        google.colab.kernel.accessAllowed ? 'block' : 'none';\n","\n","      async function convertToInteractive(key) {\n","        const element = document.querySelector('#df-1c4a8465-04a2-4bba-bedd-27c4ff407f09');\n","        const dataTable =\n","          await google.colab.kernel.invokeFunction('convertToInteractive',\n","                                                    [key], {});\n","        if (!dataTable) return;\n","\n","        const docLinkHtml = 'Like what you see? Visit the ' +\n","          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n","          + ' to learn more about interactive tables.';\n","        element.innerHTML = '';\n","        dataTable['output_type'] = 'display_data';\n","        await google.colab.output.renderOutput(dataTable, element);\n","        const docLink = document.createElement('div');\n","        docLink.innerHTML = docLinkHtml;\n","        element.appendChild(docLink);\n","      }\n","    </script>\n","  </div>\n","\n","\n","<div id=\"df-4c4bdece-3a22-4a83-936d-a26a6de92aa2\">\n","  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-4c4bdece-3a22-4a83-936d-a26a6de92aa2')\"\n","            title=\"Suggest charts\"\n","            style=\"display:none;\">\n","\n","<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n","     width=\"24px\">\n","    <g>\n","        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n","    </g>\n","</svg>\n","  </button>\n","\n","<style>\n","  .colab-df-quickchart {\n","      --bg-color: #E8F0FE;\n","      --fill-color: #1967D2;\n","      --hover-bg-color: #E2EBFA;\n","      --hover-fill-color: 
#174EA6;\n","      --disabled-fill-color: #AAA;\n","      --disabled-bg-color: #DDD;\n","  }\n","\n","  [theme=dark] .colab-df-quickchart {\n","      --bg-color: #3B4455;\n","      --fill-color: #D2E3FC;\n","      --hover-bg-color: #434B5C;\n","      --hover-fill-color: #FFFFFF;\n","      --disabled-bg-color: #3B4455;\n","      --disabled-fill-color: #666;\n","  }\n","\n","  .colab-df-quickchart {\n","    background-color: var(--bg-color);\n","    border: none;\n","    border-radius: 50%;\n","    cursor: pointer;\n","    display: none;\n","    fill: var(--fill-color);\n","    height: 32px;\n","    padding: 0;\n","    width: 32px;\n","  }\n","\n","  .colab-df-quickchart:hover {\n","    background-color: var(--hover-bg-color);\n","    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n","    fill: var(--button-hover-fill-color);\n","  }\n","\n","  .colab-df-quickchart-complete:disabled,\n","  .colab-df-quickchart-complete:disabled:hover {\n","    background-color: var(--disabled-bg-color);\n","    fill: var(--disabled-fill-color);\n","    box-shadow: none;\n","  }\n","\n","  .colab-df-spinner {\n","    border: 2px solid var(--fill-color);\n","    border-color: transparent;\n","    border-bottom-color: var(--fill-color);\n","    animation:\n","      spin 1s steps(1) infinite;\n","  }\n","\n","  @keyframes spin {\n","    0% {\n","      border-color: transparent;\n","      border-bottom-color: var(--fill-color);\n","      border-left-color: var(--fill-color);\n","    }\n","    20% {\n","      border-color: transparent;\n","      border-left-color: var(--fill-color);\n","      border-top-color: var(--fill-color);\n","    }\n","    30% {\n","      border-color: transparent;\n","      border-left-color: var(--fill-color);\n","      border-top-color: var(--fill-color);\n","      border-right-color: var(--fill-color);\n","    }\n","    40% {\n","      border-color: transparent;\n","      border-right-color: var(--fill-color);\n","      
border-top-color: var(--fill-color);\n","    }\n","    60% {\n","      border-color: transparent;\n","      border-right-color: var(--fill-color);\n","    }\n","    80% {\n","      border-color: transparent;\n","      border-right-color: var(--fill-color);\n","      border-bottom-color: var(--fill-color);\n","    }\n","    90% {\n","      border-color: transparent;\n","      border-bottom-color: var(--fill-color);\n","    }\n","  }\n","</style>\n","\n","  <script>\n","    async function quickchart(key) {\n","      const quickchartButtonEl =\n","        document.querySelector('#' + key + ' button');\n","      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n","      quickchartButtonEl.classList.add('colab-df-spinner');\n","      try {\n","        const charts = await google.colab.kernel.invokeFunction(\n","            'suggestCharts', [key], {});\n","      } catch (error) {\n","        console.error('Error during call to suggestCharts:', error);\n","      }\n","      quickchartButtonEl.classList.remove('colab-df-spinner');\n","      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n","    }\n","    (() => {\n","      let quickchartButtonEl =\n","        document.querySelector('#df-4c4bdece-3a22-4a83-936d-a26a6de92aa2 button');\n","      quickchartButtonEl.style.display =\n","        google.colab.kernel.accessAllowed ? 'block' : 'none';\n","    })();\n","  </script>\n","</div>\n","\n","    </div>\n","  </div>\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"yoda_df","summary":"{\n  \"name\": \"yoda_df\",\n  \"rows\": 103,\n  \"fields\": [\n    {\n      \"column\": \"text\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 102,\n        \"samples\": [\n          \" . . . 
close to you?\",\n          \" Surprised?\",\n          \" To fight this Lord Sidious, strong enough, you are not.\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"}},"metadata":{},"execution_count":29}],"source":["import pandas as pd\n","\n","# Read\n","yoda_df = pd.read_csv('datasets/yoda/yoda.csv')\n","\n","# Show\n","yoda_df.head()"]},{"cell_type":"code","execution_count":30,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"_WwqdKPIy6e3","outputId":"a0efa43f-7a3b-4d77-faea-f10d326086ec","executionInfo":{"status":"ok","timestamp":1720650150440,"user_tz":240,"elapsed":12,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["(103, 1)"]},"metadata":{},"execution_count":30}],"source":["yoda_df.shape"]},{"cell_type":"markdown","metadata":{"id":"9OAeSk5Oyuik"},"source":["In total there are 103 lines of dialog we'll be using for fine-tuning our model. As a general rule, pre-training LLMs requires very large datasets and is highly computationally expensive, whereas fine-tuning them requires much smaller datasets and is far less computationally expensive."]},{"cell_type":"markdown","metadata":{"id":"76fnNWalwEUW"},"source":["Now we will import the `load_dataset` function from the `datasets` library, to [load the CSV file](https://huggingface.co/docs/datasets/loading) into a format that Hugging Face expects. We pass a dictionary with a single key, `train`, whose value is the filename, here `yoda.csv`. 
We also pass a string of the `dataset_name`, which is the path (directory) where the data reside:"]},{"cell_type":"code","execution_count":31,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":84,"referenced_widgets":["b51a121daa394443970f334d6671688b","57db97b938a04966b48d77c509ffb525","173b43eb9bc74962a52821c5f5ce441c","69278d8631e8487c9158082207cdacac","bdf0a8ee2171422389baa5eecedba86b","daf3d480df264ff38d3d5cd1217adb2e","4e34a3dd228f4c22b5b9c1294c4f455c","1e81f31e3df946bc85b2a187453e5c0b","4ade9b5e523249abb7ec0708582e06fa","3b73922497ac477c94be4231a3ddc0aa","6adfd93c30fc44d384282c9f9e12dbc0"]},"id":"gYvrIrQIsNw3","outputId":"bb272676-f1bb-4d9d-e872-d391434db2d7","executionInfo":{"status":"ok","timestamp":1720650151077,"user_tz":240,"elapsed":645,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["Repo card metadata block was not found. Setting CardData to empty.\n","WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.\n"]},{"output_type":"display_data","data":{"text/plain":["Generating train split: 0 examples [00:00, ? examples/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"b51a121daa394443970f334d6671688b"}},"metadata":{}}],"source":["from datasets import load_dataset\n","\n","data_files = {\"train\": \"yoda.csv\"}\n","dataset_name = 'datasets/yoda/'\n","dataset = load_dataset(dataset_name, data_files=data_files)"]},{"cell_type":"markdown","metadata":{"id":"qDyoH3-bwEUX"},"source":["Great, that seems to have worked. 
Let's do a quick check here and take a look at the dataset object:"]},{"cell_type":"code","execution_count":32,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"8JYtMj-twEUX","outputId":"80bd85f7-3307-461c-a154-99f6bc750ae0","executionInfo":{"status":"ok","timestamp":1720650151077,"user_tz":240,"elapsed":5,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["DatasetDict({\n","    train: Dataset({\n","        features: ['text'],\n","        num_rows: 103\n","    })\n","})"]},"metadata":{},"execution_count":32}],"source":["# Quick check\n","dataset"]},{"cell_type":"code","execution_count":33,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"86SOVjvLzjoh","outputId":"65b1281b-2d94-41f6-f7a7-8790fb26e5ca","executionInfo":{"status":"ok","timestamp":1720650151077,"user_tz":240,"elapsed":4,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["datasets.dataset_dict.DatasetDict"]},"metadata":{},"execution_count":33}],"source":["type(dataset)"]},{"cell_type":"markdown","metadata":{"id":"4Rba8xwIziHg"},"source":["We can see the dataset is a dataset dictionary from the `datasets` library, and contains a single feature, the column `text`. This is suitable for causal language modeling, so we may proceed."]},{"cell_type":"markdown","metadata":{"id":"GhpPig07wEUX"},"source":["#### Loading the Tokenizer and Model\n","\n","Now that we have the raw data, we need to preprocess it using a tokenizer. Here, we will just be using an [AutoTokenizer](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoTokenizer) to load the tokenizer for the base model we are using, which in this case is GPT-2.\n","\n","We also need to load the base model with the original weights, the model that we will be fine-tuning. 
Since we are doing causal language modeling (*i.e.* text generation), here we will load GPT-2 using the [AutoModelForCausalLM](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForCausalLM) class."]},{"cell_type":"code","execution_count":34,"metadata":{"id":"brc6pXaVjdSK","executionInfo":{"status":"ok","timestamp":1720650153425,"user_tz":240,"elapsed":2351,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[],"source":["import torch\n","from transformers import AutoTokenizer\n","from transformers import AutoModelForCausalLM\n","\n","# Use GPU if available, otherwise fall back to CPU\n","device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n","\n","# Instantiate model and tokenizer\n","tokenizer = AutoTokenizer.from_pretrained(\"gpt2\")\n","model = AutoModelForCausalLM.from_pretrained(\"gpt2\").to(device)"]},{"cell_type":"markdown","metadata":{"id":"solT_UNIwEUY"},"source":["Let's just do a quick check of our tokenizer and model now. First for tokenizing input text:"]},{"cell_type":"code","execution_count":35,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"m59tWYfz0PdB","outputId":"0d511d62-16d9-4d0c-e502-5d3bb04f4e61","executionInfo":{"status":"ok","timestamp":1720650153425,"user_tz":240,"elapsed":7,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["{'input_ids': tensor([[ 464, 6290,  287, 8602]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1]], device='cuda:0')}\n"]}],"source":["# Generate inputs for model\n","input = tokenizer(\"The rain in Spain\", return_tensors=\"pt\").to(device)\n","print(input)"]},{"cell_type":"markdown","metadata":{"id":"rp-H4Ua31OcJ"},"source":["Next, we generate the model outputs by passing the input through the 
model:"]},{"cell_type":"code","execution_count":36,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"BhE4vLvd0e88","outputId":"2b49614c-3096-4ab7-f004-d5860131d31d","executionInfo":{"status":"ok","timestamp":1720650154753,"user_tz":240,"elapsed":1333,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"]},{"output_type":"stream","name":"stdout","text":["tensor([[  464,  6290,   287,  8602,   468,   587,   523,  2089,   326,   262,\n","          1748,   286, 15142,   468,   587,  4137,   284,  1969,   663,  8215,\n","            13,   198,   198,   464]], device='cuda:0')\n"]}],"source":["# Generate model outputs\n","output = model.generate(**input, max_new_tokens=20)\n","print(output)"]},{"cell_type":"markdown","metadata":{"id":"HPIsB4GM1S8Z"},"source":["Finally, we decode the model's output token ids back to text:"]},{"cell_type":"code","execution_count":37,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"qW7ZHaEXjs1I","outputId":"7af00b31-5019-4ddd-aa22-f23f130c8741","executionInfo":{"status":"ok","timestamp":1720650154753,"user_tz":240,"elapsed":15,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["The rain in Spain has been so bad that the city of Barcelona has been forced to close its doors.\n","\n","The\n"]}],"source":["print(tokenizer.decode(output[0]))"]},{"cell_type":"markdown","metadata":{"id":"IObOjjZrwEUY"},"source":["Great! Everything appears to be working fine. 
Currently, our input dataset is all freeform text:"]},{"cell_type":"code","execution_count":38,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"a81j2YPg2AQW","outputId":"06d65480-354e-4dfd-8ed9-da7c9e564452","executionInfo":{"status":"ok","timestamp":1720650154753,"user_tz":240,"elapsed":13,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["{'text': ['The very Republic is threatened, if involved the Sith are.',\n","  'Hard to see, the dark side is. Discover who this assassin is, we must.',\n","  'With this Naboo queen you must stay, Qui-Gon. Protect her.',\n","  'May the Force be with you.',\n","  \"(Cont'd) Master Qui-Gon more to say have you?\"]}"]},"metadata":{},"execution_count":38}],"source":["dataset['train'][0:5]"]},{"cell_type":"markdown","metadata":{"id":"mU3Uca1I2LPO"},"source":["We'll need to apply the tokenizer to each record to create a tokenized version to pass into the model as input. This is the same as what we did above, only now we need to do this for each row in the dataset. 
To do this, we'll create a simple function then apply it over the entire dataset using the `.map` method:"]},{"cell_type":"code","execution_count":39,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":49,"referenced_widgets":["b1657544ac924428bad89dc4c85cdcc8","5649660d73ef41c3b6c6ea03b09be4ca","734c4b4989744b1eb27248b89ae07090","5c8bdca7cbbc48bcbb669ad907ea0d02","6b536828ccc040cba0d818fab4568044","1e95cf7358d1460a99a20682ecd4c97d","baa94316e4824e24b80e76ce42d2c209","d78b5de0332e4d69844242e9bd8ddaec","32b6881377154016ab6ef51d70d38a1b","bcce6096c8e0482389e65e2ce7fa44ab","a30cdca663c54ed1bea1468142edff08"]},"id":"su5u5VVuv5Hi","outputId":"6d31e51b-8605-4a51-fb47-731c311e276f","executionInfo":{"status":"ok","timestamp":1720650154754,"user_tz":240,"elapsed":13,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"display_data","data":{"text/plain":["Map:   0%|          | 0/103 [00:00<?, ? examples/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"b1657544ac924428bad89dc4c85cdcc8"}},"metadata":{}}],"source":["# GPT-2 has no padding token by default, so reuse the end-of-sequence token (required for padding)\n","tokenizer.pad_token = tokenizer.eos_token\n","\n","# Define tokenization function using the already instantiated tokenizer\n","def tokenize_function(data):\n","    tokenized = tokenizer(data[\"text\"], padding=\"max_length\", truncation=True, return_tensors=\"pt\", max_length=128)\n","    return tokenized\n","\n","# Apply the tokenizer function to each row of data in the dataset\n","tokenized_dataset = dataset.map(tokenize_function, batched=True)"]},{"cell_type":"markdown","metadata":{"id":"WwGUm1bz2nN8"},"source":["Now if we take a look at our tokenized dataset, we should see each row is a list of input ids for the tokens, plus an attention mask, as 
expected:"]},{"cell_type":"code","execution_count":40,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"ox9LZbwXwEUZ","outputId":"360bb179-d70f-44b2-9fd1-bd2814345089","executionInfo":{"status":"ok","timestamp":1720650154754,"user_tz":240,"elapsed":12,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["{'text': 'The very Republic is threatened, if involved the Sith are.', 'input_ids': [464, 845, 2066, 318, 8556, 11, 611, 2950, 262, 26455, 389, 13, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}\n","{'text': 'Hard to see, the dark side is. 
Discover who this assassin is, we must.', 'input_ids': [17309, 284, 766, 11, 262, 3223, 1735, 318, 13, 29704, 508, 428, 31120, 318, 11, 356, 1276, 13, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}\n","{'text': 'With this Naboo queen you must stay, Qui-Gon. 
Protect her.', 'input_ids': [3152, 428, 36099, 2238, 16599, 345, 1276, 2652, 11, 2264, 72, 12, 38, 261, 13, 21916, 607, 13, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}\n","{'text': 'May the Force be with you.', 'input_ids': [6747, 262, 5221, 307, 351, 345, 13, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 
50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}\n","{'text': \"(Cont'd) Master Qui-Gon more to say have you?\", 'input_ids': [7, 4264, 1549, 8, 5599, 2264, 72, 12, 38, 261, 517, 284, 910, 423, 345, 30, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0]}\n"]}],"source":["for i in range(0,5):\n","  print(tokenized_dataset['train'][i])"]},{"cell_type":"markdown","metadata":{"id":"jEaptjcwwEUZ"},"source":["#### Training (Fine-tuning) the Model\n","\n","Now we can proceed to what we really want to do - fine-tuning the model! First we need to set up a `Trainer` object from Hugging Face, as well as a `TrainingArguments` object. This was being done for us previously in the training script, where each argument we gave to the script was passed along into the Trainer.\n","\n","We'll also import the `evaluate` package and load the accuracy metric from it, which will be used to evaluate the performance of our model as it is tuned:"]},{"cell_type":"code","execution_count":41,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":104,"referenced_widgets":["bed0ffc892df43e8a3576fb776de6774","32e42f9bb697473190c3722d296a1d87","da73cdc696bd45f2955de42e42cc9a93","f01158f7ad994646ba950ee65d56c00a","71902a253ac04225920ffd5ce8c18686","479fffa5d1aa417aa672f9c27787a03b","52ca7bceddf94614a6e8390a1016add5","cd1d0ded57284545805e4fc9cb8c86f8","319f422a71974ae884a1c3f5264353f9","a5afbd9e364548dc90d62fdba5f4d7d2","90b784cc1b3d4cc590e440da790b4a03"]},"id":"PXbhIhrVh2IV","outputId":"6758a88b-4e2f-4853-ea2c-db4bb4c83f80","executionInfo":{"status":"ok","timestamp":1720650157621,"user_tz":240,"elapsed":2876,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["/usr/local/lib/python3.10/dist-packages/transformers/training_args.py:1474: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. 
Use `eval_strategy` instead\n","  warnings.warn(\n"]},{"output_type":"display_data","data":{"text/plain":["Downloading builder script:   0%|          | 0.00/4.20k [00:00<?, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"bed0ffc892df43e8a3576fb776de6774"}},"metadata":{}}],"source":["from transformers import TrainingArguments, Trainer\n","import evaluate\n","\n","# Set up the training arguments\n","training_args = TrainingArguments(\n","    output_dir=\"yoda-gpt2\",\n","    num_train_epochs=10,\n","    remove_unused_columns=True,\n","    evaluation_strategy=\"epoch\")\n","\n","# Set up the metric used to evaluate the training\n","metric = evaluate.load(\"accuracy\")"]},{"cell_type":"markdown","metadata":{"id":"vvct01mywEUZ"},"source":["Here, we need a utility function, called `compute_metrics`, to get the raw output scores (logits) for each token along with the token labels, then take the most likely token at each position as the prediction, and finally calculate the accuracy of these predictions with respect to the labels:"]},{"cell_type":"code","execution_count":42,"metadata":{"id":"x21MGLhP0DbX","executionInfo":{"status":"ok","timestamp":1720650157622,"user_tz":240,"elapsed":6,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[],"source":["def compute_metrics(eval_pred):\n","    logits, labels = eval_pred\n","    predictions = np.argmax(logits, axis=-1)\n","    return metric.compute(predictions=predictions, references=labels)"]},{"cell_type":"markdown","metadata":{"id":"xyhOzlTOwEUa"},"source":["We'll also need to create a [DataCollator](https://huggingface.co/docs/transformers/main_classes/data_collator) object. What does the data collator do? The data collator takes the input data and creates batches of it to pass into the model. 
Remember, underneath it all, a large language model is still just a deep learning model, and expects batches of input data for training.\n","\n","We create the data collator by importing the class from the `transformers` library, then instantiating it and passing the tokenizer object. We also set the argument `mlm=False` here, since we are doing causal language modeling, not masked language modeling:"]},{"cell_type":"code","execution_count":43,"metadata":{"id":"v3RTO9Ap24HR","executionInfo":{"status":"ok","timestamp":1720650157622,"user_tz":240,"elapsed":6,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[],"source":["from transformers import DataCollatorForLanguageModeling\n","\n","data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)"]},{"cell_type":"markdown","metadata":{"id":"JCWq7wg4452b"},"source":["Ok, let's take a look at the collator in action. We can pass it a sample of data from our dataset and take a look at the output:"]},{"cell_type":"code","execution_count":44,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"0bYOVGOK4_nx","outputId":"7b9dec2d-3fe6-4ad5-e5f6-504d07f9e210","executionInfo":{"status":"ok","timestamp":1720650157622,"user_tz":240,"elapsed":5,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["Text:\n","['The very Republic is threatened, if involved the Sith are.', 'Hard to see, the dark side is. Discover who this assassin is, we must.', 'With this Naboo queen you must stay, Qui-Gon. 
Protect her.', 'May the Force be with you.']\n","\n","\n","Tokenized text:\n","[{'input_ids': [464, 845, 2066, 318, 8556, 11, 611, 2950, 262, 26455, 389, 13], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}, {'input_ids': [17309, 284, 766, 11, 262, 3223, 1735, 318, 13, 29704, 508, 428, 31120, 318, 11, 356, 1276, 13], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}, {'input_ids': [3152, 428, 36099, 2238, 16599, 345, 1276, 2652, 11, 2264, 72, 12, 38, 261, 13, 21916, 607, 13], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}, {'input_ids': [6747, 262, 5221, 307, 351, 345, 13], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}]\n","\n","\n","Collated data\n","{'input_ids': tensor([[  464,   845,  2066,   318,  8556,    11,   611,  2950,   262, 26455,\n","           389,    13, 50256, 50256, 50256, 50256, 50256, 50256],\n","        [17309,   284,   766,    11,   262,  3223,  1735,   318,    13, 29704,\n","           508,   428, 31120,   318,    11,   356,  1276,    13],\n","        [ 3152,   428, 36099,  2238, 16599,   345,  1276,  2652,    11,  2264,\n","            72,    12,    38,   261,    13, 21916,   607,    13],\n","        [ 6747,   262,  5221,   307,   351,   345,    13, 50256, 50256, 50256,\n","         50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],\n","        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],\n","        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],\n","        [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'labels': tensor([[  464,   845,  2066,   318,  8556,    11,   611,  2950,   262, 26455,\n","           389,    13,  -100,  -100,  -100,  -100,  -100,  -100],\n","        [17309,   284,   766,    11,   262,  3223,  1735,   318,    13, 29704,\n","           508,   428, 31120,   318,    11,   356,  1276,    13],\n","        [ 3152,   428, 36099,  2238, 16599,   345,  1276, 
 2652,    11,  2264,\n","            72,    12,    38,   261,    13, 21916,   607,    13],\n","        [ 6747,   262,  5221,   307,   351,   345,    13,  -100,  -100,  -100,\n","          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100]])}\n","\n","\n"]}],"source":["# Sample text\n","texts = dataset['train'][0:4]['text']\n","print(\"Text:\")\n","print(texts)\n","print(\"\\n\")\n","\n","# Tokenize\n","print(\"Tokenized text:\")\n","tokens = [tokenizer(t) for t in texts]\n","print(tokens)\n","print(\"\\n\")\n","\n","# Collate\n","print(\"Collated data\")\n","\n","dataloader = torch.utils.data.DataLoader(dataset=tokens, collate_fn=data_collator, batch_size=4)\n","\n","for batch in dataloader:\n","    print(batch)\n","print(\"\\n\")"]},{"cell_type":"markdown","metadata":{"id":"k2RXiMw-8jrs"},"source":["Here we can see that the collator has reformatted the sample data (4 records) into a batch, where each key has an array of the different inputs (`input_ids`, `attention_mask`, and `labels`). 
We probably don't need to worry about this level of detail, but this is the format the model expects the data in, so we are really just using the data collator to restructure everything."]},{"cell_type":"markdown","metadata":{"id":"GqSdTQKGwEUa"},"source":["Finally, we can instantiate a `Trainer` object, passing in the base model to be fine-tuned, the training arguments, the datasets for training and evaluation, and the data collator, as defined above. Note that the evaluation dataset here is the same training split, so the validation loss reported is not measured on held-out data, and that `compute_metrics` is not passed in, so only the loss is reported:"]},{"cell_type":"code","execution_count":45,"metadata":{"id":"R87EerabiiOd","executionInfo":{"status":"ok","timestamp":1720650157622,"user_tz":240,"elapsed":4,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[],"source":["trainer = Trainer(\n","    model=model,\n","    args=training_args,\n","    train_dataset=tokenized_dataset[\"train\"],\n","    eval_dataset=tokenized_dataset[\"train\"],\n","    data_collator=data_collator,\n",")"]},{"cell_type":"markdown","metadata":{"id":"qSgxEwylwEUb"},"source":["Now that everything is good to go, we can simply call `trainer.train()` and Hugging Face takes care of the rest!"]},{"cell_type":"code","execution_count":46,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":423},"id":"2PwK1rkRwEUb","outputId":"fe123726-5905-457f-fd29-40ad02a03528","executionInfo":{"status":"ok","timestamp":1720650207287,"user_tz":240,"elapsed":49669,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.HTML object>"],"text/html":["\n","    <div>\n","      \n","      <progress value='130' max='130' style='width:300px; height:20px; vertical-align: middle;'></progress>\n","      [130/130 00:47, Epoch 10/10]\n","    </div>\n","    <table border=\"1\" class=\"dataframe\">\n","  <thead>\n"," <tr style=\"text-align: left;\">\n","      <th>Epoch</th>\n","      <th>Training Loss</th>\n","      <th>Validation Loss</th>\n","    
</tr>\n","  </thead>\n","  <tbody>\n","    <tr>\n","      <td>1</td>\n","      <td>No log</td>\n","      <td>3.601274</td>\n","    </tr>\n","    <tr>\n","      <td>2</td>\n","      <td>No log</td>\n","      <td>3.097119</td>\n","    </tr>\n","    <tr>\n","      <td>3</td>\n","      <td>No log</td>\n","      <td>2.685338</td>\n","    </tr>\n","    <tr>\n","      <td>4</td>\n","      <td>No log</td>\n","      <td>2.355658</td>\n","    </tr>\n","    <tr>\n","      <td>5</td>\n","      <td>No log</td>\n","      <td>2.095090</td>\n","    </tr>\n","    <tr>\n","      <td>6</td>\n","      <td>No log</td>\n","      <td>1.899915</td>\n","    </tr>\n","    <tr>\n","      <td>7</td>\n","      <td>No log</td>\n","      <td>1.736595</td>\n","    </tr>\n","    <tr>\n","      <td>8</td>\n","      <td>No log</td>\n","      <td>1.626166</td>\n","    </tr>\n","    <tr>\n","      <td>9</td>\n","      <td>No log</td>\n","      <td>1.555423</td>\n","    </tr>\n","    <tr>\n","      <td>10</td>\n","      <td>No log</td>\n","      <td>1.531016</td>\n","    </tr>\n","  </tbody>\n","</table><p>"]},"metadata":{}},{"output_type":"execute_result","data":{"text/plain":["TrainOutput(global_step=130, training_loss=2.6315586970402642, metrics={'train_runtime': 48.3882, 'train_samples_per_second': 21.286, 'train_steps_per_second': 2.687, 'total_flos': 67282698240000.0, 'train_loss': 2.6315586970402642, 'epoch': 10.0})"]},"metadata":{},"execution_count":46}],"source":["trainer.train()"]},{"cell_type":"markdown","metadata":{"id":"tm8joz8VXbpW"},"source":["Great! 
We can visualize the model training (loss), as it is stored in the `Trainer` state:"]},{"cell_type":"code","execution_count":47,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":472},"id":"sRJLKyeKXhAy","outputId":"dbfbf9f3-74d6-4381-c450-e35ea893fb48","executionInfo":{"status":"ok","timestamp":1720650207773,"user_tz":240,"elapsed":502,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"display_data","data":{"text/plain":["<Figure size 640x480 with 1 Axes>"],"image/png":"iVBORw0KGgoAAAANSUhEUgAAAjcAAAHHCAYAAABDUnkqAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAABTsElEQVR4nO3deVhUZRsG8PsMywzbjCyyCQKiiYjiiqKllppbKmqLpqlploqaSxtf5ZIVlVmWlmaLa2ZpqWluuKeiiIqJ+4qKLCI7yDZzvj+IyZFFQODMcv+ua67izHvOPMMiN+c8530FURRFEBERERkJmdQFEBEREdUkhhsiIiIyKgw3REREZFQYboiIiMioMNwQERGRUWG4ISIiIqPCcENERERGheGGiIiIjArDDRERERkVhhsySaNHj4a3t3e19p09ezYEQajZgozQvHnz0KhRI5iZmaFVq1ZSl0MP2LdvHwRBwL59+6QupcYIgoDZs2dXeb/r169DEAQsX768xmsiaTDckF4RBKFSD2P6B7kqRo8eDVtbW6nLeKidO3firbfeQufOnbFs2TJ8/PHHdfK6f//9N55//nk0aNAAlpaWUKlU6NChAz744AMkJSXpjO3WrZvO95SDgwPat2+Pn376CRqNRvvLvzKPsty9exfz5s1Dly5dUL9+fdSrVw8dO3bEr7/++tD38cknn0AQBOzYsaPM5/v27QuVSoXbt29X/ZNEZALMpS6A6H6rVq3S+XjlypWIiIgotb1Zs2aP9Drff/89NBpNtfZ977338M477zzS6xu7PXv2QCaT4ccff4SlpWWdvObMmTMxd+5cNGrUCKNHj0ajRo2Ql5eH48ePY/78+VixYgWuXLmis4+HhwfCw8MBAHfu3MHKlSsxduxYXLx4EdOmTSv1fRcWFgZbW1u8++67D60nMjIS7777Lvr27Yv33nsP5ubm+P333zF06FCcPXsWc+bMKXffGTNmYM2aNZg4cSJiY2NhZWWlfW7dunXYtm0bvvnmG7i7u1flU0RkOkQiPRYaGipW5ts0JyenDqqR3qhRo0QbGxupy3iol19+uUbr1Gg0Ym5ubrnPr127VgQgPv/882J+fn6p59PT08VZs2bpbOvatavYvHlznW05OTmih4eHaGNjIxYUFJQ6TvPmzcWuXbtWquarV6+K169fL/U+nnrqKVEul4vZ2dkV7h8ZGSnKZDIxLCxMuy0zM1N0d3cXO3bsKKrV6krVUZ69e/eKAMS9e/c+0nH0CYBSX+fKuHbtmghAXLZsWY3XRNLgZSkyON26dUNAQACOHz+OLl26wNraGv/73/8AAJs2bUK/fv3g7u4OuVwOX19fzJ07F2q1WucYD/bclFxz//zzz7F06VL4+vpCLpejffv2OHbsmM6+ZfXcCIKASZMmYePGjQgICIBcLkfz5s2xffv2UvXv27cP7dq1g0KhgK+vL7
777rsa7+NZt24d2rZtCysrKzg5OWHEiBGIj4/XGZOYmIiXX34ZHh4ekMvlcHNzw8CBA3H9+nXtmOjoaPTq1QtOTk6wsrKCj48PxowZU+FrC4KAZcuWIScnR3vZpqSXoaioCHPnztV+fr29vfG///0P+fn5Osfw9vbGM888gx07dqBdu3awsrLCd999V+5rzpw5E05OTuWeKVKpVJXqxbC2tkbHjh2Rk5ODO3fuPHR8RXx8fODl5aWzTRAEhISEID8/H1evXq1w/44dO2L8+PH4/PPPcfbsWQDFZw2Tk5OxdOlSyGQyXL16Fc899xwcHBy0tf/111+ljnXr1i2EhITAxsYGzs7OmDZtWqnPOVB8We+5555Dw4YNIZfL4enpiWnTpuHevXsPfb/Lly+HIAg4ePAgpkyZor0U99prr6GgoADp6ekYOXIk7O3tYW9vj7feeguiKOocIycnBzNmzICnpyfkcjmaNm2Kzz//vNS4/Px8TJs2DfXr14ednR0GDBiAW7dulVlXfHw8xowZAxcXF+3P5U8//fTQ90OGjZelyCDdvXsXffr0wdChQzFixAi4uLgAKP4H1tbWFtOnT4etrS327NmDmTNnIjMzE/PmzXvocdesWYOsrCy89tprEAQBn332GQYPHoyrV6/CwsKiwn0PHjyIP/74AxMnToSdnR2+/vprDBkyBDdu3ICjoyMA4OTJk+jduzfc3NwwZ84cqNVqfPDBB6hfv/6jf1L+tXz5crz88sto3749wsPDkZSUhK+++gqHDh3CyZMnUa9ePQDAkCFDcObMGUyePBne3t5ITk5GREQEbty4of346aefRv369fHOO++gXr16uH79Ov74448KX3/VqlVYunQpoqKi8MMPPwAAOnXqBAB45ZVXsGLFCjz77LOYMWMGjh49ivDwcJw7dw4bNmzQOc6FCxcwbNgwvPbaaxg3bhyaNm1a5utdvHgRFy9exCuvvFIj/UhXr16FmZmZ9vNU0xITEwEATk5ODx0bHh6OjRs34rXXXsOCBQvwzTff4M0330SLFi2QlJSETp06ITc3F1OmTIGjoyNWrFiBAQMGYP369Rg0aBAA4N69e+jevTtu3LiBKVOmwN3dHatWrcKePXtKvd66deuQm5uLCRMmwNHREVFRUVi4cCFu3bqFdevWVer9TZ48Ga6urpgzZw6OHDmCpUuXol69ejh8+DAaNmyIjz/+GFu3bsW8efMQEBCAkSNHAgBEUcSAAQOwd+9ejB07Fq1atcKOHTvw5ptvIj4+Hl9++aX2NV555RWsXr0aL774Ijp16oQ9e/agX79+pWpJSkpCx44dtX981K9fH9u2bcPYsWORmZmJqVOnVuo9kQGS+MwRUYXKuizVtWtXEYC4ZMmSUuPLunTx2muvidbW1mJeXp5226hRo0QvLy/txyWnpR0dHcXU1FTt9k2bNokAxM2bN2u3zZo1q1RNAERLS0vx8uXL2m2nTp0SAYgLFy7Ubuvfv79obW0txsfHa7ddunRJNDc3r9Tlt4ddliooKBCdnZ3FgIAA8d69e9rtW7ZsEQGIM2fOFEVRFNPS0kQA4rx588o91oYNG0QA4rFjxx5aV2XqjImJEQGIr7zyis72N954QwQg7tmzR7vNy8tLBCBu3779oa9V8jVasGCBznaNRiPeuXNH51FYWKh9vmvXrqKfn5/2uXPnzolTpkwRAYj9+/cv87WqclmqLHfv3hWdnZ3FJ554otL7rF+/XgQgOjg4iI0aNdJ+j0+dOlUEIP7999/asVlZWaKPj4/o7e2tvWy1YMECEYD422+/acfl5OSIjRs3LnVZqqyfn/DwcFEQBDEuLq7COpctWyYCEHv16iVqNBrt9uDgYFEQBHH8+PHabUVFRaKHh4fO53Ljxo0iAPHDDz/UOe6zzz4rCoKg/dkq+T6aOHGizrgXX3yx1GWpsWPHim5ubmJKSorO2KFDh4oqlUr7fnlZyvjwshQZJLlcjpdffrnU9v
sbL7OyspCSkoInnngCubm5OH/+/EOP+8ILL8De3l778RNPPAEAD72EAAA9evSAr6+v9uOWLVtCqVRq91Wr1di1axdCQkJ0GkEbN26MPn36PPT4lREdHY3k5GRMnDgRCoVCu71fv37w8/PTXrKwsrKCpaUl9u3bh7S0tDKPVXLmYsuWLSgsLHzk2rZu3QoAmD59us72GTNmAECpyyk+Pj7o1avXQ4+bmZkJAKXO2mRkZKB+/fo6j5iYGJ0x58+f1z7XrFkzLFy4EP369auVyxYajQbDhw9Heno6Fi5cWOn9hgwZgr59+yI1NRXffPON9nt869atCAoKwuOPP64da2tri1dffRXXr1/XXsraunUr3Nzc8Oyzz2rHWVtb49VXXy31Wvf//OTk5CAlJQWdOnWCKIo4efJkpeodO3asziXWDh06QBRFjB07VrvNzMwM7dq10/m52rp1K8zMzDBlyhSd482YMQOiKGLbtm3acQBKjXvwLIwoivj999/Rv39/iKKIlJQU7aNXr17IyMjAiRMnKvWeyPAw3JBBKrnV90FnzpzBoEGDoFKpoFQqUb9+fYwYMQJA8S+7h2nYsKHOxyVBp7wAUNG+JfuX7JucnIx79+6hcePGpcaVta064uLiAKDMSzh+fn7a5+VyOT799FNs27YNLi4u6NKlCz777DPtJRMA6Nq1K4YMGYI5c+bAyckJAwcOxLJly8rs1ahsbTKZrNR7dXV1Rb169bS1lfDx8anUce3s7AAA2dnZOtttbW0RERGBiIgIvPnmm2Xu6+3tjYiICOzatQsHDx5EYmIitmzZUqlLRiVSU1ORmJiofZT3fTZ58mRs374dP/zwAwIDAyt9fABo3749AKBdu3babXFxcWV+nUvuJCz5fMbFxaFx48alerrK2vfGjRsYPXo0HBwcYGtri/r166Nr164AKvfzA5T+OVCpVAAAT0/PUtvv/7mKi4uDu7u79utZ0fuRyWQ6f0iU9X7u3LmD9PR0LF26tFTILfnDKDk5uVLviQwPe27IIN3/F2aJ9PR0dO3aFUqlEh988AF8fX2hUChw4sQJvP3225W69dvMzKzM7eIDDY01va8Upk6div79+2Pjxo3YsWMH3n//fYSHh2PPnj1o3bo1BEHA+vXrceTIEWzevBk7duzAmDFjMH/+fBw5cqTa/S2VbZwu62tcFj8/PwBAbGysznZzc3P06NEDAMptNrWxsdGOqa7Bgwdj//792o9HjRpVajK4OXPm4Ntvv8Unn3yCl1566ZFer7ao1Wr07NkTqampePvtt+Hn5wcbGxvEx8dj9OjRlZ46obyfg7K21+bPRkm9I0aMwKhRo8oc07Jly1p7fZIWww0ZjX379uHu3bv4448/0KVLF+32a9euSVjVf5ydnaFQKHD58uVSz5W1rTpK7s65cOECnnrqKZ3nLly4UOruHV9fX8yYMQMzZszApUuX0KpVK8yfPx+rV6/WjunYsSM6duyIjz76CGvWrMHw4cOxdu1avPLKK1WuTaPR4NKlSzrzFCUlJSE9Pb1UbZXVtGlTNGnSBBs3bsSCBQtgY2NTreNU1/z583XOQDw498w333yD2bNnY+rUqXj77bdr7HW9vLxw4cKFUttLLr+WfD69vLwQGxsLURR1guWD+54+fRoXL17EihUrtE2+ABAREVFjNVfEy8sLu3btQlZWls7Zm7Lej0ajwZUrV3TO1jz4fkrupFKr1Y8cYMnw8LIUGY2Svwzv/2uwoKAA3377rVQl6TAzM0OPHj2wceNGnZllL1++rO0neFTt2rWDs7MzlixZonP5aNu2bTh37pz2jpLc3Fzk5eXp7Ovr6ws7OzvtfmlpaaX+si5ZRqE6l6b69u0LAFiwYIHO9i+++AIAyrzbpbJmz56NlJQUjBs3rsz+oNo8Q9C2bVv06NFD+/D399c+9+uvv2LKlCkYPny49n3WlL59+yIqKgqRkZHabTk5OVi6dCm8vb21dfTt2xe3b9/G+v
XrteNyc3OxdOlSneOV9fMjiiK++uqrGq27PH379oVarcaiRYt0tn/55ZcQBEHbl1by36+//lpn3IPfV2ZmZhgyZAh+//33Umf1ADzyrf6k33jmhoxGp06dYG9vj1GjRmHKlCkQBAGrVq3Sq8tCs2fPxs6dO9G5c2dMmDBB+495QEBAqWbX8hQWFuLDDz8std3BwQETJ07Ep59+ipdffhldu3bFsGHDtLeCe3t7Y9q0aQCKb5/u3r07nn/+efj7+8Pc3BwbNmxAUlIShg4dCgBYsWIFvv32WwwaNAi+vr7IysrC999/D6VSqQ0qVREYGIhRo0Zh6dKl2kuIUVFRWLFiBUJCQvDkk09W+ZglXnzxRcTGxiI8PBxRUVEYOnQofHx8kJOTg9jYWPzyyy+ws7PTaRavbVFRURg5ciQcHR3RvXt3/PzzzzrPd+rUCY0aNar28d955x388ssv6NOnD6ZMmQIHBwesWLEC165dw++//w6ZrPhv13HjxmHRokUYOXIkjh8/Djc3N6xatQrW1tY6x/Pz84Ovry/eeOMNxMfHQ6lU4vfff69Uv1lN6N+/P5588km8++67uH79OgIDA7Fz505s2rQJU6dO1fbYtGrVCsOGDcO3336LjIwMdOrUCbt37y7z7Ocnn3yCvXv3okOHDhg3bhz8/f2RmpqKEydOYNeuXUhNTa2T90YSkOAOLaJKK+9W8Adnli1x6NAhsWPHjqKVlZXo7u4uvvXWW+KOHTtK3fJa3q3gZd0ajQduLy3vVvDQ0NBS+3p5eYmjRo3S2bZ7926xdevWoqWlpejr6yv+8MMP4owZM0SFQlHOZ+E/o0aNEgGU+fD19dWO+/XXX8XWrVuLcrlcdHBwEIcPHy7eunVL+3xKSooYGhoq+vn5iTY2NqJKpRI7dOigc7vwiRMnxGHDhokNGzYU5XK56OzsLD7zzDNidHR0peos65b1wsJCcc6cOaKPj49oYWEhenp6imFhYTq36Zd83vr16/fQ13nQvn37xGeffVZ0c3MTLSwsRKVSKbZr106cNWuWmJCQoDO2ou+j8lTlVvCSW6PLe1TltuOS77k7d+7obL9y5Yr47LPPivXq1RMVCoUYFBQkbtmypdT+cXFx4oABA0Rra2vRyclJfP3118Xt27eX+rk4e/as2KNHD9HW1lZ0cnISx40bp53S4GH1lrzfB6cOKK/2sr5HsrKyxGnTponu7u6ihYWF2KRJE3HevHk6t5aLoijeu3dPnDJliujo6Cja2NiI/fv3F2/evFnmDMVJSUliaGio6OnpKVpYWIiurq5i9+7dxaVLl2rH8FZw4yOIoh79WUtkokJCQnDmzBlcunRJ6lKIiAwee26I6tiDU9lfunQJW7duRbdu3aQpiIjIyPDMDVEdc3Nz065aHRcXh8WLFyM/Px8nT55EkyZNpC6PiMjgsaGYqI717t0bv/zyCxITEyGXyxEcHIyPP/6YwYaIqIbwzA0REREZFfbcEBERkVFhuCEiIiKjYnI9NxqNBrdv34adnV2l17ghIiIiaYmiiKysLLi7u2snqSyPyYWb27dvl1qdloiIiAzDzZs34eHhUeEYkws3JQuy3bx5E0qlUuJqiIiIqDIyMzPh6emps7BqeUwu3JRcilIqlQw3REREBqYyLSVsKCYiIiKjwnBDRERERoXhhoiIiIwKww0REREZFYYbIiIiMioMN0RERGRUGG6IiIjIqDDcEBERkVFhuCEiIiKjYnIzFNcWtUZE1LVUJGflwdlOgSAfB5jJuDAnERFRXWO4qQHbYxMwZ/NZJGTkabe5qRSY1d8fvQPcJKyMiIjI9PCy1CPaHpuACatP6AQbAEjMyMOE1SewPTZBosqIiIhME8PNI1BrRMzZfBZiGc+VbJuz+SzUmrJGEBERUW1guHkEUddSS52xuZ8IICEjD1HXUuuuKCIiIhPHcPMIkrPKDzbVGUdERESPjuHmETjbKWp0HBERET06hptHEOTjADeVAu
Xd8C2g+K6pIB+HuiyLiIjIpDHcPAIzmYBZ/f0BoMyAIwKY1d+f890QERHVIYabR9Q7wA2LR7SBq6r0pSc3lQLdm7lIUBUREZHp4iR+NaB3gBt6+rtqZyi2tjTDW+v/QUJGHn48eA3ju/pKXSIREZHJ4JmbGmImExDs64iBrRqgp78r/te3GQDgq12XcCstV+LqiIiITAfDTS15tq0HgrwdcK9QjQ82n5W6HCIiIpPBcFNLBEHA3JAAmMsE7DybhN3nkqQuiYiIyCQw3NSipq52GPu4DwBg1p9ncK9ALXFFRERExo/hppZN6d4E7ioFbqXdw8I9l6Quh4iIyOgx3NQyG7k5Zg1oDgD4/u+ruJycJXFFRERExk3ScLN48WK0bNkSSqUSSqUSwcHB2LZtW7njly9fDkEQdB4Khf4vbfC0vwu6+zmjUC3ivY2xEEWuEk5ERFRbJA03Hh4e+OSTT3D8+HFER0fjqaeewsCBA3HmzJly91EqlUhISNA+4uLi6rDi6hEEAbMHNIfCQoYjV1OxMSZe6pKIiIiMlqThpn///ujbty+aNGmCxx57DB999BFsbW1x5MiRcvcRBAGurq7ah4uLYcwA7OlgjclPNQEAfPTXOWTkFkpcERERkXHSm54btVqNtWvXIicnB8HBweWOy87OhpeXFzw9PR96lgcA8vPzkZmZqfOQyrgnGsG3vg1Ssgvw+c4LktVBRERkzCQPN6dPn4atrS3kcjnGjx+PDRs2wN/fv8yxTZs2xU8//YRNmzZh9erV0Gg06NSpE27dulXu8cPDw6FSqbQPT0/P2norD2VpLsPckAAAwOqjcTh1M12yWoiIiIyVIErc3VpQUIAbN24gIyMD69evxw8//ID9+/eXG3DuV1hYiGbNmmHYsGGYO3dumWPy8/ORn5+v/TgzMxOenp7IyMiAUqmssfdRFVPXnsTGmNto0UCFjaGduWo4ERHRQ2RmZkKlUlXq97fkZ24sLS3RuHFjtG3bFuHh4QgMDMRXX31VqX0tLCzQunVrXL58udwxcrlcezdWyUNq7/bzh53CHKfjM7D6iP43RBMRERkSycPNgzQajc6Zloqo1WqcPn0abm5utVxVzapvJ8dbvZoCAD7fcQHJWXkSV0RERGQ8JA03YWFhOHDgAK5fv47Tp08jLCwM+/btw/DhwwEAI0eORFhYmHb8Bx98gJ07d+Lq1as4ceIERowYgbi4OLzyyitSvYVqe7GDF1p6qJCVX4SP/jondTlERERGQ9Jwk5ycjJEjR6Jp06bo3r07jh07hh07dqBnz54AgBs3biAhIUE7Pi0tDePGjUOzZs3Qt29fZGZm4vDhw5Xqz9E3ZjIBH4YEQBCATTG3cehyitQlERERGQXJG4rrWlUakurCzE2xWBkZh0b1bbDt9ScgNzeTuiQiIiK9Y1ANxaZuxtNN4WQrx9U7Ofj+wFWpyyEiIjJ4DDcSU1lZ4P1nmgEAFu65jBt3cyWuiIiIyLAx3OiBAYHu6OTriPwiDWb9yYU1iYiIHgXDjR4QBAEfDAyAhZmAvRfuYMeZJKlLIiIiMlgMN3qisbMtXuviCwCYs/kMcvKLJK6IiIjIMDHc6JFJTzWGp4MVEjLy8NXuS1KXQ0REZJAYbvSIwsIMcwY0BwD8ePAazidKt4I5ERGRoWK40TNP+bmgV3MXqDUi3tsQC42GzcVERERVwXCjh2b1bw5rSzNEx6Vh/YlbUpdDRERkUBhu9JB7PStM7dEEABC+9RzScgokroiIiMhwMNzoqZc7+6Cpix3Scgvx6fbzUpdDRERkMBhu9JSFmQwfDgoAAKw9dhPH49IkroiIiMgwMNzosfbeDniurQcA4L2NsShSaySuiIiISP8x3Oi5sL7NUM/aAucSMrH88HWpyyEiItJ7DDd6zsHGEm/39gMAfBlxEQkZ9ySuiIiISL8x3BiAF9p5ok3DesgpUOPDLeekLoeIiEivMdwYAJlMwIchLWAmE/DX6QTsv3
hH6pKIiIj0FsONgfB3V2J0J28AwMxNscgrVEtbEBERkZ5iuDEg03o+BhelHHF3c/HtvitSl0NERKSXGG4MiK3cHDOfKV5Yc8m+K7iWkiNxRURERPqH4cbA9G3hii6P1UeBWoOZm2IhilxYk4iI6H4MNwZGEAR8MKA5LM1l+PtSCrb8kyB1SURERHqF4cYAeTvZYGI3XwDA3C1nkZVXKHFFRERE+oPhxkCN7+oLb0drJGfl44uIi1KXQ0REpDcYbgyUwsIMc0OKF9Zccfg6YuMzJK6IiIhIPzDcGLAnmtTHMy3doBGLF9bUaNhcTERExHBj4N5/xh+2cnPE3EzH2mM3pS6HiIhIcgw3Bs5FqcD0no8BAD7dfh4p2fkSV0RERCQthhsjMDLYC/5uSmTcK0T41vNSl0NERCQphhsjYG4mw0eDAiAIwO8nbuHo1btSl0RERCQZhhsj0bqhPYa2bwiguLm4UK2RuCIiIiJpMNwYkbd7N4WjjSUuJWfjx4PXpC6HiIhIEgw3RqSetSXC+jYDAHy16xJupeVKXBEREVHdY7gxMkPaNECQtwPuFaoxZ/NZqcshIiKqcww3RkYQBHw4KADmMgERZ5Ow62yS1CURERHVKYYbI/SYix3GPuEDAJi9+QzuFaglroiIiKjuMNwYqde7N0GDela4lXYPC/dckrocIiKiOsNwY6SsLc0xq78/AOD7v6/icnKWxBURERHVDYYbI9bT3wXd/ZxRqBbx3sZYiCIX1iQiIuPHcGPEBEHA7AHNobCQ4cjVVGyMiZe6JCIiolrHcGPkPB2sMfmpJgCAj/46h4zcQokrIiIiql0MNyZg3BON4FvfBinZBZi3kwtrEhGRcWO4MQGW5jLMDQkAAPx89AZO3UyXtiAiIqJaxHBjIjr5OmFQ6wYQxeKFNdUaNhcTEZFxYrgxIf/r2wxKhTlOx2dg9ZE4qcshIiKqFQw3JqS+nRxv9vYDAHy+4wKSM/MkroiIiKjmMdyYmBeDGqKlhwpZ+UX4aOs5qcshIiKqcQw3JsZMJuCjkBaQCcCmmNs4dDlF6pKIiIhqFMONCWrhocJLHb0AAO9vjEV+ERfWJCIi48FwY6Jm9GoKJ1s5rqbk4PsDV6Uuh4iIqMYw3JgopcIC7z/TDACwcM9l3LibK3FFRERENYPhxoQNCHRH58aOyC/SYNafXFiTiIiMA8ONCRMEAR8MDIClmQx7L9zBjjOJUpdERET0yBhuTJxvfVu81rURAGDO5rPIyS+SuCIiIqJHw3BDCH2yMTwdrJCQkYcvd11E5JW72BQTj8grd7lMAxERGRxBNLFGi8zMTKhUKmRkZECpVEpdjt7Yez4ZLy8/Vmq7m0qBWf390TvATYKqiIiIilXl9zfP3BAAlDvXTWJGHiasPoHtsQl1XBEREVH1MNwQ1BoRczafLfO5ktN6czaf5SUqIiIyCAw3hKhrqUjIKH8RTRFAQkYeoq6l1l1RRERE1cRwQ0jOqtzq4JUdR0REJCWGG4KznaJGxxEREUmJ4YYQ5OMAN5UCQgVj3FQKBPk41FlNRERE1cVwQzCTCZjV3x8Ayg04k59qDDNZRfGHiIhIPzDcEACgd4AbFo9oA1eV7qUnC7PiQPP7iXgUqjVSlEZERFQlnMSPdKg1IqKupSI5Kw/Odgq4KhUY8M1BZOUV4bWujRDWp5nUJRIRkQniJH5UbWYyAcG+jhjYqgGCfR3hU98G855tCQD4bv9V7DmfJHGFREREFWO4oYfqHeCG0Z28AQDTfzuF2+n3pC2IiIioAgw3VClhff3Q0kOF9NxCTP7lJPtviIhIbzHcUKXIzc2waFgb2MnNcTwuDZ/vvCB1SURERGViuKFKa+hojc/Yf0NERHpO0nCzePFitGzZEkqlEkqlEsHBwdi2bVuF+6xbtw5+fn5QKBRo0aIFtm7dWkfVEgD0acH+GyIi0m+ShhsPDw988sknOH78OKKjo/HUU09h4MCBOHPmTJnjDx8+jG
HDhmHs2LE4efIkQkJCEBISgtjY2Dqu3LSF9fVDiwbsvyEiIv2kd/PcODg4YN68eRg7dmyp51544QXk5ORgy5Yt2m0dO3ZEq1atsGTJkkodn/Pc1Iwbd3PR7+u/kZXP+W+IiKj2GeQ8N2q1GmvXrkVOTg6Cg4PLHBMZGYkePXrobOvVqxciIyProkS6D/tviIhIX0kebk6fPg1bW1vI5XKMHz8eGzZsgL+/f5ljExMT4eLiorPNxcUFiYmJ5R4/Pz8fmZmZOg+qGey/ISIifSR5uGnatCliYmJw9OhRTJgwAaNGjcLZs2dr7Pjh4eFQqVTah6enZ40dm9h/Q0RE+kfycGNpaYnGjRujbdu2CA8PR2BgIL766qsyx7q6uiIpSffyR1JSElxdXcs9flhYGDIyMrSPmzdv1mj9pk5uboZvXvxv/pv5Oy9KXRIREZk4ycPNgzQaDfLz88t8Ljg4GLt379bZFhERUW6PDgDI5XLtreYlD6pZ9/ffLNl/BXvPJ0tcERERmTJJw01YWBgOHDiA69ev4/Tp0wgLC8O+ffswfPhwAMDIkSMRFhamHf/6669j+/btmD9/Ps6fP4/Zs2cjOjoakyZNkuot0L90+29i2H9DRESSkTTcJCcnY+TIkWjatCm6d++OY8eOYceOHejZsycA4MaNG0hISNCO79SpE9asWYOlS5ciMDAQ69evx8aNGxEQECDVW6D7lPTfpLH/hoiIJKR389zUNs5zU7vi7ubgma8PIiu/COO7+uKdPn5Sl0REREbAIOe5IePg5WiDT9l/Q0REEmK4oRrXt4UbRgV7AWD/DRER1T2GG6oV/+vXDAENlOy/ISKiOsdwQ7WC898QEZFUGG6o1rD/hoiIpMBwQ7Xqwf6bhAz23xARUe1iuKFap9N/s+Ykith/Q0REtYjhhmrd/f030XFpmB/B/hsiIqo9DDdUJ+7vv1m87wr2XmD/DRER1Q6GG6ozOv03v7L/hoiIagfDDdUp9t8QEVFtY7ihOsX+GyIiqm0MN1TnvBxt8MkQ9t8QEVHtYLghSfRr6YaR7L8hIqJawHBDkvlfX/bfEBFRzWO4IckoLNh/Q0RENY/hhiTF/hsiIqppDDckOfbfEBFRTWK4Ib1wf//NlF/Yf0NERNXHcEN64f7+m2PX0/AF+2+IiKiaGG5Ib9zff/PtvivYx/4bIiKqBoYb0iv9WrrhpY7/9t/8dor9N0REVGUMN6R33u3XDM3dlUjNKWD/DRERVRnDDemdkv4bW/bfEBFRNTDckF7ydrLBJ0NaAGD/DRERVQ3DDemtZ1q6s/+GiIiqjOGG9Br7b4iIqKoYbkivsf+GiIiqiuGG9B77b4iIqCoYbsggsP+GiIgqi+GGDAb7b4iIqDIYbshgsP+GiIgqg+GGDMqD/Tf7L96RuCIiItI3DDdkcJ5p6Y4RHRsCAKb9GoPEjDyJKyIiIn3CcEMG6b1+/vB3Y/8NERGVxnBDBklhYYZvhhf330RdT8WXu9h/Q0RExRhuyGD5ONkgfHBx/803e9l/Q0RExRhuyKD1D2T/DRER6WK4IYPH/hsiIrofww0ZPPbfEBHR/RhuyCg82H+z93wyIq/cxaaYeEReuQu1RpS4QiIiqivmUhdAVFP6B7rj6LW7WH3kBsasOAbxvjzjplJgVn9/9A5wk65AIiKqEzxzQ0YlyNsBAHSCDQAkZuRhwuoT2B6bIEFVRERUlxhuyGioNSLCt50v87mSrDNn81leoiIiMnIMN2Q0oq6lIqGCW8FFAAkZeYi6llp3RRERUZ1juCGjkZxVuTluKjuOiIgME8MNGQ1nO0WNjiMiIsPEcENGI8jHAW4qBYQKxjjaWCLIx6HOaiIiorrHcENGw0wmYFZ/fwAoN+DkFBThfGJm3RVFRER1juGGjErvADcsHtEGrirdS0+uSgUa17dFXqEGI3+MwtU72RJVSEREtU0QxQdnBDFumZmZUKlUyMjIgFKplLocqiVqjYioa6lIzs
qDs50CQT4OyCkowovfH0FsfCbcVQqsm9AJDepZSV0qERFVQlV+fzPckEm5m52P57+LxJU7OWjkZINfXwtGfTu51GUREdFDVOX3d7UuS928eRO3bt3SfhwVFYWpU6di6dKl1TkcUZ1xtJVj9Ssd0KCeFa6m5GDkT1HIuFcodVlERFSDqhVuXnzxRezduxcAkJiYiJ49eyIqKgrvvvsuPvjggxotkKimuamssPqVDnCyleNcQibGLj+G3IIiqcsiIqIaUq1wExsbi6CgIADAb7/9hoCAABw+fBg///wzli9fXpP1EdUKHycbrBobBKXCHNFxaXht1XHkF6mlLouIiGpAtcJNYWEh5PLiPoVdu3ZhwIABAAA/Pz8kJHBhQjIMzdyUWPZyEKwszPD3pRRM+zWG604RERmBaoWb5s2bY8mSJfj7778RERGB3r17AwBu374NR0fHGi2QqDa19bLH0pFtYWkmw9bTiQj74x+YWI89EZHRqVa4+fTTT/Hdd9+hW7duGDZsGAIDAwEAf/75p/ZyFZGheKJJfXw9rBVkAvBb9C18+Nc5BhwiIgNW7VvB1Wo1MjMzYW9vr912/fp1WFtbw9nZucYKrGm8FZzKsy76Jt5c/w8AYHrPxzClexOJKyIiohK1fiv4vXv3kJ+frw02cXFxWLBgAS5cuKDXwYaoIs+188TMZ4qXb/gi4iKWH7omcUVERFQd1Qo3AwcOxMqVKwEA6enp6NChA+bPn4+QkBAsXry4RgskqktjHvfB1B7FZ2xmbz6L34/fesgeRESkb6oVbk6cOIEnnngCALB+/Xq4uLggLi4OK1euxNdff12jBRLVtde7N8GYzj4AgLd+/wc7ziRKXBEREVVFtcJNbm4u7OzsAAA7d+7E4MGDIZPJ0LFjR8TFxdVogUR1TRAEvNevGZ5r6wG1RsTkNSdx6HKK1GUREVElVSvcNG7cGBs3bsTNmzexY8cOPP300wCA5ORkNumSUZDJBIQPboE+Aa4oUGswbmU0TtxIk7osIiKqhGqFm5kzZ+KNN96At7c3goKCEBwcDKD4LE7r1q1rtEAiqZibybBgaCs80cQJuQVqjP4pCucSMqUui4iIHqLat4InJiYiISEBgYGBkMmKM1JUVBSUSiX8/PxqtMiaxFvBqapyC4rw0o9ROB6XBidbOdaPD4a3k43UZRERmZSq/P6udrgpUbI6uIeHx6Mcps4w3FB1ZNwrxNClR3AuIRMN6llh/YRguKmspC6LiMhk1Po8NxqNBh988AFUKhW8vLzg5eWFevXqYe7cudBoNNUqmkifqawssHJMEHycbBCffg8jfjiKu9n5UpdFRERlqFa4effdd7Fo0SJ88sknOHnyJE6ePImPP/4YCxcuxPvvv1/TNRLphfp2cqwaGwQ3lQJX7uRg9LJjyMorlLosIiJ6QLUuS7m7u2PJkiXa1cBLbNq0CRMnTkR8fHyNFVjTeFmKHtXl5Gy88F0k7uYUIMjHASvHBEFhYSZ1WURERq3WL0ulpqaW2TTs5+eH1NTUSh8nPDwc7du3h52dHZydnRESEoILFy5UuM/y5cshCILOQ6FQVPk9EFVXY2dbrBgTBDu5OaKupWLC6uMoKOLlWCIifVGtcBMYGIhFixaV2r5o0SK0bNmy0sfZv38/QkNDceTIEURERKCwsBBPP/00cnJyKtxPqVQiISFB++DEgVTXAhqo8OPo9lBYyLD3wh3MWHcKag1XEici0gfm1dnps88+Q79+/bBr1y7tHDeRkZG4efMmtm7dWunjbN++Xefj5cuXw9nZGcePH0eXLl3K3U8QBLi6ulandKIaE+TjgCUj2mLcymhsPnUbdgpzfBQSAEEQpC6NiMikVevMTdeuXXHx4kUMGjQI6enpSE9Px+DBg3HmzBmsWrWq2sVkZGQAABwcHCocl52dDS8vL3h6emLgwIE4c+ZMtV+T6FF0a+qMBS+0hkwA1hy9gU+3V3xZlYiIat8jz3Nzv1OnTq
FNmzZQq9VV3lej0WDAgAFIT0/HwYMHyx0XGRmJS5cuoWXLlsjIyMDnn3+OAwcO4MyZM2XOtZOfn4/8/P9u2c3MzISnpycbiqlGrY26gXf+OA0AeKt3U0zs1ljiioiIjEutNxTXhtDQUMTGxmLt2rUVjgsODsbIkSPRqlUrdO3aFX/88Qfq16+P7777rszx4eHhUKlU2oenp2dtlE8mbmhQQ7zbtxkA4LPtF7DqCPvAiIikohfhZtKkSdiyZQv27t1b5ZmOLSws0Lp1a1y+fLnM58PCwpCRkaF93Lx5syZKJiplXJdGmPxU8RmbmZtisSlGf6dEICIyZpKGG1EUMWnSJGzYsAF79uyBj49PlY+hVqtx+vRpuLm5lfm8XC6HUqnUeRDVluk9H8OoYC+IIjD9t1PYdTZJ6pKIiExOle6WGjx4cIXPp6enV+nFQ0NDsWbNGmzatAl2dnZITEwEAKhUKlhZFa/bM3LkSDRo0ADh4eEAgA8++AAdO3ZE48aNkZ6ejnnz5iEuLg6vvPJKlV6bqDYIgoBZ/ZsjM68IG07GY+KaE1jxchCCfR2lLo2IyGRUKdyoVKqHPj9y5MhKH2/x4sUAgG7duulsX7ZsGUaPHg0AuHHjhnbVcQBIS0vDuHHjkJiYCHt7e7Rt2xaHDx+Gv79/pV+XqDbJZALmPdsS2flFiDibhFdWHMOacR0R6FlP6tKIiExCjd4tZQi4/ALVlbxCNcYsP4bDV+6inrUFfnstGI+52EldFhGRQTLIu6WIjI3CwgxLR7ZDoGc9pOcW4qUfj+Jmaq7UZRERGT2GG6JaZCs3x4qX26Opix2SMvMx/IejSM7Mk7osIiKjxnBDVMvqWVti1dggNHSwxo3UXIz48SjScgqkLouIyGgx3BDVAWelAj+/0gEuSjkuJmVj9PJjyM4vkrosIiKjxHBDVEc8HayxemwH2Ftb4NTNdIxbEY28wqovVUJERBVjuCGqQ01c7LBiTBBs5eaIvHoXk385iUK1RuqyiIiMCsMNUR1r6VEP349sB0tzGSLOJuGt9f9AozGpGRmIiGoVww2RBIJ9HbF4eBuYywRsOBmP2ZvPwMSmnCIiqjUMN0QS6d7MBfOfD4QgACsj4/BFxEWpSyIiMgoMN0QSGtiqAeYODAAALNxzGUsPXJG4IiIiw8dwQySxER298FbvpgCAj7eex9qoGxJXRERk2BhuiPTAxG6NMb6rLwAgbMNpbPnntsQVEREZriqtCk5Eteft3k2RmVeINUdvYNqvMbCRm6NLk/qIupaK5Kw8ONspEOTjADOZIHWpRER6jeGGSE8IgoC5AwOQlVeEzadu49WV0VAqLHD3vqUa3FQKzOrvj94BbhJWSkSk33hZikiPmMkEfPF8IAIaKFGoFnWCDQAkZuRhwuoT2B6bIFGFRET6j+GGSM/IBAEpWfllPlcyE86czWeh5sR/RERlYrgh0jNR11KRmFl2uAGKA05CRh6irqXWXVFERAaE4YZIzyRn5dXoOCIiU8NwQ6RnnO0UNTqOiMjUMNwQ6ZkgHwe4qRSo6IZva0sztG5Yr65KIiIyKAw3RHrGTCZgVn9/ACg34OQWqDFuZTQy7hXWXWFERAaC4YZID/UOcMPiEW3gqtK99OSmUmB810awsjDD35dSMPjbQ4i7myNRlURE+kkQRdGk7ifNzMyESqVCRkYGlEql1OUQVUitEcucoTg2PgPjVkYjISMP9tYW+O6ldgjycZC6XCKiWlOV398MN0QGKjkzD6+sjMY/tzJgYSbgk8EtMaSth9RlERHViqr8/uZlKSID5axU4NdXg9G3hSsK1SJmrDuFz7afh4aT+xGRiWO4ITJgVpZmWDSsDSY92RgA8O2+K5j48wncK1BLXBkRkXQYbogMnEwm4I1eTfHF84GwNJNh+5lEPP9dJJIyOckfEZkmhhsiIzG4jQd+HtcBDjaWOB2fgYGLDiE2PkPqsoiI6hzDDZERae/tgI0TO6Oxsy0SM/
Pw3JJI7DiTKHVZRER1iuGGyMg0dLTGHxM74YkmTrhXqMb41cfx3f4rMLEbI4nIhDHcEBkhpcICy0a3x0sdvSCKQPi283j7939QUKSRujQiolrHcENkpMzNZJgbEoDZ/f0hE4Dfom/hpR+PIi2nQOrSiIhqFcMNkZEb3dkHP45uD1u5OY5eS8Wgbw/hyp1sqcsiIqo1DDdEJuDJps74fUIneNhb4frdXAz65hAOX06RuiwiolrBcENkIpq62mFjaGe0aVgPmXlFGPlTFH6JuiF1WURENY7hhsiEONnKsWZcRwxs5Y4ijYiwP07jwy1noeaSDURkRBhuiEyMwsIMC15ohek9HwMA/HDwGl5bFY2c/CKJKyMiqhkMN0QmSBAETOneBAuHtYbcXIZd55Lx7JJI3E6/J3VpRESPjOGGyIT1D3TH2lc7wslWjnMJmRj4zSHE3EyXuiwiokfCcENk4lo3tMemSZ3h52qHO1n5eOG7SGz557bUZRERVRvDDRGhQT0rrJ/QCd39nJFfpMGkNSexcPclLtlARAaJ4YaIAAC2cnMsHdkOYx/3AQDMj7iIab/GIK9QLXFlRERVw3BDRFpmMgHvP+OPjwe1gLlMwMaY2xj+w1GkZOdLXRoRUaUx3BBRKS92aIgVY4KgVJjjeFwaQr45hItJWVKXRURUKQw3RFSmzo2d8MfEzvBytMattHsY8u1h7LuQLHVZREQPxXBDROVq7GyLjRM7I8jHAVn5RRiz/BhWHL4udVlERBViuCGiCtnbWGL12A54rq0HNCIw688zmLkpFkVqjdSlERGVieGGiB7K0lyGz55tiXf6+EEQgJWRcRizIhqZeYVSl0ZEVArDDRFViiAIGN/VF4uHt4WVhRkOXLyDId8exs3UXKlLIyLSwXBDRFXSO8AV68YHw0Upx6XkbAz85hCir6dKXRYRkRbDDRFVWUADFTaFPo6ABkqk5hTgxe+PYsPJW1KXRUQEgOGGiKrJVaXAb68Fo3dzVxSoNZj26ynM33kBGg2XbCAiaTHcEFG1WVua49vhbTCxmy8AYOGey5j8y0ncK+CSDUQkHYYbInokMpmAt3r74fPnAmFhJuCv0wkYujQSyZl5UpdGRCaK4YaIasSzbT2wemwH1LO2wKlbGQj55hDO3s4EAKg1IiKv3MWmmHhEXrkLNS9dEVEtEkRRNKl/ZTIzM6FSqZCRkQGlUil1OURG53pKDsasOIard3JgbWmGUcHe2BgTj4SM/87kuKkUmNXfH70D3CSslIgMSVV+f/PMDRHVKG8nG2yY0BmPN3ZCboEai/df0Qk2AJCYkYcJq09ge2yCRFUSkTFjuCGiGqeytsAPo9rB2tKszOdLThfP2XyWl6iIqMYx3BBRrTh5Ix25Fdw1JQJIyMhD1DVOAEhENYvhhohqRXJW5e6Wquw4IqLKYrgholrhbKeo0XFERJXFcENEtSLIxwFuKgWEh4zbdzEZeYWc9I+Iag7DDRHVCjOZgFn9/QGgwoDz3f6r6LXgAA5eSqmbwojI6DHcEFGt6R3ghsUj2sBVpXvpyU2lwJIRbbD0pbZwVSoQdzcXI348ium/xSA1p0CiaonIWHASPyKqdWqNiKhrqUjOyoOznQJBPg4wkxWfz8nKK8T8nRexIvI6RBGwt7bAe/38MbhNAwjCwy5qEZGpqMrvb4YbItILJ2+kIeyP0zifmAUA6NzYER+FtIC3k43ElRGRPuAMxURkcFo3tMfmyY/jrd5NITeX4dDlu+i14AC+3XcZhWqN1OURkQFhuCEivWFhJsPEbo2xc1oXPN7YCflFGny2/QL6LzyIkzfSpC6PiAwEww0R6R0vRxusGhuEL54PhL21Bc4nZmHw4sOY/ecZZOcXSV0eEek5hhsi0kuCIGBwGw/sntENg9s0gCgCyw9fR88v9iPibJLU5RGRHmO4ISK95mBjiS+eb4XVYzugoYM1EjLyMG5lNMavOo6kTC7dQESlSRpuwsPD0b59e9
jZ2cHZ2RkhISG4cOHCQ/dbt24d/Pz8oFAo0KJFC2zdurUOqiUiKT3exAk7pnbBhG6+MJMJ2H4mET3m78eqI3HQcGVxIrqPpOFm//79CA0NxZEjRxAREYHCwkI8/fTTyMnJKXefw4cPY9iwYRg7dixOnjyJkJAQhISEIDY2tg4rJyIpWFma4e3eftgy+XEEetZDVn4R3t8Yi+e+i8TFpCypyyMiPaFX89zcuXMHzs7O2L9/P7p06VLmmBdeeAE5OTnYsmWLdlvHjh3RqlUrLFmy5KGvwXluiIyDWiNi9ZE4fLb9PHIK1LAwEzC+qy9Cn2wMhYWZ1OURUQ0z2HluMjIyAAAODg7ljomMjESPHj10tvXq1QuRkZFljs/Pz0dmZqbOg4gMn5lMwKhO3oiY3hU9mrmgUC1i4Z7L6PPV3zh8hetUEZkyvQk3Go0GU6dORefOnREQEFDuuMTERLi4uOhsc3FxQWJiYpnjw8PDoVKptA9PT88arZuIpOVezwrfj2yLJSPawNlOjmspOXjx+6N4c90ppHGdKiKTpDfhJjQ0FLGxsVi7dm2NHjcsLAwZGRnax82bN2v0+EQkPUEQ0DvADbtmdMWIjg0BAOuO30KPL/ZjU0w89OjqOxHVAb0IN5MmTcKWLVuwd+9eeHh4VDjW1dUVSUm6c1wkJSXB1dW1zPFyuRxKpVLnQUTGSamwwIchLbB+fDCaONvibk4BXl8bg5E/ReHG3VypyyOiOiJpuBFFEZMmTcKGDRuwZ88e+Pj4PHSf4OBg7N69W2dbREQEgoODa6tMIjIw7bwd8NeUJ/DG04/B0lyGvy+l4OkF+/Hd/iso4jpVREZP0nATGhqK1atXY82aNbCzs0NiYiISExNx79497ZiRI0ciLCxM+/Hrr7+O7du3Y/78+Th//jxmz56N6OhoTJo0SYq3QER6ytJchklPNcH2159Ax0YOyCvUIHzbeQxYdAj/3EqXujwiqkWS3gouCEKZ25ctW4bRo0cDALp16wZvb28sX75c+/y6devw3nvv4fr162jSpAk+++wz9O3bt1KvyVvBiUyPKIpYd/wWPvrrHDLuFUImAKM7+WDG04/BRm4udXlEVAlV+f2tV/Pc1AWGGyLTlZKdj7lbzmJTzG0AQIN6Vpgb0hxP+bk8ZE8ikprBznNDRFSbnGzl+Gpoa6wYEwQPeyvEp9/DmOXRCF1zAslZXKeKyFgw3BCRyen6WH3snNYFr3VpBDOZgL/+SUD3+fux5ugNrlNFZAQYbojIJFlbmiOsbzNsCu2MFg1UyMorwv82nMYLSyNxOZnrVBEZMoYbIjJpAQ1U2DCxE95/xh/WlmY4dj0Nfb76G19GXER+kVrq8oioGthQTET0r1tpuZi56Qz2nE8GADSqb4PwQS3QoZEjgOLFOqOupSI5Kw/OdgoE+TjATFb2XZ9EVLN4t1QFGG6IqCKiKOKv0wmY/edZpGTnAwCGtvdEe28HfL7zAhIy/ms8dlMpMKu/P3oHuElVLpHJYLipAMMNEVVGRm4hPtl+Hr9E3Sh3TMk5m8Uj2jDgENUy3gpORPSIVNYWCB/cAmvHdSz30lPJX4ZzNp+FmndZEekNhhsiogqIQIXBRQSQkJGHqGupdVYTEVWM4YaIqAKVndyPkwAS6Q+GGyKiCjjbKSo1LjY+gyuOE+kJhhsiogoE+TjATaXAw274/v7va3j6ywPY8s9tznJMJDGGGyKiCpjJBMzq7w8ApQKO8O/j2bYecLCxxNWUHExacxL9Fx3E/ot3YGI3oxLpDd4KTkRUCdtjEzBn89ly57nJzi/CD39fxQ9/X0N2fhEAoIOPA97q7Ye2XvZSlU1kNDjPTQUYboiouiozQ/Hd7Hx8u+8KVh2JQ0FRcQ9Oj2YueLNXUzR1tZOibCKjwHBTAYYbIqoLt9Pv4atdl7Du+E1oREAQgEGtGmBaz8fg6WAtdXlEBofhpgIMN0RUly4nZ+OLiAvYejoRAGBhJuDFoIaY9F
QT1LeTS1wdkeFguKkAww0RSeGfW+mYt+MC/r6UAgCwsjDD2Md98GrXRlAqLCSujkj/MdxUgOGGiKR0+HIKPt1xAadupgMAVFYWmNjNF6M6eUNhYSZtcUR6jOGmAgw3RCQ1URSx40wSPt95AZeTswEALko5Xu/+GJ5r5wELM87SQfQghpsKMNwQkb5Qa0T8ceIWFuy6hPj0ewAAb0drTH+6KZ5p4QZZOQt2EpkihpsKMNwQkb7JL1JjzdEbWLTnMu7mFAAA/N2UeLN3U3R7rD4EgSGHiOGmAgw3RKSvsvOL8NPBa1h64Kp2IsAgHwe83bsp2no5SFwdkbQYbirAcENE+i41pwCL913Gisj/JgLs7ueMN3o1RTM3/rtFponhpgIMN0RkKG6n38PXuy9h3fFbUGtECAIwMNAd03s2RUNHTgRIpoXhpgIMN0RkaK7cycYXOy/ir9MJAABzmYBhQQ0xuXtjONspJK6OqG4w3FSA4YaIDNXpWxmYt/MCDly8A6B4IsCXO3vjta6+UFlxIkAybgw3FWC4ISJDF3nlLj7bcR4nb6QDKJ4IcHxXX4zu5A0rS04ESMaJ4aYCDDdEZAxEUUTE2eKJAC8mFU8E6Gwnx5TuTfBCe09OBEhGh+GmAgw3RGRM1BoRm2Li8UXERdxKK54I0MvRGtN7Pob+Ld05ESAZDYabCjDcEJExyi9S45ejN7Bo72WkZBdPBNjMTYm3ejVFt6acCJAMH8NNBRhuiMiY5dw3EWDWvxMBtve2x1u9/dDeu3giQLVGRNS1VCRn5cHZToEgHweY8QwP6TmGmwow3BCRKUjLKcCS/Vew/PB15P87EeBTfs4IbuSInw5dQ0JGnnasm0qBWf390TvATapyiR6K4aYCDDdEZEoSM/Lw1e5L+C36JtSasv+5Lzlns3hEGwYc0ltV+f3NdnoiIiPmqlIgfHALbH/9CSgsyv4nvyTyzNl8ttwARGRIGG6IiExASnYB8go15T4vAkjIyEPUtdS6K4qoljDcEBGZgOSsvIcPAjBzUyx+i76JzLzCWq6IqPYw3BARmYDKrkF1KTkbb63/B+0+3IXQn09g55lE7crkRIbCXOoCiIio9gX5OMBNpUBiRh7K6qoRANS3k+OlYC9sirmNy8nZ+Ot0Av46nYB61hbo18INIa0boG1De04MSHqPd0sREZmI7bEJmLD6BADoBJwH75YSRRFnbmdi48l4bDp1G3ey8rVjPeytMLCVOwa1boDGznZ1VzyZPN4KXgGGGyIyZdtjEzBn89lKz3Oj1oiIvHIXG07GY3tsAnIK1NrnmrsrMah1A/QPdIeLsnKXvYiqi+GmAgw3RGTqqjtD8b0CNXadS8LGk/HYf/EOiv69bVwmAJ18nRDSugF6NXeBncKitt8CmSCGmwow3BARPbrUnAL89c9tbIy5jeNxadrtcnMZevq7IKRVA3R5rD4szXnfCtUMhpsKMNwQEdWsuLs52BRzGxtj4nH1To52u721BZ5p6Y6Q1u5o09Cei3fSI2G4qQDDDRFR7RBFEafjM7Dx5G38eeo2UrL/a0Ru6GCNga3cMbBVAzR2tpWwSjJUDDcVYLghIqp9RWoNDl+5i40n47H9TCJy72tEbtFAhZDWDdA/0K3S8+8QMdxUgOGGiKhu5RYUIeJscSPygUsp2vWrZALQubETBrVugKebu8JWzqnXqHwMNxVguCEikk5Kdj7++icBG2PicfJGuna7wkKGp/1dEdLaHU80qQ8LMzYiky6Gmwow3BAR6YfrKf81Il9L+a8R2cHGEv1bumFg6wZo7Vmvwkbk6t7WToaH4aYCDDdERPpFFEWcupWBjSfjsfnUbdzNKdA+5+VojYGtGiCklTsa1ddtRK7qhIRk2BhuKsBwQ0Skv4rUGhy8nIKNJ+Ox40wS7hX+14gc6FHciPxMS3ccj0vFhNUnSq2T9eBSEmQ8GG4qwHBDRGQYcvKLG5E3nIzHwcu6jcjmZrJyVysXALiqFD
j49lO8RGVEqvL7m63pRESkl2zk5ghp3QAhrRvgTlY+tvw7I/Kpm+nlBhugeFHQhIw8RF1LRbCvY90VTHqD7ehERKT36tvJ8XJnH2wK7Yx3+zar1D5Hr91FfpH64QPJ6PDMDRERGZSABqpKjVuw6xK+3XcFLRuo0NbLXvtwtJXXcoUkNYYbIiIyKEE+DnBTKZCYkVeqobiE3FwGa0szpOUWIjouDdH3Le7p42SDtl72aPdv2PGtbwsZe3OMChuKiYjI4GyPTcCE1ScAQCfg3H+3VK/mrrh+NxfR11NxPC4Nx+PScCk5u9SxVFYWOmd2Aj3qwcrSrPbfBFUJ75aqAMMNEZFxqM48N+m5BThxozjoRF9Pw6lb6cgr1G1ONpcJaN5AhbYN7dHOu/gMj7OSa2BJjeGmAgw3RETG41FnKC5Ua3D2diai49JwPC4V0dfTkJyVX2qch71V8WUsbwe0bWiPpq52vM28jjHcVIDhhoiIyiOKIm6l3dNexoqOS8P5xEw8+JvSTm6OVg3r/du744BWDetx4c9axnBTAYYbIiKqiqy8QsTcTEf09eLAc/JGGnIKdG8xlwmAn6sS7bz/691pUM+qwnWx7sc1sh6O4aYCDDdERPQo1BoR5xMz/zu7cz0N8en3So1zVSrQ1tte27vTzE1Z5mrnXCOrchhuKsBwQ0RENS0xI+/fy1jFd2aduZ2pXS6ihJWFGVp5Fl/KauttjzYN7RF5JYVrZFUSw00FGG6IiKi25RYU4dTNDByP++829My8olLjzGUCijRl/xrmGlm6GG4qwHBDRER1TaMRcflOtvYy1vG4VFy/m1upfbs85gQ/VyXsrS3haGMJextLONhY/PuxHHYKc72ZhLA2e4cYbirAcENERPpg9ZE4vLcx9pGPYyYTYG9dHHbsbe4LQGV+bAFHG3mtTFJY271DXBWciIhIz/nWt63UuKHtPWGnMMfdnAKk5RQgNbew+L85BcjOL4JaIyIluwAp2QWVfm2FhUwbfhz+fdhb//vfkkCk/bg4OJXVDF2iZMboB8+WJGbkYcLqE3XeO8RwQ0REJIGHrZFV0nPz0aAW5V7ayS9SIz23EKn/hp3UnAKk5f7735yC4kCUW4DUnP8CUYFag7xCDW5n5OH2fWdZHkapMNeGHwfr/0KRytoC3+2/WuZ7EP99H3M2n0VPf9c66x1iuCEiIpKAmUzArP7+mLD6BASUvUbWrP7+FQYCubkZXJRmcKnk8hCiKCKnQK0NOg8GorI+Tr9XCFEEMvOKkJlXVOleIe1rAkjIyEPUtVQE+zpWad/qYrghIiKSSO8ANywe0aZUr4prLc1zIwgCbOXmsJWbw9PBulL7qDUiMu4VakPP3Wzds0Onbqbj2H2rrpcnOavyZ4kelaTh5sCBA5g3bx6OHz+OhIQEbNiwASEhIeWO37dvH5588slS2xMSEuDq6lqLlRIREdWO3gFu6OnvqrczFJvJBO0lqLJEXrmLYd8feehxnO3qbvFRScNNTk4OAgMDMWbMGAwePLjS+124cEGnU9rZ2bk2yiMiIqoTZjKhzi7Z1LTK9g4F+TjUWU2Shps+ffqgT58+Vd7P2dkZ9erVq/mCiIiIqEpqoneoppV/X5cea9WqFdzc3NCzZ08cOnSowrH5+fnIzMzUeRAREVHNKekdclXpXnpyVSkkWULCoBqK3dzcsGTJErRr1w75+fn44Ycf0K1bNxw9ehRt2rQpc5/w8HDMmTOnjislIiIyLfrUO6Q3MxQLgvDQhuKydO3aFQ0bNsSqVavKfD4/Px/5+fnajzMzM+Hp6ckZiomIiAyISc1QHBQUhIMHD5b7vFwuh1wur8OKiIiISEoG2XNzv5iYGLi5cTl4IiIiKibpmZvs7GxcvnxZ+/G1a9cQExMDBwcHNGzYEGFhYYiPj8fKlSsBAAsWLICPjw+aN2+OvLw8/PDDD9izZw927twp1VsgIiIiPSNpuImOjtaZlG/69OkAgFGjRmH58u
VISEjAjRs3tM8XFBRgxowZiI+Ph7W1NVq2bIldu3aVObEfERERmSa9aSiuK1VpSCIiIiL9UJXf3wbfc0NERER0P4YbIiIiMioMN0RERGRUGG6IiIjIqBj8JH5VVdI/zTWmiIiIDEfJ7+3K3AdlcuEmKysLAODp6SlxJURERFRVWVlZUKlUFY4xuVvBNRoNbt++DTs7OwhCzS7mVbJu1c2bN3mbuR7g10O/8OuhX/j10D/8mlRMFEVkZWXB3d0dMlnFXTUmd+ZGJpPBw8OjVl9DqVTyG1OP8OuhX/j10C/8eugffk3K97AzNiXYUExERERGheGGiIiIjArDTQ2Sy+WYNWsW5HK51KUQ+PXQN/x66Bd+PfQPvyY1x+QaiomIiMi48cwNERERGRWGGyIiIjIqDDdERERkVBhuiIiIyKgw3NSQb775Bt7e3lAoFOjQoQOioqKkLslkhYeHo3379rCzs4OzszNCQkJw4cIFqcuif33yyScQBAFTp06VuhSTFR8fjxEjRsDR0RFWVlZo0aIFoqOjpS7LJKnVarz//vvw8fGBlZUVfH19MXfu3Eqtn0TlY7ipAb/++iumT5+OWbNm4cSJEwgMDESvXr2QnJwsdWkmaf/+/QgNDcWRI0cQERGBwsJCPP3008jJyZG6NJN37NgxfPfdd2jZsqXUpZistLQ0dO7cGRYWFti2bRvOnj2L+fPnw97eXurSTNKnn36KxYsXY9GiRTh37hw+/fRTfPbZZ1i4cKHUpRk03gpeAzp06ID27dtj0aJFAIrXr/L09MTkyZPxzjvvSFwd3blzB87Ozti/fz+6dOkidTkmKzs7G23atMG3336LDz/8EK1atcKCBQukLsvkvPPOOzh06BD+/vtvqUshAM888wxcXFzw448/arcNGTIEVlZWWL16tYSVGTaeuXlEBQUFOH78OHr06KHdJpPJ0KNHD0RGRkpYGZXIyMgAADg4OEhciWkLDQ1Fv379dH5WqO79+eefaNeuHZ577jk4OzujdevW+P7776Uuy2R16tQJu3fvxsWLFwEAp06dwsGDB9GnTx+JKzNsJrdwZk1LSUmBWq2Gi4uLznYXFxecP39eoqqohEajwdSpU9G5c2cEBARIXY7JWrt2LU6cOIFjx45JXYrJu3r1KhYvXozp06fjf//7H44dO4YpU6bA0tISo0aNkro8k/POO+8gMzMTfn5+MDMzg1qtxkcffYThw4dLXZpBY7ghoxYaGorY2FgcPHhQ6lJM1s2bN/H6668jIiICCoVC6nJMnkajQbt27fDxxx8DAFq3bo3Y2FgsWbKE4UYCv/32G37++WesWbMGzZs3R0xMDKZOnQp3d3d+PR4Bw80jcnJygpmZGZKSknS2JyUlwdXVVaKqCAAmTZqELVu24MCBA/Dw8JC6HJN1/PhxJCcno02bNtptarUaBw4cwKJFi5Cfnw8zMzMJKzQtbm5u8Pf319nWrFkz/P777xJVZNrefPNNvPPOOxg6dCgAoEWLFoiLi0N4eDjDzSNgz80jsrS0RNu2bbF7927tNo1Gg927dyM4OFjCykyXKIqYNGkSNmzYgD179sDHx0fqkkxa9+7dcfr0acTExGgf7dq1w/DhwxETE8NgU8c6d+5camqEixcvwsvLS6KKTFtubi5kMt1fxWZmZtBoNBJVZBx45qYGTJ8+HaNGjUK7du0QFBSEBQsWICcnBy+//LLUpZmk0NBQrFmzBps2bYKdnR0SExMBACqVClZWVhJXZ3rs7OxK9TvZ2NjA0dGRfVASmDZtGjp16oSPP/4Yzz//PKKiorB06VIsXbpU6tJMUv/+/fHRRx+hYcOGaN68OU6ePIkvvvgCY8aMkbo0g8ZbwWvIokWLMG/ePCQmJqJVq1b4+uuv0aFDB6nLMkmCIJS5fdmyZRg9enTdFkNl6tatG28Fl9CWLVsQFhaGS5cuwcfHB9OnT8e4ceOkLsskZWVl4f3338eGDRuQnJwMd3d3DBs2DD
NnzoSlpaXU5RkshhsiIiIyKuy5ISIiIqPCcENERERGheGGiIiIjArDDRERERkVhhsiIiIyKgw3REREZFQYboiIiMioMNwQkckTBAEbN26UugwiqiEMN0QkqdGjR0MQhFKP3r17S10aERkori1FRJLr3bs3li1bprNNLpdLVA0RGTqeuSEiycnlcri6uuo87O3tARRfMlq8eDH69OkDKysrNGrUCOvXr9fZ//Tp03jqqadgZWUFR0dHvPrqq8jOztYZ89NPP6F58+aQy+Vwc3PDpEmTdJ5PSUnBoEGDYG1tjSZNmuDPP/+s3TdNRLWG4YaI9N7777+PIUOG4NSpUxg+fDiGDh2Kc+fOAQBycnLQq1cv2Nvb49ixY1i3bh127dqlE14WL16M0NBQvPrqqzh9+jT+/PNPNG7cWOc15syZg+effx7//PMP+vbti+HDhyM1NbVO3ycR1RCRiEhCo0aNEs3MzEQbGxudx0cffSSKoigCEMePH6+zT4cOHcQJEyaIoiiKS5cuFe3t7cXs7Gzt83/99Zcok8nExMREURRF0d3dXXz33XfLrQGA+N5772k/zs7OFgGI27Ztq7H3SUR1hz03RCS5J598EosXL9bZ5uDgoP3/4OBgneeCg4MRExMDADh37hwCAwNhY2Ojfb5z587QaDS4cOECBEHA7du30b179wpraNmypfb/bWxsoFQqkZycXN23REQSYrghIsnZ2NiUukxUU6ysrCo1zsLCQudjQRCg0WhqoyQiqmXsuSEivXfkyJFSHzdr1gwA0KxZM5w6dQo5OTna5w8dOgSZTIamTZvCzs4O3t7e2L17d53WTETS4ZkbIpJcfn4+EhMTdbaZm5vDyckJALBu3Tq0a9cOjz/+OH7++WdERUXhxx9/BAAMHz4cs2bNwqhRozB79mzcuXMHkydPxksvvQQXFxcAwOzZszF+/Hg4OzujT58+yMrKwqFDhzB58uS6faNEVCcYbohIctu3b4ebm5vOtqZNm+L8+fMAiu9kWrt2LSZOnAg3Nzf88ssv8Pf3BwBYW1tjx44deP3119G+fXtYW1tjyJAh+OKLL7THGjVqFPLy8vDll1/ijTfegJOTE5599tm6e4NEVKcEURRFqYsgIiqPIAjYsGEDQkJCpC6FiAwEe26IiIjIqDDcEBERkVFhzw0R6TVeOSeiquKZGyIiIjIqDDdERERkVBhuiIiIyKgw3BAREZFRYbghIiIio8JwQ0REREaF4YaIiIiMCsMNERERGRWGGyIiIjIq/weYHRB/lAyn+QAAAABJRU5ErkJggg==\n"},"metadata":{}}],"source":["import matplotlib.pyplot as plt\n","\n","# Make a dataframe of the log history\n","metrics_df = pd.DataFrame(trainer.state.log_history)\n","\n","# Plot\n","metrics_df['eval_loss'].plot(marker='o')\n","plt.xlabel('Epoch')\n","plt.ylabel('Loss')\n","plt.title('Training Loss for GPT-2 Yoda model')\n","plt.show()"]},{"cell_type":"markdown","metadata":{"id":"_0pOHaHH9dEU"},"source":["Finally, we should save our model from memory locally to disk. 
Later, we'll push the model to the Hugging Face hub:"]},{"cell_type":"code","execution_count":48,"metadata":{"id":"GfmFGU8_9cfR","executionInfo":{"status":"ok","timestamp":1720650212057,"user_tz":240,"elapsed":4290,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[],"source":["trainer.save_model()"]},{"cell_type":"markdown","metadata":{"id":"n0MQWcw39oTm"},"source":["Now, if we take a look at the folder, we can see the (updated) model weights have been saved locally, as well as the different types of model configuration files that Hugging Face expects. Here, since we set the output directory to be `yoda-gpt2` in the `TrainingArguments`, we can find the saved model and files there:"]},{"cell_type":"code","execution_count":49,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Th1z_gEm91bo","outputId":"32551280-f35f-4805-97c6-070a58762a5f","executionInfo":{"status":"ok","timestamp":1720650212057,"user_tz":240,"elapsed":5,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["config.json  generation_config.json  model.safetensors\truns  training_args.bin\n"]}],"source":["!ls yoda-gpt2"]},{"cell_type":"code","execution_count":50,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"l805ZAMk-SPD","outputId":"7822ecdb-f5f8-4e5e-cc0a-8ae5b864a1c7","executionInfo":{"status":"ok","timestamp":1720650212480,"user_tz":240,"elapsed":426,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["475M\tyoda-gpt2/model.safetensors\n"]}],"source":["# How big is our model?\n","!du -h yoda-gpt2/model.safetensors"]},{"cell_type":"markdown","metadata":{"id":"X5L6Df-t_ax2"},"source":["### Testing our fine-tuned model\n","Now that we've done some (very) quick fine-tuning, let's check out the results by reloading the model and generating some text as we covered earlier. 
Hopefully our tuned GPT-2 model will have taken on some of the qualities of how a Jedi master speaks!\n","\n","First, we can do this in the most straightforward way using a pipeline:"]},{"cell_type":"code","execution_count":51,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":154},"id":"sOS8mluw_r6Q","outputId":"b1d40311-62a0-4fb8-edc7-eef8900c86c2","executionInfo":{"status":"ok","timestamp":1720650215745,"user_tz":240,"elapsed":3268,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.\n"]},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Luke, you must know that you are not. The Jedi must confront you. The Sith must destroy you. The dark side must be eliminated. Your"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Luke, you must know that you are not alone. Fear is the path to the dark side. Fear leads to anger. Anger leads to suffering."},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Luke, you must know that the Jedi are my closest friends.... Your training has paid off.... Your faith in me has"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Luke, you must know that Master Kenobi is not. Already, a long time has passed. The dark side of the Force has been discovered. 
The"},"metadata":{}}],"source":["from transformers import pipeline\n","from IPython.display import Markdown\n","\n","# Input\n","input_text = \"Luke, you must know that\"\n","\n","# Load the model into a pipeline; the `model` argument here is a local path, not a model name on the Hugging Face hub\n","yodagpt2 = pipeline('text-generation', model='./yoda-gpt2', tokenizer=tokenizer, max_length=30)\n","\n","# Generate the outputs\n","outputs = yodagpt2(input_text, num_return_sequences=4, do_sample=True, temperature=0.8, top_k=5, top_p=0.9)\n","\n","# Print\n","for output in outputs:\n","  display(Markdown(output['generated_text']))"]},{"cell_type":"markdown","metadata":{"id":"dlIYCJzwAA1B"},"source":["That's looking pretty good! But we see a lot of repetition here. Let's try using the tokenizer and model directly, to apply some of the different decoding strategies we learned earlier for more varied outputs:"]},{"cell_type":"code","execution_count":52,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":310},"id":"bMuIkYatAMyE","outputId":"1cc4fc88-54ea-4da2-b670-2e013938c77c","executionInfo":{"status":"ok","timestamp":1720650216058,"user_tz":240,"elapsed":318,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n","Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"]},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Sampling"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Luke, you must know that you must, for the Sith Lord is on my side. The boy I have trained. Already a master, I have. 
But not yet ready to"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Luke, you must know that the Force is with you. The boy you call Skywalker, he is. May the Force be with you. May the Force rest in your hands."},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Luke, you must know that the Force is with you. May the Force be with you. May all who fear it have. May the Force rest in your power.  Chancellor"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Luke, you must know that you are not alone. The rest of the Jedi, they will not let you escape. The Sith Lord, he will. They will not let you"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Beam Search"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Luke, you must know that the Force is with you. May the Force be with you. May freedom and security be with you.  Chancellor Palpatine    "},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Luke, you must know that the Force is with you. May the Force be with you. May freedom and security be with you.  Chancellor Palpatine is gone. "},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Luke, you must know that the Force is with you. May the Force be with you. May all that remains of you. 
May all who fear the dark side of the Force"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Luke, you must know that the Force is with you. May the Force be with you. May freedom and security be with you.  Chancellor Palpatine    May"},"metadata":{}}],"source":["# You can either reload the model or use it directly from the trainer object\n","model = trainer.model\n","\n","# Input\n","input_text = \"Luke, you must know that\"\n","\n","# Generate the model inputs (token ids)\n","model_inputs = tokenizer(input_text, return_tensors=\"pt\").to(device)\n","\n","# Create the outputs with sampling\n","sample_outputs = model.generate(\n","    **model_inputs,\n","    max_new_tokens=30,\n","    do_sample=True,\n","    temperature=0.8,\n","    no_repeat_ngram_size=5,\n","    top_p=0.9,\n","    top_k=5,\n","    num_return_sequences=4,\n",")\n","\n","# Create the outputs with beam search (no sampling)\n","beam_outputs = model.generate(\n","    **model_inputs,\n","    max_new_tokens=30,\n","    num_beams=5,\n","    no_repeat_ngram_size=5,\n","    early_stopping=True,\n","    num_return_sequences=4,\n",")\n","\n","# Iterate over outputs, decode with tokenizer and print\n","display(Markdown(\"Sampling\"))\n","display(Markdown(\"---\"))\n","for output in sample_outputs:\n","  display(Markdown(tokenizer.decode(output, skip_special_tokens=True)))\n","\n","display(Markdown(\"Beam Search\"))\n","display(Markdown(\"---\"))\n","for output in beam_outputs:\n","  display(Markdown(tokenizer.decode(output, skip_special_tokens=True)))"]},{"cell_type":"markdown","metadata":{"id":"5Xmn3ukrd667"},"source":["Before moving into the next section, please restart the Colab runtime here to clear RAM, then run the cell below to re-import the libraries and set the device:"]},{"cell_type":"code","execution_count":53,"metadata":{"id":"LHoqQhXOd-s3","executionInfo":{"status":"ok","timestamp":1720650216058,"user_tz":240,"elapsed":4,"user":{"displayName":"Myles 
Harrison","userId":"13636460506782883737"}}},"outputs":[],"source":["import torch\n","from IPython.display import Markdown\n","from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline\n","\n","# Check if GPU is available\n","device = \"cuda\" if torch.cuda.is_available() else \"cpu\""]},{"cell_type":"markdown","metadata":{"id":"OD4_6DfBrgFH"},"source":["### Model Quantization and Parameter Efficient Fine-tuning (PEFT)\n","\n","In this section we will explore two techniques that make working with LLMs more tractable without extensive compute or memory requirements: *model quantization* and *parameter efficient fine-tuning*.\n","\n","While the former can be used on its own to reduce the cost of model predictions, or *inference*, it is often used in conjunction with the latter to reduce the computational requirements for training very large models."]},{"cell_type":"markdown","metadata":{"id":"02jGnBQZWDp-"},"source":["#### Model Quantization\n","\n"," Quantization is a model size reduction technique that loads the model weights at a lower precision than they were originally trained in, reducing the storage and compute costs of the model.\n","\n"," This can mean taking a model whose weights were originally stored in a high-precision format, such as 32- or 16-bit floating point, and converting them to a lower-precision format such as 8-bit (or even 4-bit!) 
integer values.\n","\n"," Surprisingly, even with this loss of precision, many sophisticated LLMs are still found to perform quite well after quantization.\n","\n"," Model quantization approaches do not simply round or truncate the weight values; instead, each value is converted using a formula such as the one below, from the [affine quantization scheme](https://pytorch.org/blog/quantization-in-practice/#affine-and-symmetric-quantization-schemes):\n","\n"," $x_q = \\mathrm{round}(x/S + Z)$\n","\n"," where:\n"," - $x_q$ is the quantized value\n"," - $x$ is the original value\n"," - $S$ is the *scale factor*\n"," - $Z$ is the zero point\n","\n","Suitable values for $S$ and $Z$ are determined during calibration, as part of the quantization process.\n","\n","There are also well-known, more sophisticated approaches to quantization, such as [GPTQ](https://arxiv.org/abs/2210.17323) and [Activation-aware Weight Quantization (AWQ)](https://github.com/mit-han-lab/llm-awq), which have been applied to create quantized versions of popular models that are highly optimized for inference. In the Hugging Face community, Tom Jobbins (*a.k.a.* [The Bloke](https://huggingface.co/TheBloke)) is well known for producing many quantized versions and variations of popular models.\n","\n","Let's try loading a large version of GPT-2 using quantization, and then do some inference. Here we will use [GPT2-XL](https://huggingface.co/gpt2-xl), the 1.5B parameter version of GPT-2, which weighs in at ~6.5GB! 
All we need to do is pass the `load_in_4bit=True` parameter when loading the model to apply quantization:"]},{"cell_type":"code","execution_count":54,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":296,"referenced_widgets":["0dcb7fffb9b0495f9f4adf85c02245d3","b1db4986f0b549c197e1129c89667ae1","f46b619bfc9b4552ae9b6cbedccac8bf","03164eb4f5034d0c8d9cc2479553ebb0","53e893a6c87d44bca93221d8b9c583e1","7fa95c2ec7864241b6103519d488a285","ab09c12c83ed46b099ddaab9cb5b49fa","f5979750d11f4d6bb3f5ffcc778830d2","b6096e03ae914c1d993296d2061329be","cd5ea94612f14b2589d22f1854f6fac0","eae55e6e68bd49a6962391704655ac8d","14aea29593a04145a3981b522369786c","48c65536a4094a8bb74823fd6d227976","2b0d10dc0e13427a98323550a0be199d","427b2c89bb5d46b6a4fca03896b86f18","28aa64433a664e69a725015499aa7a1a","0dae19e80bc0495baec5322c42586992","2d290cc096a84cd19423e4d29bf85ed3","8cf1cc5b2b2a4f43aa253f3889bfbda7","7aaf5bfff3814974aec8bb3022b6e5fe","d2d75e495f2c48e6a65e57f5bbfe1090","2680d1a8aad740db83306b120bc3a20a","eb57ae844d29483cb96635a69f6d1282","574502bd607e4b229a1664487e47fb0e","e51a47c79ef2403a9ee35ad35d604784","99be4e9442a84d9eaae6d06a0a2c2da6","c8fd1e11880448babb8c970e5d11fcc6","4b6470723a054d689d1a931bd11ccd1c","963807e2ff784eb19ccb2cf4945f488d","e37833bf6a9c498da4388e78447b8a8f","3043a9b492624f6084f604cbcdb00027","53e9e289fa454064a430413b92cc40b0","c290a082557e43958c595f21f944be5d","ae49256870804b598044a5fe62db702c","76db057d3bdb4a638c981a6d786a8b2d","669d9d52452a40279529804a6854cc95","6d75175fe62441029dd292b50029f374","bfbe5c61dffe453f88844cf27886f2e8","138881dc73974535966d8930481c827a","e40e90db06d44b7983e2b7683769b290","2c38b459f4224e558e655f6869d0c8ca","cb723e3be81846cca3087f1b038153cf","9b286cad4c304e2b97446d131771c1fe","53279967d41d4d529cafea3c7b5d898c","66afc2e6d8ef49f1843bc8b4ebac4ebb","574ac2f6f62343b2ba522d4d08beec6e","34701102e23e4faebd4be3c26e820c10","f728e256cbb146f7824824f86a491ebe","82a77e91e439419cb347909a6d6cb9ec","1c4aad3119754d65b8565784df6f597b","9befbd
0cd2054f3eb5cad58f9ec829f3","736659c3c0b641bd8266e195d4529ee1","23a4c560ac5b4cd093994f5be3f3b685","32319c2ba2764a6b9da5f34f067665d6","7bb0e75490054a308e9de11b2c6c5552","e4ebf545cb4e42fa9cb4e233aaec1bab","55bc43763cc54d8bb01857c6e0a6f5a6","a9f222ff32b04f74a5f4c01906cac590","bc214017673b4a909d4013a897a634f7","d5c1dd35b3494cae988dea5451c73d2c","56d7ad25e5c44163aa898f98af9cae2c","d88d7403cb984fcdb837a266222ebde3","b0c01a122cf6410491b85c53e4b14272","87adabdb77554daeabc723de9eafd3ef","4bdc3594652d41aa94276931ba2589db","45b1123a2fdc4b6dbe24470b93700985","e9ad5d43fdb5486c910aa0b8c641805d","ef2cf9b9aa3140b785e30e6498627e8b","d5967ae13e1745b7880a759cf9234708","5153b13ab8d54d24b25976a89af4b163","e166360141c344ce8cded7017f321e5b","e6e7564fc1aa4ae185a71bfb5fe4f1b2","a868a36e05984b05ac2c141f07bb76fb","8325eb82b82245a8bf19198fcdd21793","d62c9c86ee9e411ca081dc18f9feecb0","10e2642df3844b2399d4c0b186eca9e4","47667165042347999111fb15267e71ab"]},"id":"aXB4P3JYqJyC","outputId":"604c209b-12c4-4bf4-abf1-4180b7ce1662","executionInfo":{"status":"ok","timestamp":1720650295127,"user_tz":240,"elapsed":79073,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"display_data","data":{"text/plain":["config.json:   0%|          | 0.00/689 [00:00<?, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"0dcb7fffb9b0495f9f4adf85c02245d3"}},"metadata":{}},{"output_type":"stream","name":"stderr","text":["The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. 
Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.\n","`low_cpu_mem_usage` was None, now set to True since model is quantized.\n"]},{"output_type":"display_data","data":{"text/plain":["model.safetensors:   0%|          | 0.00/6.43G [00:00<?, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"14aea29593a04145a3981b522369786c"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"eb57ae844d29483cb96635a69f6d1282"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"ae49256870804b598044a5fe62db702c"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"66afc2e6d8ef49f1843bc8b4ebac4ebb"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"e4ebf545cb4e42fa9cb4e233aaec1bab"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"e9ad5d43fdb5486c910aa0b8c641805d"}},"metadata":{}}],"source":["model_id = \"gpt2-xl\"\n","\n","model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)\n","tokenizer = AutoTokenizer.from_pretrained(model_id)\n","tokenizer.pad_token = 
tokenizer.eos_token"]},{"cell_type":"code","execution_count":55,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":36},"id":"31Bl4RWLarI2","outputId":"8e9abe49-e364-418c-8c32-9003dacd55df","executionInfo":{"status":"ok","timestamp":1720650295639,"user_tz":240,"elapsed":524,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["'1,557,611,200'"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"string"}},"metadata":{},"execution_count":55}],"source":["f\"{model.num_parameters():,}\""]},{"cell_type":"markdown","metadata":{"id":"nTIdWNuXau6O"},"source":["We can see that each linear layer inside the transformer blocks has been replaced with a 4-bit layer (`Linear4bit`), while the final `lm_head` remains a regular `Linear` layer:"]},{"cell_type":"code","execution_count":56,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"JYANkpMpauGB","outputId":"face568a-998f-4acd-9810-e6ebd66ef3ed","executionInfo":{"status":"ok","timestamp":1720650295640,"user_tz":240,"elapsed":8,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["GPT2LMHeadModel(\n","  (transformer): GPT2Model(\n","    (wte): Embedding(50257, 1600)\n","    (wpe): Embedding(1024, 1600)\n","    (drop): Dropout(p=0.1, inplace=False)\n","    (h): ModuleList(\n","      (0-47): 48 x GPT2Block(\n","        (ln_1): LayerNorm((1600,), eps=1e-05, elementwise_affine=True)\n","        (attn): GPT2Attention(\n","          (c_attn): Linear4bit(in_features=1600, out_features=4800, bias=True)\n","          (c_proj): Linear4bit(in_features=1600, out_features=1600, bias=True)\n","          (attn_dropout): Dropout(p=0.1, inplace=False)\n","          (resid_dropout): Dropout(p=0.1, inplace=False)\n","        )\n","        (ln_2): LayerNorm((1600,), eps=1e-05, elementwise_affine=True)\n","        (mlp): GPT2MLP(\n","          (c_fc): Linear4bit(in_features=1600, out_features=6400, bias=True)\n","          (c_proj): 
Linear4bit(in_features=6400, out_features=1600, bias=True)\n","          (act): NewGELUActivation()\n","          (dropout): Dropout(p=0.1, inplace=False)\n","        )\n","      )\n","    )\n","    (ln_f): LayerNorm((1600,), eps=1e-05, elementwise_affine=True)\n","  )\n","  (lm_head): Linear(in_features=1600, out_features=50257, bias=False)\n",")"]},"metadata":{},"execution_count":56}],"source":["model"]},{"cell_type":"markdown","metadata":{"id":"mHX0cWrqatnO"},"source":["Now, let's generate some text. We already know how to do this, so let's reuse some of our earlier code, sampling with temperature, top-k and top-p:"]},{"cell_type":"code","execution_count":57,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":241},"id":"jAoWTFoLbCtT","outputId":"0f7782e9-854e-4381-e332-4063b7ea9a66","executionInfo":{"status":"ok","timestamp":1720650299485,"user_tz":240,"elapsed":3851,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n","/usr/local/lib/python3.10/dist-packages/bitsandbytes/nn/modules.py:426: UserWarning: Input type into Linear4bit is torch.float16, but bnb_4bit_compute_dtype=torch.float32 (default). This will lead to slow inference or training speed.\n","  warnings.warn(\n"]},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain, which is now in its third day, has brought down some of the worst flooding the country has seen for a century.\n\nIn the capital"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain is not a problem, as long as the wind is not strong. 
The wind is strong enough to make it a problem for some people, and it"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"The rain in Spain is a lot more than the rain in the US,\" he says. \"It's a very different experience.\"\n\nThe Spanish weather is so unpredictable"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}}],"source":["# Generate the model inputs (token ids)\n","input_text = \"The rain in Spain\"\n","\n","model_inputs = tokenizer(input_text, return_tensors=\"pt\").to(device)\n","\n","# Create the outputs with sampling\n","sample_outputs = model.generate(\n","    **model_inputs,\n","    max_new_tokens=30,\n","    do_sample=True,\n","    temperature=0.8,\n","    no_repeat_ngram_size=5,\n","    top_p=0.9,\n","    top_k=5,\n","    num_return_sequences=3,\n",")\n","\n","# Iterate over outputs, decode with tokenizer and print\n","for output in sample_outputs:\n","  display(Markdown(tokenizer.decode(output, skip_special_tokens=True)))\n","  display(Markdown(\"---\"))"]},{"cell_type":"markdown","metadata":{"id":"f69oL29GdXsf"},"source":["Great! 
We can now efficiently perform inference using very large LLMs via quantization 👍 You can read more about loading quantized models in the Hugging Face documentation here: [Quantize 🤗 Transformers models](https://huggingface.co/docs/transformers/main_classes/quantization)"]},{"cell_type":"markdown","metadata":{"id":"p9I8-fdJeE5T"},"source":["#### Fine-tuning a quantized LLM using PEFT\n","\n","Now that we have successfully loaded a very large transformer model, let's move on to seeing how we can fine-tune it using a parameter-efficient fine-tuning, or [PEFT](https://github.com/huggingface/peft), approach.\n","\n","Luckily for us, this is easy to do in Hugging Face using the `peft` library, which we installed at the beginning of this notebook.\n","\n","Here we will fine-tune a quantized version of GPT2-XL using [Low-Rank Adaptation (LoRA)](https://github.com/microsoft/LoRA). This approach was first introduced by researchers from Microsoft in 2021.\n","\n","Instead of updating all the weights in a model, LoRA is a reparameterization method that freezes the original weights and introduces a much smaller number of new weights, in the form of two low-rank matrices whose product represents the update to a weight matrix. Only these new weights are updated during the training process.<br/><br/>\n","\n","<center>\n","<img src=\"https://drive.google.com/uc?export=download&id=1H4KTXgCWHRwSCk1uxelCDsQCfziJCogl\" width=\"50%\"/>\n","</center>\n","<center>\n","<caption> Diagram of LoRA, from the original paper </caption>\n","</center>"]},{"cell_type":"markdown","metadata":{"id":"rNwPHQul8il8"},"source":["As noted above, LoRA is implemented for us in the `peft` library.\n","\n","First, we need to prepare the model for quantized training:"]},{"cell_type":"code","execution_count":58,"metadata":{"id":"H8r6bgN5dQLt","executionInfo":{"status":"ok","timestamp":1720650299485,"user_tz":240,"elapsed":5,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[],"source":["from peft import 
prepare_model_for_kbit_training\n","\n","# Reload the quantized model and tokenizer\n","#model = AutoModelForCausalLM.from_pretrained(\"gpt2-xl\", load_in_4bit=True)\n","#tokenizer = AutoTokenizer.from_pretrained(\"gpt2-xl\")\n","tokenizer.pad_token = tokenizer.eos_token\n","\n","# Enable gradient checkpointing\n","model.gradient_checkpointing_enable()\n","model = prepare_model_for_kbit_training(model)"]},{"cell_type":"markdown","metadata":{"id":"tDikW2E98-Hy"},"source":["Then, we create a [LoRA configuration](https://huggingface.co/docs/peft/conceptual_guides/lora#common-lora-parameters-in-peft) and apply it to the model with `get_peft_model` to use LoRA in fine-tuning:"]},{"cell_type":"code","execution_count":59,"metadata":{"id":"UOT21FgH8sxW","executionInfo":{"status":"ok","timestamp":1720650299485,"user_tz":240,"elapsed":5,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[],"source":["from peft import LoraConfig, get_peft_model\n","\n","config = LoraConfig(\n","    r=8,\n","    lora_alpha=32,\n","    lora_dropout=0.05,\n","    bias=\"none\",\n","    task_type=\"CAUSAL_LM\"\n",")\n","\n","model = get_peft_model(model, config).to(device)"]},{"cell_type":"markdown","metadata":{"id":"XMbFHiDZ8tpP"},"source":["Using the utility function below, we can take a look at the number of parameters in the model which will be updated as part of the fine-tuning:"]},{"cell_type":"code","execution_count":60,"metadata":{"id":"3byb90Bh87gd","executionInfo":{"status":"ok","timestamp":1720650299486,"user_tz":240,"elapsed":5,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[],"source":["def print_trainable_parameters(model):\n","    \"\"\"\n","    Prints the number of trainable parameters in the model.\n","    \"\"\"\n","    trainable_params = 0\n","    all_param = 0\n","    for _, param in model.named_parameters():\n","        all_param += param.numel()\n","        if param.requires_grad:\n","            trainable_params += param.numel()\n","    
print(\n","        f\"trainable params: {trainable_params:,} || all params: {all_param:,} || trainable%: {100 * trainable_params / all_param}\"\n","    )"]},{"cell_type":"code","execution_count":61,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"BWtaLjvE83gb","outputId":"d30e6ec4-774d-4d62-895b-d7de651d0232","executionInfo":{"status":"ok","timestamp":1720650300281,"user_tz":240,"elapsed":800,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["trainable params: 2,457,600 || all params: 822,788,800 || trainable%: 0.2986914746530337\n"]}],"source":["print_trainable_parameters(model)"]},{"cell_type":"markdown","metadata":{"id":"1aAIAvO39UKK"},"source":["We can see that the total number of parameters counted here is roughly half of the ~1.56B reported earlier. This is because the model weights were loaded in 4-bit as part of quantization: the quantized layers store two 4-bit weights per byte, so their weight tensors report half as many elements. The number of trainable parameters with LoRA is only ~2.5M, or about 0.3% of the parameters counted.\n","\n","Now that the model is set up with LoRA, the remaining steps are the same as for regular fine-tuning! 
We import the data, set up a training configuration and `Trainer` object, and then start the fine-tuning:"]},{"cell_type":"code","execution_count":62,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":84,"referenced_widgets":["3ff99c89d9b14c2e8703fc48888eb2f8","e831aee7af244e3cbe313b33af6364ef","f0e66ce08c264adbac3ae1b25f5428bd","262a9d4e254244dca9f839771248578e","11aef73879b44cf9a43a382fff026aca","d98e31fc768f4b25b504ce3a751c4360","4f71c763c67745218c37f883be00c54d","742d595279cd42d2b577c25725bd201f","4c2010ee437647c2b6632587f906ec10","c6d3eb372ecc432fbb67ca534ba5862f","22ca121828fd4ed7aa29a62116a787cf"]},"id":"-4EUpNp69ZZZ","outputId":"55b28001-cd7a-49ca-f5f4-17046fe0502b","executionInfo":{"status":"ok","timestamp":1720650300281,"user_tz":240,"elapsed":6,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["Repo card metadata block was not found. Setting CardData to empty.\n","WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.\n"]},{"output_type":"display_data","data":{"text/plain":["Map:   0%|          | 0/103 [00:00<?, ? examples/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"3ff99c89d9b14c2e8703fc48888eb2f8"}},"metadata":{}}],"source":["from datasets import load_dataset\n","# 1. 
Load dataset and tokenize\n","data_files = {\"train\": \"yoda.csv\"}\n","dataset_name = 'datasets/yoda/'\n","dataset = load_dataset(dataset_name, data_files=data_files)\n","tokenizer.pad_token = tokenizer.eos_token\n","\n","def tokenize_function(data):\n","    my_tokenizer = tokenizer(data[\"text\"], padding=True, truncation=True, return_tensors=\"pt\", max_length=128)\n","    return my_tokenizer\n","\n","tokenized_dataset = dataset.map(tokenize_function, batched=True)"]},{"cell_type":"code","execution_count":63,"metadata":{"id":"Jfo_pd5s9h6Y","executionInfo":{"status":"ok","timestamp":1720650300281,"user_tz":240,"elapsed":4,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[],"source":["from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling\n","\n","# Set up the training arguments\n","training_args = TrainingArguments(\n","        output_dir=\"yoda-gpt2xl-lora\",\n","        num_train_epochs=10,\n","        fp16=True,\n","        optim=\"paged_adamw_8bit\"\n","    )\n","\n","# Data collator\n","data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)\n","\n","# Trainer\n","trainer = Trainer(\n","    model=model,\n","    train_dataset=tokenized_dataset[\"train\"],\n","    args=training_args,\n","    data_collator=data_collator,\n",")\n","\n","model.config.use_cache = False  # silence the warnings. Please re-enable for inference!"]},{"cell_type":"code","execution_count":64,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":165},"id":"wjHp8L5MCg2C","outputId":"db4134ca-56d4-4daf-cd63-abf56b756704","executionInfo":{"status":"ok","timestamp":1720650385523,"user_tz":240,"elapsed":85246,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. 
In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.\n","  warnings.warn(\n"]},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.HTML object>"],"text/html":["\n","    <div>\n","      \n","      <progress value='130' max='130' style='width:300px; height:20px; vertical-align: middle;'></progress>\n","      [130/130 01:24, Epoch 10/10]\n","    </div>\n","    <table border=\"1\" class=\"dataframe\">\n","  <thead>\n"," <tr style=\"text-align: left;\">\n","      <th>Step</th>\n","      <th>Training Loss</th>\n","    </tr>\n","  </thead>\n","  <tbody>\n","  </tbody>\n","</table><p>"]},"metadata":{}},{"output_type":"execute_result","data":{"text/plain":["TrainOutput(global_step=130, training_loss=3.930248319185697, metrics={'train_runtime': 85.3085, 'train_samples_per_second': 12.074, 'train_steps_per_second': 1.524, 'total_flos': 429305456832000.0, 'train_loss': 3.930248319185697, 'epoch': 10.0})"]},"metadata":{},"execution_count":64}],"source":["trainer.train()"]},{"cell_type":"markdown","metadata":{"id":"OHPAf4O-fmID"},"source":["Now let's test it out:"]},{"cell_type":"code","execution_count":65,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":298},"id":"9d-ma-RDLJrV","outputId":"5441ecde-fb74-46bb-a773-5cba66831003","executionInfo":{"status":"ok","timestamp":1720650391562,"user_tz":240,"elapsed":6056,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n","`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...\n","/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:91: UserWarning: None of the inputs have requires_grad=True. Gradients will be None\n","  warnings.warn(\n"]},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Luke, you must know that!\"\n\n\"Yes, I do,\" he said, \"but it's not the same as knowing the truth. I've been told of the"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Luke, you must know that  I am Luke Skywalker.   I am the Jedi, and I will be.  \nLuke Skywalker,  Luke,   the  you"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Luke, you must know that?\" He was a man of the people. The words he spoke had a strange ring to them, as if they had been spoken before. \"It"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"Luke, you must know that!\"\n\n\"I am not a Jedi. 
You must be.\" ―Luke Skywalker and Palpatine, on the same battlefield [src]\n"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.Markdown object>"],"text/markdown":"---"},"metadata":{}}],"source":["input_text = \"Luke, you must know that \"\n","\n","model_inputs = tokenizer(input_text, return_tensors=\"pt\").to(device)\n","\n","# Create the outputs with sampling\n","outputs = trainer.model.generate(\n","    **model_inputs,\n","    max_new_tokens=30,\n","    do_sample=True,\n","    no_repeat_ngram_size=2,\n","    temperature=0.8,\n","    top_p=0.95,\n","    top_k=5,\n","    num_return_sequences=4,\n",")\n","\n","# Iterate over outputs, decode with tokenizer and print\n","display(Markdown(\"---\"))\n","for output in outputs:\n","  display(Markdown(tokenizer.decode(output, skip_special_tokens=True)))\n","  display(Markdown(\"---\"))"]},{"cell_type":"markdown","metadata":{"id":"AZA2K1qAOF3y"},"source":["Great! With just a few epochs of training, we seem to have noticeably altered the behavior of the model, as before, but this time by tuning only a small fraction of the parameters, which makes it feasible with the computing resources we have available.\n","\n","To avoid the extra inference latency of the separate adapter weights and use the model as a standalone model without LoRA, we can [merge the adapter weights back into the base model](https://huggingface.co/docs/peft/conceptual_guides/lora#merge-lora-weights-into-the-base-model). 
This should be done before pushing the model to the Hub if you wish to use the model as a standalone model.\n","\n","<center>\n","<img src=\"https://drive.google.com/uc?export=download&id=1H_FgH-_axdQohpDe5cOhQtkh1DUrk3bU\">\n","</center>\n","<center>\n","<caption> Merging the adapter weights </caption>\n","</center>"]},{"cell_type":"code","execution_count":66,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":186},"id":"q2D2h5Kjg8gz","outputId":"6a74b473-b69b-4a5f-957e-66927e8b7648","executionInfo":{"status":"ok","timestamp":1720650391562,"user_tz":240,"elapsed":7,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["peft.peft_model.PeftModelForCausalLM"],"text/html":["<div style=\"max-width:800px; border: 1px solid var(--colab-border-color);\"><style>\n","      pre.function-repr-contents {\n","        overflow-x: auto;\n","        padding: 8px 12px;\n","        max-height: 500px;\n","      }\n","\n","      pre.function-repr-contents.function-repr-contents-collapsed {\n","        cursor: pointer;\n","        max-height: 100px;\n","      }\n","    </style>\n","    <pre style=\"white-space: initial; background:\n","         var(--colab-secondary-surface-color); padding: 8px 12px;\n","         border-bottom: 1px solid var(--colab-border-color);\"><b>peft.peft_model.PeftModelForCausalLM</b><br/>def _wrapped_call_impl(*args, **kwargs)</pre><pre class=\"function-repr-contents function-repr-contents-collapsed\" style=\"\"><a class=\"filepath\" style=\"display:none\" href=\"#\">/usr/local/lib/python3.10/dist-packages/peft/peft_model.py</a>Peft model for causal language modeling.\n","\n","Args:\n","    model ([`~transformers.PreTrainedModel`]): Base transformer model.\n","    peft_config ([`PeftConfig`]): Peft config.\n","\n","\n","Example:\n","\n","    ```py\n","    &gt;&gt;&gt; from transformers import AutoModelForCausalLM\n","    &gt;&gt;&gt; from peft import 
PeftModelForCausalLM, get_peft_config\n","\n","    &gt;&gt;&gt; config = {\n","    ...     &quot;peft_type&quot;: &quot;PREFIX_TUNING&quot;,\n","    ...     &quot;task_type&quot;: &quot;CAUSAL_LM&quot;,\n","    ...     &quot;inference_mode&quot;: False,\n","    ...     &quot;num_virtual_tokens&quot;: 20,\n","    ...     &quot;token_dim&quot;: 1280,\n","    ...     &quot;num_transformer_submodules&quot;: 1,\n","    ...     &quot;num_attention_heads&quot;: 20,\n","    ...     &quot;num_layers&quot;: 36,\n","    ...     &quot;encoder_hidden_size&quot;: 1280,\n","    ...     &quot;prefix_projection&quot;: False,\n","    ...     &quot;postprocess_past_key_value_function&quot;: None,\n","    ... }\n","\n","    &gt;&gt;&gt; peft_config = get_peft_config(config)\n","    &gt;&gt;&gt; model = AutoModelForCausalLM.from_pretrained(&quot;gpt2-large&quot;)\n","    &gt;&gt;&gt; peft_model = PeftModelForCausalLM(model, peft_config)\n","    &gt;&gt;&gt; peft_model.print_trainable_parameters()\n","    trainable params: 1843200 || all params: 775873280 || trainable%: 0.23756456724479544\n","    ```</pre>\n","      <script>\n","      if (google.colab.kernel.accessAllowed && google.colab.files && google.colab.files.view) {\n","        for (const element of document.querySelectorAll('.filepath')) {\n","          element.style.display = 'block'\n","          element.onclick = (event) => {\n","            event.preventDefault();\n","            event.stopPropagation();\n","            google.colab.files.view(element.textContent, 1357);\n","          };\n","        }\n","      }\n","      for (const element of document.querySelectorAll('.function-repr-contents')) {\n","        element.onclick = (event) => {\n","          event.preventDefault();\n","          event.stopPropagation();\n","          element.classList.toggle('function-repr-contents-collapsed');\n","        };\n","      }\n","      </script>\n","      </div>"]},"metadata":{},"execution_count":66}],"source":["# Model is a PEFT 
model\n","model.__class__"]},{"cell_type":"code","execution_count":67,"metadata":{"id":"umRcSoegg3i2","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1720650392819,"user_tz":240,"elapsed":1263,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}},"outputId":"e372af41-201d-47c8-de36-d13fa1432627"},"outputs":[{"output_type":"stream","name":"stderr","text":["/usr/local/lib/python3.10/dist-packages/peft/tuners/lora/bnb.py:325: UserWarning: Merge lora module to 4-bit linear may get different generations due to rounding errors.\n","  warnings.warn(\n"]}],"source":["# Merge the LoRA weights back into the model\n","merged_model = model.merge_and_unload()"]},{"cell_type":"code","execution_count":68,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":186},"id":"HtDk9T5Dg_95","outputId":"44e3c4da-e547-46e5-91fb-2cb3340cbf2e","executionInfo":{"status":"ok","timestamp":1720650392819,"user_tz":240,"elapsed":6,"user":{"displayName":"Myles Harrison","userId":"13636460506782883737"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["transformers.models.gpt2.modeling_gpt2.GPT2LMHeadModel"],"text/html":["<div style=\"max-width:800px; border: 1px solid var(--colab-border-color);\"><style>\n","      pre.function-repr-contents {\n","        overflow-x: auto;\n","        padding: 8px 12px;\n","        max-height: 500px;\n","      }\n","\n","      pre.function-repr-contents.function-repr-contents-collapsed {\n","        cursor: pointer;\n","        max-height: 100px;\n","      }\n","    </style>\n","    <pre style=\"white-space: initial; background:\n","         var(--colab-secondary-surface-color); padding: 8px 12px;\n","         border-bottom: 1px solid var(--colab-border-color);\"><b>transformers.models.gpt2.modeling_gpt2.GPT2LMHeadModel</b><br/>def _wrapped_call_impl(*args, **kwargs)</pre><pre class=\"function-repr-contents function-repr-contents-collapsed\" style=\"\"><a class=\"filepath\" 
style=\"display:none\" href=\"#\">/usr/local/lib/python3.10/dist-packages/transformers/models/gpt2/modeling_gpt2.py</a>The GPT2 Model transformer with a language modeling head on top (linear layer with weights tied to the input\n","embeddings).\n","\n","\n","This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the\n","library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads\n","etc.)\n","\n","This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.\n","Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage\n","and behavior.\n","\n","Parameters:\n","    config ([`GPT2Config`]): Model configuration class with all the parameters of the model.\n","        Initializing with a config file does not load the weights associated with the model, only the\n","        configuration. 
Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.</pre>\n","      <script>\n","      if (google.colab.kernel.accessAllowed && google.colab.files && google.colab.files.view) {\n","        for (const element of document.querySelectorAll('.filepath')) {\n","          element.style.display = 'block'\n","          element.onclick = (event) => {\n","            event.preventDefault();\n","            event.stopPropagation();\n","            google.colab.files.view(element.textContent, 1165);\n","          };\n","        }\n","      }\n","      for (const element of document.querySelectorAll('.function-repr-contents')) {\n","        element.onclick = (event) => {\n","          event.preventDefault();\n","          event.stopPropagation();\n","          element.classList.toggle('function-repr-contents-collapsed');\n","        };\n","      }\n","      </script>\n","      </div>"]},"metadata":{},"execution_count":68}],"source":["# Merged model is of same type as base model\n","merged_model.__class__"]},{"cell_type":"markdown","metadata":{"id":"IbfVQ9Dyldz7"},"source":["### Further optimizing LoRA: QLoRA\n","\n","Building on the LoRA work from the team at Microsoft, researchers from the University of Washington published [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314) in May 2023.\n","\n","[QLoRA](https://github.com/artidoro/qlora) makes parameter-efficient fine-tuning even more efficient by using 4-bit quantization for the model to be tuned, introducing a new data type called 4-bit NormalFloat (NF4), and using \"double quantization\", where the quantization parameters are also quantized.\n","\n","This can be accomplished in code by using a `bitsandbytes` config with the following parameters:\n","\n","```python\n","bnb_config = BitsAndBytesConfig(\n","    load_in_4bit=True,\n","    bnb_4bit_use_double_quant=True,\n","    bnb_4bit_quant_type=\"nf4\",\n","    
bnb_4bit_compute_dtype=torch.bfloat16\n",")\n","\n","...\n","model = AutoModelForCausalLM.from_pretrained(\n","  \"gpt2-xl\",\n","  quantization_config=bnb_config\n","  )\n","\n","```\n","\n","You can see an example of using QLoRA in Hugging Face [in this example notebook](https://colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zbf2-9VQTtGJ24k?usp=sharing) and more details in the [official blog post](https://huggingface.co/blog/4bit-transformers-bitsandbytes) from Hugging Face.\n","\n","A notable output of the QLoRA research was the [Guanaco model](https://huggingface.co/spaces/uwnlp/guanaco-playground-tgi) family, which was fine-tuned from the LLaMA base models."]},{"cell_type":"markdown","metadata":{"id":"ejmekUvHra2I"},"source":["## Conclusion 🏁\n","\n","We have covered a lot of ground, both in working with generative large language models for inference and in methods for efficiently fine-tuning them with limited computational resources. However, we have really just scratched the surface of LLMs for text, and there is much more that goes into the incredibly sophisticated models which represent the state of the art.\n","\n","- **Training \"chat\" models**: By changing the [format of the input](https://huggingface.co/docs/transformers/main/en/chat_templating) and prediction task, and applying more sophisticated approaches such as [instruction tuning](https://arxiv.org/abs/2308.10792).\n","- **Reinforcement learning from human feedback (RLHF)**: Now a standard part of LLM development, but non-trivial for an individual to carry out. 
However, this is possible to do with a base model and the right datasets using the [trl library](https://huggingface.co/docs/trl/index).\n","- **Retrieval Augmented Generation**: Or [RAG](https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html), for combining the generative capabilities of LLMs with search and retrieval from a datastore, to reduce \"hallucinations\" and allow a language model to work against a knowledge base.\n","\n","LLMs represent the state of the art in generative text models and have rapidly transformed this space (and many other domains) in very short order. Knowing how they function, how to use them, and being aware of their shortcomings allows them to be judiciously applied to the right use cases to unlock business value and build products of value to the end-user."]},{"cell_type":"markdown","metadata":{"id":"KsxOj0mxWXUw"},"source":["----\n","\n","<table border=\"0\" bgcolor=\"white\">\n","  <tr></tr>\n","  <tr>\n","      <th align=\"left\" style=\"align:left; vertical-align: bottom;\"><p>Copyright NLP from scratch, 2024.</p></th>\n","      <th align=\"right\" width=\"33%\"><a href=\"https://www.nlpfromscratch.com?utm_source=notebook&utm_medium=nb-footer-img\"><img src=\"https://drive.google.com/uc?export=view&id=1-lt6Uft8lgBG9jPD0dO6w3dAcv_EUQRP\"></a></th>\n","</tr>\n","</table>"]}],"metadata":{"accelerator":"GPU","colab":{"gpuType":"T4","provenance":[]},"jupytext":{"main_language":"python"},"kernelspec":{"display_name":"Python 
3","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.11.1"},"widgets":{"application/vnd.jupyter.widget-state+json":{"b51a121daa394443970f334d6671688b":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_57db97b938a04966b48d77c509ffb525","IPY_MODEL_173b43eb9bc74962a52821c5f5ce441c","IPY_MODEL_69278d8631e8487c9158082207cdacac"],"layout":"IPY_MODEL_bdf0a8ee2171422389baa5eecedba86b"}},"57db97b938a04966b48d77c509ffb525":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_daf3d480df264ff38d3d5cd1217adb2e","placeholder":"​","style":"IPY_MODEL_4e34a3dd228f4c22b5b9c1294c4f455c","value":"Generating train split: 
"}},"173b43eb9bc74962a52821c5f5ce441c":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_1e81f31e3df946bc85b2a187453e5c0b","max":1,"min":0,"orientation":"horizontal","style":"IPY_MODEL_4ade9b5e523249abb7ec0708582e06fa","value":1}},"69278d8631e8487c9158082207cdacac":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_3b73922497ac477c94be4231a3ddc0aa","placeholder":"​","style":"IPY_MODEL_6adfd93c30fc44d384282c9f9e12dbc0","value":" 103/0 [00:00&lt;00:00, 1111.10 
examples/s]"}},"bdf0a8ee2171422389baa5eecedba86b":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"daf3d480df264ff38d3d5cd1217adb2e":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null
,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"4e34a3dd228f4c22b5b9c1294c4f455c":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"1e81f31e3df946bc85b2a187453e5c0b":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":"20px"}},"4ade9b5e523249abb7ec0708582e06fa":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"3b73922497ac477c94be4231a3ddc0aa":{"mod
el_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"6adfd93c30fc44d384282c9f9e12dbc0":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"b1657544ac924428bad89dc4c85cdcc8":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_5649660d73ef41c3b6c6ea03b09be4ca","IPY_MODEL_734c4b4989744b1eb27248b89ae07090","IPY_MODEL_5c8bdca7cbbc48bcbb669ad907ea0d02"],"layout":"IPY_MODEL_6b536828ccc040cba0d818fab4568
044"}},"5649660d73ef41c3b6c6ea03b09be4ca":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_1e95cf7358d1460a99a20682ecd4c97d","placeholder":"​","style":"IPY_MODEL_baa94316e4824e24b80e76ce42d2c209","value":"Map: 100%"}},"734c4b4989744b1eb27248b89ae07090":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_d78b5de0332e4d69844242e9bd8ddaec","max":103,"min":0,"orientation":"horizontal","style":"IPY_MODEL_32b6881377154016ab6ef51d70d38a1b","value":103}},"5c8bdca7cbbc48bcbb669ad907ea0d02":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_bcce6096c8e0482389e65e2ce7fa44ab","placeholder":"​","style":"IPY_MODEL_a30cdca663c54ed1bea1468142edff08","value":" 103/103 [00:00&lt;00:00, 599.80 
examples/s]"}},"6b536828ccc040cba0d818fab4568044":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"1e95cf7358d1460a99a20682ecd4c97d":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null
,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"baa94316e4824e24b80e76ce42d2c209":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"d78b5de0332e4d69844242e9bd8ddaec":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"32b6881377154016ab6ef51d70d38a1b":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"bcce6096c8e0482389e65e2ce7fa44ab":{"model
_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"a30cdca663c54ed1bea1468142edff08":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"bed0ffc892df43e8a3576fb776de6774":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_32e42f9bb697473190c3722d296a1d87","IPY_MODEL_da73cdc696bd45f2955de42e42cc9a93","IPY_MODEL_f01158f7ad994646ba950ee65d56c00a"],"layout":"IPY_MODEL_71902a253ac04225920ffd5ce8c1868
6"}},"32e42f9bb697473190c3722d296a1d87":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_479fffa5d1aa417aa672f9c27787a03b","placeholder":"​","style":"IPY_MODEL_52ca7bceddf94614a6e8390a1016add5","value":"Downloading builder script: 100%"}},"da73cdc696bd45f2955de42e42cc9a93":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_cd1d0ded57284545805e4fc9cb8c86f8","max":4203,"min":0,"orientation":"horizontal","style":"IPY_MODEL_319f422a71974ae884a1c3f5264353f9","value":4203}},"f01158f7ad994646ba950ee65d56c00a":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_a5afbd9e364548dc90d62fdba5f4d7d2","placeholder":"​","style":"IPY_MODEL_90b784cc1b3d4cc590e440da790b4a03","value":" 4.20k/4.20k [00:00&lt;00:00, 
73.9kB/s]"}},"71902a253ac04225920ffd5ce8c18686":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"479fffa5d1aa417aa672f9c27787a03b":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"
overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"52ca7bceddf94614a6e8390a1016add5":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"cd1d0ded57284545805e4fc9cb8c86f8":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"319f422a71974ae884a1c3f5264353f9":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"a5afbd9e364548dc90d62fdba5f4d7d2":{"model_m
odule":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"90b784cc1b3d4cc590e440da790b4a03":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"0dcb7fffb9b0495f9f4adf85c02245d3":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_b1db4986f0b549c197e1129c89667ae1","IPY_MODEL_f46b619bfc9b4552ae9b6cbedccac8bf","IPY_MODEL_03164eb4f5034d0c8d9cc2479553ebb0"],"layout":"IPY_MODEL_53e893a6c87d44bca93221d8b9c583e1"
}},"b1db4986f0b549c197e1129c89667ae1":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_7fa95c2ec7864241b6103519d488a285","placeholder":"​","style":"IPY_MODEL_ab09c12c83ed46b099ddaab9cb5b49fa","value":"config.json: 100%"}},"f46b619bfc9b4552ae9b6cbedccac8bf":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_f5979750d11f4d6bb3f5ffcc778830d2","max":689,"min":0,"orientation":"horizontal","style":"IPY_MODEL_b6096e03ae914c1d993296d2061329be","value":689}},"03164eb4f5034d0c8d9cc2479553ebb0":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_cd5ea94612f14b2589d22f1854f6fac0","placeholder":"​","style":"IPY_MODEL_eae55e6e68bd49a6962391704655ac8d","value":" 689/689 [00:00&lt;00:00, 
50.7kB/s]"}},"53e893a6c87d44bca93221d8b9c583e1":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"7fa95c2ec7864241b6103519d488a285":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"
overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"ab09c12c83ed46b099ddaab9cb5b49fa":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"f5979750d11f4d6bb3f5ffcc778830d2":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"b6096e03ae914c1d993296d2061329be":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"cd5ea94612f14b2589d22f1854f6fac0":{"model_m
odule":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"eae55e6e68bd49a6962391704655ac8d":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"14aea29593a04145a3981b522369786c":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_48c65536a4094a8bb74823fd6d227976","IPY_MODEL_2b0d10dc0e13427a98323550a0be199d","IPY_MODEL_427b2c89bb5d46b6a4fca03896b86f18"],"layout":"IPY_MODEL_28aa64433a664e69a725015499aa7a1a"
}},"48c65536a4094a8bb74823fd6d227976":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_0dae19e80bc0495baec5322c42586992","placeholder":"​","style":"IPY_MODEL_2d290cc096a84cd19423e4d29bf85ed3","value":"model.safetensors: 100%"}},"2b0d10dc0e13427a98323550a0be199d":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_8cf1cc5b2b2a4f43aa253f3889bfbda7","max":6431829964,"min":0,"orientation":"horizontal","style":"IPY_MODEL_7aaf5bfff3814974aec8bb3022b6e5fe","value":6431829964}},"427b2c89bb5d46b6a4fca03896b86f18":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_d2d75e495f2c48e6a65e57f5bbfe1090","placeholder":"​","style":"IPY_MODEL_2680d1a8aad740db83306b120bc3a20a","value":" 6.43G/6.43G [00:38&lt;00:00, 
244MB/s]"}},"28aa64433a664e69a725015499aa7a1a":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"0dae19e80bc0495baec5322c42586992":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"o
verflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"2d290cc096a84cd19423e4d29bf85ed3":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"8cf1cc5b2b2a4f43aa253f3889bfbda7":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"7aaf5bfff3814974aec8bb3022b6e5fe":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"d2d75e495f2c48e6a65e57f5bbfe1090":{"model_mo
dule":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"2680d1a8aad740db83306b120bc3a20a":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"eb57ae844d29483cb96635a69f6d1282":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_574502bd607e4b229a1664487e47fb0e","IPY_MODEL_e51a47c79ef2403a9ee35ad35d604784","IPY_MODEL_99be4e9442a84d9eaae6d06a0a2c2da6"],"layout":"IPY_MODEL_c8fd1e11880448babb8c970e5d11fcc6"}
},"574502bd607e4b229a1664487e47fb0e":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_4b6470723a054d689d1a931bd11ccd1c","placeholder":"​","style":"IPY_MODEL_963807e2ff784eb19ccb2cf4945f488d","value":"generation_config.json: 100%"}},"e51a47c79ef2403a9ee35ad35d604784":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_e37833bf6a9c498da4388e78447b8a8f","max":124,"min":0,"orientation":"horizontal","style":"IPY_MODEL_3043a9b492624f6084f604cbcdb00027","value":124}},"99be4e9442a84d9eaae6d06a0a2c2da6":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_53e9e289fa454064a430413b92cc40b0","placeholder":"​","style":"IPY_MODEL_c290a082557e43958c595f21f944be5d","value":" 124/124 [00:00&lt;00:00, 
7.62kB/s]"}},"c8fd1e11880448babb8c970e5d11fcc6":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"4b6470723a054d689d1a931bd11ccd1c":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"
overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"963807e2ff784eb19ccb2cf4945f488d":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"e37833bf6a9c498da4388e78447b8a8f":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"3043a9b492624f6084f604cbcdb00027":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"53e9e289fa454064a430413b92cc40b0":{"model_m
odule":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"c290a082557e43958c595f21f944be5d":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"ae49256870804b598044a5fe62db702c":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_76db057d3bdb4a638c981a6d786a8b2d","IPY_MODEL_669d9d52452a40279529804a6854cc95","IPY_MODEL_6d75175fe62441029dd292b50029f374"],"layout":"IPY_MODEL_bfbe5c61dffe453f88844cf27886f2e8"
}},"76db057d3bdb4a638c981a6d786a8b2d":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_138881dc73974535966d8930481c827a","placeholder":"​","style":"IPY_MODEL_e40e90db06d44b7983e2b7683769b290","value":"tokenizer_config.json: 100%"}},"669d9d52452a40279529804a6854cc95":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_2c38b459f4224e558e655f6869d0c8ca","max":26,"min":0,"orientation":"horizontal","style":"IPY_MODEL_cb723e3be81846cca3087f1b038153cf","value":26}},"6d75175fe62441029dd292b50029f374":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_9b286cad4c304e2b97446d131771c1fe","placeholder":"​","style":"IPY_MODEL_53279967d41d4d529cafea3c7b5d898c","value":" 26.0/26.0 [00:00&lt;00:00, 
1.91kB/s]"}},"bfbe5c61dffe453f88844cf27886f2e8":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"138881dc73974535966d8930481c827a":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"
overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"e40e90db06d44b7983e2b7683769b290":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"2c38b459f4224e558e655f6869d0c8ca":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"cb723e3be81846cca3087f1b038153cf":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"9b286cad4c304e2b97446d131771c1fe":{"model_m
odule":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"53279967d41d4d529cafea3c7b5d898c":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"66afc2e6d8ef49f1843bc8b4ebac4ebb":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_574ac2f6f62343b2ba522d4d08beec6e","IPY_MODEL_34701102e23e4faebd4be3c26e820c10","IPY_MODEL_f728e256cbb146f7824824f86a491ebe"],"layout":"IPY_MODEL_82a77e91e439419cb347909a6d6cb9ec"
}},"574ac2f6f62343b2ba522d4d08beec6e":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_1c4aad3119754d65b8565784df6f597b","placeholder":"​","style":"IPY_MODEL_9befbd0cd2054f3eb5cad58f9ec829f3","value":"vocab.json: 100%"}},"34701102e23e4faebd4be3c26e820c10":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_736659c3c0b641bd8266e195d4529ee1","max":1042301,"min":0,"orientation":"horizontal","style":"IPY_MODEL_23a4c560ac5b4cd093994f5be3f3b685","value":1042301}},"f728e256cbb146f7824824f86a491ebe":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_32319c2ba2764a6b9da5f34f067665d6","placeholder":"​","style":"IPY_MODEL_7bb0e75490054a308e9de11b2c6c5552","value":" 1.04M/1.04M [00:00&lt;00:00, 
3.12MB/s]"}},"82a77e91e439419cb347909a6d6cb9ec":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"1c4aad3119754d65b8565784df6f597b":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"
overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"9befbd0cd2054f3eb5cad58f9ec829f3":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"736659c3c0b641bd8266e195d4529ee1":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"23a4c560ac5b4cd093994f5be3f3b685":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"32319c2ba2764a6b9da5f34f067665d6":{"model_m
odule":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"7bb0e75490054a308e9de11b2c6c5552":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"e4ebf545cb4e42fa9cb4e233aaec1bab":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_55bc43763cc54d8bb01857c6e0a6f5a6","IPY_MODEL_a9f222ff32b04f74a5f4c01906cac590","IPY_MODEL_bc214017673b4a909d4013a897a634f7"],"layout":"IPY_MODEL_d5c1dd35b3494cae988dea5451c73d2c"
}},"55bc43763cc54d8bb01857c6e0a6f5a6":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_56d7ad25e5c44163aa898f98af9cae2c","placeholder":"​","style":"IPY_MODEL_d88d7403cb984fcdb837a266222ebde3","value":"merges.txt: 100%"}},"a9f222ff32b04f74a5f4c01906cac590":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_b0c01a122cf6410491b85c53e4b14272","max":456318,"min":0,"orientation":"horizontal","style":"IPY_MODEL_87adabdb77554daeabc723de9eafd3ef","value":456318}},"bc214017673b4a909d4013a897a634f7":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_4bdc3594652d41aa94276931ba2589db","placeholder":"​","style":"IPY_MODEL_45b1123a2fdc4b6dbe24470b93700985","value":" 456k/456k [00:00&lt;00:00, 
1.86MB/s]"}},"d5c1dd35b3494cae988dea5451c73d2c":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"56d7ad25e5c44163aa898f98af9cae2c":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"
overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"d88d7403cb984fcdb837a266222ebde3":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"b0c01a122cf6410491b85c53e4b14272":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"87adabdb77554daeabc723de9eafd3ef":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"4bdc3594652d41aa94276931ba2589db":{"model_m
odule":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"45b1123a2fdc4b6dbe24470b93700985":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"e9ad5d43fdb5486c910aa0b8c641805d":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_ef2cf9b9aa3140b785e30e6498627e8b","IPY_MODEL_d5967ae13e1745b7880a759cf9234708","IPY_MODEL_5153b13ab8d54d24b25976a89af4b163"],"layout":"IPY_MODEL_e166360141c344ce8cded7017f321e5b"
}},"ef2cf9b9aa3140b785e30e6498627e8b":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_e6e7564fc1aa4ae185a71bfb5fe4f1b2","placeholder":"​","style":"IPY_MODEL_a868a36e05984b05ac2c141f07bb76fb","value":"tokenizer.json: 100%"}},"d5967ae13e1745b7880a759cf9234708":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_8325eb82b82245a8bf19198fcdd21793","max":1355256,"min":0,"orientation":"horizontal","style":"IPY_MODEL_d62c9c86ee9e411ca081dc18f9feecb0","value":1355256}},"5153b13ab8d54d24b25976a89af4b163":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_10e2642df3844b2399d4c0b186eca9e4","placeholder":"​","style":"IPY_MODEL_47667165042347999111fb15267e71ab","value":" 1.36M/1.36M [00:00&lt;00:00, 
3.32MB/s]"}},"e166360141c344ce8cded7017f321e5b":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"e6e7564fc1aa4ae185a71bfb5fe4f1b2":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"
overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"a868a36e05984b05ac2c141f07bb76fb":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"8325eb82b82245a8bf19198fcdd21793":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"d62c9c86ee9e411ca081dc18f9feecb0":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"10e2642df3844b2399d4c0b186eca9e4":{"model_m
odule":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"47667165042347999111fb15267e71ab":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"3ff99c89d9b14c2e8703fc48888eb2f8":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_e831aee7af244e3cbe313b33af6364ef","IPY_MODEL_f0e66ce08c264adbac3ae1b25f5428bd","IPY_MODEL_262a9d4e254244dca9f839771248578e"],"layout":"IPY_MODEL_11aef73879b44cf9a43a382fff026aca"
}},"e831aee7af244e3cbe313b33af6364ef":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_d98e31fc768f4b25b504ce3a751c4360","placeholder":"​","style":"IPY_MODEL_4f71c763c67745218c37f883be00c54d","value":"Map: 100%"}},"f0e66ce08c264adbac3ae1b25f5428bd":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_742d595279cd42d2b577c25725bd201f","max":103,"min":0,"orientation":"horizontal","style":"IPY_MODEL_4c2010ee437647c2b6632587f906ec10","value":103}},"262a9d4e254244dca9f839771248578e":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_c6d3eb372ecc432fbb67ca534ba5862f","placeholder":"​","style":"IPY_MODEL_22ca121828fd4ed7aa29a62116a787cf","value":" 103/103 [00:00&lt;00:00, 2551.93 
examples/s]"}},"11aef73879b44cf9a43a382fff026aca":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"d98e31fc768f4b25b504ce3a751c4360":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null
,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"4f71c763c67745218c37f883be00c54d":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"742d595279cd42d2b577c25725bd201f":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"4c2010ee437647c2b6632587f906ec10":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"c6d3eb372ecc432fbb67ca534ba5862f":{"model
_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"22ca121828fd4ed7aa29a62116a787cf":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}}}}},"nbformat":4,"nbformat_minor":0}