What is a Large Language Model (LLM)?
A large language model denotes a category of artificial intelligence (AI) models that are trained on vast amounts of textual data to comprehend and produce language resembling human expression🙇. Such a large language model is essentially a scaled-up transformer model, often too large to run on a single computer, so it is commonly deployed as a service accessible through an API or web interface. These models are typically built using deep learning methodologies and neural networks, with a primary emphasis on processing and generating natural language.
As per NVIDIA- "Large language models largely represent a class of deep learning architectures called transformer networks. A transformer model is a neural network that learns context and meaning by tracking relationships in sequential data, like the words in this sentence". These LLMs can be fine-tuned to match the needs of specific conversational agents (e.g. if you are building a customer care chatbot). A few common use cases of LLMs are conversational chatbots, text summarization, translation, question answering, and code generation.
How Large Can an LLM Be?
As per Google- "Early language models could predict the probability of a single word but modern large language models can predict the probability of sentences, paragraphs, or even entire documents". The size and capability of language models have exploded over the last few years as computer memory, dataset sizes, and processing power have increased, and as more effective techniques for modeling longer text sequences have been developed💥.
Here "Large" can refer either to the number of parameters in the model, or sometimes the number of words in the dataset. Here parameters are the weights the model learned during training, used to predict the next token in the sequence. You can refer following table to imagine the largeness of few available pre-trained LLMs-
How Do Large Language Models Work?
Large language models are trained using unsupervised learning👈. With unsupervised learning, models can find previously unknown patterns in data using unlabelled datasets. This also eliminates the need for extensive data labeling, which is one of the biggest challenges in building AI models.
The ability of a foundation model to generate text for a wide variety of purposes without much instruction or training is called zero-shot learning👏. Variations of this capability include one-shot and few-shot learning, wherein the foundation model is fed one or a few examples illustrating how a task can be accomplished, so that it understands the task and performs better on select use cases.
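To make the difference concrete, here is a minimal sketch of the same sentiment-classification task written first as a zero-shot prompt and then as a few-shot prompt (the review texts are invented purely for illustration):

```python
# Zero-shot: the model is asked to do the task with no worked examples.
zero_shot_prompt = (
    "Classify the sentiment of this review as Positive or Negative.\n"
    "Review: The battery lasts all day and the screen is gorgeous.\n"
    "Sentiment:"
)

# Few-shot: a couple of solved examples are prepended so the model can
# infer the task format before it sees the new input.
few_shot_prompt = (
    "Review: The battery died within an hour.\nSentiment: Negative\n\n"
    "Review: Setup was effortless and support was friendly.\nSentiment: Positive\n\n"
    "Review: The battery lasts all day and the screen is gorgeous.\nSentiment:"
)
```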
What is Context Length and Token in LLM?
Context length is the number of tokens a language model can process at once; it is the maximum length of the input sequence. A token is the model's way of representing a word, or part of a word, as a number; 100 words is roughly 130 tokens. If a model does not recognize a word, it breaks the word into multiple tokens. A model can only summarize an article no longer than its context length. In chat apps, the context length dictates how much of the previous conversation is remembered. Language models like ChatGPT or Llama 2-Chat do not remember anything on their own; they are stateless. They know the past conversation only because it was included in your current input before being fed to the model.
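As a quick, hedged illustration of words versus tokens, the snippet below uses the openly available GPT-2 tokenizer from Hugging Face (any tokenizer would demonstrate the same idea):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative choice of tokenizer

text = "Context length is the number of tokens a language model can process at once."
token_ids = tokenizer.encode(text)

print(len(text.split()), "words ->", len(token_ids), "tokens")
# Rare or unknown words are split into several sub-word tokens.
print(tokenizer.convert_ids_to_tokens(token_ids))
```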
What is LangChain?
Much like our familiarity with frameworks such as TensorFlow and PyTorch in the domains of image classification and object detection, LangChain serves as a framework tailored for building applications fueled by language models🙆. Specifically designed for working with Large Language Models (LLMs), LangChain simplifies the intricate process of developing LLM-powered applications by presenting a systematic sequence of steps to generate text from a given input prompt. This toolkit manages various aspects, including prompt input, text generation, and manipulation of the generated output. Essentially, it facilitates the creation of conversational agents that leverage LLMs to generate coherent and natural language responses☝.
Components of the LangChain Framework
This framework consists of several parts-
1. LangChain Libraries: The Python and JavaScript libraries. They contain interfaces and integrations for a myriad of components, a basic runtime for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.
2. LangChain Templates: A collection of easily deployable reference architectures for a wide variety of tasks.
3. LangServe: A library for deploying LangChain chains as a REST API.
4. LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework, and it integrates seamlessly with LangChain. As of writing this post (10 Jan '24), it is still in beta.
How does LangChain work?
LangChain provides a pipeline of steps that generate text from input prompts. The pipeline comprises eight main components: input (prompt templates), the large language model, agents, utilities, document loaders, chains, indexes, and memory.
The input component specifies the input prompt or the initial input provided to the pipeline. This comes in the form of a template that defines the structure of the prompt, including the format and content.
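For example, a minimal prompt template in LangChain might look like the sketch below (the variable name and wording are illustrative, not taken from this post):

```python
from langchain.prompts import PromptTemplate

# A template with a placeholder that gets filled in at run time.
prompt = PromptTemplate(
    input_variables=["product"],
    template="Write a short, friendly product description for {product}.",
)

print(prompt.format(product="a solar-powered phone charger"))
```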
LangChain enables access to a range of pre-trained LLMs (e.g., GPT-3.5) trained on large datasets. The large language model component generates output (in this case, text) based on the prompt and input. These LLMs can further be fine-tuned to match the needs of specific conversational agents (e.g. if you are building a customer care chatbot).
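As an illustrative sketch (not the exact code used later in this post), a hosted Hugging Face model can be plugged into LangChain through the HuggingFaceHub wrapper; the API token below is a placeholder you would replace with your own:

```python
import os
from langchain.llms import HuggingFaceHub

os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_..."  # placeholder token

# Wrap a hosted Hugging Face model as a LangChain LLM.
llm = HuggingFaceHub(
    repo_id="mistralai/Mistral-7B-v0.1",
    model_kwargs={"temperature": 0.7, "max_new_tokens": 128},
)

print(llm("Explain what a large language model is in one sentence."))
```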
With the Document Loaders module, we can ingest documents (e.g., PDFs, PowerPoint presentations, etc.) into the LLM for further analysis (typically question answering).
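As a small sketch, loading a PDF could look like this (the file name is hypothetical, and PyPDFLoader needs the pypdf package installed):

```python
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("annual_report.pdf")  # hypothetical file used for illustration
pages = loader.load()                      # one Document per page

print(len(pages), "pages loaded")
print(pages[0].page_content[:200])         # first 200 characters of page one
```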
LangChain provides an extensive collection of common utilities (Utils) to use in our application, such as Python REPLs (the LLM generates code to calculate an answer, runs that code, and prints the result), bash commands (e.g., to interface with the local system), and search engines, as well as a requests wrapper (to link to a URL published post-2021 that ChatGPT doesn't know about, for instance).
Agents are systems that use a language model to interact with other tools. They can be used to create chatbots or power the next generation of personal assistants.
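A minimal agent sketch, assuming the llm object defined earlier and the numexpr package for the calculator tool, might look like this:

```python
from langchain.agents import AgentType, initialize_agent, load_tools

# Give the agent a calculator tool backed by the LLM (needs `numexpr` installed).
tools = load_tools(["llm-math"], llm=llm)

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

agent.run("What is 7 raised to the power of 0.5?")
```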
Chains provide an end-to-end pipeline for utilizing language models. These chains seamlessly integrate models, prompts, memory, parsing output, and debugging capabilities, offering a user-friendly interface.
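Reusing the prompt template and LLM sketched above, a simple chain ties the two together:

```python
from langchain.chains import LLMChain

# Combine the prompt template and the LLM into one callable pipeline.
chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run(product="a solar-powered phone charger"))
```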
In LangChain, Memory refers to the mechanism that stores and manages the conversation history between a user and the AI. It helps maintain context and coherency throughout the interaction, enabling the AI to generate more relevant and accurate responses.
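A small sketch of buffer memory in action, again assuming the llm object from before:

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# The buffer memory stores the full conversation history and injects it
# into every new prompt, so the model can refer back to earlier turns.
conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

conversation.predict(input="Hi, my name is Asha.")
print(conversation.predict(input="What is my name?"))  # answered from the stored history
```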
How to install LangChain?
We can install the LangChain library using either pip or conda. The pip command is pip install langchain and the conda command is conda install langchain -c conda-forge.
For example, in a Google Colab notebook the installation will look like the below-
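```python
# In a Google Colab cell, the leading "!" runs the command in the shell.
!pip install langchain
```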
LangChain enables building applications that connect external sources of data and computation to LLMs. We will start with a simple LLM chain, which relies only on the information in the prompt template to respond, using the open-source LLM mistralai/Mistral-7B-v0.1. Mistral 7B is a 7.3 billion parameter language model that represents a major advance in large language model (LLM) capabilities👀. It outperforms the 13 billion parameter Llama 2 model on all tasks and outperforms the 34 billion parameter Llama 1 on many benchmarks👊. You can refer to the following table showing the number of parameters and the minimum RAM required to run the two available Mistral AI LLMs:
These models are available on Hugging Face, which has built-in APIs to use them easily. Let's start by installing the required libraries from Hugging Face-
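The exact install cell from the original post is not shown here; given the steps that follow, a reasonable set of libraries would be:

```python
# transformers: model and tokenizer; accelerate: device placement;
# bitsandbytes: 4-bit quantization used further below.
!pip install transformers accelerate bitsandbytes
```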
Next, we will download the pre-trained LLM from the Hugging Face Hub along with its tokenizer. Please note that tokenization is a crucial text-processing step for large language models👈. It splits text into smaller units called tokens that can be fed into the language model. Also, to download the model you will need at least 16 GB of RAM. Free and Pro Google Colab accounts do not have enough CPU RAM to load the official model containing the weights, and you will get an out-of-memory error😓 if you try to load the whole model at once-
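The loading code is sketched below under the assumption that the transformers and bitsandbytes libraries are installed; it loads the model with a 4-bit NF4 quantization configuration so that it fits into Colab's limited memory:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"

# 4-bit NF4 quantization so the model fits into limited GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```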
Here, the BitsAndBytes library is used to create a 4-bit quantization with an NF4-type configuration to load our model in 4-bit precision. This helps us load the model faster and reduces the memory footprint💣.
Once we have downloaded the model successfully👍, the next step is to test it with a user-given prompt and see how it answers that query. Mistral expects prompts in the following format-
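The exact snippet from the original post is not reproduced here; as a hedged sketch, the Mistral Instruct variants wrap the user message in [INST] ... [/INST] tags, and generation with the model and tokenizer loaded above could look like this:

```python
# Hypothetical test prompt in the Mistral instruction format.
prompt = "<s>[INST] What is a large language model? Answer in two sentences. [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```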