Generative AI with LangChain: Basics

Wishing everyone a Happy New Year '24😇 I trust that you've found valuable insights in my previous blog posts. Embarking on a new learning adventure with this latest post, we'll delve into the realm of Generative AI applications using LangChain💪. This article will initially cover the basics of Language Models and LangChain. Subsequent posts will guide you through hands-on experiences with various Generative AI use cases using LangChain. Let's kick off by exploring the essential fundamentals💁

What is a Large Language Model (LLM)?

A large language model denotes a category of artificial intelligence (AI) models that undergo extensive training with extensive textual data to comprehend and produce language resembling human expression🙇. Such a large language model constitutes a scaled-up transformer model, often too extensive for execution on a single computer. Consequently, it is commonly deployed as a service accessible through an API or web interface. These models are typically constructed using deep learning methodologies and neural networks, with a primary emphasis on processing and generating natural language.

As per NVIDIA- "Large language models largely represent a class of deep learning architectures called transformer networks. A transformer model is a neural network that learns context and meaning by tracking relationships in sequential data, like the words in this sentence". The large language model component generates output (in this case, text) based on the prompt and input. These LLMs can further be fine-tuned to match the needs of specific conversational agents (e.g. if you are building a customer care chatbot). A few use cases of LLM are as below-

How Large an LLM can be?

As per Google- "Early language models could predict the probability of a single word but modern large language models can predict the probability of sentences, paragraphs, or even entire documents". The size and capability of language models have exploded over the last few years as computer memory, dataset size, and processing power increase, and more effective techniques for modeling longer text sequences are developed💥.

Here "Large" can refer either to the number of parameters in the model, or sometimes the number of words in the dataset. Here parameters are the weights the model learned during training, used to predict the next token in the sequence. You can refer following table to imagine the largeness of few available pre-trained LLMs-

Model_Name	Parameters	License	Task
ChatGPT	175 billion	Proprietary	Multi
microsoft/phi-2	2.7 billion	MIT	Text Generation
TinyLlama/TinyLlama-1.1B-Chat-v1.0	1.1B	Apache-2.0	Text Generation
mistralai/Mixtral-8x7B-Instruct-v0.1	46.7B	Apache-2.0	Text Generation
openai/whisper-large-v3	1 million hours of weakly labeled audio and 4 million hours of pseudolabeled audio	Apache-2.0	Speech Recognition
nvidia/parakeet-rnnt-1.1b	1.1B	cc-by-4.0	Speech Recognition
bert-base-uncased	110 million	Apache-2.0	Fill-Mask
bert-large-uncased	340 million	Apache-2.0	Fill-Mask

How Do Large Language Models Work?

Large language models are trained using unsupervised learning👈. With unsupervised learning, models can find previously unknown patterns in data using unlabelled datasets. This also eliminates the need for extensive data labeling, which is one of the biggest challenges in building AI models.

The ability of the foundation model to generate text for a wide variety of purposes without much instruction or training is called zero-shot learning👏. Different variations of this capability include one-shot or few-shot learning, wherein the foundation model is fed one or a few examples illustrating how a task can be accomplished to understand and better perform on select use cases.

What is Context Length and Token in LLM?

Context length is the number of tokens a language model can process at once. It is the maximum length of the input sequence. A Token is the model’s way of representing a word with a series of numbers. 100 words is around 130 tokens. If a model does not recognize a word, it breaks the word into multiple tokens. A model can only summarize an article no longer than the context length. In chat apps, the context length dictates how much of the previous conversation is remembered. Language models like ChatGPT or Llama 2-Chat do not remember anything. They are stateless. They know the past conversation because they were included in your current input before feeding it to the model.

What is LangChain?

Much like our familiarity with frameworks such as TensorFlow and PyTorch in the domains of image classification and object detection, LangChain serves as a framework tailored for building applications fueled by language models🙆. Specifically designed for working with Large Language Models (LLMs), LangChain simplifies the intricate process of developing extensive language models by presenting a systematic sequence of steps to generate text from a given input prompt. This toolkit manages various aspects, including prompt input, text generation, and manipulation of the generated output. Essentially, it facilitates the creation of conversational agents that leverage LLMs to generate coherent and natural language responses☝.

Components of LangChain Framework?

This framework consists of several parts-

1. LangChain Libraries: The Python and JavaScript libraries. Contains interfaces and integrations for a myriad of components, a basic run time for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.

2. LangChain Templates: A collection of easily deployable reference architectures for a wide variety of tasks.

3. LangServe: A library for deploying LangChain chains as a REST API.

4. LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain. Currently, it is in the beta stage as of writing this post on 10 Jan'24.

How does LangChain work?

LangChain provides a pipeline of steps that generate text from input prompts. The pipeline comprises seven main components: input (prompt templates), the large language model, agents, utilities, document loaders, chains, indexes, and memory.

The input component specifies the input prompt or the initial input provided to the pipeline. This comes in the form of a template that defines the structure of the prompt, including the format and content.

LangChain enables access to a range of pre-trained LLMs (e.g., GPT-3.5) trained on large datasets. The large language model component generates output (in this case, text) based on the prompt and input. These LLMs can further be fine-tuned to match the needs of specific conversational agents (e.g. if you are building a customer care chatbot).

With the Document Loaders module, we can ingest documents (e.g., PDFs, powerpoints, etc) into the LLM for further analysis (typically question answering).

LangChain provides an extensive collection of common utilities (Utils) to use in our application, such as Python REPLs (LLM would generate code to calculate the answer, run that code to get the answer, and print it out), bash commands (e.g., to interface with the local system) or search engines, as well as a requests wrapper (to link a URL published post-2021 that ChatGPT doesn’t know about, for instance).

Agents are systems that use a language model to interact with other tools. They can be used to create chatbots or power the next generation of personal assistants.

Chains provide an end-to-end pipeline for utilizing language models. These chains seamlessly integrate models, prompts, memory, parsing output, and debugging capabilities, offering a user-friendly interface.

In LangChain, Memory refers to the mechanism that stores and manages the conversation history between a user and the AI. It helps maintain context and coherency throughout the interaction, enabling the AI to generate more relevant and accurate responses.

How to install LangChain?

We can install the LangChain library using the pip or conda command. The pip command is pip install langchain and conda command is conda install langchain -c conda-forge

For example,, in google colab notebook installation will look like the below-

Building something with LangChain?

LangChain enables building applications that connect external sources of data and computation to LLMs. We will start with a simple LLM chain, which just relies on information in the prompt template to respond using one of the open-sourced LLM mistralai/Mistral-7B-v0.1. Mistral 7B is a new 7.3 billion parameter language model that represents a major advance in large language model (LLM) capabilities👀. It has outperformed the 13 billion parameter Llama 2 model on all tasks and outperforms the 34 billion parameter Llama 1 on many benchmarks👊. You can refer following table showing the number of parameters and minimum RAM required to run the two available Mistalai LLMs:

These models are available in Hugging Face and Hugging Face has inbuilt APIs to use these models easily. Let's start with installing the required library from the hugging face-

Next, we will import the required libraries-

Next, we will download the pre-trained LLM from the Hugging Face space along with its tokenizer. Please note that tokenization is a crucial text-processing step for large language models👈. It splits text into smaller units called tokens that can be fed into the language model. Also to download the model you will need 16GB RAM at least. But free and Pro Google Colab accounts do not have enough CPU RAM to download the official model containing the weights and you will get Out of memory error😓 if you try to download the whole model at once-

To solve the out-of-memory error or to be able to download the model in a resource-constrained notebook, we can download the sharded version of the original model that somebody had already done for us😀. Before that install following these two libraries as well in colab and don't forget to restart the session-

Now we will load the shared version of the model and it's tokenizer like below-

Here, the BitsAndBytes library is used to create 4-bit quantization with NF4-type configuration to load our model in 4-bit precision. It will help us load the model faster and reduce the memory footprint💣.

Once you run the above-mentioned code, downloading will be started without any out of memory error like the below-

Once we download the model successfully👍, the next step is to test it with user given prompt and see how it answers that prompt query. Mistral expects the prompts in the following format-

So let's create a sample prompt by following the above format:

Now, we will pass out text to the tokenizer and then generate the answer to the user query from our model in the following way-

Here, we use the model's generate function to generate the text based on our query. The output of the above code will look like below-

As you can see, the answer is quite promising and accurate👌. We have a list of models with their source links and explanations. This shows the power of text generation and understanding of human language by a machine💥. The generative AI field is quite impressive, right?

That's it for today's guys! It's time for you to do some hands-on and play with the code using this link. Copy the notebook in your space and do some experiments with it to get familiar with LangChain and Large Language Models👲. In the next post, we will learn to fine-tune this Mistral model to our use case, till then 👉 Go chase your dreams, have an awesome day, make every second count, and see you later in my next post😇

How to convert your YOLOv4 weights to TensorFlow 2.2.0?

Another post starts with you beautiful people! Thank you all for your overwhelming response in my last two posts about the YOLOv4. It is quite clear that my beloved aspiring data scientists are very much curious to learn state of the art computer vision technique but they were not able to achieve that due to the lack of proper guidance. Now they have learnt exact steps to use a state of the art object detection and recognition technique from my last two posts. If you are new to my blog and want to use YOLOv4 in your project then please follow below two links- How to install and compile Darknet code with GPU? How to train your custom data with YOLOv4? In my last post we have trained our custom dataset to identify eight types of Indian classical dance forms. After the model training we have got the YOLOv4 specific weights file as 'yolo-obj_final.weights'. This YOLOv4 specific weight file cannot be used directly to either with OpenCV or with TensorFlow currently becau...

Learn Data Science using Python

Search This Blog

Generative AI with LangChain: Basics

Labels

Comments

Post a Comment

Popular posts from this blog

How to use opencv-python with Darknet's YOLOv4?

How to convert your YOLOv4 weights to TensorFlow 2.2.0?

Learn the fastest way to build data apps