Skip to main content

Generative AI with LangChain: Basics

 





Wishing everyone a Happy New Year '24😇 I trust that you've found valuable insights in my previous blog posts. Embarking on a new learning adventure with this latest post, we'll delve into the realm of Generative AI applications using LangChain💪. This article will initially cover the basics of Language Models and LangChain. Subsequent posts will guide you through hands-on experiences with various Generative AI use cases using LangChain. Let's kick off by exploring the essential fundamentals💁

What is a Large Language Model (LLM)?

A large language model denotes a category of artificial intelligence (AI) models that undergo extensive training with extensive textual data to comprehend and produce language resembling human expression🙇. Such a large language model constitutes a scaled-up transformer model, often too extensive for execution on a single computer. Consequently, it is commonly deployed as a service accessible through an API or web interface. These models are typically constructed using deep learning methodologies and neural networks, with a primary emphasis on processing and generating natural language. 

As per NVIDIA- "Large language models largely represent a class of deep learning architectures called transformer networks. A transformer model is a neural network that learns context and meaning by tracking relationships in sequential data, like the words in this sentence". The large language model component generates output (in this case, text) based on the prompt and input. These LLMs can further be fine-tuned to match the needs of specific conversational agents (e.g. if you are building a customer care chatbot). A few use cases of LLM are as below-



How Large an LLM can be?

As per Google- "Early language models could predict the probability of a single word but modern large language models can predict the probability of sentences, paragraphs, or even entire documents". The size and capability of language models have exploded over the last few years as computer memory, dataset size, and processing power increase, and more effective techniques for modeling longer text sequences are developed💥. 

Here "Large" can refer either to the number of parameters in the model, or sometimes the number of words in the dataset. Here parameters are the weights the model learned during training, used to predict the next token in the sequence. You can refer following table to imagine the largeness of few available pre-trained LLMs-

Model_NameParametersLicenseTask
ChatGPT175 billionProprietaryMulti
microsoft/phi-22.7 billionMITText Generation
TinyLlama/TinyLlama-1.1B-Chat-v1.01.1BApache-2.0Text Generation
mistralai/Mixtral-8x7B-Instruct-v0.146.7BApache-2.0Text Generation
openai/whisper-large-v31 million hours of weakly labeled audio
and 4 million hours of pseudolabeled audio
Apache-2.0Speech Recognition
nvidia/parakeet-rnnt-1.1b1.1Bcc-by-4.0Speech Recognition
bert-base-uncased110 millionApache-2.0Fill-Mask
bert-large-uncased340 millionApache-2.0Fill-Mask

How Do Large Language Models Work?

Large language models are trained using unsupervised learning👈. With unsupervised learning, models can find previously unknown patterns in data using unlabelled datasets. This also eliminates the need for extensive data labeling, which is one of the biggest challenges in building AI models.

The ability of the foundation model to generate text for a wide variety of purposes without much instruction or training is called zero-shot learning👏. Different variations of this capability include one-shot or few-shot learning, wherein the foundation model is fed one or a few examples illustrating how a task can be accomplished to understand and better perform on select use cases.

What is Context Length and Token in LLM?

Context length is the number of tokens a language model can process at once. It is the maximum length of the input sequence. A Token is the model’s way of representing a word with a series of numbers. 100 words is around 130 tokens. If a model does not recognize a word, it breaks the word into multiple tokens. A model can only summarize an article no longer than the context length. In chat apps, the context length dictates how much of the previous conversation is remembered. Language models like ChatGPT or Llama 2-Chat do not remember anything. They are stateless. They know the past conversation because they were included in your current input before feeding it to the model.

What is LangChain?

Much like our familiarity with frameworks such as TensorFlow and PyTorch in the domains of image classification and object detection, LangChain serves as a framework tailored for building applications fueled by language models🙆. Specifically designed for working with Large Language Models (LLMs), LangChain simplifies the intricate process of developing extensive language models by presenting a systematic sequence of steps to generate text from a given input prompt. This toolkit manages various aspects, including prompt input, text generation, and manipulation of the generated output. Essentially, it facilitates the creation of conversational agents that leverage LLMs to generate coherent and natural language responses☝.

Components of LangChain Framework?

This framework consists of several parts-

1. LangChain Libraries: The Python and JavaScript libraries. Contains interfaces and integrations for a myriad of components, a basic run time for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.

2. LangChain Templates: A collection of easily deployable reference architectures for a wide variety of tasks.

3. LangServe: A library for deploying LangChain chains as a REST API.

4. LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain. Currently, it is in the beta stage as of writing this post on 10 Jan'24.

How does LangChain work?

LangChain provides a pipeline of steps that generate text from input prompts. The pipeline comprises seven main components: input (prompt templates), the large language model, agents, utilities, document loaders, chains, indexes, and memory.

The input component specifies the input prompt or the initial input provided to the pipeline. This comes in the form of a template that defines the structure of the prompt, including the format and content. 

LangChain enables access to a range of pre-trained LLMs (e.g., GPT-3.5) trained on large datasets. The large language model component generates output (in this case, text) based on the prompt and input. These LLMs can further be fine-tuned to match the needs of specific conversational agents (e.g. if you are building a customer care chatbot).

With the Document Loaders module, we can ingest documents (e.g., PDFs, powerpoints, etc) into the LLM for further analysis (typically question answering).

LangChain provides an extensive collection of common utilities (Utils) to use in our application, such as Python REPLs (LLM would generate code to calculate the answer, run that code to get the answer, and print it out), bash commands (e.g., to interface with the local system) or search engines, as well as a requests wrapper (to link a URL published post-2021 that ChatGPT doesn’t know about, for instance). 

Agents are systems that use a language model to interact with other tools. They can be used to create chatbots or power the next generation of personal assistants.

Chains provide an end-to-end pipeline for utilizing language models. These chains seamlessly integrate models, prompts, memory, parsing output, and debugging capabilities, offering a user-friendly interface. 

In LangChain, Memory refers to the mechanism that stores and manages the conversation history between a user and the AI. It helps maintain context and coherency throughout the interaction, enabling the AI to generate more relevant and accurate responses.

How to install LangChain?

We can install the LangChain library using the pip or conda command. The pip command is pip install langchain and conda command is conda install langchain -c conda-forge

For example,, in google colab notebook installation will look like the below-



Building something with LangChain?

LangChain enables building applications that connect external sources of data and computation to LLMs. We will start with a simple LLM chain, which just relies on information in the prompt template to respond using one of the open-sourced LLM mistralai/Mistral-7B-v0.1. Mistral 7B is a new 7.3 billion parameter language model that represents a major advance in large language model (LLM) capabilities👀. It has outperformed the 13 billion parameter Llama 2 model on all tasks and outperforms the 34 billion parameter Llama 1 on many benchmarks👊. You can refer following table showing the number of parameters and minimum RAM required to run the two available Mistalai LLMs:


These models are available in Hugging Face and Hugging Face has inbuilt APIs to use these models easily. Let's start with installing the required library from the hugging face-

Next, we will import the required libraries-

Next, we will download the pre-trained LLM from the Hugging Face space along with its tokenizer. Please note that tokenization is a crucial text-processing step for large language models👈. It splits text into smaller units called tokens that can be fed into the language model. Also to download the model you will need 16GB RAM at least. But free and Pro Google Colab accounts do not have enough CPU RAM to download the official model containing the weights and you will get Out of memory error😓 if you try to download the whole model at once-



To solve the out-of-memory error or to be able to download the model in a resource-constrained notebook, we can download the sharded version of the original model that somebody had already done for us😀. Before that install following these two libraries as well in colab and don't forget to restart the session-



Now we will load the shared version of the model and it's tokenizer like below-

Here, the BitsAndBytes library is used to create 4-bit quantization with NF4-type configuration to load our model in 4-bit precision. It will help us load the model faster and reduce the memory footprint💣.
Once you run the above-mentioned code, downloading will be started without any out of memory error like the below-

Once we download the model successfully👍, the next step is to test it with user given prompt and see how it answers that prompt query. Mistral expects the prompts in the following format-

So let's create a sample prompt by following the above format:

Now, we will pass out text to the tokenizer and then generate the answer to the user query from our model in the following way-

Here, we use the model's generate function to generate the text based on our query. The output of the above code will look like below-

As you can see, the answer is quite promising and accurate👌. We have a list of models with their source links and explanations. This shows the power of text generation and understanding of human language by a machine💥. The generative AI field is quite impressive, right?

 That's it for today's guys! It's time for you to do some hands-on and play with the code using this link. Copy the notebook in your space and do some experiments with it to get familiar with LangChain and Large Language Models👲. In the next post, we will learn to fine-tune this Mistral model to our use case, till then 👉 Go chase your dreams, have an awesome day, make every second count, and see you later in my next post😇






 









Comments

Popular posts from this blog

How to deploy your ML model as Fast API?

Another post starts with you beautiful people! Thank you all for showing so much interests in my last posts about object detection and recognition using YOLOv4. I was very happy to see many aspiring data scientists have learnt from my past three posts about using YOLOv4. Today I am going to share you all a new skill to learn. Most of you have seen my post about  deploying and consuming ML models as Flask API   where we have learnt to deploy and consume a keras model with Flask API  . In this post you are going to learn a new framework-  FastAPI to deploy your model as Rest API. After completing this post you will have a new industry standard skill. What is FastAPI? FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints. It is easy to learn, fast to code and ready for production . Yes, you heard it right! Flask is not meant to be used in production but with FastAPI you can use you...

Learn the fastest way to build data apps

Another post starts with you beautiful people! I hope you have enjoyed and learned something new from my previous three posts about machine learning model deployment. In one post we have learned  How to deploy a model as FastAPI?  I n the second post, we have learned  How to deploy a deep learning model as RestAPI ? and in the third post, we have also learned  How to scale your deep learning model API?   If you are following my blog posts, you have seen how easily you have transit yourselves from aspiring to a mature data scientist. In this new post, I am going to share a new framework-  Streamlit which will help you to easily create a beautiful app with Python only. I will show here how had I used the Streamlit framework to create an app for my YOLOv3 custom model. What is Streamlit? Streamlit’s open-source app framework is the easiest way for data scientists and machine learning engineers to create beautiful, performant apps in only a few hours!...

How can I make a simple ChatBot?

Another post starts with you beautiful people! It has been a long time of posting a new post. But my friends in this period I was not sitting  where I got a chance to work with chatbot and classification related machine learning problem. So in this post I am going to share all about chatbot- from where I have learned? What I have learned? And how can you build your first bot? Quite interesting right! Chatbot is a program that can conduct an intelligent conversation based on user's input. Since chatbot is a new thing to me also, I first searched- is there any Python library available to start with this? And like always Python has helped me this time also. There is a Python library available with name as  ChatterBot   which is nothing but a machine learning conversational dialog engine. And yes that is all I want to start my learning because I always prefer inbuilt Python library to start my learning journey and once I learn this then only I move ahead for another...