This is the fourth installment of my 75 Days Of Generative AI series. In my previous article, I explained how to evaluate an LLM. The first three articles were mainly theory and a gentle introduction to the world of Generative AI. Today is all about running your first LLM. So let’s get to it!
We have all used ChatGPT via its well-known web interface. But that interface is not enough when you want to build your own apps. Hence, understanding ways to run LLMs locally or in a supported environment is important.
Running Large Language Models (LLMs) is challenging because of their significant computational requirements. There are two main approaches to using LLMs:
API Consumption: Accessing LLMs through APIs (such as OpenAI's GPT-4) is the most practical option for many use cases. This method eliminates the need for powerful local hardware but may involve usage costs and potential privacy considerations.
Local Deployment: Running LLMs on your own hardware offers more control and privacy but demands substantial computational resources. This approach suits organizations with specific needs or those working with sensitive data. It’s also useful when you want to run open-source models such as Llama 3, Gemma 2, and Mixtral (see the sketch below).
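To make the contrast concrete, here is a minimal local-deployment sketch using Hugging Face’s transformers library (an assumption on my part: transformers and torch are installed, and I use a deliberately tiny model, gpt2, purely for illustration):

# Local deployment sketch: load a small model and generate text on your own machine
from transformers import pipeline

generator = pipeline('text-generation', model='gpt2')  # gpt2 is tiny; swap in a larger model if your hardware allows
print(generator('Explain LLM', max_new_tokens=50)[0]['generated_text'])

The API-consumption route looks similar from the caller’s side, except the heavy lifting happens on someone else’s servers; we’ll see exactly that pattern with the Hugging Face Hub below.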
Development Environment
What about the computational requirements? Web environments like Google Colab and Lightning AI Studio let you access GPUs for free initially, with a pay-as-you-go model for more powerful machines. I personally like Lightning AI Studio since it provides extra credits for going through tutorials.
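Once you have a machine, a quick way to confirm which accelerator you actually got is a two-liner like this (a small sketch; it assumes torch is preinstalled, which it is on Colab and Lightning AI images):

# Check whether a CUDA GPU is visible and print its name
import torch
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU only')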
Accessing open-source models
The best way of accessing open-source models, in my opinion, is the Hugging Face Hub. Other options are to call APIs from Groq (limited free API access) or to download Ollama.
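For completeness, the Ollama route looks roughly like this (assuming you have installed Ollama from its website; these are its standard pull/run commands):

# Download the model weights, then run a one-off prompt against them
ollama pull llama3
ollama run llama3 "Explain LLM"

That said, the rest of this article focuses on the Hugging Face path.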
So, what is Hugging Face?
Hugging Face is a popular platform for machine learning, offering a vast collection of open-source models, datasets, and tools. It's best known for its Transformers library, which specializes in natural language processing (NLP) tasks. The platform hosts thousands of pre-trained language models that are freely accessible to developers and researchers.
Steps to access the Hugging Face API token
To get started with Hugging Face, you'll need to create an API token. This process is free and straightforward:
1. Create a Hugging Face account:
- Go to https://huggingface.co and sign up
- Fill out the registration form and verify your email
2. Generate your API token:
- Log in to your account and open Settings > Access Tokens
- Click on "New token"
- Give your token a name (e.g., "My API Token")
- Choose the desired access rights (read is sufficient for most use cases)
- Click "Generate a token"
3. Secure your token:
- Copy the generated token immediately
- Store it securely, as it won't be displayed again
- Never share your token publicly or commit it to version control
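A safe pattern is to export the token as an environment variable instead of hardcoding it; LangChain’s Hugging Face wrapper reads HUGGINGFACEHUB_API_TOKEN automatically (a small sketch, set the value to your own token):

# Set the token once per session so you never paste it into application code
import os
os.environ['HUGGINGFACEHUB_API_TOKEN'] = 'hf_...'  # replace with the token you generated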
Running the LLM takes less than 10 lines of code (I used an A10G GPU in a Lightning AI Studio).
To begin, install the LangChain community library, which has support for Hugging Face (the HuggingFaceHub wrapper also relies on the huggingface_hub package):
!pip install langchain_community huggingface_hub
Then import HuggingFaceHub:
from langchain_community.llms import HuggingFaceHub
Now it’s all about fetching the LLM. I am using meta-llama/Meta-Llama-3-8B-Instruct, which was the latest open-source model from Meta at the time of writing. It is a gated model, so you need to request access on its Hugging Face model page (it typically takes about 10 minutes for someone to grant you access).
llm = HuggingFaceHub(
    repo_id='meta-llama/Meta-Llama-3-8B-Instruct',
    huggingfacehub_api_token='<token you generated earlier>',  # omit this if you set HUGGINGFACEHUB_API_TOKEN above
)
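If you want more control over the generation, the wrapper accepts a model_kwargs dict that is forwarded to the inference call; the parameters below are standard Hugging Face generation arguments, shown as a sketch rather than an exhaustive list:

# Same setup, with a sampling temperature and an output-length cap
llm = HuggingFaceHub(
    repo_id='meta-llama/Meta-Llama-3-8B-Instruct',
    model_kwargs={'temperature': 0.7, 'max_new_tokens': 128},
)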
And then it’s just a matter of prompting your model!
prompt = 'Explain LLM'  # avoid naming this `input`, which shadows a Python builtin
output = llm.invoke(prompt)
print(output)
The above generates the following output for me:
Explain LLMs and their applications
Large Language Models (LLMs) are a type of artificial intelligence (AI) model that is trained on large amounts of text data to generate human-like language. They are designed to understand and generate natural language, and are often used in applications such as language translation, text summarization, and chatbots.
LLMs are typically trained using a type of neural network called a transformer, which is designed to process sequential data such as text. The model is trained on a large dataset
The output, even though generic and not very impressive, is correct and close to what you would expect as a definition of an LLM.
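One easy way to get a less generic answer is to be more specific in the prompt, for example with a simple template (a sketch using langchain_core, which is pulled in as a dependency of langchain_community; the template text is just an illustration):

# Fill a reusable template, then send the rendered prompt to the same llm object
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template('Explain {topic} to a {audience} in two sentences.')
print(llm.invoke(template.format(topic='LLMs', audience='new software engineer')))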
What’s next? In future articles, I’ll delve deeper into more complex and useful ways of using LLMs to create your apps.
Till then, if you like my series, follow me on LinkedIn.
A note on the free tier: a reader reported that the hosted inference backend refused to load an 8B model on Lightning AI’s free tier, with an error like: "The model meta-llama/Meta-Llama-Guard-2-8B is too large to be loaded automatically (16GB > 10GB). Please use Spaces (https://huggingface.co/spaces) or Inference Endpoints (https://huggingface.co/inference-endpoints)." If you run into this, swapping the repo_id for a smaller model such as microsoft/Phi-3-mini-4k-instruct makes the example work.