Today’s AI platforms rely on different types of models, including the Large Language Models (LLMs) that power generative AI. As a software developer, how much do you need to actually learn about LLMs and related technologies (such as transformers)?
In the case of generative AIs like ChatGPT, these models are initially trained by reading huge volumes of text, gradually learning word connections based on context. The neural-network architecture underlying these models is called a transformer. A transformer analyzes the relationships between the words in a sentence to figure out the sentence’s meaning. Using advanced mathematics, it reads each word and, while doing so, pays attention to the other words in the sentence that are related to it, and thus builds a deeper understanding of the sentence. (This technology was described in the 2017 research paper “Attention Is All You Need,” which laid the foundation for today’s AI.)
To figure out how much technical AI knowledge you’ll actually need, you’ll need to have an idea of what type of software developer you want to be and where AI fits into that. AI is touching virtually every aspect of software development, which means you’ll need to know at least something. On one end, you have AI tools such as GitHub Copilot and Amazon Q for Developers that integrate with your IDEs and help you code; you don’t need to learn a lot about how the technology works, just what it can do.
On the other end of the scale, you may need to integrate AI into the software you’re building, such as a chatbot built into your website or app. In that case, you’ll need to know a bit about LLMs and the related technology as you integrate pre-trained LLMs into your app. These are models that have already been trained on the data they need, and you can download them from repositories such as Hugging Face.
How you interact with AI will determine your need to understand LLMs, transformers, and the required libraries and frameworks. We’ve listed these in order from minimal required knowledge to maximum knowledge. You could take it even further, to areas such as writing your own transformer code or doing research into new types of transformers; those are the most advanced cases, and we won’t be covering them here because they require years of training in advanced mathematics and computer science (more than we can break down in our limited space).
Level 1: Interacting with the Tools
The first level is simply interacting with AI tools built into code editors. Using these tools requires little knowledge of LLMs and transformers; however, you do need to at least have a basic understanding of what AI tools can and cannot do. By learning how to use the tools and recognizing their limitations, you can get the most out of them.
For example, you need to realize the tools are far from perfect and can make mistakes. But by understanding that LLMs focus on language, you can recognize that today’s AI is great at language (both human and programming languages) but not so much problem solving. They’re like a human who memorizes things; we might have a friend who can rattle off 20 digits of Pi because they memorized it, but they might not know how to actually calculate the 21st digit.
Knowing these caveats of AI will give you a heads-up on what types of questions to ask the tools. Instead of querying, “Write me a program that will solve this particular scientific problem,” you would probably want to prompt: “Here’s the scientific formula I’m using. Write me a function that applies it and returns the result.”
Level 2: Adding a Pre-trained Chatbot to Your App
At this level, you’re adding minimal AI features to your app. You’ll locate a pre-trained model on a site like Hugging Face, and that model will have all the information you want your chatbot to know. In this scenario, you’ll need to understand two aspects:
- How to locate the right model. To do so, you’ll search through the models based on your app’s needs. You’ll want to find a model trained with the information you need, and you’ll want to make sure it’s the right size. Some models are tens of gigabytes; others are only a few hundred megabytes. The larger ones, of course, contain far more information and detail, but require far more processing power to run.
- How to write code that uses the model. There are many libraries out there, but one important one is a library simply called Transformers that was created by the researchers at Hugging Face. There are others you can search for; ChatterBot is a popular one as well.
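To make this concrete, here is a minimal sketch of loading a pre-trained model with the Transformers library mentioned above. It assumes `transformers` (and a backend such as PyTorch) is installed, and it uses `sshleifer/tiny-gpt2`, a tiny test model chosen so the download is small; a real chatbot would use a much larger conversational model.

```python
# Minimal sketch: load a pre-trained model from Hugging Face and generate text.
# Assumes `transformers` and a backend such as PyTorch are installed.
from transformers import pipeline

# "sshleifer/tiny-gpt2" is a tiny testing model -- swap in a real chat model
# for anything beyond experimentation.
generator = pipeline("text-generation", model="sshleifer/tiny-gpt2")

replies = generator("Hello! How can I help you today?", max_new_tokens=10)
print(replies[0]["generated_text"])
```

The `pipeline` function hides the details of downloading the model, tokenizing your input, and decoding the output, which is why it’s a good first step at this level.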
Level 3: Training a Model with Your Data for Searching
At this level, you’ll start with a pre-trained model and add your own data to the AI system, such as your company’s knowledge base. Remember, the models first have to be trained so they can understand human language. That means you don’t just start with an empty model, drop in your 100 pages of text, and expect the chat system to be up and running; you start with a model that is already trained to the level of language you need.
Larger models can deal with complex language and nuances, whereas smaller models aren’t nearly as sophisticated. Then, you’ll need to use various code libraries that will allow your app to read in your own knowledge base so that your users can perform searches. With the help of the LLM, your users can search on words that might be similar to those found in the documents.
In this case, you’ll need to understand what LLMs are and how they’re used, as well as how to put them to work for word searches. Then you’ll need to use different libraries to simplify the process of loading in the knowledge base articles, indexing them, and allowing your users to perform searches on words found in them and words similar to those found in them. The technology here is called vector similarity search. (The name comes from the fact that the code uses vector mathematics to determine whether two words are similar.)
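The underlying idea can be shown without any libraries at all. In the sketch below, the three-dimensional “embeddings” are made-up toy values (real models produce vectors with hundreds of dimensions), but the cosine-similarity math is the same one the vector-search libraries use.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two vectors' directions: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings; a real model would generate these vectors.
embeddings = {
    "vacation": [0.90, 0.10, 0.20],
    "holiday":  [0.85, 0.15, 0.25],
    "compiler": [0.10, 0.90, 0.70],
}

query = embeddings["vacation"]
scores = {word: cosine_similarity(query, vec) for word, vec in embeddings.items()}
# "holiday" scores far closer to "vacation" than "compiler" does,
# even though the words share no letters in common.
```

That last comment is the whole point: similarity is computed from meaning (as encoded in the vectors), not from spelling, which is why users can search on words merely similar to those in your documents.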
For this you’ll want to learn how to use the aforementioned Transformers library from Hugging Face. You’ll also want to look into Chroma, which is quite useful. Others you might consider are Pinecone, Weaviate, Vespa, and FAISS. With these, you can load in the LLM, load in the documents, and then easily pass string searches in and get back a list of relevant documents.
This is when things start to get exciting and a bit more difficult. Study the sample code in the documentation for the libraries and get it to run. Then try adding a few more features to the samples. Next, try adding the code to your own apps. Make sure you fully understand the code you’re writing, rather than just copying the samples and getting them to run. That way, you’ll master the use of the framework, and could even land a new job using the technology.
Level 4: Q&A from Your Knowledge Base
Searching based on word similarity is one thing; answering questions from the content of the knowledge base is definitely a next-level challenge. Fortunately, it’s not a huge leap from performing searches; the key is to use the right library.
As before, don’t just drop the library in and run with it. Understand what the frameworks and libraries are doing, and how they’re ingesting and storing the documents. The technology here is known as Retrieval-Augmented Generation (RAG). RAG provides two steps: retrieval, where it finds the information needed, and generation, where it generates a human-readable response.
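The two RAG steps can be illustrated with a deliberately simplified toy: here, retrieval is just word overlap and “generation” is a string template. A real system would use vector search for the retrieval step and an LLM for the generation step, but the two-step structure is identical.

```python
# Toy RAG sketch: retrieval by word overlap, templated "generation".
knowledge_base = {
    "time-off": "Submit a request in the HR portal at least two weeks in advance.",
    "expenses": "File expense reports within 30 days, with receipts attached.",
}

def words(text):
    """Lowercase word set, with basic punctuation stripped."""
    cleaned = text.lower().replace("?", "").replace(",", "").replace(".", "")
    return set(cleaned.split())

def retrieve(question):
    """Step 1 (retrieval): find the article sharing the most words with the question."""
    q = words(question)
    return max(knowledge_base.items(), key=lambda item: len(q & words(item[1])))

def generate(question, article):
    """Step 2 (generation): wrap the retrieved text in a readable answer.
    A real system would hand the retrieved text to an LLM to phrase the reply."""
    topic, text = article
    return f"According to the '{topic}' article: {text}"

question = "How do I submit a time-off request?"
answer = generate(question, retrieve(question))
print(answer)
```

Because the generation step works only from what retrieval found, the quality of the final answer depends heavily on retrieving the right document, which is exactly where the testing described below comes in.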
Once again, the Transformers library from Hugging Face can help here. Other great libraries are LangChain and Haystack. And again, try out the samples and understand what they’re doing and how they work before adding such features to your own apps. Study the documentation and become fluent in the objects and methods provided. Now you’re becoming a serious AI practitioner, and this can look great on your resume.
Then take it to the next level; make sure you understand the limitations of the system. Do lots of testing. Are the generated answers accurate? For example, if a coworker asks how to put in for time off, does the generated answer truly match what is given in the employee handbook, or is it reworded to the point that it’s no longer accurate? A wrong answer can have serious implications for your coworkers!
Conclusion: Level 5, Mastery
Although we’re not going to talk about how to build transformers and create LLMs from scratch, we do encourage you to learn as much about LLM technology as you can. One great way to start is to take a short course on how LLMs operate; from there, you can decide whether you want to devote the enormous amount of time and effort required to learn every detail of the technology. This is challenging, but could potentially pay off with lots of job opportunities as more companies wholeheartedly embrace everything AI-related.