Summer 2024 Intern Project Recap
Introduction
This summer three interns joined my team at our Rosslyn office to explore diverse use cases for large language models (LLMs). We came up with ambitious projects for their 10 weeks with us, and they exceeded our lofty expectations by building prototypes for the following use cases:
- Quiz generator and tutor, to enable people to learn about any subject at any time
- Subject Matter Expert (SME) chat, which enabled us to develop new methods for automating prompt construction
- A system for simulating how a person of a given background would navigate a digital twin of a city and react to events
Through these projects we learned a lot about how LLMs work and what needs to be done to get the most out of them. We are already applying this knowledge to new projects to get significantly better results out of LLMs while lowering the barrier for anyone to use them.
In this blog post I’ll provide more details on each of the projects and then wrap up with some of what we learned.
Quiz Generator and Tutor
One of our interns developed a prototype of an AI-enhanced educational platform designed to provide personalized learning experiences. It creates dynamic quizzes on any topic from uploaded documents, enabling users to learn about a broader range of subjects (e.g., the maintenance manual for a specific engine or smart meter). The platform then provides feedback on incorrect responses and offers additional opportunities to learn. The system adapts to individual student performance, adjusting difficulty levels and focus areas to ensure optimal learning outcomes. By providing immediate, context-specific feedback and curating supplementary materials, it goes beyond traditional learning platforms and actively participates in the learning process. This innovative approach enables cost-effective access to personalized tutoring on any source document.
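To make the mechanics concrete, here is a minimal sketch of the core generation step, assuming an OpenAI-style chat API. The prompt wording, model name, and function names are illustrative assumptions, not the intern’s actual code.

```python
# Minimal sketch of quiz generation from an uploaded document.
# Assumes the OpenAI Python SDK; the prompt and model name are illustrative.
from openai import OpenAI

client = OpenAI()

QUIZ_PROMPT = """Using only the source document below, write {n}
multiple-choice questions at {difficulty} difficulty. For each question,
include the correct answer and a one-sentence explanation to show the
student if they answer incorrectly.

Source document:
{document}"""

def generate_quiz(document: str, n: int = 5, difficulty: str = "medium") -> str:
    """Ask the model for a quiz grounded in the supplied document."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any capable chat model works
        messages=[{
            "role": "user",
            "content": QUIZ_PROMPT.format(n=n, difficulty=difficulty,
                                          document=document),
        }],
    )
    return response.choices[0].message.content
```

Grading answers and adjusting difficulty can then be follow-up calls that feed the student’s prior performance back into the prompt.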
SME Chat
Another intern developed an AI-powered platform that provides on-demand access to diverse, simulated Subject Matter Expert (SME) perspectives. I had wanted to tackle this project for months before the internship, ever since it struck me how useful it would be to get the perspectives of different SMEs on a given topic. However, SME time is valuable and in short supply, which led me to wonder: could we provide helpful, on-demand feedback from LLM-based experts? We didn’t expect to achieve the same quality as human SMEs, but we wanted to see what would be possible.
Users can create “experts” through a simple UI, and with some behind-the-scenes prompt engineering the system builds each one. The prototype simulates conversations with one or more of these experts at once, offers 24/7 availability, and incorporates knowledge from source documents. We believe this project is the start of systems that enable enhanced decision-making in complex scenarios, improved wargaming and red-teaming exercises for national security, and cost-effective access to a wide range of expertise. I even used it myself to prep for a Board meeting!
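To make the behind-the-scenes prompt engineering concrete, here is a minimal sketch of how a few UI fields could be expanded into an expert’s hidden system prompt. The template and field names are assumptions for illustration, not the prototype’s actual implementation.

```python
# Sketch: turn a few simple UI fields into a system prompt for a simulated SME.
# The template text is illustrative; the real prototype's prompts may differ.
EXPERT_TEMPLATE = """You are {name}, a subject matter expert in {domain}
with {years} years of experience. Answer from that perspective, ground your
answers in the provided reference documents when relevant, and say so
plainly when a question falls outside your expertise."""

def build_expert_prompt(name: str, domain: str, years: int) -> str:
    """Construct the hidden system prompt from simple UI inputs."""
    return EXPERT_TEMPLATE.format(name=name, domain=domain, years=years)

# Example: a user fills in three fields and gets a ready-to-use expert.
system_prompt = build_expert_prompt("Dr. Reyes", "power grid cybersecurity", 15)
```

The same pattern extends to multi-expert conversations by giving each expert its own system prompt and alternating turns.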
Simulating Real World Behavior
Multiple studies have shown that LLMs can accurately reflect human opinion in many situations. To explore this idea, one of our interns created a prototype of a cost-effective system for simulating and predicting human behavior across diverse real-world environments. Users can create digital avatars reflecting a variety of backgrounds and deploy them to digital representations of cities. Users can then inject events into the system and observe how different people, represented by the avatars, might react.
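One way to picture the mechanism: each avatar is a background description plus a running context, and an injected event becomes a prompt asking how that person would react. The sketch below is a minimal illustration under those assumptions (again using an OpenAI-style API), not the intern’s actual design.

```python
# Sketch: a digital avatar reacts to an injected event via an LLM call.
# The dataclass fields, prompt, and model name are illustrative assumptions.
from dataclasses import dataclass
from openai import OpenAI

client = OpenAI()

@dataclass
class Avatar:
    age: int
    occupation: str
    neighborhood: str

    def react_to(self, event: str) -> str:
        """Ask the model how this person would plausibly respond."""
        persona = (f"You are a {self.age}-year-old {self.occupation} "
                   f"living in {self.neighborhood}.")
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[
                {"role": "system", "content": persona},
                {"role": "user",
                 "content": f"This just happened nearby: {event}. "
                            "What do you do next, and why?"},
            ],
        )
        return response.choices[0].message.content

# Example: inject an event and observe one avatar's reaction.
print(Avatar(34, "bus driver", "downtown").react_to("a sudden road closure"))
```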
What We Learned
It’s amazing what the students were able to accomplish in just 10 weeks! As many times as I’ve run intern programs, I continue to be impressed by how much interns can get done when they focus on a single project for the summer. While we think each of these prototypes could help our customers if further built out, they also contain individual components that are widely applicable to a range of LLM-based solutions:
- Prompting matters, but not every employee should have to be a prompt engineer. People who work regularly with LLMs know that better prompts lead to better outputs, but prompting well is a skill that takes time and effort to learn. So for our prototypes we took the time to craft high-quality prompts that were used behind the scenes to get good output for users.
- LLMs are more capable than people think, but it takes work. While I use LLMs nearly every day for quick and simple tasks, rolling out applications to a range of users for more sophisticated use cases takes real effort. It’s not just the prompting, but everything else you need to consider in a typical software project, such as the user experience and data management.
- LLMs benefit from examples. Just as with people, if you provide the model with examples of what “good” and “bad” outputs look like, it will perform better, as the sketch after this list shows.
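To illustrate the last two points together, here is a minimal sketch of a behind-the-scenes prompt template that bakes in few-shot examples of good and bad outputs, so end users never have to write prompts themselves. The example pairs are invented for illustration.

```python
# Sketch: a hidden prompt template with few-shot "good"/"bad" examples baked in.
# The examples are invented; in practice they'd be curated from real outputs.
FEEDBACK_TEMPLATE = """Explain why the student's answer is wrong in two
sentences, then offer one hint. Match the tone of the good example below.

Good example: "Not quite: photosynthesis produces oxygen, not carbon
dioxide. Hint: think about what plants release during the day."
Bad example: "Wrong. The answer is oxygen."

Question: {question}
Student's answer: {answer}"""

def build_feedback_prompt(question: str, answer: str) -> str:
    """Fill the curated template so users never have to write prompts."""
    return FEEDBACK_TEMPLATE.format(question=question, answer=answer)
```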
I had hoped we’d be able to see how the prototypes performed when the next generation of models (the GPT-5 generation) was released, but unfortunately we are still waiting on those models. I can’t wait to see how these prototypes work when we switch a single line of code to point to a more powerful model.
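For what that “single line of code” looks like in practice: if the model name lives in one configuration constant, upgrading the whole prototype really is a one-line edit. A hedged sketch, again assuming an OpenAI-style API:

```python
# Sketch: centralize the model name so an upgrade is a one-line change.
from openai import OpenAI

MODEL = "gpt-4o"  # the single line to change when a stronger model ships

client = OpenAI()
reply = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Hello!"}],
)
```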
Note: This was written prior to the release of OpenAI o1.