NotesAI - Automatic Notetaker using Large Language Models

GitHub Back to Homepage

Overview

Notes AI demo

Notes AI is an application written in Python that can listen to audio, transcribe it into text using OpenAI's whisper model, and summarize that transcription using Llama. To get access to Llama, you need to have an api key, which is free on Groq (this will be discussed below). The purpose of this project is to give students in classroom's better learning experiences by digesting the material instead of scrambling to take notes in real time.

Recording

Recording audio is acheived by using Python's sound device library. This accesses the host machine's microphone and allows for easy interaction with audio devices. This real time audio input/output operation allows for a lightweight and flexible way to record and save audio. The audio is recorded in chunks to allow for stopping the recording at anytime. After the audio is processed as a numpy array, it is saved to a .wav file, then to a .mp3 file to be used later on in the transcription process.

Transcribing Audio to Text

After the audio is recorded and in the .mp3 format, OpenAI's Whisper model uses Artificial Intelligence to transcribe the audio into text. Simply specify the model, which in our case is whisper-large-v3, then read the file using this model. This will output a json format of text, which is then saved into a text file to later view the transcription.

Summarizing the Transcription

To use a large language model to summarize the transcription created above, we first need an API key. Llama's api key can be created here. Once created, after cloning the repository, the key needs to be exported so the code recognizes it. To achieve this, you can run export YOUR_API_KEY_HERE. The model is specified in the code as llama3-8b-8192, and the summarization comes from the content passed into the language model (along with the transcribed text), which reads Please summarize the following text into a concise and organized notes format suitable for studying. After all of this initial setup, the text is moved to another text file (in markdown format) in a similar way that the transcribed audio is.

GUI

The GUI is not ideal for this project, and the future goal is to make this a website, where any user can retireve an api key and use this application in their own browser. In this way, the project is still in progress, but core funcitonality is fully operational.