By Andrew Carlins · 7 min read

Can Gemini or Claude Turn a Song Into Sheet Music?

People are asking Gemini and Claude to turn recordings into sheet music, the way they once asked ChatGPT. Here is what these AI assistants can and cannot do with audio, and what actually produces accurate notation.

Part of our guide to choosing music transcription tools.

Not on their own. Neither Gemini nor Claude can listen to a recording and produce accurate sheet music, and neither can read a printed score note for note. Both are far more capable than a plain text chatbot now: Gemini accepts audio and images, and Claude accepts images and text, so you can hand them more than words. But they are tuned for understanding (transcribing speech, describing a picture, reasoning over text), not for the signal processing that music notation demands. Ask either one to transcribe a song and you get a fluent, confident answer full of notes that were never played.

That does not make them useless for music. Both are strong on the text side of it: explaining theory, talking through notation you paste in, suggesting a progression, sketching a one-line melody, building a practice plan. The skill is knowing which half of the job is language (where they help) and which half is audio-to-notation (where you need a dedicated tool), and reaching for the right one for each.

What Gemini and Claude Can Do With Music

Both models have grown well past text-only. Gemini is natively multimodal: you can upload an audio file and it will transcribe speech, summarize a recording, tell who is talking, and answer questions about what was said. It also takes images. Claude takes text and images, and reads a photo or a document you give it. So if you show either one a picture of a score, it can look at it and respond. The honest question is not whether they can see or hear the file; it is whether they can turn it into correct notation, and that is a narrower skill.

Inside the boundary of text and theory, both are helpful:

  • Explain theory. Ask why a progression resolves the way it does, what a tritone substitution is, or how to harmonize a minor scale, and you get a clear, mostly reliable explanation.
  • Discuss notation you paste as text. Hand it ABC notation, a chord chart, or a passage described in words, and it can name the chords, explain what is happening, and suggest changes.
  • Suggest a chord progression. Describe a mood or a key and it will propose progressions, reharmonizations, or a bass line to try.
  • Sketch a single-line melody. When you describe a tune in words, it can write it out in a text notation like ABC, which you can paste into a notation editor.
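To make that last item concrete, here is a minimal sketch of what a one-line melody looks like in ABC. The helper name `to_abc` and its parameters are made up for illustration; the header fields (`X:`, `T:`, `M:`, `L:`, `K:`) and the final `|]` barline are standard ABC. It assumes every note is a quarter note, which is far simpler than a real tune.

```python
def to_abc(notes, title="Sketch", meter="4/4", key="C", notes_per_bar=4):
    """Build a tiny ABC tune from a list of ABC note tokens.

    Hypothetical helper for illustration. Assumes every note is one
    quarter note (L:1/4), so notes_per_bar=4 fills a 4/4 bar.
    """
    # Group the note tokens into bars.
    bars = [
        " ".join(notes[i:i + notes_per_bar])
        for i in range(0, len(notes), notes_per_bar)
    ]
    header = ["X:1", f"T:{title}", f"M:{meter}", "L:1/4", f"K:{key}"]
    # Join bars with barlines and close the tune with a final barline.
    body = " | ".join(bars) + " |]"
    return "\n".join(header + [body])


# In ABC, uppercase C is middle C and lowercase c is the octave above.
scale = to_abc(["C", "D", "E", "F", "G", "A", "B", "c"], title="C major scale")
print(scale)
```

The printed text can be pasted into any editor that reads ABC. Nothing here listens to audio: the notes were already known and typed in, which is exactly the kind of job a chat model can also handle.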

The common thread: each of those tasks starts from text or a description, and the useful output is text. The model is reasoning about musical symbols, not converting sound or a printed page into exact notes. The moment the job is “here is a recording, give me the score,” both are out of their depth.

Why They Can't Transcribe a Recording

Turning audio into notation is a signal-processing problem, and a hard one. You have to detect the pitch of every note, find exactly when each note starts and stops, separate notes that overlap (the polyphony in a chord or a full mix), quantize the timing into a readable rhythm, and only then lay it on a staff with the right key and time signature. Each of those steps is a specialized task, typically handled by a model trained on audio. Gemini's audio understanding is built for speech (the words, the speaker, the meaning), not for pulling apart simultaneous pitches and onsets. Claude does not take an audio file in the first place, so the only way it hears a song is through a separate transcription step.
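A toy sketch can give a feel for two of those stages: pitch detection and rhythm quantization. It is not how Songscription or any production transcriber works. It assumes a single clean sine tone, estimates pitch with plain autocorrelation, and snaps onset times to a sixteenth-note grid at a known tempo. All function names are made up for illustration, and the hard part, separating overlapping notes, is left out entirely.

```python
import math


def estimate_pitch_hz(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental of a monophonic frame by autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # How similar is the signal to itself shifted by `lag` samples?
        score = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def hz_to_midi(freq_hz):
    """Nearest MIDI note number, with A4 = 440 Hz = note 69."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def quantize_onsets(onsets_sec, tempo_bpm, steps_per_beat=4):
    """Snap onset times (seconds) to a grid; returns positions in beats."""
    step_sec = 60.0 / tempo_bpm / steps_per_beat
    return [round(t / step_sec) / steps_per_beat for t in onsets_sec]


# Synthetic input: 0.1 s of a pure 440 Hz sine at 8 kHz (an assumption;
# real recordings have harmonics, noise, and overlapping notes).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440.0 * n / sample_rate) for n in range(800)]

midi_note = hz_to_midi(estimate_pitch_hz(frame, sample_rate))
print(midi_note)  # 69, i.e. A4

# Slightly uneven onsets at 120 bpm land on clean beat positions.
print(quantize_onsets([0.02, 0.49, 1.03, 1.27], tempo_bpm=120))
# [0.0, 1.0, 2.0, 2.5]
```

Even this easy case needs purpose-written signal code. Add a second simultaneous note and the autocorrelation peak no longer points at a single pitch, which is where dedicated polyphonic models come in.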

Reading a printed score is its own version of the same trap. When researchers benchmark this directly, models can recognize the symbols on the page with very high accuracy and still convert them into the wrong notes most of the time, falling back on even quarter notes and scale-shaped runs because their guesses lean on text patterns rather than the actual music in front of them. So even when a model “sees” the page clearly, the pitches and rhythms it reports drift away from what is written.

When you ask anyway, both do what language models do under uncertainty: they produce a fluent, confident answer that looks like notation and is full of notes nobody played. The output reads as authoritative, which makes the errors harder to catch than an obvious blank. If you want the full picture of what real transcription involves, our explainer on how AI music transcription works walks through each stage, and monophonic vs polyphonic transcription covers why overlapping notes are the part that breaks generic models.

What to Use Instead

For the audio-to-notation step, use a tool built for exactly that. The flow with Songscription is straightforward: upload an audio file, or paste a link to a video, and the model detects the notes and gives you editable sheet music along with MIDI, MusicXML, Guitar Pro tabs, and an interactive piano roll you can slow down to check the result. There is no manual note entry, and you can fix notes in the browser before you export anything.

A few features matter once you are past the first pass. It splits a piano part into two hands automatically, treble and bass, instead of one undifferentiated stream. It can isolate one instrument out of a full mix, so a busy track works best transcribed one part at a time. You can transpose to any key in the built-in editor, and an arrangement mode can simplify a dense piece to an easier level or write a single melodic line for an instrument that was not in the original. Where Gemini and Claude give you a paragraph about a song, this gives you a file you can play from and keep editing.

Be realistic about accuracy, because it is not magic either. Clean solo material (a single piano, a solo guitar, one vocal line) comes out far more accurate than a dense full-band mix where many instruments overlap at once. Automatic transcription gets you most of the way fast; you still scan the result and fix the spots it got wrong, and for heavy engraving you export MusicXML and finish in a notation editor like MuseScore. Why some songs transcribe cleanly and others fight you is its own topic, covered in why AI transcription accuracy varies.

Where Gemini and Claude Still Help

Once you have the notation in hand, both models go back to being useful, because now the work is text and reasoning again. With a score, a chord chart, or the MIDI-derived chords already in front of you, either one can:

  • Explain a tricky chord. Paste the symbol and ask what the notes are, how it functions, and what scale fits over it.
  • Suggest fingering ideas. Describe a passage and your instrument and ask for a sensible fingering or hand position to try.
  • Build a practice plan. Tell it the piece, your level, and how much time you have, and it will lay out a structured way to work through it.
  • Turn a melody you can name into text notation. If you can already describe or sing the tune, it will write it out in ABC for you to drop into an editor.

The principle is simple: use each tool for what it is good at. A transcription model turns sound into notes; Gemini and Claude help you understand and practice those notes once you have them. Pair them and you get the recording into notation, then get help learning to play it. For more on what an AI can and cannot tell you about a piece, see whether AI can tell you how to play a song, and for the closely related question about OpenAI's model, see whether ChatGPT can make sheet music.

Frequently Asked Questions

Can Gemini read sheet music?

Only loosely. Gemini accepts images, so it can look at a photo or PDF of a score and describe it in general terms, name the key, or recognize that a passage is busy. What it cannot do reliably is read the notes off the page one by one and hand you back the correct pitches and rhythms. Benchmarks of this exact task show models that recognize the symbols well still get the actual note conversion wrong most of the time, defaulting to even quarter notes and scale-like runs. Treat its reading of a printed score as a rough summary, not an accurate transcription.

Can Claude transcribe a song into notation?

No, not from a recording. Claude works with text and images, so it can discuss notation you paste as text, such as ABC or a chord chart, explain harmony, and sketch a simple one-line melody you describe in words. It does not take an audio file and turn it into an accurate score. Converting sound into notes is a signal-processing problem that needs pitch detection, onset timing, and polyphony separation, which is what a purpose-built transcriber does and a chat model does not.

Can Gemini or Claude make a MIDI file from audio?

Not from a recording, no. Neither model performs the audio-to-note detection that a MIDI file requires, so asking either one to listen to a song and output MIDI gives you confident, wrong data. If you describe a melody in words, a model might write out a short text representation you could convert, but for an actual MIDI of a real recording you want a dedicated audio-to-MIDI tool that detects the notes and exports the file directly.
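For the "text representation you could convert" case, a rough sketch of that conversion might look like the code below. It turns a short list of note names and durations into the bytes of a single-track Standard MIDI File. The function names and the input format are assumptions made for illustration, and the notes have to be known in advance: nothing here detects them from audio.

```python
import struct

NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_midi(name):
    """Scientific pitch name to MIDI number, with C4 (middle C) = 60."""
    letter, rest = name[0].upper(), name[1:]
    accidental = 0
    while rest and rest[0] in "#b":
        accidental += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    octave = int(rest)
    return 12 * (octave + 1) + NOTE_OFFSETS[letter] + accidental


def vlq(value):
    """Encode a delta time as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def melody_to_midi_bytes(notes, ticks_per_beat=480):
    """notes: list of (name, beats). Returns a format-0 MIDI file as bytes."""
    track = bytearray()
    for name, beats in notes:
        pitch = note_to_midi(name)
        # Note on immediately, then note off after the note's duration.
        track += vlq(0) + bytes([0x90, pitch, 64])
        track += vlq(int(beats * ticks_per_beat)) + bytes([0x80, pitch, 0])
    track += vlq(0) + bytes([0xFF, 0x2F, 0x00])  # end-of-track meta event
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)


# A melody someone already described, e.g. "C, D, E, then a long C".
data = melody_to_midi_bytes([("C4", 1), ("D4", 1), ("E4", 1), ("C4", 2)])
# To save it: open("sketch.mid", "wb").write(data)
```

With no tempo event in the file, players fall back to the MIDI default of 120 beats per minute. The output is only as good as the note list that went in, which is why a described melody works and a recording does not.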

What is the best way to turn a song into sheet music?

Use a purpose-built AI transcription tool that takes an audio file or a link and outputs editable notation, MIDI, and a piano roll, then clean up the result by hand. Automatic transcription gets you most of the way quickly, and clean solo material comes out more accurate than a dense full-band mix. For final engraving, export MusicXML and finish in a notation editor.

Ready to turn a recording into a score? Start at audio to sheet music, upload the song or paste a link, and edit the result in the browser before you export.

About the author

Andrew Carlins

Co-Founder & CEO, Songscription

Andrew co-founded Songscription at Stanford with a few fellow musicians who were tired of not finding the notes to the songs they wanted to play. He grew up playing piano and baritone saxophone and performing in musical theater, and though he hasn't performed in years, he likes to think he's still pretty sharp. He writes about getting a song off the recording and onto the page.
