How to Convert Meeting (Conversation) Audio Recordings to Text (STT)


날으는물고기 2024. 8. 15. 00:55

There are several ways to convert an audio recording to text (STT, Speech-to-Text, the reverse of TTS, Text-to-Speech). This post walks through a few of them.

Method 1: Using the Google Cloud Speech-to-Text API

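Google Cloud's Speech-to-Text API is a managed speech recognition service with accurate recognition for many languages, including Korean (language code ko-KR).

Create a project in the Google Cloud Console.
Enable the Speech-to-Text API.
Create a service account and download its key file in JSON format.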
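A wrong or incomplete key file is a common cause of authentication errors, so it can be worth checking the downloaded file before running anything. The sketch below uses only the standard library and assumes the usual fields of a service account key (`type`, `project_id`, `client_email`, `private_key`); `check_service_account_key` is a hypothetical helper, not part of the Google client library.

```python
import json
import os


def check_service_account_key(path):
    """Return the client_email from a service account key file, or raise ValueError.

    Hypothetical helper: the field names are assumed from the standard
    service account key format downloaded from the Cloud Console.
    """
    if not os.path.isfile(path):
        raise ValueError(f"Key file not found: {path}")
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    # Service account keys are JSON objects with "type": "service_account"
    if key.get("type") != "service_account":
        raise ValueError('Not a service account key ("type" is not "service_account")')
    for field in ("project_id", "client_email", "private_key"):
        if field not in key:
            raise ValueError(f"Missing field: {field}")
    return key["client_email"]
```

Calling it with the same path you put in `GOOGLE_APPLICATION_CREDENTIALS` returns the service account's email address, or raises `ValueError` describing what is missing.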

Code example (Python)

import os
from google.cloud import speech

# Google Cloud authentication: path to the service account key file
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path_to_your_service_account.json"

def transcribe_speech(audio_file_path):
    client = speech.SpeechClient()

    with open(audio_file_path, "rb") as audio_file:
        content = audio_file.read()

    audio = speech.RecognitionAudio(content=content)
    # Must match the file: 16-bit PCM (LINEAR16) WAV sampled at 16 kHz
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="ko-KR",
    )

    # recognize() is synchronous and only accepts short audio (about 1 minute);
    # longer meeting recordings need long_running_recognize()
    response = client.recognize(config=config, audio=audio)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

# Path to the audio file
audio_file_path = "path_to_your_audio_file.wav"
transcribe_speech(audio_file_path)
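The config above only works if the WAV file really is 16-bit PCM at 16 kHz, and `recognize()` only accepts short clips (about one minute). A minimal pre-flight check with Python's built-in `wave` module might look like this; the helper names and the 60-second threshold are assumptions for illustration.

```python
import wave

# Values assumed from the RecognitionConfig in the example above
EXPECTED_RATE = 16000
SYNC_LIMIT_SECONDS = 60  # approximate limit for synchronous recognize()


def inspect_wav(path):
    """Return (sample_rate, channels, sample_width_bytes, duration_seconds)."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        width = w.getsampwidth()
        duration = w.getnframes() / rate
    return rate, channels, width, duration


def problems_for_sync_recognize(path):
    """List reasons the file does not fit the LINEAR16 / 16 kHz / recognize() setup."""
    rate, channels, width, duration = inspect_wav(path)
    problems = []
    if width != 2:
        problems.append(f"sample width is {width * 8}-bit, LINEAR16 needs 16-bit")
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, config says {EXPECTED_RATE} Hz")
    if channels != 1:
        problems.append(f"{channels} channels; set audio_channel_count or downmix to mono")
    if duration > SYNC_LIMIT_SECONDS:
        problems.append(f"{duration:.0f} s long; use long_running_recognize() instead")
    return problems
```

An empty list means the file fits the example as written; otherwise each message points at the setting to change (for instance, re-exporting the recording as 16 kHz mono).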

Method 2: Using Python's SpeechRecognition library

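The SpeechRecognition library wraps several speech recognition engines and APIs behind one interface, including the Google Web Speech API, CMU Sphinx, and Microsoft Bing Voice Recognition. The example below also uses pydub to convert the input file to WAV first.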

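Installation

pip install SpeechRecognition pydub

pydub relies on ffmpeg to read MP3 and other non-WAV formats, so ffmpeg must also be installed on the system.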

Code example (Python)

import speech_recognition as sr
from pydub import AudioSegment

def convert_audio_to_wav(input_file, output_file):
    # Decode any format ffmpeg supports (e.g. MP3) and save it as WAV
    audio = AudioSegment.from_file(input_file)
    audio.export(output_file, format="wav")

def recognize_speech_from_audio(wav_path):
    recognizer = sr.Recognizer()

    # Read the whole WAV file into an AudioData object
    with sr.AudioFile(wav_path) as source:
        audio_data = recognizer.record(source)

    try:
        # Free Google Web Speech API, with the language set to Korean
        text = recognizer.recognize_google(audio_data, language="ko-KR")
        print(f"Transcript: {text}")
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print(f"Could not request results from Google Speech Recognition service; {e}")

# Input and output file paths
input_file = "path_to_your_audio_file.mp3"
output_file = "output_audio_file.wav"

# Convert, then run recognition
convert_audio_to_wav(input_file, output_file)
recognize_speech_from_audio(output_file)
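`recognize_google()` sends the whole recording in one request to the free Google Web Speech API, whose default key the SpeechRecognition documentation describes as meant for testing and liable to be revoked. For a long meeting recording, a common approach is to split the audio into shorter pieces and transcribe them one by one. The sketch below assumes a WAV input (such as the file produced by `convert_audio_to_wav`), a 50-second chunk length, and the hypothetical helper names `chunk_ranges` and `transcribe_in_chunks`.

```python
def chunk_ranges(total_ms, chunk_ms=50_000, overlap_ms=0):
    """Split [0, total_ms) into (start, end) windows in milliseconds.

    Hypothetical helper; chunk_ms=50_000 is an assumed chunk length.
    """
    if chunk_ms <= overlap_ms:
        raise ValueError("chunk_ms must be larger than overlap_ms")
    ranges = []
    start = 0
    while start < total_ms:
        end = min(start + chunk_ms, total_ms)
        ranges.append((start, end))
        if end == total_ms:
            break
        start = end - overlap_ms  # step back by the overlap, if any
    return ranges


def transcribe_in_chunks(wav_path, language="ko-KR"):
    # Imported here so chunk_ranges() can be used without these packages installed
    import speech_recognition as sr
    from pydub import AudioSegment

    # AudioData expects mono PCM, so downmix once up front
    audio = AudioSegment.from_wav(wav_path).set_channels(1)
    recognizer = sr.Recognizer()
    texts = []
    for start, end in chunk_ranges(len(audio)):  # len() is in milliseconds
        segment = audio[start:end]  # pydub slices by milliseconds
        data = sr.AudioData(segment.raw_data, segment.frame_rate, segment.sample_width)
        try:
            texts.append(recognizer.recognize_google(data, language=language))
        except sr.UnknownValueError:
            pass  # silent or unintelligible chunk: skip it
    return " ".join(texts)
```

Cutting at fixed offsets can split a word in two; a small `overlap_ms` reduces that at the cost of occasionally repeated words, and splitting on silence (for example with pydub's `split_on_silence`) is another option.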

Method 3: Using OpenAI's Whisper model

Whisper, developed by OpenAI, is an open-source STT model that supports many languages and runs locally, without calling an external API.

Installation

pip install git+https://github.com/openai/whisper.git
pip install torch

Whisper also needs ffmpeg installed on the system to decode audio files.

Code example (Python)

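import whisper

def transcribe_audio(audio_file):
    # "base" is fast; larger models ("small", "medium", "large") are slower but usually more accurate
    model = whisper.load_model("base")
    # Setting the language skips auto-detection; "ko" = Korean
    result = model.transcribe(audio_file, language="ko")
    print(f"Transcript: {result['text']}")

# Path to the audio file
audio_file_path = "path_to_your_audio_file.mp3"
transcribe_audio(audio_file_path)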
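For meeting notes, timestamps are often more useful than one block of text. Besides `text`, the dictionary returned by `transcribe()` contains a `segments` list whose entries carry `start` and `end` (in seconds) and `text`. A sketch that turns them into timestamped lines; the helper names are hypothetical, and `language="ko"` assumes the recording is in Korean.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def format_segments(segments):
    """Turn Whisper-style segments (dicts with start, end, text) into timestamped lines."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_with_timestamps(audio_file, model_name="base"):
    # Imported lazily so the formatting helpers work without Whisper installed
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_file, language="ko")
    return format_segments(result["segments"])
```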

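Pick whichever of the methods above fits your needs to convert audio files to text.

The process can also be automated with n8n: a workflow detects when a file is added to a specific Google Drive folder, fetches it, and runs STT (Speech-to-Text) on it. The workflow below uses Google Drive, an HTTP request, and a Python script to do the conversion.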

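Step-by-step setup

1. Set up the Google Drive trigger

Set up a trigger that fires when a file is uploaded to Google Drive.

Add a Google Drive Trigger node: add the Google Drive trigger node in n8n.
Configure authentication: set up the Google Drive API credentials.
Select the folder: choose the specific folder to watch.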

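2. Set up the file download node

Download the file detected by the Google Drive trigger.

Add an HTTP Request node: add an HTTP Request node to download the file.
Set the file URL: use the file URL (or file ID) received from the trigger. The request must carry your Google Drive credentials, because files in a private folder cannot be downloaded anonymously. Alternatively, n8n's Google Drive node has a Download operation that handles authentication for you.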
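For reference, the authenticated request the download step has to make is sketched below in Python, using the Drive v3 `files.get` endpoint with `alt=media`. The file ID would come from the trigger's output; the helper names and where `access_token` comes from are assumptions.

```python
from urllib.parse import quote

DRIVE_FILES_URL = "https://www.googleapis.com/drive/v3/files/"


def build_download_request(file_id, access_token):
    """Return (url, headers) for downloading a Drive file's content via Drive API v3."""
    # alt=media asks for the file's bytes instead of its metadata
    url = DRIVE_FILES_URL + quote(file_id, safe="") + "?alt=media"
    headers = {"Authorization": f"Bearer {access_token}"}
    return url, headers


def download_drive_file(file_id, access_token, local_path):
    # Imported lazily; requests is only needed for the actual download
    import requests

    url, headers = build_download_request(file_id, access_token)
    with requests.get(url, headers=headers, stream=True, timeout=60) as r:
        r.raise_for_status()
        with open(local_path, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return local_path
```

`alt=media` returns the raw content of uploaded files such as audio recordings; native Google Docs files would need the export endpoint instead.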

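3. Convert the file and run STT

Convert the audio file to text with a Python script.

Add an Execute Command node: add an Execute Command node to run the Python script. The command runs on the machine hosting n8n, so Python and the script's dependencies must be installed there.
Write the Python script: write a script like the example below to convert the audio file to text.

Example Python script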

import os
import sys
import requests
from google.cloud import speech

# Google Cloud authentication: path to the service account key file
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path_to_your_service_account.json"

def download_file(url, local_filename):
    # Stream the file to disk in 8 KB chunks
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return local_filename

def transcribe_speech(audio_file_path):
    client = speech.SpeechClient()
    with open(audio_file_path, "rb") as audio_file:
        content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="ko-KR",
    )
    # Synchronous recognition: only suitable for audio up to about 1 minute
    response = client.recognize(config=config, audio=audio)
    transcript = ""
    for result in response.results:
        transcript += result.alternatives[0].transcript + "\n"
    return transcript

if __name__ == "__main__":
    # n8n passes the file URL (or a local file path) as the first argument, e.g.
    # python3 stt.py "{{ $json.fileUrl }}"   (fileUrl is a hypothetical field name)
    source = sys.argv[1]
    if source.startswith(("http://", "https://")):
        local_file = download_file(source, "downloaded_file.wav")
    else:
        local_file = source

    # Speech-to-text conversion
    transcript = transcribe_speech(local_file)
    print(transcript)
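The script above uses synchronous `recognize()`, so it will reject meeting recordings longer than about a minute. For long files, the same client library provides `long_running_recognize()`, which starts an asynchronous job; audio above the inline size limit has to be uploaded to Cloud Storage first and referenced by a `gs://` URI. A sketch under those assumptions; `transcribe_long_audio`, `join_transcripts`, and the bucket path are hypothetical.

```python
def join_transcripts(results):
    """Concatenate the top alternative of each result, one line per result."""
    return "\n".join(r.alternatives[0].transcript for r in results if r.alternatives)


def transcribe_long_audio(gcs_uri, timeout_seconds=3600):
    # Imported lazily so join_transcripts() can be used without the client library
    from google.cloud import speech

    client = speech.SpeechClient()
    # Hypothetical URI, e.g. "gs://your-bucket/meeting.wav"
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="ko-KR",
    )
    # Starts an asynchronous job; result() blocks until it finishes or times out
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=timeout_seconds)
    return join_transcripts(response.results)
```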

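4. Set up the result-processing node

Process the STT output.

Add a Set node: map the output of the Execute Command node (the text the script printed is in its stdout field).
Save the result: store the transcribed text or pass it on to another workflow.

With these steps, you get a workflow that detects a file uploaded to Google Drive, downloads it, and converts it to text. For the exact settings of each node, refer to the n8n interface.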
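Printing a single JSON object instead of raw text makes the next node's job easier, since it can read individual fields out of `stdout`. A minimal sketch of what the script could print instead of the bare transcript; the field names `source` and `transcript` are assumptions.

```python
import json


def to_n8n_output(transcript, source):
    """Serialize the result as one JSON object for n8n to read from stdout."""
    return json.dumps(
        {"source": source, "transcript": transcript.strip()},
        ensure_ascii=False,  # keep Korean text readable instead of \uXXXX escapes
    )


def parse_n8n_output(stdout):
    """Inverse of to_n8n_output(), roughly what the next node does with stdout."""
    return json.loads(stdout)
```

The script would then end with `print(to_n8n_output(transcript, local_file))`, and an expression such as `{{ JSON.parse($json.stdout).transcript }}` in the Set node could pick out the text.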


Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA)