# speech-representation

Here are 17 public repositories matching this topic...

s3prl / s3prl

Self-Supervised Speech Pre-training and Representation Learning Toolkit

Updated Jun 13, 2025
Python

jishengpeng / WavTokenizer

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

semantic text-to-speech codec acoustic dac speech-representation audio-representation encodec soundstream music-representation-learning gpt4o speech-language-model

Updated Mar 2, 2025
Python

ddlBoJack / emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

speech-emotion-recognition pytorch-implementation iemocap speech-representation

Updated Dec 23, 2024
Python

jishengpeng / WavChat

A Survey of Spoken Dialogue Models (60 pages)

streaming duplex speech moshi speech-representation encodec gpt-4o speech-language-model spoken-dialogue-models modal-alignment intreaction mini-omni llama-omni wavtokenizer

Updated Nov 28, 2024

QiangChunyu / SecoustiCodec

Ultra-low bitrate speech codec (0.27-1 kbps) with cross-modal alignment and real-time capabilities

semantic speech vae codec cross-modal speaker fsq speech-codec speech-representation contrastive-learning single-codebook

Updated Aug 27, 2025
Python

Ereboas / MagiCodec

A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.

text-to-speech pytorch tts codec speech-representation llm llms speech-language-model

Updated Jun 4, 2025
Python

OpenMOSS / MOSS-Audio-Tokenizer

MOSS-Audio-Tokenizer is a Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, it supports streaming and variable bitrates, delivering SOTA reconstruction and strong performance in generation and understanding—serving as a unified interface for next-generation native audio language models.

audio music tokenizer speech tts unified speech-representation

Updated Feb 13, 2026
Python

Soul-AILab / SAC

Training, inference, and testing of the SAC speech codec model.

semantic codec acoustic speech-reconstruction speech-representation speech-disentanglement

Updated Nov 1, 2025
Python

gyt1145028706 / XY-Tokenizer

Code for the paper "XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs"

autoencoder automatic-speech-recognition speech-representation speech-tokenizer speech-language-models

Updated Sep 19, 2025
Python

mechanicalsea / lighthubert

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

pytorch neural-architecture-search self-supervised-learning speech-representation lighthubert

Updated Sep 26, 2022
Python

ryota-komatsu / slp2025

Survey of audio language models

speech speech-processing speech-representation multimodal-large-language-models speech-language-model

Updated Feb 4, 2026
Jupyter Notebook

andi611 / Mockingjay-Speech-Representation

Official implementation of Mockingjay in PyTorch

speech pytorch feature-extraction representation-learning speaker-recognition apc sentiment-classification mockingjay pytorch-implementation phoneme-prediction speech-representation phone-classification speaker-classification

Updated Jul 6, 2023
Python

vectominist / MiniASR

A mini, simple, and fast end-to-end automatic speech recognition toolkit.

minimal pytorch speech-recognition asr ctc fairseq speech-representation hubert wav2vec2 s3prl

Updated Dec 6, 2022
Jupyter Notebook

ZhangXinWhut / SimWhisper-Codec

Official code for the paper "Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding"

semantic autoencoder codec acoustic speech-reconstruction speech-representation speech-tokenizer

Updated Jan 28, 2026
Python

seorim0 / SE-using-SRL-Model

Causal Speech Enhancement Based on a Two-Branch Nested U-Net Architecture Using Self-Supervised Speech Embeddings

python deep-neural-networks deep-learning pytorch noise-reduction speech-enhancement self-supervised-learning speech-representation nested-unet speech-restoration icassp2025

Updated Jun 6, 2025
Python

bshall / dusted

DUSTED: Spoken-Term Discovery using Discrete Speech Units

zerospeech speech-representation spoken-term-discovery

Updated Oct 2, 2024
Jupyter Notebook

jefflai108 / Semi-Supervsied-Spoken-Language-Understanding-PyTorch

Semi-supervised spoken language understanding (SLU) via self-supervised speech and language model pretraining

speech-recognition semi-supervised-learning spoken-language-understanding speech-representation

Updated Mar 23, 2021
Python
