✨ Tiny Qwen

A minimal, easy-to-read PyTorch re-implementation of Qwen2 and Qwen2.5, the open-source multimodal LLM families.

If you find Transformers code verbose and challenging to interpret, this repo is for you! Inspired by nanoGPT and litGPT, it supports both the text-only variants (Instruct, Coder, Math, etc.) and the text + vision variants (VL). It also supports all full-precision Qwen2+ models at any size. Just choose a repo id from the Hugging Face Hub here.

Keep in mind you'll likely need multiple GPUs for models larger than 32B. Stay tuned for FSDP support in the coming days. If you run into any issues, open a PR or create an issue.

Interested in building vision-based AI Agents?

I’m passionate about automating computer use to free up human labor and would love to collaborate with like-minded people. If this sounds like you, please don't hesitate to reach out to me 🤗 (my bio)!

🦋 Quick Start

I recommend installing torch with CUDA enabled (see here). After that, simply run:

pip install -r requirements.txt
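
If you need a CUDA-enabled torch build first, something like the following usually works. This is only a sketch: cu121 is an assumption, so pick the index URL that matches your CUDA version from the PyTorch install page.

# sketch: install a CUDA 12.1 wheel of torch (adjust the index URL to your setup)
pip install torch --index-url https://download.pytorch.org/whl/cu121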

You can use the codebase as follows:

from models.model import Qwen2, Qwen2VL
from models.processor import Processor
from PIL import Image

# text-only models
model_name = "Qwen/Qwen2.5-3B"
model = Qwen2.from_pretrained(repo_id=model_name, device_map="auto")
processor = Processor(repo_id=model_name)

context = [
    "<|im_start|>user\nwhat is the meaning of life?<|im_end|>\n<|im_start|>assistant\n"
]
inputs = processor(context, device="cuda")
output = model.generate(input_ids=inputs["input_ids"], max_new_tokens=64)
output_text = processor.tokenizer.decode(output[0].tolist())

# text + vision models
model_name = "Qwen/Qwen2-VL-2B-Instruct"
model = Qwen2VL.from_pretrained(repo_id=model_name, device_map="auto")
processor = Processor(
    repo_id=model_name,
    vision_config=model.config.vision_config,
)

context = [
    "<|im_start|>user\n<|vision_start|>",
    Image.open("images/test-image.jpeg"),
    "<|vision_end|>What's on this image?<|im_end|>\n<|im_start|>assistant\n",
]
inputs = processor(context, device="cuda")
output = model.generate(
    input_ids=inputs["input_ids"],
    pixels=inputs["pixels"],
    d_image=inputs["d_image"],
    max_new_tokens=64,
)
output_text = processor.tokenizer.decode(output[0].tolist())
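
To inspect the result, just print output_text. If processor.tokenizer is a standard Hugging Face-style tokenizer (an assumption; check models/processor.py), you can also drop the chat-template tokens while decoding:

# Assumes a Hugging Face-style tokenizer; skip_special_tokens strips tokens such as <|im_end|>
output_text = processor.tokenizer.decode(output[0].tolist(), skip_special_tokens=True)
print(output_text)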

🛠️ Fine-tune / Post-train Your Own

See train/train_sft.py for a simple SFT example. Any library compatible with torch.nn.Module would work, but here I used PyTorch Lightning for its flexibility and simplicity. Also see train/train_mnist.py for inspiration on how to use this library.
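
For reference, here is a minimal sketch of what an SFT loop with PyTorch Lightning can look like. It is not the repo's actual train_sft.py: it assumes model(input_ids) returns next-token logits of shape (batch, seq_len, vocab_size), so check models/model.py and train/train_sft.py for the real interface.

import torch
import torch.nn.functional as F
import pytorch_lightning as pl
from models.model import Qwen2


class SFTModule(pl.LightningModule):
    def __init__(self, repo_id="Qwen/Qwen2.5-3B", lr=1e-5):
        super().__init__()
        # Same loading call as in the Quick Start above.
        self.model = Qwen2.from_pretrained(repo_id=repo_id, device_map="auto")
        self.lr = lr

    def training_step(self, batch, batch_idx):
        input_ids = batch["input_ids"]  # (batch, seq_len) token ids
        logits = self.model(input_ids)  # assumed shape: (batch, seq_len, vocab_size)
        # Shift by one position so each token predicts the next token.
        loss = F.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),
            input_ids[:, 1:].reshape(-1),
        )
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr)


# trainer = pl.Trainer(max_steps=1000, precision="bf16-mixed")
# trainer.fit(SFTModule(), train_dataloaders=my_sft_dataloader)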

To run any of the training scripts, just run:

PYTHONPATH=. python train/train_mnist.py

or

PYTHONPATH=. python train/train_sft.py
