
Exploring "TinyLlama: An Open-Source Small Language Model"

TinyLlama is a compact 1.1B language model pretrained on 1 trillion tokens. Built on Llama 2, it leverages advances from the open-source community, such as FlashAttention, for better computational efficiency. Despite its small size, TinyLlama still outperforms existing open-source models of comparable size.

Introduction

From what I have seen in natural language processing (NLP), the trend has been to scale language models up. This push is driven by the fact that Large Language Models (LLMs) pretrained on massive text corpora have proven successful across a wide range of tasks. However, for a model to be widely adopted and deployed robustly (especially integrated on hardware devices), it must be small, lightweight, and still deliver comparable accuracy.


There are two considerations about the scaling laws discussed in [1] that made me think about training a smaller model on a larger dataset:

  • Smaller models can perform as well as larger ones when trained on more data [3].
  • Current scaling laws may not accurately predict outcomes when smaller models are trained for longer [4].

Motivated by these findings, TinyLlama focuses on investigating how smaller models behave when trained on significantly more tokens than the scaling laws recommend. It is a 1.1B-parameter language model pretrained on 1 trillion tokens over three epochs. Built on Llama 2, it leverages open-source community improvements such as FlashAttention to improve computational efficiency. Despite its modest size, it outperforms other open-source models of comparable size.

Contributions

  • Introduces TinyLlama, a small-scale, open-source language model.
  • To increase openness in the open-source LLM pretraining community, the authors release all the necessary information, including the pretraining code, intermediate model checkpoints, and the data processing pipeline.
  • TinyLlama's compact architecture and promising performance make it well suited for mobile applications and for experimenting with new ideas for language models.

TinyLlama's source code can be found here: https://github.com/jzhang38/TinyLlama

Experiments

Datasets

M峄 ti锚u c峄 TinyLlama l脿 l脿m cho qu谩 tr矛nh 膽脿o t岷 tr瓢峄沜 tr峄 n锚n hi峄噓 qu岷 v脿 c贸 th峄 l岷穚 l岷. T谩c gi岷 s峄 d峄g k岷縯 h峄 ng么n ng峄 t峄 nhi锚n v脿 d峄 li峄噓 m茫 膽峄 膽脿o t岷 tr瓢峄沜. C谩c b峄 d峄 li峄噓 m脿 h峄 s峄 d峄g l脿:

  • SlimPajama: this dataset was created by cleaning and deduplicating its predecessor, RedPajama. The original RedPajama corpus is an open-source research effort aimed at reproducing Llama's pretraining data, containing over 1.2 trillion tokens.
  • StarCoderData: this dataset comprises roughly 250 billion tokens spanning 86 programming languages. Besides code, it also contains GitHub issues and text-code pairs that use natural language.

After combining these two corpora, the training set contains roughly 950 billion tokens in total. TinyLlama is trained on these tokens for approximately three epochs. During training, the authors sample the natural language data so as to achieve a natural-language-to-code ratio of roughly 7:3.
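
For intuition, here is a minimal sketch of how such a 7:3 mix could be reproduced with the HuggingFace datasets library. This is not the authors' actual pipeline; the repo IDs, the data_dir subset, and the column names are assumptions based on the public mirrors of the two corpora (both may require accepting their terms on the Hub first).

from datasets import load_dataset, interleave_datasets

# Assumed public mirrors of the two corpora (not the authors' pipeline).
# Column names are harmonized to a single "text" field before mixing.
slimpajama = load_dataset("cerebras/SlimPajama-627B", split="train",
                          streaming=True).select_columns(["text"])
starcoder = load_dataset("bigcode/starcoderdata", data_dir="python",
                         split="train", streaming=True
                         ).rename_column("content", "text").select_columns(["text"])

# Sample from the two streams with probabilities matching the 7:3 ratio.
mixed = interleave_datasets([slimpajama, starcoder],
                            probabilities=[0.7, 0.3], seed=42)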

Model architecture

TinyLlama 谩p d峄g ki岷縩 tr煤c m么 h矛nh gi峄憂g h峄噒 v峄沬 Llama 2, c峄 th峄 l脿 ki岷縩 tr煤c Transformer. image.png C峄 th峄, n贸 s峄 d峄g:

  • Positional embedding: uses RoPE (Rotary Positional Embedding) to inject positional information.
  • RMSNorm: to improve training stability, pre-normalization is applied, normalizing the input before each Transformer sublayer.
  • SwiGLU: combines the Swish activation and the Gated Linear Unit (GLU), hence the name SwiGLU, as the activation function.
  • Grouped-query attention: reduces memory bandwidth overhead and speeds up inference. Concretely, the authors report 32 query attention heads and 4 groups of key-value heads. With this technique, TinyLlama shares key and value representations across multiple heads while maintaining performance.
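
These choices can be checked directly from the released checkpoint's configuration. Below is a small sketch using transformers' AutoConfig; the field names follow the Llama config class, and the expected values in the comments come from the numbers above.

from transformers import AutoConfig

# Load TinyLlama's config from HuggingFace and print the fields that
# correspond to the architectural choices described above.
config = AutoConfig.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

print(config.num_attention_heads)  # query heads: expected 32
print(config.num_key_value_heads)  # key-value groups: expected 4 (GQA)
print(config.hidden_act)           # "silu", the Swish part of SwiGLU
print(config.rms_norm_eps)         # RMSNorm epsilon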

Training process

The authors then apply several techniques to optimize training speed (a minimal sketch of the first follows the list):

  • The authors integrate Fully Sharded Data Parallel (FSDP) for efficient multi-GPU and multi-node training. Spreading the work across multiple compute nodes improves training speed and efficiency.
  • The authors use FlashAttention to increase computational throughput.
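
As a rough illustration (not the authors' training code), wrapping a model in PyTorch's FSDP looks like the sketch below. It assumes the script is launched with torchrun so each process owns one GPU, and the toy Linear layer stands in for TinyLlama.

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Minimal FSDP sketch: one process per GPU (e.g. via torchrun).
dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank())

model = torch.nn.Linear(2048, 2048).cuda()  # toy stand-in for TinyLlama
model = FSDP(model)  # parameters, gradients, optimizer state are sharded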

N峄乶 t岷g c峄 m么 h矛nh d峄盿 tr锚n lit-gpt. Trong giai 膽o岷 ti峄乶 膽脿o t岷, n贸 膽瓢峄 s峄 d峄g v峄沬 m峄 ti锚u m么 h矛nh h贸a ng么n ng峄 t峄 膽峄檔g h峄搃 quy (an auto-regressive language modeling goal). C峄 th峄 h啤n, h峄 s峄 d峄g tr矛nh t峄慽 瓢u h贸a AdamW, ph霉 h峄 v峄沬 c谩c tham s峄 c峄 Llama 2. H峄 c农ng s峄 d峄g l峄媍h tr矛nh t峄慶 膽峄 h峄峜 cosine ( cosine learning rate schedule) v峄沬 t峄慶 膽峄 h峄峜 t峄慽 膽a l脿 4,0 脳 10鈭4 v脿 t峄慽 thi峄僽 l脿 4,0 脳 10鈭5. H峄 s峄 d峄g 2.000 b瓢峄沜 kh峄焛 膽峄檔g 膽峄 t峄慽 膽a h贸a vi峄嘽 h峄峜 v脿 膽岷穞 quy m么 l么 th脿nh 2 tri峄噓 m茫 th么ng b谩o. H峄 膽i峄乽 ch峄塶h m峄ヽ gi岷 tr峄峮g s峄 (weight decay) th脿nh 0,1 v脿 s峄 d峄g ng瓢峄g c岷痶 膽峄 d峄慶 l脿 1,0 膽峄 ki峄僲 so谩t gi谩 tr峄 膽峄 d峄慶.

Implementation

The nice thing about small models is that they are simple enough to set up on cloud services and don't require a GPU for inference. In this article, I just use TinyLlama as a chatbot and demonstrate it in a Colab notebook. That said, I hope to take on more creative tasks with this fun model.


To begin, we install and import the libraries needed for our code. We use TinyLlama/TinyLlama-1.1B-Chat-v1.0, the latest TinyLlama checkpoint on HuggingFace.

# Install libraries
!pip install git+https://github.com/huggingface/transformers
!pip install -q -U accelerate


# Import libraries
import torch

import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline


# Call out model name
model_id = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'

HuggingFace provides templates for chat models. In my experience, they are user-friendly. There are two ways to set this up:

  1. Applying the template step by step.
  2. Applying the template through a pipeline (pipe).

Step-by-step approach

First, we pull the model weights down from HuggingFace. To do so, we run these lines of code:

model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

Next, we define a role for the model. In my view, this step is like giving the model a persona, making it feel more "alive".

# Check the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a content creator who is in love with technology that always repond as a geek style.",
    },
    {"role": "user", "content": "How to increase views for my content?"},
]

Next, pass the messages to the apply_chat_template() method. Once that's done, we'll have input that is ready to go! When using a chat template as input for generation, you should also set add_generation_prompt=True to include the generation prompt for the model.

tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, 
                                               add_generation_prompt=True, 
                                               return_tensors="pt")
print(tokenizer.decode(tokenized_chat[0]))


"""
<|system|>
You are a content creator who is in love with technology that always repond as a geek style.</s> 
<|user|>
How to increase views for my content?</s> 
<|assistant|>
"""

Now that our input has been properly prepared for TinyLlama, we can use the model to generate a response to the user's question.

outputs = model.generate(tokenized_chat, max_new_tokens=128,
                         do_sample=True,
                         temperature=0.7,
                         top_k=50,
                         top_p=0.95)
print(tokenizer.decode(outputs[0]))


"""
<|system|>
You are a content creator who is in love with technology that always repond as a geek style.</s> 
<|user|>
How to increase views for my content?</s> 
<|assistant|>
Here are some ways to increase views for your content:

1. Create high-quality content: Make sure your content is engaging, informative, and relevant to your target audience.

2. Optimize your content for SEO: Ensure your content is optimized for search engines by including relevant keywords in your title, description, and content.

3. Use social media: Share your content on social media platforms like Twitter, Facebook, and Instagram to reach a wider audience.

4. Promote your content: Leverage social media platforms to promote your content and drive traffic to your website.
"""

Using a pipeline

HuggingFace's text-generation pipeline accepts chat-formatted input directly, which makes building a chat model extremely easy.

pipe = pipeline("text-generation", 
                model=model, 
                tokenizer=tokenizer,
                torch_dtype=torch.bfloat16, 
                device_map="auto")

messages = [
    {
        "role": "system",
        "content": "You are a content creator who is in love with technology that always repond as a geek style.",
    },
    {"role": "user", "content": "How to increase views for my content?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=128, 
               do_sample=True, 
               temperature=0.7, 
               top_k=50, 
               top_p=0.95)
print(outputs[0]["generated_text"])

"""
<|system|>
You are a content creator who is in love with technology that always repond as a geek style.</s>
<|user|>
How to increase views for my content?</s>
<|assistant|>
Here are some tips to increase views for your content:

1. Optimize your content: Make sure your content is optimized for search engines. Use descriptive and relevant keywords, use headings and subheadings to break up your content, and add meta descriptions to your articles.

2. Share your content on social media: Share your content on your social media channels, including Facebook, Twitter, LinkedIn, and Instagram. This will help increase visibility and reach your target audience.

3. Use social media advertising: Use social media advertising to reach a wider audience. Platforms like Facebook
"""

Note: We can see that the results of the two runs differ considerably. This is not a mistake; it is simply a feature: because we sample with a nonzero temperature, the model can be more creative each time we interact with it.
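
If you prefer repeatable answers instead, a small tweak works; the sketch below (my addition, reusing the pipe and prompt from the previous section) either disables sampling entirely or fixes the seed before sampling.

from transformers import set_seed

# Greedy decoding (do_sample=False) removes sampling randomness entirely;
# alternatively, keep do_sample=True and fix the seed for repeatable runs.
set_seed(42)
outputs = pipe(prompt, max_new_tokens=128, do_sample=False)
print(outputs[0]["generated_text"])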

References

  1. Zhang et al., "TinyLlama: An Open-Source Small Language Model". URL: https://arxiv.org/abs/2401.02385
  2. Jordan Hoffmann et al., "Training Compute-Optimal Large Language Models". URL: https://arxiv.org/abs/2203.15556
  3. Thaddée, Y. T., "Chinchilla's death". URL: https://espadrine.github.io/blog/posts/chinchilla-sdeath.html
  4. Hugging Face, "Templates for chat models". URL: https://huggingface.co/docs/transformers/main/en/chat_templating
