<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Tokenizer on 寒寒的博客</title><link>https://blog.alikia2x.com/tags/tokenizer/</link><description>Recent content in Tokenizer on 寒寒的博客</description><generator>Hugo -- gohugo.io</generator><language>zh</language><lastBuildDate>Sun, 06 Oct 2024 23:22:18 +0800</lastBuildDate><atom:link href="https://blog.alikia2x.com/tags/tokenizer/index.xml" rel="self" type="application/rss+xml"/><item><title>Why does the tokenizer of the Qwen (千问) model series look like gibberish?</title><link>https://blog.alikia2x.com/posts/qwen-tokenizer/</link><pubDate>Sun, 06 Oct 2024 23:22:18 +0800</pubDate><guid>https://blog.alikia2x.com/posts/qwen-tokenizer/</guid><description>The tokenizer vocabulary of the Qwen series of large language models looks odd at first glance: nearly every entry appears to be garbled text. This is because the tokenizer applies a transformation to the raw Unicode text.</description></item></channel></rss>