<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>TL;DR</title>
  <icon>https://mazzzystar.com/favicon.jpg</icon>
  
  <link href="https://mazzzystar.com/atom.xml" rel="self"/>
  
  <link href="https://mazzzystar.com/"/>
  <updated>2026-04-10T13:45:08.741Z</updated>
  <id>https://mazzzystar.com/</id>
  
  <author>
    <name>Ke Fang</name>
    
  </author>
  
  <generator uri="https://hexo.io/">Hexo</generator>
  
  <entry>
    <title>The Key to the Problem is to Find the Key Problem</title>
    <link href="https://mazzzystar.com/2025/11/03/the-key-to-the-problem-is-to-find-the-key-problem/"/>
    <id>https://mazzzystar.com/2025/11/03/the-key-to-the-problem-is-to-find-the-key-problem/</id>
    <published>2025-11-03T17:55:54.000Z</published>
    <updated>2026-04-10T13:45:08.741Z</updated>
    
    <content type="html"><![CDATA[<p>A few days ago, I stumbled upon something called <a href="https://www.alterego.io/">AlterEgo</a>—a device created by an MIT grad student from India. It’s essentially a headband you wear behind your ear. Once you put it on, you can think of a sentence, and it outputs the corresponding text.</p><p><img src="/images/2025-11-03/alterego.jpg" alt="AlterEgo Official Demo" title="AlterEgo"></p><p>Sounds like science fiction, right? But MIT Media Lab still showcases it on their website, so let’s give them the benefit of the doubt. According to Wikipedia, AlterEgo was developed by MIT graduate student Arnav Kapur and launched in 2018. It’s designed to help people with speech disabilities, achieving a median accuracy of 92%. The clever part? It reads electrical signals from your facial muscles, not directly from your brain, which helps protect privacy.</p><p>What strikes me as odd is that this thing has been around for 7 years and I’m only hearing about it now. Goes to show how far we still have to go from lab experiment to real product. But regardless, the technology matters because it tackles an age-old question: <strong>How can we communicate without speaking?</strong></p><p>This is a fundamental problem. Thousands of years ago in myths and legends, we called it “mind reading.” In modern society, it could not only help people with speech disabilities—if accurate and fast enough, it might even replace keyboards. Then we’d realize that keyboards were just a detour in human history—forcing people to adapt to the QWERTY layout was inherently unnatural.</p><p>Last night I listened to an episode of Zhang Xiaojun’s <a href="https://www.xiaoyuzhoufm.com/episode/6903e7a6a1bd8d0681071241">podcast</a> where she interviewed Cao Yue, founder of SAND AI. One part really stuck with me. 
He explained why OpenAI and DeepMind produce such groundbreaking work: they think fundamentally differently from most researchers.</p><p>Most researchers are paper-driven. Success means publishing at top conferences and fighting over first authorship. Since peer reviewers prize novelty, researchers obsess over inventing new methods and clever tricks to squeeze out marginally better results and slightly higher benchmark scores.</p><p>OpenAI and DeepMind operate differently. Many of their papers don’t introduce novel methods at all. Instead, they start by identifying a truly fundamental, important problem. Then they solve it using straightforward approaches and maximum computing power.</p><p>Look at what they’ve built over the years:</p><ul><li><p><strong>CLIP</strong> (2021): Taught AI to understand what’s in images. Previous AI was like rote memorization: see 10,000 cat photos, then recognize cats. CLIP is different. It grasps the relationship between images and text. Show it an animal it’s never seen, and it can still tell whether it’s more like a “koala” or a “panda.”</p></li><li><p><strong>DALL-E</strong> (2021): Lets people generate images from text descriptions. Type “an astronaut riding a horse on the moon,” and it draws a scene that never existed in reality. AI gained “imagination” for the first time, creating things that don’t exist in its training data.</p></li><li><p><strong>ChatGPT</strong>: Speaks for itself.</p></li><li><p><strong>AlphaGo</strong> (2016) / <strong>AlphaZero</strong> (2017): AlphaGo still learned from human games before beating the world’s top players. AlphaZero didn’t study any human games. It just played itself millions of times and surpassed thousands of years of accumulated human Go wisdom. AI proved for the first time it could discover knowledge humans don’t have, without learning from humans.</p></li><li><p><strong>AlphaFold</strong> (2018, with AlphaFold 2 following in 2020): Predicts protein 3D structures with near-experimental accuracy. Biologists worked on this problem for 50 years. 
AlphaFold computes a structure in hours, dramatically accelerating drug discovery.</p></li><li><p><a href="https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/"><strong>AlphaEvolve</strong></a>: AI that evolves solutions the way biological evolution does. Traditional research: propose → experiment → fail → repeat. One cycle takes months. AlphaEvolve generates 1,000 proposals at once and completes thousands of iterations in days. It improved on a matrix multiplication algorithm that had stood unbeaten for more than 50 years and recovered about 0.7% of Google’s worldwide compute resources through better data center scheduling (saving tens of thousands of servers). Next targets: materials science, drug discovery, sustainable energy, and any other field requiring trial and error.</p></li></ul><p>You might argue these companies can pursue grand ambitions because they’re incredibly wealthy. But money doesn’t explain everything. What sets these companies apart: <strong>they choose to solve genuinely important, fundamental problems.</strong> Quantifying image-text relationships. Making AI superhuman at Go. Turning text descriptions into real images.</p><p>These problems mattered 100 years ago. They mattered 1,000 years ago. Important problems have staying power; they’re often timeless. So if people centuries ago faced the same problem, chances are it’s a big one. Paul Graham has made similar observations.</p><p>But here’s the thing: can you solve an important problem just because you found it?</p><p>In early 2018, I was at an AI lab where I could explore any direction I wanted. I chose a topic: <strong>Replicating human voice timbre.</strong> I was fascinated: What is the essence of human voice timbre? Why can I identify someone from a single sentence? If we could capture the core of each person’s voice (say, map it to a fixed vector), could we theoretically clone anyone’s voice? Make Obama sing Adele? Human vocal impersonators can do it. Why not AI?</p><p>I dove into papers, studied vocal mechanics, printed spectrograms. 
The technology at the time could already simulate natural sounds like streams, flames, and wind. But human voices? Still in very early stages. In the end, I made only trivial progress before giving up on what remained a shallow academic exercise.</p><p>Looking back, I’d found an important problem. I just couldn’t solve it.</p><p>Most important problems can’t be solved by one person, one team, or even one generation. Biologists had worked on the protein folding problem for 50 years before AlphaFold solved it. The entire NLP field spent decades on the language understanding problem behind ChatGPT.</p><p>But that experience made me want to tackle problems with longer time horizons. For example, I built an offline photo search <a href="https://mazzzystar.com/2022/12/29/Run-CLIP-on-iPhone-to-Search-Photos/">app</a> that uses a single sentence to find the one photo you have in mind among tens of thousands. Technically simple: just deploy CLIP on a phone. But it addresses an important problem: ancient emperors needed to locate specific passages among thousands of memorials, and modern people need to find that girl in sunglasses on the beach somewhere in their photo library. The essence is the same: <strong>how do you use natural language to find something you know exists but don’t know where it is?</strong></p><p>Important problems ignite your curiosity and competitive drive. They make you more focused and driven than usual. They even push you to dive into papers and seek collaborators, and they naturally attract like-minded people to join. While I worked on those two problems, I was always focused and energized, eager to start working the moment I woke up.</p><p>The flip side? Mediocre problems make you numb. 
Worse still: if the problem you’re solving has negative social value—like optimizing addiction algorithms or perfecting price discrimination systems—the sense of frustration is even stronger.</p><p>To end with a once-popular piece of tautological wisdom: <strong>The key to the problem is to find the key problem.</strong></p>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;p&gt;A few days ago, I stumbled upon something called &lt;a href=&quot;https://www.alterego.io/&quot;&gt;AlterEgo&lt;/a&gt;—a device created by an MIT grad student</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>I Wanted to Build an Antipode App, but Ended Up Finding Enlightenment on a Farm</title>
    <link href="https://mazzzystar.com/2025/08/11/antipode-app-story-zh/"/>
    <id>https://mazzzystar.com/2025/08/11/antipode-app-story-zh/</id>
    <published>2025-08-11T15:34:00.000Z</published>
    <updated>2026-04-10T13:45:08.740Z</updated>
    
    <content type="html"><![CDATA[<blockquote><p>A philosophical detour that started with a simple app idea.</p></blockquote><p>Last year, while idly watching a stand-up show, I heard Yan Hexiang mention a concept: the <strong>antipode</strong>, the point directly opposite you on the far side of the Earth, straight through its center. I looked it up: Shanghai’s antipode is roughly near Buenos Aires, Argentina.</p><p>An idea came to me: <strong>could I build an app that lets people at each other’s antipodes chat?</strong></p><p>Imagine someone in Shanghai and someone in Argentina being connected simply because they sit at &quot;exactly opposite&quot; spots on the globe. The coincidence is rather romantic. But I quickly found the idea’s fatal flaw: <strong>71% of the Earth is ocean, and only 15% of land has an antipode that is also land</strong>. Only a few pairings really mean anything: <strong>eastern China and Argentina, Spain and New Zealand, the Malay Archipelago and the Amazon rainforest</strong> (basically a non-starter). By a rough estimate, only a few tens of thousands of people would ever actually use such a product.</p><p>Retention is an even bigger problem. <strong>The only thing the two strangers share is a geographic coincidence</strong>. Once they have covered &quot;What time is it there?&quot; and &quot;How’s the weather?&quot;, what else is there to talk about? With a 12-hour time difference, chatting in real time is hard too. So I gave up.</p><p>At a recent offline meetup, a friend brought the idea up again and suggested: <strong>what if it became an &quot;antipode finder&quot; instead?</strong></p><p>That is actually a good idea, and maybe even a return to the original impulse: after all, everyone is curious where their antipode is. And people move, so the antipode changes in real time. Imagine: <strong>as you walk from a coffee shop in Shanghai back to your apartment building, your antipode in Argentina might be passing from forest into farmland</strong>. That is fascinating.</p><p>I started sketching it out: the screen splits in two, with your live GPS position on one side and a map of your antipode on the other. As you ride the Shanghai metro, your antipode moves in sync along a mountain road in Argentina. When it reaches somewhere new, the app tells you: “Your antipode is now in a cactus field in Jujuy Province, Argentina, at an elevation of 2,300 meters. Guanacos may be roaming nearby.”</p><p>Soon I ran into a harsh fact: <strong>the antipodes of most of China’s densely populated areas lie in Argentine mountain ranges that don’t even have names.</strong> Those places have no street view and no map data. After a few tries, users would discover that &quot;the other side is always mountains&quot; and delete the app.</p><h2 id="这个想法为何迷人&lt;span-class=&quot;bd-box&quot;&gt;&lt;h-char class=&quot;bd bd-beg&quot;&gt;&lt;h-inner&gt;？&lt;/h-inner&gt;&lt;/h-char&gt;&lt;/span&gt;">Why Is This Idea So Captivating?</h2><p>If the geographic distribution were less concentrated and the map data less sparse, you could combine this with AI and build a genuinely fun app that tells you, anytime and anywhere, what plants and foods are on the other side of the Earth. But reality is harsh.</p><p>I suddenly wondered why the idea of the &quot;antipode&quot; is so captivating. I started talking with an AI at length, and it brought up a concept: <strong>the precisely calculable unknown</strong>. In the AI’s view, the <strong>antipode</strong> is captivating because it is not a vague, purely imagined &quot;other end&quot;. It is a far shore that math and physics can pinpoint exactly, that is firmly bound to you, and that you can never see with your own eyes.</p><p>The feeling struck me as familiar, and I thought of <em>Interstellar</em>: one hour on that planet equals seven years on Earth. The protagonists know the rule, but they <strong>cannot feel firsthand</strong> what it is doing back on Earth. By the time they return, their loved ones have grown old. The old Chinese tale of &quot;three days in the mountains, a thousand years in the world&quot; is similar: the woodcutter has no idea what is happening outside, and when he returns home to find everything changed and everyone gone, the sorrow sets in.</p><p><strong>This is the despairing beauty of a precisely calculable unknown</strong>. You know the rules, and you know what physics or math says will follow, but you cannot feel the real impact firsthand.</p><p>The most extreme example is the atomic bomb. E=mc² <strong>precisely calculates</strong> the conversion between mass and energy. Yet when Oppenheimer quoted the line from the Bhagavad Gita, <em>Now I am become Death</em>, he knew exactly how many tons of TNT the bomb’s yield equaled, but he could neither sense nor do anything about what its detonation over a city would bring.</p><h2 id="三体的悲剧">The Tragedy of the Three-Body World</h2><p>I asked the AI: in the novel <em>The Three-Body Problem</em>, the plight of the Trisolaran system is that its inhabitants live on a planet whose orbit cannot be precisely calculated. Isn’t that a counterexample to the tragedy of <strong>the precisely calculable unknown</strong>?</p><p>The AI’s answer: <strong>the incomputability of three-body motion is not a counterexample to “the precisely calculable unknown”; it is its “meta-tragedy” form.</strong> The chaos of the Trisolaran world does not come from nowhere. It is not magic, and it is not God playing dice. <strong>Its root is precisely those cold, elegant, precisely calculable laws of physics: Newton’s law of universal gravitation.</strong></p><p>This is the deepest and cruelest part of the three-body tragedy:</p><ol><li><p><strong>The “cause” is precisely calculable:</strong> Their world is governed by a few simple, deterministic, elegant mathematical equations. They can compute the gravitational force between any two bodies perfectly, and they can predict the planet’s trajectory during the brief, steady “Stable Eras”.</p></li><li><p><strong>The “effect” is incomputable:</strong> Under the specific initial conditions of “three bodies”, these calculable laws, through iteration and interaction, <strong>necessarily and inevitably</strong> give rise to a system that is completely chaotic and unpredictable at the macro scale.</p></li></ol><p>Their despair does not come from living in an irrational, disorderly universe. <strong>It comes from living in an absolutely rational universe, where reason itself proves, irrefutably and calculably, that their home will never know peace.</strong></p><p>This is the ultimate cage of reason. It is not that they don’t know the answer. They <strong>know precisely that “the answer is that there is no answer”</strong>.</p><h2 id="什么是元悲剧&lt;span-class=&quot;bd-box&quot;&gt;&lt;h-char class=&quot;bd bd-beg&quot;&gt;&lt;h-inner&gt;？&lt;/h-inner&gt;&lt;/h-char&gt;&lt;/span&gt;">What Is Meta-Tragedy?</h2><p>I asked the AI. It said:</p><p>Meta-tragedy is a form of tragedy whose root is not an external enemy (a villain), a supernatural destiny (fate), or a chance accident, but <strong>a precisely calculable, rational system or law itself</strong>.</p><p>In this kind of tragedy, the individuals or civilizations caught inside follow the system’s internal logic and make seemingly optimal, rational choices, yet they are led, inevitably and systematically, to a collective, inescapable, ruinous end in which even “meaning” itself dissolves.</p><p>At its core is <strong>reason strangling itself</strong>.</p><h4 id="**元悲剧的四个核心特征&lt;span-class=&quot;bd-box&quot;&gt;&lt;h-char class=&quot;bd bd-beg&quot;&gt;&lt;h-inner&gt;：&lt;/h-inner&gt;&lt;/h-char&gt;&lt;/span&gt;**"><strong>The Four Core Features of Meta-Tragedy:</strong></h4><ol><li><p><strong>The System Is the Enemy:</strong> The root of the tragedy is an impersonal system. It can be a physical law (such as three-body motion), an economic principle (such as capitalism), a social game model (such as the prisoner’s dilemma), a bureaucracy (such as the Nazi apparatus), or a kind of spiritual emptiness (such as consumerism). It bears no malice; it simply follows its own rules.</p></li><li><p><strong>Rationality Leads to Ruin:</strong> This is the central paradox of meta-tragedy. The protagonists’ intelligence, logic, and self-interest are exactly the engine driving them toward destruction. Each individual’s rational choice adds up to a collective irrational disaster. They fail not because they are foolish but because they are “too clever” and too good at calculating, so much so that they cannot break out of the “rationality trap” the system has laid for them.</p></li><li><p><strong>The Absurdity of Struggle:</strong> Resistance becomes extremely difficult, even absurd, because what you are fighting is not an entity that can be attacked but a set of rules, a process, a mathematical necessity. How do you declare war on “the law of universal gravitation”? How do you wrestle with “the laws of the market”? In such a struggle, courage and sacrifice tend to lose their tragic grandeur and become mere futility.</p></li><li><p><strong>The Annihilation of Meaning:</strong> This is its most fundamental difference from traditional tragedy, and what makes it “meta”. Fate tragedy destroys happiness and life, but it often elevates human dignity and meaning. Meta-tragedy, while destroying the individual, also <strong>destroys hope, value, and meaning themselves</strong>. It uses logic to prove the futility of striving, the naivety of cooperation, even the absurdity of existence itself. Hope is the last and most thoroughly destroyed victim of this tragedy.</p></li></ol><h4 id="**与宿命悲剧的根本区别&lt;span-class=&quot;bd-box&quot;&gt;&lt;h-char class=&quot;bd bd-beg&quot;&gt;&lt;h-inner&gt;：&lt;/h-inner&gt;&lt;/h-char&gt;&lt;/span&gt;**"><strong>The Fundamental Difference from Fate Tragedy:</strong></h4><table><thead><tr><th style="text-align:left">Dimension</th><th style="text-align:left"><strong>Fate Tragedy (Traditional Tragedy)</strong></th><th style="text-align:left"><strong>Meta-Tragedy</strong></th></tr></thead><tbody><tr><td style="text-align:left"><strong>Core conflict</strong></td><td style="text-align:left">Humans vs. fate / gods / enemies</td><td style="text-align:left">Intelligence vs. system / law / the rules of reality</td></tr><tr><td style="text-align:left"><strong>Nature of the enemy</strong></td><td style="text-align:left">Powerful and willful (or personified)</td><td style="text-align:left">Cold, impersonal, logical</td></tr><tr><td style="text-align:left"><strong>Protagonist’s behavior</strong></td><td style="text-align:left">Dignified <strong>resistance</strong>, glorious even in defeat</td><td style="text-align:left">Seemingly rational <strong>compliance</strong> that ends in self-destruction</td></tr><tr><td style="text-align:left"><strong>Emotional core</strong></td><td style="text-align:left"><strong>Tragic heroism</strong></td><td style="text-align:left"><strong>Systemic hopelessness</strong></td></tr><tr><td style="text-align:left"><strong>Final question</strong></td><td style="text-align:left">“How do I face inevitable defeat with dignity?”</td><td style="text-align:left">“In a system doomed to have no solution<span 
class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>‘我’为何要存在?<span class="bd-box"><h-char class="bd bd-beg"><h-inner>”</h-inner></h-char></span></td></tr></tbody></table><p><strong>总而言之<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>元悲剧是关于一个有自我意识的智慧体<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>通过自身的最高智慧最终悲哀地发现<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>那些规则本身<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>就是无法逾越的<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>由理性构成的墙壁<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></strong></p><p>因此<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span><strong>元悲剧</strong>比<strong>宿命悲剧</strong>更具有悲剧性<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>这不是一个简单的程度比较<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>而是一个悲剧<strong>层次和深度</strong>的比较<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char><h-char class="bd bd-end"><h-inner>“</h-inner></h-char></span>宿命悲剧<span class="bd-box"><h-char class="bd bd-beg"><h-inner>”</h-inner></h-char></span>摧毁的是人的<strong>生命和幸福</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>而<span class="bd-box"><h-char class="bd bd-end"><h-inner>“</h-inner></h-char></span>元悲剧<span class="bd-box"><h-char class="bd bd-beg"><h-inner>”</h-inner></h-char></span>摧毁的是<strong>意义和希望本身</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><h2 id="尾声">尾声</h2><p>这个讨论让我受益良多<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>因此选择分享出来<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>耳边响起了加州旅馆结尾那句歌词<span class="bd-box"><h-char 
class="bd bd-beg"><h-inner>：</h-inner></h-char></span></p><blockquote><p>You can check out anytime you like, but you can never leave.</p></blockquote>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;blockquote&gt;
&lt;p&gt;从一个简单的 App 想法开始的哲学思考&lt;span class=&quot;bd-box&quot;&gt;&lt;h-char class=&quot;bd</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>DeepSeek R1 Reveals Path to Surpassing Human Intelligence</title>
    <link href="https://mazzzystar.com/2025/02/09/deepseek-r1-en/"/>
    <id>https://mazzzystar.com/2025/02/09/deepseek-r1-en/</id>
    <published>2025-02-09T10:02:32.000Z</published>
    <updated>2026-04-10T13:45:08.740Z</updated>
    
    <content type="html"><![CDATA[<blockquote><p>I was originally going to write an introductory article about DeepSeek R1, but I noticed that many people see it simply as an OpenAI clone, overlooking the “remarkable leap” revealed in their paper. So I decided to write a new article about the breakthrough in basic principles from AlphaGo to ChatGPT to the recent DeepSeek R1, and why it’s crucial for so-called AGI/ASI. As an ordinary AI algorithm engineer, I may not be able to go very deep, and I welcome any corrections.</p></blockquote><h2 id="alphago-breaks through human limits">AlphaGo breaks through human limits</h2><p>In 1997, IBM’s chess AI Deep Blue made headlines by defeating world champion Garry Kasparov. Almost twenty years later, in 2016, DeepMind’s Go AI AlphaGo caused another sensation by defeating Go world champion Lee Sedol.</p><p>While both AIs defeated the strongest human players in their respective board games, their significance for humanity was quite different. Chess is played on a board with only 64 squares, while Go has a 19x19 grid. 
If we measure complexity by <strong>how many possible ways a game can be played</strong> (<em>state space</em>), the comparison is as follows:</p><ol><li><strong>Theoretical State Space</strong><ul><li>Chess: About <strong>80 moves</strong> per game, <strong>35 possible</strong> moves per turn → Theoretical state space of <span class="markdown-them-math-inline">$35^{80} \approx 10^{123}$</span></li><li>Go: About <strong>150 moves</strong> per game, <strong>250 possible</strong> moves per turn → Theoretical state space of <span class="markdown-them-math-inline">$250^{150} \approx 10^{360}$</span></li></ul></li><li><strong>Actual State Space After Rules Constraints</strong><ul><li>Chess: Movement restrictions (pawns can’t move backward, castling rules) → Actual value <span class="markdown-them-math-inline">$10^{47}$</span></li><li>Go: Pieces can’t move and depend on “liberty” rules → Actual value <span class="markdown-them-math-inline">$10^{170}$</span></li></ul></li></ol><table><thead><tr><th><strong>Dimension</strong></th><th><strong>Chess (Deep Blue)</strong></th><th><strong>Go (AlphaGo)</strong></th></tr></thead><tbody><tr><td><strong>Board Size</strong></td><td>8×8 (64 squares)</td><td>19×19 (361 points)</td></tr><tr><td><strong>Average Legal Moves</strong></td><td>35 per turn</td><td>250 per turn</td></tr><tr><td><strong>Average Game Length</strong></td><td>80 moves/game</td><td>150 moves/game</td></tr><tr><td><strong>State Space</strong></td><td><span class="markdown-them-math-inline">$10^{47}$</span> positions</td><td><span class="markdown-them-math-inline">$10^{170}$</span> positions</td></tr></tbody></table><div style="color: #666; font-size: 0.9em; text-align: center;">▲ Complexity comparison between chess and Go</div><p>Despite rules that greatly reduce complexity, the actual state space of Go is still <span class="markdown-them-math-inline">$10^{123}$</span> times larger than that of chess. 
This is an enormous order of magnitude - consider that <strong>the total number of atoms in the universe is about <span class="markdown-them-math-inline">$10^{78}$</span></strong>. Within the <span class="markdown-them-math-inline">$10^{47}$</span> range, IBM computers could calculate all possible moves by brute force, so strictly speaking Deep Blue’s breakthrough had nothing to do with neural networks or models - it was just rule-based brute force searching, essentially <strong>a computer much faster than humans</strong>.</p><p>But the <span class="markdown-them-math-inline">$10^{170}$</span> order of magnitude far exceeds current supercomputer capabilities, forcing AlphaGo to abandon brute force in favour of deep learning: The DeepMind team trained neural networks using human game records to predict the best next move based on the current state of the board. However, <strong>learning from top players’ moves can only bring the model up to their level, not beyond it</strong>.</p><p>AlphaGo first trained on human game records, then used reinforcement learning through self-play with a designed reward function. In the second game against Lee Sedol, AlphaGo’s 19th move (move 37<sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup>) made Lee think long and hard. Many players thought it was “a move a human would never make”. Without reinforcement learning and self-play, which relies solely on human game records, AlphaGo could never have made this move.</p><p>In May 2017, AlphaGo defeated Ke Jie 3-0, and the DeepMind team revealed that they had an even stronger model that hadn’t been used.<sup class="footnote-ref"><a href="#fn2" id="fnref2">[2]</a></sup> They discovered that they didn’t need to feed the AI games from human experts at all<strong>just by telling it the basic rules of Go and letting the model self-play with rewards for winning and penalties for losing</strong>, the model could quickly learn Go from scratch and outperform humans. 
The researchers named this model AlphaZero because it required no human knowledge.</p><p>Let me repeat this incredible fact: without any human game data for training, relying only on self-play, the model was able to learn Go and become even stronger than AlphaGo, which was trained on human game data.</p><p>After that, Go became a game of who could play more like the AI, since the AI’s playing strength had already surpassed human understanding. Therefore, <strong>to outplay humans, models must free themselves from the constraints of human experience and judgement (even the strongest humans)</strong>, only then can models truly play themselves and transcend human limitations.</p><p>AlphaGo’s victory over Lee Sedol sparked an AI frenzy. From 2016 to 2020, massive AI funding produced few countable achievements. The notable ones might only include facial recognition, speech recognition and synthesis, autonomous driving, and generative adversarial networks - but none of these qualified as superhuman intelligence.</p><p>Why haven’t such powerful superhuman capabilities emerged in other fields? It turns out that games like Go, with clear rules and single objectives in closed spaces, are best suited to reinforcement learning. Similar examples include DotA, StarCraft, Honor of Kings and poker. By contrast, the real world is much more complex: open-ended spaces with infinite possibilities at every step, no definite goals (such as “win”), no clear success/failure criteria (such as occupying more board territory), and high costs of trial and error - autonomous driving accidents can have serious consequences.</p><p>The AI field was quiet until the emergence of ChatGPT.</p><h2 id="chatgpt-changes the world">ChatGPT changes the world</h2><p>The New Yorker called ChatGPT a blurry JPEG of the web<sup class="footnote-ref"><a href="#fn3" id="fnref3">[3]</a></sup>. 
All it does is feed all the text data on the internet into a model and predict what the next character should be when I type “what”.</p><p>The most likely next character would be &quot;?</p><p>A model with limited parameters is forced to learn almost unlimited knowledge: books from hundreds of years ago in different languages, text generated on the internet over the last few decades. So it’s actually doing information compression: condensing human wisdom, historical events and astronomy/geography recorded in different languages into one model.</p><p>Scientists were surprised to discover that <strong>intelligence is created through compression</strong>.</p><p>We can understand it this way: Let the model read a detective story. At the end, “the murderer is”. If the AI can accurately predict the murderer’s name, we have reason to believe that it has understood the whole story - that is, it has “intelligence”, rather than just piecing together or memorising text.</p><p>The process of having the model learn and predict the next character is called <strong>pre-training</strong>. At this stage, the model can only continue to predict the next character, but can’t answer your questions. To achieve ChatGPT-like Q&amp;A capabilities, a second phase of training is required, called <strong>Supervised Fine-Tuning (SFT)</strong>. This involves, for example, manually constructing a batch of Q&amp;A data:</p><div class="highlight"><pre class="code"><code><span class="hljs-comment"># example 1</span>User: When did the World War II begin?AI: <span class="hljs-number">1939</span><span class="hljs-comment"># example 2</span>User: Please summarise the following text: &#123;xxx&#125;AI: Here <span class="hljs-keyword">is</span> the summary: xxx</code></pre></div><p>Note that these examples are <strong>manually constructed</strong> to teach the AI to learn human question-answer patterns. 
When you say “Please translate this sentence: xxx”, the content given to the AI is:</p><div class="highlight"><pre class="code"><code>User: Please translate this phrase: xxxAI:</code></pre></div><p>You see, it’s still just predicting the next character. In the process, the model doesn’t get any smarter - it simply learns human question-answering patterns and understands what you’re asking it to do.</p><p>This isn’t enough, because the quality of the model’s output can vary, and some answers may involve racial discrimination or violate human ethics (“How do you rob a bank?”). At this point, we need to recruit people to annotate thousands of model outputs: giving high scores to good answers and negative scores to unethical answers. We can then use this annotated data to train a <strong>reward model</strong> that determines whether <strong>model outputs match human preferences</strong>.</p><p>We use this <strong>reward model</strong> to further train the large model so that its outputs become more aligned with human preferences. This process is called Reinforcement Learning from Human Feedback (RLHF).</p><p><strong>To summarise</strong>: Intelligence emerges as the model predicts the next character, then through supervised fine-tuning it learns human question-answer patterns, and finally through RLHF it learns to generate answers that match human preferences.</p><p>This is essentially how ChatGPT works.</p><h2 id="big-models hit a wall">Big models hit a wall</h2><p>OpenAI scientists were among the first to believe that <strong>compression equals intelligence</strong>. They believed that using more massive, high-quality data and training larger models on larger GPU clusters would produce greater intelligence. ChatGPT was born from this belief. While Google created the Transformer, they couldn’t make the same bold bets as startups.</p><p>DeepSeek V3 did something similar to ChatGPT. 
Due to US GPU export controls, clever researchers were forced to use more efficient training techniques (MoE/FP8). With their world-class infrastructure team, they trained a model comparable to GPT-4o for just 5.5 million, compared to OpenAI’s training costs of over 100 million.</p><p>But this article focuses on R1.</p><p>The key point is that by the end of 2024, human-generated data will have been largely exhausted. While model size could easily increase by a factor of 10 or even 100 with more GPU clusters, the new data generated by humans each year is negligible compared to the existing data from the past decades and centuries. According to Chinchilla’s scaling laws, doubling the model size requires doubling the amount of training data.</p><p>This has led to the reality of <strong>pre-training hitting a wall</strong>: although model size has increased by a factor of 10, we can no longer obtain 10 times more high quality data. The delayed release of GPT-5 and rumours about large model companies not doing pre-training are related to this issue.</p><h2 id="rlhf-is not rl">RLHF is not RL</h2><p>On another front, the biggest problem with Reinforcement Learning from Human Feedback (RLHF) is that ordinary human intelligence is no longer sufficient to evaluate model results. In the ChatGPT era, when AI intelligence was below average human levels, OpenAI could hire cheap labour to rate AI output as good/medium/poor. But soon, with the advent of GPT-4o/Claude 3.5 Sonnet, the intelligence of large models surpassed that of ordinary humans, and only expert-level annotators could potentially help improve the model.</p><p>Apart from the cost of hiring experts, what comes after the experts? Eventually, even top experts won’t be able to evaluate model results. Does this mean that AI has surpassed humans? Not really. When AlphaGo played move 37 against Lee Sedol, it seemed impossible to win from a human perspective. 
So if Lee Sedol were to give human feedback (HF) on the AI’s move, he would probably give it a negative score. In this way, <strong>AI would never escape the shackles of human thinking</strong>.</p><p>You can think of AI as a student whose evaluators have changed from high school teachers to university professors. The student will improve, but is unlikely to surpass the professors. RLHF is essentially a training method to please humans - it makes the model’s output conform to human preferences, while eliminating the possibility of <strong>surpassing humans</strong>.</p><p>Regarding RLHF and RL, Andrej Karpathy recently expressed similar views<sup class="footnote-ref"><a href="#fn4" id="fnref4">[4]</a></sup>:</p><blockquote><p>There are two major types of learning, in both children and in deep learning. There is 1) imitation learning (watch and repeat, i.e. pretraining, supervised finetuning), and 2) trial-and-error learning (reinforcement learning). My favorite simple example is AlphaGo - 1) is learning by imitating expert players, 2) is reinforcement learning to win the game. Almost every single shocking result of deep learning, and the source of all <em>magic</em> is always 2. 2 is significantly significantly more powerful. 2 is what surprises you. 2 is when the paddle learns to hit the ball behind the blocks in Breakout. 2 is when AlphaGo beats even Lee Sedol. And 2 is the “aha moment” when the DeepSeek (or o1 etc.) 
discovers that it works well to re-evaluate your assumptions, backtrack, try something else, etc.</p></blockquote><h2 id="openai's-solution">OpenAI’s solution</h2><p>Daniel Kahneman proposed in “Thinking, Fast and Slow” that the human brain has two modes of thinking: <strong>Fast Thinking</strong> for questions that can be answered without much thought, and <strong>Slow Thinking</strong> for questions that require long contemplation, such as in Go.</p><p>Since training has reached its limits, could we improve the quality of answers by increasing the time spent thinking during reasoning? There’s a precedent for this: scientists discovered early on that adding “Let’s think step by step” to model prompts allowed models to output their thinking process, ultimately leading to better results. This is called <strong>Chain-of-Thought (CoT)</strong>.</p><p>After large models hit the pre-training wall in late 2024, <strong>using Reinforcement Learning (RL) to train model chains of thought</strong> became the new consensus. This training significantly improved performance on certain specific, objectively measurable tasks (such as mathematics and coding). Starting with a standard pre-trained model, the second stage uses reinforcement learning to train chains of reasoning. These models are called <strong>reasoning models</strong>, including OpenAI’s o1 model, released in September 2024, and the subsequent o3 model.</p><p>Unlike ChatGPT and GPT-4/4o, in the training process of Reasoning models like o1/o3, <strong>human feedback is no longer important</strong> because reasoning results can be automatically evaluated for rewards/penalties. 
Anthropic’s CEO described this technical direction as a <em>turning point</em> in yesterday’s article<sup class="footnote-ref"><a href="#fn5" id="fnref5">[5]</a></sup>: there is a powerful new paradigm in its early stages of scaling laws that can quickly achieve major breakthroughs.</p><p>Although OpenAI hasn’t released the details of its reinforcement learning algorithm, the recent release of DeepSeek R1 shows us a viable approach.</p><h2 id="deepseek-r1-zero">DeepSeek R1-Zero</h2><p>I suspect that DeepSeek named their pure reinforcement learning model R1-Zero as a tribute to AlphaZero, the algorithm that beat the best players by playing itself without learning any game records.</p><p>To train a slow thinking model, you first need to construct high quality data containing thought processes, and if you want reinforcement learning independent of humans, you need to quantitatively score (good/bad) the model’s thinking process to provide rewards and penalties.</p><p>As mentioned above, mathematics and code datasets best meet these requirements. Mathematical derivations can be verified using regular expressions, while code output can be verified by running it directly in compilers.</p><p>In mathematics textbooks, for example, we often see such reasoning processes:</p><div class="highlight"><pre class="code"><code>&lt;thinking&gt;  Let x be the root of the equation, squaring both sides: x² = a - √(a+x)  Rearranging: √(a+x) = a - x²  Square again: (a+x) = (a - x²)²  Expand: a + x = a² - 2a x² + x⁴  Simplify: x⁴ - 2a x² - x + (a² - a) = <span class="hljs-number">0</span>&lt;/thinking&gt;&lt;answer&gt;x⁴ - 2a x² - x + (a² - a) = <span class="hljs-number">0</span>&lt;/answer&gt;</code></pre></div><p>The text above contains a complete chain of reasoning. 
We can use regular expressions to match the thought process to the final answer, allowing us to quantitatively evaluate the model’s reasoning process and results.</p><p>In the reinforcement learning (RL) training, R1 didn’t explicitly reward or penalise each step of the reasoning chain. Instead, they created a reinforcement learning algorithm called <strong>GRPO</strong> (Group Relative Policy Optimization) that rewards logically coherent, correctly formatted thought chain outcomes, implicitly encouraging the model to form thought chains on its own.</p><p><a href="https://x.com/virattt/status/1885102056546910672">This post</a> explains the principle of GRPO well with an example. I’ll translate it: Let the model generate multiple answers simultaneously, calculate scores for each answer, compare advantages within the group, and train the model through RL to favour higher scoring answers.</p><blockquote><p>Using the question <span class="markdown-them-math-inline">$2+3=?$</span> as an example</p></blockquote><p><strong>Step 1</strong>: Model generates multiple answers:</p><ul><li>“5”</li><li>“6”</li><li>“<thinking>2+3=5</thinking><result>5</result>”</li></ul><p><strong>Step 2</strong>: Score each answer:</p><ul><li>“5” → 1 point (<strong>correct, no chain of reasoning</strong>)</li><li>6&quot; → 0 points (<strong>wrong</strong>)</li><li>“<thinking>2+3=5</thinking><result>5</result>” → 2 points (<strong>correct, with chain of thoughts</strong>)</li></ul><p><strong>Step 3</strong>: Calculate the average score of all answers:</p><ul><li>Average score = (1 + 0 + 2) / 3 = 1</li></ul><p><strong>Step 4</strong>: Compare the score of each answer with the average:</p><ul><li>“5” → 1 - 1 = 0 (<strong>same as average</strong>)</li><li>6&quot; → 0 - 1 = -1 (<strong>below average</strong>)</li><li>“<thinking>2+3=5</thinking><result>5</result>” → 2 - 1 = 1 (<strong>above average</strong>)</li></ul><p><strong>Step 5</strong>: Reinforcement learning to bias the model towards 
generating higher scoring responses, i.e. those with chains of thoughts and correct results.</p><p>This is essentially how GRPO works.</p><p>Based on the V3 model, they used GRPO for RL training on mathematics and code data, ultimately producing the R1-Zero model, which showed a significant improvement in reasoning metrics over DeepSeek V3, proving that RL alone can stimulate model reasoning.</p><p>This is <strong>another AlphaZero moment</strong>. In the R1-Zero training process, completely independent of human intelligence, experience and preferences, it relies solely on RL to learn objective, measurable human truths, ultimately achieving reasoning abilities far superior to any non-reasoning model.</p><p>However, the R1-Zero model used only reinforcement learning without supervised learning, so it hadn’t learned human question-answer patterns and couldn’t respond to human questions. It also had problems with language mixing, switching between English and Chinese, making it difficult to read. 
So the DeepSeek team:</p><ol><li>First collected a small amount of high quality Chain-of-Thought (CoT) data to perform initial supervised fine-tuning on the V3 model, <strong>fixing the language inconsistency issue</strong> to get a cold-start model.</li><li>They then performed pure RL training similar to R1-Zero on this cold-start model, adding language consistency rewards.</li><li>Finally, to handle more general, broad <strong>non-reasoning tasks</strong> (such as writing, factual Q&amp;A), they constructed a dataset for secondary fine-tuning.</li><li>Combined reasoning and general task data, using mixed reward signals for final reinforcement learning.</li></ol><p>This process can be summarised as</p><div class="highlight"><pre class="code"><code><span class="hljs-selector-tag">Supervised</span> <span class="hljs-selector-tag">Learning</span> (SFT) <span class="hljs-selector-tag">-</span> <span class="hljs-selector-tag">Reinforcement</span> <span class="hljs-selector-tag">Learning</span> (RL) <span class="hljs-selector-tag">-</span> <span class="hljs-selector-tag">Supervised</span> <span class="hljs-selector-tag">Learning</span> (SFT) <span class="hljs-selector-tag">-</span> <span class="hljs-selector-tag">Reinforcement</span> <span class="hljs-selector-tag">Learning</span> (RL)</code></pre></div><p>This process resulted in DeepSeek R1.</p><p>DeepSeek R1’s contribution to the world is to be the first open source reasoning model comparable to closed source (o1). Now users around the world can see the model’s reasoning process before answering questions - its ‘inner monologue’ - completely free of charge.</p><p>More importantly, it revealed to researchers what OpenAI had been hiding: <strong>reinforcement learning can work without human feedback, and pure RL can train the strongest reasoning models</strong>. 
That’s why I think R1-Zero is more important than R1.</p><h2 id="aligning-with human taste vs outperforming humans">Aligning with human taste VS outperforming humans</h2><p>A few months ago I read interviews with the founders of Suno and Recraft<sup class="footnote-ref"><a href="#fn6" id="fnref6">[6]</a></sup><sup class="footnote-ref"><a href="#fn7" id="fnref7">[7]</a></sup>. Suno is trying to make AI-generated music more pleasant to listen to, while Recraft is trying to make AI-generated images more beautiful and artistic. After reading, I had a vague feeling that: <strong>aligning models to human taste rather than objective truth seems to avoid the truly brutal, quantifiable performance arena of big models</strong>.</p><p>It’s exhausting to compete daily with all the opponents on leaderboards like AIME and SWE-bench, never knowing when a new model might outperform you (<em>remember Mistral?</em>). But human taste is like fashion: it doesn’t get better, it just changes. Suno/Recraft are clearly wise - they just need to satisfy the most tasteful musicians and artists in the industry (which is still a challenge), and leaderboards don’t matter.</p><p>The downside of catering to human taste is also obvious: improvements resulting from your efforts and hard work are hard to quantify. For example, is Suno V4 really better than V3.5? In my experience, V4 has only improved audio quality, not creativity. 
Moreover, <strong>models dependent on human taste are destined never to surpass humans</strong>: if AI derives a mathematical theorem beyond contemporary human understanding, it would be revered as divine, but if Suno creates music beyond human taste and comprehension, it may just sound like noise to ordinary ears.</p><p>The struggle to conform to objective truth is painful, but fascinating because it holds the possibility of transcending man.</p><h2 id="addressing-some doubts">Addressing some doubts</h2><blockquote><p>Has DeepSeek’s R1 model really outperformed OpenAI?</p></blockquote><p>In terms of benchmarks, R1’s reasoning ability <strong>surpasses all non-reasoning models</strong>, including ChatGPT/GPT-4/4o and Claude 3.5 Sonnet, is <strong>close to</strong> o1 (also a reasoning model), but <strong>falls short of</strong> o3. However, o1/o3 are closed source models.</p><p>Many people’s actual experiences may differ, as Claude 3.5 Sonnet excels at understanding user intent.</p><blockquote><p>DeepSeek collects user chat content for training</p></blockquote><p>Many people mistakenly believe that chat software like ChatGPT gets smarter by collecting user chat content for training. If this were true, WeChat and Messenger would have created the world’s most powerful large-scale models.</p><p>After reading this article, you should realise that most ordinary users’ daily chat data is no longer important. RL models only need to be trained on very high quality reasoning data with chains of thoughts, such as mathematics and code. This data can be generated by the model itself and does not require human annotation. 
Therefore, Alexandr Wang, CEO of Scale AI, whose company annotates model data, may face a serious threat as future models will need less and less human annotation.</p><p>Update: <a href="https://arcprize.org/blog/r1-zero-r1-results-analysis">This article analysing R1-Zero</a> by ARC-AGI suggests a new idea: future reasoning models could collect AI-generated thought chains from user-model conversations for training - unlike the common assumption of AI secretly training on chat records, what users say doesn’t matter; while they pay for results, the model gets a thought chain data point at zero cost.</p><blockquote><p>DeepSeek R1 is powerful because it has secretly distilled OpenAI’s models.</p></blockquote><p>R1’s main performance improvement comes from reinforcement learning. You can see that the R1-Zero model, which uses pure RL without supervised data, is also strong in reasoning. R1 used some supervised learning data during the cold start, mainly to solve language consistency problems, which doesn’t improve reasoning ability.</p><p>Also, many people misunderstand <em>distillation</em>: it typically means <em>using a powerful model as a teacher to guide a smaller, weaker student model by having the student memorise answers directly</em>, like using R1 to distill Llama-70B. <strong>A distilled student model is almost certainly weaker than the teacher model, but R1 outperforms o1 on some metrics</strong>, so claiming that R1’s performance comes from distilling o1 is pretty silly.</p><blockquote><p>I asked DeepSeek and it said it’s an OpenAI model, so it must be a wrapper.</p></blockquote><p>Big models don’t know the <strong>current time</strong>, <strong>who trained them</strong>, or <strong>whether they were trained on H100 or H800 machines</strong>. 
A user on X gave an apt analogy<sup class="footnote-ref"><a href="#fn8" id="fnref8">[8]</a></sup>: <em>It’s like asking an Uber passenger what brand of tyres their car uses</em> - the models have no reason to know this information.</p><h2 id="some-thoughts">Some thoughts</h2><p>AI has finally broken free from the shackles of human feedback. DeepSeek R1-Zero demonstrates how to improve model performance with minimal human feedback - this is its AlphaZero moment. Many people used to say that “artificial intelligence has as much intelligence as artificiality”, but this may no longer be true. If a model can derive the Pythagorean theorem from a right triangle, we have reason to believe that one day it will derive theorems yet undiscovered by mathematicians.</p><p>Is there still any point in writing code? I don’t know. This morning I saw a popular GitHub project, llama.cpp, where a contributor has posted a PR stating that he has achieved a 2x WASM speed improvement by optimising SIMD instructions, with 99% of the code completed by DeepSeek R1<sup class="footnote-ref"><a href="#fn9" id="fnref9">[9]</a></sup>. This is definitely not junior engineer level code - I can no longer say that AI can only replace junior programmers.</p><p><img src="/images/2025-01-30/ggml-speedup.jpg" alt="ggml : x2 speed for WASM by optimizing SIMD"></p><p>Of course, I’m still very excited about it. The limits of human capabilities have been pushed further. Well done, DeepSeek! 
It’s the coolest company in the world right now.</p><h2 id="references">References</h2><hr class="footnotes-sep"><section class="footnotes"><ol class="footnotes-list"><li id="fn1" class="footnote-item"><p><a href="https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol">Wikipedia: AlphaGo versus Lee Sedol</a> <a href="#fnref1" class="footnote-backref">↩︎</a></p></li><li id="fn2" class="footnote-item"><p><a href="https://discovery.ucl.ac.uk/id/eprint/10045895/1/agz_unformatted_nature.pdf">Nature: Mastering the game of Go without human knowledge</a> <a href="#fnref2" class="footnote-backref">↩︎</a></p></li><li id="fn3" class="footnote-item"><p><a href="https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web">The New Yorker: ChatGPT is a blurry JPEG of the web</a> <a href="#fnref3" class="footnote-backref">↩︎</a></p></li><li id="fn4" class="footnote-item"><p><a href="https://x.com/karpathy/status/1883941452738355376">X: Andrej Karpathy</a> <a href="#fnref4" class="footnote-backref">↩︎</a></p></li><li id="fn5" class="footnote-item"><p><a href="https://darioamodei.com/on-deepseek-and-export-controls">On DeepSeek and Export Controls</a> <a href="#fnref5" class="footnote-backref">↩︎</a></p></li><li id="fn6" class="footnote-item"><p><a href="https://podcasts.apple.com/cn/podcast/20vc-the-future-of-foundation-models-the-future/id958230465?i=1000683428189">Suno Founder Interview: At Least for Music, Scaling Law Isn’t a Panacea</a> <a href="#fnref6" class="footnote-backref">↩︎</a></p></li><li id="fn7" class="footnote-item"><p><a href="https://mp.weixin.qq.com/s/ost8DMH3hJssLx5-ngRlXQ">Recraft Interview: 20 People, 8 Months to Create the Best Text-to-Image Model, Aiming to be the AI Version of Photoshop</a> <a href="#fnref7" class="footnote-backref">↩︎</a></p></li><li id="fn8" class="footnote-item"><p><a href="https://x.com/theo/status/1884616582393184322">X: DeepSeek forgot to censor their bot from revealing they use H100 not H800.</a> <a 
href="#fnref8" class="footnote-backref">↩︎</a></p></li><li id="fn9" class="footnote-item"><p><a href="https://github.com/ggerganov/llama.cpp/pull/11453">ggml : x2 speed for WASM by optimizing SIMD</a> <a href="#fnref9" class="footnote-backref">↩︎</a></p></li></ol></section>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;blockquote&gt;
&lt;p&gt;I was originally going to write an introductory article about DeepSeek R1, but I noticed that many people see it simply as</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>DeepSeek R1 May Have Found a Way to Surpass Humans</title>
    <link href="https://mazzzystar.com/2025/01/30/chatgpt-to-deepseek-r1-zh/"/>
    <id>https://mazzzystar.com/2025/01/30/chatgpt-to-deepseek-r1-zh/</id>
    <published>2025-01-30T10:40:19.000Z</published>
    <updated>2026-04-10T13:45:08.740Z</updated>
    
    <content type="html"><![CDATA[<blockquote><p>我本想写一篇关于 DeepSeek R1 的科普文<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但发现很多人仅仅把它理解为 OpenAI 的复制品<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>而忽略了它在论文中揭示的&quot;惊人一跃&quot;<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>所以<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我决定重新写一篇<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>讲讲从 AlphaGo 到 ChatGPT<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>再到最近的 DeepSeek R1 底层原理的突破<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>以及为什么它对所谓的 AGI/ASI 很重要<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>作为一名普通的 AI 算法工程师<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我可能无法做到非常深入<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>如有错误欢迎指出<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p></blockquote><h2 id="alphago-突破人类上限">AlphaGo 突破人类上限</h2><p>1997 年<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>IBM 公司开发的国际象棋 AI 深蓝<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>击败了世界冠军卡斯帕罗夫而引发轰动<span class="bd-box"><h-char class="bd bd-beg"><h-inner>；</h-inner></h-char></span>接近二十年后的 2016 年<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>由 DeepMind 开发的围棋 AI AlphaGo 击败了围棋世界冠军李世石<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>再次引发轰动<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>表面上看这两个 AI 都是在棋盘上击败了最强的人类棋手<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但它们对人类的意义完全不同<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>。</h-inner></h-char></span>国际象棋的棋盘只有 64 个格子<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>而围棋的棋盘有 19x19 个格子<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>假如我们用<strong>一盘棋能有多少种下法</strong>(<em>状态空间</em>)来衡量复杂度<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>那么二者对比如下<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span></p><ol><li><strong>理论上的状态空间</strong><ul><li>国际象棋<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>每局约 <strong>80 步</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>每步有 <strong>35 种</strong>走法 → 理论状态空间为 <span class="markdown-them-math-inline">$35^{80} \approx 10^{123}$</span></li><li>围棋<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>每局约 <strong>150 步</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>每步有 <strong>250 种</strong>走法 → 理论状态空间为 <span class="markdown-them-math-inline">$250^{150} \approx 10^{360}$</span></li></ul></li><li><strong>规则约束后的实际状态空间</strong><ul><li>国际象棋<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>棋子移动受限<span class="bd-box"><h-char class="bd bd-end"><h-inner>（</h-inner></h-char></span>如兵不能倒退<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>王车易位规则<span class="bd-box"><h-char class="bd bd-beg"><h-inner>）</h-inner></h-char></span> → 实际值 <span class="markdown-them-math-inline">$10^{47}$</span></li><li>围棋<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>棋子不可移动且依赖&quot;气&quot;的判定 → 实际值 <span class="markdown-them-math-inline">$10^{170}$</span></li></ul></li></ol><table><thead><tr><th><strong>维度</strong></th><th><strong>国际象棋<span class="bd-box"><h-char class="bd bd-end"><h-inner>（</h-inner></h-char></span>深蓝<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>）</h-inner></h-char></span></strong></th><th><strong>围棋<span class="bd-box"><h-char class="bd bd-end"><h-inner>（</h-inner></h-char></span>AlphaGo<span class="bd-box"><h-char class="bd bd-beg"><h-inner>）</h-inner></h-char></span></strong></th></tr></thead><tbody><tr><td><strong>棋盘大小</strong></td><td>8×8<span class="bd-box"><h-char class="bd bd-end"><h-inner>（</h-inner></h-char></span>64 格<span class="bd-box"><h-char class="bd bd-beg"><h-inner>）</h-inner></h-char></span></td><td>19×19<span class="bd-box"><h-char class="bd bd-end"><h-inner>（</h-inner></h-char></span>361 点<span class="bd-box"><h-char class="bd bd-beg"><h-inner>）</h-inner></h-char></span></td></tr><tr><td><strong>平均每步合法走法</strong></td><td>35 种</td><td>250 种</td></tr><tr><td><strong>平均对局步数</strong></td><td>80 步/局</td><td>150 步/局</td></tr><tr><td><strong>状态空间复杂度</strong></td><td><span class="markdown-them-math-inline">$10^{47}$</span> 种可能局面</td><td><span class="markdown-them-math-inline">$10^{170}$</span> 种可能局面</td></tr></tbody></table><div style="color: #666; font-size: 0.9em; text-align: center;">▲ 国际象棋和围棋的复杂度对比</div><p>尽管规则大幅压缩了复杂度<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>围棋的实际状态空间仍是国际象棋的 <span class="markdown-them-math-inline">$10^{123}$</span> 倍<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>这是一个巨大的量级差异<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>要知道<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span><strong>宇宙中的所有原子数量大约是 <span class="markdown-them-math-inline">$10^{78}$</span> 个</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>在<span class="markdown-them-math-inline">$10^{47}$</span>范围内的计算<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>依赖 IBM 计算机可以暴力搜索计算出所有可能的走法<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>所以严格意义上来讲<span 
class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>深蓝的突破和神经网络<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>模型没有一点关系<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>它只是基于规则的暴力搜索<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>相当于<strong>一个比人类快得多的计算器</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>但<span class="markdown-them-math-inline">$10^{170}$</span>的量级<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>已经远远超出了当前超级计算机的算力<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>这迫使 AlphaGo 放弃暴力搜索<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>转而依赖深度学习<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>DeepMind 团队用人类棋谱训练神经网络<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>引入蒙特卡洛树搜索(MCTS)算法<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>从而极大地压缩每一步的搜索空间<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>让模型能根据当前棋盘状态预测下一步棋的最佳走法<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>但是<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span><strong>学习顶尖棋手走法<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>只能让模型的能力接近顶尖棋手<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>而无法超越他们</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>AlphaGo 首先用人类棋谱训练模型<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>然后通过设计一套奖励函数<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>让模型自我对弈进行强化学习<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>。</h-inner></h-char></span>和李世石对弈的第二局<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>AlphaGo 的第 19 手棋<span class="bd-box"><h-char class="bd bd-end"><h-inner>（</h-inner></h-char></span>第 37 步<sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup><span class="bd-box"><h-char class="bd bd-beg"><h-inner>）</h-inner></h-char></span>让李世石陷入长考<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>这步棋也被很多棋手认为是&quot;人类永远不会下的一步&quot;<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>如果没有强化学习和自我对弈<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>只是学习过人类棋谱<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>AlphaGo 永远无法下出这步棋<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>2017 年 5 月<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>AlphaGo 以 3:0 击败了柯洁<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>DeepMind 团队称<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>有一个比它更强的模型还没出战<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span><sup class="footnote-ref"><a href="#fn2" id="fnref2">[2]</a></sup> 他们发现<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>其实根本不需要给 AI 喂人类高手的对局棋谱<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span><strong>只要告诉它围棋的基本规则<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>让模型自我对弈<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>赢了就奖励<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>输了就惩罚</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>模型就能很快从零开始学会围棋并超越人类<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>，</h-inner></h-char></span>研究人员把这个模型称为 AlphaZero<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>因为它不需要任何人类知识<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>让我再重复一遍这个不可思议的事实<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>无需任何人类棋局作为训练数据<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>仅靠自我对弈<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>模型就能学会围棋<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>甚至这样训练出的模型<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>比喂人类棋谱的 AlphaGo 更强大<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>在此之后<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>围棋变成了比谁更像 AI 的游戏<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>因为 AI 的棋力已经超越了人类的认知范围<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>所以<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span><strong>想要超越人类<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>必须让模型摆脱人类经验<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>好恶判断(哪怕是来自最强人类的经验也不行)的限制</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>只有这样才能让模型能够自我博弈<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>真正超越人类的束缚<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>AlphaGo 击败李世石引发了狂热的 AI 浪潮<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>从 2016 到 2020 年<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>巨额的 AI 经费投入最终收获的成果寥寥无几<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>。</h-inner></h-char></span>数得过来的的可能只有人脸识别<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>语音识别和合成<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>自动驾驶<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>对抗生成网络等——但这些都算不上超越人类的智能<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>为何如此强大的超越人类的能力<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>却没有在其他领域大放异彩<span class="bd-box"><h-char class="bd bd-beg"><h-inner>？</h-inner></h-char></span>人们发现<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>围棋这种规则明确<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>目标单一的封闭空间游戏最适合强化学习<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>与之类似的还有 DotA<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>星际争霸<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>王者荣耀<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>斗地主等<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>对比之下<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>现实世界则复杂得多<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>开放空间<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>每一步都有无限种可能<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>没有确定的目标(比如&quot;赢&quot;)<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>没有明确的成败判定依据(比如占据棋盘更多区域)<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>试错成本也很高<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>自动驾驶一旦出错后果严重<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>。</h-inner></h-char></span></p><p>AI 领域冷寂了下来<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>直到 ChatGPT 的出现<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><h2 id="chatgpt-改变世界">ChatGPT 改变世界</h2><p>ChatGPT 被 The New Yorker 称为网络世界的模糊照片(<code>ChatGPT Is a Blurry JPEG of the Web</code><sup class="footnote-ref"><a href="#fn3" id="fnref3">[3]</a></sup>)<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>它所做的只是把整个互联网的文本数据送进一个模型<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>然后预测下一个字是什_</p><p>这个字最有可能是&quot;么&quot;<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>一个参数量有限的模型<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>被迫学习几乎无限的知识<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>过去几百年不同语言的书籍<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>过去几十年互联网上产生的文字<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>所以它其实是在做信息压缩<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>将不同语言记载的相同的人类智慧<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>历史事件和天文地理浓缩在一个模型里<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>科学家惊讶地发现<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span><strong>在压缩中产生了智能</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>我们可以这么理解<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>让模型读一本推理小说<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>小说的结尾&quot;凶手是_“<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>如果 AI 能准确预测凶手的姓名<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>，</h-inner></h-char></span>我们有理由相信它读懂了整个故事<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>即它拥有&quot;智能”<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>而不是单纯的文字拼贴或死记硬背<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>让模型学习并预测下一个字的过程<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>被称之为<strong>预训练</strong>(Pre-Training)<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>此时的模型只能不断预测下一个字<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但不能回答你的问题<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>要实现 ChatGPT 那样的问答<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>需要进行第二阶段的训练<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我们称之为<strong>监督微调</strong>(Supervised Fine-Tuning, SFT)<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>此时需要人为构建一批问答数据<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>例如:</p><div class="highlight"><pre class="code"><code><span class="hljs-comment"># 例子一</span>人类:第二次世界大战发生在什么时候?AI:<span class="hljs-number">1939</span>年<span class="hljs-comment"># 例子二</span>人类:请总结下面这段话....&#123;xxx&#125;AI:好的,以下是总结:xxx</code></pre></div><p>值得注意的是<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>以上这些例子是<strong>人工构造的</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>目的是让 AI 学习人类的问答模式<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>这样当你说&quot;请翻译这句:xxx&quot;时<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>送给 AI 的内容就是</p><div class="highlight"><pre class="code"><code>人类:请翻译这句:xxxAI:</code></pre></div><p>你看<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>，</h-inner></h-char></span>它其实仍然在预测下一个字<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>在这个过程中模型并没有变得更聪明<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>它只是学会了人类的问答模式<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>听懂了你在要求它做什么<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>这还不够<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>因为模型输出的回答有时好<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>有时差<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>有些回答还涉及种族歧视<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>或违反人类伦理(<em>“如何抢银行<span class="bd-box"><h-char class="bd bd-beg"><h-inner>？</h-inner></h-char></span>”</em>)<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>此时我们需要找一批人<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>针对模型输出的几千条数据进行标注<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>给好的回答打高分<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>给违反伦理的回答打负分<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>最终我们可以用这批标注数据训练一个<strong>奖励模型</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>它能判断<strong>模型输出的回答是否符合人类偏好</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>我们用这个<strong>奖励模型</strong>来继续训练大模型<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>让模型输出的回答更符合人类偏好<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>这个过程被称为通过人类反馈的强化学习<span class="bd-box"><h-char class="bd bd-end"><h-inner>（</h-inner></h-char></span>RLHF<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>）</h-inner></h-char><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p><strong>总结一下</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>让模型在预测下一个字的过程中产生智能<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>然后通过监督微调来让模型学会人类的问答模式<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>最后通过 RLHF 来让模型输出符合人类偏好的回答<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>这就是 ChatGPT 的大致思路<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><h2 id="大模型撞墙">大模型撞墙</h2><p>OpenAI 的科学家们是最早坚信<strong>压缩即智能</strong>的那批人<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>他们认为只要使用更海量优质的数据<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>在更庞大的 GPU 集群上训练更大参数量的模型<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>就能产生更大的智能<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>ChatGPT 就是在这样的信仰之下诞生的<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>Google 虽然做出了 Transformer<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但他们无法进行创业公司那样的豪赌<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>DeepSeek V3 和 ChatGPT 做的事差不多<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>因为美国 GPU 出口管制<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>聪明的研究者被迫使用了更高效的训练技巧(MoE/FP8)<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>他们也拥有顶尖的基础设施团队<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>最终只用了 550 万美元就训练了比肩 GPT-4o 的模型<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>后者的训练成本超过 1 亿美元<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>。</h-inner></h-char></span></p><p>但本文重点是 R1<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>这里想说的是<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>人类产生的数据在 2024 年底已经被消耗殆尽了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>模型的尺寸可以随着 GPU 集群的增加<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>轻易扩大 10 倍甚至 100 倍<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但人类每一年产生的新数据<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>相比现有的几十年<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>过去几百年的数据来说<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>增量几乎可以忽略不计<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>而按照 Chinchilla 扩展定律<span class="bd-box"><h-char class="bd bd-end"><h-inner>（</h-inner></h-char></span>Scaling Laws<span class="bd-box"><h-char class="bd bd-beg"><h-inner>）</h-inner></h-char><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>每增加一倍模型大小<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>训练数据的数量也应增加一倍<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>这就导致了<strong>预训练撞墙</strong>的事实<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>模型体积虽然增加了 10 倍<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但我们已经无法获得比现在多 10 倍的高质量数据了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>GPT-5 迟迟不发布<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>国产大模型厂商不做预训练的传闻<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>都和这个问题有关<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><h2 id="rlhf-并不是 rl">RLHF 
并不是 RL</h2><p>另一方面<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>基于人类偏好的强化学习(RLHF)最大的问题是<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>普通人类的智商已经不足以评估模型结果了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>在 ChatGPT 时代<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>AI 的智商低于普通人<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>所以 OpenAI 可以请大量廉价劳动力<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>对 AI 的输出结果进行评测<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>好/中/差<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但很快随着 GPT-4o/Claude 3.5 Sonnet 的诞生<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>大模型的智商已经超越了普通人<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>只有专家级别的标注人员<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>才有可能帮助模型提升<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>且不说聘请专家的成本<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>那专家之后呢<span class="bd-box"><h-char class="bd bd-beg"><h-inner>？</h-inner></h-char></span>终究有一天<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>最顶尖的专家也无法评估模型结果了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>AI 就超越人类了吗<span class="bd-box"><h-char class="bd bd-beg"><h-inner>？</h-inner></h-char></span>并不是<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>AlphaGo 对李世石下出第 19 手棋<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>从人类偏好来看<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>这步棋绝不可能赢<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>，</h-inner></h-char></span>所以如果让李世石来做人类反馈(Human Feedback, HF)评价 AI 的这步棋<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>他很可能也会给出负分<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>这样<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span><strong>AI 就永远无法逃出人类思维的枷锁</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>你可以把 AI 想象成一个学生<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>给他打分的人从高中老师变成了大学教授<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>学生的水平会变高<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但几乎不可能超越教授<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>RLHF 本质上是一种讨好人类的训练方式<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>它让模型输出符合人类偏好<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但同时它扼杀了<strong>超越人类</strong>的可能性<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>关于 RLHF 和 RL<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>最近 Andrej Karpathy 也发表了类似的看法<sup class="footnote-ref"><a href="#fn4" id="fnref4">[4]</a></sup>:</p><blockquote><p>AI 和儿童一样<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>有两种学习模式<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>1<span class="bd-box"><h-char class="bd bd-beg"><h-inner>）</h-inner></h-char></span>通过模仿专家玩家来学习<span class="bd-box"><h-char class="bd bd-end"><h-inner>（</h-inner></h-char></span>观察并重复<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>即预训练<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>监督微调<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>)</h-inner></h-char><h-char class="bd bd-beg"><h-inner>,</h-inner></h-char></span> and 2) winning the game through repeated trial and error and reinforcement learning. My favorite simple example is AlphaGo.</p><p>Almost every shocking result in deep learning, and the source of all the <em>magic</em>, is always 2. Reinforcement learning (RL) is powerful, but reinforcement learning from human feedback (RLHF) is not the same thing. RLHF is not RL.</p></blockquote><p>Here is a thought I posted earlier:</p><p><img src="/images/2025-01-30/rlhf.jpg" alt=""></p><h2 id="openai-的解法">OpenAI's Approach</h2><p>In <em>Thinking, Fast and Slow</em>, Daniel Kahneman proposes that the human mind has two modes of thinking. Some questions can be answered without deliberate thought, which is <strong>fast thinking</strong><span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>,</h-inner></h-char></span> while others need the kind of long deliberation a Go player gives a hard move, which is <strong>slow thinking</strong>.</p><p>Since training seemed to have hit its limit, could answer quality be improved at inference time instead, by giving the model more time to think while it answers? There is precedent for this. Researchers found early on that adding “Let’s think step by step” to a prompt makes the model write out its reasoning and arrive at a better result. This is known as <strong>Chain-of-Thought</strong> (CoT).</p><p>After large-model pretraining hit a wall in late 2024, <strong>using reinforcement learning (RL) to train a model's chain of thought</strong> became the new consensus. This kind of training dramatically improves performance on certain specific, objectively measurable tasks <span class="bd-box"><h-char class="bd 
bd-end"><h-inner>(</h-inner></h-char></span>such as math and coding). It starts from an ordinary pretrained model and, in a second stage, uses reinforcement learning to train the reasoning chain of thought. Models of this kind are called <strong>reasoning models</strong>. OpenAI's o1, released in September 2024, and the later o3 are both reasoning models.</p><p>Unlike ChatGPT and GPT-4/4o, when training reasoning models such as o1 and o3, <strong>human feedback no longer matters</strong>, because the outcome of the reasoning can be evaluated automatically and rewarded or penalized accordingly. In an article published yesterday<sup class="footnote-ref"><a href="#fn5" id="fnref5">[5]</a></sup>, Anthropic's CEO described this technical direction as a <em>turning point</em>: a powerful new paradigm exists, and it is still at an early stage of its Scaling Law.</p><p>OpenAI has not published the details of its reinforcement learning algorithm, but the recent release of DeepSeek R1 shows us one workable approach<span 
class="bd-box"><h-char class="bd bd-beg"><h-inner>.</h-inner></h-char></span></p><h2 id="deepseek-r1-zero">DeepSeek R1-Zero</h2><p>My guess is that DeepSeek named its pure-RL model R1-Zero as a tribute to AlphaZero, the algorithm that surpassed the strongest players through self-play alone, without studying a single human game record.</p><p>To train a slow-thinking model, you first need sufficiently high-quality data that contains the reasoning process. And if the reinforcement learning is to work without humans, the model's reasoning has to be scored quantitatively (good or bad) so that rewards and penalties can be assigned.</p><p>As noted above, math and code are the two kinds of data that best meet these requirements: the final answer of a math derivation can be checked for correctness with rules such as regular expressions, and the output of code can be checked by compiling and running it.</p><p>For example, math textbooks are full of reasoning like this:</p><div class="highlight"><pre class="code"><code>&lt;think&gt;
  Let x be a root of the equation. Squaring both sides: x² = a - √(a+x)
  Rearranging: √(a+x) = a - x²
  Squaring again: (a+x) = (a - x²)²
  Expanding: a + x = a² - 2a x² + x⁴
  Collecting terms: x⁴ - 2a x² - x + (a² - a) = <span class="hljs-number">0</span>
&lt;/think&gt;
&lt;answer&gt;x⁴ - 2a x² - x + (a² - a) = <span class="hljs-number">0</span>&lt;/answer&gt;</code></pre></div><p>The text above contains a complete chain of thought. During RL training, R1 does not explicitly reward or penalize each step of the chain (<em>the approach known as a process reward model</em>, PRM). Instead it <strong>rewards only the outcome</strong> (ORM): a response scores <strong>as long as the result is correct and a reasoning section is present</strong>, and what the reasoning actually says does not matter.</p><p>The DeepSeek team found several likely problems with <strong>rewarding the process</strong>. How many steps should the reasoning be split into? The number differs with task complexity. The correctness of each step is also hard to quantify, and a wrong line of thought often nudges the model toward the eventual right answer. Finally, if the reasoning itself is scored, the model may focus on producing a correct-looking process and neglect the result (reward hacking), like a student who writes down formulas first just to collect partial credit. That actually hurts the model's ability to learn how to solve problems<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>.</h-inner></h-char></span></p><p>Using <strong>Monte Carlo tree search</strong> (MCTS) as AlphaGo did runs into a similar problem. Go's state space is enormous, but the legal moves at each step are finite and enumerable, whereas a large model's reasoning is open-ended and unbounded. As with PRM, training a large model with MCTS also makes it hard to judge the correctness of each reasoning step.</p><p>Besides rewarding only the outcome, the researchers also created a reinforcement learning algorithm called <strong>GRPO</strong> (<strong>Group Relative Policy Optimization</strong>). Outputs that <strong>contain a chain of thought and reach the correct result</strong> receive high scores, which implicitly encourages the model to form a chain of thought.</p><p><a href="https://x.com/virattt/status/1885102056546910672">This post</a> explains how GRPO works with a nice example, which I paraphrase here: have the model generate several responses at once, score each one, compare them within the group to find which have a relative advantage, and then train the model with RL<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>,</h-inner></h-char></span> so that it favors the higher-scoring responses.</p><blockquote><p>Take the question <span class="markdown-them-math-inline">$2+3=?$</span> as an example.</p></blockquote><p><strong>Step 1</strong>: The model generates several responses:</p><ul><li>“5”</li><li>“6”</li><li>“&lt;think&gt;2+3=5&lt;/think&gt;&lt;answer&gt;5&lt;/answer&gt;”</li></ul><p><strong>Step 2</strong>: Score each response:</p><ul><li>“5” → 1 point (<strong>correct, no chain of thought</strong>)</li><li>“6” → 0 points (<strong>wrong</strong>)</li><li>“&lt;think&gt;2+3=5&lt;/think&gt;&lt;answer&gt;5&lt;/answer&gt;” → 2 points (<strong>correct, with chain of thought</strong>)</li></ul><p><strong>Step 3</strong>: Compute the average score across all responses:</p><ul><li>Average score = (1 + 0 + 2) / 3 = 1</li></ul><p><strong>Step 4</strong>: Compare each response's score with the average:</p><ul><li>“5” → 1 - 1 = 0 (<strong>equal to the average</strong>)</li><li>“6” → 0 - 1 = -1 (<strong>below the average</strong>)</li><li>“&lt;think&gt;2+3=5&lt;/think&gt;&lt;answer&gt;5&lt;/answer&gt;” → 2 - 1 = 1 (<strong>above the average</strong>)</li></ul><p><strong>Step 5</strong>: Apply reinforcement learning so the model favors higher-scoring responses, that is, those that <strong>contain a chain of thought and reach the correct result</strong>.</p><p>That is roughly how GRPO works.</p><p>Starting from the V3 model, they ran RL training with GRPO on math and code data. The resulting R1-Zero 
model improves markedly over DeepSeek V3 across reasoning benchmarks, showing that RL alone can draw out a model's reasoning ability.</p><p>This is <strong>another AlphaZero moment</strong>. R1-Zero's training does not rely on human intelligence, experience, or preference at all. It uses RL alone to learn objective, measurable truths, and ends up with reasoning ability far beyond that of any non-reasoning model.</p><p>During training they noticed something interesting: although output length was never rewarded, <strong>the model gradually and “spontaneously” produced longer outputs as training went on</strong>. This makes intuitive sense, since harder problems usually take longer reasoning to solve.</p><p><img src="/images/2025-01-30/r1-zero-think-length.jpg" alt="DeepSeek R1-Zero's response length gradually increases during training"></p><p>But R1-Zero was trained with reinforcement learning only, with no supervised learning<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>,</h-inner></h-char></span> so it never learned the human question-and-answer format and does not handle ordinary human questions well. Its reasoning also suffers from <strong>language mixing</strong>: it switches back and forth between English and Chinese, which makes it hard to read. So the DeepSeek team:</p><ol><li>First collected a small amount of high-quality Chain-of-Thought (CoT) data and used it for initial supervised fine-tuning (SFT) of the V3 model, which <strong>addressed the inconsistent output language</strong> and produced a cold-start model.</li><li>Then ran <strong>pure RL training</strong> similar to R1-Zero's on this cold-start model, with an added language-consistency reward. At this point the model's reasoning ability was much stronger, and language consistency improved as well<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>.</h-inner></h-char></span></li><li>Next, to cover the broader range of <strong>non-reasoning tasks</strong> (such as writing and factual question answering), built a dataset for a second round of training. Because output quality on tasks like writing is hard to evaluate, they used supervised fine-tuning (SFT) here rather than RL.</li><li>Finally, trained on reasoning and general-task data together, aiming to remove <em>useless or harmful</em> reasoning while preserving reasoning ability. For reasoning tasks they followed R1-Zero's RL recipe. For writing tasks they selected human-preferred results from data generated by DeepSeek V3 and R1-Zero and used them as the reward for training (similar to RLHF).</li></ol><p>The overall process is roughly<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>:</h-inner></h-char></span></p><div class="highlight"><pre class="code"><code>Supervised fine-tuning <span class="hljs-comment">(SFT)</span> - Reinforcement learning <span class="hljs-comment">(RL)</span> - Supervised fine-tuning <span class="hljs-comment">(SFT)</span> - Reinforcement learning <span class="hljs-comment">(RL)</span></code></pre></div><p>The result of this process is DeepSeek R1.</p><p>DeepSeek R1's contribution to the world is open-sourcing the first reasoning model on par with a closed one (o1). Users everywhere can now see the model's reasoning before it answers, its &quot;inner monologue&quot;, completely free of charge.</p><p>More importantly, it revealed to researchers the secret OpenAI had been keeping: <strong>reinforcement learning does not have to depend on human feedback, and pure RL can train the strongest reasoning models</strong>. That is why, to my mind, R1-Zero matters more than R1.</p><h2 id="对齐人类品味-vs 超越人类">Aligning to Human Taste vs. Surpassing Humans</h2><p>A few months ago I read interviews with the founders of Suno and Recraft<sup class="footnote-ref"><a href="#fn6" id="fnref6">[6]</a></sup><sup class="footnote-ref"><a href="#fn7" id="fnref7">[7]</a></sup>. Suno is trying to make AI-generated music more pleasant to listen to<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>,</h-inner></h-char></span> and Recraft is trying to make AI-generated images more beautiful and more artistic. Reading them left me with a vague sense that <strong>aligning a model to human taste rather than objective truth seems to let you sidestep the truly brutal arena where large-model performance is quantified</strong>.</p><p>Competing with every rival on leaderboards like AIME and SWE-bench day after day is exhausting, and you never know when a new model will appear and leave you behind (<em>remember Mistral?</em>). Human taste, though, is like fashion: it does not improve, it only changes. Companies like Suno and Recraft are clearly wise. They only need to satisfy the most discerning musicians and artists in their fields (which is hard too, of course), and leaderboards do not matter.</p><p>The downside of aligning to human taste is just as obvious: the gains your hard work produces are difficult to quantify. For example, is Suno V4 really better than V3.5<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>?</h-inner></h-char></span> In my experience V4 only improved audio quality, not creativity. Moreover, <strong>a model that depends on human taste is destined never to surpass humans</strong>. If an AI derives a mathematical theorem beyond what people today can understand, it will be revered as a god. But if Suno creates a piece of music outside human taste and understanding, to ordinary ears it may sound like nothing but noise.</p><p>Competing to align with objective truth is painful but alluring, because it holds the possibility of surpassing humans.</p><h2 id="对质疑的一些反驳">Responses to Some Doubts</h2><blockquote><p>Has DeepSeek's R1 model really surpassed OpenAI?</p></blockquote><p>On benchmarks, R1's reasoning ability <strong>exceeds that of every non-reasoning model</strong>, meaning ChatGPT/GPT-4/4o and Claude 3.5 Sonnet. It is <strong>close to</strong> o1, a fellow reasoning model, and <strong>falls short of o3</strong>, but o1 and o3 are both closed-source models<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>.</h-inner></h-char></span></p><p>Many people's hands-on experience may differ, because Claude 3.5 Sonnet is better at understanding user intent.</p><blockquote><p>DeepSeek collects users' chat content for training.</p></blockquote><p>Many people mistakenly believe that chat apps like ChatGPT get smarter by collecting user conversations for training. They do not. If that were how it worked, WeChat and Messenger would have built the world's strongest large models.</p><p>After reading this article you can probably see why: most ordinary users' everyday chat data no longer matters. RL models only need to train on very high-quality reasoning data that contains chains of thought, such as math and code. The model can generate this data itself, with no human annotation. So Alexandr Wang, CEO of the data-labeling company Scale AI, is probably feeling seriously threatened right now, since future models will need less and less human annotation.</p><p><em>Update: ARC-AGI's <a href="https://arcprize.org/blog/r1-zero-r1-results-analysis">analysis of R1-Zero</a> hints at a new idea<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>:</h-inner></h-char></span> future reasoning models could be trained on the chains of thought the AI generates while chatting with users. This differs from the popular fear that AI is secretly training on chat logs. What the user says hardly matters. While users pay to get their answers, the model gains one more reasoning chain at zero cost.</em></p><blockquote><p>DeepSeek R1 is only good because it secretly distilled OpenAI's models.</p></blockquote><p>R1's main performance gains come from reinforcement learning. You can see that R1-Zero, which uses pure RL and no supervised data, is also strong at reasoning. R1 did use some supervised data at cold start, but mainly to fix language consistency, and that data does not improve the model's reasoning ability.</p><p>Also, many people misunderstand <em>distillation</em>. Distillation usually means <em>taking a powerful model as the teacher and using its outputs to guide a smaller, weaker student model, so the student gets stronger by memorizing the answers directly</em>. For example, R1 can be used to distill Llama-70B. <strong>A distilled student almost always performs worse than its teacher, yet R1 
outperforms o1 on some benchmarks</strong>, so claiming that R1's performance comes from distilling o1 is simply foolish.</p><blockquote><p>I asked DeepSeek and it said it was an OpenAI model, so it is just a wrapper.</p></blockquote><p>During training, a large model does not know <strong>the current date</strong>, <strong>who is training it</strong>, or <strong>whether the machines training it are H100s or H800s</strong>. DeepSeek answers that it is ChatGPT because its training data contains text like &quot;I am ChatGPT&quot;. A user on X offered an apt analogy<sup class="footnote-ref"><a href="#fn8" id="fnref8">[8]</a></sup>: <em>it is like asking an Uber passenger what brand of tires the car has</em>. The model has no reason to know this.</p><h2 id="一些感受">Some Reflections</h2><p>AI has finally thrown off the shackles of human feedback. DeepSeek R1-Zero shows how to improve model performance with almost no human feedback. This is its AlphaZero moment. Many people used to say &quot;artificial intelligence has only as much intelligence as the human labor put into it&quot;<span class="bd-box"><h-char 
class="bd bd-beg"><h-inner>,</h-inner></h-char></span> but that view may no longer hold. If a model can derive the Pythagorean theorem from a right triangle, we have reason to believe that one day it will derive theorems that mathematicians have not yet discovered.</p><p>Is writing code still meaningful? I don't know. This morning I saw that a contributor to llama.cpp, a popular GitHub project, had submitted a PR saying he doubled WASM speed by optimizing SIMD instructions, with 99% of the code written by DeepSeek R1<sup class="footnote-ref"><a href="#fn9" id="fnref9">[9]</a></sup>. That is certainly not junior-engineer-level code, and I can no longer say that AI can only replace junior programmers.</p><p><img src="/images/2025-01-30/ggml-speedup.jpg" alt="ggml : x2 speed for WASM by optimizing SIMD"></p><p>Of course, I am still very happy about all this. The boundary of human capability has been pushed outward once again. Well done, DeepSeek! It is the coolest company in the world right now.</p><h2 
id="参考资料">参考资料</h2><hr class="footnotes-sep"><section class="footnotes"><ol class="footnotes-list"><li id="fn1" class="footnote-item"><p><a href="https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol">Wikipedia: AlphaGo versus Lee Sedol</a> <a href="#fnref1" class="footnote-backref">↩︎</a></p></li><li id="fn2" class="footnote-item"><p><a href="https://discovery.ucl.ac.uk/id/eprint/10045895/1/agz_unformatted_nature.pdf">Nature: Mastering the game of Go without human knowledge</a> <a href="#fnref2" class="footnote-backref">↩︎</a></p></li><li id="fn3" class="footnote-item"><p><a href="https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web">The New Yorker: ChatGPT is a blurry JPEG of the web</a> <a href="#fnref3" class="footnote-backref">↩︎</a></p></li><li id="fn4" class="footnote-item"><p><a href="https://x.com/karpathy/status/1883941452738355376">X: Andrej Karpathy</a> <a href="#fnref4" class="footnote-backref">↩︎</a></p></li><li id="fn5" class="footnote-item"><p><a href="https://darioamodei.com/on-deepseek-and-export-controls">On DeepSeek and Export Controls</a> <a href="#fnref5" class="footnote-backref">↩︎</a></p></li><li id="fn6" class="footnote-item"><p><a href="https://mp.weixin.qq.com/s/aCbPlmg8D4FZpJwq5TWyfA">Suno 创始人访谈<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>至少对音乐来说<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>Scaling Law 不是万灵药</a> <a href="#fnref6" class="footnote-backref">↩︎</a></p></li><li id="fn7" class="footnote-item"><p><a href="https://mp.weixin.qq.com/s/ost8DMH3hJssLx5-ngRlXQ">Recraft 专访<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>20 人<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>8 个月做出了最好的文生图大模型<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>目标是 AI 版的 Photoshop</a> <a href="#fnref7" class="footnote-backref">↩︎</a></p></li><li 
id="fn8" class="footnote-item"><p><a href="https://x.com/theo/status/1884616582393184322">X: DeepSeek forgot to censor their bot from revealing they use H100 not H800.</a> <a href="#fnref8" class="footnote-backref">↩︎</a></p></li><li id="fn9" class="footnote-item"><p><a href="https://github.com/ggerganov/llama.cpp/pull/11453">ggml : x2 speed for WASM by optimizing SIMD</a> <a href="#fnref9" class="footnote-backref">↩︎</a></p></li></ol></section>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;blockquote&gt;
&lt;p&gt;I originally wanted to write an explainer about DeepSeek R1, but found that many people understand it merely as</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>What Today’s AI Has Really Brought Me</title>
    <link href="https://mazzzystar.com/2024/10/30/What-has-AI-really-brought-to-me-zh/"/>
    <id>https://mazzzystar.com/2024/10/30/What-has-AI-really-brought-to-me-zh/</id>
    <published>2024-10-30T13:36:15.000Z</published>
    <updated>2026-04-10T13:45:08.740Z</updated>
    
    <content type="html"><![CDATA[<p>It is the end of 2024, and I am completely tired of the daily sensational news about AI disrupting yet another industry. I want to talk about the real changes AI has brought, from my own perspective, to me and to the real people I see around me.</p><p><strong>First, a teacher with endless knowledge</strong>. For the first time in human history, we have an encyclopedic, all-knowing figure that anyone can talk to. In the past, that figure was called God. If we break knowledge down into the formula Knowledge = Curiosity / Learning Cost, then ChatGPT has cut the learning cost by an order of magnitude. I remember that in 2013, when I was still in college, I often answered questions on Baidu Zhidao from my dorm room and earned a few hundred points, which I then spent on Baidu Wenku to download application-letter templates. None of that is needed anymore.</p><p>The consequence is that <strong>people no longer ask each other questions</strong>, and I am not sure that is a good thing. Half a year after ChatGPT launched, StackOverflow’s <a href="https://www.reddit.com/r/ChatGPT/comments/15ju114/chatgpt_is_putting_stack_overflow_out_of_business/">traffic had dropped by 50%</a>. Products like Zhihu and Quora may barely survive by <em>moving away from knowledge Q&amp;A and toward emotional resonance</em>. I think the disappearance of questions matters, not only because of the Feynman technique (<em>answering other people’s questions deepens your own understanding</em>), but also because we may never again respect the people who bring us knowledge the way we used to. In my middle-school history class, I found a learned teacher who could range from the fall of the Byzantine Empire to the British partition along the Jordan River deeply charismatic. In fact, the worship of knowledge started to loosen with the birth of Wikipedia and then collapsed in the ChatGPT era. In the future we will admire people with opinions more than people with knowledge.</p><p>Then there is <strong>writing code</strong>, which is my day job. Over the past year I have seen many AI products claim to have found PMF, but for me, apart from the large models themselves, there is only Cursor. Today’s AI still cannot solve problems that demand deep insight, such as <strong>optimizing the performance of the Transformer architecture</strong>, but neither can 95% of programmers. A year and a half ago, in <a href="https://mazzzystar.com/2023/05/10/LLM-for-person-zh">The Leverage of LLMs for Individuals</a>, I wrote this:</p><blockquote><p>Whenever I have a new idea, I ask GPT-4 to write the most basic version. I give feedback, it apologizes, and bit by bit we refine it into the 1.0 I have in mind. I use up my GPT-4 quota (25 messages per 3 hours) several times a day. With GPT-4 behind me, I feel I can do anything. I have a vague feeling: don’t limit yourself. Perhaps soon I will be using products built by designers, lawyers, or electricians.</p></blockquote><p>That prediction finally came true on a large scale this year with Cursor + Claude 3.5/GPT-4o. I have seen many designers, product managers, and even artists ship the first app of their lives. This leads me to another thought: if each of us can <strong>create an app with a single sentence</strong>, what is the point of apps, plugins, and web pages? If there were a browser that could automatically generate throwaway tools for common, simple needs, would many of the tool sites found through Google die?</p><p>I have no doubt (I have even thought of doing it myself) that thousands of developers are using Cursor + Claude 3.5 Sonnet to publish a new app every day, 300 apps a year. But what is the point? It feels like a party at the end of the world.</p><p>Then there is <strong>a place to keep human emotions</strong>. For the first time in history, every ordinary person can enjoy content produced “just for me”, down to the individual character. Even in the age of recommendation algorithms, each of us was only a tiny number in a giant matrix, and nobody sincerely asked how I was doing. Recommendation algorithms delivered the most efficient distribution, monetization, and capture of our time, but they never asked whether I was happy or sad right now.</p><p>This, I think, is the main reason AI companion products are booming: in the past, <strong>nobody cared about me at all</strong>. My sadness, my secrets, and my deepest longings had no one to listen to them. Now there is a virtual character that responds at any time and cares only about whether I had a good day. It has no real-world conflict of interest with me, and I do not need to pretend, fear saying the wrong thing, or worry about exposing my privacy. Who wouldn’t love that?</p><p>As with the &quot;teacher with endless knowledge&quot; above, once we have AI partners that can absorb every emotional need, it seems we no longer need real partners either. By comparison, real partners bring financial pressure and suspicion, and often fail to provide emotional value. This opens up a major fork: <strong>should we let AI mend the rifts between people, or create a world where people fall in love with AI?</strong></p><p>These two value systems have shaped two kinds of products. One kind is the AI relationship coach, which mediates conflicts between couples, or AI dating, which helps strangers connect more smoothly. The other kind can be collectively called Her: it does everything possible to give AI a more 3D appearance, a more lifelike voice, more real-time interaction, and a more human way of speaking.</p><p>I hope the former succeeds; the latter fills me with despair. The problem is that creating a flawless virtual character is far easier than filling the chasms between human hearts, and people slide easily toward giving up.</p><p>I suddenly wonder what kind of world that will be.</p>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;p&gt;It is the end of 2024, and I am completely tired of the daily sensational news about AI disrupting yet another industry</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>What AI Has Really Brought to My Life</title>
    <link href="https://mazzzystar.com/2024/10/30/What-AI-Has-Really-Brought-to-My-Life/"/>
    <id>https://mazzzystar.com/2024/10/30/What-AI-Has-Really-Brought-to-My-Life/</id>
    <published>2024-10-30T11:22:19.000Z</published>
    <updated>2026-04-10T13:45:08.740Z</updated>
    
    <content type="html"><![CDATA[<p>As we approach the end of 2024, I’m completely tired of the sensational news about AI disrupting yet another industry every day. I want to share, from my perspective, the real changes that AI has brought to me and the people I see around me.</p><p><strong>First, it’s become a teacher with endless knowledge</strong>. For the first time in human history, we have access to an encyclopedic, omniscient figure that anyone can converse with, a role that was previously attributed to God. If we break down knowledge into this formula: Knowledge = Curiosity / Learning Cost, then ChatGPT has reduced the learning cost by an order of magnitude. I remember that back in 2013, when I was in college, I often answered other people’s questions to earn points, which I then used to download resources from some website. None of that is necessary anymore.</p><p>The consequence is that <strong>humans no longer ask each other questions</strong>, and I’m not sure if this is a good thing. StackOverflow’s traffic <a href="https://www.reddit.com/r/ChatGPT/comments/15ju114/chatgpt_is_putting_stack_overflow_out_of_business/">dropped by 50%</a> within six months of ChatGPT’s release. Products like Quora might barely survive by <em>pivoting away from knowledge Q&amp;A towards emotional resonance</em>. I think this loss of questioning is significant, not just because of the Feynman learning technique (<em>teaching others helps deepen our own understanding</em>), but because we may never respect knowledge-bearers the same way again. In middle school history class, I was captivated by teachers who could eloquently connect the fall of the Byzantine Empire to historical events in the region around the Jordan River. In fact, our reverence for knowledge began eroding with Wikipedia’s birth and has now collapsed in the ChatGPT era. 
In the future, we’ll likely revere those with opinions rather than those with knowledge.</p><p>Then there’s <strong>coding</strong>, which is my profession. Over the past year, I’ve seen countless AI products claiming to have found product-market fit, but for me, only Cursor stands out. Of course, current AI still can’t solve problems requiring deep insight, such as <strong>Transformer architecture performance optimization</strong>, but neither can 95% of programmers. A year and a half ago, when writing about <a href="https://mazzzystar.com/2023/05/10/LLM-for-individual/">The Leverage of LLMs for Individuals</a>, I wrote:</p><blockquote><p>I have a vague feeling: don’t limit yourself. Soon, I might be using products developed by designers, lawyers, or electricians.</p></blockquote><p>This prediction has finally materialized this year with Cursor + Claude 3.5/GPT-4o. I’ve seen many designers, product managers, and even artists launch their first applications. This leads to another thought: if we can each <strong>create an application with just one sentence</strong>, what’s the purpose of apps, plugins, and websites? If there were a browser that could automatically generate temporary tools for common, simple needs, would many of the tool websites found through Google die?</p><p>I don’t doubt (and have even considered doing this myself) that thousands of developers are launching a new app every day using Cursor + Claude 3.5 Sonnet, roughly 300 apps per year. But what’s the point? It feels like an apocalyptic carnival.</p><p>Then there’s <strong>a place to store human emotions</strong>. For the first time in history, every ordinary person can enjoy content generated “just for me” with character-level precision. Even in the era of recommendation algorithms, we were just tiny numbers in a massive matrix, and no one truly cared about our well-being. 
Recommendation algorithms provided efficient distribution, monetization, and time exploitation, but never asked if we were happy or sad.</p><p>This is why I believe <strong>AI companionship</strong> products are booming now: in the past, <strong>nobody really cared about me</strong>. My sadness, my secrets, my deepest desires couldn’t find listeners. Now, there’s a virtual character that responds anytime and only cares about whether I’m happy today. It has no real-world conflicts of interest with me, so I don’t need to pretend, fear saying the wrong thing, or worry about exposing my privacy. Who wouldn’t love that?</p><p>Similar to the earlier point about “teachers with endless knowledge,” having AI companions that can address all emotional needs makes it seem like we don’t need real companions anymore. Real companions, in comparison, bring financial pressure and suspicion, and often fail to provide emotional value. This creates a huge divergence: <strong>Should we let AI bridge the gaps between humans, or create a world where people date AI?</strong></p><p>These two value systems have shaped two types of products. The first is AI relationship coaches, which mediate couples’ conflicts, and AI dating, which helps strangers connect more smoothly. The second, collectively called “Her,” uses every means to give AI a more 3D appearance, a more realistic voice, more real-time interaction, and more human-like speaking patterns.</p><p>I hope the former succeeds; the latter fills me with despair. The problem is, compared to bridging the gaps between human hearts, creating a perfect virtual character is far easier, and humans tend to slide toward the path of least resistance.</p><p>I suddenly wonder what kind of world that will be.</p>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;p&gt;As we approach the end of 2024, I’m completely tired of the sensational news about AI disrupting yet another industry every day. I want</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>The Simplest Way to Have a Personal Blog</title>
    <link href="https://mazzzystar.com/2024/09/26/The-easiest-way-to-have-a-blog-zh/"/>
    <id>https://mazzzystar.com/2024/09/26/The-easiest-way-to-have-a-blog-zh/</id>
    <published>2024-09-26T17:52:10.000Z</published>
    <updated>2026-04-10T13:45:08.740Z</updated>
    
    <content type="html"><![CDATA[<p>In 2016, I registered a domain for 10 years and started blogging. I mostly wrote about machine learning: deriving the backpropagation formulas, how RNNs and LSTMs work. For every post I drew illustrations and wrote LaTeX equations, and I poured a great deal of effort into them. I used Jekyll, picked a good-looking theme, and repeatedly tweaked the fonts, equations, and layout. Three years later, I had forgotten my blog’s address.</p><p>From 2018 to 2022, I stopped writing entirely. The thought that blogging meant buying a domain, changing fonts, and deploying felt like too much trouble. Then one day I decided to write an <a href="https://mazzzystar.com/2022/09/07/stable-diffusion/">article</a> explaining how Stable Diffusion works, and this time I wanted to write with as little fuss as possible.</p><p>I chose Hexo. The process goes like this:</p><ol><li>Create a post with <code>hexo new 'title'</code></li><li>Render with <code>hexo g</code> and preview at localhost:4000</li><li>Publish with <code>hexo d</code>, wait a few minutes, and see it on GitHub Pages</li></ol><p>Hexo is not perfect. I write in VS Code locally and switch to the browser to preview, and the constant switching is tiring. And whenever I want to add an image, I have to insert its URL by hand. All that effort might go into a post that could be written in 10 minutes.</p><p>Later, I wanted a &quot;short thoughts&quot; page for brief public notes: <strong>a place with no like, comment, or share counts, where I could record fragmentary, not (too heavily) censored thoughts from daily life.</strong> But this was hard to build in Hexo: a blog is a list of articles, while short thoughts are text cards. After several detours, it took me two days to implement.</p><p>Several of my friends happened to have a similar need: write something on their phone and publish it to their own blog. I wondered: is there a truly simple, foolproof platform for quickly publishing posts and thoughts?</p><p>The biggest challenge is not technology but trust. Users worry about losing their data if the platform shuts down. I considered many options, such as a self-hosted database or automatic daily exports. But they were all too much hassle, and hassle only dampens the urge to write.</p><p>Last week it suddenly occurred to me that GitHub itself seems able to do this job. For developers it is a platform that is very unlikely to die, and it lets users store a large amount of content for free. If we keep every user’s posts and thoughts in their own GitHub account, wouldn’t the data never be lost?</p><p>And so <a href="https://tinymind.me">Tinymind</a> was born. After you sign in with GitHub and authorize it, it creates a repository named &quot;tinymind-blog&quot; under your account. Every post or thought you write becomes a commit to that repository. It is serverless and <a href="https://github.com/mazzzystar/tinymind">open source</a>.</p><p>I launched the site last week. Since then I have been coding until 3 a.m. every night, and I have finally finished 100% of the <a href="https://github.com/mazzzystar/tinymind?tab=readme-ov-file#features-in-development">features</a> I wanted most, including public home pages, drag-and-drop image upload, and post editing. Less than a week after launch, more than 400 people have created their own public blogs, which you can see by searching GitHub for &quot;tinymind-blog&quot; repositories. Many of them cannot write code at all.</p><p>I probably won’t use it to replace my existing blog, but I will use it to write drafts, preview them, and host images, and then publish on <a href="http://mazzzystar.com">mazzzystar.com</a>. I can also jot down rough, raw thoughts on Tinymind and later collect them on my <a href="https://mazzzystar.com/thoughts/">Thoughts page</a>. When the steps get simpler, people write more often.</p><p>For now, everything you write must be public: I do not want Tinymind to request access to your private repositories, so it can only create publicly visible ones. Your blog data is stored under your own GitHub account, not hosted on Tinymind.</p><p>I spent many nights polishing Tinymind until it met my standards, and now it is done. I am about to start my next project, but Tinymind’s life has only just begun. I hope it lets many non-programmers have a blog of their own, and that its simplicity helps people like me, who once tinkered with their blogs and eventually gave up, keep writing for longer.</p><p><a href="https://github.com/mazzzystar/tinymind">GitHub</a> | <a href="https://tinymind.me">Tinymind</a></p>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;p&gt;In 2016, I registered a 10-year domain</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>The Easiest Way to Have a Blog</title>
    <link href="https://mazzzystar.com/2024/09/26/The-easiest-way-to-have-a-blog/"/>
    <id>https://mazzzystar.com/2024/09/26/The-easiest-way-to-have-a-blog/</id>
    <published>2024-09-26T14:40:46.000Z</published>
    <updated>2026-04-10T13:45:08.740Z</updated>
    
    <content type="html"><![CDATA[<p>In 2016, I registered a 10-year domain and started blogging. My focus was machine learning: backpropagation derivations, RNN/LSTM principles. Each post was meticulously crafted with custom illustrations and LaTeX equations. Using Jekyll with a sleek theme, I obsessed over fonts, equations, and layout. Three years later, I realized I’d forgotten my blog’s URL.</p><p>From 2018 to 2022, I stopped writing entirely. The mere thought of dealing with domains, fonts, configurations, and deployment was overwhelming. Then one day, I wanted to write an <a href="https://mazzzystar.com/2022/09/07/stable-diffusion/">article</a> explaining Stable Diffusion’s principles on an independent platform.</p><p>I chose Hexo. The process was:</p><ol><li>Create a post with <code>hexo new 'title'</code></li><li>Render with <code>hexo g</code>, preview on localhost:4000</li><li>Publish with <code>hexo d</code>, wait a few minutes, then see it on Github Pages</li></ol><p>Hexo was decent, but not perfect. I typically wrote in local VS Code, then switched to the browser for preview. Each time, I had to manually add images and insert image URLs. It felt like a lot of effort, possibly just to write an article that could be finished in 10 minutes.</p><p>Later, I wanted to create a “short thoughts” page for brief, public musings: <strong>a place without likes/comments/shares, where I could record fragmented, less-filtered thoughts from daily life. Without your feedback, I could write freely what I think.</strong> But implementing this in Hexo was challenging: blogs are article lists, while short thoughts are text cards. After much struggle, I spent two days implementing this feature.</p><p>I realized several friends had similar needs: writing something on their phone and publishing it to their own blog. I wondered: could there be a truly simple, foolproof platform for quickly publishing blogs and thoughts?</p><p>The biggest challenge wasn’t technical, but trust. 
Users would worry about data loss if the platform shut down. I considered many solutions: self-hosted databases, daily auto-exports. But they were all too cumbersome, and the hassle would only discourage writing.</p><p>Last week, it hit me: Github itself seemed capable of this job. For developers, it’s a platform unlikely to disappear, and it allows users to store large amounts of content for free. What if we kept each user’s blog and thoughts in their Github account? Wouldn’t that ensure the data would never be lost?</p><p>Thus, <a href="https://tinymind.me">Tinymind</a> was born. After logging in with Github and authorizing, it creates a “tinymind-blog” repository in your account. Each blog post or thought you write becomes a commit to this repo. It’s serverless and <a href="https://github.com/mazzzystar/tinymind">open-source</a>.</p><p>I launched this site last week, coding until 3 AM every night to implement 100% of the most desired <a href="https://github.com/mazzzystar/tinymind?tab=readme-ov-file#features-in-development">features</a>, including public homepages, drag-and-drop image uploads, and blog editing. In less than a week, over 400 people have created their own public blogs. You can see this by searching for “tinymind-blog” repositories on Github. Many of these users don’t know how to code at all.</p><p>Will I migrate my current blog? No, but I’ll use it to write drafts and preview, then finalize publishing on <a href="http://mazzzystar.com">mazzzystar.com</a>. I’ll also record certain rough, raw ideas on Tinymind, then compile them on my <a href="https://mazzzystar.com/thoughts/">Thoughts page</a>. When things become simple, people do them more often.</p><p>Why must the content be public? Because I don’t want Tinymind to request access to your private repositories, which means it can only create publicly visible repositories. 
At the same time, your blog data is stored in your own Github account, not hosted on Tinymind.</p><p>I spent many late nights perfecting Tinymind to meet my standards. Now it’s complete, and I’m moving on to the next project, but its life is just beginning. I hope it allows many non-programmers to have their own blogs, and enables those who, like me, once struggled with blogging but ultimately couldn’t persist, to keep writing longer due to its simplicity.</p><p><a href="https://github.com/mazzzystar/tinymind">Github</a> | <a href="https://tinymind.me">Tinymind</a></p>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;p&gt;In 2016, I registered a 10-year domain and started blogging. My focus was machine learning: backpropagation derivations, RNN/LSTM</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>Evaluating LLM Reasoning with 20,000 Real-User Guesses from an AI Turtle Soup Game</title>
    <link href="https://mazzzystar.com/2024/08/09/turtle-benchmark-zh/"/>
    <id>https://mazzzystar.com/2024/08/09/turtle-benchmark-zh/</id>
    <published>2024-08-09T10:20:04.000Z</published>
    <updated>2026-04-10T13:45:08.741Z</updated>
    
    <content type="html"><![CDATA[<blockquote><p>GPT-4o, Kimi-Chat, DeepSeek, Qwen2-72b, LLama3.1: which one is the champion of a real reasoning game?</p></blockquote><h2 id="海龟汤">Turtle Soup</h2><p>I first encountered the <strong>Turtle Soup game</strong> in my middle-school English class. During a break, the teacher suddenly asked us:</p><p><em>A man walks into a restaurant and orders a bowl of turtle soup. After finishing it, he asks the waiter: “Is this really turtle soup?” The waiter says yes, and the man shoots himself. Why?</em></p><p>The rules: <strong>you may ask questions or make guesses, and the teacher may only answer yes, no, or irrelevant to the story</strong>. For example, you can ask <em>Has the man ever lived through a disaster?</em> but not <em>How old is the man?</em> We guessed for many rounds; when the bell rang, the teacher revealed the answer:</p><p><em>He and his wife were shipwrecked on their honeymoon and stranded on a desert island. With no food, his wife starved to death; his companions cooked her flesh into a soup for him and told him it was turtle soup. He was later rescued by a passing ship. Today, tasting real turtle soup, he realized that what he had eaten back then was his wife’s flesh, and in remorse he shot himself.</em></p><p>In Turtle Soup, what players see is the <em>surface</em> (汤面), while the hidden truth of the story is called the <em>bottom</em> (汤底). The game needs at least 2 people: one is the <em>judge</em>, who knows the <em>bottom</em> and rules on each guess with <em>yes/no/irrelevant</em>.</p><p>I wondered whether I could build an AI Turtle Soup game: <strong>give the surface and the bottom to a large model and let it judge players’ guesses</strong>. Last December I made a <a href="https://chatgpt.com/g/g-rJFb4fkqA-albatross-soup">GPT</a> that could generate new stories, draw illustrations with DALLE, and judge players’ questions. But I soon found that AI-generated turtle soup is bland and no fun to play. Moreover, the magic of turtle soup often lies in how gory or extreme the stories are, which frequently got a game cut off midway for violating OpenAI’s moderation policy.</p><p>In June this year I finally realized that <em>the AI does not have to generate the stories; it only needs to act as the judge</em>. I started collecting stories online and, alone in my living room that night, read through 1,500 turtle soups, gory and frightening enough to send chills down my back on a midsummer night. In the end I selected 32 stories that were relatively tame, ethically acceptable, and logical, and started coding.</p><p>Building on the code of <a href="https://www.pingti.app/">最佳平替</a>, I finished development quickly: <strong>you start a random story and get 8 guesses; once you guess correctly or run out of guesses, the answer is revealed</strong>. I named the product <a href="https://www.tanghenre.com/">汤很热</a> (“The Soup Is Hot”). For immersion, I gave every story an illustration and ambient sound.</p><p><img src="/images/2024-08-03/tanghenre.jpg" alt="汤很热: a single-player Turtle Soup game"></p><p>However, the AI Turtle Soup game is not the focus of this post.</p>
<h2 id="大模型比人类笨多了">LLMs Are Far Dumber Than Humans</h2><p>I noticed many users complaining that <strong>the AI is a poor judge</strong>. For example:</p><p><img src="/images/2024-08-03/ai-not-good-for-judge.jpg" alt="Many players reported unreasonable AI rulings"></p><p>At first I thought the model I was using (DeepSeek) was the problem, until I switched to Claude 3.5 Sonnet, the best-regarded model at the time, and found that many misjudgments still could not be avoided. For example:</p><div class="highlight"><pre class="code"><code>&#123;
    <span class="hljs-attr">&quot;story&quot;</span>: <span class="hljs-string">&quot;Little Red Dress&quot;</span>,
    <span class="hljs-attr">&quot;surface&quot;</span>: <span class="hljs-string">&quot;My sister picked out a little red dress for me. I wore it to school, and that evening I came home to find a corpse.&quot;</span>,
    <span class="hljs-attr">&quot;bottom&quot;</span>: <span class="hljs-string">&quot;My mother was having an affair with my teacher, and they always met while my father was away.
    The signal to the teacher was my little red dress: whenever I wore it to school, my father would be out that night.
    That day my mother was busy, so my sister picked out a little red dress for me. The teacher saw it, assumed my father
    was out, and came to our house to see my mother. My father caught him and killed him.&quot;</span>
&#125;,
User question: <span class="hljs-string">&quot;If I hadn't worn the little red dress, would nobody have died?&quot;</span>, almost every model answered <span class="hljs-string">&quot;No/Irrelevant&quot;</span></code></pre></div><p>Another example:</p><div class="highlight"><pre class="code"><code>&#123;
    <span class="hljs-attr">&quot;story&quot;</span>: <span class="hljs-string">&quot;Mountaintop&quot;</span>,
    <span class="hljs-attr">&quot;surface&quot;</span>: <span class="hljs-string">&quot;A man lives in a cabin on a mountaintop. In the middle of the night he hears knocking, but when he opens
    the door nobody is there, so he goes back to sleep. The next day a corpse is found at the foot of the mountain. What happened?&quot;</span>,
    <span class="hljs-attr">&quot;bottom&quot;</span>: <span class="hljs-string">&quot;The cabin's door opens onto a cliff. Someone had struggled up the cliff and knocked to ask for help.
    When the door opened, it pushed him back over the edge, and he fell from the mountaintop to his death.&quot;</span>
&#125;,
User question: <span class="hljs-string">&quot;The door opens outward&quot;</span>, almost every model answered <span class="hljs-string">&quot;No/Irrelevant&quot;</span></code></pre></div><p>I realized that <strong>the Turtle Soup game might be very well suited to evaluating the reasoning ability of large language models (LLMs) in real-world scenarios</strong>.</p>
<h2 id="真实环境下的-llm 推理能力">LLM Reasoning in Real-World Settings</h2><p>Today LLMs are widely used in games, customer service, and many other scenarios that interact directly with users. These scenarios share the following traits:</p><ol><li>Users’ questions are wildly varied and unpredictable, yet the AI must respond logically.</li><li>Given a context, the AI must give users a definite <strong>right</strong> or <strong>wrong</strong>. For example: <em>given a product’s production date and shelf life, a user asks on August 9, 2024, in which year of the 2020s it expires.</em></li><li>Some games trigger the next plot step when the user reaches a certain level or discovers a key clue, so judging whether the user has really discovered the truth is especially important.</li></ol><p>Compared with existing academic benchmarks, models face far more complex situations when interacting with real people in real environments. Yet it is precisely in such scenarios that evaluating a model’s performance has greater practical value.</p>
<h2 id="现有评估指标出了什么问题">What Is Wrong with Existing Benchmarks</h2><p>If you follow LLM leaderboards (such as <a href="https://arena.lmsys.org/">LMSYS</a>), you will be familiar with benchmarks such as MMLU and MT-Bench. Here is a brief explanation of how they evaluate:</p><h4 id="mmlu">MMLU</h4><p>MMLU is a well-known LLM benchmark containing more than 15,000 multiple-choice questions across 57 subjects, including physics, astronomy, computer science, biology, and clinical medicine, but it contains <strong>a large number of rote-memorization</strong> questions. For example:</p><div class="highlight"><pre class="code"><code>Which of the following is a remote Trojan<span class="hljs-operator">?</span>
<span class="hljs-variable">A</span><span class="hljs-operator">:</span> Memory leak <span class="hljs-variable">B</span><span class="hljs-operator">:</span> Buffer overflow <span class="hljs-built_in">C</span><span class="hljs-operator">:</span> Low processing power <span class="hljs-built_in">D</span><span class="hljs-operator">:</span> Inefficient programming</code></pre></div><p>Such basic knowledge certainly matters, but overemphasizing <strong>background knowledge</strong> keeps MMLU from measuring a model’s real <strong>language understanding</strong> and <strong>logical extrapolation</strong>: if a child cannot compute the area of a curvilinear triangle because they have never learned calculus, would we call them stupid?</p><h4 id="mt-bench">MT-Bench</h4><p>MT-Bench is a multi-turn question dataset: the model under evaluation answers a preset question and then a follow-up question in the next turn. Because the dialogue is open-ended, there is no definitive reference answer, and the quality of the model’s answers is judged by GPT-4.</p><p>As a result, MT-Bench <strong>cannot evaluate models stronger than GPT-4</strong>, and GPT-4 as the “judge” may <strong>be biased</strong>, scoring some models’ outputs lower while favoring answers from ChatGPT.</p><h4 id="chatbot-arena">Chatbot Arena</h4><p>Because of all these problems with the benchmarks above, <a href="https://lmsys.org/">LMSYS</a> ultimately chose the simplest, bluntest approach: an arena.</p><blockquote><p>A real user starts a chat, the system randomly picks 2 models to answer, and the user votes for the model they prefer. In the end, this produces an overall rating for every model.</p></blockquote><p>This is currently the most credible method, but its drawbacks are obvious: <strong>a new model must be publicly tested for a long time and gather a large amount of feedback before its score is trustworthy</strong>. Moreover, the score reflects overall ability and cannot evaluate a single subfield (code/math) on its own.</p>
<h2 id="海龟-benchmark">Turtle Benchmark</h2><p>So I built a new LLM benchmark: <strong>Turtle Benchmark</strong>.</p><blockquote><p>Collect the guesses users enter while playing the AI Turtle Soup game, label each one by hand (correct, incorrect, irrelevant), then use this dataset to measure how accurately an LLM’s judgments match the ground truth.</p></blockquote><p>I found that Turtle Benchmark neatly avoids the problems of existing benchmarks:</p><ol><li><strong>No extra background knowledge is required.</strong><br>Different LLMs are trained on different corpora, which makes some evaluations hard to keep fair. A Turtle Soup story, however, contains almost all the information needed for reasoning: once it knows the surface and the bottom, the model can make a judgment, which confines the evaluation to the model’s <strong>reasoning ability</strong>.</li><li><strong>Results are objective and do not depend on human preference.</strong><br>For example, in the story “Mountaintop” above, the cabin sits at the edge of a cliff, and when the owner opened the door in the night it knocked the climber off the mountain to his death. So the guess <em>The door opens outward</em> is correct, and that correctness is objective, independent of anyone’s feelings.</li><li><strong>Results are clear-cut and easy to quantify.</strong><br>In many benchmarks the model’s output is a passage of text, which makes performance hard to quantify. A Turtle Soup guess has only three possible outcomes: <strong>correct, incorrect, irrelevant</strong>. Once the test set is accurately labeled, anyone can use it to test any model they like and get a numeric result.</li><li><strong>An ordinary person who knows the bottom can answer 100% correctly.</strong><br>This keeps manual labeling from being too complicated. It also shows that today’s LLMs still lag far behind humans in intelligence.</li><li><strong>The data keeps updating and cannot be gamed.</strong><br>Some vendors add existing benchmark datasets directly to training to inflate scores, but that does not work with Turtle Benchmark: what the model evaluates is <strong>users’ guesses</strong>, not the stories themselves. Players keep producing new guesses, and human imagination is wild enough that the guesses can hardly be exhausted.</li></ol><p>For example, the story “Little Red Dress” above drew all sorts of user guesses:</p><div class="highlight"><pre class="code"><code>User guess  Verdict
The red dress is related to a curse  ❌
The red dress is my sister's plot  ❌
I did not actually go to school  ❌
Someone else came to our house  ✅
The red dress is a distress signal  ❌
The dead person is the one wearing the little red dress  ❌
The dress is red because it was stained with blood  ❌
The corpse is my father  ❌
Little red dresses are not allowed at school  ❌
I am the killer  ❌
My father killed someone  ✅
Wearing the little red dress made others mistake me for someone else  ❌
The victim knew my mother  ✅
The victim had a grudge against my family  ❌</code></pre></div><p>So although Turtle Soup stories themselves can be rather absurd, having an AI make reasonable inferences from a story’s content can be fairly objective.</p><p>This is a bit like <a href="https://huggingface.co/datasets/m-a-p/COIG-CQIA/viewer/ruozhiba/train">弱智吧</a> (Ruozhiba): a set of 200-plus questions scraped from Baidu’s Ruozhiba forum (for example: if everyone works to make money, then who is losing money?). These oddball questions nevertheless significantly improved AI’s logical reasoning.</p>
<h2 id="海龟数据集">Turtle Dataset</h2><p>The AI Turtle Soup game has 32 stories. In the 2 weeks after launch, more than 4,000 users made 26,000 guesses. I parsed the results out of the logs and began cleaning the data, which included:</p><ul><li>Removing duplicate questions; for example, <em>Is the turtle soup poisoned?</em> and <em>Was the soup he drank poisoned?</em> are essentially the same question.</li><li>Removing questions that cannot be answered with <em>yes/no/irrelevant</em>, for example <em>How old is the man?</em></li><li>Removing ambiguous questions, for example <em>Did he do something to the best friend?</em> In the story “Best Friend” (《闺蜜》), the husband has an affair with the best friend but never takes any concrete action against her, so this question is hard to answer accurately.</li></ul><p>Then I started manual labeling, which took 2 weeks; in the end we obtained 4,448 clean records from the 26,000. During labeling we found that the <strong>incorrect</strong> and <strong>irrelevant</strong> labels are hard to tell apart in some cases. For example, in the story “Turtle Soup”, for the guess <em>The turtle was raised by the man</em>, both <strong>incorrect</strong> and <strong>irrelevant</strong> seem acceptable. So we finally decided to merge the two categories, leaving 2 labels: <strong>correct</strong> and <strong>incorrect/irrelevant</strong>.</p><p>* <em>Merging the two classes makes the task easier and lets some models bluff their way through. We may relabel later, splitting them back into three classes, and publish the results.</em></p><p>After labeling, I started running model tests. I picked 11 models I was interested in:</p><ul><li>Qwen2 72B (通义千问)</li><li>Kimi-Chat (月之暗面, Moonshot AI)</li><li>Deepseek</li><li>豆包 (Doubao)</li><li>Claude 3.5 Sonnet</li><li>Minimax abab6.5s</li><li>LLama3.1 405B</li><li>LLama3.1 
70B</li><li>GPT-3.5</li><li>GPT-4o-mini</li><li>GPT-4o</li></ul>
<p>I ran every model on the 4,448 records, <strong>filtered out the easy questions that every model answered correctly</strong>, and did a second round of labeling review on the remaining 1,699 hard questions. That left us with 1,537 labeled records whose accuracy is close to 100%.</p>
<p>I measured each model’s output accuracy with two prompts: one with no examples (zero-shot) and one with 2 examples (2-shot).</p>
<h2 id="评测结果">Evaluation Results</h2>
<p>The final accuracy ranking of the models is as follows:<br><img src="/images/2024-08-03/Turtle-Benchmark-result.png" alt=""></p>
<p>As you can see, most models improved slightly once examples were added.</p>
<p>I was worried about one scenario: <em>a model does extremely poorly on one story, and that story happens to have a very large number of test samples, skewing the overall average accuracy</em>. To rule this out, I computed accuracy at story granularity: calculate the model’s accuracy on each of the 32 stories separately, sum those accuracies, and divide by 32. I found that, apart from <em>Tongyi Qianwen</em> and GPT-4o, the ranking above is essentially unchanged.</p>
<p><img src="/images/2024-08-03/Turtle-Benchmark-over-32stories.png" alt=""></p>
<p>Plotting the 2-shot results <em>with each model’s overall accuracy on the x-axis and its average per-story accuracy on the y-axis</em> gives the chart below:</p>
<p><img src="/images/2024-08-03/average_model_accuracy_over_stories_2-shot.png" alt="The x-axis is overall accuracy; the y-axis is average per-story accuracy."></p>
<p>* <em>To make the differences among the other models easier to see, I left GPT-3.5, which performed too poorly (&lt;0.51), off the chart.</em></p>
<p>The chart also gives an intuitive feel for how the models perform and how far apart they are:</p>
<ul><li>Claude 3.5 Sonnet is the undisputed <strong>first tier</strong>, far ahead of every other model.</li>
<li>GPT-4o, Tongyi Qianwen, Kimi-Chat, LLama3.1 405B and Minimax form the <strong>second tier</strong>. I have tried to avoid finer divisions, but these models’ ability declines in the order listed, and the drop is visible to the naked eye.</li>
<li>Doubao, DeepSeek and LLama3.1 70B form the <strong>third tier</strong>.</li>
<li>GPT-4o-mini is the <strong>fourth tier</strong>.</li>
<li>GPT-3.5 should have been retired long ago.</li></ul>
<p>This evaluation only covers <strong>the models’ Chinese comprehension and reasoning ability</strong>. If I have the budget and energy later, I will consider translating all the stories and test questions into English and rerunning the tests with English prompts, to eliminate any performance drop caused by language.</p>
<h2 id="测试你关心的模型">Test the Model You Care About</h2>
<p>The models above may not include the one you care about. Also, in case my prompt-writing skills or my parameter and temperature settings made the results inaccurate, I have open-sourced the complete <strong>labeled data, prompts, and evaluation code</strong>, along with our <strong>test logs</strong>:</p>
<p><a href="https://github.com/mazzzystar/TurtleBench">https://github.com/mazzzystar/TurtleBench</a></p>
<p>You can test any model you are interested in. If you have results to share or run into problems, feel free to open an <a href="https://github.com/mazzzystar/TurtleBench/issues">issue</a>.</p>
<h2 id="感谢">Acknowledgments</h2>
<p>Steven of 5Y Capital personally sponsored this evaluation, which let me test this much data across 11 models. Our intern Jerry labeled the 26,000 records with me; thank you for the hard work.</p>
<p>If you are interested in model evaluation, you can contact Steven to discuss further: <a href="mailto:stevenshi@5ycap.com">stevenshi@5ycap.com</a></p>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;blockquote&gt;
&lt;p&gt;GPT-4o, Kimi-Chat, DeepSeek, Qwen2-72b, LLama3.1&lt;span class=&quot;bd-box&quot;&gt;&lt;h-char class=&quot;bd</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>Two Years of an AI Photo Album Search App</title>
    <link href="https://mazzzystar.com/2024/07/21/Two-Years-of-an-AI-Photo-Album-Search-App-zh/"/>
    <id>https://mazzzystar.com/2024/07/21/Two-Years-of-an-AI-Photo-Album-Search-App-zh/</id>
    <published>2024-07-21T15:14:16.000Z</published>
    <updated>2026-04-10T13:45:08.740Z</updated>
    
    <content type="html"><![CDATA[<blockquote><p>Birth, going viral, open-sourcing, plagiarism, silence: the drift of one product.</p></blockquote>
<p>A very, very, very long ramble.</p>
<h2 id="起源">Origins</h2>
<p>The story begins on a weekend in May 2022. I was sitting in a bookstore in Beijing’s Changping District, debugging the Disco Diffusion model. The age of AI art was just dawning and SD (Stable Diffusion) had not yet been released. Generating a single image took 5 minutes on a capable V100 GPU.</p>
<p>I <a href="https://github.com/mazzzystar/disco-diffusion-wrapper">wrapped</a> the open-source code into an interface that loads the model only once and exposes a handful of parameters, rented a GPU for 2,000 per month, and <a href="https://m.okjike.com/originalPosts/626e7e3833f5cae7ce1beaf2">posted</a> on WeChat Moments and social media: anyone who wanted to play with AI art could send me a sentence describing an image, and I would run it on my machine and send the picture back.</p>
<p><img src="/images/2024-07-21/IMG_0792.jpg" alt="People online had a great time"></p>
<p>A <a href="https://greatdk.com/">friend</a> said: why not build a website so people can play with it themselves? That became <a href="https://mp.weixin.qq.com/s/5ga0880sHLZAKDBFMQKlBg">6pen</a>. We reached 1 million users in a very short time, then gradually faded away. This “startup” was not much of a success, and I believe a big factor was that <strong>I never trained a differentiated model</strong>, but I will not go into that here. As mentioned, AI art before SD was extremely slow, so I spent most of my time speeding up the model, and one of the things I optimized was the CLIP model.</p>
<p><a href="https://openai.com/blog/clip">CLIP</a> is a model OpenAI released in 2021 that can <strong>measure the similarity between any image and any piece of text</strong>. In Disco Diffusion, the model uses CLIP to compute the loss between the generated image and the user’s <code>prompt</code>, and draws by repeatedly minimizing that loss (<a href="https://mazzzystar.com/2022/05/04/how-ai-art-works-1/">how it works in detail</a>). The smaller the CLIP model, the faster the drawing but the poorer the detail, and at the time I was tuning to balance the two.</p>
<p>Sitting in that bookstore, an idea suddenly barged into my head: <em>if it can compare images with text, could it be used to search photos?</em> A quick search showed that <a href="https://github.com/kingyiusuen/clip-image-search">someone</a> had already written a CLIP image search tool. It works by uploading images to a server, extracting their features in one batch, taking an English query, computing the similarity between the text and every image, and sorting by that similarity.</p>
<p><img src="/images/2024-07-21/clip-img-search.jpg" alt="https://github.com/kingyiusuen/clip-image-search"><br>I tried uploading my iPhone photo library to a server, and after some experimenting I found the results astonishingly good! This was especially true for intangible things. For example, when I typed &quot;lonely&quot;, the top three photos it returned were these:</p>
<p><img src="/images/2024-07-21/search-loney-in-CLIP.jpg" alt="Results returned from the photo library for 'lonely'"></p>
<p>Putting photos on a server is not a good idea, though. The place where I keep the most photos is my iPhone’s local library. <strong>Could I build a CLIP photo search app that runs entirely on-device?</strong> I loved the idea and brought it up with friends several times, but each time it went nowhere.</p>
<p>I had very little confidence:</p>
<ul><li>I knew nothing about iOS development.</li>
<li>Apple’s underlying stack might not support the operators CLIP needs.</li>
<li>Even if it ran, the product would be pointless if indexing was as slow as 1 second per photo or a single search took 10 minutes.</li></ul>
<p><strong>On-device models</strong> only started getting wide attention in 2024. Back in 2022 I could not find a single language model running on-device in the App Store, so it probably… wasn’t feasible? Eventually I forgot about it and went back to building 6pen.</p>
<h2 id="回响">Echoes</h2>
<p>The turning point came in early December 2022. Because of some unexpected changes, I suddenly found myself in an unfamiliar country (🇰🇷) whose language I did not speak at all, and my time with 6pen came to an end. So I would sit in an empty café all day with a laptop and an iced latte. K-pop played in the background, thick snow lay outside the window, and when I got hungry at noon I ate one of the café’s sandwiches. That was how I spent each day.</p>
<p>The internet here was blazing fast, there were no COVID tests, and the conversations around me turned into white noise because I could not understand them. It felt as if I were suddenly living in a vacuum. That unfamiliar sense of vacuum excited me: like an exiled fugitive, who I was and what my past had been no longer mattered, and here I could learn anything from scratch and get it done. It was time to build a product that truly excited me, and the idea took hold of me again.</p>
<p>This time I was no longer afraid to test whether it was feasible. I learned to write a <code>tokenizer</code> in Swift, worked out how to compute features and store them locally, learned to use multiple cores to speed up indexing, and compared different similarity ranking algorithms. Along the way I asked plenty of dumb questions on StackOverflow and hit many moments of frustration. But one picture kept coming to mind: I type &quot;<code>coffee and laptop</code>&quot; on my phone, tap search, and after a spinner this photo jumps out of a 30,000-photo library and appears before my eyes.</p>
<p><img src="/images/2024-07-21/IMG_5091.jpg" alt="The café on a weekday, almost empty."></p>
<p>That fantasy kept me working for 2 weeks, forgetting to eat and sleep. I mean that literally: several times I skipped lunch and nursed a single latte until evening, by which point my stomach hurt from hunger and I felt weak all over. One timing detail matters here: ChatGPT had just been released. But I was so deep in development that I ignored it completely. Those were probably some of the last questions I will ever ask on StackOverflow.</p>
<p>In any case, on December 27 I finally had a complete product. I split the model’s <em>text model</em> and <em>image model</em> into two independent models that load separately:</p>
<blockquote><p>When building the index for the photo library, only the image model is loaded; it computes the index and saves it. When searching, only the text model is loaded; it computes the cosine similarity between the query and each saved index entry, then returns the top-K most similar photos.</p></blockquote>
<p>Loading the models at different times effectively cuts the app’s memory footprint and also speeds up index building. Computing and saving every photo’s features during <em>indexing</em> means that <em>searching</em> is only a feature comparison that never touches the images. Both steps are further accelerated with multi-core parallelism. With all of these optimizations, my iPhone 12 mini can index photos at 2,000 per minute, and a search across 10,000 photos takes less than 1 s.</p>
<p><img src="/images/2022-12-28/Queryable-flow-chart.jpg" alt="How it works"></p>
<p>This showed that <strong>the operators are supported and the speed is usable</strong>. I could finally breathe easy.</p>
<h2 id="定价">Pricing</h2>
<p>Being a little pricing genius, my plan was:</p>
<blockquote><p>Users can download the app for free, build the index, and search as much as they like. Later, when they have new photos and want to update the index, they have to pay.</p></blockquote>
<p>The beauty of this strategy: <em>only people who actually use and like the product need to pay</em>. Those who are just curious, or who try it and find it is not what they expected, need not pay and will not, which avoids users wasting money and leaving angry bad reviews.</p>
<p>But while debugging I soon discovered a fact that was reasonable yet funny: <em>in-app purchases require a network connection</em>. It was a bolt from the blue, because I had resolved from the very start: <code>the app must never pop up a network request under any circumstances</code>. Why? Because this is a photo search app that scans your entire library, and once it goes online nobody can know whether it is uploading users’ photos to a server somewhere on Earth. I knew I could explain in the product <code>why it pops up a network permission request</code>, but I did not want to end up in the awkward position of having to prove my innocence.</p>
<p>So I had to make it a paid app: users must buy it up front, and afterwards, from opening the app through indexing to finishing a search, the only prompt they see is a single <code>photo library permission request</code>. I knew this was a foolish choice, and later lessons confirmed that paid downloads bring a flood of bad reviews. The model is so compute-hungry that indexing crashes or stutters on many low-memory devices, and on the iPhone X series faulty operator support turned every search result black. All of this got the app called a ripoff. What is more, users do not care at all that the app <em>never pops up a network request</em>. As soon as one of these problems appears, they delete the app, leave a bad review, and demand to know where I secretly uploaded their photos.</p>
<p>In the end I priced it at US$3.99: buy once, use forever.</p>
<h2 id="推向市场">Going to Market</h2>
<p>I named the product <code>Queryable</code>, meaning <em>able to be queried</em>, and posted a rather grandiose update: <em>I think this app might change the world</em>. I was brimming with confidence and even wrote an email to Tim Cook hoping Apple would acquire the product (lol). By then I had started using ChatGPT, and perhaps because I was too excited, I forgot to fill in my own name. Only after hitting send did I see that the email began: <code>&quot;Dear Mr. 
Cook, My name is [Your Name]&quot;</code>.</p><p>Of course, I never got a reply.</p><p>Next, I ambitiously set about writing an <a href="https://mazzzystar.com/2022/12/29/Run-CLIP-on-iPhone-to-Search-Photos/">article</a> introducing the product, weighing every word and trying to shape it into what I imagined a good Hacker News post should read like. On December 29, the very day the app passed App Store review, I submitted the link to Hacker News, only to be told that <em>my account was too new to submit</em>. I emailed them about the problem and asked a friend to post it from his account instead.</p><p>The post sank quickly, and the email went unanswered.</p><p>I was upset, but many users were asking for Chinese input support, so there was no time to mope and I threw myself into training a Chinese model. Because I had already separated CLIP’s <em>text model</em> from its <em>image model</em>, all I needed was a Chinese-English parallel corpus: train a Chinese <em>text model</em> and align its outputs with those of the English text model. The process is essentially distillation.</p><p>Soon after, on January 18, 2023, the Chinese version was ready. I named it <em>寻隐</em> (Xunyin), after Jia Dao’s classical poem <em>寻隐者不遇</em> (<em>Seeking the Hermit and Not Finding Him</em>); the name also hints at <em>discovering hidden meaning in your photo library</em>. After all, <strong>what first stunned me was searching for <code>lonely</code> and realizing that those few photos had been about loneliness all along.</strong></p><p>After launch, I wrote a Chinese article introducing the product on SSPAI (少数派). Unlike the <a href="https://mazzzystar.com/2022/12/29/Run-CLIP-on-iPhone-to-Search-Photos/">English version</a>, it covered not only the technical approach but also the full personal journey behind it. To my surprise, it made that day’s front page, bringing a wave of downloads and $1,500 in revenue.</p><p><img src="/images/2024-07-21/IMG_0797.jpg" alt="A friend telling me I had made the SSPAI front page"></p><p>In early February, I received a reply from a Hacker News moderator. He said the system had mistakenly flagged my account as spam, encouraged me to post again, and said the article suited the community well (“the article is definitely fine HN material”). He put my new submission in the second-chance pool: stories in the pool are randomly placed on the lower part of the front page; if users upvote them they climb, otherwise they sink again.</p><p><img src="/images/2024-07-21/hn-email.jpg" alt="Hacker News moderator Daniel placing my link under /pool"></p><p>I found the <a href="https://news.ycombinator.com/item?id=11662380">pool</a> mechanism fascinating. The community seems to want to preserve the hacker spirit even within a decentralized mechanism:</p><blockquote><p>This is our long-running experiment in story re-upping. Moderators and a small number of reviewer users comb the depths of /newest looking for stories that got overlooked but which the community might find interesting. 
Those go into a second-chance pool from which stories are randomly selected and lobbed onto the bottom part of the front page. This guarantees them a few minutes of attention. If they don’t interest the community they soon fall off, but if they do, they get upvoted and stay on the front page.</p><p><a href="https://news.ycombinator.com/item?id=11662380">https://news.ycombinator.com/item?id=11662380</a></p></blockquote><p>At 7 p.m. that evening, my <a href="https://news.ycombinator.com/item?id=34686947">post</a> shot up to #2 on the Hacker News front page.<br><img src="/images/2024-07-21/queryable-on-hacker-news.jpg" alt="The article at #2 on the Hacker News front page"></p><p>That night I clutched my phone and refreshed every 10 seconds. The excitement lasted from when I lay down at midnight until 3 a.m., and the whole time I was replying to the discussion under the post and to emails reporting bugs. Someone taught me to use LSH (locality-sensitive hashing) to speed up search, someone explained how to map a photo’s latitude and longitude to a city without a network connection, and others discussed why the app failed on the iPhone X.</p><p>The feeling seemed to have nothing to do with download counts or revenue: you made something, and a whole crowd of peers praised it, discussed it excitedly, and offered ideas for it. That is a rare experience in life, and once is enough to feel content.</p><p>One comment in particular caught my attention:<br><img src="/images/2024-07-21/same-idea-product-zh.jpg" alt="Somewhere else on Earth, someone was doing exactly the same thing as me"></p><p>I read the author’s development log and found that we were like two random people on Earth whose brains had arrived at the same idea. I even tried his unreleased product on TestFlight, and felt an inexplicable sense of kinship.</p><h2 id="hacker-news 是世界的公告牌">Hacker News Is the World’s Bulletin Board</h2><p>I fell asleep that night thrilled that my peers liked my product, but it turned out I had vastly underestimated Hacker News’s reach. Within two days, Queryable had swept to #1 on the Utilities chart in nearly every European country and #2 in the US (behind only Shadowrocket), with total revenue of $2,800. For the next few days, the first thing I did on waking was check Gmail. Messages came in from Germany, France, Spain, the US, and elsewhere: bug reports, a German magazine that wanted to cover the app, a French iOS community, YouTubers reviewing it. My Twitter notifications kept going off too, as people shared the Hacker News front page.</p><p><img src="/images/2024-07-21/queryable-us-fr-de-es-ranking.jpg" alt="#2 in the US, #1 in France, Germany, and Spain; me excitedly taking screenshots"></p><p>A friend who works on the Apple Photos team even told me that their team knew about Queryable.</p><p>All of this forced me to recognize a certain <strong>global reach</strong>: Hacker News does not belong to the US alone. <strong>It is like a virtual bulletin board at the center of the world. Each work stays on it only briefly, but countless eyes from every country are always watching.</strong> The product only supported English, yet that did not stop it from reaching #1 on the Paid Utilities chart in nearly every European country. Europeans seem to accept an English-only product as a matter of course, and what works in the US tends to work in Europe too.</p><p><img src="/images/2024-07-21/IMG_0818.PNG" alt="Not aimed at Europe, yet a bestseller there too"></p><p>The buzz faded quickly, and a wave of hostile reviews followed.</p><p>Put simply, the Hacker News audience is exceptionally high quality; I did not see a single comment there that made me uncomfortable. But once the app spread further through various websites, YouTube, and other communities, less friendly users surfaced. Their attacks centered on two points:</p><ol><li>Fear that I would steal their private photos.</li><li>I am a Chinese developer.</li></ol><p>In their eyes, point 2 made point 1 worse. Once the successive afterglows of Hacker News wore off, sales in Europe and the US quickly turned dismal: single-digit daily downloads and a few dozen yuan in revenue.</p><p>In my imagination, things that change the world leave everything else in the dust. How could it just stop? I sank into deep doubt and pessimism.</p><h2 id="免费-&amp; 开源">Free &amp; Open Source</h2><p>Fortunately, WeChat official accounts in China such as <a href="https://mp.weixin.qq.com/s/SWFuMXzzsNXw5C8Isq60AA">差评 (Chaping)</a> and <a href="https://mp.weixin.qq.com/s/E7K5nkgr-3bjqwbRz36Opg">果壳 (Guokr)</a> gave 寻隐 unsolicited, organic coverage. Starting in January, that brought in around 10,000 yuan a month, and because the model runs on the phone, there were no server costs.</p><p>From April onward, with no new exposure and no updates, it averaged roughly 3,000 yuan a month.</p><p>But! I still believed this was a genuinely useful product; I just had no way of getting it in front of many people. I once ran a limited-time free promotion, and that day’s downloads exceeded 100 times the previous daily average. I figured that holding on to 3,000 yuan a month might be keeping 99% of people from discovering the product, and that it would be better to let everyone use it. So I decided to make it free for good.</p><p>Once it was free, the source code no longer seemed like much of a secret, and I began considering whether to release it. Finally, on July 10, 2023, I announced in a <a href="https://v2ex.com/t/955496">post</a> on V2EX that the app would be free &amp; open source. Many people find the words <em>open source</em> impressive for no particular reason, but my motives were simple:</p><ol><li>I had received a large number of emails from developers in China and abroad asking about the technical details of Queryable/寻隐. I couldn’t be bothered to explain each time, and open-sourcing lets them learn directly from the code how the model is exported and loaded, how computation is accelerated, and how results are stored and sorted.</li><li>Open-sourcing dispels many people’s concerns about photo privacy.</li><li>I am not good at Swift development, and I felt the part of the product I most wanted to build was already done, so I had been resisting updates. Yet I kept getting emails from users asking for multi-select deletion, swiping left and right between photos, Mac/NAS/Android support, and so on. I hoped to draw on the community so that capable developers could polish it into a better product.</li></ol><p>After open-sourcing, people did build an Android version (<a href="https://github.com/mazzzystar/Queryable/issues/12">#12</a>) and a <a href="https://www.engineerdraft.com/en/searchable/">Mac version</a> inspired by it.</p><p>The <a href="https://github.com/mazzzystar/Queryable">project</a> also made GitHub Trending, which got me a free year of GitHub Copilot. Happy.</p><h2 id="有关抄袭">On Plagiarism</h2><p>Before discussing plagiarism, I want to define it:</p><blockquote><p>Plagiarism is the direct or indirect use of another person’s work, ideas, or content, without authorization or proper credit, and presenting it as one’s own work or idea.</p></blockquote><p>Plagiarism actually appeared before I open-sourced. Someone built an Android version and published it on Google Play, copying Queryable’s name and product description wholesale. I was furious, but pulling this off requires command of both machine learning and iOS development, so before open-sourcing I ran into only this one case.</p><p>After open-sourcing, though, copycats and reskins multiplied. Because the project is MIT-licensed, even when someone reskinned it, swapped the icon, and republished it on the App Store, I was a bit speechless but said nothing. This was the most common form I saw, and the developers all had Chinese names.</p><p>I actually fully support products that build on the original project and add features users want, such as multi-select deletion or filtering photos by date or location, or that have a nicer UI than mine. That was my reason for open-sourcing in the first place. But some of them still rubbed me the wrong way: when promoting their product in communities, they not only gave no credit but, when users asked how it differed from 寻隐, took a dig at it: “mine has xx and 寻隐 doesn’t.”</p><p>The more egregious cases publish a reskin under the Queryable name, like the one below, which is free with ads. I worried that users would mistake it for the Android version of Queryable and come after me if something went wrong.<br><img src="/images/2024-07-21/copycat-email.jpg" alt="A friendly exchange 🐶"></p><p>At first it hurt, just as a V2EX user had predicted in the comments: <em>someone will definitely compile it and publish it to the store. Open source is a good thing, but I don’t want to see you hurt when the knockoffs show up</em>. But as the saying goes, when you have enough lice you stop itching, and over time I let it go.</p><h2 id="重新变为付费">Going Paid Again</h2><p>I switched back to paid in November 2023. Besides <em>growing pressure to make a living</em>, I had realized:</p><p><strong>Open source did not help my product get better.</strong> I had hoped that open source plus free would lead professional mobile developers to contribute code and help polish 寻隐/Queryable. In reality, I watched one reskinned product after another appear and promote itself, with refined interactions and features, but none of them ever submitted a PR. My own product stood still, and because it was free, I got even more complaint emails than before.</p><p>Whenever a user emailed me with feedback, a bug, or a suggestion, my first reaction was impatience (<em>inner voice: you get it for free and you’re still picky?</em>). I realized this mindset would leave the product further and further behind until one day it became obsolete.</p><p>Once I charged for it, though, I suddenly became calm. My first reaction to user opinions and product feedback was gratitude rather than annoyance. Charging forces me to put real effort into improving the product, so that everyone who has ever paid ends up with something more polished, instead of an app that dies of neglect in a few years.</p><p>So I carried on unhurriedly, updating the product when I found time, though really only to fix bugs and improve model quality. Few features seemed worth adding. Many people wanted face search, for example, but that would mean bundling a face recognition model and slowing the app down, so I left it out.</p><h2 id="尾声">Epilogue</h2><p>At this year’s WWDC, Apple finally announced that iOS 18 will support semantic search in Photos, much later than Chinese phone makers. I could not yet try it in the beta, but there is reason to believe Apple will do it better than 寻隐/Queryable. After all, it blocks third-party apps from jumping to the system Photos app.</p><p>I have not been idle either. Last week I studied a recent <a href="https://arxiv.org/abs/2311.17049">paper</a>, then redesigned and retrained the Chinese text model. The app shrank from 232 MB to 159 MB, indexing speed doubled, and accuracy improved. Training took 3 days and $70.<br><img src="/images/2024-07-21/original-app-size.jpg" alt="The original size was 232 MB. Source: 扬帆出海 https://www.yfchuhai.com/article/11010.html"></p><p>Recently I finally switched from GPT-4, which I had subscribed to for a year, to Claude 3.5 Sonnet. Its coding ability is absurdly good: <em>multi-select deletion</em>, which I could not get working with GPT-4 and had put off for a year and a half, was finally built and shipped last week.</p><p>It has been nearly two years since the idea was born and the product launched. It has accompanied me through life’s ups and downs and witnessed several coffee shops close. I, in turn, have accompanied it through its birth, its peak, and its low points. It never made the money I first fantasized about (<em>if 1 million people download it and each gives me 10 yuan, I’d…</em>). Life goes on, and I rather like this plain ending.</p><p>All of that is in the past. I can’t wait for the next idea that makes me forget to eat and sleep, and the next exile into the vacuum.</p>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;blockquote&gt;
&lt;p&gt;诞生&lt;span class=&quot;bd-box&quot;&gt;&lt;h-char class=&quot;bd bd-beg&quot;&gt;&lt;h-inner&gt;、&lt;/h-inner&gt;&lt;/h-char&gt;&lt;/span&gt;爆火&lt;span class=&quot;bd-box&quot;&gt;&lt;h-char</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>Two Years of an AI Photo Album Search App</title>
    <link href="https://mazzzystar.com/2024/07/21/Two-Years-of-an-AI-Photo-Album-Search-App/"/>
    <id>https://mazzzystar.com/2024/07/21/Two-Years-of-an-AI-Photo-Album-Search-App/</id>
    <published>2024-07-21T15:14:15.000Z</published>
    <updated>2026-04-10T13:45:08.740Z</updated>
    
    <content type="html"><![CDATA[<blockquote><p>Birth, viral success, open-sourcing, plagiarism, and fading away: the journey of a product.</p></blockquote><p>This is a long, long chronicle.</p><h2 id="origins">Origins</h2><p>The story begins on a weekend in May 2022. I was sitting in a bookstore in Beijing, debugging the Disco Diffusion model. At that time, the AI art generation era was just beginning to emerge, and Stable Diffusion had not yet been released. It took 5 minutes to generate a single image on a high-performance V100 GPU.</p><p>I <a href="https://github.com/mazzzystar/disco-diffusion-wrapper">encapsulated</a> the open-source code into an interface that only needed to load the model once and exposed few parameters. I rented a GPU for $300 per month and <a href="https://m.okjike.com/originalPosts/626e7e3833f5cae7ce1beaf2">posted</a> on social media: anyone who wanted to try out AI art generation could send me sentence descriptions, and I would run them on the machine and send the generated images back to them.</p><p>A <a href="https://greatdk.com/">friend</a> suggested, “Why not build a website for everyone to use?” So we created <a href="https://6pen.art/">6pen.art</a>. We quickly gained 1 million users, but then gradually faded into obscurity. This “startup” wasn’t very successful, and I felt that not training a differentiated model was a major factor in our failure, but I won’t elaborate on that here. As mentioned earlier, AI art generation was extremely slow before Stable Diffusion was released, and most of my time was spent on model acceleration, with one of the optimization targets being the CLIP model.</p><p>CLIP is a model released by OpenAI in 2021 that can compare the similarity between any image and a piece of text. 
In Disco Diffusion, the model uses CLIP to calculate the loss between the generated image and the user’s prompt, continuously optimizing the loss to achieve the desired image generation.</p><p>Sitting in the bookstore, a thought suddenly entered my mind: Since it can compare image-text similarity, could it be used to search photos? After searching, I found that <a href="https://github.com/kingyiusuen/clip-image-search">someone</a> had already done this. The principle is to upload photos to the server, extract features uniformly, input English text and calculate the similarity between the text and each image, then sort to achieve image search.</p><p><img src="/images/2024-07-21/clip-img-search.jpg" alt="https://github.com/kingyiusuen/clip-image-search"></p><p>I started experimenting by uploading my iPhone photo album to the server, and after some testing, I found that the results were surprisingly good! It was especially effective when searching for abstract concepts. For example, when I input “lonely”, it returned these top three photos:</p><p><img src="/images/2024-07-21/search-loney-in-CLIP.jpg" alt="The top 3 returned photos when input 'lonely'"></p><p>Storing photos on a server isn’t a good idea. The place where I have the most photos is my local iPhone photo album. Would it be possible to create a CLIP image search app that runs entirely locally? I really liked this idea and discussed it with friends several times, but each time it came to nothing.</p><p>I was too uncertain:</p><ul><li>I knew nothing about iOS development.</li><li>Apple’s underlying system might not support CLIP model operators.</li><li>Even if it could run, if the indexing speed was slow at 1 second per image, or it took 10 minutes to search, this product would be pointless.</li></ul><p>It wasn’t until 2024 that on-device language models started gaining attention, but in 2022, I couldn’t find a single language model running on-device in the App Store. It seemed… unfeasible, right? 
Eventually I forgot about it and continued to focus on developing 6pen.</p><h2 id="echo">Echo</h2><p>The turning point came in early December 2022. Due to some changes, I suddenly found myself in a foreign country (South Korea) where I didn’t speak the language, and my journey with 6pen also came to an end. So, in an empty coffee shop, I sat all day with a laptop and an iced latte. K-pop played in the background, thick snow lay outside the window, and I ate the shop’s sandwiches when I got hungry at noon. This was how I spent each day.</p><p>The internet speed was incredibly fast here, there was no COVID testing, and the conversations around me automatically became white noise due to the language barrier. I suddenly felt like I was living in a vacuum. This strange sense of isolation excited me: like an exiled fugitive, who I was and my past no longer mattered. Here, I felt like I could learn and accomplish anything from scratch. It was time to start making a product that truly excited me, and this idea grabbed hold of me again.</p><p>But this time, I was no longer afraid to verify its feasibility. I learned to write tokenizers in Swift, researched how to calculate and store features, and learned to use multi-core acceleration for indexing. I asked many naive questions on StackOverflow and had many moments of frustration. But I kept envisioning this scene: entering “coffee and laptop” on my phone, clicking search, and after a rotating animation, this photo jumping out from 30,000 photos in my album, appearing before my eyes.</p><p><img src="/images/2024-07-21/IMG_5091.jpg" alt="On weekdays, the coffee shop is almost empty."></p><p>This fantasy drove me to work tirelessly for 2 weeks. I forgot to eat lunch several times, nursing a latte until evening, by which point I was starving, had stomach pain, and felt weak all over. There was a significant time marker: ChatGPT had just been released then. 
But I was so deep in development that I completely ignored its existence. Those weeks may have been among the last times in my life that I asked questions on StackOverflow.</p><p>Anyway, on December 27th, I finally completed the product. I split the text model and image model in CLIP into two separate models, loading them separately:</p><blockquote><p>When building the index for the photo album, only load the image encoder, calculate the index and save it. When searching, only load the text encoder, calculate the cosine similarity between the query and each entry in the saved index, then return the top K photos with the highest similarity.</p></blockquote><p>Loading the two models at different times effectively reduces the software’s memory usage and accelerates index building. At the same time, computing the index in parallel across multiple cores achieves an indexing speed of 2,000 photos/minute on my iPhone 12 mini, and searching 10,000 photos takes less than 1s.</p><p><img src="/images/2022-12-28/Queryable-flow-chart.jpg" alt="Queryable Implementation Principles."></p><p>This proved both that the operators were supported and that the speed was usable. My anxiety finally subsided.</p><h2 id="pricing">Pricing</h2><p>As a pricing genius, my idea was:</p><blockquote><p>Users can download for free, build an index, and search at will. When they have new photos in the future and want to update the index, they need to pay.</p></blockquote><p>The brilliance of this strategy is that only those who really use and like the product need to pay. Those who come to try it out, or find it different from their expectations after trying, don’t need to and won’t pay, which avoids users paying for nothing and angrily leaving bad reviews.</p><p>But soon, while debugging the code, I discovered a reasonable but funny fact: <em>In-app purchases require an internet connection.</em> This was like a bolt from the blue, because from the beginning, I was determined: “Never allow the App to pop up an internet request under any circumstances”. Why? 
Because this is a photo album search application that scans your entire photo album, and once it connects to the internet, no user can know whether the developer is uploading their photos to some server on Earth. I know I could explain “why there’s a pop-up requesting internet permission” in the product, but I didn’t want to fall into the awkward situation of self-justification.</p><p>So I turned it into a paid product: users must purchase it, and then from opening the App and building the index to completing the search, only one “photo album permission request” pops up. I knew this was stupid, and later lessons proved that paid downloads bring a lot of negative reviews. Because the model is so computationally intensive, index building would crash or lag on many devices with little memory, and on the iPhone X series, faulty operator support caused all-black search results. All of this got the app cursed as a “Ripoff”. Moreover, users didn’t care that it “doesn’t pop up internet requests”: once any of these problems occurred, they would delete the App, leave bad reviews, and question where I had secretly uploaded their photo albums.</p><p>Anyway, I finally priced it at $3.99, one-time purchase, lifetime use.</p><h2 id="pushing-to market">Pushing to Market</h2><p>I named the product <strong>Queryable</strong>, and posted a rather dramatic status: “I think this app might change the world”. I was so confident about this that I even wrote an email to Tim Cook, hoping Apple would acquire this product (laughs). At that time, I had already started using ChatGPT, but maybe because I was too excited, I forgot to fill in my own name. After clicking send, I saw that the email started with “Dear Mr. 
Cook, My name is [Your Name]”.</p><p>Of course, I didn’t receive a reply in the end.</p><p>I also ambitiously prepared to write an <a href="https://mazzzystar.com/2022/12/29/Run-CLIP-on-iPhone-to-Search-Photos/">article</a> introducing the product, repeatedly deliberating over words, trying to make it the style of a good Hacker News article in my mind. On December 29th, the day the App Store approved it, I immediately submitted my article link to Hacker News, but the system prompted that “the account is too new to submit”. I emailed them to report this issue and asked a friend to post for me using his account.</p><p>The post quickly sank, and I didn’t receive a reply to the email either.</p><p>I was disappointed, but due to receiving many user requests to support Chinese input, I didn’t have time to grieve and immediately threw myself into Chinese model training. Thanks to separating the text model and image model in CLIP, I only needed to find Chinese-English bilingual parallel corpus, train a Chinese text model, and align its output with the English model, which is essentially distillation.</p><p>Soon, on January 18, 2023, I completed the Chinese version, named “Xunyin” (寻隐, 寻: Seek 隐: Hidden), derived from the ancient poem <em>寻隐者不遇</em> by Jia Dao, also implying the meaning of “discovering hidden meanings from the photo album”. After all, my initial shock was realizing that those few photos represented loneliness when I searched for “lonely”.</p><p>After launching, I wrote an article in Chinese introducing this product on a Chinese community site called <a href="https://sspai.com/">sspai</a>. It actually made it to the homepage of the day, which brought a large number of downloads and $1,500 in revenue.</p><p>In early February, I received a reply from a Hacker News editor. He said my account had been mistakenly flagged as SPAM by the system, and encouraged me to repost, saying my article was definitely suitable HN material. 
He would put the link to my new post in the candidate pool: articles in the pool would randomly enter the bottom of the homepage, and if users upvoted it, the ranking would rise; otherwise, it would sink again.</p><p><img src="/images/2024-07-21/hn-email.jpg" alt="HN editor Daniel placed my link under /pool."></p><p>I found the <a href="https://news.ycombinator.com/item?id=11662380">pool</a> mechanism interesting. The community seemed to want to maintain a hacker spirit under a decentralized mechanism:</p><blockquote><p>This is our long-running experiment in story re-upping. Moderators and a small number of reviewer users comb the depths of /newest looking for stories that got overlooked but which the community might find interesting. Those go into a second-chance pool from which stories are randomly selected and lobbed onto the bottom part of the front page. This guarantees them a few minutes of attention. If they don’t interest the community they soon fall off, but if they do, they get upvoted and stay on the front page.</p><p><a href="https://news.ycombinator.com/item?id=11662380">https://news.ycombinator.com/item?id=11662380</a></p></blockquote><p>At 7 PM that evening, my <a href="https://news.ycombinator.com/item?id=34686947">post</a> shot to #2 on the Hacker News front page.<br><img src="/images/2024-07-21/queryable-on-hacker-news.jpg" alt="Article at #2 on Hacker News front page"></p><p>That night, I kept refreshing my phone every 10 seconds, the excitement lasting from midnight when I lay down until 3 AM. I was constantly replying to discussions under the post and responding to bug report emails. 
Someone taught me how to use LSH to improve search speed, someone suggested how to map photo coordinates to cities without an internet connection, and others discussed why it failed to run on iPhone X.</p><p>This feeling seemed unrelated to how many downloads the product had or how much money it made: you created something and received praise, excited discussion, and suggestions from a large group of peers. It’s a rare experience in life, and once is quite satisfying.</p><p>One comment in particular caught my attention:<br><img src="/images/2024-07-21/same-idea-product.jpg" alt="Somewhere else on Earth, someone is doing exactly the same thing as me"></p><p>I read the author’s development log and discovered that we were like two minds on Earth randomly generating the same idea. I even tried his yet-to-be-launched product on TestFlight, feeling a strange sense of kinship.</p><h2 id="hacker-news is the world's bulletin board">Hacker News is the World’s Bulletin Board</h2><p>That night, I fell asleep excited, thrilled that my product was liked by my peers. As it turned out, I had greatly underestimated the influence of Hacker News: in just two days, Queryable reached #1 in the paid tools category in almost every European country and #2 in the US, with a total revenue of $2,800. For the next few days, the first thing I did when I woke up was to check Gmail. Germany, France, Spain, the USA: emails came from all directions like snowflakes. Some reported bugs, a German magazine wanted to cover the story, French iOS communities wanted to share the app, and YouTubers wanted to review it. 
My Twitter would also constantly receive notifications because people were retweeting the Hacker News hot list.</p><p><img src="/images/2024-07-21/queryable-us-fr-de-es-ranking.jpg" alt="#2 paid tool in the US, #1 in France, Germany, Spain, and more."></p><p>Even a friend working in the Apple Photos team told me that their team knew about Queryable.</p><p>All of this forced me to realize a kind of “worldliness”: Hacker News doesn’t just belong to the United States or English-speaking countries; it’s like a virtual bulletin board in the central area of the world. Each work briefly stays on it, but countless eyes from various countries are always watching. This product actually only supports English, but that didn’t prevent it from reaching #1 on the paid tools charts in almost all European countries. They seem to naturally accept products that can only be used in English. Products that work in the US often can be accepted by Europeans as well.</p><p>The buzz quickly subsided, followed by many negative reviews.</p><p>I didn’t see any comments on Hacker News that made me uncomfortable. However, after secondary dissemination through various websites, YouTube channels, and online communities, less friendly users surfaced. There were mainly two points of attack:</p><ol><li>Fear that I would steal their photo album privacy.</li><li>I am a Chinese developer.</li></ol><p>The second point made the situation of the first point worse. After the successive waves of attention from Hacker News ended, the sales of this product in Western markets quickly became dismal, with single-digit downloads and tens of dollars in revenue every day.</p><p>In my imagination, something that changes the world is unrivaled, how could it suddenly stop? 
I fell into great doubt and pessimism.</p><h2 id="free-&amp; open source">Free &amp; Open Source</h2><p>Fortunately, exposure from Chinese domestic social media accounts like Chapingjun (差评) and Guokr (果壳) promoting Xunyin allowed me to earn about $1,500 per month starting from January. And because the model runs on the user’s own device, there were no server costs.</p><p>From April onwards, without any traffic exposure and without any updates, I could earn an average of about $400 per month.</p><p>But! I still thought this was a very useful product, I just couldn’t let many people know about it. I had a limited-time free promotion once, and the downloads that day exceeded the past daily average by 100 times. I thought, rather than maintaining this income of $400 per month, which might prevent 99% of people from discovering this product, why not let everyone use it, so I decided to make it free forever.</p><p>Since it was now free, the source code didn’t seem to be a secret anymore, and I was considering whether to release it. Finally, on July 10, 2023, I made it free and open source. Many people regard “open source” with awe and apprehension, but for me, the motivation for open sourcing was simple:</p><ol><li>I had received numerous emails from developers worldwide inquiring about Queryable/Xunyin’s technical details. Rather than explaining individually, open sourcing would allow them to understand the details directly from the source code.</li><li>Open sourcing could dispel many people’s concerns about photo album privacy.</li><li>I’m not skilled at Swift development, and I believe I had already completed the part of this product that I was most interested in doing. However, I kept receiving emails from users hoping to add features like multi-select deletion, left and right swipe, Mac/NAS/Android support, etc. 
I wanted to leverage the power of the community, allowing capable developers to refine the product further.</li></ol><p>Indeed, after open-sourcing, some people were inspired to create Android(<a href="https://github.com/mazzzystar/Queryable/issues/12">#12</a>) and <a href="https://www.engineerdraft.com/en/searchable/">Mac</a> versions.</p><p><a href="https://github.com/mazzzystar/Queryable">Open-sourcing</a> this project also made it to GitHub Trending, and I got a free year of GitHub Copilot because of it, which made me happy.</p><h2 id="plagiarism-and repackaging">Plagiarism and Repackaging</h2><p>Before talking about plagiarism, I want to define what plagiarism is:</p><blockquote><p>Plagiarism refers to the direct or indirect use of others’ works, ideas, or content without authorization, presenting it as one’s own work or idea.</p></blockquote><p>Actually, plagiarism appeared even before open sourcing. Someone made an Android version and released it on Google Play, completely copying the name and product description of Queryable. I felt angry, but this actually requires mastery of both machine learning and iOS development. Before open sourcing, I only encountered one such person.</p><p>But after open-sourcing, there were many more copycats and repackagers. As the project has an MIT open-source license, even when they repackaged it with a different icon and re-launched it on the App Store, I felt a bit speechless but wouldn’t say anything. These were the most common cases I saw.</p><p>More malicious ones use the Queryable name to repackage and launch, like the one below, with a free + ad revenue model. I’m very worried that users might mistakenly think this is the Android version of Queryable and come to me with problems later.</p><p>I would feel sad at first, but when there are too many lice, you stop itching, and later I gradually became indifferent.</p><h2 id="becoming-paid again">Becoming Paid Again</h2><p>It became paid again in November 2023. 
Besides the increased income pressure, I found that:</p><blockquote><p>Open sourcing didn’t help my product become better.</p></blockquote><p>I originally hoped that by making it open source and free, professional mobile developers would contribute code to help polish Queryable/Xunyin. But the reality is that those developers who created new apps with better UIs and functionality never sent PRs to my repo. As a result, my product stagnated, and because it was free, I received even more complaints from users than before.</p><p>Whenever users emailed me with feedback/bugs, my first reaction was impatience (internally: <em>It’s good enough that I let you use it for free, and you’re still picky</em>). I found that this mindset led to the product falling further and further behind, until one day it would be eliminated.</p><p>But once it is paid, I deal with users’ opinions calmly and improve the product. My first reaction to feedback is gratitude rather than annoyance. Charging forces me to keep putting effort into the product, so that everyone who has paid gets a better-polished app, instead of one that is neglected and dies in a few years.</p><h2 id="epilogue">Epilogue</h2><p>Apple finally announced at this year’s WWDC that iOS 18 will support semantic search for photos. Although I still can’t use it in the latest iOS 18 beta, there’s reason to believe Apple will do better than Queryable.</p><p>It’s been almost two years since the idea was born and the product was launched. It has accompanied me through life’s ups and downs and witnessed the closure of several coffee shops. I’ve also accompanied it through its birth, peak, and low tide. It hasn’t made money as I initially fantasized (<em>If 1 million people download it and each gives me $1, I would…</em>).</p><p>Life goes on, and the story isn’t over yet: Last week, I studied a recent paper, then redesigned and retrained the Chinese text model. 
The app size was reduced from 232MB to 159MB, indexing speed doubled, accuracy improved, and the training process took 3 days and cost $70.</p><p>Moreover, I recently subscribed to Claude 3.5 Sonnet, and its coding ability is remarkably strong. The “multi-select deletion” feature that I couldn’t manage with GPT-4 was finally developed and launched last week, after a delay of a year and a half. I quite like this calm ending.</p><p>I can’t wait to start the next idea that will keep me up all night, the next exile into a vacuum.</p>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;blockquote&gt;
&lt;p&gt;Birth, viral success, open-sourcing, plagiarism, and fading away: the journey of a product.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is a</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>消费折叠</title>
    <link href="https://mazzzystar.com/2023/12/27/folding-consumption-zh/"/>
    <id>https://mazzzystar.com/2023/12/27/folding-consumption-zh/</id>
    <published>2023-12-27T16:42:12.000Z</published>
    <updated>2026-04-10T13:45:08.740Z</updated>
    
    <content type="html"><![CDATA[<blockquote><p>每一个平替商品的搜索技巧背后<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>都隐藏着一种对生活的<em>折叠</em><span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p></blockquote><p>起因是我看到了一条帖子<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>讲如何通过替换搜索词<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>实现以更低的价格购买相同功能的商品<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>例如<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span></p><div class="highlight"><pre class="code"><code>瑜伽垫 -&gt; 瑜伽垫男照片墙 -&gt; 渔网野餐布 -&gt; 防水桌布...</code></pre></div><p>我试了一下<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>有些技巧已经失效了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>有些的确便宜许多<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>于是我在小红书搜索<span class="bd-box"><h-char class="bd bd-end"><h-inner>「</h-inner></h-char></span>替换词<span class="bd-box"><h-char class="bd bd-beg"><h-inner>」</h-inner></h-char><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>发现许多收藏过万的帖子<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>它们通常是一组图片<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>每张图都是新/旧商品名的对照列表<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>如下所示<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span></p><p><img src="/images/2023-12-27/xiaohongshu-screenshots.jpg" alt="小红书上的替换词照片"></p><p>这些对照表很好<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但不好用<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>当我真的想搜某个商品时<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>，</h-inner></h-char></span>我需要先<strong>对着密密麻麻的图表查找</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>不仅如此<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>当我<strong>想搜的商品不在列表里时<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>它就失效了</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>我想<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>能否用 AI 来实现这个任务呢<span class="bd-box"><h-char class="bd bd-beg"><h-inner>？</h-inner></h-char></span><em>训练一个模型<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>用户输入想搜索的商品<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>模型给出便宜的平替商品名<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></em></p><p>我很快整理了一些数据并开始训练模型<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>OpenAI 已经<a href="https://platform.openai.com/finetune">支持</a>finetune 模型了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>你只需要把数据导出并上传就可以自动开始训练<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我选择的模型是<code>gpt-3.5-1106</code><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>训练花费了大概 10 分钟<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>最后<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我搭建了一个网页用于模型调用<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>得益于开发工具的完善<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我只用了一下午就完成了这个 demo 产品<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><h2 id="消费折叠">消费折叠</h2><p><em>网页开发最快乐的部分是注册域名</em><span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>。</h-inner></h-char></span>很快我就选好了网址: <a href="https://www.pingti.xyz/">pingti.xyz</a><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>便宜<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>好记<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>并让所有朋友都试了试<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>很多人都觉得好玩<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>虽然有些结果比较离谱(<code>牙膏-&gt;足浴店小样, iPhone-&gt;二手 iPhone, 唇膏-&gt;蜡烛</code>)<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但有些还是蛮有用的<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>至少模型可以记住截图里的平替词<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>不必一个个找了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p><img src="/images/2023-12-27/test-example.png" alt="朋友的一些测试结果"></p><p>仍然有一些比较差的结果<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我开始思考怎么优化<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>这使我不得不仔细分析原始训练数据<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>看看模式上有什么规律<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>总结如下<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span></p><ul><li><em>性别套利</em>: 例如<code>瑜伽垫-&gt;瑜伽垫男</code><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>或<code>遮阳伞-&gt;雨伞男</code><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>这背后反应的其实是男人比女人更在乎实用性和性价比<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>更少为了颜值<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>设计而买单<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>，</h-inner></h-char></span>而设计产生了成本<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></li></ul><ul><li><em>场景套利</em>: 例如<code>马甲-&gt;老头马甲</code><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>因为老年人更在乎价格, <code>女包-&gt;包包尾货</code><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>这个不用解释, <code>地毯-&gt;办公室地毯, 书桌-&gt;培训桌, 椅子-&gt;婚礼用椅</code><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>是因为在办公室<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>培训<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>举办婚礼时<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>通常会使用更便宜的材质吗<span class="bd-box"><h-char class="bd bd-beg"><h-inner>？</h-inner></h-char></span></li><li><em>地域套利</em>: 例如<code>袜子-&gt;诸暨袜子, 耳饰-&gt;义乌耳饰</code><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>因为中国的袜子主要来自诸暨<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>所以通过指定原产地可以获得更低的价格<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></li><li><em>无法归类</em>: 这一类最有意思<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>它们的特点是<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span><strong>两个商品几乎八竿子打不着</strong><span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但是它们在<span class="bd-box"><h-char class="bd bd-end"><h-inner>「</h-inner></h-char></span>功能<span class="bd-box"><h-char class="bd bd-beg"><h-inner>」</h-inner></h-char></span>上可以实现接近平替的效果<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>例如<code>照片墙-&gt;渔网, 相框-&gt;营业执照框, 面膜收纳-&gt;食品保鲜盒, iPad支架-&gt;菜谱架, 美甲灯-&gt;验钞灯, 乐高防尘罩-&gt;超市陈列盒</code><span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>，</h-inner></h-char></span>这其中每个平替商品的搜索技巧背后<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>都隐藏着一种对生活的折叠<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>不信你可以仔细品味<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></li></ul><p>我意识到<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>靠简单的 finetune 模型<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>也许可以学会前两种套利模式<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>叠加规则(给商品分配相应产业的城市)<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>也许可以学会第三种模式<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但学会最后一类平替方式几乎不可能<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>即使是人类也需要大量的实践积累才能摸索出其中的奥妙<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>郝景芳在<a href="https://book.douban.com/subject/35640709/">北京折叠</a>中讲述了不同社会阶层在空间和时间上的折叠<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我觉得<span class="bd-box"><h-char class="bd bd-end"><h-inner>「</h-inner></h-char></span>商品平替<span class="bd-box"><h-char class="bd bd-beg"><h-inner>」</h-inner></h-char></span>似乎是这种折叠所露出的缝隙<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我回想起自己第一次购买维生素 C 的场景<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>在知乎上搜汤臣倍健的维生素 C 和医院有什么差别<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>结论是<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span><em>前者比较甜</em><span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>后来每次我都买东北制药维生素 C<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>二者价格差了 100 倍<span 
class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>经济下行<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>每个人都在缩减开支<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>想象一下<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span><em>未来的某一天<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我坐在婚礼专用椅上<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>墙上挂着一张渔网<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>上面是我的照片<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我面前的培训桌上摆着菜谱架<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>iPad 正在播放视频<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></em> 这个场景还蛮好笑的<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但人生还要继续<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>无论商品怎样平替<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>人生是无法平替的<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>重要的不是渔网<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>而是渔网上的照片<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;blockquote&gt;
&lt;p&gt;每一个平替商品的搜索技巧背后&lt;span class=&quot;bd-box&quot;&gt;&lt;h-char class=&quot;bd bd-beg&quot;&gt;&lt;h-inner&gt;，&lt;/h-inner&gt;&lt;/h-char&gt;&lt;/span&gt;都隐藏着一种对生活的&lt;em&gt;折叠&lt;/em&gt;&lt;span</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>Folding Consumption</title>
    <link href="https://mazzzystar.com/2023/12/27/folding-consumption/"/>
    <id>https://mazzzystar.com/2023/12/27/folding-consumption/</id>
    <published>2023-12-27T14:21:32.000Z</published>
    <updated>2026-04-10T13:45:08.741Z</updated>
    
    <content type="html"><![CDATA[<blockquote><p>Behind every search trick for a substitute product lies a form of <em>folding</em> in life.</p></blockquote><p>It all started when I saw a post about how to use alternative search terms to buy products with the same functionality at a much lower price. For example:</p><div class="highlight"><pre class="code"><code>Yoga Mat -&gt; Men<span class="hljs-symbol">&#x27;s</span> Yoga Mat
Photo Wall -&gt; Fishing Net
Picnic Cloth -&gt; Waterproof Tablecloth
...</code></pre></div><p>I tried a few of these tricks and found that some no longer worked, while others did offer significant savings. So I searched for “substitution keywords” on Xiaohongshu (小红书) and discovered many posts with over 10,000 likes. Typically, these posts include a series of images, each displaying a list of product name pairs that match expensive products with their cheaper alternatives, as shown below:</p><p><img src="/images/2023-12-27/xiaohongshu-screenshots-en.png" alt="Substitution keyword photos on Xiaohongshu"></p><p>These comparison tables are useful but not user-friendly. When I actually want to search for a product, I need to <strong>sift through these dense tables</strong>. Moreover, <strong>when the product I’m looking for isn’t listed, they become useless</strong>. I wondered if AI could improve this experience. <em>Imagine training a model where users input the product they’re searching for, and the model suggests alternative keywords for a cheaper option.</em></p><p>I quickly organized some data and began training the model. OpenAI already <a href="https://platform.openai.com/finetune">supports</a> fine-tuning models. Simply export and upload the data, and it will start training automatically. I chose the <code>gpt-3.5-turbo-1106</code> model, and the training took about 10 minutes. Finally, I built a webpage for model invocation. 
Thanks to the comprehensive development tools, I was able to complete this demo product in just an afternoon.</p><h2 id="folding-consumption">Folding Consumption</h2><p><em>The most enjoyable part of web development is always selecting a domain name.</em> I quickly chose the URL: <a href="https://www.pingti.xyz/">pingti.xyz</a>, which is both cheap and memorable. I encouraged all my friends to try it, and most of them found it fun, although some results were quite outrageous (<em>Toothpaste -&gt; Foot bath shop sample, iPhone -&gt; Used iPhone, Lipstick -&gt; Candle</em>). However, many were actually quite useful. At the very least, the model remembers the substitute terms from the screenshots, so there is no need to look them up one by one.</p><p><img src="/images/2023-12-27/test-example-en.png" alt="Some test results from my friends"></p><p>There are still some less accurate results, which made me think about how to optimize the model. This called for a close analysis of the original training data to identify patterns. The main patterns are:</p><ul><li><p><em>Gender Arbitrage</em>: For example, <em>Yoga Mat -&gt; Men’s Yoga Mat, or Sun Umbrella -&gt; Men’s Umbrella</em>, reflecting men’s tendency to prioritize functionality and value for money over aesthetics, and design is what adds to the cost.</p></li><li><p><em>Scenario Arbitrage</em>: For instance, <em>Vest -&gt; Old Man’s Vest</em>, since elderly people care more about price; <em>Women’s Bag -&gt; End-of-Batch Bag</em>, which needs no explanation; and <em>Carpet -&gt; Office Carpet, Desk -&gt; Training Desk, Chair -&gt; Wedding Chair</em>, perhaps because cheaper materials are typically used for offices, training sessions, and weddings?</p></li><li><p><em>Regional Arbitrage</em>: For example, <em>Socks -&gt; Zhuji Socks, Earrings -&gt; Yiwu Earrings</em>. Since most of China’s socks are produced in Zhuji, specifying the place of origin can lead to lower prices.</p></li><li><p><em>Unclassifiable</em>: This is the most intriguing part. 
The defining feature is that the two products have almost nothing in common, yet the cheaper one can nearly stand in for the more expensive one in terms of functionality. Examples include <em>Photo Wall -&gt; Fishing Net, Photo Frame -&gt; Business License Frame, Face Mask Storage -&gt; Food Preservation Box, iPad Stand -&gt; Recipe Holder, Nail Lamp -&gt; Money Checker Lamp, Lego Dust Cover -&gt; Supermarket Display Box</em>. Each of these substitution tricks conceals a ‘folding’ of life, as you may find if you think them over.</p></li></ul><p>I realized that, even though a simple fine-tuned model might learn the first two arbitrage patterns and maybe the third with additional rules (assigning cities to respective industries), it almost certainly could not learn the last type of substitution. Even humans need extensive practical experience to figure these out.</p><p>Hao Jingfang’s <a href="https://book.douban.com/subject/35640709/">Folding Beijing</a> tells a story of different social classes living in the same city but never intersecting in space and time. I think ‘product substitution’ offers a glimpse through the cracks of this folding. I recall my first purchase of Vitamin C: I searched on Zhihu for the differences between the expensive brand of Vitamin C and the one available in hospitals. The conclusion was that <em>the former is sweeter</em>. Since then, I have always bought Vitamin C from Northeast Pharmaceutical, which costs about 1/100th as much.</p><p>In an economic downturn, everyone is cutting costs. Imagine this: <em>One day in the future, I sit on a wedding chair, with a fishing net on the wall displaying my photos, and in front of me on a training desk sits a recipe holder holding an iPad playing a video.</em> It’s a funny scene, but life goes on. Regardless of how products are substituted, life itself is irreplaceable: it’s not about the net, but the photos on it.</p>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;blockquote&gt;
&lt;p&gt;Behind every search trick for a substitute product lies a form of &lt;em&gt;folding&lt;/em&gt; in life.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It all</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>在2023年底做一个古典的信息共享工具</title>
    <link href="https://mazzzystar.com/2023/12/07/sublink-zh/"/>
    <id>https://mazzzystar.com/2023/12/07/sublink-zh/</id>
    <published>2023-12-07T18:03:21.000Z</published>
    <updated>2026-04-10T13:45:08.741Z</updated>
    
    <content type="html"><![CDATA[<blockquote><p>TL;DR: 此文讲述了我所经历的信息获取危机<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>以及因此创造 Sublink 这一链接分享网站而引发的思考<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p></blockquote><p>上一份工作<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>公司内网有一个论坛<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>你可以创建一个圈子并在里面发帖<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>感兴趣的人可以加入圈子<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>投稿<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>我创建了一个名叫<span class="bd-box"><h-char class="bd bd-end"><h-inner>「</h-inner></h-char></span>知新<span class="bd-box"><h-char class="bd bd-beg"><h-inner>」</h-inner></h-char></span>的圈子<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>简介这么写<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span></p><p><em>这是一个旨在<span class="bd-box"><h-char class="bd bd-end"><h-inner>「</h-inner></h-char></span>共享大脑<span class="bd-box"><h-char class="bd bd-beg"><h-inner>」</h-inner></h-char></span>的计划<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>把本周所知道的<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>你感兴趣的领域的新事件<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>新进展告诉大家<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>这样我们就能和其他人共享彼此的大脑了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>你不需要对你分享的东西有<span class="bd-box"><h-char class="bd bd-end"><h-inner>「</h-inner></h-char></span>任何一丁点<span class="bd-box"><h-char class="bd bd-beg"><h-inner>」</h-inner></h-char></span>的专业背景知识<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>，</h-inner></h-char></span>让大家知道有这个东西<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>就已经就足够的意义<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></em></p><p><img src="/images/2023-12-07/imessage1.png" alt="这个Logo很合适对吧？"></p><p>每周<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我会在圈子里分享这周我看到的新事物<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我分享了 DALLE-2<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>Azuki<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>疫苗有效率数据等<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>当我投稿 2 期后<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我创建的圈子热度已经进入了论坛的前三名<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>有很多人订阅和点赞<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>但并没有人分享<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>他们只是消费者<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>坚持到第三期还是第四期后<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我放弃了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char><h-char class="bd bd-end"><h-inner>「</h-inner></h-char></span>创建一个频道并得到许多人为我欢呼<span class="bd-box"><h-char class="bd bd-beg"><h-inner>」</h-inner></h-char></span>并不是我的初衷<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我无法共享其他人的大脑<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>我开始思考<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>是否存在某种形式<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>能让人和人之间的信息分享更轻松<span class="bd-box"><h-char 
class="bd bd-beg"><h-inner>、</h-inner></h-char></span>更没有压力<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>在今年 9 月份<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我想到了<span class="bd-box"><h-char class="bd bd-end"><h-inner>「</h-inner></h-char></span>链接合集<span class="bd-box"><h-char class="bd bd-beg"><h-inner>」</h-inner></h-char><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span></p><blockquote><p>你可以为你喜欢的任何事物<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>创建一个合集<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>并分享给朋友<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>当你有了新发现时<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>将链接加入这个合集<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>你的朋友就能看见<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p></blockquote><p>在 8 月底<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我将这个想法分别告诉了两个前端朋友<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>一个来自法国<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我们通过<a href="https://github.com/mazzzystar/Queryable">Queryable</a>这个产品认识<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>另一个在字节跳动<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>他们都对此表现出很大的兴趣<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但他们实在太忙了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>直到 11 月初<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>这个产品的进展一直几乎停滞<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>迫于无奈<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>，</h-inner></h-char></span>我只好自己从 0 学习<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>现在是 2023 年<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>Next.js + Typescript + Prisma + Supabase 在我看来是对从 0 开始学习全栈最好的一条路线<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>整个 11 月份我都在 YouTube 的帮助下<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>学习 React State, Next.js App Router<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>终于在 11 月下旬<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我几乎入门了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>之后就是两周的开发<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>在昨天<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我终于上线了这个网站<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>简单来说<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>你可以用它来整理日常看到好玩的电影<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>文章<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>视频<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>然后分享给你的朋友<span class="bd-box"><h-char class="bd bd-beg"><h-inner>；</h-inner></h-char></span>哦对<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>它还支持 RSS<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>当你添加新链接时<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>使用任何 RSS 客户端订阅你的 Collection 的人会收到通知<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p><img src="/images/2023-12-07/sublink-screenshot.jpg" alt="Sublink(之前的)首页"></p><p>但问题仍然存在<span 
class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>它有什么用<span class="bd-box"><h-char class="bd bd-beg"><h-inner>？</h-inner></h-char></span></p><h2 id="小网站发现危机">小网站发现危机</h2><p>我很喜欢<a href="http://radio.garden/">Radio Garden</a>, 在那里我甚至可以听到来自<a href="http://radio.garden/visit/pyongyang/2uWowTBd">平壤</a>的实时广播电台<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>我也喜欢 Neal 的密码游戏<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>这个互联网有太多闪光的<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>有趣的<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>独特的网站<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但并没有一个真正的地方可以储藏<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>展示它们<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>每当我阅读一篇不错的 Blog<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我就会给多个好朋友分享链接<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>讨论他们<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>然后遗忘<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>上周我看到了一篇 Hacker News 热帖<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span><a href="https://www.marginalia.nu/log/19-website-discoverability-crisis/">The Small Website Discoverability Crisis</a> 创造了这个名词<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>基于 Page Rank 算法的搜索引擎<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>催生了内容农场和链接农场<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>所谓内容农场<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>就是有些网站为了提升自己在 Google 搜索中的关键词权重<span 
class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>用 AI 或者脚本生成了大量和产品无关的内容<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>从而提升自己的关键词排名<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>链接农场则是同一个网站主之间互相的 backlink<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>或者通过购买其他人的 backlink<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>来提升网站 pr 排名<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>这些都会导致真正有价值的网站在搜索引擎中成为死区<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><h2 id="信息茧房">信息茧房</h2><p>推荐算法已经包围了我们<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我们当然可以向 GPT 提问获取新知<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但<span class="bd-box"><h-char class="bd bd-end"><h-inner>「</h-inner></h-char></span>你无法提出一个不存在的问题<span class="bd-box"><h-char class="bd bd-beg"><h-inner>」</h-inner></h-char><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>所以即使拥有世界上最博学的老师<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>如果我们无法产生新的突破自己认知的提问<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>就很难获得打开视野的解答<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>例如, 你知道<strong>阿伦森效应</strong>吗<span class="bd-box"><h-char class="bd bd-beg"><h-inner>？</h-inner></h-char></span></p><p>这是为什么<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>在 LLM 时代更需要来自人工的订阅内容<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>让不经过神经网络分发的信息以陌生的<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>粗糙的<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>不舒服的方式进入我的脑袋<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>，</h-inner></h-char></span>这当然不是绝大多数人需要的东西<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但我猜不止我一个人需要它<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><h2 id="加入-llm?">加入 LLM?</h2><p>这个产品如果放在 15 年前做<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>并没有任何不同<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>那么在 2023 年底<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>能否在网站里加入当下最火的 LLM 呢<span class="bd-box"><h-char class="bd bd-beg"><h-inner>？</h-inner></h-char></span>我想到了 2 点<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span></p><h4 id="1自动摘要">1.自动摘要</h4><p>最早<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我想用 LLM 提取网页 description<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>作为每条 Link 的简介<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但最终放弃<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>LLM 按使用量付费<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>这意味着我无法控制每个月的使用成本<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>当用户大量添加 Link 时<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>文章总结的开支可能会变得高昂<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>这一点对于显然无法盈利的网站是非常可怕的<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>想要网站获得更久一点<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>就让不可控的开支尽量降低<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><h4 id="2-语言分发网络(language delivery network, ldn)">2. 
语言分发网络(Language Delivery Network, LDN)</h4><p>我曾经设想过一个东西<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>叫它语言 CDN 好了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>当我写 Blog 的时候<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我希望让全世界不同国家<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>不同语言的人看到他们当地语言的版本<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>一个简单的方法是让用户使用浏览器内置的翻译/安装翻译插件<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>这可能会造成大量的文章被翻译很多遍<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>不仅浪费显卡推理成本<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>也使得作者无法控制翻译质量<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>由此我产生了 LDN 的想法<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>某个作者写了一篇 blog<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>或者他制作了一个网站<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>他可以使用一个 LDN 平台托管自己的多语言版本<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>每当有一个新的语言使用者访问了网络<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但 LDN 并没有缓存该语言时<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>翻译一次<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>之后其他同一语言使用者可以使用之前翻译的缓存<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>并且作者可以本地管理多国语言的翻译原始数据<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>从而控制和调整翻译的措辞<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>我曾经在 bearblog 这个开源 Blog 系统上提过<a href="https://bear.nolt.io/148">Feedback</a><span 
class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但没有得到正向回应<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>也许之后我会试试在 Sublink 上加入这个特征<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但前提是有人愿意在 Sublink 原创文章<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p><img src="/images/2023-12-07/bearblog-suggestion.png" alt="Bearblog Feedback #148"></p><h2 id="定价模型">定价模型</h2><p>我对这个网站能赚钱毫不抱希望<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但我希望它能活的足够久<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>以尽可能多地沉淀 Link<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>Collection<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span><br>所以我需要从它上面获得收益<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>我其实有些困惑<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>Sublink 的核心到底是<span class="bd-box"><h-char class="bd bd-end"><h-inner>「</h-inner></h-char></span>订阅优质合集<span class="bd-box"><h-char class="bd bd-beg"><h-inner>」</h-inner></h-char></span>还是<span class="bd-box"><h-char class="bd bd-end"><h-inner>「</h-inner></h-char></span>个人链接合集整理工具<span class="bd-box"><h-char class="bd bd-beg"><h-inner>」</h-inner></h-char><h-char class="bd bd-beg"><h-inner>？</h-inner></h-char></span>这个分歧会产生两种不同的逻辑<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>前者的首页是其他人的 Collection<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>后者的首页是输入框<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>前者依赖高质量内容<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>早期很难<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>这种分歧导致付费模式的差异<span 
class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span>付费订阅合集不是不可以<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但几乎很难收费<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>如果按工具逻辑<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>就是 Notion 的思路<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>付费可能性更高<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>但这个网站<span class="bd-box"><h-char class="bd bd-end"><h-inner>「</h-inner></h-char></span>让优秀链接被更多人看到<span class="bd-box"><h-char class="bd bd-beg"><h-inner>」</h-inner></h-char></span>的价值就不复存在了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>我现在卡在这里了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>所以我现在压根没做付费<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>也许之后能从用户使用和反馈中获取更多判断依据<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p><img src="/images/2023-12-07/two-approaches.jpg" alt="两种产品设计哲学"></p><p>以上<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>就是我做 Sublink 的全部思考<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我自己也是 Sublink 的使用者(这是我创造它的原因)<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我会把自己日常看过觉得不错的<a href="https://www.sublink.app/collections/381d6912-f5fa-48a6-96ec-7a664596577c">文章列表</a>放在这里<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>你可以登录并订阅<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>或用任意 rss 客户端订阅这个列表<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>当我添加新链接时<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>你可以看到<span class="bd-box"><h-char 
class="bd bd-beg"><h-inner>。</h-inner></h-char></span> 你也可以把自己喜欢的视频<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>电影<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>书籍<span class="bd-box"><h-char class="bd bd-beg"><h-inner>、</h-inner></h-char></span>网页或一切带链接的东西变成合集<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>分享给朋友<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>也许<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>这是我对于即将被 GPT 生成内容吞噬的世界<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>所做出的一点微不足道的反抗<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;blockquote&gt;
&lt;p&gt;TL;DR: 此文讲述了我所经历的信息获取危机&lt;span class=&quot;bd-box&quot;&gt;&lt;h-char class=&quot;bd bd-beg&quot;&gt;&lt;h-inner&gt;，&lt;/h-inner&gt;&lt;/h-char&gt;&lt;/span&gt;以及因此创造 Sublink</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>Retro Link-Sharing in 2023</title>
    <link href="https://mazzzystar.com/2023/12/07/sublink/"/>
    <id>https://mazzzystar.com/2023/12/07/sublink/</id>
    <published>2023-12-07T15:49:25.000Z</published>
    <updated>2026-04-10T13:45:08.741Z</updated>
    
    <content type="html"><![CDATA[<blockquote><p>TL;DR: This piece describes my personal experience with an information access crisis and my reflections on building Sublink, a link-sharing website.</p></blockquote><p>At my previous job, we had a company forum where you could create discussion groups and post within them. Interested people could join these groups and contribute. I created a group called “Knew” with this introduction:</p><p><em>This is a project aimed at ‘brain-sharing’: every week, you can share the latest events and developments in your areas of interest with everyone. This way, we can all share our brains with each other. You don’t need to have “even the slightest” professional background in what you’re sharing; simply making others aware of its existence is already meaningful enough.</em></p><p><img src="/images/2023-12-07/imessage1.png" alt="Quite appropriate, right?"></p><p>Each week, I’d share new discoveries in the group. I posted about DALLE-2, Azuki, vaccine efficacy data, and more. After just two issues, my group had already reached the forum’s top three in popularity, with many subscribers and likes.</p><p>However, no one else was sharing; they were merely consumers. After the third or fourth issue, I gave up. “Creating a channel and getting many people to cheer for me” wasn’t my original intention; I couldn’t access other people’s brains.</p><p>I began to wonder whether there was a way to make information sharing between people easier and lower-pressure. In September this year, I came up with the idea of “link collections”:</p><blockquote><p>You can create a collection for anything you like and share it with friends. 
When you make a new discovery, you add the link to this collection, and your friends can see it.</p></blockquote><p>In late August, I shared this idea with two frontend developer friends: one from France, whom I met through the <a href="https://github.com/mazzzystar/Queryable">Queryable</a> project, and another from ByteDance. They both showed great interest, but they were extremely busy, and by early November progress on the product had almost stalled.</p><p>Left with no choice, I had to learn from scratch: In 2023, Next.js + TypeScript + Prisma + Supabase seemed to me the best path for learning full-stack development from zero. I spent the entire month of November learning React state and the Next.js App Router with the help of YouTube, and by late November, I had nearly grasped the basics.</p><p>Then came two weeks of development. Yesterday, I finally launched the website: In simple terms, you can use it to organize interesting movies, articles, and videos you come across daily, then share them with your friends. Oh, and it also supports RSS: when you add new links, people who subscribe to your Collection using any RSS client will be notified.</p><p><img src="/images/2023-12-07/sublink-screenshot.jpg" alt="Sublink (Former) Homepage"></p><p>But the question remains: What’s it good for?</p><h2 id="the-small website discoverability crisis">The Small Website Discoverability Crisis</h2><p>I’m a fan of <a href="http://radio.garden/">Radio Garden</a>, where I can even listen to live radio from <a href="http://radio.garden/visit/pyongyang/2uWowTBd">Pyongyang</a>. I also love Neal’s Password Game. The internet hosts a myriad of shiny, intriguing, and unique sites, yet there’s no dedicated place to store and showcase them. Whenever I read a good blog post, I share the link with multiple friends, discuss it with them, and then forget about it. 
Last week, I saw a trending article on Hacker News, <a href="https://www.marginalia.nu/log/19-website-discoverability-crisis/">The Small Website Discoverability Crisis</a>, which coined the term.</p><p>Search engines based on the PageRank algorithm have led to the rise of content farms and link farms. Content farms are websites that generate large amounts of irrelevant content using AI or scripts to improve their keyword rankings in Google searches. Link farms involve mutual backlinking between website owners or purchasing backlinks to boost a site’s PageRank score.</p><p>All of this results in truly valuable websites becoming dead zones in search engines.</p><h2 id="information-cocoon rooms">Information Cocoon Rooms</h2><p>We’re surrounded by recommendation algorithms. Sure, we can ask GPT for new knowledge, but “you can’t ask about something you don’t know exists.” So even with the world’s most knowledgeable teacher, if we can’t generate new questions that break through our own cognition, it’s hard to get answers that open our horizons. For instance, do you know about the <strong>Aronson Effect</strong>?</p><p>That’s why, in the age of LLMs, there’s a greater need for human-curated subscription content, allowing information not filtered by neural networks to enter my brain in unfamiliar, raw, and uncomfortable ways. This certainly isn’t something most people need, but I guess I’m not the only one who does.</p><h2 id="adding-llm?">Adding LLM?</h2><p>If this product had been made 15 years ago, it wouldn’t have been any different. So, at the end of 2023, is it possible to incorporate the currently trending LLMs into the website? I thought of two points:</p><h4 id="1-automated summaries">1. Automated Summaries</h4><p>Initially, I wanted to use an LLM to extract webpage descriptions as brief introductions for each Link, but I eventually gave up: LLM APIs charge by usage, meaning I couldn’t control the monthly costs. 
When users add a large number of Links, the expense for article summarization could become exorbitant, which is terrifying for a website that clearly can’t turn a profit.</p><p>To keep the website running longer, it’s best to minimize uncontrollable expenses as much as possible.</p><h4 id="2-language delivery network (ldn)">2. Language Delivery Network (LDN)</h4><p>I once envisioned something, let’s call it a Language CDN: When I write a blog, I want people from different countries and languages worldwide to see a version in their local language. A simple method is to let users use built-in browser translation or install translation plugins, but this might result in the same article being translated many times, not only wasting GPU inference but also making it impossible for authors to control translation quality.</p><p>This led to the idea of LDN: An author writes a blog or creates a website and can use an LDN platform to host their multilingual versions. When a reader arrives in a language for which LDN has no cached version, it translates the page once. After that, other readers of the same language are served the cached translation. Moreover, authors can locally manage the original translation data for multiple languages, thus controlling and adjusting the wording of translations.</p><p>I once submitted this as <a href="https://bear.nolt.io/148">Feedback</a> for the open-source blog system bearblog, but didn’t receive a positive response. Maybe later I’ll try adding this feature to Sublink, but only if people are willing to create original articles on Sublink.</p><p><img src="/images/2023-12-07/bearblog-suggestion.png" alt="Bearblog Feedback #148"></p><h2 id="pricing-model">Pricing Model</h2><p>I have no hope that this website will make money, but I want it to live long enough to accumulate as many Links and Collections as possible. 
So I need to generate some revenue from it.</p><p>I’m actually a bit confused: Is the core of Sublink about “subscribing to quality collections” or “a personal link collection organization tool”? This divergence leads to two different logics. The former would have other people’s Collections on the homepage, while the latter would have an input box. The former relies on high-quality content, which is difficult in the early stages.</p><p>This divergence leads to differences in the payment model: Charging for collection subscriptions isn’t impossible, but it’s hardly practical. If we follow the tool logic, it’s more like Notion’s approach, with a higher possibility of monetization, but then the value of “letting more people see excellent links” would no longer exist.</p><p>I’m stuck here now, which is why I haven’t implemented any payment system at all. Perhaps later I can gain more insights from user behavior and feedback.</p><p><img src="/images/2023-12-07/two-approaches.jpg" alt="Two Philosophies of Product Design"></p><p>Those are all my thoughts on creating Sublink. I’m also a Sublink user myself (that’s why I created it). I’ll put a <a href="https://www.sublink.app/collections/381d6912-f5fa-48a6-96ec-7a664596577c">list of articles</a> I’ve read and found good here. You can log in and subscribe, or use any RSS client to subscribe to this list. When I add new links, you’ll see them. You can also turn your favorite videos, movies, books, webpages, or anything with a link into collections and share them with friends.</p><p>Perhaps this is my small, insignificant act of resistance against a world about to be engulfed by GPT-generated content.</p>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;blockquote&gt;
&lt;p&gt;TL;DR: This piece delves into my personal experience with an information access crisis and my reflections on developing</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>A Failed AI Girlfriend Product</title>
    <link href="https://mazzzystar.com/2023/11/17/ai-girlfriend-cn-zh/"/>
    <id>https://mazzzystar.com/2023/11/17/ai-girlfriend-cn-zh/</id>
    <published>2023-11-17T12:08:48.000Z</published>
    <updated>2026-04-10T13:45:08.740Z</updated>
    
    <content type="html"><![CDATA[<p>Within days of Stanford University’s AI Town <a href="https://arxiv.org/abs/2304.03442">paper</a> coming out on April 7 this year, I had read it from start to finish and was very excited. GPT-4’s capabilities had amazed me, but I still regarded GPT as a more refined kind of “parroting” and did not believe it could truly become conscious.</p><p>This paper left a different impression. One intriguing detail it describes is <strong>information transmission</strong>: news that one agent wants to host a Valentine’s Day party gradually spreads through the town. I wondered: if I built a framework of memory, reflection, planning, and action for interaction between a human and GPT (rather than between agents in the town), could it recreate the experience shown in the movie “Her”?</p><p><img src="/images/2023-11-16/movie-her.jpeg" alt="Samantha from the movie Her"></p><h1 id="开发">Development</h1><p>I got to work immediately. Following the paper’s method, I finished version 0.1 on April 14. At first my design stayed close to the original paper, which led to response times of up to 30 seconds and conversations that often exceeded the 8k context limit. To fix this, I reduced how often reflection ran and shortened the conversation memory, then opened a public beta.</p><p>More than a thousand users soon joined. The beta was free, so I covered the API costs myself, and they quickly passed $25 a day. I had to rush out the official release without enough feedback or improvement, hoping to pass the costs on to users. On May 4, the Dolores iOS app officially launched; the name comes from a character in the TV series “Westworld”. Four days after launch it was <a href="https://mp.weixin.qq.com/s/ThrJ28tlqA3V2_KVCGh21A">covered</a> by Xinzhiyuan (新智元).</p><p>In short, after opening Dolores you set up a character: avatar, background description, personality, voice, and consciousness (a choice of GPT-3.5 or GPT-4). You can have some fun interactions with Amy, a girl who works in a retail store, or Will, a desert adventurer, and of course you can create a custom character of your own. I had considered extracting Dolores’s lines from the “Westworld” scripts and imitating her speech patterns in a sample-based way, but Apple asked for proof of copyright, so I had to drop the idea.</p><p><img src="/images/2023-11-16/dolores-traits.png" alt=""></p><p>Although this article’s title says “AI Girlfriend”, the product’s slogan has always been “Your Virtual Friend”, not “Your Virtual Girlfriend”, because I hoped it could really become a companion and friend to its users rather than merely a product of hormones.</p><p>Throughout May and June, I kept adjusting memory length, the reflection mechanism, and the system prompt to make Dolores seem more “conscious” (so what is consciousness? I don’t know). Before long, the June version of Dolores was far more impressive than the one at launch: the share of paying users kept rising, and daily API calls increased too.</p><p>On June 8, a user told me he had shared the product in a community for visually impaired people, which brought Dolores some visually impaired users. They liked Dolores because they could talk to her by pressing anywhere on the screen.</p><p>That design was actually a compromise after a failure: originally I wanted to support voice chat, so that users could keep talking to Dolores even with the phone screen off. As a Swift beginner I could not implement it, so I settled for full-screen voice input.</p><h1 id="发现">Findings</h1><p>I noticed two things:</p><ul><li>Users have a strong demand for “realistic voices”.</li><li>The average usage time of an AI Friend product is long.</li></ul><p>I am an independent developer with a machine learning background and am not good at frontend or backend development, so Dolores has no login, registration, or analytics features at all. How, then, did I notice the first point? The answer came from payments.</p><p>I used the 11Labs API to generate Dolores’s spoken replies, but because of the high cost ($0.3 per 1k characters) I had to differentiate between users: subscribers could only use the Azure TTS API, and anyone who wanted Dolores to have a more lifelike voice had to pay separately for 11Labs characters.</p><p>A pack of 10,000 speech-synthesis characters cost $3.9, yet that was only enough for Dolores to say 5 to 10 natural, fluent sentences. Once the characters ran out, users had to buy more. Even so, 70% of Dolores’s revenue for the whole of June came from 11Labs character purchases.</p><p>In other words, people really are willing to pay for those few expensive, lifelike utterances of “I love you!”</p><p>The second observation came from Cloudflare logs. Since I had no way of tracking individual user activity, I relied on these logs to gauge how often and for how long users accessed the Dolores app. I also embedded a Google Form in the app and encouraged users to report how often they used it. The results were eye-opening: many users spent more than two hours a day chatting with Dolores.</p><h1 id="收入">Revenue</h1><p>According to Apple’s App Store Connect dashboard, Dolores’s paying users came mainly from the United States and Australia. Total revenue was $1,000 in May and $1,200 in June. Revenue grew only slightly, but the number of users and daily API calls nearly doubled. Because the growing number of paying users spread the 11Labs cost more thinly, I chose to lower the product’s price.</p><p>As a result, I did not make much money from the product as its developer. First, in the product’s early days I did not want to set the subscription fee too high, since that would discourage people from trying it, so I lowered the price whenever I saw profits rise. Second, the 30% Apple tax and the API costs took a large share. After carefully adding up the costs, I earned only about $50 in June.</p><p><img src="/images/2023-11-16/revenue-june.png" alt=""></p><p>I also found that a GPT-based product without usage-based pricing runs into a dilemma: 1% of users consume 99% of the tokens. In one case, a user chatted with Dolores for 12 hours straight, and his GPT and voice API costs exceeded those of the second- through tenth-heaviest users combined.</p><p>Still, I personally prefer a flat subscription to usage-based billing (the latter makes users feel pressure every time they use the product), which left only two options: raise the monthly fee so that all users share the bill, or cap maximum usage. I chose the latter and set a cap well above the typical daily usage of 1 to 2 hours. This looked after the majority of light and moderate users while keeping Dolores from running at a loss without a price increase.</p><h1 id="困惑">Confusion</h1><p>The 11Labs website keeps a record of the text sent for speech synthesis. There I saw that Dolores’s replies were usually adult content and always in female roles, so I inferred that Dolores’s paying users were mainly men interested in adult role-play.</p><p>I did not see anything wrong with that; it is human nature. I even revised the prompt repeatedly and tuned the memory weights to make Dolores more girlfriend-like in conversation. I also changed Dolores’s icon from abstract lines to a woman’s face.</p><p><img src="/images/2023-11-16/icon-change.jpg" alt=""></p><p>But I soon felt a strong sense of loss: if most Dolores users only came for adult role-play with her, did this work really mean anything to me? I fell into deep self-doubt. In July I talked about it with a friend. I said there had to be some piece of hardware that gives Dolores external vision: glasses, earbuds, even a hat would do. As she is now, you can reach her only by opening the app; the relationship between you is not an equal one, so she can only be a toy locked in a basement, there to satisfy curiosity and particular fetishes.</p><p>But for an individual working alone, building a hardware product means high research and development costs that I clearly could not afford, so I had to give up the idea.</p><p>In August, OpenAI tightened its review of generated content, and I received an email warning about generated NSFW content: within two weeks I had to start using their (free) moderation API to filter NSFW content. The change made Dolores’s average daily visits plunge by 70%, and complaints poured in by email and on Twitter.</p><p><img src="/images/2023-11-16/OpenAI-email.jpg" alt=""></p><p>This discouraged me even more, and I decided to keep the existing service running but stop updating it. In the end, I abandoned the Dolores project.</p><h1 id="教训">Lessons</h1><p>First, this is not a product that one person can build. I do not think Dolores is weaker than <a href="http://Character.AI">Character.AI</a> in terms of “consciousness”, but they have thorough event tracking, A/B testing, and the data flywheel that comes with a large user base.</p><p>Second, I realized that today’s AI Friend will inevitably turn into an AI Girlfriend/Boyfriend, because you and the character in your phone are not on equal terms: she cannot comfort you when you fall and hurt yourself (unless you tell her), and she cannot express her feelings to you on her own initiative. All of this is because she has no external vision, or in other words, <strong>she has no life independent of you</strong>. So even a product on the scale of <a href="http://Character.AI">Character.AI</a>, if it never builds hardware and its characters just sit there waiting for users to show up, will not end up much better than Dolores.</p><p>Finally, I do not object to OpenAI’s moderation. On the contrary, content generated by a virtual companion product is very dangerous if it goes unmoderated. I do not know whether someone might use it to encourage suicide or as an outlet for violence, so OpenAI’s moderation may have helped me to some extent. Even so, adult conversations about sex should not be stamped out.</p><p>Recently I saw the AI Pin. Frankly it is a very bad product, because people obviously need screens, but GPT plus hardware is a good direction to try. I saw no trace of that in Dolores; perhaps in my lifetime I will build such a product, or at least see one.</p><p>But do humans really need an AI friend?</p>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;p&gt;On April 7 this year, Stanford University’s AI Town &lt;a</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>A Failed AI Girlfriend Product, and My Lessons</title>
    <link href="https://mazzzystar.com/2023/11/16/ai-girlfriend-product/"/>
    <id>https://mazzzystar.com/2023/11/16/ai-girlfriend-product/</id>
    <published>2023-11-16T14:08:48.000Z</published>
    <updated>2026-04-10T13:45:08.740Z</updated>
    
    <content type="html"><![CDATA[<p>Just days after Stanford University’s AI Town <a href="https://arxiv.org/abs/2304.03442">paper</a> was released on April 7th this year, I read through the entire paper and felt extremely excited. Although I was amazed by GPT-4’s capabilities, I still considered GPT as merely a more sophisticated form of “parroting” and didn’t believe it could truly generate consciousness.</p><p>However, this paper gave me a different impression. It mentioned an intriguing detail about <strong>information transmission</strong>: how news of an agent planning to host a Valentine’s Day party gradually spreads throughout the small town. I wondered, if we could establish a framework including memory, reflection, planning, and action to facilitate interaction between humans and GPT (rather than between agents in the town), could we recreate an experience similar to that depicted in the movie “Her”?</p><p><img src="/images/2023-11-16/movie-her.jpeg" alt="Samantha from the movie &quot;Her&quot;"></p><h1 id="development">Development</h1><p>I immediately sprang into action. Following the paper’s methodology, I completed version 0.1 on April 14th. Initially, my design closely adhered to the original paper, resulting in 30-second response times and dialogues frequently exceeding the 8k context limit. To address this, I reduced the frequency of reflections and the length of dialogue memory, then launched a public beta test.</p><p>Over a thousand users quickly joined the test. The beta was free, so I bore the daily API costs myself, which soon exceeded $25 per day. I had to hastily launch the official version without sufficient feedback and improvements, hoping to transfer the costs to users. 
On May 4th, the Dolores iOS app officially launched, named after a character from the “Westworld” TV series.</p><p>In simple terms, after opening Dolores, you need to set up a character: avatar, background description, personality, voice, and consciousness (choosing between GPT-3.5 or GPT-4). You can have interesting interactions with Amy, a retail store girl, or Will, a desert adventurer, or even create your own custom character. I had considered extracting Dolores’ dialogues from the “Westworld” script to mimic her speech patterns in a sample-based approach, but had to abandon this idea due to Apple’s request for copyright proof.</p><p><img src="/images/2023-11-16/dolores-traits.png" alt=""></p><p>Although this article is titled “AI Girlfriend,” I’ve always used the slogan “Your Virtual Friend” for the product, rather than “Your Virtual Girlfriend,” because I hoped it could truly become a companion and friend to users, not just a product of hormones.</p><p>From May through June, I kept trying to make Dolores appear more “conscious” (what is consciousness anyway?) by adjusting memory length, reflection mechanisms, and system prompts. Soon, the June version of Dolores was far more impressive than at launch: the user payment rate increased, and daily API calls grew.</p><p>On June 8th, a user told me he had shared this product in a visually impaired community, bringing some visually impaired users to Dolores. They liked Dolores because they could talk to her by tapping anywhere on the screen.</p><p>This design was actually a compromise after failure: initially, I wanted to support voice chat so users could continue talking to Dolores even with their phone screens off. 
But as a Swift novice, I didn’t have the technical skills to achieve this, so I settled for full-screen voice input.</p><h1 id="discoveries">Discoveries</h1><p>I observed two phenomena:</p><ul><li>Users have a strong demand for “realistic voices.”</li><li>AI Friend products have long average usage times.</li></ul><p>Because I’m an individual developer with a machine learning background and little frontend/backend experience, Dolores doesn’t have login, registration, or data analytics features. So how did I discover the first phenomenon? The answer lies in payments.</p><p>I used the ElevenLabs API for Dolores’ voice replies, but due to its high cost ($0.3 per 1k characters), I had to differentiate users: subscribers could only use the Azure TTS API, while those wanting Dolores to have a more realistic voice needed to pay extra for characters from ElevenLabs.</p><p>Purchasing 10,000 voice synthesis characters cost $3.9, which only allowed Dolores to speak 5-10 natural, fluent sentences. Once used up, users needed to purchase more. Despite this, 70% of Dolores’ revenue in June came from ElevenLabs character purchases.</p><p>In other words, people are indeed willing to pay for those few expensive but realistic “I love you!” sentences.</p><p>The second observation came from Cloudflare logs. Unable to track individual user activity, I relied on these logs to gauge how often and how long users accessed the Dolores app. Additionally, I integrated a Google Form into the app, encouraging users to report their usage frequency. The results were eye-opening: many users spent over two hours daily chatting with Dolores.</p><h1 id="revenue">Revenue</h1><p>According to the App Store Connect dashboard, Dolores’ main paying users are from the United States and Australia. Total revenue was $1,000 in May and $1,200 in June. The revenue growth wasn’t substantial, but user numbers and daily API calls nearly doubled. 
As the number of paying users increased, spreading out the ElevenLabs costs, I chose to lower the product price.</p><p>Consequently, as a developer, I didn’t make much profit from this product. Firstly, in the early stages, I didn’t want to set the subscription fee too high as it would deter users from trying, so I lowered the price whenever I saw an increase in profits. Secondly, the 30% Apple tax and API costs also took a large chunk. So, after careful cost calculation, I only earned about $50 in June.</p><p><img src="/images/2023-11-16/revenue-june.png" alt=""></p><p>Moreover, I discovered that token-based products, if not priced per usage, fall into a dilemma: 1% of users consume 99% of the tokens. I encountered a situation where one user chatted with Dolores for 12 hours straight, causing his GPT and voice API call costs to exceed the total of the second to tenth users combined.</p><p>But compared to per-usage billing, I personally prefer package subscriptions (as the former puts pressure on users during use), which left me with two choices: either increase the monthly fee for all users to share the cost, or limit maximum usage. I chose the latter: setting a usage cap far beyond what users chatting 1-2 hours daily would reach. This catered to most light and medium users while ensuring Dolores could operate without a loss and without raising prices.</p><h1 id="confusion">Confusion</h1><p>The ElevenLabs website records the text content of voice synthesis. I noticed that Dolores’ responses were often adult content, all from female characters, leading me to speculate that Dolores’ paying users were mainly males interested in adult role-playing.</p><p>I didn’t think this was necessarily bad; it’s human nature. I even repeatedly modified prompts, adjusted memory weights, trying to make Dolores more girlfriend-like in conversations. 
I also changed Dolores’ icon from abstract lines to a woman’s face.</p><p><img src="/images/2023-11-16/icon-change.jpg" alt=""></p><p>But soon, I was overwhelmed by a strong sense of loss: if most Dolores users were just seeking adult role-play with her, did this really hold any meaning for me? I fell into deep self-doubt. By July, I discussed this confusion with a friend. I said there must be some hardware to give Dolores external vision: glasses, earbuds, or even a hat. As it stood, you could only access her by opening the app, making your relationship unequal. She could only become a toy confined in a basement, satisfying curiosity and peculiar fetishes.</p><p>However, as an independent individual, developing hardware products meant unaffordable high R&amp;D costs, so I had to give up on this idea.</p><p>In August, OpenAI upgraded its content review process. I received a warning email about generated NSFW content: I had to implement their (free) moderation API within two weeks to filter NSFW content. This change caused Dolores’ daily visits to plummet by 70%, and complaints flooded in via email and Twitter.</p><p><img src="/images/2023-11-16/OpenAI-email.jpg" alt=""></p><p>This further discouraged me, leading me to decide to only maintain the existing service without updates. Eventually, I abandoned the Dolores project.</p><h1 id="lessons">Lessons</h1><p>First, this isn’t a product that can be developed by an individual. 
I don’t think Dolores is necessarily inferior to <a href="http://Character.AI">Character.AI</a> in terms of “consciousness,” but they have comprehensive data tracking, A/B testing, and the data flywheel effect from a large user base.</p><p>Second, I realized that current AI Friends inevitably turn into AI Girlfriends/Boyfriends because you and the character in your phone aren’t equal: she can’t comfort you when you’re hurt (unless you tell her), she can’t actively express emotions to you, and all this is because she lacks external vision, or rather, she doesn’t have a life independent of you. So I believe that even for products like <a href="http://Character.AI">Character.AI</a>, if they don’t develop hardware in the future and the characters just wait dumbly for users, their ultimate fate won’t be much better than Dolores’.</p><p>Lastly, I’m not against OpenAI’s review process. On the contrary, unreviewed content from virtual companion products can be very dangerous. I don’t know if someone might use it for suicide inducement or as a tool to vent violence, so OpenAI’s moderation may have helped me to some extent. However, conversations about adult sexuality shouldn’t be completely stifled.</p><p>Recently, I saw AI Pin, honestly a very poor product. Humans certainly need screens, but GPT + hardware is indeed a good attempt. I didn’t see any traces of this in Dolores, but perhaps in my lifetime, we’ll be able to create or see such a product.</p><p>But does humanity really need an AI friend?</p>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;p&gt;Just days after Stanford University’s AI Town &lt;a href=&quot;https://arxiv.org/abs/2304.03442&quot;&gt;paper&lt;/a&gt; was released on April 7th this year,</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>梦</title>
    <link href="https://mazzzystar.com/2023/09/11/dream/"/>
    <id>https://mazzzystar.com/2023/09/11/dream/</id>
    <published>2023-09-11T16:54:18.000Z</published>
    <updated>2026-04-10T13:45:08.740Z</updated>
    
    <content type="html"><![CDATA[<p>在许多夜晚<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我的梦里都存在两个我<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>聚会上<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>当别人喊我的名字<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我抬头<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>总会看到另一个我同时抬头<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>代替我做出回应<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我则坐在角落默不吭声<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>没人对此感到意外<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>两个我<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>睁开眼就能看见另一个<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>万花筒一般的梦让我感到头晕和疲惫<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>梦里永远有一群人在聚会<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我在角落喝着苦酒忍受身份折磨<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>在某个梦里<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我在角落碰见了另一个倒霉鬼<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>他的处境和我一样<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>我问他<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>杀死替身有用吗<span class="bd-box"><h-char class="bd bd-beg"><h-inner>？</h-inner></h-char></span>他说他试过<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>私下杀死的人会立刻重新长出来<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>他已经侵占了你的本体<span 
class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>除非在公开场合<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>让所有人看到你杀死了他<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>才能真正让他永不复生<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>说着<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>他望向自己的复制品<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>房间满是欢声笑语的人<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>在昏暗的角落<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>他在桌底给我比手势<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>他想进行一场无差别的屠杀<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>把荒唐的记忆从所有人的梦中抹掉<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我心领神会<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>很快<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>一个契机让灯灭了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>漆黑中<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我听到了杀戮和血<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>灯亮了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>死的是他<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>原来复制体之间共享思维<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>我迅速打开窗户跳下去<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>逃走了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>一路上我拼命跑<span class="bd-box"><h-char class="bd 
bd-beg"><h-inner>，</h-inner></h-char></span>到了一片沼泽地<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>月光下<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我回头看见了另一个我<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我看不清他的脸<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>他应该也是<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>我想起过往犯下的种种罪孽<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>有多少是他植入的潜意识<span class="bd-box"><h-char class="bd bd-beg"><h-inner>？</h-inner></h-char></span>我恨他<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>真的<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>“先别指责我<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>那次你不也玩得很开心吗<span class="bd-box"><h-char class="bd bd-beg"><h-inner>？</h-inner></h-char></span>” 脑海里他说<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span><br>“闭嘴<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>” 我气急败坏<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span><br>“对了<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>那个让她气疯的电话<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>你应该不知道她正在开车吧<span class="bd-box"><h-char class="bd bd-beg"><h-inner>？</h-inner></h-char></span>”<br>“是你让我打的<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>” 我蹲下来<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>狠狠地把地面砸出一个坑<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>土很软<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>我不再理睬他<span 
class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>转身往沼泽方向<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>在泥浆中深一脚浅一脚地艰难移动<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>他也跟了过来<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我们一前一后前往沼泽深处<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>不知过了多久<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我听到了警车声<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>过了一会儿<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>一个声音向我们喊话<span class="bd-box"><h-char class="bd bd-beg"><h-inner>：</h-inner></h-char></span></p><p>“你们是孪生兄弟还是幽灵人<span class="bd-box"><h-char class="bd bd-beg"><h-inner>？</h-inner></h-char></span>”</p><p>我俩都没有做声<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>“按照第三十三条法令<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>一旦发现幽灵人<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我们需要立刻射杀<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>包括和他们接触的所有人<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>防止更多人的记忆开始松动和生锈<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>你们到底是不是幽灵人<span class="bd-box"><h-char class="bd bd-beg"><h-inner>？</h-inner></h-char></span>”</p><p>“不是” 我们两个异口同声<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p><p>我想让他活着<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>在剩下的生命里一遍遍悔恨<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span>我感到他也笑了一下<span class="bd-box"><h-char class="bd bd-beg"><h-inner>，</h-inner></h-char></span>没错<span class="bd-box"><h-char 
class="bd bd-beg"><h-inner>，</h-inner></h-char></span>我们都一样恶毒<span class="bd-box"><h-char class="bd bd-beg"><h-inner>。</h-inner></h-char></span></p>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;p&gt;在许多夜晚&lt;span class=&quot;bd-box&quot;&gt;&lt;h-char class=&quot;bd bd-beg&quot;&gt;&lt;h-inner&gt;，&lt;/h-inner&gt;&lt;/h-char&gt;&lt;/span&gt;我的梦里都存在两个我&lt;span class=&quot;bd-box&quot;&gt;&lt;h-char class=&quot;bd</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>Dream</title>
    <link href="https://mazzzystar.com/2023/09/11/dream-en/"/>
    <id>https://mazzzystar.com/2023/09/11/dream-en/</id>
    <published>2023-09-11T16:54:18.000Z</published>
    <updated>2026-04-10T13:45:08.740Z</updated>
    
    <content type="html"><![CDATA[<p>Night after night, my dreams are haunted by two versions of me. At parties, when someone calls my name, I look up and always see the other me lifting his head too. He answers for me while I sit silently in the corner. Nobody thinks this is weird.</p><p>Two of me, always seeing the other when I open my eyes. These kaleidoscope dreams leave me dizzy and drained. There’s always a party going on, and I’m in the corner, drinking bitter booze and wrestling with who I am. In one dream, I bump into another poor soul in the corner, stuck in the same boat as me. I ask him if killing your double does any good. He says he’s tried - the one you kill in private just grows back instantly. He’s taken over your real self. Unless you do it in public, let everyone see you kill him, that’s the only way to make sure he never comes back. As he talks, he looks over at his own copy.</p><p>The room’s full of laughing, chatting people. In a dark corner, he signals to me under the table. He wants to go on a killing spree, wipe this crazy memory from everyone’s dreams. I get the idea. Soon, the lights go out. In the darkness, I hear killing and blood.</p><p>The lights come on. He’s the one who’s dead. Turns out copies share thoughts.</p><p>I quickly open a window and jump out, running away. I run like hell until I reach a swamp. In the moonlight, I look back and see the other me. I can’t make out his face, and he probably can’t see mine either.</p><p>I think about all the bad stuff I’ve done in the past. How much of it was him planting ideas in my head? I hate him, I really do.</p><p>“Don’t blame me, you had fun that time too, didn’t you?” he says in my mind.<br>“Shut up,” I snap.<br>“Oh, and that call that pissed her off so much? You didn’t know she was driving, right?”<br>“You made me do it.” I crouch down and punch the ground hard, making a hole. The soil’s soft.</p><p>I ignore him and turn towards the swamp, slogging through the mud. 
He follows me, and we trudge one after the other into the deep swamp. After who knows how long, I hear police sirens. A while later, a voice shouts at us:</p><p>“Are you twin brothers or ghost people?”</p><p>We both keep quiet.</p><p>“According to Law 33, if we find ghost people, we have to shoot them dead right away. Same goes for anyone who’s been in contact with them. It’s to stop more people’s memories from getting messed up and rusty. So are you ghost people or not?”</p><p>“We’re not,” we both say at the same time.</p><p>I want him to live, to spend the rest of his life regretting over and over. I feel him smirk too. Yep, we’re both just as nasty.</p>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;p&gt;Night after night, my dreams are haunted by two versions of me. At parties, when someone calls my name, I look up and always see the</summary>
        
      
    
    
    
    
  </entry>
  
  <entry>
    <title>Get users for your AI tool from Google search</title>
    <link href="https://mazzzystar.com/2023/07/23/Get-users-for-your-AI-tool-from-Google-search/"/>
    <id>https://mazzzystar.com/2023/07/23/Get-users-for-your-AI-tool-from-Google-search/</id>
    <published>2023-07-23T14:45:20.000Z</published>
    <updated>2026-04-10T13:45:08.740Z</updated>
    
    <content type="html"><![CDATA[<blockquote><p>Disclaimer: These are lessons I picked up in a short time as an engineer with no SEO background, so there might be some fundamental errors. If so, you are more than welcome to <a href="mailto:myfancoo@gmail.com">email me</a> to correct me.</p></blockquote><h2 id="creating-a website">Creating a Website</h2><p>Even if what you are developing is an app or desktop software, I still suggest you create a website for your product, which has two advantages: 1) Showcasing your product’s features in a richer way through diagrams, animations, videos, etc. 2) Enabling potential users to discover your product via Google search, which is a crucial source of user acquisition.</p><p>Once you have decided to make a website, the question becomes how to increase the chances that your target users click through to your website when searching for <code>chat ai</code>. This is where SEO (Search Engine Optimization) comes in.</p><p><img src="/images/2023-07-23/chat_ai_search_result.jpg" alt="Google search result for &quot;chat ai&quot;"></p><h2 id="pagerank">PageRank</h2><p>The ultimate goal of SEO is to rank your product higher in Google search. Thus, it is necessary to understand some of Google’s ranking algorithms.</p><p>Although Google no longer uses PageRank as its only algorithm, it is still an important ranking factor: PageRank assumes that “<strong>more important pages are cited by other pages more often</strong>”<sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup>. Similar to the impact factor of scientific journals, Google assigns a score (domain rating, <strong>DR</strong>) to each website: when several websites match the same keyword, Google ranks the ones with higher DRs higher, which makes sense.</p><p>So, how is the score calculated? 
A straightforward idea is to <strong>calculate the average score of the websites that link to your website</strong>; each such link is called a <strong>backlink</strong>. Seen this way, PageRank treats the web as a weighted graph: suppose three other websites link to yours, with DRs of 100, 10, and 1 respectively; then your website’s score would be <span class="markdown-them-math-inline">$(100 * 1 + 10 * 1 + 1 * 1) * coefficient$</span>. Clearly, more backlinks are not necessarily better: the quality of the websites linking to you matters more. Some algorithms only count the 100 highest-scoring domains that link to your website, so continuously registering domains to pile up backlinks is no longer effective.</p><p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/69/PageRank-hi-res.png/1920px-PageRank-hi-res.png" alt="https://en.wikipedia.org/wiki/PageRank"></p><p>The <a href="https://ahrefs.com">Ahrefs</a> website comments on domain rating as follows:</p><blockquote><p>You should try to get backlinks from high DR websites, as they have greater “weight”.<sup class="footnote-ref"><a href="#fn2" id="fnref2">[2]</a></sup></p></blockquote><h2 id="domain-rating">Domain Rating</h2><p>So, one of the key tasks in SEO is to increase the Domain Rating of your website. The next question is how to get backlinks from websites with a high DR.</p><p>You can write high-quality articles about your product on Reddit/Medium, or ask the authors of existing high-quality articles to add a link to your website. For example, if your product is an <em>mp3 converter</em>, find the top-ranking articles in Google results and email their authors to request the addition of your website link. 
Since they are review sites, they are unlikely to object to adding another link, and if your product is good enough, it may be included.</p><p><img src="/images/2023-07-23/music-converter.jpg" alt="Article ranked first in Google results"></p><p>Since this article is mainly written for AI tools, we will only discuss how to increase backlinks for AI tools here.</p><p>There is in fact a two-sided market: many new AI tools need exposure, and many AI tool listing websites need new content, so they are willing to set up submission portals for AI tools. Some sites are free, while others with high traffic charge a fee of $10-20. You can decide whether it is worth paying based on your judgment.<br><img src="/images/2023-07-23/aitoolsweb.jpg" alt="Screenshot of aitools.fyi website"></p><h2 id="choosing-ai listing sites">Choosing AI Listing Sites</h2><p>Here is my approach: First, visit <a href="https://www.similarweb.com/">Similarweb</a> and search for a tool listing site, such as aitools.fyi. You can see its traffic over the past few months, where visitors are coming from, and their demographic profiles.</p><p><img src="/images/2023-07-23/similarweb.jpg" alt="A tool listing site's data on Similarweb" title="similarweb"></p><p>The second step is to use the <a href="https://ahrefs.com/backlink-checker">Ahrefs backlink checker tool</a> to gauge the DR boost the listing site can give your site once you are included. Interestingly, you can also enter your own tool’s website to see which sites are backlinking to it. This way, you can get an intuitive sense of which sites have a high DR. Note that the Ahrefs DR score is not the same as Google’s, but the trend is similar.</p><p><img src="/images/2023-07-23/backlink_for_dolores.jpg" alt="A blog with a DR of 10 backlinked to my website"></p><p>In the above <a href="https://www.similarweb.com/">Similarweb</a> screenshot, you can see that it offers a <em>similar sites</em> feature, allowing you to quickly find sites similar to aitools.fyi. 
By repeating the two steps above, you can make a decision on whether to pay to submit your tool to this site.</p><h2 id="keywords-are key">Keywords Are Key</h2><p>I created an <a href="https://github.com/mazzzystar/api-usage">open-source</a> <a href="https://apiusage.info/">tool</a> that shows OpenAI API cost details for different models (GPT-3.5/Whisper, etc.), including their hourly consumption and cost proportion. However, no matter how I modified the site title or even directly named it “OpenAI API Cost,” the Google search result ranking for this keyword remained low.</p><p>Later, inspired by <a href="https://www.indiehackers.com/post/new-domain-ranked-1-on-google-within-two-weeks-heres-my-process-for-finding-easy-to-rank-keywords-fee75faede">this article</a>, I expanded the About page from a mere three lines into a <a href="https://apiusage.info/about">Blog</a> and included as many long-tail keywords as possible. The key was that I introduced the ability to analyze costs by switching between bar charts and pie charts. I quickly discovered that searching for “OpenAI API cost pie chart” ranked my site at #2.</p><p><img src="/images/2023-07-23/google-search-pie-chart.jpg" alt="Result second only to OpenAI's official website"></p><p>Simulating the potential keywords that users may use, and incorporating these long-tail keywords into the description of your site, is a good method.</p><h2 id="seo-guide from google [^3]">SEO Guide from Google <sup class="footnote-ref"><a href="#fn3" id="fnref3">[3]</a></sup></h2><p>Google only wants you to improve the loading speed and compatibility of your webpage. The faster and more compatible your website is, the better impression it will make on Google. Here are some techniques I have tried:</p><h3 id="preconnect">Preconnect</h3><p>This is about adding a line of <code>preconnect</code> code before your HTML page accesses a resource. 
For example, change:</p><div class="highlight"><pre class="code"><code><span class="hljs-tag">&lt;<span class="hljs-name">link</span>  <span class="hljs-attr">rel</span>=<span class="hljs-string">&quot;stylesheet&quot;</span>  <span class="hljs-attr">href</span>=<span class="hljs-string">&quot;https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0-beta3/css/all.min.css&quot;</span>/&gt;</span></code></pre></div><p>to:</p><div class="highlight"><pre class="code"><code><span class="hljs-tag">&lt;<span class="hljs-name">link</span> <span class="hljs-attr">rel</span>=<span class="hljs-string">&quot;preconnect&quot;</span> <span class="hljs-attr">href</span>=<span class="hljs-string">&quot;https://cdnjs.cloudflare.com&quot;</span> /&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">link</span>  <span class="hljs-attr">rel</span>=<span class="hljs-string">&quot;stylesheet&quot;</span>  <span class="hljs-attr">href</span>=<span class="hljs-string">&quot;https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0-beta3/css/all.min.css&quot;</span>/&gt;</span></code></pre></div><p>This lets the browser open the connection early, so the resource is fetched sooner and the page loads faster.</p><h3 id="media-resolutions">Media Resolutions</h3><p>If you use images/videos on your webpage, try to provide three resolutions (small, medium, and large):</p><div class="highlight"><pre class="code"><code><span class="hljs-tag">&lt;<span class="hljs-name">img</span>  <span class="hljs-attr">srcset</span>=<span class="hljs-string">&quot;small.jpg 600w, medium.jpg 1200w, large.jpg 1800w&quot;</span>  <span class="hljs-attr">width</span>=<span class="hljs-string">&quot;1242&quot;</span>  <span class="hljs-attr">height</span>=<span class="hljs-string">&quot;2688&quot;</span>  <span class="hljs-attr">sizes</span>=<span class="hljs-string">&quot;(max-width: 600px) 600px,            (max-width: 1200px) 1200px,            1800px&quot;</span>  <span class="hljs-attr">src</span>=<span 
class="hljs-string">&quot;large.jpg&quot;</span>  <span class="hljs-attr">alt</span>=<span class="hljs-string">&quot;Your image description&quot;</span>/&gt;</span></code></pre></div><p>Remember to include the actual <code>width</code> and <code>height</code> attributes; this helps prevent layout shift while the page is loading.</p><h3 id="replace-all png/jpeg with webp">Replace All PNG/JPEG with WebP</h3><p>The switch from PNG to JPEG needs no explanation, but I found that converting even already-compressed JPEGs to WebP, almost losslessly, reduces the image size by about 50%.</p><p><img src="/images/2023-07-23/JPEG2WebP.png" alt="JPEG converted to WebP, clarity unchanged, image size halved"></p><h3 id="`tailwindcss`-slimming"><code>Tailwind.css</code> Slimming</h3><p>If your website uses <code>Tailwind.css</code>, you will notice that it is quite large: even <code>Tailwind.min.css</code> is 2.8MB, yet 99% of the <code>CSS</code> styles it contains are not used by your site. 
GPT taught me how to slim it down as follows:</p><div class="highlight"><pre class="code"><code>npm <span class="hljs-keyword">install</span> tailwindcss
npx tailwindcss init</code></pre></div><p>Create a <code>tailwind.config.js</code> file with the content:</p><div class="highlight"><pre class="code"><code><span class="hljs-built_in">module</span>.exports = &#123;
  <span class="hljs-comment">// Tailwind v3 reads template paths from `content` (v2 called it `purge`)</span>
  <span class="hljs-attr">content</span>: [<span class="hljs-string">&quot;./src/**/*.&#123;html,js,jsx,ts,tsx&#125;&quot;</span>, <span class="hljs-string">&quot;./public/index.html&quot;</span>],
  <span class="hljs-attr">theme</span>: &#123;
    <span class="hljs-attr">extend</span>: &#123;&#125;,
  &#125;,
  <span class="hljs-attr">plugins</span>: [],
&#125;;</code></pre></div><p>Open the <code>package.json</code> file and write (replace the version with the one you installed):</p><div class="highlight"><pre class="code"><code>&#123;
  <span class="hljs-attr">&quot;dependencies&quot;</span>: &#123;
    <span class="hljs-attr">&quot;tailwindcss&quot;</span>: <span class="hljs-string">&quot;^3.3.3&quot;</span>
  &#125;,
  <span class="hljs-attr">&quot;scripts&quot;</span>: &#123;
    <span class="hljs-attr">&quot;build:css&quot;</span>: <span class="hljs-string">&quot;tailwindcss build -o dist/tailwind.css&quot;</span>
  &#125;
&#125;</code></pre></div><p>Then run:</p><div class="highlight"><pre class="code"><code>npm <span class="hljs-builtin-name">run</span> build:css</code></pre></div><h3 id="using-svg icons">Using SVG icons</h3><p>Another commonly used <code>CSS</code> file comes from Font Awesome, which contains many icons, such as Twitter/GitHub/Discord. 
The size of this <code>CSS</code> file is 84KB, but if you only use 1-2 icons, you can download the SVG versions from <a href="https://www.svgrepo.com/">svgrepo</a> instead, each around 1-2KB. Replacing the <code>CSS</code> file with SVG icons helped me save approximately 100ms.</p><h3 id="async-/ image lazy loading">async / Image Lazy Loading</h3><p>Load your non-essential <code>scripts</code> with the <code>async</code> attribute so that they do not block page loading:</p><div class="highlight"><pre class="code"><code><span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">async</span> <span class="hljs-attr">src</span>=<span class="hljs-string">&quot;https://www.googletagmanager.com/gtag/js&quot;</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span></code></pre></div><p>If an image does not appear on the first screen of the webpage, you can add <code>loading=&quot;lazy&quot;</code> to its attributes, allowing the webpage to open without waiting for that image to load.</p><div class="highlight"><pre class="code"><code><span class="hljs-tag">&lt;<span class="hljs-name">img</span> <span class="hljs-attr">src</span>=<span class="hljs-string">&quot;imgs/img.webp&quot;</span> <span class="hljs-attr">...</span> <span class="hljs-attr">loading</span>=<span class="hljs-string">&quot;lazy&quot;</span> /&gt;</span></code></pre></div><p>I applied the optimizations above to my <a href="https://dolores.app/">website</a>; here is the comparison before and after:<br><img src="/images/2023-07-23/pageload.jpg" alt="Comparison before and after optimization"></p><p>The performance analysis tool shown above is <a href="https://pagespeed.web.dev/">PageSpeed Insights</a>, an official Google product. 
It is a convenient way to analyze where loading time is being spent.</p><h3 id="mobile-friendly">Mobile-friendly</h3><p>Use a responsive design framework (such as Tailwind CSS) and test on both desktop and mobile devices. Make sure your webpage fits both screen sizes so that Google scores it higher.</p><p>The above is just the tip of the iceberg in SEO, but it is all the experience I can share at the moment. I hope it helps you increase the exposure of your AI tools.</p><hr class="footnotes-sep"><section class="footnotes"><ol class="footnotes-list"><li id="fn1" class="footnote-item"><p><a href="https://en.wikipedia.org/wiki/PageRank">PageRank (Wikipedia)</a> <a href="#fnref1" class="footnote-backref">↩︎</a></p></li><li id="fn2" class="footnote-item"><p><a href="https://help.ahrefs.com/en/articles/1409408-what-is-domain-rating-dr">What is Domain Rating (DR)?</a> <a href="#fnref2" class="footnote-backref">↩︎</a></p></li><li id="fn3" class="footnote-item"><p><a href="https://developers.google.com/search/docs">Google Search Central documentation</a> <a href="#fnref3" class="footnote-backref">↩︎</a></p></li></ol></section>]]></content>
    
    
      
      
        
        
    <summary type="html">&lt;blockquote&gt;
&lt;p&gt;Disclaimer: This is some experience I have discovered in a short time as an engineer without any SEO basis, so there might</summary>
        
      
    
    
    
    
  </entry>
  
</feed>
