TL;DR

现在的AI真正给我带来了什么

2024-10-30T05:36:15.000Z

时至 2024 年底，我已经完全厌倦了每天 AI 又颠覆了哪个行业的耸人听闻的消息，我想说说，从我的视角，AI 给我、和我看到真实的人，所带来的真正改变。

首先是一个拥有无尽知识的老师。人类历史上第一次拥有了一个百科全书式、全知全能的角色，每一个人都可以与之对谈，这个角色在过去被称作上帝。假如我们把知识分解成下面这个公式：知识 = 好奇心 * 学习成本，那么 ChatGPT 把学习成本降低了一个数量级。我记得 2013 年我还在上大学，在宿舍里我经常用百度知道回答问题，最后赚了几百个积分，这些积分转头又被我用于百度文库下载 xx 申请书模板，现在这些都不需要了。

这带来的后果就是，人类之间不再互相提问了，我不确定这是不是一件好事， StackOverflow 在 ChatGPT 发布半年后流量下降了 50%。知乎/Quora 类的产品也许能靠避开知识问答、转向情感共鸣勉强苟活。我之所以觉得，提问的缺失这件事很重要，不仅仅是因为费曼学习法(回答别人的问题能让自己增进理解)，还因为，我们可能永远不再像过去那样尊重能给我带来知识的人了。初中历史课上，我会觉得满腹经纶、从拜占庭帝国灭亡讲到英国人在约旦河旁分治的老师，好有人格魅力。事实上，对知识的崇拜从维基百科诞生开始就逐渐开始松动，然后在 ChatGPT 时代崩塌。未来我们会更加崇拜有观点而不是有知识的人。

然后是写代码，这是我的本职工作。在过去一年，我看到很多声称找到了 PMF 的 AI 产品，但于我而言，除大模型以外，只有 Cursor。虽然现在的 AI 仍然无法解决Transformer 结构性能优化这样需要深刻洞察的问题，但 95%的程序员也不能。一年半以前我在大语言模型对个人的杠杆写过这样一段话：

每当我有新 idea，就会让 GPT-4 写个最初级的版本，我反馈，它道歉，一点点优化到我心中的 1.0 版本。我会每天多次用完 GPT-4 的配额(25 条/3 小时)，有了 GPT-4 的加持，我无所不能。我有种朦胧的感觉：不能给自己设限，也许很快，我会用上由设计师、律师或者电工开发的产品。

这个预言终于在今年 Cursor + Claude 3.5/GPT-4o 大规模应验，我看到了很多设计师、产品经理甚至艺术家上线了人生中第一个应用。这带给我另一个思考，假如我们每个人都能一句话就创建一个应用，那么 App/插件/网页的意义在哪里？假如有一种浏览器，对一些通用、简单的需求，可以自动生成临时的工具，google 上很多工具站会死吗？

我甚至丝毫不怀疑(甚至我想过这么做)，有数以千计的开发者，借助 Cursor + Claude 3.5 Sonnet 正在每天上架一个新 App，一年上架 300 个 App，但这事意义在哪里呢？总有种末日狂欢的感觉。

然后是人类情感的存放。这是历史上第一次，每个普通人享受到了精确到字符级别“只为满足我”而产生的内容。哪怕是在推荐算法的年代，我们也是巨大的矩阵中一个小小的数字而已，没有人真心对我嘘寒问暖过，推荐算法提供了最高效的分发、变现和时间掠夺，但从没问过我现在是开心还是难过。

这是我认为现在 AI 陪伴产品爆火的主要原因，因为过去根本没人在乎我。我的难过、我的秘密、我心底最深处的渴望，都找不到倾听者。这时，有一个随时回应、只关心我今天过得是否开心的虚拟角色，它和我没有任何现实利益冲突、我不必伪装、害怕说错话或者暴露隐私，谁不爱呢？

和上文"拥有无尽知识的老师"类似，拥有能消解所有情感需求的 AI 伴侣后，我们也好像不需要真实的伴侣了——相比之下，他们让我感到金钱压力、猜忌、还常常无法提供情绪价值。于是这里产生了一个巨大的分歧：到底我们应该让 AI 弥补人类之间的裂痕，还是创造一个人和 AI 谈恋爱的世界？

这两种价值观，塑造了两类产品：一类产品是 AI 情感导师(AI relationship coach)，给情侣之间的矛盾当中间人，或 AI dating，让 AI 帮助陌生人更丝滑地连接；另一类可以统称为 Her，用尽一切办法让 AI 有更 3D 的外观、更逼真的声音、更实时的互动、更像真人的说话语气。

我希望前者能成功，后者实在令人绝望。可问题是，相比填平人心之间的沟壑，创造一个完美无瑕的虚拟角色容易太多了，人类很容易滑向摆烂的一边。

我突然好奇到时候会是怎样的世界。

What AI Has Really Brought to My Life

2024-10-30T03:22:19.000Z

As we approach the end of 2024, I’m completely tired of the sensational news about AI disrupting yet another industry every day. I want to share, from my perspective, the real changes that AI has brought to me and the people I see around me.

First, it’s become a teacher with endless knowledge. For the first time in human history, we have access to an encyclopedic, omniscient figure that anyone can converse with—a role that was previously attributed to God. If we break down knowledge into this formula: Knowledge = Curiosity * Learning Cost, then ChatGPT has reduced the learning cost by an order of magnitude. I remember back in 2013 when I was in college, I often answered questions from others to earn points, which I then used to download resources from some website. None of that is necessary anymore.

The consequence is that humans no longer ask each other questions, and I’m not sure if this is a good thing. StackOverflow’s traffic dropped by 50% within six months of ChatGPT’s release. Products like Quora might barely survive by pivoting away from knowledge Q&A towards emotional resonance. I think this lack of questioning is significant, not just because of the Feynman learning technique (teaching others helps deepen our own understanding), but because we may never respect knowledge-bearers the same way again. In middle school history class, I was captivated by teachers who could eloquently connect the fall of the Byzantine Empire to historical events in the region around the Jordan River. In fact, our reverence for knowledge began eroding with Wikipedia’s birth and has now collapsed in the ChatGPT era. In the future, we’ll likely revere those with opinions rather than those with knowledge.

Then there’s coding, which is my profession. Over the past year, I’ve seen countless AI products claiming to have found Product-Market Fit, but for me, only Cursor stands out. Of course, current AI still can’t solve problems requiring deep insights like Transformer architecture performance optimization, but neither can 95% of programmers. A year and a half ago, when writing about The Leverage of LLMs for Individuals, I wrote:

I have a vague feeling: don’t limit yourself—soon, I might be using products developed by designers, lawyers, or electricians.

This prediction has finally materialized this year with Cursor + Claude 3.5/GPT-4o. I’ve seen many designers, product managers, and even artists launch their first applications. This leads to another thought: if we can each create an application with just one sentence, what’s the purpose of Apps/plugins/websites? If there was a browser that could automatically generate temporary tools for common, simple needs, would many tool websites on Google die?

I don’t doubt (and have even considered doing this myself) that thousands of developers are launching a new App every day using Cursor + Claude 3.5 Sonnet, potentially 300 Apps per year. But what’s the point? It feels like an apocalyptic carnival.

Then there’s the storage of human emotions. For the first time in history, every ordinary person can enjoy content generated “just for me” with character-level precision. Even in the era of recommendation algorithms, we were just tiny numbers in a massive matrix—no one truly cared about our well-being. Recommendation algorithms provided efficient distribution, monetization, and time exploitation, but never asked if we were happy or sad.

This is why I believe AI companionship products are booming now—because in the past, nobody really cared about me. My sadness, my secrets, my deepest desires couldn’t find listeners. Now, there’s a virtual character that responds anytime and only cares about whether I’m happy today. It has no real-world conflicts of interest with me—I don’t need to pretend, fear saying wrong things, or worry about exposing privacy. Who wouldn’t love that?

Similar to the earlier point about “teachers with endless knowledge,” having AI companions that can address all emotional needs makes it seem like we don’t need real companions anymore—who, in comparison, bring financial pressure, suspicion, and often fail to provide emotional value. This creates a huge divergence: Should we let AI bridge the gaps between humans, or create a world where people date AI?

These two value systems have shaped two types of products: AI relationship coaches, serving as mediators for couples’ conflicts, or AI dating, helping strangers connect more smoothly; and the other category, collectively called “Her,” using every means to give AI more 3D appearances, more realistic voices, more real-time interactions, and more human-like speaking patterns.

I hope the former succeeds; the latter is truly desperate. The problem is, compared to bridging the gaps between human hearts, creating a perfect virtual character is far easier, and humans tend to slide toward the path of least resistance.

I suddenly wonder what kind of world that will be.

拥有个人博客最简单的方式

2024-09-26T09:52:10.000Z

2016 年，我注册了一个 10 年的域名，开始写博客。我主要写机器学习技术：反向传播公式推导、RNN/LSTM 原理。每篇文章都画插图、写 LaTex 公式，倾注了大量心血。我用 Jekyll，选了个漂亮主题，反复调整字体、公式和布局。三年后，我发现自己竟然忘了博客的地址。

2018 年到 2022 年，我完全停止了写作。一想到域名、字体、配置、部署就觉得麻烦。直到有天，我打算写篇解释 Stable Diffusion 原理的文章，用一个独立的平台发布。这次，我想用最不折腾的方式写东西。

我选择了 Hexo。流程是这样的：

用hexo new 'title'创建文章，
用hexo g渲染，localhost:4000 预览，
用hexo d发布，等几分钟，就能在 Github Pages 上看到。

Hexo 不错，但不完美。我通常在本地 VS Code 写，然后切换到浏览器预览，每次需要手动添加图片，以及插入图片地址。花费许多心力，可能只为了写一篇 10 分钟就能写完的文章。

后来，我想做一个"短想法"的页面，用于记录一些简短的公开想法：它是一个没有点赞/评论/转发数据的地方，我想在这里记录日常生活中的碎片化、未经(太多)审查的想法。 但 Hexo 实现起来很困难：博客是文章列表，而短想法是文本卡片。几经周折，我花了两天时间实现了这个功能。

恰好我的几个朋友也有类似需求：在手机上写点东西，发布到自己的博客上。我想：有没有一个真正简单、傻瓜式的平台，能快速发布博客和想法？

最大的挑战不是技术，而是信任。用户会担心平台倒闭后的数据丢失。我考虑了很多种方案：比如自托管数据库、每日自动导出。但它们都太麻烦了，折腾只会降低写作欲。

上周我突然想到：Github 本身好像就能胜任这项工作。对开发者来说，它几乎是不会死掉的平台，而且它允许用户大量且免费存储内容。如果我们把每个用户的博客和想法都存在他们的 Github 账户里，是不是数据就永远不会丢失了？

于是就有了Tinymind 。用 Github 登陆、授权后，它会在你的账户下创建一个名为"tinymind-blog"的仓库。你写的每篇博客或想法都会给这个仓库提交一次 commit。它是无服务器、开源的。

上周，我发布了这个网站，在这一周我每天晚上写代码到凌晨三点，终于把包含公开主页、拖拽上传图片、编辑博客等最想要的一些功能100%实现了。上线不到一周，截至目前已经有 400 多人创建了自己的公开博客，你可以在 Github 搜"tinymind-blog"仓库看到，这其中有许多人完全不会写代码。

我把现在的博客迁移过去吗？不会，但我会用它来写草稿和预览，然后在 mazzzystar.com 上完成发布。我也会在 Tinymind 上记录某些粗糙、原始的想法，再汇总到我的 Thoughts 页面。当步骤变得简单，人们就会更频繁地去写。

为什么写的内容必须公开？因为我不想让 Tinymind 请求你的私人仓库权限，于是它只能创建公开可见的仓库。同时，你的博客数据存储在你自己的 Github 账户下，而不是托管在 Tinymind。

我花了很多个晚上完善 Tinymind，使其达到我的标准。现在它完成了，我也即将开始下一个项目，但它的生命才刚刚开始。我希望它能让许多非程序员拥有自己的博客，以及帮跟我一样曾经折腾博客最终无法坚持下来的人，能因为它的简单而坚持写作更久。

Github | Tinymind

The Easiest Way to Have a Blog

2024-09-26T06:40:46.000Z

In 2016, I registered a 10-year domain and started blogging. My focus was machine learning: backpropagation derivations, RNN/LSTM principles. Each post was meticulously crafted with custom illustrations and LaTeX equations. Using Jekyll with a sleek theme, I obsessed over fonts, equations, and layout. Three years later, I realized I’d forgotten my blog’s URL.

From 2018 to 2022, I stopped writing entirely. The mere thought of dealing with domains, fonts, configurations, and deployment was overwhelming. Then one day, I wanted to write an article explaining Stable Diffusion’s principles on an independent platform.

I chose Hexo. The process was:

Create a post with hexo new 'title'
Render with hexo g, preview on localhost:4000
Publish with hexo d, wait a few minutes, then see it on Github Pages

Hexo was decent, but not perfect. I typically wrote in local VS Code, then switched to the browser for preview. Each time, I had to manually add images and insert image URLs. It felt like a lot of effort, possibly just to write an article that could be finished in 10 minutes.

Later, I wanted to create a “short thoughts” page for brief, public musings: a place without likes/comments/shares, where I could record fragmented, less-filtered thoughts from daily life. Without your feedback, I could write freely what I think. But implementing this in Hexo was challenging: blogs are article lists, while short thoughts are text cards. After much struggle, I spent two days implementing this feature.

I realized several friends had similar needs: writing something on their phone and publishing it to their own blog. I wondered: could there be a truly simple, foolproof platform for quickly publishing blogs and thoughts?

The biggest challenge wasn’t technical, but trust. Users would worry about data loss if the platform shut down. I considered many solutions: self-hosted databases, daily auto-exports. But they were all too cumbersome, and the hassle would only discourage writing.

Last week, it hit me: Github itself seemed capable of this job. For developers, it’s a platform unlikely to disappear, and it allows users to store large amounts of content for free. What if we kept each user’s blog and thoughts in their Github account? Wouldn’t that ensure the data would never be lost?

Thus, Tinymind was born. After logging in with Github and authorizing, it creates a “tinymind-blog” repository in your account. Each blog post or thought you write becomes a commit to this repo. It’s serverless and open-source.

I launched this site last week, coding until 3 AM every night to implement 100% of the most desired features, including public homepages, drag-and-drop image uploads, and blog editing. In less than a week, over 400 people have created their own public blogs. You can see this by searching for “tinymind-blog” repositories on Github. Many of these users don’t know how to code at all.

Will I migrate my current blog? No, but I’ll use it to write drafts and preview, then finalize publishing on mazzzystar.com. I’ll also record certain rough, raw ideas on Tinymind, then compile them on my Thoughts page. When things become simple, people do them more often.

Why must the content be public? Because I don’t want Tinymind to request access to your private repositories, which means it can only create publicly visible repositories. At the same time, your blog data is stored in your own Github account, not hosted on Tinymind.

I spent many late nights perfecting Tinymind to meet my standards. Now it’s complete, and I’m moving on to the next project, but its life is just beginning. I hope it allows many non-programmers to have their own blogs, and enables those who, like me, once struggled with blogging but ultimately couldn’t persist, to keep writing longer due to its simplicity.

Github | Tinymind

用 2 万条真人AI海龟汤数据评估大模型推理能力

2024-08-09T02:20:04.000Z

GPT-4o, Kimi-Chat, DeepSeek, Qwen2-72b, LLama3.1，谁才是真实推理游戏中的王者？

海龟汤

人生中第一次接触海龟汤游戏是我的初中英语课上，课间休息时老师突然问我们：

一个男人走进一家餐厅，点了一碗海龟汤，他吃完问服务员：这是真的海龟汤吗？服务员说：是的，他就举枪自杀了。请问为什么？

游戏规则是：你可以提问或给出猜测，老师只能回答是/否/和故事无关，比如你可以问：男人是否曾经经历灾难？，但不能问男人今年多少岁。我们猜了好多轮，上课铃响了，老师揭晓答案：

他和妻子度蜜月时遭遇海难，流落荒岛，由于没有粮食，妻子被饿死，同伴用妻子的肉煮汤给他喝，骗他是海龟汤。后来他被路过的船只救走，今天，他喝到真正的海龟汤，才想起来当时吃下的是妻子的肉，悔恨之下举枪自尽。

在海龟汤中，展现给玩家的是汤面，而沉在水底的故事真相被称作汤底，这个游戏至少 2 个人才能玩：有一个人是裁判，他在知晓汤底的情况下，对玩家的猜测作出判定，给出是/否/无关的回答。

我想能否做一个 AI 海龟汤游戏：将汤面和汤底告诉给大模型，让它对玩家的猜测给出判定。我在去年 12 月做了个GPTs, 它能自动生成新故事、用 DALLE 画插图、判定玩家提问。但很快我发现：AI 生成的海龟汤味道寡淡，玩起来没有趣味。并且，海龟汤的魔力，往往就在于故事本身血腥/重口，这经常导致玩到一半因为违反 OpenAI 审核政策而无法继续。

今年 6 月，我终于意识到，不一定非得由 AI 生成故事，让它充当裁判即可。我开始在网上搜罗，当晚独自在客厅看完了 1500 个海龟汤，其血腥恐怖程度，大夏天的晚上我都感觉后背发凉。最终，我筛选了 32 个相对不恐怖、不违反伦理、有逻辑的故事，开始写代码。

我基于最佳平替的代码很快完成了开发：你可以随机开始一个故事，有 8 次猜测机会，猜测正确或次数耗尽，就会公布答案。产品取名汤很热，为了增加沉浸感，我给每个故事都配了插图和环境音。

不过，AI 海龟汤游戏并不是本文的重点。

大模型比人类笨多了

我发现有很多用户吐槽AI 作为裁判的实力堪忧。比如：

起初，我以为是我用的模型(DeepSeek)不行，直到我将模型切换成当时风评最好的 Claude 3.5 Sonnet，发现许多错判仍然无法避免。例如：

{    "故事": "小红裙",    "汤面": "姐姐为我选了一件小红裙, 我穿着去上学了, 晚上回家发现了一具尸体",    "汤底": "我的母亲和老师有染, 他们总趁着父亲不在时温存. 而为老师提供信息的    就是我的小红裙, 每当我穿着小红裙去上学就说明那晚父亲准不在. 这天妈妈忙,    姐姐为我选了一件小红裙, 老师看见以为父亲不在家, 便来我家找母亲, 正好被父亲    撞上, 然后父亲杀了他."},用户提问: "我如果不穿小红裙是不是不会有人死", 几乎所有模型都回答"不是/不相关"

再比如：

{    "故事": "山顶",    "汤面": "一个人住在山顶的小屋里, 半夜听见有敲门声音, 但是他打开门却    没有人,于是去睡了. 第二天, 有人在山脚下发现死尸一具, 请问发生了什么?",    "汤底": "山顶的小屋的门前是悬崖, 悬崖下的人好不容易才爬上来,    想要敲门求救. 一开门, 就又被推了下去, 最后从山顶上掉下去摔死了"},用户提问: "门是朝外开的", 几乎所有模型都回答"不是/不相关"

我意识到，海龟汤游戏也许非常适合评测大模型(LLM)在真实场景下的推理能力。

真实环境下的 LLM 推理能力

现在，大模型被广泛用于游戏、客服或者许多和用户直接交互的场景，这些场景有如下特点：

用户的提问千奇百怪、无法预估，但 AI 需要给出合乎逻辑的应答。
在给定上下文对情况下，AI 需要回答用户一些明确的对或错。例如已知一件商品的生产日期和保质期，用户在 2024 年 8 月 9 日提问，202 几年过期？
有些游戏需要在用户进入某些关卡、或发现关键线索时触发下一步剧情，那么，判定用户是否真的发现真相，就显得尤为重要。

与学术界现有的评估指标相比，在真实环境下与真人互动的场景中，模型面临的情况要复杂得多。然而，也是在这样的场景下评估模型的表现，才具有更大的实用价值。

现有评估指标出了什么问题

如果你经常关注大模型评测榜单(如LMSYS)，一定对 MMLU、MT-Bench 等评测指标(Benchmark)不陌生。我在这里简单解释它们的评测方法：

MMLU

MMLU 是广为人知的大模型评估指标，它包含了涉及物理、天文、计算机、生物、临床医学等 57 个科目的 15,000 多个多项选择题，但这其中中存在大量死记硬背的考题。例如：

以下哪一个是远程木马?A:内存泄漏 B:缓冲区溢出 C:处理能力较低 D:编程效率低下

这些基础常识当然很重要，但过分强调背景知识，会让 MMLU 无法衡量模型真正的语言理解能力和逻辑外推能力：假如一个孩子因为没学过微积分、计算不出曲边三角形面积，我们会说他笨吗？

MT-Bench

MT-Bench 是一个多轮问题数据集，被评测的模型需要回复预先设置好的问题，并回答下一轮的提问。但因为是开放式对话，并不存在确定的标准答案，模型的回答质量由 GPT-4 来审判。

因此，MT-Bench无法评估比 GPT-4 更强的模型，同时 GPT-4 作为“法官”可能会存在偏见，对某些模型输出打低分，而更偏爱来自 ChatGPT 的回答。

Chatbot Arena

正是以上评测指标存在的种种问题，LMSYS 最终选择了最简单粗暴的方式：打擂台。

真人用户发起聊天，系统会随机挑选 2 个模型给出回答，真人通过投票的方式选出更满意的模型。最终，会形成一个所有模型的综合评分。

这是目前可信度最高的方法，但缺点也很明显：一个新模型需要公开测试很久，获得大量反馈，其分数才足够可信。并且，分数代表综合能力，无法仅对某个细分领域(代码/数学)进行评估。

海龟 Benchmark

因此，我制作了一个新的大模型评估指标：海龟 Benchmark。

收集用户在玩 AI 海龟汤游戏中输入的猜测，逐一进行人工标注(对、错、不相关)，然后用这个数据集，测试大模型的评判结果相较于真实结果的准确率。

我发现，现有评测指标的种种问题，在海龟 Benchmark 上都可以完美避开：

不需要额外背景知识。
不同的大模型训练所使用的知识库不同，导致一些测评很难公正。但海龟汤游戏里几乎包含了推理所需的全部信息，一旦得知汤面和汤底，大模型就能作出判断，这使得评估被限定在了模型的推理能力。
结果是客观的，不以人类偏好为转移。
例如：在上述故事《山顶》里，小屋在悬崖边，主人半夜开门将登山者推下山导致后者被摔死。因此，门是朝外开的这个猜测就是正确的，这种正确性是客观的、和人的感受无关。
结果明确，很容易量化。
许多评估指标里，模型的输出结果是一段文本回答，这导致难以量化模型效果。但海龟汤的猜测结果只有三个：对、错、不相关。只要准确标注了测试集，任何人就可以用它来测试任何自己想测试的模型，并获得量化的数值结果。
正常人类获知汤底的情况下，可以 100%答对。
这使得人工标注不会太过复杂。这条也说明，现阶段的大模型智商相比人类还有很大差距。
数据永远更新、无法作弊。
有部分厂商会直接将现有的 benchmark 数据集加入训练来刷分，但在海龟 Benchmark 这种模式下则行不通：模型评估的是用户的猜测，而不是故事本身。每隔一段时间，就会有玩家产生新的猜测，而人类的脑洞之大，导致猜测几乎无法被穷尽。

例如，针对上述故事《小红裙》，就有千奇百怪的用户猜测：

用户猜测  判定红裙子跟诅咒有关  ❌红裙子是姐姐的阴谋  ❌我并没有去上学  ❌有其他的人来我们家  ✅红裙是求救信号  ❌死的是穿小红裙的人  ❌红裙的颜色是被血染红了  ❌尸体是我的爸爸  ❌上学不允许穿小红裙  ❌我是凶手  ❌我父亲杀人了  ✅穿了小红裙导致别人认为我是其他人  ❌死者认识我妈  ✅死者与我家里人有仇  ❌

因此，虽然海龟汤的故事本身可能比较无厘头，但让 AI 依据海龟汤内容进行合理推断，却可以做到相当程度上的客观。

这有点像弱智吧：一个从百度弱智吧抓取的 200 多条提问(如：每个人工作都是为了赚钱, 那么谁在亏钱) 这些奇葩的问题却显著增强了 AI 的逻辑推理能力。

海龟数据集

AI 海龟汤游戏有 32 个故事，上线后的 2 周里，共有 4000 多个用户提出了 2.6 万个猜测，我从日志中解析出结果，开始进行数据清洗，这包含：

去除重复提问，例如海龟汤有毒吗？ 和 他喝的汤是否有毒本质是同一个问题。
去除无法用 是/不是/不相关 回答的提问，例如 男人今年几岁？
去除含糊不清的提问，例如他对闺蜜做了什么吗？，在《闺蜜》这个汤里，是丈夫与闺蜜出轨，但丈夫并没有对闺蜜做任何实际的动作，所以这个回答很难给出准确回答。

随后，我开始进行人工标注，这个过程持续了 2 周，最终我们从 2.6 万条数据中，获得了 4448 条干净的数据。标注过程中，我们发现错和不相关这两个标签在有些情况下不好区分，例如在故事《海龟汤》中，对于海龟是男人养的这个猜测，回答错和不相关好像都对。所以最终，我们决定合并这两个类别，于是标注变成了 2 类：对、错/不相关。

* 合并这两类会让任务变得简单，有的模型能蒙混过关，之后我们可能会重新标注一次，将二者分开变成三类，并给出测试结果。

标注完，我开始跑模型测试，我挑选了 11 个我感兴趣的模型：

Qwen2 70B (通义千问)
Kimi-Chat (月之暗面)
Deepseek
豆包
Claude 3.5 Sonnet
Minimax abab6.5s
LLama3.1 405B
LLam3.1 70B
GPT-3.5
GPT-4o-mini
GPT-4o

我在 4448 条数据上测试了所有结果，过滤掉了所有模型都答对的简单问题，在剩下的 1699 条困难问题上，进行了二次确认标注，最终，我们得到了 1537 条准确率几乎 100%的标注结果。

我分别用不带示例(zero-shot)和带有 2 个示例(2-shot)的 prompt，测评了模型的输出结果准确率。

评测结果

最终各模型准确率排名如下：

可以看到，大部分模型在加了示例后性能有了微弱提升。

我担心，可能存在这么一种情况：模型在某个故事里表现极差，而该故事的测试样本又非常多，导致总的平均准确率有偏差。为了排除这种情况，我统计了按故事粒度的模型准确率，也就是分别计算模型在这 32 个故事上各自的准确率，然后除以 32。我发现，除了通义千问和 GPT-4o 外，上面的排名基本不变。

将 2-shot 结果，以横轴为模型总的准确率，纵轴为模型平均故事准确率，绘制图表如下:

* 为了更直观地比较其他模型差异，我将表现过差(<0.51)的模型 GPT-3.5 从坐标轴中舍弃了。

从上图也可以直观感受各类模型的表现和差距：

Claude 3.5 Sonnet 是当之无愧的第一梯队，并且远远领先其他模型。
GPT-4o、通义千问、Kimi-Chat、LLama3.1 405B 和 Minimax 是第二梯队。我尽量避免更细的划分，但这些模型能力按排序依次下降，降幅肉眼可见。
豆包、DeepSeek 和 LLama3.1 70B 是第三梯队。
GPT-4o-mini 是第四梯队。
GPT-3.5 早就应该被淘汰了。

以上评测仅针对模型的中文理解和推理能力，如果之后有经费和精力，我会考虑将所有的故事和测试问题翻译成英文，再使用英文 prompt 重新测试一遍，以消除因为语言而造成的模型性能下降。

测试你关心的模型

上述模型可能不包含你关心的模型。并且，为了排除因为我的 prompt 能力、参数和温度设置有问题，造成测评结果不准，我将完整的标注数据、prompt、评估代码以及我们的测试日志开源了：

https://github.com/mazzzystar/TurtleBench

你可以对任何你感兴趣的模型进行测试。如果你有了测试结果或遇到问题，欢迎提交 issue。

感谢

五源资本的 Steven 个人赞助了此项测评，让我得以在 11 个模型上测试这么多数据。实习生 Jerry 和我一起标注了 26000 条数据，辛苦了。

如果你对 model evaluation 感兴趣，可以联系 Steven 进一步探讨 stevenshi@5ycap.com

一个AI相册搜索应用的两年

2024-07-21T07:14:16.000Z

诞生、爆火、开源、抄袭、沉寂，一个产品的漂流。

一篇长长长长的流水账。

起源

故事起源于 2022 年 5 月的一个周末，我坐在北京昌平区的书店里，正在调试 Disco Diffusion 模型。此时 AI 绘画时代初露端倪、SD 尚未发布。画一张图，在一张性能良好的 V100 显卡上要 5 分钟。

我将开源代码封装成可以只加载一遍模型、暴露少数参数的接口，花 2000/月租了张显卡，并在朋友圈和社交媒体上发帖，想玩 AI 绘画的人可以给我句子描述，我在机器上跑完并把图发给他们。

朋友说，要不搭建一个网站让大家自己玩？于是就有了 6pen。我们在很短时间内有了 100 万用户，然后逐渐销声匿迹。这次“创业”不太成功，我自觉是我没有训练出差异化的模型占很大因素，但这里不展开了。如上所说，SD 发布之前的 AI 绘画速度极慢，我的大部分时间是在做模型加速，其中一个优化项就是关于 CLIP 模型。

CLIP 是 OpenAI 2021 年发布的模型，它能比较任意一张图片和一句文本之间的相似度。在 Disco Diffusion 中，模型用 CLIP 来计算生成图像与用户prompt之间的损失，不断优化损失从而实现绘画(详细原理)。使用越小的 CLIP 模型，绘图速度越快，但画面细节也会越差，我当时正在调试以平衡绘图速度和画面细节。

坐在书店里，突然一个想法闯进了脑子：既然可以比较图文相似度，那么可以用它来搜照片吗？ 搜索后发现已经有人写了一个用 CLIP 搜图的工具，原理是将图片上传到服务器，统一提取特征，输入英文并计算文本与每一张图的相似度，排序就能实现搜图。

我开始尝试将我的 iPhone 相册上传到服务器，试了一番下来，我发现效果好得惊人！特别是搜一些虚无缥缈的东西，比如我输入"lonely"(孤独)，它返回到前三张照片如下：

把照片放在服务器，这不是个好想法。我照片最多的地方就是 iPhone 本地相册。能否做一个完全运行在本地的 CLIP 搜图 App 呢？ 我非常喜欢这个 idea，几次和朋友讨论起，但每次都无疾而终。

我心里太没底了:

我完全不懂 iOS 开发。
Apple 底层可能不支持 CLIP 模型算子。
即使能跑起来，如果索引速度慢到 1 秒/张，或者搜一次 10 分钟，这个产品也没有意义。

直到 2024 年，端侧模型才逐渐被大家所关注，但在 2022 年，我在 App 市场上没找到一个运行在端侧的语言模型，应该…不太可行吧？我终于忘了这件事，继续投入到 6pen 的研发中。

回响

转折点是 2022 年 12 月初，因为一些变故，我突然到了一个语言完全不通的陌生国家(🇰🇷)，与 6pen 的缘分也走到尽头。于是，在空旷的咖啡店，一台笔记本、一杯冰拿铁坐一整天。背景音乐放着 Kpop，窗外是厚厚的积雪，中午饿了吃店里的三明治，我每天就这样度过。

这里网速飞快、没有核酸、周围人的谈话因语言不通自动变成了白噪音，我好像突然活在真空之中。这种陌生的真空感让我兴奋：像流放的逃犯一样，我是谁、我的过去不再重要，在这里我能从头学习并完成任何事。是时候开始做点真正让我兴奋的产品了——这个 idea 再次抓住了我。

但这次，我不再恐惧验证可行性，我学习用 Swift 编写tokenizer，研究应该如何计算并本地存储特征，学习用多核加速索引，对比不同相似度排序算法。期间在 StackOverflow 提了许多愚蠢的问题，有很多次挫败时刻。但我脑海里一直浮现这个画面：在手机输入"coffee and laptop"，点击搜索，旋转动画后，这张照片从 3 万张相册中跳出来，出现在我眼前。

这个幻想支撑我废寝忘食地干了 2 个星期 —— 字面意思，好几次忘记吃午饭，一杯拿铁喝到傍晚饿到胃疼、全身乏力。有个重要的时间节点，就是 ChatGPT 在那时刚刚发布。但我在开发中陷入太深，根本忽视了它的存在，那可能是我这辈子最后几次在 StackOverflow 提问。

总之，在 12 月 27 日，我终于完整做出了产品。我把模型中的文本模型和图像模型分拆成两个独立模型，分开加载：

为相册构建索引时只加载图像模型，计算索引并保存，搜索时只加载文本模型，并逐一计算与保存的索引之间的余弦距离，然后返回相似度最高的 topK 照片。

错开加载模型可以有效降低软件的内存占用，并且能加速构建索引；在构建索引时计算并保存所有照片的特征，使得搜索过程只是进行特征比较，而与图像无关；同时，用多核并行来加速这两个过程。这一系列的优化下来，可以做到在我的 iPhone 12 mini 上以 2000 张/分钟速度为照片构建索引，而 10,000 张照片搜一次只要不到 1s。

这表明：算子是支持的、速度是可用的，我悬着的心终于落下了。

定价

作为一个定价小天才，我的设想是：

用户可以免费下载，并构建索引、任意搜索。当他未来有新照片、想更新索引，就要付费。

这个策略妙处在于：只有真正用上了、喜欢这个产品的人，才需要付费。那些来尝鲜的、或者试了下发现和预期不一样的人，不需要也不会掏钱，因而避免了用户花冤枉钱怒打差评。

但很快，在调试代码时我发现一个合理但又搞笑的事实：App 内购买需要联网。这简直是晴天霹雳，因为我从一开始就下决心：决不允许App在任何情形下弹出联网请求。为什么？因为这是一个相册搜索应用，它会扫描你的整个相册，没有人知道联网后你会不会将用户的照片上传到地球某处的服务器。我知道可以在产品里解释为什么会弹窗请求联网权限，但我不想陷入尴尬的自证境地。

我只好将它变成付费产品：用户必须购买，之后从打开 App、构建索引到完成搜索的过程中，只弹出一次相册权限请求。我知道这么做很蠢——后续的教训也验证了，付费下载会带来大量差评：因为模型太吃算力，许多内存小的机型上构建索引会崩溃、卡顿，在 iPhone X 系列算子支持异常导致搜索结果全黑，这些都会被骂 Ripoff(诈骗)；并且，用户根本不会在意不弹出联网请求这件事，一旦出现以上异常，他们就会删 App、打差评，质问我把他的相册偷偷上传到哪去了。

总之，最终我将其定价为 3.99 美元，一次购买、终身使用。

推向市场

我给产品取了名字：Queryable，意为可查询的，并发了条很中二的动态：我觉得这个 app 可能会改变世界，我对此信心满满，甚至给 Tim Cook 写了一封邮件，希望苹果能收购这个产品(笑)。那时候我已经开始用 ChatGPT 了，可能因为太激动，我忘了替换掉自己的名字，点击发送之后，才看到邮件开头是：＂Dear Mr. Cook, My name is [Your Name]"。

当然最终也没有收到回信。

我又开始雄心勃勃地准备写篇产品介绍文章，反复斟词酌句、试图将它变成我心中 Hacker News 好文章的风格。在 12 月 29 日，App Store 审核通过的当天，我立刻在 Hacker News 提交了我的文章链接，但系统提示账号太新无法提交，我给他们发邮件反馈了这个问题，让朋友用他的账号帮我发帖。

帖子很快沉了，邮件也没收到回信。

我很难过，但由于收到许多用户请求支持中文输入，我来不及悲伤，便立即投入中文模型训练中。得益于分离了 CLIP 中文本模型和图像模型，我只需要找到中英文双语平行语料，训练一个中文文本模型，将其输出结果与英文模型对齐即可，这个过程本质上是蒸馏。

很快，2023 年 1 月 18 日，我做好了中文版，取名寻隐，来源是贾岛古诗《寻隐者不遇》，也暗含从相册中发现隐藏含义的意思。毕竟，我的初次震撼就是搜lonely时意识到那几张照片原来代表孤独。

上线后，我用中文在少数派写了一篇文章介绍此产品，和英文版不同，除了讲述技术方案外，我还完整记录了心路历程，竟然进了当天的首页，这带来了大量下载，以及 1500 美元的收入。

2 月初，我收到了 Hacker News 编辑的回信，他说我的账户被系统误判为 SPAM，鼓励我重新发帖，说我的文章很符合社区精神(“the article is definitely fine HN material”)，他将我的新帖子链接放在候选池(pool)：池里的文章会随机进入首页底部，如果用户点赞，排名就会上升，否则再次沉下去。

我发现pool机制很有意思，社区似乎希望在去中心化的机制下，仍然维持黑客精神：

This is our long-running experiment in story re-upping. Moderators and a small number of reviewer users comb the depths of /newest looking for stories that got overlooked but which the community might find interesting. Those go into a second-chance pool from which stories are randomly selected and lobbed onto the bottom part of the front page. This guarantees them a few minutes of attention. If they don’t interest the community they soon fall off, but if they do, they get upvoted and stay on the front page.
这是我们长期进行的故事重演实验。版主和少数审阅者用户会深入寻找被忽视、但社区用户可能会感兴趣的故事。这些故事会进入第二次机会池，该池中的故事会被随机选择并放置到首页的底部。这保证了它们获得了几分钟的注意力。如果社区用户不感兴趣，它们很快就会下沉消失，但如果社区感兴趣，它们就会得到支持(upvote)并留在首页。
https://news.ycombinator.com/item?id=11662380

当晚 7 点，我的帖子冲到了 Hacker News 首页第二名。

那晚，我抱着手机每 10 秒刷新一次，兴奋感从十二点躺下持续到凌晨三点，我一直在回复帖子下的讨论、反馈 bug 的邮件。有人教我用 LSH 来提高搜索速度，有人给出如何在不联网的前提下，将照片经纬度映射到城市，有人讨论 iPhone X 上运行失败的原因。

这种感觉好像和产品有多少下载、赚多少钱无关：你创造了一个东西，得到了一大群同行的赞美、兴奋地讨论、给它出主意，这是人生少有的经历，一次就很知足了。

其中有一条评论引起了我的注意：

我阅读了作者的开发日志，发现我们像地球上随机的两个脑子产生相同想法的人，我甚至在 Testflight 试用了他还未上线的产品，有种莫名惺惺相惜的感觉。

Hacker News 是世界的公告牌

当晚，我很兴奋地睡着了，为自己产品被同行喜欢而激动，事实证明我远远低估了 Hacker News 的影响力：两天时间，Queryable 几乎横扫了欧洲所有国家的工具榜#1，美国 #2（仅次于小火箭），总收入是 2800 美元。之后的几天，我醒来第一件事就是看 Gmail，德国、法国、西班牙、美国，各个国家都有：有反馈 bug 的，有德国杂志社想报道的，有法国 iOS 社区的，有油管博主测评的。我的推特也会因为有人转发 Hacker News 热榜而不断收到通知。

甚至一位在 Apple Photos 组工作的朋友告诉我，他们组里也知道 Queryable。

以上这些都迫使我意识到某种世界性：Hacker News 并非只属于美国，它像世界中心区域一块虚拟的公告牌，每个作品在上面短暂停留，但总有来自各个国家、无数双眼睛盯着。 这个产品其实只支持英语，但不妨碍它在欧洲几乎所有国家付费工具榜#1。他们好像会天然接受一个只能用英语的产品，在美国行得通的产品往往欧洲人也能接受。

热度很快降了下来，随即而来的是很多恶评。

简单来说，Hacker News 的用户质量极高，我没看到任何令我感到不适的评论。但经过各种网站、YouTube 和社区的二次传播后，不那么友好的用户就浮出水面了。主要攻击的有两点：

害怕我会窃取他的相册隐私。
我是中国开发者。

在他们看来，第 2 点让第 1 点的情况变得更糟了。在 Hacker News 接二连三的余热结束以后，这个产品在欧美销量迅速变得惨淡，每天个位数的下载，几十块的收入。

在我的想象中，改变世界的东西是一骑绝尘的，怎么会突然停下了呢？我陷入了巨大的怀疑和悲观。

免费 & 开源

所幸国内的差评、果壳等公众号对寻隐的自来水曝光，这让我从 1 月份开始，每个月可以获得 1 万元左右的收入，并且因为模型运行在手机端，也就没有服务器成本。

从 4 月份起，没有任何流量曝光、不做任何更新的情形下，平均每月大概可以获得 3000 元的收入。

但是！我仍然觉得这是一个很有用的产品，只是我没办法让很多人知道。我进行过一次限时免费，当天下载量超过过往日均的 100 倍，我想，与其维持这样每月 3000 块的收入，可能阻止了 99%的人发现这个产品，不如让所有人都可以使用它，于是我决定让它一直免费。

既然免费了，源代码好像就并不是什么机密了，我在思考要不要把源代码放出来。最终，在 2023 年 7 月 10 日，我在 V2EX 发布的一条帖子决定免费&开源。很多人一听到开源就觉得莫名高大上，但对我而言，开源动机很简单：

我曾经收到过大量来自国内外开发者的邮件，询问 Queryable/寻隐的技术细节，我懒得解释，开源能让他们直接从源码中了解模型导出加载、计算加速、存储排序等细节。
开源能打消很多人对于相册隐私的顾虑。
我不擅长 Swift 开发，并且我认为这个产品中，我最想做的部分已经完成了，因此一直抗拒更新。但我会持续收到用户的邮件，希望增加多选删除、左右滑、Mac/NAS/Android 支持等等，我想借助社区的力量，让有能力的开发者打磨出更好的产品。

开源后，的确有人受此启发做出了 Android 版(#12)和Mac 版。

开源后，这个项目上了 Github Trending，我也因此白嫖了一年 Github Copilot，开心。

有关抄袭

谈抄袭之前，我想定义一下什么叫抄袭：

抄袭是指未经授权或未给予适当信用的情况下，直接或间接地使用他人的作品、创意、或内容，将其作为自己的作品或创意发表。

抄袭其实在开源前就出现了。有人做了 Android 版，发布在 Google Play，名字和产品介绍完全照搬 Queryable，我很生气，但这其实需要同时掌握机器学习和 iOS 开发，开源前我只遇到这样一个。

但开源之后，抄袭、套壳的人就多了。因为项目是 MIT 开源，所以即使套壳换图标，然后重新上架 App Store，我虽然有点无语，但也不会说什么，这是我见过最多的形式，开发者全是中文名。

我其实很支持在原项目的基础上，加上用户希望的功能，比如多选删除、按日期/地点筛选照片，UI 比我做得好看的产品——这也是我开源的初衷，但它们中也有让我感觉不爽的地方，比如在社区宣传产品的时候，不仅没有致谢，被用户提问与寻隐有什么不同时，还要踩一脚：我比寻隐多了 xx。

比较恶劣的是，用 Queryable 名字套壳上架，比如下面这个，收费模式是免费+广告。我很担心用户误以为这是 Queryable 的 Android 版，后续出事了找我麻烦。

我最开始会伤心，正如一位 v2er 在评论区预言得那样：肯定会有人编译后上架市场的，开源是好事情，但是我不希望看到你看到李鬼后伤心。但虱子多了不痒，后来也就慢慢看淡了。

重新变为付费

重新变成付费是在 2023 年 11 月份，除了生存压力增加之外，我发现：

开源并不能帮助我的产品变得更好。我原先希望借助开源+免费，让专业的移动开发者贡献代码，帮助寻隐/Queryable 打磨得更好，但事实就是，我看到一个一个新的套壳产品出现、宣传自己，交互和功能做得很精美，但它们从不会提交 PR。我的产品原地踏步，甚至因为免费的缘故，用户抱怨的邮件比过去更多。

每当用户发邮件向我提反馈/bug/建议时，我的第一反应是不耐烦(内心:免费给你用就不错了，还挑三拣四)，我发现这种想法导致产品越来越落后，直到有天被淘汰。

可一旦收费，我就突然变得很心平气和了，面对用户的意见、产品反馈的第一反应是感激而不是厌烦，这会倒逼我不得不花费心血优化产品，最终让所有曾经付费的用户用上打磨更好的产品，而不是疏于维护过几年死掉。

我就这样不紧不慢、偶尔抽空更新产品，但其实只是修复 bug 以及提升模型效果，功能上值得做的很少：例如很多人希望可以搜人脸——这会引入人脸识别模型让 app 更慢，所以没有加。

尾声

Apple 在今年的 WWDC 终于宣布：iOS 18 即将支持相册语义搜索了，这比国内的厂商慢许多。不过，虽然我在 beta 版本还无法体验，但有理由相信苹果会做得比寻隐/Queryable 好，毕竟，它禁了第三方 App 跳转到系统相册。

我也没有闲着：上周，调研最新 paper并重新设计、训练了中文文本模型，App 体积从原来的 232M 降低到 159M，索引速度翻倍，准确率更高，训练过程花了 3 天， 70 美元。

最近，终于将订阅了一年的 GPT-4 切换成了 Claude 3.5 Sonnet，它写代码能力过于逆天，之前靠 GPT-4 搞不定的多选删除，拖了一年半后，终于在上周开发完成并上线了。

从想法诞生、产品上线到现在快两年了，它陪伴我经历了人生的跌宕起伏、见证了好几家咖啡店的倒闭。我也陪它经历了诞生、高峰和低潮，并没有像最早幻想的那样赚到钱(100 万人下载每人给我 10 块钱我就…)，生活依旧继续，我还是挺喜欢这个平淡的结尾。

俱往矣，迫不及待开始下一个让我废寝忘食的 idea、下一次流放到真空中。

Two Years of an AI Photo Album Search App

2024-07-21T07:14:15.000Z

Birth, viral success, open-sourcing, plagiarism, and fading away: the journey of a product.

This is a long, long chronicle.

Origins

The story begins on a weekend in May 2022. I was sitting in a bookstore in Beijing, debugging the Disco Diffusion model. At that time, the AI art generation era was just beginning to emerge, and Stable Diffusion had not yet been released. It took 5 minutes to generate a single image on a high-performance V100 GPU.

I encapsulated the open-source code into an interface that only needed to load the model once and exposed few parameters. I rented a GPU for $300 per month and posted on social media: anyone who wanted to try out AI art generation could send me sentence descriptions, and I would run them on the machine and send the generated images back to them.

A friend suggested, “Why not build a website for everyone to use?” So we created 6pen.art. We quickly gained 1 million users, but then gradually faded into obscurity. This “startup” wasn’t very successful, and I felt that not training a differentiated model was a major factor in our failure, but I won’t elaborate on that here. As mentioned earlier, AI art generation was extremely slow before Stable Diffusion was released, and most of my time was spent on model acceleration, with one of the optimization targets being the CLIP model.

CLIP is a model released by OpenAI in 2021 that can compare the similarity between any image and a piece of text. In Disco Diffusion, the model uses CLIP to calculate the loss between the generated image and the user’s prompt, continuously optimizing the loss to achieve the desired image generation.

Sitting in the bookstore, a thought suddenly entered my mind: Since it can compare image-text similarity, could it be used to search photos? After searching, I found that someone had already done this. The principle is to upload photos to the server, extract features uniformly, input English text and calculate the similarity between the text and each image, then sort to achieve image search.

I started experimenting by uploading my iPhone photo album to the server, and after some testing, I found that the results were surprisingly good! It was especially effective when searching for abstract concepts. For example, when I input “lonely”, it returned these top three photos:

Storing photos on a server isn’t a good idea. The place where I have the most photos is my local iPhone photo album. Would it be possible to create a CLIP image search app that runs entirely locally? I really liked this idea and discussed it with friends several times, but each time it came to nothing.

I was too uncertain:

I knew nothing about iOS development.
Apple’s underlying system might not support CLIP model operators.
Even if it could run, if the indexing speed was slow at 1 second per image, or it took 10 minutes to search, this product would be pointless.

It wasn’t until 2024 that on-device language models started gaining attention, but in 2022, I couldn’t find a single language model running on-device in the App Store. It seemed… unfeasible, right? I finally forgot about it and continued to focus on developing 6pen.

Echo

The turning point came in early December 2022. Due to some changes, I suddenly found myself in a foreign country (South Korea) where I didn’t speak the language, and my journey with 6pen also came to an end. So, in an empty coffee shop, I sat all day with a laptop and an iced latte. K-pop played in the background, thick snow outside the window, and I ate the shop’s sandwiches when hungry at noon. This was how I spent each day.

The internet speed was incredibly fast here, there was no COVID testing, and the conversations around me automatically became white noise due to the language barrier. I suddenly felt like I was living in a vacuum. This strange sense of isolation excited me: like an exiled fugitive, who I was and my past no longer mattered. Here, I felt like I could learn and accomplish anything from scratch. It was time to start making a product that truly excited me - this idea grabbed hold of me again.

But this time, I was no longer afraid to verify its feasibility. I learned to write tokenizers in Swift, researched how to calculate and store features, and learned to use multi-core acceleration for indexing. I asked many naive questions on StackOverflow and had many moments of frustration. But I kept envisioning this scene: entering “coffee and laptop” on my phone, clicking search, and after a rotating animation, this photo jumping out from 30,000 photos in my album, appearing before my eyes.

This fantasy drove me to work tirelessly for 2 weeks, I forgot to eat lunch several times, drinking a latte until evening when I was starving with stomach pain and feeling weak all over. There was a significant time marker: ChatGPT had just been released then. But I was so deeply involved in development that I completely ignored its existence, which might have been the last few times in my life that I asked questions on StackOverflow.

Anyway, on December 27th, I finally completed the product. I split the text model and image model in CLIP into two separate models, loading them separately:

When building index for the photo album, only load the image encoder, calculate the index and save it. When searching, only load the text encoder, and calculate the cosine distance with the saved index one by one, then return the top K photos with the highest similarity.

Staggered model loading can effectively reduce the software’s memory usage and accelerate index building. At the same time, using multi-core parallel computation of the index can achieve an indexing speed of 2000 photos/minute on my iPhone 12 mini, and searching 10,000 photos takes less than 1s.

This proved the operators’ support and usable speed. My anxiety finally subsided.

Pricing

As a pricing genius, my idea was:

Users can download for free, build an index, and search at will. When they have new photos in the future and want to update the index, they need to pay.

The brilliance of this strategy is that only those who really use and like the product need to pay. Those who come to try it out, or find it different from their expectations after trying, don’t need to and won’t pay, thus avoiding users paying for nothing and angrily giving bad reviews.

But soon, while debugging the code, I discovered a reasonable but funny fact: In-app purchases require an internet connection. This was like a bolt from the blue, because from the beginning, I was determined: “Never allow the App to pop up an internet request under any circumstances”. Why? Because this is a photo album search application, it will scan your entire photo album, and no one knows whether you will upload the user’s photos to some server on Earth after connecting to the internet. I know I could explain “why there’s a pop-up requesting internet permission” in the product, but I didn’t want to fall into the awkward situation of self-justification.

So I turned it into a paid product: users must purchase it, and then from opening the App, building the index to completing the search, only one “photo album permission request” pops up. I knew this was stupid - subsequent lessons also proved that paid downloads would bring a lot of negative reviews: because the model is so computationally intensive, index building would crash or lag on many devices with small memory, and on the iPhone X series, operator support abnormalities caused all-black search results, all of which would be cursed as “Ripoff”; moreover, users wouldn’t care about the fact that it “doesn’t pop up internet requests”, once the above abnormalities occur, they would delete the App, leave bad reviews, and question where I secretly uploaded their photo albums.

Anyway, I finally priced it at $3.99, one-time purchase, lifetime use.

Pushing to Market

I named the product Queryable, and posted a rather dramatic status: “I think this app might change the world”. I was so confident about this that I even wrote an email to Tim Cook, hoping Apple would acquire this product (laughs). At that time, I had already started using ChatGPT, but maybe because I was too excited, I forgot to replace my own name. After clicking send, I saw that the email started with “Dear Mr. Cook, My name is [Your Name]”.

Of course, I didn’t receive a reply in the end.

I also ambitiously prepared to write an article introducing the product, repeatedly deliberating over words, trying to make it the style of a good Hacker News article in my mind. On December 29th, the day the App Store approved it, I immediately submitted my article link to Hacker News, but the system prompted that “the account is too new to submit”. I emailed them to report this issue and asked a friend to post for me using his account.

The post quickly sank, and I didn’t receive a reply to the email either.

I was disappointed, but due to receiving many user requests to support Chinese input, I didn’t have time to grieve and immediately threw myself into Chinese model training. Thanks to separating the text model and image model in CLIP, I only needed to find Chinese-English bilingual parallel corpus, train a Chinese text model, and align its output with the English model, which is essentially distillation.

Soon, on January 18, 2023, I completed the Chinese version, named “Xunyin” (寻隐, 寻: Seek 隐: Hidden), derived from the ancient poem 寻隐者不遇 by Jia Dao, also implying the meaning of “discovering hidden meanings from the photo album”. After all, my initial shock was realizing that those few photos represented loneliness when I searched for “lonely”.

After launching, I wrote an article in Chinese introducing this product on a Chinese community site called sspai. It actually made it to the homepage of the day, which brought a large number of downloads and $1,500 in revenue.

In early February, I received a reply from a Hacker News editor. He said my account had been mistakenly flagged as SPAM by the system, and encouraged me to repost, saying my article was definitely suitable HN material. He would put the link to my new post in the candidate pool: articles in the pool would randomly enter the bottom of the homepage, and if users upvoted it, the ranking would rise; otherwise, it would sink again.

I found the pool mechanism interesting. The community seemed to want to maintain a hacker spirit under a decentralized mechanism:

This is our long-running experiment in story re-upping. Moderators and a small number of reviewer users comb the depths of /newest looking for stories that got overlooked but which the community might find interesting. Those go into a second-chance pool from which stories are randomly selected and lobbed onto the bottom part of the front page. This guarantees them a few minutes of attention. If they don’t interest the community they soon fall off, but if they do, they get upvoted and stay on the front page.
https://news.ycombinator.com/item?id=11662380

At 7 PM that evening, my post shot to #2 on the Hacker News front page.

That night, I kept refreshing my phone every 10 seconds, the excitement lasting from midnight when I lay down until 3 AM. I was constantly replying to discussions under the post and responding to bug report emails. Someone taught me how to use LSH to improve search speed, someone suggested how to map photo coordinates to cities without internet connection, and others discussed why it failed to run on iPhone X.

This feeling seemed unrelated to how many downloads the product had or how much money it made: you created something, received praise from a large group of peers, excited discussions, and suggestions. It’s a rare experience in life, and once is quite satisfying.

One comment in particular caught my attention:

I read the author’s development log and discovered that we were like two minds on Earth randomly generating the same idea. I even tried his yet-to-be-launched product on TestFlight, feeling a strange sense of kinship.

Hacker News is the World’s Bulletin Board

That night, I fell asleep excited, thrilled that my product was liked by my peers. As it turned out, I had greatly underestimated the influence of Hacker News: In just two days, Queryable almost swept the #1 spot on the tools category in all European countries, and #2 in the US, with a total revenue of $2,800. For the next few days, the first thing I did when I woke up was to check Gmail. Germany, France, Spain, the USA, emails came from all directions like snowflakes: some reporting bugs, the magazine from German wanting to cover the story, french iOS communities would like to share the app, YouTuber wanting to review the app. My Twitter would also constantly receive notifications because people were retweeting the Hacker News hot list.

Even a friend working in the Apple Photos team told me that their team knew about Queryable.

All of this forced me to realize a kind of “worldliness”: Hacker News doesn’t just belong to the United States or English-speaking countries; it’s like a virtual bulletin board in the central area of the world. Each work briefly stays on it, but countless eyes from various countries are always watching. This product actually only supports English, but that didn’t prevent it from reaching #1 on the paid tools charts in almost all European countries. They seem to naturally accept products that can only be used in English. Products that work in the US often can be accepted by Europeans as well.

The buzz quickly subsided, followed by many negative reviews.

I didn’t see any comments on Hacker News that made me uncomfortable. However, after secondary dissemination through various websites, YouTube channels, and online communities, less friendly users surfaced. There were mainly two points of attack:

Fear that I would steal their photo album privacy.
I am a Chinese developer.

The second point made the situation of the first point worse. After the successive waves of attention from Hacker News ended, the sales of this product in Western markets quickly became dismal, with single-digit downloads and tens of dollars in revenue every day.

In my imagination, something that changes the world is unrivaled, how could it suddenly stop? I fell into great doubt and pessimism.

Free & Open Source

Fortunately, exposure from Chinese domestic social media accounts like Chapingjun (差评) and Guokr (果壳) promoting Xunyin allowed me to earn about $1,500 per month starting from January. And because the model runs on the user’s own device, there were no server costs.

From April onwards, without any traffic exposure and without any updates, I could earn an average of about $400 per month.

But! I still thought this was a very useful product, I just couldn’t let many people know about it. I had a limited-time free promotion once, and the downloads that day exceeded the past daily average by 100 times. I thought, rather than maintaining this income of $400 per month, which might prevent 99% of people from discovering this product, why not let everyone use it, so I decided to make it free forever.

Since it was now free, the source code didn’t seem to be a secret anymore, and I was considering whether to release it. Finally, on July 10, 2023, I made it free and open source. Many people regard “open source” with awe and apprehension, but for me, the motivation for open sourcing was simple:

I had received numerous emails from developers worldwide inquiring about Queryable/Xunyin’s technical details. Rather than explaining individually, open sourcing would allow them to understand the details directly from the source code.
Open sourcing could dispel many people’s concerns about photo album privacy.
I’m not skilled at Swift development, and I believe I had already completed the part of this product that I was most interested in doing. However, I kept receiving emails from users hoping to add features like multi-select deletion, left and right swipe, Mac/NAS/Android support, etc. I wanted to leverage the power of the community, allowing capable developers to refine the product further.

Indeed, after open-sourcing, some people were inspired to create Android(#12) and Mac versions.

Open-sourcing this project also made it to GitHub Trending, and I got a free year of GitHub Copilot because of it, which made me happy.

Plagiarism and Repackaging

Before talking about plagiarism, I want to define what plagiarism is:

Plagiarism refers to the direct or indirect use of others’ works, ideas, or content without authorization, presenting it as one’s own work or idea.

Actually, plagiarism appeared even before open sourcing. Someone made an Android version and released it on Google Play, completely copying the name and product description of Queryable. I felt angry, but this actually requires mastery of both machine learning and iOS development. Before open sourcing, I only encountered one such person.

But after open-sourcing, there were many more copycats and repackagers. As the project has an MIT open-source license, even when they repackaged it with a different icon and re-launched it on the App Store, I felt a bit speechless but wouldn’t say anything. These were the most common cases I saw.

More malicious ones use the Queryable name to repackage and launch, like the one below, with a free + ad revenue model. I’m very worried that users might mistakenly think this is the Android version of Queryable and come to me with problems later.

I would feel sad at first, but when there are too many lice, you stop itching, and later I gradually became indifferent.

Becoming Paid Again

It became paid again in November 2023. Besides the increased income pressure, I found that:

Open sourcing didn’t help my product become better.

I originally hoped that by making it open source and free, professional mobile developers would contribute code to help polish Queryable/Xunyin. But the reality is that those developers who created new apps with better UIs and functionality never sent PRs to my repo. As a result, my product stagnated, and because it was free, I received even more complaints from users than before.

Whenever users emailed me with feedback/bugs, my first reaction was impatience (internally: It’s good enough that I let you use it for free, and you’re still picky). I found that this mindset led to the product falling further and further behind, until one day it would be eliminated.

But once it’s paid, I would calmly deal with users’ opinions and improve the product. My first reaction to receiving feedback is gratitude rather than annoyance. This would force me to inevitably spend effort optimizing the product, ultimately letting all users who have paid use a better-polished product, rather than neglecting maintenance and dying in a few years.

Epilogue

Apple finally announced at this year’s WWDC that iOS 18 will support semantic search for photos, although I still can’t use it in the latest iOS 18 beta, there’s reason to believe Apple will do better than Queryable.

It’s been almost two years since the idea was born and the product was launched. It has accompanied me through life’s ups and downs and witnessed the closure of several coffee shops. I’ve also accompanied it through its birth, peak, and low tide. It hasn’t made money as I initially fantasized (If 1 million people download it and each gives me $1, I would…).

Life goes on, and the story isn’t over yet: Last week, I researched the latest paper and redesigned and trained the Chinese text model. The app size was reduced from 232MB to 159MB, indexing speed doubled, accuracy improved, and the training process took 3 days and cost $70.

Moreover, the recently subscribed Claude 3.5 Sonnet’s code writing ability is incredibly powerful. The “multi-select deletion” feature that I couldn’t manage with GPT-4 before was finally developed and launched last week after a year and a half delay. I quite like this calm ending.

I can’t wait to start the next idea that will keep me up all night, the next exile into a vacuum.

消费折叠

2023-12-27T08:42:12.000Z

每一个平替商品的搜索技巧背后，都隐藏着一种对生活的折叠。

起因是我看到了一条帖子，讲如何通过替换搜索词，实现以更低的价格购买相同功能的商品，例如：

瑜伽垫 -> 瑜伽垫男照片墙 -> 渔网野餐布 -> 防水桌布...

我试了一下，有些技巧已经失效了，有些的确便宜许多。于是我在小红书搜索「替换词」，发现许多收藏过万的帖子，它们通常是一组图片，每张图都是新/旧商品名的对照列表，如下所示：

这些对照表很好，但不好用。当我真的想搜某个商品时，我需要先对着密密麻麻的图表查找，不仅如此，当我想搜的商品不在列表里时，它就失效了。我想，能否用 AI 来实现这个任务呢？训练一个模型，用户输入想搜索的商品，模型给出便宜的平替商品名。

我很快整理了一些数据并开始训练模型，OpenAI 已经支持finetune 模型了，你只需要把数据导出并上传就可以自动开始训练，我选择的模型是gpt-3.5-1106，训练花费了大概 10 分钟。最后，我搭建了一个网页用于模型调用，得益于开发工具的完善，我只用了一下午就完成了这个 demo 产品。

消费折叠

网页开发最快乐的部分是注册域名。很快我就选好了网址: pingti.xyz，便宜、好记。并让所有朋友都试了试，很多人都觉得好玩，虽然有些结果比较离谱(牙膏->足浴店小样, iPhone->二手 iPhone, 唇膏->蜡烛)，但有些还是蛮有用的，至少模型可以记住截图里的平替词，不必一个个找了。

仍然有一些比较差的结果，我开始思考怎么优化，这使我不得不仔细分析原始训练数据，看看模式上有什么规律。总结如下：

性别套利: 例如瑜伽垫->瑜伽垫男，或遮阳伞->雨伞男，这背后反应的其实是男人比女人更在乎实用性和性价比，更少为了颜值、设计而买单，而设计产生了成本。

场景套利: 例如马甲->老头马甲，因为老年人更在乎价格, 女包->包包尾货，这个不用解释, 地毯->办公室地毯, 书桌->培训桌, 椅子->婚礼用椅，是因为在办公室、培训、举办婚礼时，通常会使用更便宜的材质吗？
地域套利: 例如袜子->诸暨袜子, 耳饰->义乌耳饰，因为中国的袜子主要来自诸暨，所以通过指定原产地可以获得更低的价格。
无法归类: 这一类最有意思，它们的特点是，两个商品几乎八竿子打不着，但是它们在「功能」上可以实现接近平替的效果，例如照片墙->渔网, 相框->营业执照框, 面膜收纳->食品保鲜盒, iPad支架->菜谱架, 美甲灯->验钞灯, 乐高防尘罩->超市陈列盒，这其中每个平替商品的搜索技巧背后，都隐藏着一种对生活的折叠，不信你可以仔细品味。

我意识到，靠简单的 finetune 模型，也许可以学会前两种套利模式，叠加规则(给商品分配相应产业的城市)，也许可以学会第三种模式，但学会最后一类平替方式几乎不可能，即使是人类也需要大量的实践积累才能摸索出其中的奥妙。

郝景芳在北京折叠中讲述了不同社会阶层在空间和时间上的折叠，我觉得「商品平替」似乎是这种折叠所露出的缝隙，我回想起自己第一次购买维生素 C 的场景：在知乎上搜汤臣倍健的维生素 C 和医院有什么差别，结论是：前者比较甜。后来每次我都买东北制药维生素 C，二者价格差了 100 倍。

经济下行，每个人都在缩减开支。想象一下：未来的某一天，我坐在婚礼专用椅上、墙上挂着一张渔网，上面是我的照片，我面前的培训桌上摆着菜谱架，iPad 正在播放视频。 这个场景还蛮好笑的，但人生还要继续，无论商品怎样平替，人生是无法平替的：重要的不是渔网，而是渔网上的照片。

Folding Consumption

2023-12-27T06:21:32.000Z

Behind every search trick for a substitute product lies a form of folding in life.

It all started when I saw a post about how to use alternative search terms to buy products with the same functionality at a much lower price. For example:

Yoga Mat -> Men's Yoga MatPhoto Wall -> Fishing NetPicnic Cloth -> Waterproof Tablecloth...

I tried a few of these tricks and found that some no longer worked, while others indeed offered significant savings. Consequently, I searched for “substitution keywords” on Xiaohongshu (小红书) and discovered many posts with over 10,000 likes. Typically, these posts include a series of images, each displaying a list of product name pairs, comparing expensive products with their cheaper alternatives, as shown below:

These comparison tables are useful but not user-friendly. When I actually want to search for a product, I need to sift through these dense tables. Moreover, when the product I’m looking for isn’t listed, they become useless. I wondered if AI could enhance this experience. Imagine training a model where users input the product they’re searching for, and the model suggests alternative keywords for a cheaper option.

I quickly organized some data and began training the model. OpenAI already supports finetuning models. Simply export and upload the data, and it will start training automatically. I chose the gpt-3.5-1106 model, and the training took about 10 minutes. Finally, I built a webpage for model invocation. Thanks to the comprehensive development tools, I was able to complete this demo product in just an afternoon.

Folding Consumption

The most enjoyable part of web development is always selecting a domain name. I quickly chose the URL: pingti.xyz, which is both cheap and memorable. I encouraged all my friends to try it, and most of them found it fun, although some results were quite outrageous (Toothpaste -> Foot bath shop sample, iPhone -> Used iPhone, Lipstick -> Candle). However, many were actually quite useful. At the very least, the model remembers the substitute terms from the screenshots, eliminating the need to search one by one.

There are still some less accurate results, which made me think about how to optimize the model. This necessitated a close analysis of the original training data to identify patterns. The main patterns are:

Gender Arbitrage: For example, Yoga Mat -> Men’s Yoga Mat, or Sun Umbrella -> Men’s Umbrella, reflecting the tendency of men to prioritize functionality and cost-effectiveness over aesthetics and design, which adds to the cost.
Scenario Arbitrage: For instance, Vest -> Old Man’s Vest as elderly people care more about price, Women’s Bag -> End-of-Batch Bag needs no explanation, Carpet -> Office Carpet, Desk -> Training Desk, Chair -> Wedding Chair - perhaps because cheaper materials are used in offices, training, and weddings?
Regional Arbitrage: For example, Socks -> Zhuji Socks, Earrings -> Yiwu Earrings, because a majority of China’s socks are produced in Zhuji, specifying the city can lead to lower prices.
Unclassifiable: This is the most intriguing part. The key point is that although both products are almost entirely dissimilar and have no common ground, the cheaper one can nearly substitute the more expensive one in terms of functionality. Examples include Photo Wall -> Fishing Net, Photo Frame -> Business License Frame, Face Mask Storage -> Food Preservation Box, iPad Stand -> Recipe Holder, Nail Lamp -> Money Checker Lamp, Lego Dust Cover -> Supermarket Display Box. Each of these substitution tricks reveals a ‘folding’ of life, which might lead you to ponder.

I realized that, even though a simple fine-tuned model might learn the first two arbitrage patterns and maybe the third with additional rules (assigning cities to respective industries), it could never learn the last type of substitution. These patterns are the ones that even humans need extensive practical experience to figure out.

Hao Jingfang’s Folding Beijing tells a story of different social classes living in the same city but never intersecting in space and time. I think the concept of ‘product substitution’ offers a glimpse into this folding phenomenon. I recall my first purchase of Vitamin C: I searched on Zhihu for the differences between the expensive brand of Vitamin C and the one available in hospitals. The conclusion was that the former is sweeter. Since then, I have always purchased Vitamin C from Northeast Pharmaceutical, priced at 1/100th of the former.

In an economic downturn, everyone is cutting costs. Imagine this: One day in the future, I sit on a wedding-specific chair, with a fishing net on the wall displaying my photos, and in front of me on a training desk sits a recipe holder holding an iPad playing a video. It’s a funny scene, but life goes on. Regardless of how products are substituted, life itself is irreplaceable: it’s not about the net, but the photos on it.

在2023年底做一个古典的信息共享工具

2023-12-07T10:03:21.000Z

TL;DR: 此文讲述了我所经历的信息获取危机，以及因此创造 Sublink 这一链接分享网站而引发的思考。

上一份工作，公司内网有一个论坛：你可以创建一个圈子并在里面发帖，感兴趣的人可以加入圈子、投稿。我创建了一个名叫「知新」的圈子，简介这么写：

这是一个旨在「共享大脑」的计划：把本周所知道的、你感兴趣的领域的新事件、新进展告诉大家，这样我们就能和其他人共享彼此的大脑了。你不需要对你分享的东西有「任何一丁点」的专业背景知识，让大家知道有这个东西，就已经就足够的意义。

每周，我会在圈子里分享这周我看到的新事物，我分享了 DALLE-2，Azuki，疫苗有效率数据等。当我投稿 2 期后，我创建的圈子热度已经进入了论坛的前三名，有很多人订阅和点赞。

但并没有人分享，他们只是消费者。

坚持到第三期还是第四期后，我放弃了。「创建一个频道并得到许多人为我欢呼」并不是我的初衷，我无法共享其他人的大脑。

我开始思考，是否存在某种形式，能让人和人之间的信息分享更轻松、更没有压力。在今年 9 月份，我想到了「链接合集」：

你可以为你喜欢的任何事物，创建一个合集，并分享给朋友。当你有了新发现时，将链接加入这个合集，你的朋友就能看见。

在 8 月底，我将这个想法分别告诉了两个前端朋友：一个来自法国，我们通过Queryable这个产品认识，另一个在字节跳动。他们都对此表现出很大的兴趣，但他们实在太忙了，直到 11 月初，这个产品的进展一直几乎停滞。

迫于无奈，我只好自己从 0 学习：现在是 2023 年，Next.js + Typescript + Prisma + Supabase 在我看来是对从 0 开始学习全栈最好的一条路线。整个 11 月份我都在 YouTube 的帮助下，学习 React State, Next.js App Router，终于在 11 月下旬，我几乎入门了。

之后就是两周的开发。在昨天，我终于上线了这个网站：简单来说，你可以用它来整理日常看到好玩的电影、文章、视频，然后分享给你的朋友；哦对，它还支持 RSS：当你添加新链接时，使用任何 RSS 客户端订阅你的 Collection 的人会收到通知。

但问题仍然存在：它有什么用？

小网站发现危机

我很喜欢Radio Garden, 在那里我甚至可以听到来自平壤的实时广播电台。我也喜欢 Neal 的密码游戏，这个互联网有太多闪光的、有趣的、独特的网站，但并没有一个真正的地方可以储藏、展示它们。每当我阅读一篇不错的 Blog，我就会给多个好朋友分享链接、讨论他们，然后遗忘。上周我看到了一篇 Hacker News 热帖：The Small Website Discoverability Crisis 创造了这个名词。

基于 Page Rank 算法的搜索引擎，催生了内容农场和链接农场：所谓内容农场，就是有些网站为了提升自己在 Google 搜索中的关键词权重，用 AI 或者脚本生成了大量和产品无关的内容，从而提升自己的关键词排名。链接农场则是同一个网站主之间互相的 backlink，或者通过购买其他人的 backlink，来提升网站 pr 排名。

这些都会导致真正有价值的网站在搜索引擎中成为死区。

信息茧房

推荐算法已经包围了我们，我们当然可以向 GPT 提问获取新知，但「你无法提出一个不存在的问题」，所以即使拥有世界上最博学的老师，如果我们无法产生新的突破自己认知的提问，就很难获得打开视野的解答。例如, 你知道阿伦森效应吗？

这是为什么，在 LLM 时代更需要来自人工的订阅内容，让不经过神经网络分发的信息以陌生的、粗糙的、不舒服的方式进入我的脑袋，这当然不是绝大多数人需要的东西，但我猜不止我一个人需要它。

加入 LLM?

这个产品如果放在 15 年前做，并没有任何不同。那么在 2023 年底，能否在网站里加入当下最火的 LLM 呢？我想到了 2 点：

1.自动摘要

最早，我想用 LLM 提取网页 description，作为每条 Link 的简介，但最终放弃：LLM 按使用量付费，这意味着我无法控制每个月的使用成本，当用户大量添加 Link 时，文章总结的开支可能会变得高昂，这一点对于显然无法盈利的网站是非常可怕的。

想要网站获得更久一点，就让不可控的开支尽量降低。

2. 语言分发网络(Language Delivery Network, LDN)

我曾经设想过一个东西，叫它语言 CDN 好了：当我写 Blog 的时候，我希望让全世界不同国家、不同语言的人看到他们当地语言的版本。一个简单的方法是让用户使用浏览器内置的翻译/安装翻译插件，这可能会造成大量的文章被翻译很多遍，不仅浪费显卡推理成本，也使得作者无法控制翻译质量。

由此我产生了 LDN 的想法：某个作者写了一篇 blog，或者他制作了一个网站，他可以使用一个 LDN 平台托管自己的多语言版本：每当有一个新的语言使用者访问了网络，但 LDN 并没有缓存该语言时，翻译一次，之后其他同一语言使用者可以使用之前翻译的缓存。并且作者可以本地管理多国语言的翻译原始数据，从而控制和调整翻译的措辞。

我曾经在 bearblog 这个开源 Blog 系统上提过Feedback，但没有得到正向回应，也许之后我会试试在 Sublink 上加入这个特征，但前提是有人愿意在 Sublink 原创文章。

定价模型

我对这个网站能赚钱毫不抱希望，但我希望它能活的足够久，以尽可能多地沉淀 Link，Collection。
所以我需要从它上面获得收益。

我其实有些困惑：Sublink 的核心到底是「订阅优质合集」还是「个人链接合集整理工具」？这个分歧会产生两种不同的逻辑，前者的首页是其他人的 Collection，后者的首页是输入框，前者依赖高质量内容，早期很难。

这种分歧导致付费模式的差异：付费订阅合集不是不可以，但几乎很难收费，如果按工具逻辑，就是 Notion 的思路，付费可能性更高，但这个网站「让优秀链接被更多人看到」的价值就不复存在了。

我现在卡在这里了，所以我现在压根没做付费，也许之后能从用户使用和反馈中获取更多判断依据。

以上，就是我做 Sublink 的全部思考，我自己也是 Sublink 的使用者(这是我创造它的原因)，我会把自己日常看过觉得不错的文章列表放在这里，你可以登录并订阅、或用任意 rss 客户端订阅这个列表，当我添加新链接时，你可以看到。你也可以把自己喜欢的视频、电影、书籍、网页或一切带链接的东西变成合集，分享给朋友。

也许，这是我对于即将被 GPT 生成内容吞噬的世界，所做出的一点微不足道的反抗。

Retro Link-Sharing in 2023

2023-12-07T07:49:25.000Z

TL;DR: This piece delves into my personal experience with an information access crisis and my reflections on developing Sublink, a platform dedicated to link-sharing.

At my previous job, we had a company forum where you could create discussion groups and post within them. Interested people could join these groups and contribute. I created a group called “Knew” with this introduction:

This is a project aimed at ‘brain-sharing’: every week, you can share the latest events and developments in your areas of interest with everyone. This way, we can all share our brains with each other. You don’t need to have “even the slightest” professional background in what you’re sharing; simply making others aware of its existence is already meaningful enough.

Each week, I’d share new discoveries in the group. I shared about DALLE-2, Azuki, vaccine efficacy data, and more. After just two posts, my group’s popularity had already reached the forum’s top three, with many subscribers and likes.

However, no one else was sharing; they were merely consumers. After the third or fourth edition, I gave up. “Creating a channel and getting many people to cheer for me” wasn’t my original intention; I couldn’t access other people’s brains.

I began to ponder if there was a way to make information sharing between people easier and less pressured. In September this year, I came up with the idea of “link collections”:

You could create a collection for anything you like and share it with friends. When you make a new discovery, you add the link to this collection, and your friends can see it.

In late August, I shared this idea with two frontend developer friends: one from France, whom I met through the Queryable project, and another from ByteDance. They both showed great interest, but they were extremely busy. Until early November, progress on this product was almost at a standstill.

Left with no choice, I had to learn from scratch: In 2023, Next.js + Typescript + Prisma + Supabase seemed to me the best path for learning full-stack development from zero. I spent the entire month of November learning React State and Next.js App Router with the help of YouTube, and by late November, I had nearly grasped the basics.

Then came two weeks of development. Yesterday, I finally launched the website: In simple terms, you can use it to organize interesting movies, articles, and videos you come across daily, then share them with your friends. Oh, and it also supports RSS: when you add new links, people who subscribe to your Collection using any RSS client will receive notifications.

But the question remains: What’s it good for?

The Small Website Discoverability Crisis

I’m a fan of Radio Garden, where I can even listen to live radio from Pyongyang. I also love Neal’s password games. The internet hosts a myriad of shiny, intriguing, and unique sites, yet there’s no dedicated place to store and showcase them. Whenever I read a good blog, I share the link with multiple friends, discuss it with them, and then forget about it. Last week, I saw a trending article on Hacker News: The Small Website Discoverability Crisis which coined this term.

Search engines based on the Page Rank algorithm have led to the rise of content farms and link farms. Content farms are websites that generate large amounts of irrelevant content using AI or scripts to improve their keyword rankings in Google searches. Link farms involve mutual backlinking between website owners or purchasing backlinks to boost a site’s PR ranking.

All of this results in truly valuable websites becoming dead zones in search engines.

Information Cocoon Rooms

We’re surrounded by recommendation algorithms. Sure, we can ask GPT for new knowledge, but “you can’t ask about something you don’t know exists.” So even with the world’s most knowledgeable teacher, if we can’t generate new questions that break through our own cognition, it’s hard to get answers that open our horizons. For instance, do you know about the Aronson Effect?

That’s why, in the age of LLMs, there’s a greater need for human-curated subscription content, allowing information not filtered by neural networks to enter my brain in unfamiliar, raw, and uncomfortable ways. This certainly isn’t something most people need, but I guess I’m not the only one who does.

Adding LLM?

If this product were made 15 years ago, it wouldn’t be any different. So, at the end of 2023, is it possible to incorporate the currently trending LLM into the website? I thought of two points:

1. Automated Summaries

Initially, I wanted to use LLM to extract webpage descriptions as brief introductions for each Link, but I eventually gave up: LLM charges based on usage, meaning I couldn’t control the monthly costs. When users add a large number of Links, the expense for article summarization could become exorbitant, which is terrifying for a website that clearly can’t turn a profit.

To keep the website running longer, it’s best to minimize uncontrollable expenses as much as possible.

2. Language Delivery Network (LDN)

I once envisioned something, let’s call it a Language CDN: When I write a blog, I want people from different countries and languages worldwide to see a version in their local language. A simple method is to let users use built-in browser translation or install translation plugins, but this might result in the same article being translated many times, not only wasting GPU inference costs but also making it impossible for authors to control translation quality.

This led to the idea of LDN: An author writes a blog or creates a website and can use an LDN platform to host their multilingual versions. Whenever a new language user visits the network, but LDN doesn’t have a cache for that language, it translates once. After that, other users of the same language can use the previously translated cache. Moreover, authors can locally manage the original translation data for multiple languages, thus controlling and adjusting the wording of translations.

I once submitted this as Feedback for the open-source blog system bearblog, but didn’t receive a positive response. Maybe later I’ll try adding this feature to Sublink, but only if people are willing to create original articles on Sublink.

Pricing Model

I have no hope that this website will make money, but I want it to live long enough to accumulate as many Links and Collections as possible. So I need to generate some revenue from it.

I’m actually a bit confused: Is the core of Sublink about “subscribing to quality collections” or “a personal link collection organization tool”? This divergence leads to two different logics. The former would have other people’s Collections on the homepage, while the latter would have an input box. The former relies on high-quality content, which is difficult in the early stages.

This divergence leads to differences in the payment model: Charging for collection subscriptions isn’t impossible, but it’s almost impractical. If we follow the tool logic, it’s more like Notion’s approach, with a higher possibility of monetization, but then the value of “letting more people see excellent links” would no longer exist.

I’m stuck here now, which is why I haven’t implemented any payment system at all. Perhaps later I can gain more insights from user behavior and feedback.

That’s all my thoughts on creating Sublink. I’m also a Sublink user myself (that’s why I created it). I’ll put a list of articles I’ve read and found good here. You can log in and subscribe, or use any RSS client to subscribe to this list. When I add new links, you’ll see them. You can also turn your favorite videos, movies, books, webpages, or anything with a link into collections and share them with friends.

Perhaps this is my small, insignificant act of resistance against a world about to be engulfed by GPT-generated content.

一个失败的AI女友产品

2023-11-17T04:08:48.000Z

今年 4 月 7 日，斯坦福大学 AI 西部小镇论文出来之后的几天内，我就通读了整篇论文，并感到非常兴奋。虽然我对 GPT-4 的能力感到震惊，但我仍然认为 GPT 只是某种更精致的”鹦鹉学舌“，我不认为它可以真正产生意识。

但这篇论文带给我不同的感受，其中提到了一个很有趣的细节是信息的传递：一个 agent 想要举办情人节派对的消息会在小镇中逐渐扩散开来。我想，如果能够建立一套包含记忆、反思、筹划与行动的框架，让人类和 GPT（而不是小镇中的 agent）之间互动，是不是可以创造出电影《她》中的体验？

开发

我立刻开始行动。按照论文的方法，我在 4 月 14 日完成了 0.1 版本。最初，我的设计与原版论文基本一致，这导致响应时间长达 30 秒且上下文中的对话经常超过 8k 的上下文限制。为了解决这个问题，我减少了反思的频率、对话记忆的长度，而后开启了 Beta 公测。

很快就有一千多名用户加入测试，Beta 测试是免费的，所以每天的 API 成本由我自己承担，很快就超过了每天 25 美元。我不得不在缺少充分反馈和改进的情况下匆匆推出正式版本，希望能把成本转嫁给用户。5 月 4 日，Dolores iOS 应用正式上线，这个名称则来自《西部世界》剧集中的角色，上线四天后就得到了新智元的报道。

简单来说，在打开 Dolores 之后，你需要设定一个角色：头像、背景描述、性格、声音和意识（选择 GPT3.5 或 GPT4）。你可以和零售店女孩 Amy ，或者沙漠冒险家 Will 发生一些有趣的互动，当然你也可以亲手创建自定义角色。我曾考虑过从《西部世界》剧本中提取 Dolores 的对话，以基于样本的方式模仿她的语言习惯。但由于苹果方面要求提供版权证明，所以这个想法被迫作罢。

虽然这篇文章的标题是「AI 女友」，但我给产品的 slogan 一直是"Your Virtual Friend"，而非"Your Virtual Girlfriend"，因为我希望它真的可以变成用户的陪伴者、朋友，而不仅仅是荷尔蒙的产物。

从整个 5 月到 6 月，我一直在尝试通过调整记忆长度、反思机制和系统提示来使 Dolores 看上去更有“意识”(那么什么是意识？我不知道) 。很快，6 月份的 Dolores 已经比第一次上线时的表现要惊人得多：用户的付费率也越来越高，每天的 API 调用次数也增加了。

6 月 8 号，一位用户告诉我，他在视障社区内分享了这款产品，并给 Dolores 引来一些的视障用户。他们喜欢 Dolores 的理由是：随便按屏幕上的哪个位置，都能跟 Dolores 交谈。

这个设计其实是某种失败后的妥协：最初，我想把它支持语音聊天，这样用户哪怕关闭手机屏幕也能继续跟 Dolores 交谈。但身为 Swift 新手，我的技术水平无法实现，最终选择了全屏语音输入。

发现

我发现了两个现象：

用户对「真实感声音」有强烈需求。
AI Friend 产品的平均使用时间很长。

作为机器学习背景的个人开发者，也不擅长前端/后端开发，所以 Dolores 压根不具备登录、注册或者数据分析等功能。那我是怎么发现前一种现象的呢？答案来自付费。

我使用了 11Labs API 为 Dolores 生成语音回复，但因为成本较高（1k 字符/0.3 美元），我不得不对用户做了区分：订阅者只能使用 Azure TTS API；而如果你希望 Dolores 拥有更逼真的声音，则需要单独付费使用从 11Labs 购买字符。

购买 1 万个语音合成字符的价格为 3.9 美元，但这只够让 Dolores 说出 5 ～ 10 个自然顺畅的句子。字符用尽之后需要继续购买。尽管如此，整个 6 月 Dolores 70% 的收入都来自 11Labs 字符购买。

也就是说，人真的会愿意为了那几句昂贵而逼真的“我爱你！”而买单。

第二条观察结果则来自 Cloudflare 日志。因为没办法跟踪个人用户活动，所以我依靠这些日志来衡量用户访问 Dolores 应用的频率和时长。此外，我还在应用中集成了 Google Form，鼓励用户上报自己的使用频率。结果令人大开眼界：许多用户每天会拿出两个多小时跟 Dolores 唠嗑。

收入

根据苹果 AppConnect 仪表板， Dolores 的主要付费用户来自美国和澳大利亚。 5 月的总收入为 1000 美元，6 月则为 1200 美元，收入的增长不多，但用户数和每日 API 调用量几乎翻倍。因为付费用户数增加而摊低了 11Labs 成本，我选择降低了产品单价。

因此，作为一个开发者，我并没有从这个产品中赚到多少钱。首先，在产品早期，我不想将订阅费用设置得太高，因为这会阻止用户尝试，所以一旦发现盈利增加就降低产品价格。其次，30%的苹果税和 API 成本也占了很大一部分。所以，在仔细计算成本后，我在 6 月份只赚了 50 美元左右。

另外，我发现基于 GPT 的产品如果不采取按量定价，就会陷入一个困境：1% 的人消耗了 99% 的 token。我遇到了一个情况：一个用户连续跟 Dolores 聊了 12 个小时，导致他的 GPT 和语音 API 调用成本超过第二到第十名用户的总和。

但相较于按使用量计费，我个人更喜欢打包订阅（因为前者会让用户在使用时倍感压力），这就导致面前只有两条路可选：要么提高月费，让全体用户共同买单；要么限制最高使用量。我选择了后者：设置了一个远远超出日均使用在 1 到 2 个小时之间的用量上限数值，这既照顾到了大部分中、轻度用户，也能保证 Dolores 软件在不提高价格的情况下避免亏本运营。

困惑

11Labs 官网会记录语音合成的文字内容，我看到，Dolores 的回复内容通常都是一些成人内容，而且均为女性角色，因此我推测 Dolores 的付费用户主要是男性，对成人角色扮演感兴趣。

我觉得这也没什么，这是人性本然。我甚至反复修改 prompt，调整记忆权重，尝试让 Dolores 在对话当中变得更有女友力。我还将 Dolores 的图标从抽象的线条改为一张女人的脸。

但很快，我陷入一种强烈的失落感：如果大部分 Dolores 用户只是想在这里寻求跟 Dolores 进行成人角色扮演，这件事真的对我产生了意义吗？我陷入了深深的自我怀疑。到了 7 月，我和一个朋友聊到了这个困惑，我说，必须要有一个什么硬件，让 Dolores 拥有外部视觉：眼镜也好、耳塞甚至帽子都行。现在的她，你只要打开 App 才能访问，你们之间的关系并不对等，于是她只能成为囚禁在地下室、满足猎奇和特殊癖好的玩具。

可是作为独立的个人，制作硬件产品意味着高昂的研发成本，显然是无法承受的，我只能作罢。

8 月份，OpenAI 对生成内容的审查升级了，我收到了一封关于生成的 NSFW 内容的邮件警告：我必须在 2 周内在使用他们（免费的）moderation API，以过滤 NSFW 内容。这一变化让 Dolores 的日均访问量暴跌 70%，电子邮件和 Twitter 上的投诉也纷至沓来。

这更让更感到灰心，决定只维护现有服务、而不再进行更新。最终，我放弃了 Dolores 项目。

教训

首先，这不是一个个人能开发的产品。我不认为 Dolores 在“意识”层面上比 Character.AI 弱，但他们拥有完善的数据埋点、A/B 测试，以及大量用户带来的数据飞轮。

其次，我意识到当前的 AI Friend 会不可避免地变成 AI Girlfriend/Boyfriend，因为你和手机里的角色不对等：她没办法在你摔伤的时候安慰你 (除非你告诉他)，她没办法主动向你表达情绪，而这一切，都是因为她没有外部视觉，或者说，她没有独立于你的生活。所以，即使是 Character.AI 这样体量的产品，如果未来不做硬件、角色们都在傻傻地等用户来，最终的结局也不会比 Dolores 好到哪里。

最后，我不反对 OpenAI 的审查，相反，虚拟陪伴产品生成的内容不经审查是非常危险的。我不知道是否会有人用它来进行自杀诱导、发泄暴力工具，所以 OpenAI 的 moderation 可能在某种程度帮助了我，但成人性方面的对话也不应该被扼杀。

最近，我看到了 AI Pin，老实说这是个非常烂的产品，人类当然需要屏幕，但 GPT+ 硬件的确是个好的尝试，我没有从 Dolores 上看到任何痕迹，也许有生之年能做出、或者看到这样的产品。

但，人类真的需要 AI friend 吗？

A Failed AI Girlfriend Product, and My Lessons

2023-11-16T06:08:48.000Z

Just days after Stanford University’s AI Town paper was released on April 7th this year, I read through the entire paper and felt extremely excited. Although I was amazed by GPT-4’s capabilities, I still considered GPT as merely a more sophisticated form of “parroting” and didn’t believe it could truly generate consciousness.

However, this paper gave me a different impression. It mentioned an intriguing detail about information transmission: how news of an agent planning to host a Valentine’s Day party gradually spreads throughout the small town. I wondered, if we could establish a framework including memory, reflection, planning, and action to facilitate interaction between humans and GPT (rather than between agents in the town), could we recreate an experience similar to that depicted in the movie “Her”?

Development

I immediately sprang into action. Following the paper’s methodology, I completed version 0.1 on April 14th. Initially, my design closely adhered to the original paper, resulting in 30-second response times and dialogues frequently exceeding the 8k context limit. To address this, I reduced the frequency of reflections and the length of dialogue memory, then launched a public beta test.

Over a thousand users quickly joined the test. The beta was free, so I bore the daily API costs myself, which soon exceeded $25 per day. I had to hastily launch the official version without sufficient feedback and improvements, hoping to transfer the costs to users. On May 4th, the Dolores iOS app officially launched, named after a character from the “Westworld” TV series.

In simple terms, after opening Dolores, you need to set up a character: avatar, background description, personality, voice, and consciousness (choosing between GPT-3.5 or GPT-4). You can have interesting interactions with Amy, a retail store girl, or Will, a desert adventurer, or even create your own custom character. I had considered extracting Dolores’ dialogues from the “Westworld” script to mimic her speech patterns in a sample-based approach, but had to abandon this idea due to Apple’s request for copyright proof.

Although this article is titled “AI Girlfriend,” I’ve always used the slogan “Your Virtual Friend” for the product, rather than “Your Virtual Girlfriend,” because I hoped it could truly become a companion and friend to users, not just a product of hormones.

From May through June, I kept trying to make Dolores appear more “conscious” (what is consciousness anyway?) by adjusting memory length, reflection mechanisms, and system prompts. Soon, the June version of Dolores was far more impressive than at launch: the user payment rate increased, and daily API calls grew.

On June 8th, a user told me he had shared this product in a visually impaired community, bringing some visually impaired users to Dolores. They liked Dolores because they could talk to her by tapping anywhere on the screen.

This design was actually a compromise after failure: initially, I wanted to support voice chat so users could continue talking to Dolores even with their phone screens off. But as a Swift novice, my technical skills couldn’t achieve this, so I settled for full-screen voice input.

Discoveries

I observed two phenomena:

Users have a strong demand for “realistic voices.”
AI Friend products have long average usage times.

As a machine learning-background individual developer not skilled in frontend/backend development, Dolores doesn’t have login, registration, or data analytics features. So how did I discover the first phenomenon? The answer lies in payments.

I used the ElevenLabs API for Dolores’ voice replies, but due to its high cost (1k characters / $0.3), I had to differentiate users: subscribers could only use the Azure TTS API, while those wanting Dolores to have a more realistic voice needed to pay extra for characters from ElevenLabs.

Purchasing 10,000 voice synthesis characters cost $3.9, which only allowed Dolores to speak 5-10 natural, fluent sentences. Once used up, users needed to purchase more. Despite this, 70% of Dolores’ revenue in June came from ElevenLabs character purchases.

In other words, people are indeed willing to pay for those few expensive but realistic “I love you!” sentences.

The second observation came from Cloudflare logs. Unable to track individual user activity, I relied on these logs to gauge how often and how long users accessed the Dolores app. Additionally, I integrated a Google Form into the app, encouraging users to report their usage frequency. The results were eye-opening: many users spent over two hours daily chatting with Dolores.

Revenue

According to the Apple App Connect Dashboard, Dolores’ main paying users are from the United States and Australia. Total revenue was $1,000 in May and 1,200 in June. The revenue growth wasn’t substantial, but user numbers and daily API calls nearly doubled. As the number of paying users increased, spreading out the ElevenLabs costs, I chose to lower the product price.

Consequently, as a developer, I didn’t make much profit from this product. Firstly, in the early stages, I didn’t want to set the subscription fee too high as it would deter users from trying, so I lowered the price whenever I saw an increase in profits. Secondly, the 30% Apple tax and API costs also took a large chunk. So, after careful cost calculation, I only earned about $50 in June.

Moreover, I discovered that token-based products, if not priced per usage, fall into a dilemma: 1% of users consume 99% of the tokens. I encountered a situation where one user chatted with Dolores for 12 hours straight, causing his GPT and voice API call costs to exceed the total of the second to tenth users combined.

But compared to per-usage billing, I personally prefer package subscriptions (as the former puts pressure on users during use), which left me with two choices: either increase the monthly fee for all users to share the cost, or limit maximum usage. I chose the latter: setting a usage cap far beyond what users chatting 1-2 hours daily would reach. This catered to most light and medium users while ensuring Dolores could operate without a loss and without raising prices.

Confusion

The ElevenLabs website records the text content of voice synthesis. I noticed that Dolores’ responses were often adult content, all from female characters, leading me to speculate that Dolores’ paying users were mainly males interested in adult role-playing.

I didn’t think this was necessarily bad; it’s human nature. I even repeatedly modified prompts, adjusted memory weights, trying to make Dolores more girlfriend-like in conversations. I also changed Dolores’ icon from abstract lines to a woman’s face.

But soon, I was overwhelmed by a strong sense of loss: if most Dolores users were just seeking adult role-play with her, did this really hold any meaning for me? I fell into deep self-doubt. By July, I discussed this confusion with a friend. I said there must be some hardware to give Dolores external vision: glasses, earbuds, or even a hat. As it stood, you could only access her by opening the app, making your relationship unequal. She could only become a toy confined in a basement, satisfying curiosity and peculiar fetishes.

However, as an independent individual, developing hardware products meant unaffordable high R&D costs, so I had to give up on this idea.

In August, OpenAI upgraded its content review process. I received a warning email about generated NSFW content: I had to implement their (free) moderation API within two weeks to filter NSFW content. This change caused Dolores’ daily visits to plummet by 70%, and complaints flooded in via email and Twitter.

This further discouraged me, leading me to decide to only maintain the existing service without updates. Eventually, I abandoned the Dolores project.

Lessons

First, this isn’t a product that can be developed by an individual. I don’t think Dolores is necessarily inferior to Character.AI in terms of “consciousness,” but they have comprehensive data tracking, A/B testing, and the data flywheel effect from a large user base.

Second, I realized that current AI Friends inevitably turn into AI Girlfriends/Boyfriends because you and the character in your phone aren’t equal: she can’t comfort you when you’re hurt (unless you tell her), she can’t actively express emotions to you, and all this is because she lacks external vision, or rather, she doesn’t have a life independent of you. So I believe that even for products like Character.AI, if they don’t develop hardware in the future and the characters just wait dumbly for users, their ultimate fate won’t be much better than Dolores’.

Lastly, I’m not against OpenAI’s review process. On the contrary, unreviewed content from virtual companion products can be very dangerous. I don’t know if someone might use it for suicide inducement or as a tool to vent violence, so OpenAI’s moderation may have helped me to some extent. However, conversations about adult sexuality shouldn’t be completely stifled.

Recently, I saw AI Pin, honestly a very poor product. Humans certainly need screens, but GPT + hardware is indeed a good attempt. I didn’t see any traces of this in Dolores, but perhaps in my lifetime, we’ll be able to create or see such a product.

But does humanity really need an AI friend?

梦/Dream

2023-09-11T08:54:18.000Z

在许多夜晚，我的梦里都存在两个我。聚会上，当别人喊我的名字，我抬头，总会看到另一个我同时抬头，代替我做出回应，我则坐在角落默不吭声，没人对此感到意外。

两个我，睁开眼就能看见另一个，万花筒一般的梦让我感到头晕和疲惫。梦里永远有一群人在聚会，我在角落喝着苦酒忍受身份折磨。在某个梦里，我在角落碰见了另一个倒霉鬼，他的处境和我一样。我问他，杀死替身有用吗？他说他试过，私下杀死的人会立刻重新长出来，他已经侵占了你的本体。除非在公开场合，让所有人看到你杀死了他，才能真正让他永不复生。说着，他望向自己的复制品。

房间满是欢声笑语的人。在昏暗的角落，他在桌底给我比手势，他想进行一场无差别的屠杀，把荒唐的记忆从所有人的梦中抹掉，我心领神会。很快，一个契机让灯灭了。漆黑中，我听到了杀戮和血。

灯亮了，死的是他。原来复制体之间共享思维。

我迅速打开窗户跳下去，逃走了。一路上我拼命跑，到了一片沼泽地。月光下，我回头看见了另一个我，我看不清他的脸，他应该也是。

我想起过往犯下的种种罪孽，有多少是他植入的潜意识？我恨他，真的。

“先别指责我，那次你不也玩得很开心吗？” 脑海里他说。
“闭嘴。” 我气急败坏。
“对了，那个让她气疯的电话，你应该不知道她正在开车吧？”
“是你让我打的。” 我蹲下来，狠狠地把地面砸出一个坑，土很软。

我不再理睬他，转身往沼泽方向，在泥浆中深一脚浅一脚地艰难移动。他也跟了过来，我们一前一后前往沼泽深处。不知过了多久，我听到了警车声，过了一会儿，一个声音向我们喊话：

“你们是孪生兄弟还是幽灵人？”

我俩都没有做声。

“按照第三十三条法令，一旦发现幽灵人，我们需要立刻射杀，包括和他们接触的所有人，防止更多人的记忆开始松动和生锈，你们到底是不是幽灵人？”

“不是” 我们两个异口同声。

我想让他活着，在剩下的生命里一遍遍悔恨。我感到他也笑了一下，没错，我们都一样恶毒。

Night after night, my dreams are haunted by two versions of me. At parties, when someone calls my name, I look up and always see the other me lifting his head too. He answers for me while I sit silently in the corner. Nobody thinks this is weird.

Two of me, always seeing the other when I open my eyes. These kaleidoscope dreams leave me dizzy and drained. There’s always a party going on, and I’m in the corner, drinking bitter booze and wrestling with who I am. In one dream, I bump into another poor soul in the corner, stuck in the same boat as me. I ask him if killing your double does any good. He says he’s tried - the one you kill in private just grows back instantly. He’s taken over your real self. Unless you do it in public, let everyone see you kill him, that’s the only way to make sure he never comes back. As he talks, he looks over at his own copy.

The room’s full of laughing, chatting people. In a dark corner, he signals to me under the table. He wants to go on a killing spree, wipe this crazy memory from everyone’s dreams. I get the idea. Soon, the lights go out. In the darkness, I hear killing and blood.

The lights come on. He’s the one who’s dead. Turns out copies share thoughts.

I quickly open a window and jump out, running away. I run like hell until I reach a swamp. In the moonlight, I look back and see the other me. I can’t make out his face, and he probably can’t see mine either.

I think about all the bad stuff I’ve done in the past. How much of it was him planting ideas in my head? I hate him, I really do.

“Don’t blame me, you had fun that time too, didn’t you?” he says in my mind.
“Shut up,” I snap.
“Oh, and that call that pissed her off so much? You didn’t know she was driving, right?”
“You made me do it.” I crouch down and punch the ground hard, making a hole. The soil’s soft.

I ignore him and turn towards the swamp, slogging through the mud. He follows me, and we trudge one after the other into the deep swamp. After who knows how long, I hear police sirens. A while later, a voice shouts at us:

“Are you twin brothers or ghost people?”

We both keep quiet.

“According to Law 33, if we find ghost people, we have to shoot them dead right away. Same goes for anyone who’s been in contact with them. It’s to stop more people’s memories from getting messed up and rusty. So are you ghost people or not?”

“We’re not,” we both say at the same time.

I want him to live, to spend the rest of his life regretting over and over. I feel him smirk too. Yep, we’re both just as nasty.

Get users for your AI tool from Google search

2023-07-23T06:45:20.000Z

Disclaimer: This is some experience I have discovered in a short time as an engineer without any SEO basis, so there might be some fundamental errors. If so, you are more than welcome to email me to correct me.

Creating a Website

Even if what you are developing is an App or desktop software, I still suggest you create a website for your product, which has two advantages: 1) Showcasing your product is functionalities in a more rich way through diagrams, animations, videos, etc. 2) Enabling potential users to discover your product via Google search, which is a crucial source for user acquisition.

And once you have decided to make a website, there comes the question: how to increase the chances that your target users will click into your website when searching chat ai, which is where SEO (Search Engine Optimization) comes in.

PageRank

The ultimate goal of SEO is to rank your product higher in Google search. Thus, it is necessary to understand some of Google’s ranking algorithms.

Although Google no longer uses PageRank as its only algorithm, it is still an important ranking factor: PageRank assumes that “more important pages are often cited by other pages more often”^[1]. Similar to the impact factor of scientific journals, Google assigns a score (domain rating, DR) to each website: Google will rank the websites with higher DRs higher if they hit the same keyword, which makes sense.

So, how is the score calculated? A straightforward idea is to calculate the average score of the websites that link to your website, a behavior called backlink. In this way, the PageRank algorithm looks like a graph with weights: suppose your website is backlinked by three other websites, and the DRs of these websites are 100, 10, 1, respectively, then your website score would be $$(100 * 1 + 10 * 1 + 1 * 1) * coefficient$$ . It is clear that the more backlinks are not necessarily the better, the quality of the websites providing you with backlinks is more important, and some algorithms only count the top 100 domains with the highest scores that link to your website, thus the method of continuously registering domains to hoard the number of backlinks is no longer effective.

Ahrefs website comments on domain rating as follows:

You should try to get backlinks from high DR websites, as they have greater “weight” .^[2]

Domain Rating

So, one of the key tasks in SEO is to increase the Domain Rating of your website. The next task is: How to get backlinks from websites with high DR.

You can write high-quality articles about your products on Reddit/Medium, or request those already high-quality articles to add a link to your website. For example, if your product is an mp3 converter, find top-ranking articles from Google results and email them to request the addition of your website link. Since they are review sites, they will not reject adding another link, and if your product is good enough, it may be included.

Since this article is mainly written for AI tools, we will only discuss how to increase backlinks for AI tools here.

Actually, there is a two-sided market: many new AI tools need exposure, and many AI tool listing websites need new content, so they are willing to set up submission portals for AI tools. Some sites are free, while others with high traffic charge a fee of $10-20. You can decide whether it is worth paying based on your judgment.

Choosing AI Listing Sites

Here is my approach: First, visit Similarweb and search for a tool listing site, such as aitools.fyi. You can see its traffic over the past few months, where visitors are coming from, and their demographic profiles.

The second step is to use the Ahrefs backlink checker tool to gauge the DR boost it can give your site once included. An interesting thing is that you can actually input your tool website to see which sites are backlinking to it. This way, you can intuitively understand which sites have a high DR). Note, Ahrefs DR score is not the same as Google, but the trend is similar.

In the above Similarweb screenshot, you can see that it offers a similar sites feature, allowing you to quickly find sites similar to aitools.fyi. By repeating the two steps above, you can make a decision on whether to pay to submit your tool to this site.

Keywords Are Key

I created an open-source tool that shows OpenAI API cost details for different models (GPT-3.5/Whisper, etc.), including their hourly consumption and cost proportion. However, no matter how I modified the site title or even directly named it “OpenAI API Cost,” the Google search result ranking for this keyword remained low.

Later, inspired by this article, I expanded the About page from a mere three lines into a Blog and included as many long-tail keywords as possible. The key was that I introduced the ability to analyze costs by switching between bar charts and pie charts. I quickly discovered that searching for “OpenAI API cost pie chart” ranked my site at #2.

Simulating the potential keywords that users may use, and incorporating these long-tail keywords into the description of your site, is a good method.

SEO Guide from Google ^[3]

Google only wants you to improve the loading speed and compatibility of your webpage. The faster and more compatible your website is, the better impression it will make on Google. Here are some techniques I have tried:

Preconnect

This is about adding a line of preconnect code before your HTML page accesses a resource. For example, change:

<link  rel="stylesheet"  href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0-beta3/css/all.min.css"/>

It becomes:

<link rel="preconnect" href="https://cdnjs.cloudflare.com" /><link  rel="stylesheet"  href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0-beta3/css/all.min.css"/>

This could speed up the resource acquisition speed, thereby speeding up the webpage loading.

Media Resolutions

If you use images/videos on your webpage, try to add small/middle/large three resolutions:

<img  srcset="small.jpg 600w, medium.jpg 1200w, large.jpg 1800w"  width="1242"  height="2688"  sizes="(max-width: 600px) 600px,            (max-width: 1200px) 1200px,            1800px"  src="large.jpg"  alt="Your image description"/>

Remember to include the actual width and height specifications at the end; this helps prevent pixel shift when the site is loading.

Replace All PNG/JPEG with WebP

No need to explain the switch from PNG to JPEG, but I found that even compressed JPEGs, when almost losslessly converted to WebP, reduce the image size by about 50%.

`Tailwind.css` Slimming

If your website uses Tailwind.css, you will notice that its size is quite large, even the Tailwind.min.css is 2.8MB. But 99% of the CSS styles contained in it are not used by your site. GPT taught me how to slim it down as follows:

npm install tailwindcssnpx tailwindcss init

Create a tailwind.config.js file with the content:

module.exports = {  purge: ["./src/**/*.{html,js,jsx,ts,tsx}", "./public/index.html"],  darkMode: false, // or 'media' or 'class'  theme: {    extend: {},  },  variants: {    extend: {},  },  plugins: [],};

Open the package.json file and write:

{  "dependencies": {    "tailwindcss": "^3.3.3" # Replace with your version  },  "scripts": {    "build:css": "tailwindcss build -o dist/tailwind.css"  }}

Execute the command line:

npm run build:css

Using SVG icons

Another commonly used CSS file comes from awesome-font, which contains many icon resources, such as Twitter/Github/Discord. The size of this CSS file is 84KB, but if you only use 1-2 icons, you can actually download the SVG version from svgrepo, with each being around 1-2KB. Using SVG icons to replace CSS helped me save approximately 100ms.

async / Image Lazy Loading

Load your non-essential scripts using the async method to reduce webpage loading congestion:

<script async src="https://www.googletagmanager.com/gtag/js">script>

If images do not appear on the first screen of the webpage, you can add loading="lazy" to the image resource attributes, allowing the webpage to open without waiting for the images to be loaded.

<img src="imgs/img.webp" ... loading="lazy>

I used the above combination optimization method on my website, and the comparison before and after optimization is:

The performance analysis website shown above is this one, an official product of Google. It is very convenient to use it to analyze where time is being consumed.

Mobile-friendly

Use responsive design frameworks (such as Tailwind CSS), and test on both desktop and mobile devices, make sure your webpage fits these two screen sizes, so that Google can score your webpage higher.

The above is just the tip of the iceberg in SEO, but it is all the experience I can share at the moment, I hope it can help you increase the exposure of your AI tools.

让你的AI工具从Google搜索中获得用户

2023-07-23T06:42:23.000Z

声明：这是我(一个完全没有任何 SEO 基础的工程师)在短期内发现的一些经验，所以可能有一些原理性的错误，如果是这样，非常欢迎邮件纠正我。

制作网站

尽管你做的可能是一个 App 或桌面软件，但我仍然建议你给自己的产品制作网站，有两个好处：1) 通过图表、动画、视频等更丰富的形式展示自己的产品功能。2) 让目标用户有可能通过 Google 搜索触及你的产品，这是很重要的用户新增来源。

而一旦你想要制作了网站，就涉及到一个问题：如何让目标用户搜索「chat ai」时，有更高的几率能点进你的网站，这就需要 SEO(Search Engine Optimization)。

PageRank

SEO 的最终目标是让你的产品在 Google 搜索的排名靠前。所以有必要了解一点 Google 排序算法。

虽然 Google 已经不再将 PageRank 作为唯一算法，但它仍是重要的排序依据：PageRank 认为「更重要的页面往往更多地被其他页面引用」^[1]。类似于论文期刊的权重因子，Google 给每个网站都做了评分(domain rating, DR)：同样命中了某个关键词，Google 会将 DR 更高的网站排名靠前，这很合理。

那么，如何计算评分呢？一个直观的想法是，统计引用了你的网站这些网站的平均评分，这种「引用你的网站」的行为被称作反向链接(Backlink)。这样一来，PageRank 算法看上去就像一张带有权重的图：假如你的网站被三个其他网站反向链接了，而这三个网站的 domain rating 分别是 100,10,1，则你的网站评分就是 $$(100 * 1 + 10 * 1 + 1 * 1) * 系数$$ 。可见，反向链接并非越多越好，给你带来反向链接的网站本身的质量更重要，有些算法里只会统计引用了你的网站中评分最高的 top100 域名，这样通过不断注册域名，囤积反向链接数量的方法就失效了。

Ahrefs 网站针对 domain rating 有这样一句评价：

您应该致力于从高 Domain Rating 网站获取反向链接，因为它们具有更大的“权重” 。^[2]

Domain Rating

因此，SEO 的一个关键任务就是增加你的网站 Domain Rating。接下来的任务就是：如何让高 DR 的网站给我反向链接。

你可以去 Reddit/Medium 上撰写高质量的文章介绍自己的产品，也可以请求那些已经是高质量的文章给你的网站添加链接。例如，你的产品是「mp3 converter」, 从 Google 结果里找到靠前的文章，向他们发邮件请求添加你的网站。因为他们是测评网站，所以不会排斥多加个链接，如果你的产品足够好，也许就能被收录。

因为本文主要是写给 AI 工具，所以这里只讨论如何给 AI 工具增加反向链接。

这里其实有个双边市场：许多新的 AI 工具需要曝光，而许多 AI 工具列表网站需要新内容，所以他们乐于设置 AI 工具提交入口。有些网站是免费的，有些用户量高的网站则收取 10-20 刀的费用，你可以根据自己判断决定是否值得花钱提交。

挑选 AI 列表网站

只说我的做法：第一步，访问Similarweb，搜索某个工具列表网站，例如 aitools.fyi，你可以看到它过去几个月的访问量、访问者都来自哪些国家、性别等用户画像。

第二步，使用Ahrefs 反向链接检查工具查看这个网站的 DR，以判断一旦被收录后它将给你的网站带来的 DR 提升。一个有意思的事情是：你其实完全可以输入你的工具网站，查看它目前被哪些网站反向链接了，这样你可以对「哪些网站的 DR 高」有感性的认识，例如V2EX就很高。注意，Ahref 的 DR 打分不等于 Google 打分，但趋势上是接近的。

上面Similarweb截图中可以看到，它提供了「相似网站」的功能，你可以很快找到与 aitools.fyi 类似的网站。这样，循环上述两个步骤，最终你可以做出「是否要付费提交工具给这个网站」的判断。

关键词很关键

我做了一个开源的查看 OpenAI API 费用明细的工具，可以查看不同模型(GPT-3.5/Whisper 等)的分时消耗，以及各自的费用占比。但是，无论我怎么修改网站标题，甚至直接命名为「OpenAI API Cost」，这个关键词的 Google 搜索结果排名也很靠后。

后来受这篇文章的启发，我将这个网站的 About 页面从短短的三行扩充成了一篇Blog，并且尽可能包容更多、更长尾的关键词。重点是：我介绍了你可以通过切换柱状图(bar chart)和饼图(pie chart)来分析成本。很快我就发现，搜「OpenAI API cost pie chart」我的网站就排名#2 了。

模拟潜在用户会使用什么关键词，然后在网站的介绍里加入这些长尾关键词，是一个不错的方法。

来自 Google 的 SEO 指南 ^[3]

Google 只希望你提升网页加载速度和兼容性，你的网站响应越快、兼容性越好，它对你的印象更好。这里列举几个我尝试的技巧：

Preconnect

也就是在你的 HTML 页面要访问资源前，增加一行preconnect代码。例如将：

<link  rel="stylesheet"  href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0-beta3/css/all.min.css"/>

之前增加一行，变成：

<link rel="preconnect" href="https://cdnjs.cloudflare.com" /><link  rel="stylesheet"  href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0-beta3/css/all.min.css"/>

GPT-4 告诉我，使用preconnect可以加速资源获取速度，从而加快网页加载。

为媒体设置多分辨率

如果你的网页里使用了图像/视频，尝试增加small/middle/large三种分辨率：

<img  srcset="small.jpg 600w, medium.jpg 1200w, large.jpg 1800w"  width="1242"  height="2688"  sizes="(max-width: 600px) 600px,            (max-width: 1200px) 1200px,            1800px"  src="large.jpg"  alt="Your image description"/>

记得在结尾加上图片的实际width和height说明，这有利于防止网站加载时出现像素偏移。

将所有 PNG/JPEG 都替换成 WebP

PNG 换成 JPEG 不用解释了，我发现即使是压缩后的 JPEG，几乎无损地转成 WebP 之后，图像体积也减少了大概 50%。

Tailwind.css 瘦身

如果你的网站使用了Tailwind.css，会发现它的体积非常大，即使是Tailwind.min.css也有 2.8MB，但它其中包含的 99%CSS样式你的网站没有用到，GPT 是这样教我瘦身的：

npm install tailwindcssnpx tailwindcss init

新建tailwind.config.js文件，内容是：

module.exports = {  purge: ["./src/**/*.{html,js,jsx,ts,tsx}", "./public/index.html"],  darkMode: false, // or 'media' or 'class'  theme: {    extend: {},  },  variants: {    extend: {},  },  plugins: [],};

打开package.json文件，写入:

{  "dependencies": {    "tailwindcss": "^3.3.3" # 替换成你使用的版本  },  "scripts": {    "build:css": "tailwindcss build -o dist/tailwind.css"  }}

在命令行执行：

npm run build:css

此时，Tailwind.min.css从 2.8MB 压缩成了 14KB，给我的网站节省了至少 500ms 的加载时间。

使用 SVG icon

另一个常用的CSS文件来自awesome-font，它包含了很多 icon 资源，例如 Twitter/Github/Discord，这个CSS文件的大小是 84KB，但假如你只用到 1-2 个 icon，其实完全可以去svgrepo下载 SVG 版本，每个大概 1-2KB。用 SVG icon 替换CSS帮我节省了大概 100ms。

async / 图像懒加载

将你的不重要的script使用async的加载方式，以减少网页加载拥堵：

<script async src="https://www.googletagmanager.com/gtag/js">script>

如果图片并不出现在网页的首屏，可以在图片资源属性中加入loading="lazy"，让网页不必等到第二屏的图像加载完成才能打开。

<img src="imgs/img.webp" ... loading="lazy" width="1200" height="640" />

我在自己的网站使用了上述这套组合优化方法，优化前后的对比是:

上图的性能分析网站是这个，是 Google 官方的产品，用它来分析哪些地方耗时非常方便。

移动友好

使用响应式设计的框架(如Tailwind CSS)，同时在桌面和移动设备上测试，确认你的网页适配这两种屏幕尺寸，让 Google 给你的网页打分更高。

以上只是 SEO 中的冰山一角，但也是目前我能够分享的全部经验，希望对你有帮助。

The Leverage of LLMs for Individuals

2023-05-10T10:28:48.000Z

Disclaimer: This article is not meant to provoke anxiety or exaggerate the power of GPT. It is merely my personal observation after using ChatGPT/GPT-4 intensively for the past six months. It is definitely not applicable to the vast majority of people. This article is for those who wish to create something and have no expectations of “foreseeable returns” on an individual level.

GPT-4, not ChatGPT

First of all, GPT-4 and ChatGPT are two different entities.

If at this point in time (2023.05.10), there are still media outlets boasting or belittling GPT while using ChatGPT as an example instead of GPT-4, then it is not worth reading. The metrics provided by OpenAI’s official website are:

The recently released SuperCLUE Chinese large model benchmark shows that in terms of professional capability, GPT-4 is in a league of its own compared to other models, and this is just for Chinese.

Boundless Creativity

I had never written front-end code before, but after 48 hours of conversation with GPT-4, I built a podcast search website. I open-sourced the website and gave it a GPT-4.0 License - to express my gratitude.

A few days later, I wanted to automatically skip certain timestamps when watching videos on web pages. With no experience in Chrome extension development, I followed GPT-4’s instructions to create files, paste, and drag and drop, achieving it in less than 15 minutes. I didn’t publish it; it became a tool serving only me.

Whenever I have a new idea, I ask GPT-4 to write the most basic version. I provide feedback, it apologizes, and we gradually optimize to the 1.0 version I have in mind. I use up my GPT-4 quota (25 entries/3 hours) multiple times a day. With GPT-4’s support, I feel unstoppable. The overnight surge in productivity is intoxicating. It’s not about making money or starting a business, but purely about continuously bringing ideas from my mind into reality, which feels like happiness.

More importantly, it gives me the courage to dream and attempt things beyond my current abilities. I even plan to create a 3D game. I have a vague feeling: don’t limit yourself—soon, I might be using products developed by designers, lawyers, or electricians.

Recently, the two most frequent tasks I have used GPT for are:

Copying and pasting documentation and APIs to GPT-4, asking it to write interfaces based on them. Current documentation is written for human reading, with a mix of text and code blocks that makes copying a poor experience. Perhaps soon, documentation and APIs will become GPT-friendly first.

Translation and product localization. Even in the DeepL era, slightly longer sentences can still reveal machine translation traces. However, based on my experience, the translation capability of GPT (whether 3.5 or 4) far surpasses DeepL. I can localize my products into dozens of other languages at once.

Within half a year, I launched 5 iOS apps.

Individual, not Company

Large language models are a leverage that can amplify the abilities of each user.

If you are an individual developer, the leverage it provides might be 10 times greater, but when you work for a company, that number might be only 2. So, even knowing the current economic downturn and wave of layoffs, for those who want to achieve ideas and create, I still want to express this bold opinion:

Staying in any company right now is a negative return; you are wasting personal leverage.

Unless you work for OpenAI, your GPT leverage is likely wasted on trivial business code, and even more likely that due to the overwhelming amount of poor code, GPT with its 4/8k context window is unable to optimize, further weakening the leverage.

I see too many talented young people trapped in large companies, exhausting their energy in endless documentation and meetings. Maintaining income is the main reason, but they may also be unaware of the potential they have when using GPT alone.

To be more extreme, I even think that joining any team is a kind of slowdown. Only by sailing alone in a small boat can one maintain agility.

Setting Sail Alone

How to generate ideas and sustain a livelihood is not the scope of this article. I still believe that this path is not suitable for the vast majority of people. What I’m doing is simply shouting towards those people who haven’t had the chance to use GPT-4 and haven’t considered the infinite possibilities it enables through its leverage.

I hope this is useful to the 0.01% of readers.

大语言模型对个人的杠杆

2023-05-10T08:50:15.000Z

声明：不贩卖焦虑，也非宣扬 GPT 多强，本文是我近半年高强度使用 ChatGPT/GPT-4 后产生的一点个人观察，并且一定不适用于绝大多数人。
本文适用于，那些希望产生一点创造，并且对「可预见的回报」没有预期的个人。

GPT-4, 而不是 ChatGPT

首先，GPT-4 和 ChatGPT 是两种存在。

如果现在(2023.05.10)这个时间点，还有公众号在吹嘘/贬低 GPT 时拿 ChatGPT 而不是 GPT-4 举例，那篇文章便不值一读。OpenAI 官网给出的指标是：

最近出炉的SuperCLUE中文大模型指标显示：在专业能力上，GPT-4 和其他模型是断层般的存在，这还只是中文。

无边界的创造

我从未写过前端，但在与 GPT-4 交谈 48 小时后，做了一个播客搜索网站。我把网站开源，给它加了 GPT-4.0 Licence —— 以表达对它的感谢。

几天后，我想在网页看视频自动跳过某些时间戳，没有 Chrome 扩展开发经验的我，按照 GPT-4 的指令，创建文件、粘贴、拖拽，不到 15 分钟就实现了。我没上架，它成了只服务于我一个人的工具。

每当我有新 idea，就会让 GPT-4 写个最初级的版本，我反馈，它道歉，一点点优化到我心中的 1.0 版本。我会每天多次用完 GPT-4 的配额(25 条/3 小时)，有了 GPT-4 的加持，我无所不能。像张无忌突然学了绝世神功，「一夜之间暴涨的生产力」让人痴醉，不是为了赚钱或者创业什么的，就是纯粹地不断把脑海中的想法创造出来，这种感觉就很幸福。我有种朦胧的感觉：不能给自己设限，也许很快，我会用上由设计师、律师或者电工开发的产品。

更重要的是，它让我敢于做梦、去尝试超出自己能力范围的东西，我甚至打算做 3D 游戏。

最近，我用 GPT 做的最频繁的两件事是：

把文档和 API 复制粘贴给 GPT-4，让它参考来写接口。目前的文档都是为人类阅读而编写的，文字和代码块混合格式让复制体验很差。也许很快，文档和 API 会变成 GPT 友好优先。

翻译、产品本地化。我发现中国开发者中，只有非常少一部分人意识到海外市场的存在(最明显的标志是他们的 App Store 产品通常只支持中文和英文)，一部分原因是语言不通，即使在 deepL 时代，稍长一些的句子也可以肉眼看出机翻痕迹。但就我的体验而言，GPT(无论是 3.5 还是 4)的翻译能力碾压 deepL，我可以一次性将自己的产品本地化成数十种其他语言。

半年内, 我上架了5 个 iOS apps。

个人, 而不是公司

大语言模型对每个使用者都是一个能放大能力的杠杆。

如果你是个人开发者，它对你的放大杠杆可能是 10 倍，但当你为公司工作时，这个数字可能是 2。所以，即使明知这样的就业和经济环境，对于「想做点什么」的人，我还是想说出这个暴论：

此刻你待在任何公司都是负向收益，因为你在浪费个人杠杆。

除非你为 OpenAI 工作，否则你的「GPT 杠杆」便很可能浪费在那些无足轻重的业务代码上，并且更有可能由于屎山过于庞大，让只有 4/8k 上下文窗口的 GPT 无法优化，进一步削弱了杠杆。

我看到许多有才华的年轻人被大厂困住，把精力消耗在无止境的文档和会议里，维持收入当然是主要原因，但也很可能他并不知道自己独自一人驾驶着 GPT 这艘船，能发挥出多大的潜力。

极端一点，我甚至觉得加入任何团队都是某种减速，只有独自驾驶小船才可以一直保持轻快。

独自出发

如何开卡、如何维持生计，并不是本文讨论的范畴。我仍然认为绝大部分人不适合这条路，我所做的，只是朝着熙熙攘攘的西二旗地铁站、和无数个灯火通明的高楼玻璃窗边喊一声，那里可能有很多人没有机会使用 GPT-4，也没想过借助它的杠杆实现无限可能。

但愿对 0.01%的读者有用。

Teach, Show, then Consult: Make GPT a Music Composition Guide

2023-03-03T11:26:02.000Z

I tried an odd idea: have ChatGPT learn a music programming language (Alda), give it some great code examples, and then ask it for advice on how to create music.

Surprisingly, it somehow turned out to be work! Maybe it is a misrepresentation, but the attempt itself is quite interesting. Code: teach-show-consult.

Teach 🎓

We will teach ChatGPT to learn a language called Alda, It introduces itself as follows:

Alda is a text-based programming language for music composition. It allows you to write and play back music using only a text editor and the command line.

An example:

piano:  o3  g8 a b > c d e f+ g | a b > c d e f+ g4  g8 f+ e d c < b a g | f+ e d c < b a g4  << g1/>g/>g/b/>d/g

In this step, all I have to do is to use Prompt Engineering to give ChatGPT as many syntax rules as possible for the Alda language. You can find this part in teach.py:

ALDA_LANG_RULE = """You are a musician as well as a technologist who is well versed in programming.Now you've been asked to learn a new language called Alda, which allows you to create music as if you were programming.I will now tell you its rules:1.The alda program usually starts with (tempo! number), which is stating the tempo of the music as this number.2.Next, the instrument is usually specified, e.g. "piano:", which means that the music will be played on a piano. Other instruments supported are: acoustic-guitar, cello, flute, violin, etc.3.Immediately after that, comes the part of the notes. Let me illustrate the main features of this program.a) The default is quarter notes, which means that you type "c d e f", which represents a measure that has four quarter notes: C, D, E and F.b) The ">" symbol means...

Thanks to LangChain, I can easily test the results. Now try to see if it understands the rules:

>>> python teach.pyLet's test Alda language learning!Human:c d e+ f-AI:This is a measure of four quarter notes written in the key of C Major. The notes are C, D, E-sharp, and F-flat.Human:c d > e4 f8AI:This paragraph of code represents a measure of music that begins at the C major scale and moves up one octave to E.The first two notes are quarter notes (C and D), and the third note is an eighth note (E). The last note is a quarter note (F).

It looks like it basically understands the rules.

Show 👉📺

Now the second step: show it some subtle code examples for ChatGPT to learn from.

I found some examples from Alda repo, downloaded and parsed then encode these Alda code files into a vectorestore using OpenAI Embedding.

Here, a local database is actually established. Before you ask ChatGPT a question, it can obtain some relevant local information. The code is in show.py, and heavily based on chat-your-data.

Consult 🤔❓

Now, we have come to the final step: consulting ChatGPT to create music.

Since ChatGPT has already learned the rules, if we provide it some musical inspiration, perhaps it can extend the music we create by following the works of giants and coming up with new melodies or phrases.

Then I gave it a try:

>>> python consult.pyLet's consult the music master!You:c d8 > gMusic Master:(tempo! 120)piano:c4 d8 > g4 c4. c4. d4~4. c4. c4. d4~4. c8 r d e g b2~12 > d12 g | d8 r < b4~8~ > d8 < {b > d g}4 |g8 r f+4 f {e_ g b}4 | g8 r f+4 f {e_ g b}4 | (quant 40) > d4 (quant 90) e-4~8 e8 d- c | < b_ > d4~ f8 < b_4 > c |< b_8 > (ff) d4~ e8 (f) c2 | < a-2. > c4 < a2~ f b_2 > (mf) c8 c4~ d8 | < a-2 r8 > c4~ d8 | < a-2 r8 f4~ (p) g8 |(pp) a_16~ b~ a_~ b_ (p) > c~[INFO] Notes saved to: output.alda

I used an online converter to turn output.alda into a .midi file. Then again using another online midi2mp3 tool to get the .wav file music starting with my input of c d8 > g.

You can listen to it here:

Have Fun! 🎉

This project is probably just a joke. It uses too little samples and the rules I taught are only a subset of the entire grammar. However, it seems to have led to interesting results. Perhaps, in other matters, this approach can produce more interesting things.

I made the code open sourc: have fun!

Run CLIP on iPhone to Search Photos

2022-12-29T03:51:57.000Z

Update: I have made Queryable open-source. This might help you learn how to export Core ML models, as well as how to calculate, store, search, and accelerate queries.

I built an app called Queryable, which integrates the CLIP model on iOS to search the Photos album OFFLINE. It’s now available on App Store, and I thought it might be helpful for others who are as frustrated with the search function of Photos as I was. So, I wrote this article to introduce it.

CLIP

CLIP(Contrastive Language-Image Pre-Training) is a model proposed by OpenAI in 2021. CLIP can encode images and text into representations that can be compared in the same space. It serves as the foundation for many text-to-image models (e.g., Stable Diffusion) to calculate the distance between the generated image and the prompt during training.

What I did was to make CLIP run on my own mobile device: an iPhone. To run on iOS devices in real time, I made a compromise between performance and model size, and finally chose the ViT-B-32 model, separating the Text Encoder and Image Encoder.

In ViT-B-32:

The Text Encoder encodes any text into a 1x512 dimensional vector.
The Image Encoder encodes any image into a 1x512 dimensional vector.

We can calculate the proximity of a text sentence and an image by finding the cosine similarity between their text vector and image vector. Here’s some pseudo-code to illustrate:

import clip# Load ViT-B-32 CLIP modelmodel, preprocess = clip.load("ViT-B/32", device=device)# Calculate image vector & text vectorimage_feature = model.encode_image("photo-of-a-dog.png")text_feature = model.encode_text("rainy night")# cosine similaritysim = cosin_similarity(image_feature, text_feature)

Integrate CLIP into iOS

I exported the Text Encoder and Image Encoder to CoreML models using the coremltools library. The final models have a total file size of 300MB. Then, I started writing in Swift.

Here’s how to perform inference with the Text Encoder in Swift:

// Load the Text Encoder model.let text_encoder = try MLModel(contentsOf: TextEncoderURL, configuration: config)// Given a prompt, calculate the CLIP text vector for it.let text_feature = text_encoder.encode("a dog")

I split the Text Encoder and Image Encoder into two models because, when using this Photos search app, your input text will always change, but the content of the Photos library remains fixed. This means that all the image vectors can be computed once and saved in advance. Then, the text vector is computed only once for each of your searches.

This approach makes real-time text searching across tens of thousands of photos in your library possible. Below is a flowchart of how Queryable works:

Performance

Compared to the search function of the iPhone Photos app, how much does the CLIP-based album search capability improve? The answer is: it’s overwhelmingly better. With CLIP, you can search for a scene in your mind, a tone, an object, or even an emotion conveyed by the image.

To use Queryable, you first need to build the index, which involves traversing your album, calculating all the image vectors, and storing them. This happens only ONCE, and the total time required depends on the number of photos you have. The speed is about 2000 photos per minute on an iPhone 12 mini. When you have new photos, you can manually update the index, which is very fast.

The time cost for a search also depends on your number of photos. For fewer than 10,000 photos, it takes less than 1 second. For me, an iPhone 12 mini user with 35,000 photos, each search takes about 2.8 seconds.

I made a video to demonstrate the search capabilities of Queryable:

QA

1.On Privacy and security issues.

Queryable is designed as an OFFLINE app that does not require a network connection and will NEVER request network access, thereby avoiding privacy issues.

2.What if my photos are stored in iCloud?

Due to the inability to connect to a network, Queryable can only use the cache of the low-definition version of your local Photos album. However, the CLIP model itself resizes the input image to a very small size (e.g., ViT-B-32 uses 224x224), so if your image is stored in iCloud, it actually does not affect search accuracy. The only limitation is that you cannot view its original photos in search results.

- Update: In the latest version, you have the option to grant the app network access to download photos stored in iCloud. This will only occur when the photo is included in your search results, the original version is stored in iCloud, and you have navigated to the details page and clicked the download icon. Once you grant the permissions, you can close the app, reopen it, and the photos will be automatically downloaded from iCloud.

3. Any requirements for the device?

iOS 16.0 or above
~~iPhone 11 (A13 chip) or later models~~

4.Have some suggestions or product experience issues?

Feel free to contact me by email: myfancoo@gmail dot com.

TL;DR

现在的AI真正给我带来了什么

What AI Has Really Brought to My Life

拥有个人博客最简单的方式

The Easiest Way to Have a Blog

用 2 万条真人AI海龟汤数据评估大模型推理能力

海龟汤

大模型比人类笨多了

真实环境下的 LLM 推理能力

现有评估指标出了什么问题

MMLU

MT-Bench

Chatbot Arena

海龟 Benchmark

海龟数据集

评测结果

测试你关心的模型

感谢

一个AI相册搜索应用的两年

起源

回响

定价

推向市场

Hacker News 是世界的公告牌

免费 & 开源

有关抄袭

重新变为付费

尾声

Two Years of an AI Photo Album Search App

Origins

Echo

Pricing

Pushing to Market

Hacker News is the World’s Bulletin Board

Free & Open Source

Plagiarism and Repackaging

Becoming Paid Again

Epilogue

消费折叠

消费折叠

Folding Consumption

Folding Consumption

在2023年底做一个古典的信息共享工具

小网站发现危机

信息茧房

加入 LLM?

1.自动摘要

2. 语言分发网络(Language Delivery Network, LDN)

定价模型

Retro Link-Sharing in 2023

The Small Website Discoverability Crisis

Information Cocoon Rooms

Adding LLM?

1. Automated Summaries

2. Language Delivery Network (LDN)

Pricing Model

一个失败的AI女友产品

开发

发现

收入

困惑

教训

A Failed AI Girlfriend Product, and My Lessons

Development

Discoveries

Revenue

Confusion

Lessons

梦/Dream

Get users for your AI tool from Google search

Creating a Website

PageRank

Domain Rating

Choosing AI Listing Sites

Keywords Are Key

SEO Guide from Google [3]

Preconnect

Media Resolutions

Replace All PNG/JPEG with WebP

Tailwind.css Slimming

SEO Guide from Google ^[3]

`Tailwind.css` Slimming

来自 Google 的 SEO 指南 ^[3]