Geoffrey Litt is a design engineer at Notion working on malleable software: computing environments where anyone can adapt their software to meet their needs and their lives. Before joining Notion, he was a researcher at the independent lab, Ink & Switch, where he explored the future of computing. He did his PhD at MIT on programming interfaces. Most of his work circles around a very simple but powerful question: how can everyday people shape the software they use like clay so that humans can have more power and agency in the world?
In this conversation, Geoffrey and Kanjun discuss:
Barriers to malleable software
Inventing new UI components for the AI age
Principles for agent-human collaboration
How AI affects the creative process
…and more!
Timestamps
05:59 Barriers to software malleability: technical, economic, and infrastructural
08:57 Real-time collaboration and version control
15:01 Common Source: between open and closed source
20:54 Navigating divergence in software development
34:04 Data structure and universal formats
39:10 Local developers and collaborative software
42:57 Learning curves and tailorability in end user programming
50:55 How AI shapes creative work
52:07 Making agent-human collaboration like human-human collaboration
01:03:44 Mental bandwidth and parallel agents
01:08:50 Exploring design spaces through generated options
01:11:11 Visualizing code quality and malleability
01:13:45 Review as part of the creative medium
01:17:59 Infrastructure needs for malleable personal software
01:30:47 Rekindling the vision of personal computing
Transcript
Kanjun Qiu (00:30)
Welcome back to Generally Intelligent, a podcast by Imbue on the economic, societal, political, and very human impacts of AI. Today, I’m joined by Geoffrey Litt. Geoffrey is a researcher working on malleable software: computing environments where anyone can adapt their software to meet their needs and their lives. Geoffrey’s just joined Notion, and until recently he was a researcher at the independent lab Ink & Switch, where he explored the future of computing.
He did his PhD at MIT on programming interfaces and most of his work circles around a very simple but powerful question, which is how can everyday people shape the software they use like clay so that humans can have more power and agency in the world? And that’s a lot of what we’ll be exploring in our conversation today.
Welcome, Geoffrey. It’s really good to have you here. So I’m really curious, we always start with: tell us a bit about how you developed your initial research interests. You went to MIT, you did your PhD in human-computer interfaces. What sparked your interests? What happened, and how did your thinking evolve over time?
Geoffrey Litt (01:30)
Thanks, it’s great to be here.
Way before I got into research, actually, I was just working on a startup shipping product. I worked at an edtech startup out of college. And that was where this all kind of started. We were a team in Boston shipping software to thousands of schools across the country. And every school is different, right? We would try our best to make the one best data report that works for every school, whether it’s a rural elementary school or an urban high school or whatever. And then we would get on these calls and some teacher or principal would be like, you know, actually, I don’t use your product. I just hit export to CSV and then I use Excel. And it was really sad for me as a designer. But then you look at what they did and it’s like, oh man, this is ugly, it’s buggy, but it does exactly what you wanted. And sometimes what they would change would be the tiniest thing. Like, I didn’t like that color, that color made our kids feel bad. Or, this word in your product touches a political nerve. It could be tiny, tiny details, but having the people on the ground in the classrooms have the agency to change that stuff was really interesting. There was this aspect of spreadsheets that just kind of captured me.
And so I started thinking, why doesn’t more software feel that way? Why is it that, you know, there are some things you can do in Excel, but so much of software feels like it’s decided thousands of miles away and you’re stuck with however it was decided, you know?
That sent me down a really, really deep rabbit hole trying to figure that out. And that’s how I got into this question.
Kanjun Qiu (03:19)
That’s really cool. Yeah, I’ve heard from someone that Excel is the first and maybe only successful end-user programming tool. And you have all of this interesting use of Excel, but that doesn’t work for everything else. When you were exploring this question of why isn’t all software that way, where did that lead you? Why isn’t all software that way?
Geoffrey Litt (03:41)
Oh man, yeah. There’s a lot of reasons, some of them are technical. I think historically, a lot of people have tried to make programming easier and more accessible, but ultimately there have been these kind of barriers of needing to think in really abstract ways that are not natural to most people. So that’s been one chunk of the challenge. And I think AI is changing that state of play a lot, and we can talk about that.
But there are also a lot of other barriers that are bigger and in some ways harder to tackle. There are economic barriers, like, you know, how do people get paid to make stuff? There are kind of infrastructural ones: a lot of our computing environment and ecosystem has kind of calcified around the assumption that people aren’t editing their software.
If you think about it, when I send you an Excel spreadsheet and you open it, you’re opening it in the editor for the spreadsheet. It’s not just a spreadsheet viewer, right? You actually have the editor and you can do whatever you want to it, because it’s a file that you control.
And when we look at a lot of how software is shipped through app stores, the assumption is that, no, what are you talking about? The user would never edit the code. In fact, we do a lot of things so that they can’t edit the code. And so I think there’s a lot of factors that interlock around this core assumption. And that’s part of what I think makes this problem challenging to make progress on, is that you have to address a lot of these together.
Kanjun Qiu (05:03)
That’s a really interesting observation that Excel is the editor and most software isn’t. You open the software, it’s view only, it’s not the editor. You maybe can edit the data, but not the UI elements.
Geoffrey Litt (05:15)
Yeah, I think that’s one of the biggest principles around malleability: we want to be removing friction and barriers between being a quote-unquote user who’s passively using something and getting deeper and deeper into actively modding it. A really important point is that I don’t think everyone should be modding software all the time. I’m a nerd, and I don’t want to mod most of my software most of the time, right? But it’s just about having the ability to go there if you want.
That’s where, you know, in an environment like Excel or spreadsheets, having the editor at least available to you always is a key principle.
Kanjun Qiu (05:53)
Right. Because sometimes you only want to view the Excel spreadsheet and not change the super complex financial model.
Geoffrey Litt (05:59)
Exactly. And there might be cells that say, don’t touch this unless you’re really sure you know what you’re doing. I think that’s something that people often miss too: sometimes having more explicit guardrails can actually free people up to feel safer and more creative editing stuff. If we go back in the history of malleable environments, HyperCard is a system that, I think, started in the 80s and shipped on Macs. And basically, it was kind of like a precursor to PowerPoint, in a way. You could make these slideshows out of index cards, basically. But what was really neat about HyperCard is that you could start out by just drawing pictures or writing text, and they had these different levels or modes, and level one or level two, I think, was just editing text and drawing stuff. You weren’t even able to code at that level. And then when you wanted to, you could go to level five, let’s say, which was the deepest level where you can do anything, but you’re only going there if you know you’re ready and you know you want to.
And I think in a lot of spreadsheets, you actually have folk practices around this stuff, like you just mentioned where maybe you’re walling off part of it that’s dangerous to touch. I think sometimes paradoxically, boundaries like that can create freedom for people.
Kanjun Qiu (07:15)
That’s super interesting. Diving into that a little bit and as part of the infrastructural barriers you were talking about, like our computing ecosystem, maybe the infrastructure we have, the UI elements that we have, they are all calcified around this assumption that people are not editing their software. What kind of infrastructure, constraints, different UI components, guardrails, et cetera, do you think could… let’s say we rewound back to the 80s or, you know, we’re here today and we ended up calcifying around a different ecosystem. What elements of that ecosystem might exist such that you end up getting malleable software?
Geoffrey Litt (07:57)
Yeah, so maybe it’s best to talk about this concretely. I can tell you about some experiments we’ve done at Ink & Switch where I used to work and do research and some of the environments that we developed there that we used heavily internally to do our own work that kind of enabled the sorts of malleability we were seeking. One system that we developed really deeply was a system called Patchwork. And the core idea of Patchwork was basically, it starts out as just a document editor, like the flagship feature is just a markdown editor that’s collaborative. But then you can go deeper and deeper into modifying it. And what it ends up being is actually an environment where you can make your own tools and share them with people and edit whatever tools you’re using on the fly. It kind of achieves some of the goals we wanted.
So how do we get that? Well, a few things. You need the ability to live edit your tools as you use them. A really important thing we realized is that most of the time, when I realize I want to change something, I don’t have an hour to go do it. I might have five minutes, though. And so we found that there’s this kind of magical combination of: use AI to do the coding, so that solves how you get the new code. But then you also have this question of, how do you ship that to yourself and to your colleagues? And the starting point there is to really treat the code of your app just like the documents you’re editing together. What I mean by that is, when we open a Google Doc together, we’re just editing it live, right? Just make your software like that.
This is not how people typically think. Typically you think, you have to push to GitHub and there’s some CI pipeline that runs and it deploys, and it’s this industrial process that’s arranged around preventing screw-ups and shipping to millions and millions of people. But no, just make it like Google Docs, okay? Once you do that, you instantly run into a bunch of other problems. One is, if we’re using a piece of software and you’re editing it live, it’s not going to be fun for me, because you’re going to be breaking it all the time, right? So then we realized, actually, this is why programmers have Git. We need Git for normal people. So we invested a lot in Patchwork in ideas around versioning, what we call universal version control. The idea is systems that achieve the goals of programming version control, but for any kind of data and for any kind of user, even someone who’s not super, super nerdy. Basically, what we found was that when you combine code as just documents that you’re sharing with people and can edit, you have AI helping you, and you have powerful version control that lets you create copies of things and merge things back together in good ways, then you start getting a really interesting set of ingredients where you can start remixing and mashing up and feeling more playful with your tools and your software. So I think that’s one starting point.
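To make that "code as a live shared document" idea concrete, here is a minimal sketch using Automerge, the document-sync library that Ink & Switch's systems build on. The document shape and tool code are invented for illustration; the real Patchwork plumbing is more involved.

```typescript
import * as Automerge from "@automerge/automerge";

// The app's source code is just a field in a synced document,
// alongside any other collaborative data.
type ToolDoc = { name: string; source: string };

let main = Automerge.from<ToolDoc>({
  name: "todo-widget",
  source: "export const render = () => '<ul></ul>';",
});

// "Branching" is cloning the document and editing your copy...
let branch = Automerge.clone(main);
branch = Automerge.change(branch, (d) => {
  d.source = "export const render = () => '<ul class=dark></ul>';";
});

// ...while a collaborator keeps editing the original concurrently.
main = Automerge.change(main, (d) => {
  d.name = "todo-widget-v2";
});

// Merging reconciles both histories, like merging Git branches,
// but it works for any document and any kind of user.
main = Automerge.merge(main, branch);
console.log(main.name, main.source); // "todo-widget-v2" plus the new source
```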
Kanjun Qiu (10:47)
That’s super interesting. Treating code as a live shared document, like a text file, between two people, and then having a way to version that. Text and data, I imagine, are kind of the main things that you’re versioning.
Geoffrey Litt (11:03)
Those are two things.
Kanjun Qiu (11:05)
What else do you need to version?
Geoffrey Litt (11:19)
Patchwork has whiteboards. Patchwork has spreadsheets. And in fact, because you can add new tools to the system too, you end up needing the system to be able to store and share arbitrary data. This is another thing I’ll get into, which is, when we think about, OK, what are the barriers to shipping software? A lot of the barriers are that the main ways we deploy software for people to use assume industrial scale. So you need a back end. You need a database. You need load balancing, blah, blah, blah, nonsense, Kubernetes stuff.
There’s a lot of stuff, and the gap between “I have a working prototype that I can run on my computer” and “I can send you a link and we can collaborate in my new piece of software” tends to be a lot of work. And so one of our goals in Patchwork was: how much of that infrastructure could be offloaded to the operating system or the environment, so to speak, so that if you have an idea and you vibe-code a UI, you can then share that with me and we’re instantly working together in that tool you just made?
And there’s a lot of layers to figuring out data persistence and sync and all that stuff to make that a reality. I think you’re seeing this to some extent out there: a lot of platform-as-a-service startups are trying to figure out how to become the best backend for vibe-coded apps, in a sense. I think that’s part of it, but I think we can push even further than most startups are going.
Kanjun Qiu (12:40)
Yeah, I think one of the really interesting things here. So Imbue recently shipped Sculptor, which is a tool for you to run parallel agents to write code. And one thing that we’ve been thinking about is sync and collaboration, real-time collaboration. And something that we made is this thing called Pairing Mode. So all of the agents run in containers, which means they don’t have your code. They’re not running locally. They run in containers because you don’t want them to delete your files accidentally or things like that.
I actually really resonate with what you said about versioning. We’ve had to think about the Git workflow of the developer and the agent as two separate things. And how does the agent’s version and the developer’s version mesh together? We’re just using Git right now, but Pairing Mode basically copies the agent’s files and rsyncs them to your local environment. And then now you’re real-time editing with the agent. What you said is really interesting, because we’ve been thinking about, like, I’m real-time editing with the agent, but actually sometimes I want to real-time edit with someone else. In today’s normal industrial software engineering process, nobody wants to real-time edit with anyone else. That’s actually really rare. So it’s been an open question for us. I also resonate with you a lot on things built for scale, Kubernetes: actually, wouldn’t it be better if everything were local and just running on the compute that you have in your hands, and you don’t have to handle all these scale problems?
I’m really curious, when you were working on Patchwork together, when did you want to collaborate real-time when coding versus when you want to do this more like industrial, independent two people merging branches into the same main workflow? Did you ever want to do one versus the other? Or did you always want to live edit?
Geoffrey Litt (14:33)
Yeah, I’m really glad you asked. I’m fascinated by this area. I have this opinion that collaboration between humans and AI is essentially a version control problem. What I mean by that is when you think about the problems that a version control system like Git is meant to solve, you have a bunch of people working together. They might be working concurrently on different stuff. And you need ways to go off and try stuff and be experimental. You need ways to review work that other people are coming to you with and talk about, I want to do this, what do you think? Let’s go back and forth and discuss. And then you want to track, okay, we decided it’s good, let’s do it. And you want to see that in your history. And when you think about working with AI actually, and you look at the needs, a lot of these things map really directly. So I have an unreliable alien intelligence out there doing stuff for me. How do I know if I like it? I need some way to review what it did. I need some way to talk with it and with other people about what it’s proposing. And then when we like it, you know, we can accept it.
Kanjun Qiu (15:35)
Like accept its changes and discard the changes that don’t matter.
Geoffrey Litt (15:38)
Exactly. And I think one of the underrated reasons that coding has taken off as a use case for AI is the prior existence of mature tooling, like pull requests, for doing this workflow. I think in a lot of other domains, if you don’t have this stuff built up yet, you can’t just let an AI agent go do stuff to a really important shared workspace without any ability to see what it did, or talk with it about what it’s proposing.
There are ceilings on what you can do there, right? And I think the more version control you have, the more you can just kind of let the agent go do stuff. So I think it’s a fascinating area. Now, to get to your question: I would say when working on Patchwork, we mostly weren’t live editing together and coding. We were probably mostly working async. But definitely we were leaning heavily on branches. And what you were talking about reminded me that I think a lot of products are struggling right now to reconcile the old way of Git with the new requirements: parallel agents, more real-time stuff. And I think it’s going to be interesting to see what it looks like. Do we reinvent version control from scratch for the new requirements? Do we layer on top of Git, as a lot of products are doing?
Kanjun Qiu (16:59)
One thing that I’ve been sitting with is this idea of version control. This may not be obvious from our website, but at Imbue, we really care about making software modifiable by the end user, because we think it’s basically a question of control as we go into this AI future. Like, today we’re kind of controlled by our software, actually. Our attention is controlled. Our actions are controlled. We’re controlled by other people who are building these systems. Sometimes inadvertently, they’re trying their best. Sometimes very explicitly, they’re trying to maximize profit or engagement. AI makes this problem worse. But it also gives us opportunity, because AI can write code. So a question I’ve been sitting with, to your point: you mostly weren’t live editing with Patchwork, you were mostly working async, but you also want to be able to change things. And maybe sometimes those changes make it back in, and maybe sometimes they’re just for your local system. Systems like this are really rare; except for open source projects, not many exist. In your lived experience, why did you build live editing if you mostly weren’t using it? I feel like there’s something interesting in live editing, and I don’t fully understand what it is, and I’m really curious for your thoughts.
Geoffrey Litt (18:30)
Oh man, I think there are like three separate topics to unpack there. I’ll start with the last one. So why live editing? I think it’s just what people expect. In some sense, it’s the most straightforward model. We get on a link, we’re looking at the same thing. Every kid expects that now in all of their software. They don’t know what files are, they don’t know about emailing. It’s just, everything’s live. And I actually think that’s a really lovely starting point for remote collaboration. When we get on a whiteboard, we can just draw. It feels really fluid and nice, you know? My view, and I think what we explored largely at Ink & Switch, is that it’s a “yes, and”: you want that, and you want the ability to go off in a corner and think about something privately without having your manager come in and stare at you, right? We call this creative privacy. I did a bunch of user interviews with writers talking about how they feel observed in Google Docs, basically. And so I think that’s the simple answer: live editing is how the world works now, and so we’ve got to meet people where they are.
I want to get back to something else you said, though, which is about this question of values and what software is trying to do to us, essentially. And I think that is a deeper undercurrent of malleability that we haven’t really addressed yet.
Cory Doctorow has this phrase, adversarial interoperability, which I love. He talks about things like ad blockers that are browser extensions, right? What’s happening there is that there’s this adversarial relationship where a website’s trying to push ads on you and you’re pushing back and using this technological capability to basically set up an environment that’s more in keeping with the way you want it to be or your own values. I think ideas like Bluesky algorithms being less centralized are also in this vein. And I think that is a very important part of the equation to consider when we think about barriers.
There are incentives that big corporations have to not let us change stuff because that’s how their business works. One analogy that I sometimes like to use is it’s more of a food court than a kitchen. There are these big companies that have their own agendas pushing a menu of choices at you. And in your kitchen, you have a lot more control over what am I trying to do with my food? What cuisine style, what health criteria am I trying to meet? And you have more of an ability to mold it to be in keeping with your values. So I think of the software app stores as kind of these food courts. I think that’s another big piece we have to solve.
Kanjun Qiu (21:09)
I agree. Yeah, it’s really resonant because Glenn on our team, he’s a prototype engineer, and he wrote about how it feels like we are in a world of vending machines right now. We get all these vended products, but in a truly open kitchen, we can change the kitchen layout itself and cook the food that we want. Earlier you talked about three barriers: technical, economic, infrastructure. We started out talking about infrastructure, but the economic barriers are ones that we think about a lot.
I’m curious, and I’m happy to talk more about how we think about the economic barriers, but at Ink & Switch, did you think at all about the economic barriers? What’s your perspective on that?
Geoffrey Litt (21:53)
Frankly, we mostly didn’t yet, I would say. We were focused on how we could make a really awesome, malleable system that we wanted to work in. I think in some ways the economic barriers are some of the hardest ones to work on in a research context, because ultimately companies with commercial incentives have to solve the business model piece. And my view of the world is that the technical and infrastructural barriers are big enough that they still really matter, and researchers can make progress on that piece somewhat separately. I don’t know, the thing that comes to mind for me is, I once did a deep dive into this system called OpenDoc, which Apple had in the early-to-mid 90s, and which is a cousin of a related Microsoft system called OLE. And the idea was very malleable-software-esque.
You could have these mix-and-match widgets in your documents instead of monolithic applications. And you could buy these smaller widgets from companies and combine them with your existing software. And apparently one of the challenges they hit was: when something breaks, who do you call? A really nice thing about applications is there’s a box on your screen. If something’s wrong with that box and you have an enterprise support contract, you call them and they’re on the hook. And the more you break things down into small units, there are basic questions like, are you willing to pay for a tiny, tiny feature on its own and have a separate procurement for that? But also, who’s on the hook for integration work, you know? A lot of users value things just working, and will pay for that. So I think those are some of the big challenges.
Kanjun Qiu (23:41)
Mm-hmm. Yeah, one of the things that we’ve been thinking about is: what LLMs do is make code easy to write and replicate. In theory, in theory. At some point they will. And so to your question of how we get malleability but also software that people support, I think there’s actually some interesting space between closed source software that people pay for and open source software that is fully volunteer supported. Because one of the requirements behind malleable software is that you need to be able to modify the source code, probably. And so in that sense, malleable software has to be open source by default, or source available by default. But today’s open source environment is free. There’s free software. And so who’s going to support it? It’s a team of really overworked developers who are the maintainers of this project, and they’re all volunteers, and that sucks.
And so I think we’ve been playing with this idea we’re calling Common Source, which is between open source and closed source, and this idea that actually probably most of the important software we run should be run from a public commons of code, of common source code. And in Common Source, what we’re toying with is this idea of a license that actually you can get the source code, but you still have to pay the creator or the group of people who are creating the code. And so then that starts to answer some of these questions potentially of like, OK, well, you’re paying for maintenance, really. You may be paying a SaaS fee for maintenance and getting the things you want. Stuff breaks. You can stop paying them. So the incentives are aligned in this way. But at the same time, you still get the source code, so you can make your own changes. And if you diverge too far from the original project, well then maybe they can’t help you anymore. I think we have to make some changes to our assumptions around open source and the philosophy behind open source to get to valuable software.
Geoffrey Litt (25:51)
That’s a really fascinating idea. I love that. I totally agree that open source seems to be a natural prereq and that it raises these questions. I think it’s tricky because I love that perspective you bring. At the same time, I think the history of open source business models has been fraught with a lot of failures, and when we think about, okay, code is now much easier to copy. I mean, probably if you have the code, you can easily make a copy that is legally different, so I don’t know, it seems tricky.
I also think to your last point around, you know, divergence, I think this is a huge, huge challenge to figure out. If there is an ongoing software project that is shipping updates and I have my own version of it where I did my own thing, I mean, software developers know this as sort of like the fork maintenance problem, and it can be a huge pain in the ass depending on what you’re doing.
There are companies that maintain forks that have teams of engineers just keeping up with what’s happening upstream, so to speak. And I think this is something that I’ve thought a lot about in malleability. I think the root problem is, if you treat divergence as arbitrarily editing the code in any way, the problem of fork maintenance is really hard. Whereas there are other ways to factor it out. Like, if you have plug-in APIs, you can say, okay, anyone can make a plugin, and we’re going to try to keep this plugin boundary stable; that’s one way to do things. Now, there are trade-offs: often there are things you can’t do through the plugin API, so you want to dig deeper. I think it does get tricky, but new ways of organizing software to be modular and compositional in different ways can lead to different abilities for people to mod it.
Something I’m really curious about is, if we progress towards a world where you have a lot of AI coding happening and you have people wanting to maintain forks with heavy divergence, maybe we just start structuring our code bases differently to treat that as the number one goal, basically. There have been some wacky research systems that have thought about programming with this as the number one goal, and they get to very different structures than we’re used to. There’s a great idea called behavioral programming from David Harel, where basically his idea is: what if a program is like a rule book, just a list of rules? And you just add rules, and rules can cause exceptions to previous rules. So I might say, the red square can always move to that square, but then you could come along and add a rule that’s like, unless that thing says five. And then you see how we just keep adding more and more rules to this ball. And we never have to reach into existing rules and modify them. Maybe there are ideas like that that could change the game.
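Here is a minimal sketch of that append-only rule-book idea, where a later rule can veto an earlier one without anyone editing it. It is illustrative only; Harel's actual behavioral programming formalism is built on behavior threads that request and block events, not this toy structure.

```typescript
// Each rule inspects a proposed move and may allow it, forbid it, or abstain.
type Move = { piece: string; to: string; boardValue: number };
type Verdict = "allow" | "forbid" | null;
type Rule = (m: Move) => Verdict;

const rules: Rule[] = [];

// Rule 1: the red square can always move to b2.
rules.push((m) => (m.piece === "red" && m.to === "b2" ? "allow" : null));

// Later, someone appends an exception without touching rule 1:
// ...unless the board shows a five.
rules.push((m) =>
  m.piece === "red" && m.to === "b2" && m.boardValue === 5 ? "forbid" : null
);

// The most recently added opinionated rule wins, so behavior
// changes by addition only; existing rules are never modified.
function isAllowed(m: Move): boolean {
  for (let i = rules.length - 1; i >= 0; i--) {
    const verdict = rules[i](m);
    if (verdict !== null) return verdict === "allow";
  }
  return false;
}

console.log(isAllowed({ piece: "red", to: "b2", boardValue: 3 })); // true
console.log(isAllowed({ piece: "red", to: "b2", boardValue: 5 })); // false
```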
Kanjun Qiu (28:47)
That’s really interesting, so append-only as the solution to divergence. So you actually don’t diverge. Yeah, that’s interesting. What other ideas are there around divergence?
Geoffrey Litt (28:57)
Another inspiration: there’s a common pattern in software of middleware, where you basically stack up these layers and you can always add more. I think maybe it’s the same principle in the end: the more you can have additive modification without reaching in to touch the existing stuff, the better. I also, frankly, would just throw the AI hammer at it and say, to some extent, when you reach in and intrusively modify something, it’s going to get messy, but probably 80% of the fixes that happen in fork maintenance are routine and don’t require anyone to think that much. They’re just icky. And so I’m very optimistic that AI will get to the point where it can mostly automate the easy stuff and leave only the tricky, hard stuff.
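The middleware pattern he is gesturing at, sketched minimally in the style of Express or Koa; the handler and middlewares here are invented for illustration.

```typescript
// A middleware can act before or after the rest of the stack, or
// short-circuit it, without editing the layers beneath.
type Handler = (req: string) => string;
type Middleware = (req: string, next: Handler) => string;

const base: Handler = (req) => `response to ${req}`;

// Fold the list so earlier middlewares wrap later ones.
function stack(middlewares: Middleware[], handler: Handler): Handler {
  return middlewares.reduceRight<Handler>(
    (next, mw) => (req) => mw(req, next),
    handler
  );
}

// Additive modification: append a logger and an uppercaser,
// never reaching into `base`.
const app = stack(
  [
    (req, next) => {
      console.log("saw:", req);
      return next(req);
    },
    (req, next) => next(req).toUpperCase(),
  ],
  base
);

console.log(app("hello")); // logs "saw: hello", returns "RESPONSE TO HELLO"
```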
Kanjun Qiu (29:44)
Mm-hmm. Yeah, that’s really interesting. Okay, conjecture, made this up on the spot: when it comes to handling divergence, I think we principally have two options. One is: maintain the internals of the system and add more stuff such that the end behavior changes. So rules create exceptions to previous rules and the internals don’t change. Middleware, or more layers of abstraction, kind of does the same thing: you don’t really change the underlying stuff, but you’re adding more abstractions on top, and now you can do different things and more things. Plugins are another similar thing: you’re not changing the center, but you’ve got this API and you can add stuff. So one paradigm for dealing with divergence is, don’t change the middle, just have modular pieces on top. And the second paradigm is, actually change the middle, and then use AI to solve it somehow. Something we do a lot internally with Sculptor is test-driven development, where we write a bunch of tests. And it’s not magical like this yet, because when we try to write the tests, they don’t actually capture the full behavior of the system. But in theory, you would have tests that capture much of the behavior of the system, do a full refactor or rewrite of the middle, and then have it abide by those rules. And then that does let you modify the center.
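A tiny sketch of that workflow, sometimes called characterization testing: pin the observable behavior down with tests, then let a person or an AI rewrite the middle freely. The function and cases are invented for illustration.

```typescript
import assert from "node:assert";

// The "middle" we want the freedom to rewrite, or let an agent rewrite.
function slugify(title: string): string {
  return title.toLowerCase().trim().replace(/\s+/g, "-");
}

// The tests capture the behavior the rest of the system relies on.
// As long as they pass, the internals are free to diverge.
const spec: Array<[string, string]> = [
  ["Hello World", "hello-world"],
  ["  padded  ", "padded"],
  ["already-slugged", "already-slugged"],
];

for (const [input, expected] of spec) {
  assert.strictEqual(slugify(input), expected);
}
console.log("behavior preserved; the rewritten middle is safe to keep");
```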
Geoffrey Litt (31:17)
I like those options. I’ll throw in one more complication, maybe, which is that once you’re talking about collaborating on shared software, I think things get more essentially complicated. Single-player software, whatever, I can have my own weird version and you don’t care. As long as I have a smart enough AI to keep up with updates, it’s not your problem. But now imagine we have team software.
So now it’s fundamentally a different problem where there are compromises that have to be made. People have to have shared practices around working. I’m fascinated by the question of how far can different people’s tooling setups diverge while still retaining the ability to collaborate and what kind of layering promotes that. Concrete example, many software engineers have a preferred development environment. And when you join a software team, you get to bring your favorite editor, typically.
And that works because code is stored in this very universal plain-text file format. There’s a universal version control layer; most people use Git or whatever, you pick your system. And that’s just a file-based thing. So then whatever tools you want to use to edit your files, whether it’s Sculptor or Vim or whatever, is not my problem, right? And so there’s this really nice abstraction boundary and we can still work together. That, first of all, is not the case for most SaaS software.
There’s a deep, deep coupling between the data that you’re sharing with your teammates and the one editor that is allowed to edit that data. And secondly, I think it’s often tricky to even tell where we could draw that boundary. Could you use Asana and I use Trello? Would that work? Could we sync them? I don’t know. There’s probably stuff that doesn’t fit, right? At Ink & Switch, we did this project called Cambria where we took on this challenge at the data layer. We thought about: if you were synchronizing data across really different apps, which want to store their data in different shapes, could you make some sort of glue that shuffles the data back and forth live as people collaborate? So you’re always seeing as much as possible on both sides, even if it’s not 100%. I think there’s a lot to consider there.
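A hand-rolled sketch of the Cambria idea, translating between two hypothetical to-do schemas as edits flow back and forth. Cambria itself is built from composable lens combinators; the shapes and functions below are invented for illustration, and they foreshadow the multi-assignee example that comes up later.

```typescript
// App A stores multiple assignees; App B stores at most one.
type TaskA = { title: string; assignees: string[] };
type TaskB = { title: string; assignee: string | null };

// A lens is a pair of mappings that lets each app see shared data
// in its own preferred shape.
const lens = {
  aToB(t: TaskA): TaskB {
    // Lossy: B can only show the first assignee.
    return { title: t.title, assignee: t.assignees[0] ?? null };
  },
  bToA(t: TaskB, prev: TaskA): TaskA {
    // Preserve what B could not represent instead of destroying it.
    const rest = prev.assignees.slice(1);
    const head = t.assignee === null ? [] : [t.assignee];
    return { title: t.title, assignees: [...head, ...rest] };
  },
};

const inA: TaskA = { title: "Ship demo", assignees: ["kanjun", "geoffrey"] };
const inB = lens.aToB(inA); // { title: "Ship demo", assignee: "kanjun" }

// B's user renames the task; syncing back keeps geoffrey assigned.
const edited: TaskB = { ...inB, title: "Ship demo v2" };
console.log(lens.bToA(edited, inA));
// { title: "Ship demo v2", assignees: ["kanjun", "geoffrey"] }
```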
Kanjun Qiu (33:36)
That’s super interesting, because on Sculptor we’ve been thinking about apps as being separate from data. Like, code is not data. And actually, data has to be treated fundamentally differently. And with Cambria, you’re kind of synchronizing data across these really different apps. One thing I’m curious about, with this question of can I use Asana and you use Trello, is: what is universal about data? Is a Postgres database that is structured with infinite columns somewhat universal? Is it documents that are universal, with plain text? In all of your experimentation, what have you learned about data?
Geoffrey Litt (34:29)
This is a fantastic and very difficult question. What is the elemental material such that, if we just stored everything in X shape, then everything would work? I don’t think there’s a silver bullet, unfortunately. I do think, though, the essential quality to think about is how structured and specific the data representation is. The idea of files generally is a pretty low-level abstraction. It’s really just a sequence of bytes. That’s all you know. But you can layer ideas on top of that, like file formats, which have their own constraints, right? You can store, for example, JSON as a file, which adds more constraints, but JSON is also pretty general. And then you could say, I have this JSON schema, which allows only JSON of this shape. And I think you can have these progressively more specific layers, and you can’t get everyone to agree on really specific schemas. It’s never gonna happen. At the same time, really, really low-level abstractions, like “it’s just a sequence of bytes, good luck,” are very open-ended, and I think allow people to do so much different stuff that it’s hard to work together. So I think we’re aiming for something in the middle. The Ink & Switch systems all run on Automerge, which is a library for synchronizing JSON documents. That was like, JSON is the universal shape. There are different options. I think a Postgres database is a perfectly reasonable other option. But I think that’s roughly the challenge.
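One way to picture that progression of constraint, from raw bytes up to a specific schema. A sketch; the task shape is invented.

```typescript
// Layer 0: just bytes. Anything goes.
const bytes = new TextEncoder().encode('{"title":"Ship demo","done":false}');

// Layer 1: the bytes are UTF-8 text.
const text = new TextDecoder().decode(bytes);

// Layer 2: the text is JSON, so it has tree structure.
const json: unknown = JSON.parse(text);

// Layer 3: the JSON matches one specific schema, so tools can rely on shape.
type Task = { title: string; done: boolean };
function isTask(v: unknown): v is Task {
  return (
    typeof v === "object" &&
    v !== null &&
    typeof (v as Task).title === "string" &&
    typeof (v as Task).done === "boolean"
  );
}

// Each layer rules out more interpretations, which helps collaborators
// agree; it also excludes anyone who wanted a different shape.
if (isTask(json)) {
  console.log(json.title.toUpperCase()); // "SHIP DEMO"
}
```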
Kanjun Qiu (36:04)
Mm-hmm. That’s really interesting. I have a bunch of thoughts here. One, it depends on the structure of your data a little bit. A Postgres database is good for data that is stored with identifiers and attributes of those identifiers. And JSON blobs are good for a slightly different type of data. What you said makes me wonder, though: if everything were plain text, people could actually diverge quite a lot. We now have these universal data processors, which are LLMs.
And so can we turn that underlying file into almost any abstraction along this spectrum, from sequence of bytes, to JSON key-values, to JSON with a schema, to a database, to something else?
Geoffrey Litt (36:56)
I’m super optimistic about that direction of thinking. Many of the data interop problems in the world are just the same information being represented slightly differently. And for those, LLMs are a slam dunk. That said, there are also essential differences. Like, in Cambria, the example we gave is: if one to-do list app can assign multiple people to a task, but another one can only assign one person, there’s nowhere to show it. It doesn’t work. And maybe then you don’t realize that I’m also working on your task. So I think there are these tricky, essential things to keep in mind when we’re working on shared information together with divergent tooling.
Kanjun Qiu (37:41)
One thing that’s interesting about that example is it really separates the concerns at the data layer versus the UI layer. In theory, you could just store whatever data you want. You can store this task having many, many people assigned to it. And then at the UI layer, or at the app layer, you do some kind of post-processing to figure out what you want to show for your app.
Geoffrey Litt (38:12)
Yeah, but if the app can’t show multiple people, you still don’t, you know…
Kanjun Qiu (38:17)
You still can’t see the underlying data.
Geoffrey Litt (38:20)
Yeah, and maybe you show more of the underlying data, or maybe, I don’t know, an AI comes in and modifies the other to-do app and makes it show multiple people, because you just need that. And if the other person doesn’t mind, you roll with that. The essence of it is that when we’re collaborating together, we actually have to make compromises about how we’re going to do stuff. There’s always going to be a collective element there; software can’t let us be infinitely individualistic, you know? And I think this actually gets to a broader malleability theme. We talk about this in the essay we published this summer at Ink & Switch around malleable software. The goal is not that everyone develops the full skillset needed to do anything to their software. There’s a long history of people working together with others.
With spreadsheets, for example, there’s been some really nice ethnographic research by Bonnie Nardi, who’s kind of a legend in the end user programming research community, looking at, how do people use spreadsheets in offices? And it turns out usually there’s someone in the office who’s really good at Excel. And when you don’t know how to do a complicated formula, you go ask them, right? But you can still do a lot of stuff yourself. And maybe you pick up a bit on what that person did and watch them work and you level up gradually.
And crucially, that person doesn’t work at Microsoft. They are in your context. They can sit with you. They know your problems. And so they’re much, much closer to the site of use than the site of original platform production. They call this pattern local developers. I think this is a really, really, really important pattern to think about and build around for these kinds of systems. I mean, we see it at Notion. There’s often someone in a company who is really good at Notion and sets stuff up for people, right?
That layer always exists. And it’s not a bad thing that it exists. AI might be able to help fill that role sometimes for some people, but I think assuming that people are working together to create shared software environments should be the goal.
Kanjun Qiu (40:26)
Yeah, that’s really interesting. In software development, there’s the same thing. Our CTO, my co-founder Josh, is the expert in a bunch of different ways and helps people figure out how to build on top of the system that we have. Regardless of how good LLMs are, you probably want some kind of expert. Maybe what you’re saying is, there’s still this idea of levels of expertise with a tool, even if it’s an end-user programming tool like Excel, or programming, or Notion. There are levels of expertise, and someone who has a lot of that expertise can actually do a lot of the quote-unquote programming and set up the system for other people in their context to modify. To our earlier point about data: it’s not just a blob of data and then everyone does their own entire full-stack thing on top of that blob. It’s like, okay, blob of data, and then different people take it and mold it into what’s useful for their own context.
Geoffrey Litt (41:29)
The mental model that I really like there is this idea of a smooth slope from user to creator. It’s not that deep modifications aren’t hard. It’s that whatever you want to do, you should have to do the least amount of work possible to do that thing. And you can get slowly pulled into deeper stuff if and only if you want to. And you stop where you want to. I think this is very distinct from our existing computing ecosystem, where you can basically use the thing and tweak some settings.
And then, if it’s an open source project, I guess you could download the entire code base, compile it for five hours, learn to code; it’s this insurmountable cliff. No one’s doing that, right? That’s an approximation, but that’s roughly the shape of it. And so, how do we smooth out that cliff is one way to think about it. And I’m curious for your thoughts on this: I think AI can help pull people up that cliff if the system is designed correctly. I also think it might actively prevent people from going up the cliff if it’s arranged a certain way. And what I mean by that is, if I ask my coworker who’s the Excel wizard to teach me formulas, and they sit with me for an hour and we do it together and I see them doing stuff and we talk about it, maybe that’s a learning moment for me. Whereas if I ask some AI Excel-formula wizard to do it and it spits out something in five seconds, and that’s wrong but I don’t notice, or even if it’s right, if it does the thing for me, what did I learn?
I actually lost that learning moment, you know? I think a lot about how we can set things up to be closer to the former, but I’m curious how you think about that.
Kanjun Qiu (43:05)
I think this is really one of the key insights about end-user programming: that there is a skill curve, kind of this learning curve. And you had the gentle slope to tailorability, is what you called it. And with LLMs, something that we think about is that there are kind of two pieces to tailorability. One is how much the user understands. And the other is how tailorable the system is. And you can modify both. Modifying how much the user understands is about education in a lot of ways. It’s about how we make it easy for the user to understand how to get closer to what they’re trying to do.
A concrete example of us experimenting with this in Sculptor is that there’s a beta feature called Suggestions. And it’s still very early, but it basically looks at your code base and suggests fixes, improvements, and refactors, directions you can go based on what it looks like you’re trying to do. And in theory, the suggestions, they’re proactive. And so they’re kind of telling you things about your code base that you might not know about. And they’re telling you things that you might end up learning. So we’ve had some users who are like, I didn’t realize that I shouldn’t expose my API key in plain text. Cool. Didn’t know that was a security best practice. Or like, I didn’t realize that I had like five copies of this function that were slightly different from each other. And actually, there was this better way of doing things that’s like the default standard. So that kind of proactive teaching, I think, could be part of a system that is an environment, is like an end user programming environment.
The ambitious way I think about Sculptor is like, if we could make this into an end user programming environment, that would be awesome. On the system side, how do you make the system, outside of user education, how do you make the system actually more tailorable? I’m curious for your thoughts here, but I was thinking about interfaces and how some interfaces feel like they might be more amenable to tailorability than others.
For example, this might be a terrible example and might not actually satisfy this requirement, but I’m going to try it anyway, and I’m curious what you think about more tailorable interfaces. The other day I was using MailChimp, and I was trying to send a plain-text email. And I could not figure out how to send a plain-text email in MailChimp. This is extremely difficult. And I was like, man, it would be really nice if I had a retrieval UI where I could send some messages in chat and it finds the API endpoint that is the plain-text email function and then gives me a UI that is the plain-text email. That would be really nice. Then I could learn, first: OK, do you have a plain-text email endpoint? And second, if you don’t, then maybe that would be an entry point for me to build one, something like that. So what if an app had no navigation, none of these other dependencies; it’s retrieval only, you only retrieve API endpoints that take actions. Maybe that lets me build more actions on top of the system. I don’t know. What do you think?
Geoffrey Litt (46:37)
I think you’re getting at a really big question, which is: how are UIs going to evolve in this new age? I think we might have talked a bit on Twitter about this too, like navigation-free apps or whatever. So let me get at your question indirectly; I promise I’ll come back to it. I think command lines are really interesting. We’ve left them behind for good reasons. GUIs are better in a lot of ways. But there was a really interesting quality that command lines had, which was that when you do stuff manually one time, it’s the same way you do it if you want to automate it or build on top of it. While you’re in the course of normal use, you’re kind of picking up this underlying structure that ends up being really useful if you ever want to build on top of the thing. Someone had this great phrase: a CLI is both a mediocre GUI and a mediocre API, and that’s what makes it great. Which I think is really lovely. What you’re talking about, I think, is that a big problem with GUIs is that they lack a lot of hooks and compositionality for building on top of and going further with. They tend to not really expose you to what the underlying things are that you can actually do in the system, and how you could recompose those in different ways. And so that’s a big question and challenge for me: can we retain the benefits of graphical interfaces, things like discoverability, things like data visualization, which I think is really underused in a lot of LLM interfaces? But can we also figure out how to make it obvious that you can go further than what this one GUI lets you do, and let you in on the internal structure? One concrete starting point could be: in a lot of power-user apps like Photoshop, when you do stuff, there’s an undo stack that shows you everything you’ve done in a list. So it’s reifying the actions you’re taking as steps. And then that’s the building-off point for macro recording and automations. And I wonder, could we have more computing environments where, as you do stuff, you see the things you did as things? And then you go from there.
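A sketch of that "actions reified as things" idea: an undo stack in the command-pattern style, which then doubles as a macro recorder. All names are invented for illustration.

```typescript
// Every user action is a value: it knows how to apply and invert itself.
interface Action {
  label: string;
  apply(doc: string): string;
  invert(doc: string): string;
}

const append = (s: string): Action => ({
  label: `append "${s}"`,
  apply: (doc) => doc + s,
  invert: (doc) => doc.slice(0, doc.length - s.length),
});

let doc = "";
const history: Action[] = []; // the visible list of things you did

function perform(a: Action) {
  doc = a.apply(doc);
  history.push(a);
}

perform(append("hello"));
perform(append(" world"));
console.log(history.map((a) => a.label)); // the undo stack, as data

// Undo is just popping the stack and inverting...
doc = history.pop()!.invert(doc); // doc is "hello" again

// ...and a macro is just replaying recorded actions somewhere else.
let other = "prefix: ";
for (const a of history) other = a.apply(other);
console.log(other); // "prefix: hello"
```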
Kanjun Qiu (49:02)
Mm-hmm. That’s really interesting. Building on that a little bit: we work on AI agents, agents take actions, and I think there’s a difference between displaying information and taking actions. And what you described about a CLI as a mediocre GUI and a mediocre API is really interesting, because CLI tools are primarily for taking actions. They’re not very good for displaying information.
GUIs are really good for displaying information, and they can be good for discovering actions, maybe taking actions sometimes; if you’re trying to figure out what action to take, then maybe you can play around with the information until you figure it out. But the taking of the action in the GUI is not great. It’s not very composable, not very automatable.
And so if we think about displaying information and taking actions as two separate things, then it makes me wonder, OK, your point about the undo stack is interesting because that’s a sequence of actions which could be turned into a CLI tool, in theory. And the question really is, OK, what’s the input into the CLI tool? Unfortunately, sometimes the input involves looking at a bunch of data and analyzing it and visualizing it in a GUI form or something like that.
But there’s some processing that goes into the input, but the action itself can be a CLI tool.
Geoffrey Litt (50:27)
Yeah, totally. I think, you know, now I write most of my CLI commands by just telling an AI what I want to do. And then it writes some really long command that I don’t fully understand, and I hit enter, right? Which I should pay more attention to. But I think you’re really getting at something. It usually works, right?
Kanjun Qiu (50:46)
Yeah, exactly.
Geoffrey Litt (50:55)
I think we are all figuring out which interaction models make sense right now. And I think you’re getting at a couple of important things, which is that for commands and actions, I think language is actually really good for saying what to do, for the most part. And then for the return path from the agent, for some things maybe two-way voice conversation feels good, but for a lot of things, having visual aids helps. So deploying the full field of graphic design and data vis to show things.
When you ask Siri for the weather and it shows you a weather card, that’s a version of this loop. So I think that’s a really powerful basic loop. And then the one thing I want beyond that, for some use cases, is a shared locus of attention, like a desk we can both point at and work on. So that might be as simple as telling the agent, edit this code in Sculptor. Or conversely, the agent saying, did you notice that this line is weird? You’re kind of sharing this thing.
Kanjun Qiu (51:59)
Yeah, you have like a shared space you’re both looking at.
Geoffrey Litt (52:02)
And you can point at it. That combination, I think, ends up being pretty good.
Kanjun Qiu (52:07)
Hmm, interesting. An idea we’ve been toying with is that agent-human collaboration and human-human collaboration might not be such different things. Perhaps you can design for both of them. One of the principles in Sculptor, one of the design principles, is everything you can see, the agent should also be able to see. It should understand how it works. It should see your entire UI. If you tell it something and you’re referencing a part of the UI that’s not the chat, it should know what you’re talking about.
And same with human-human collaboration. To your point earlier about how real time is just what people expect: I think two humans want to both be looking at the same surface and the same information. Otherwise it’s actually quite hard to communicate.
Geoffrey Litt (52:52)
I really like that principle you just brought up. I think to a large extent aiming for human-human collaboration as a gold standard for a lot of stuff is actually a great goal. I think there are other patterns that can make sense sometimes, but even just looking at human-human, like if you and I are sitting next to each other pair programming, there’s a lot going on. You know, very simple stuff, like you can point at things and see my screen and I know that you can see my screen and there’s not any weird question of what can you see. So there’s a lot of good theory of mind going on.
But also I think there’s much deeper stuff. Something I’ve been thinking about lately, and I’m curious for your thoughts on, is: you can tell if I’m really busy and stressed because we have a launch tomorrow and I just want this fricking button to work. And I’m like, hey, Kanjun, can you fix this button for me, please? You’re not going to launch into an hour-long lecture about the philosophy of how we think about buttons, right? You’re just going to help me out because I’m in a bind. And it might be a totally different situation. It might be my first day at a new job, and I’m like, man, I’ve never used this programming language before, can you show me around a bit? And I’ve always felt like computers, by defaulting to having so little context about us and our environments compared to human interactions, are at a real disadvantage, where they can’t sense these things. And so they rely on us to give them that context through our prompting in the AI era, but we’re not very good at giving them all the context that they need. And so we end up in these weird mismatches. Particularly along that dimension I just brought up: how do you know how much to bring the person along and help them learn themselves, versus just do it for them? When should they be brought along? When does it matter?
I don’t even know myself and as a programmer, I’m often very unsure how much I should be getting in the details of the thing. Even if the AI can do it perfectly, there’s some intangible benefit to me being in the details. When I’m UI prototyping, for example, I might have new different ideas from knowing how it works, for example. And so I don’t even trust myself to know how much I should be in the details.
How do we do this?
Kanjun Qiu (55:20)
That’s a really interesting question and direction. When people talk about AI slop, AI slop is this lack of taste, in a way. What you’re pointing at that’s really interesting is: the more you understand how something works, how your system works, how the UI you’re trying to build works, the more taste you have for where it can go. This taste comes from depth, a depth of understanding in me as the human. And it’s so weird, because AI systems have no taste and yet they know everything. So it’s not about knowing the thing. It’s something about preferential attention, based on the details we’re seeing, that serves what we’re trying to get at, or something. I’m quite confused about this topic.
Geoffrey Litt (56:13)
I think, yeah, I think a lot of people have a very incorrect mental model of how creative work happens, which is something like: there’s an idea in your head, and you just have to somehow get it into the world as it is in your head. And if you could just do that, then it’s done. And so in that model, all you need to do, quote-unquote, is describe the idea perfectly, and then someone else, or something else, can just go do it. Right? That’s not how it works.
Kanjun Qiu (56:38)
That’s not how it works. Not at all.
Geoffrey Litt (56:43)
Creative work, open-ended work: anyone who’s really deep in it knows that there’s this conversation happening between you and some medium that you work in, where the idea is being shaped as you go. Working with the medium is changing your conception of what you want. There might even be accidents that happen that are cool, you know, that spark new ideas. And to some extent, maybe some of it’s even muscle memory, right? So it’s possible, for example, that a guitarist composing a new song might not know what chords they’re about to play. Their fingers just do something, and then they hear it and they’re like, that’s cool, right? So when you start digging into that, I think it raises a lot of questions about the role of AI in that process. And I think a lot about this in my own creative practice, which is mostly, professionally, UI prototyping. I use AI coding a lot. And I think, at its best, it can really speed up feedback loops that weren’t essential for me to be in. And that lets me make progress faster in this exploration. At worst, it cuts off a whole process of creative exploration that I would have been in myself, because I say what I want and it makes one bad thing. And I’m like, oh man, that’s not good, but you made the whole thing and now I just can’t unsee it, you know?
Kanjun Qiu (58:12)
Do you really feel that? That’s really interesting.
Geoffrey Litt (58:18)
Yeah, totally. That’s happened to me. I’m like, oh, well, it’s done and it’s terrible. Like, whatever. And I think it’s a very sensitive thing. I’m an AI coding optimist in the sense that I think it can be a huge accelerant, and I use it a lot, but I think we have to be very clear that we’re all changing our creative media, and that’s going to do something to our creative practice. And I think the people who are worried about AI art especially are totally onto something there.
Kanjun Qiu (58:41)
Mm-hmm. This is really interesting. The thing that I’m afraid to say when I talk about Imbue or AI agents is: I think of us as trying to upend the economic system, in a way, and something about the way that things are working. Because fundamentally, AI systems and AI agents are a source of power, and they become that source of power by basically being the way thinking happens. And what you just said is: the system thought for me, and now I have this thought, but I didn’t have the intermediary thoughts before I got to this thought. And because I didn’t have those intermediary thoughts, I couldn’t get to a different thought. I only got to the end thought. So it’s quite concerning.
Geoffrey Litt (59:30)
It’s very concerning to me.
Kanjun Qiu (59:39)
I’m curious after you got the end thought, if you went back to the intermediary thoughts, like, okay, you got this bad UI from your LLM. Can you go back to the intermediary things and like try and understand it better, or can you truly not unsee it? Is there some property? Like I’m curious about your personal experience here.
Geoffrey Litt (59:55)
Yeah. I would say that for me, it’s a very emotional process. So it’s not just like a logical thing. There’s like an excitement factor, a momentum factor, you know, like, oh yeah, like we’re getting somewhere factor. And again, the tricky thing is that AI often really helps with this. Like it preserves momentum and avoids roadblocks that would have killed the vibe, you know? So like it’s great when it works.
But when it doesn’t, yeah, it’s not really about literally being able to unsee it. It’s that it changes my emotional relationship to the process in a way that makes me not as excited about doing it anymore.
Kanjun Qiu (1:00:42)
Yeah, okay, so maybe what happened is it came out with a bad idea and it killed your momentum. And you’re like, I thought there was something interesting here, but I guess not. Maybe I’ll move on to something else.
Geoffrey Litt (1:00:52)
Exactly. Yeah. And I’ll never know: if I had done it myself, would I have come up with something? In fact, in some ways, when it comes up with something good, it’s even worse, because sometimes I’m like, this is pretty good, I have a few little tweaks, but good job. And then I’m like, wait, what would I have done? Would I have done better? I don’t know, and I’m not going to spend the time to figure it out anymore. So one mental model I try to use is that there are things I care more and less about, that I see as more and less core to who I am or what I work on. The less core it is to me, like some disposable secondary tool I wouldn’t have built without AI in the first place, the more I’m okay just being very free with it. But the closer it gets to my core practice, the more urgency I feel to be really critically reflecting on what’s going on and not going too far, yeah.
Kanjun Qiu (1:01:57)
Yeah, I resonate with this a lot; it feels that way to me sometimes. I tried using GPT-4.5 for writing, and 4.5 was the first model that was actually quite good at writing. And for a month I was really happy. I was like, oh my God, my writing process is amazing. I’m getting so many more ideas through, I’m all in flow, there’s no writer’s block. And then I zoomed out for a week, stopped working on the piece I was working on, came back, and I was like, wow, this is not me at all. And I rewrote the whole thing, no LLMs. And the really interesting reflection: I feel like there are almost thought Schelling points. Because these systems are distributions, they actually produce the thought Schelling points that are highest likelihood. And because they’re high likelihood and they’re Schelling points, you kind of just end up there, and it’s really hard to get out of them. They’re really tempting. Yeah, they pull you in, exactly. So now you end up in this weird groove, and it’s actually really hard to get out of it, and being creative takes stepping away.
Geoffrey Litt (1:03:14)
Yeah. I mean, another manifestation of this that I’m curious about with Sculptor: I’ve been playing with parallel coding agents a bit and finding them really interesting. I’m still learning how to use them. Something I found recently: I went overboard for a day. I was like, oh my God, this is amazing. I had two projects that I was working on and I had to pick one to work on for the day. And I was like, you know what? I can do both. And so all day I was just kind of flip-flopping between these two projects.
You know, it kind of worked, but I felt really off at the end of the day. And what I realized was: man, I’m not sure that I did great work on either, because actually, even with perfect implementers doing stuff on both projects, it’s not an AI coding quality challenge. It’s a my-mental-bandwidth challenge. If I’m really creatively leading these things, I can’t multitask, actually. So there’s a different bottleneck, which is me and my brain. And I’ve been trying to reflect on what to do with that. Maybe I only parallelize within the same project, or on the same area. Maybe I have one main thing I’m thinking about, and then I have armies of bots doing all the maintenance and bug fixing and stuff I don’t have time for. I don’t know, but I’m curious about your thoughts.
Kanjun Qiu (1:04:35)
That’s super interesting, because yeah, I always recommend never working on two projects at the same time if you’re trying to do something creative with these parallel agents. I resonate with what you’re saying a lot. When you’re doing something creative or researchy with software, at least for me, what I’m trying to do is explore the space and evolve my thinking and understanding of the problem as I’m building. That’s very abstract, but I’m evolving my understanding of the problem, of what I’m trying to do. And so with parallel agents, in Sculptor, one thing we’re trying to optimize for is divergence instead of convergence. Recently we shipped a feature, in beta right now, you can turn it on in the settings, called Forking. You can fork an agent. So now, say you had this agent build a UI and you didn’t like the UI: go back to where you started and say, try something totally different, try this instead, and then also try this. And it’ll snapshot your agent’s current state, all the context, and fork it into a bunch of different tasks. And one thing I really like about this…
Geoffrey Litt (1:05:50)
Yeah, no, I’m excited. Keep going.
Kanjun Qiu (1:06:01)
One thing I really like about this is it kind of gets me out of this groove we were just talking about. There are many ways to end up in that groove. One way is: I’ve built up some context, I went down this path, I’m debugging some minor detail, and now I’m really annoyed because all of this debugging context is in the context and I need to get it back somehow, and I’m in this weird groove. Another way to get into a weird groove is: the agent had a bad idea and now I can’t get it out. I can’t get back to the place where it could have generated good ideas. So yeah, the forking thing is really about: how do I help the user get out of grooves, so that they can do really divergent thinking and divergent things, and not have to wrestle with the agent to get it out of these grooves?
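To make that forking idea concrete, here is a minimal Python sketch of what branching an agent from a snapshot could look like. The `AgentSnapshot` type, its fields, and the example tasks are all hypothetical illustrations, not Sculptor’s actual implementation.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class AgentSnapshot:
    """Hypothetical snapshot of an agent: accumulated context plus workspace state."""
    context: tuple[str, ...]   # conversation / tool history so far
    workspace_rev: str         # e.g. a commit hash for the working tree

def fork(snapshot: AgentSnapshot, new_tasks: list[str]) -> list[AgentSnapshot]:
    """Branch one snapshot into several divergent agents, one per task.

    Every fork starts from the same context and workspace, so the user can
    back up past a bad idea and try several directions in parallel.
    """
    return [replace(snapshot, context=snapshot.context + (task,))
            for task in new_tasks]

# Back up to the state before the disliked UI, then diverge three ways.
base = AgentSnapshot(context=("build a settings page",), workspace_rev="abc123")
forks = fork(base, ["try a sidebar layout",
                    "try a wizard flow",
                    "try a single dense form"])
```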
Geoffrey Litt (1:06:44)
I love that idea. I’m a huge fan of that way of thinking. And you know, it’s funny, it’s coming back to version control, actually. These questions of how you structure divergence, how you even see it, how you encourage it: it’s a tooling problem, I think. Yeah. Something I’ve wondered about is also having more structure to the divergence. What I mean by that is not just “try three random things.” Let’s say I want a to-do app. One thing you can have the agent do is give you a big questionnaire, right? Should it be really simple or really complicated? Should it be for work or for personal life? And you just go through and answer these 10 questions or whatever. I think that’s the current state of the art of specification, I would say: answering a bunch of questions.
And it’s fine, but it’s pretty tedious. It’s also, I think, not how real design processes often work best. Often the way things work best is by looking at a few options and saying, I like that one, and then talking about why. So something I’ve thought about is: once the tokens are free, can you just generate like 100 to-do apps? But not randomly. The agent would first think about, okay, what are the dimensions the user might care about? Let’s set up a design space along those three, five, eight dimensions, whatever. Let’s take some guesses on what they might want, pick a bunch of points in that space around there, and then also try some wildcards, really crazy options, and pre-generate a hundred apps. Then when we come back to the user, we’re like, okay, let’s just start a conversation. Let’s show you some options. You want it to be more X? We had that ready already, you know? It would be more like jamming with a design consultancy, except the feedback loop is in seconds and not weeks. More playing with options.
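As a rough illustration of that kind of structured divergence, here is a minimal Python sketch: enumerate guessed dimensions, sample points near a best guess plus a few wildcards, and hand each point to a generator. The dimensions and the `generate_app` function are made-up stand-ins for whatever codegen backend you would actually use.

```python
import itertools
import random

# Hypothetical dimensions an agent might guess the user cares about.
DESIGN_SPACE = {
    "complexity": ["minimal", "moderate", "power-user"],
    "audience":   ["work", "personal", "shared household"],
    "visual":     ["plain list", "kanban", "calendar"],
}

def sample_design_points(best_guess: dict, n_nearby: int = 8,
                         n_wildcards: int = 4, seed: int = 0) -> list[dict]:
    """Pick points near the agent's best guess, plus a few wildcards."""
    rng = random.Random(seed)
    all_points = [dict(zip(DESIGN_SPACE, combo))
                  for combo in itertools.product(*DESIGN_SPACE.values())]
    # "Nearby" = differs from the guess in at most one dimension.
    nearby = [p for p in all_points
              if sum(p[k] != best_guess[k] for k in DESIGN_SPACE) <= 1]
    wild = [p for p in all_points if p not in nearby]
    return (rng.sample(nearby, min(n_nearby, len(nearby)))
            + rng.sample(wild, min(n_wildcards, len(wild))))

def generate_app(point: dict) -> str:
    """Stand-in for a real codegen call; returns a description instead."""
    return f"to-do app: {point['complexity']}, for {point['audience']}, {point['visual']} view"

guess = {"complexity": "minimal", "audience": "personal", "visual": "plain list"}
apps = [generate_app(p) for p in sample_design_points(guess)]
```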
Kanjun Qiu (1:08:50)
Yeah, I think this is really interesting. There are LLM tools out there where, when you ask deep research to go do some research, the first thing it’ll do is ask you some questions about the query. And whenever it asks me these questions, my answer to all of them is: yes to all. The questions are useless. I was reflecting on why these questions are useless, and it’s because they’re not actually questions I want to answer. It’s more that I want to see some output and be like, I didn’t like this part, I want more of this. What we want is for the LLM to help us understand the problem better. This goes back to what you and I were just saying: the creative process is about understanding the problem space, what we’re trying to solve for, better as we go, and being able to create and move in that direction easily through the medium.
Geoffrey Litt (1:09:44)
Yeah, I love that you’re thinking about encouraging this in the tool, because I totally agree with you that often, even if it’s technically possible to do this somehow, once you’re in the groove you feel stuck unless it’s easy. There are a couple of beautiful systems out there that play with ideas of spatial canvases as ways to visualize that branching. Not just for LLM chats, but also for creative media: there’s a system I love called Spellburst, which my friend Tyler Angert and some folks at Stanford worked on. It’s a spatial canvas where you make these little art sketches, and then you can hit a button that makes a bunch of forks off from that one, and you basically try a bunch of things and then you’re like, I like that one, let’s diverge from there, right? So you can explore, but you see all the variations spreading out in this tree, and I think that sort of thinking can be very generative.
Kanjun Qiu (1:10:39)
That’s super interesting. Yeah, we’ve struggled to figure out how to represent forking, like forked agents, and I don’t know if a canvas works when it’s not so visual. You can’t really visualize code very well. I want to see at a glance where each fork is going, but it’s really hard to do that with code. And this goes back to the question of infrastructure: when you thought about malleable software and helping the user not only make changes but actually explore, what kind of technical infrastructure allows that? I’m curious if you have any thoughts on this.
Geoffrey Litt (1:11:11)
That’s a fascinating question. I agree with you that inherently visual media are a much more obvious fit for a visual canvas or something like it. What you’re getting at, really, is how do you give the right feel for what a piece of code is in a concise visual way? One dimension that I’m always thinking about is that it’s really hard to tell from looking at code how solid it is.
This is a big problem we found in malleable software with Patchwork, because when a company ships a piece of software in the app store, there’s a minimum bar it’s hitting, right? You hope. That might not be true anymore with LLMs, honestly, but someone’s charging money for it; it should be at a certain quality bar. If I just made a tool for myself and vibe-coded it in five minutes and it works for me, what do you make of that? Do you want to use it? It depends. You probably wouldn’t want to just wholesale adopt it if it’s really important, though you might be okay playing with it. But it’s often hard to tell from the outside which one it is. And is this thing even maintained? People look at GitHub stars and commit histories, for example, as these sorts of signals of life, right? If we have way more software, and it can be produced way more readily by people who don’t know what they’re doing, there’s going to be more bad software out there. That’s not necessarily a problem; there are a lot of bad spreadsheets out there and it’s fine. But I think you need to be able to tell. An analogy I like: is this a balsa wood model of a bridge, or is this the Golden Gate frickin’ Bridge? In physical media, it’s really, really obvious and you can never get confused, but with software it’s not as clear. Could we make that clearer somehow?
Kanjun Qiu (1:13:03)
This is really interesting. One of the prototypes we’ve been playing around with in Sculptor is this idea of a report card for your code. Having worked on it, it’s actually a really hard problem. What goes into the report card? How do you know if a piece of code is robust? Depending on what you want to do with it, you might want it to be robust in different ways. Maybe you want it to be more extensible, or maybe you want it to be really well tested. But to your question: if I’m in this world of malleable software, with a lot of forking and a lot of divergence, how do I know where to build from? What is safe to build from? What are the proxies for that? I think that’s a really good question.
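For flavor, here is a sketch of what such a report card might aggregate. Every signal and threshold below is invented for illustration; the point is just that “robust” is multi-dimensional, so the card is a profile rather than a single score.

```python
from dataclasses import dataclass

@dataclass
class ReportCard:
    """Invented proxy signals for 'how solid is this code?'"""
    test_coverage: float          # fraction of lines exercised by tests
    days_since_last_commit: int   # staleness as a rough signal of life
    has_error_handling: bool
    dependency_count: int

    def safe_to_build_on(self) -> bool:
        """A deliberately crude judgment; real thresholds depend on your use.

        A throwaway personal tool can fail this and still be fine to play with.
        """
        return (self.test_coverage >= 0.5
                and self.has_error_handling
                and self.days_since_last_commit < 180)
```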
Geoffrey Litt (1:13:45)
I’ll throw out another idea. One of my beliefs around divergence and versioning is that there’s a lot of meta-work around the work that humans find tedious, like writing pull request descriptions. A pull request that has a really, really good description is much easier to review, but it’s a lot of work to produce that.
With AIs, I think we should be pushing much harder than we are to produce amazing review experiences and artifacts. We should be the most spoiled managers in the world, where our reports come to us having spent weeks preparing a presentation about a tiny bug fix. That cost doesn’t really matter anymore; we can spend the virtual time for them to do that.
And whatever form that takes: if it makes 100 apps for me, maybe it should make a 3D world where I can browse the different apps and see how they’re different from one another. Maybe it’s just a really, really good deck that walks me through them and explains the differences, as a PDF. I don’t know. If we gave a high school or college intern the task of explaining these 100 apps, what would they do? I don’t know, but I think they could get extremely creative.
There’s one project I love called Quickpose, I think, by Eric Rawn, where they did versioning on a spatial canvas, but it was more like a whiteboard that you got to draw on, and you arrange the versions yourself, freeform. I think they did some tests with artists, and it would be like: a cluster of versions over in this corner are the ones that had this intriguing property, and then this offshoot over there was really weird, and this is our mainline exploration. You can actually arrange them and label them and describe them yourself. So imagine, if the AI is diverging, could it make a poster for you of what it tried and how it all fits together?
Kanjun Qiu (1:15:48)
That’s super interesting. There are two really interesting things in that. One is that the malleability of this canvas, where artists could arrange the versions themselves, is in itself a malleable-software property: you want the end user to be able to explore by arranging the explorations and reasoning and thinking through them. The other thing you said that I thought was really interesting is this idea that we’re really under-focused on presenting results from the AI. The LLM just dumps a bunch of text at you. Why has it not thought about how to present it, as a slideshow or, you know, a better presentation? That’s really weird.
Geoffrey Litt (1:16:33)
It’s wild. And I think this is going to become much, much more important very rapidly. There’s one argument you could make, which is that the AIs get so good we don’t need to review. I think that’s totally false, because what I’ve observed in coding is that as they get better, I give them harder stuff. It’s almost the opposite problem: I’m giving them more and more stuff that’s more and more important, and they’re going off and doing stuff that I’m not even in the details on anymore. So the review step becomes more and more critical over time. I think it’s headed towards a world where most of my time is reviewing. So really, the quality of that review experience, whatever it is, letting me quickly, happily, and correctly tell: is this good, and what do I want to change? I think that’s the whole ballgame for interfaces for using these things.
Kanjun Qiu (1:17:29)
I think that’s really interesting, and I really agree. A way I’ve been thinking about it is switching from the term review, which is part of this industrial process of software, as you said earlier (I love that term), to the idea of this as part of the medium of working with coding agents or AI agents. Review is part of a powerful medium for agents, because the medium doesn’t really work without this step of understanding what is going on and where to steer next. And as you said, the more we use it, the more critical things we give it, and the more important this piece of the medium becomes, it feels like.
Geoffrey Litt (1:18:14)
Yeah, I love that reframe. I think you could think of it two ways, right? One is a human-to-human analogy: we’re jamming, and it’s not a quality-assurance step, it’s more like we’re working together. If you brought me some work and we were working together, I wouldn’t feel like, okay, time to check if you did what I said; it’d be more like, you know, we’re riffing, right? So maybe that’s one way to think about it. But another is about non-human interactions, like the visualization ideas we were just talking about. If a potter is forming a piece of clay into a pot, they’re not reviewing suggested pots. There’s just a loop going where the clay is becoming something, and they’re reacting live. And I almost wonder if we could get to the point where crafting software feels that way, where there’s some representation you’re working with that feels like you can just directly touch it.
And it’s not a language interaction. You more see it coming together, and you’re, yeah, pulling it into a pot. I think that’s very, very obviously possible for shallow UI design. Like, I should be able to move UI elements around; I shouldn’t be telling a model, please move that box three pixels. That’s ridiculous, and hopefully that will get solved, although our civilization has gone backwards on that since the ’90s. But the harder question is: what is that for logic? I don’t know.
Kanjun Qiu (1:19:52)
Ugh, this is the thing I want most. How do we turn software into clay? I don’t know. Yeah, what do you think about that for logic? It kind of gets at Bret Victor’s Dynamicland: how do we tactilely feel what’s happening in software?
Geoffrey Litt (1:20:09)
Yeah, since you mentioned Bret, Bret has this great essay though, Up and Down the Ladder of Abstraction, which I think has a lot to say about how do you see this map of a very complex space and navigate through it to find the place you want to be in that map? I think that’s a really beautiful idea that could be brought to what’s my map of my 100 to-do apps and how do I find the one that I want? Going up into very abstract land and then jumping down into like demoing concrete ones.
Another inspiration that I think about a lot is Michael Nielsen’s work on, I forget the exact name, I guess it’s artificial intelligence augmentation. I think he and Shan Carter had this piece with these sliders that change very deep conceptual attributes of a font typeface, which an AI learned, probably with an unsupervised learning algorithm or something. Basically, I think it was pretty simple: you are moving in a latent space of fonts with a slider. And that does make me wonder: what is the equivalent of that for code?
Could you do some of the Anthropic steering vector stuff on code gen, and then you choose the steering vector? Is there a “complexity” slider that you can just drag, and the app gets more complicated or simpler? I don’t know.
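The font example, at least, is easy to sketch. Here is a minimal version of that slider pattern in Python with NumPy: learn a direction in latent space for some attribute, then map the slider position to movement along it. The encode/decode calls in the usage comment are hypothetical stand-ins for whatever generative model produced the latents.

```python
import numpy as np

def slider_edit(z: np.ndarray, direction: np.ndarray, t: float) -> np.ndarray:
    """Move a latent vector along a learned attribute direction.

    t is the slider position in [-1, 1]. The direction might be learned by
    averaging (z_attribute - z_plain) over many example pairs.
    """
    unit = direction / np.linalg.norm(direction)
    return z + t * unit

# Hypothetical usage, where encode/decode belong to the generative model:
# z_new = slider_edit(encode(font), bold_direction, t=0.7)
# new_font = decode(z_new)
```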
Kanjun Qiu (1:21:46)
It’s really interesting. It makes me think: if you combine the slider idea and the abstraction-ladder idea, you kind of drag the slider up and down the levels of abstraction. You can modify at each level of abstraction, and then the full abstraction ladder is regenerated for the new app, and you can move down or up again, so you’re moving at whatever level you want. Like, okay, I want to totally change this to-do app to be this way, versus, I want to change this tiny function in this to-do app to do this slightly different thing instead.
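One way to read that combined idea, as a hedged sketch: represent the app as a ladder of descriptions, and whenever one rung is edited, rederive every rung below it. The rung names and the `llm` callable here are invented for illustration.

```python
# Hypothetical rungs, ordered from most abstract to most concrete.
LADDER = ["intent", "spec", "architecture", "code"]

def regenerate(ladder: dict[str, str], edited: str, llm) -> dict[str, str]:
    """After an edit at one rung, regenerate every rung below it.

    llm(prompt) stands in for any text-generation call. Rungs above the
    edited level are kept as-is; each lower rung is rederived from the one
    directly above it.
    """
    out = dict(ladder)
    start = LADDER.index(edited)
    for upper, lower in zip(LADDER[start:], LADDER[start + 1:]):
        out[lower] = llm(f"Derive the {lower} from this {upper}:\n{out[upper]}")
    return out

# Edit at the spec level; architecture and code are rebuilt to match:
# new_ladder = regenerate(ladder, edited="spec", llm=my_model)
```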
Geoffrey Litt (1:22:24)
I love that. I think this is a very ambitious vision we’re sketching out here. But yeah, it’s a very different path from the one the industry’s on right now, which is mostly just a bunch of natural-language, in-the-groove chats.
Kanjun Qiu (1:22:29)
Yes. I’m curious, you know, looking back at the research that you’ve done and where you are now, what insights do you feel may be overlooked right now? What kinds of things do you think people are not paying attention to that they should pay more attention to, to get a future that is more empowering, agentic, free?
Geoffrey Litt (1:23:05)
I would come back to the infrastructure piece of malleability. Obviously everyone’s excited about AI making professional software development more productive, and I think some people are excited about personal tooling, AI building software for us. But I think people haven’t realized how much the existing ecosystem is not prepared to support that. When you bring up the most basic questions, like, if I wanted to add a feature to Airbnb, what would I do? People are like, wait, what? That doesn’t even...
Kanjun Qiu (1:24:46)
I can’t even compute that.
Geoffrey Litt (1:24:47)
I wrote this little tongue-in-cheek story-essay thing, an imaginary conversation with a mysterious wizard. The mysterious wizard says, I want to schedule a weekly seminar so it’s on my calendar, and I just want to figure out how many attendees there are and order the right number of pizzas for them automatically via Uber Eats. And this apprentice person is like, I’ll just vibe code a new app that does it. And the wizard is like, no, no, no, can you add the button to Uber Eats, please? And the apprentice’s brain is like, pff, I don’t know, I can’t add that button, what are you talking about? So I think there’s this really deep change of perspective that hasn’t fully been internalized. When you bring the cost of editing code down by as much as it’s coming down, what makes sense to build around that is just not on people’s radar, I think.
Kanjun Qiu (1:24:43)
Yeah, so what do you think? If you could list out every piece of infrastructure, what would it be? This is awesome, because it’s something that we’re thinking about. A way I think about Imbue is that we want to build the public infrastructure that’s necessary for malleable personal software. So what is that? If you had a wish list, what’s on the wish list?
Geoffrey Litt (1:25:00)
I think it’s a lot of what we talked about. I mean, it’s so many of the ideas: all software ships with the editor for that software. All software is live modifiable locally. The live modifications can be instantly shared live with your collaborators, but there’s awesome version control so you can diverge and converge as needed. You have really awesome data infrastructure that’s really easy for random individuals to run, doesn’t require corporate-level skill, and enables modern collaborative apps.
A lot of those elements you could imagine coming together and essentially some sort of new operating system or platform. I think over time, I expect that the pressure towards personal software will be strong enough that we’ll start to see this emerge in some form. But I don’t know quite how.
Kanjun Qiu (1:25:51)
That’s really interesting, because when we were building Sculptor, one of the things we were thinking about was: what if the software you’re building ships with some Sculptor environment, so that the end user can edit it, see the live edits in real time, and share the code base with someone? Something like that. It’s not quite there, and I’m not quite sure how to do it, because the version control problem is really hard. The data versioning problem is also really hard.
Geoffrey Litt (1:26:23)
Yeah, I mean, think about who pays for the AI edits that are going to happen from the users. Currently, AI code editing is economically viable because the people doing it a lot are making software that ships to millions of people, so they can get paid a lot to do it. How does that work here? I think there are a lot of questions.
Kanjun Qiu (1:26:46)
For the infrastructure piece of malleability, is there anything in terms of cultural norms or the way that we think about software or the way that people exist or communities exist that you think either are changing or need to change as we go into this future?
Geoffrey Litt (1:27:08)
I’m really glad you asked that. This is actually one of the reasons I care most about malleable software. It has less to do with software and more to do with how people feel about their relationship to the world. There’s a Steve Jobs quote that I love that goes something like, the moment you realize that the people who made all this stuff were no smarter than you is when you can start actually changing things.
Kanjun Qiu (1:27:29)
Everything around you can be changed.
Geoffrey Litt (1:27:31)
Yeah, I think that’s a really, really powerful mindset. And that’s a mindset that’s cultivated in people in response to an environment, in iteration with an environment. I think disempowering environments create disempowered people who have learned to be helpless. And I think there’s a general trend here. Narrowly, you could look at examples: cars have become a lot harder to understand.
You can’t really take apart an iPhone. There’s less comprehension possible in the world. And I also think that because software is typically not malleable, the more time we spend in digital environments, the more time we spend in these prefab corporate environments. The thought of what to change doesn’t even occur to us. I don’t think about how we should decorate this podcast meeting room that we’re in, because I can’t change it.
Kanjun Qiu (1:28:30)
We’re just consumers.
Geoffrey Litt (1:28:32)
We’re just consumers. I think the more time you spend in places that cultivate that mindset, the harder it gets to have agency in the world. So I think there’s a double risk here with AI. Some of the conversations we’ve had around being in the details and understanding things: if there’s less of a need to even understand things to minimally get through your life, that ties into this general trend where there’s a revealed preference towards convenience, and we all, myself included, choose it often, but that can have long-term consequences, both for ourselves individually and as a society. So one way we can work on this, I think, is to make software a place where people can exercise their will more, and where they’re encouraged to do that. That’s maybe one way to stem this tide a bit, and perhaps even start a virtuous spiral where kids come to feel that they can do anything, and they’ll be right, because they can.
Kanjun Qiu (1:29:37)
I love that. I resonate with that a lot. Our digital worlds right now are disempowering environments. We are at the mercy of them in a lot of ways; we can’t really change them very well, and we don’t have that much agency. And because we spend so much of our lives digitally, we end up feeling disempowered in our lives. So it feels like, as we go into a world with AI agents, this can get worse. These are agents run by other people, with their incentives, that are now taking actions on our behalf. So it’s even more disempowering, per some of the things that we talked about. The core cultural shift is: how do we rekindle the original idea of the personal computer, the original dream of these systems as manifestors of our will and what we want in our lives?
Geoffrey Litt (1:30:27)
Exactly. I think that’s a great closing note, in a way, to bring up that original vision of people like Douglas Engelbart and Alan Kay: the personal computer is precisely this empowering thing. That’s why it’s personal. And so I think if we can find ways to get back to that, the world will be a better place.
Kanjun Qiu (1:30:47)
Cool. Well, I think it’s possible. I think we’re at this turning point right now where software can become personal, and we do need these pieces of infrastructure that make it possible. Like, we could change the economic incentives around it, because now everything is so replicable. So we’re at a good time.
Geoffrey Litt (1:31:09)
I agree. Let’s do it. I just joined Notion, but Notion is one of the players trying to make it happen, right? Build that platform. And it’s not going to be one winner; I think there are going to be many platforms that enable this philosophy of personal software in so many different arenas. So yeah, I’m excited. Let’s do it.
Kanjun Qiu (1:31:23)
I agree. Awesome. Well, thank you so much, Geoffrey. This was really fun and a great meandering dive into all of these different ideas. So I really appreciate it.
Geoffrey Litt (1:31:32)
Thank you.