Harbor

publicWebsite codeCode

I love having my coworkers around when I work. So when pandemic hit, I spent a lot of time on video chat. But traditional video chat gets uncomfortable with more than 5 people in a room because there's no natural way break into smaller conversations like there is in real life. Harbor was my attempt to solve that. It's a desktop app that lives in your menu bar and allows you to exist in a virtual 2D space where you can roam around with up to 200 coworkers at once, but only see and hear the few people nearby. I received design help from Carey Phelps and engineering help from Brandon Wang.

For a full list of features, see the marketing page. Or watch this video demo of an early prototype which contained most of the features present today, albeit without polish:

prototype

Challenges

Video chat

I started by trying to implement my own video chat infrastructure off of the open source library Mediasoup. Gradually I realized that this would be too large an undertaking for an MVP, so I turned to SaaS. I started with Twilio Programmable Video. It turned out to have abysmal performance; even on 200Mb/s wifi, the latency was often >1 second and the screen sharing framerate was <5fps. Customer support said they didn't think this was normal, but couldn't help because I was paying less than $10,000 a month. So I turned to the fledgling startup Daily, which turned out to be better than Twilio in almost all dimensions. Sadly it had one serious problem:

Daily allowed me to specify client-side which call participants to receive streams from, but not which participants to send streams to. The latter would have been perfect for my use case because a client should only send streams to nearby participants. Without send-side control, I had to use receive-side control, which presents a security problem because any user could hack their client to listen in on distant participants without them knowing.

Unfortunately there were only two ways to fix this problem: wait for Daily to support send-side control, or roll my own infrastructure. I took a break on the project before resolving this.

Hardest decisions

Peer to peer vs. Centralized server

P2p intuitively made sense for this project because you only need to be connected to the few users you're close to. It would allow a virtually unlimited number of users to coexist in the same world, since the server wouldn't be doing much work. And it would relieve user worries of being spied on.

But I couldn't do p2p because sometimes you need to broadcast to everyone in the room at once. That would mean O(N) uplink bandwidth in a p2p architecture, which is unsustainable in large groups.

So I went with SFU architecture instead, which is easier on bandwidth requirements at the cost of being extremely expensive. I suspect that the high cost is actually prohibitive at scale and unsolvable, seeing that gather.town actually switched to p2p after using an SFU. I'm curious how they approached the broadcasting problem with p2p.

Moving the map to the menu bar

In my initial prototype, the map view and the user tiles (the rectangles on the right containing camera feeds) were always in the same window.

This was simple and easy to understand. The problem was that most of the time, user tiles were much more important to see than the map. You don't need to see where everyone is located all the time. They're a background status, like available wifi connections. So I decided put the map view in the same location as the wifi status: in the menu bar.

Animated avatars → Profile pictures

I had originally planned on using animated avatars to represent people, as seen on the left, because it opens up possibilities for body language. But when I decided to tuck the space inside the menu bar, I realized that an avatar based design is too cluttered for the tiny window. Profile pictures, though more stilted and less flexible, convey identity much more efficiently.

Remote resources vs. local resources

Electron's security practices are confusing because they're constantly evolving. Slack wrote a very helpful article explaining why your Electron app shouldn't just be a browser into your web app (Path 1: The Shortcut and Path 2: Remote Isolation). However, many startups are doing exactly that because it's much easier and faster than the alternatives. Moreover, many Electron features like context isolation are being added that make Path 2 easier to pull off. I opted to go the recommended way (Path 3: Local Resources) but it's still unclear whether it was worth the cost in development velocity.

Key Learnings

Everything is possible but hard with Electron.

When using Electron, you can almost always use the native capabilities you need, but they may be finicky and undocumented. For Harbor, we used multi-window management, screen recording, notifications, the Tray API, the powerMonitor API, process forking, AppleScript, and more. Virtually every feature required diving deep into Github issues and other forums to unravel obscure behaviors. Many of the issues I experienced were still being actively discussed or had solutions just discovered a couple months ago. I would have greatly benefitted from having a friend with deep Electron experience, so if you need me to be that friend now, let me know.

Desktop app design is different from web.

The most important difference from a UI design perspective is that you have complete control over the screen space. Desktop apps can open multiple windows, control their sizes and positions, and even make them non-rectangular. Web apps, on the other hand, need to fill the browser, which is sized and positioned solely by the user.

We made extensive use of this control in Harbor. The most obvious example is the way a user's video slides in from the right side of your screen when they get close to you.

constructionTODO: Add gif

But window control affected our design of every part of the UI. Every page is only as large as it needs to be. For instance, the home page has more content than the auth page, so the window grows wider after you sign in.