May 23, 2017 • 7 min read

As I was coming out of an exam a couple of days ago, I saw something I’ve seen a couple times before.

The truck that redistributes bike-sharing bicycles.

I’ve seen this bicycle redistribution truck around campus a few times since Princeton started its partnership in 2016 with Zagster to launch bike-sharing on campus. The program has 10 or so bike docking locations on campus that support a fleet of 50 bikes.

It’s just a little bit odd and funny that managing the bike-sharing program (a program that is meant to promote less driving around campus and more biking) requires loading bicycles into the back of a truck and redistributing the bikes to different stations. And according to the minutes from the March 27, 2016, meeting of Princeton’s student government, this truck goes around campus twice (twice!) a week to redistribute bikes.

That got me interested in doing a little more research into the effectiveness of the bike-sharing program. While I’m not a user of the service, if I were, I would have one requirement: can I find a bike every time I need one? That is, can I be confident that when I go to a particular station to get a bike, there will be one there?

For different users of the service, I think the minimum success rate is quite different. For a tourist visiting Princeton’s campus, if you want to rent a bike, but there are no available bikes to rent, then no big deal. You can walk around campus instead.

But if you’re a student who relies on the service to get to lectures and classes, then if there’s no bike available that is actually a huge issue. Presumably, the whole point of using the bike-sharing service is so that you can allot less travel time. So if you try to rent a bike 5 minutes before lecture but don’t find one, then you will be late for lecture. If this happens a couple times, then you’ll probably just lose trust in the service, and get yourself a personal bike.

And that’s my qualm about the service: if one of its goals is to reduce the need for students to have a personal bike on campus, then I’m not sure it can do that job. Similar to ride-hailing services, a bike-sharing service needs a large amount of inventory and liquidity to be thought of as a viable option. The more I think about bike-sharing, the more I am convinced that it is not a replacement for a personal bicycle for a daily rider.

All of that being said, here’s an interesting exercise: what is the success rate required for someone to depend on a bike-sharing service instead of a personal bike? I think the answer to this question is almost identical to how one might answer the same question about using Uber vs. using a personal vehicle. For me, I wouldn’t be surprised if a 99% success rate is the threshold to clear. That would mean that only 1 out of every 100 times you went to pick up a bike would you come up empty. At this level of reliability, you begin to approach the reliability of using a personal bicycle (I reckon that 99% reliability for a personal bike is a fair estimate).
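As a back-of-envelope check (a toy calculation, assuming each pickup is an independent trial), even 99% reliability adds up over a semester:

```python
def p_at_least_one_failure(p_success: float, n_trips: int) -> float:
    # Chance of being stranded at least once across n independent pickups.
    return 1 - p_success ** n_trips

# Roughly a semester of daily pickups at 99% reliability:
print(round(p_at_least_one_failure(0.99, 100), 2))  # → 0.63
```

So even at the 99% threshold, a daily rider should expect to come up empty at least once in a typical semester, which is why anything lower quickly erodes trust.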

But to get to 99% reliability, I wouldn’t be surprised if the Princeton bike-sharing program would need two or three times as many bikes as it does today. Unlike Uber, which can use surge pricing as an on-demand adjustment to bring more drivers onto its network during busy times, bicycles cannot be “dynamically” added into the network as demand changes. Of the docking locations I see around campus, it’s not uncommon for them to have no bikes at particularly busy parts of the day.

Another factor that is likely preventing a large influx of more bikes: the cost. According to an exploratory report on bike-sharing by the Los Angeles County Metro, the per-bike installation cost for bike-sharing is $3,000 to $5,000! According to the press release, Princeton has 50 bikes deployed, which means an estimated cost of $150,000 to $250,000.

I guess everything above really brings me to my original point in writing this blog post, which is that this morning (5/22/17) after eating breakfast, I set out to visit 9 different stations and document how many bikes were at each location. Zagster clearly has inventory and usage information orders of magnitude better than this informal survey. And perhaps the Zagster app has information about how many bikes each station has (though a cursory look at the app’s landing page suggests that it does not tell you which stations have available bikes). But I still just wanted to get out and see the stations with my own eyes.

Going into this, I had a suspicion that a few of the nine stations would have zero bikes. I admit, my bias is towards being skeptical of the effectiveness of the service. But to cut the suspense, of the stations I visited, only one (Firestone Library) had no bikes. However, to be fair, it was drizzling slightly this morning, so I suspect that the usage rate of the bike-share was lower than normal. I think there are a couple more stations farther out of the main campus that I didn’t visit, but I was able to see basically all of the stations that are located on the main campus of Princeton.

| Location | Number of Bikes |
| --- | --- |
| Lakeside Apartments | 9 |
| Lawrence Apartments | 9 |
| Computer Science Building | 4 |
| Carl Icahn Laboratory | 4 |
| Princeton Station | 3 |
| Richardson Auditorium | 2 |
| Frist Campus Center | 1 |
| Firestone Library | 0 |

Here’s pictures of all of them (in the order that I visited) and some commentary.

## Richardson Auditorium: 2 bikes (10:30 AM)

I see this station quite frequently on my way to the dining hall, and it’s been empty many times before. But today, there are 2 bikes here.

## Princeton Station: 3 bikes (10:45 AM)

This is a trend that I see fairly frequently: locking a non-bike-share bike to the bike-share location. I see why this happens: some places around campus don’t have convenient locking posts, and even at locations that do have bicycle posts, the bike-share ones are often of higher quality.

## Lawrence Apartments: 9 bikes (11:00 AM)

This was surprising. Lawrence Apartments house graduate students and are slightly off the main campus, but they do have quite a few bikes. I’m surprised that at 11:00 AM there were this many bikes still at the apartments. I would have guessed people would ride them into central campus in the morning.

## Lakeside Apartments: 9 bikes (11:15 AM)

Also graduate student housing. Also has a lot of bikes.

## Carl Icahn Laboratory: 4 bikes (11:15 AM)

This is where the picture of the truck redistributing bikes is from.

## Frist Campus Center: 1 bike (11:45 AM)

This station is frequently empty.

## Firestone Library: 0 bikes (12:00 PM)

Of course, the last station I visited of the day was the one with zero bikes.

# Two Safari Quibbles

February 3, 2017 • 3 min read

As a student, the flexibility of a PC is indispensable. And judging by the technology used by my peers, this is true of nearly every student. However, it is still true that the majority of my computing (maybe ~60%) happens in the web browser.

On macOS, Safari and Google Chrome are the two powerhouse web browsers. Both have support for modern web standards, and both are very extensible. But Safari has two clear advantages: two-finger scrolling responsiveness and power efficiency. The best comparison I can make between two-finger scrolling in the two web browsers is scrolling in Android and iOS. Chrome feels like scrolling in Android: not bad, but not good. Safari feels like scrolling in iOS: fantastic. Once you tinker around in iOS, you realize how janky Android scrolling is (I use a Moto X Android phone). And there might not be a more important feature than power efficiency. Because of how heavily I use the web browser, it is often the largest consumer of battery life.

Having listed Safari’s advantages over Chrome, there are still two quibbles I have with Safari that keep me using Chrome. And both have to do with the way tabs are displayed in Safari. Note that I have Increase Contrast selected in Accessibility.

Safari tabs.

Chrome tabs.

## 1. Lower Contrast Text

First, the text contrast of website titles in Safari is lower than in Chrome, which makes them harder to read. This quibble might be a function of using a non-retina MacBook Air, as I could see a sharper, more color-accurate screen alleviating the issue. Even though Safari has lower contrast than Chrome, its text contrast is still above the 7:1 contrast ratio recommended by Apple.

| Web Browser | Contrast Ratio | Text Color | Background Color |
| --- | --- | --- | --- |
| Chrome (foreground tab) | 19.1:1 | rgb(0, 0, 0) | rgb(243, 243, 243) |
| Safari (foreground tab) | 14.4:1 | rgb(0, 0, 0) | rgb(214, 214, 214) |
| Chrome (background tab) | 14.3:1 | rgb(0, 0, 0) | rgb(213, 213, 213) |
| Safari (background tab) | 10.7:1 | rgb(0, 0, 0) | rgb(185, 185, 185) |
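For anyone who wants to reproduce ratios like these, here’s a sketch of the WCAG relative-luminance contrast calculation (the standard formula; my table’s numbers may differ by a rounding step depending on which exact pixels you sample):

```python
def _linearize(channel: int) -> float:
    # sRGB channel (0-255) to linear light, per the WCAG 2.x definition.
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(r: int, g: int, b: int) -> float:
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)

def contrast_ratio(fg, bg) -> float:
    lighter, darker = sorted(
        [relative_luminance(*fg), relative_luminance(*bg)], reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

# Black text on Chrome's foreground-tab background:
print(round(contrast_ratio((0, 0, 0), (243, 243, 243)), 1))
```

Black on white maxes out at 21:1, and Chrome’s foreground tab lands in the 19:1 neighborhood shown in the table.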

## 2. No Favicons

The decreased legibility of Safari tab labels wouldn’t be such a large issue if not exacerbated by my second quibble: no favicons next to website titles.

Here’s my reasoning for why Safari does not show favicons. Safari tabs are implemented using native macOS tabs that can be found in TextEdit, Finder, etc. And in every other application, tabs are labeled only with text.

Even so, for me, favicons are the single most important identifier for different tabs. With favicons, I can glance at an icon instead of reading text to figure out which tab is which. Even better, as you navigate to different pages of a website, often the title will change, but the favicon does not. So the favicon offers a certain degree of reliability that text labels do not.

To my point, where appropriate, Apple features icons on many other labels around macOS.

Icons used in System Preferences, Finder, and the “Command-Tab” Application Switcher.

Even Safari uses the Touch Bar to display favicons, not text labels.

Image from Apple.

Fingers crossed—🤞🤞—here’s to favicons and increased text contrast in Safari tabs.

# What Makes a Good Reddit Post?

February 2, 2017 • 22 min read

Note: This analysis of Reddit was created for the final project of ELE/COS 381 taught in Fall 2016. For this project, Alan Chen, Luis Gonzalez, and I decided to apply the topics learned in class to an analysis of Reddit to understand what makes a “good” Reddit post.

The link to the PDF of the report is here.

# ELE/COS 381 Final Report: What makes a good Reddit Post?

By Alan Chen, Eric Chen, and Luis Gonzalez-Yante

## 1 Introduction

In ELE/COS 381, we have studied various networks both physical and digital. In particular, Chapter 8 of Networked Life focused on the study of topology and functionalities of social networks like Facebook and Twitter. In this project, we are interested in the study of Reddit and what specifically makes a good Reddit post.

The frontpage of reddit.com.

Reddit is a bulletin board of user-generated content. As opposed to strictly ordering results by date, Reddit’s user interface strongly focuses on showing users content that is popular with other users. The site does this through a voting mechanism, where each user can upvote and downvote specific posts to the site. For individual users participating on the site, there is a motivation to create posts that resonate with other users, and thus generate a large number of upvotes.

Another key aspect of Reddit is its emphasis on sub-communities, which are each their own fiefdom on Reddit. According to [redditmetrics.com](http://redditmetrics.com/history), on January 14, 2017, there were 1,005,275 unique subreddits. Communities are organized around a topic or interest. Some communities such as /r/pics and /r/news are very broad, general-interest communities. Other communities appeal to much smaller groups, such as /r/vexillology, which is dedicated to discussion and commentary about flags.

It is important to note that a large component of the Reddit community does involve commenting. Comments, like posts, can be upvoted and downvoted, and comments are sorted by popularity. However, in this project, we decided to focus only on analyzing posts—which subreddits they are in and how many upvotes they receive—to simplify analysis. Further and more extensive work would likely include the study of comments, as well as analysis of the content of posts and comments.

## 2 Goals

We decided to look at three different methods of quantifying what makes a good Reddit post. Reddit contains many emergent phenomena not explicitly designed into the site and not completely obvious to new users. The site can be difficult to approach from the outside, but from our experience with it (an experience shared by its many frequent users), we believe Reddit can be a funny, insightful, and engaging online community. Thus, our ultimate goal for this analysis was to provide a set of conclusions and actions that might be given to a new user of the site in a “How to use Reddit” guide.

For questions addressed in sections x.1 and x.3 that follow, we are interested in characteristics of different types of subreddits. We decided on three types: top 100, small (<10,000 subscribers), and original content.

- Top 100
- 25 Small (<10,000 subscribers)
- 10 Original Content

Three categories of subreddits analyzed in x.1 and x.3

Top 100 subreddits are the biggest 100 subreddits based on the number of subscribers. The largest subreddit on the site is /r/AskReddit with over 15 million subscribers, and the 100th biggest subreddit has just under 500,000 subscribers. In contrast, we sampled 25 small subreddits that were randomly selected among subreddits with fewer than 10,000 subscribers. Finally, we hand-selected 10 subreddits that contained significant amounts of original content, where we defined original content as material the posting user creates rather than aggregates from external sources. It is important to note that the original content subreddits had subscriber counts more similar to the small category as well.

A particular user will often follow general-interest subreddits, but may also follow one for her municipality or for a niche RPG that she plays. So these three categories of subreddits provide a broad perspective of both general-interest and niche communities that we believe accurately reflects the usage patterns of users on the site.

### 2.1 Quadrant Analysis: How much engagement do top posts get?

Given the focus of Reddit on distinct communities, and with the over 1 million unique subreddit communities on the site, we expect that there will be large variations in communities. There might be countless ways to quantify the differences, but we decided on examining subreddits along two dimensions: individual engagement and community engagement.

Individual engagement is defined as follows for any subreddit:
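The report’s equation doesn’t survive in this text; reconstructing it from the description below, with \(U_s\) the set of users who author the current top posts of subreddit \(s\):

```latex
\mathrm{IE}(s) \;=\; \frac{1}{|U_s|} \sum_{u \in U_s} \frac{\text{posts by } u \text{ in } s}{\text{total posts by } u}
```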

For example, we might look at the top posts of /r/politics and generate an individual engagement score of 0.7. We calculate this number by looking at the users who author the top posts in a subreddit. For each user, we examine their post history, seeing how many posts are in /r/politics and how many posts are in other subreddits. We then calculate the proportion of posts that are in /r/politics for each user, and then average across all of the users who author the top posts in /r/politics to generate the individual engagement for the subreddit. A value of 1 for a given subreddit means that users are very engaged in that subreddit—they only ever post there. Whereas a value of 0.05 means that the top users in that subreddit only post to that subreddit 5% of the time.

Community engagement is defined as follows for any subreddit:
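Again reconstructing the missing equation from the prose, with \(P_s\) the set of top posts sampled from subreddit \(s\):

```latex
\mathrm{CE}(s) \;=\; \frac{1}{|P_s|} \sum_{p \in P_s} \frac{\text{upvotes}(p)}{\text{subscribers}(s)}
```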

The intuition for the metric is as follows: if the top posts in subreddits A and B both receive 1,000 upvotes on average, but if A has 1 million subscribers while B has only 100,000 subscribers, then subreddit B has 10x the community engagement of subreddit A. Similar to individual engagement, community engagement is a metric that is calculated for any particular subreddit. For the top posts in the subreddit, we normalized the upvotes with the number of subscribers to that subreddit. Then, we averaged over all the top posts in the subreddit.

Our intention in creating individual and community engagement was based on the strong suspicion that some subreddits, perhaps those of a more niche topic, attract higher individual engagement, while general-interest subreddits might generally have lower individual engagement. Additionally, because community engagement corresponds to what proportion of the community a top post needs to appeal to, we thought it reasonable that it’s easier to create a top post in a subreddit with low community engagement because you have to appeal to a smaller fraction of the subscriber base.

### 2.2 Cross-Pollination: How do subreddits overlap?

Reddit users are, like most people, multidimensional and likely to frequent multiple subreddits. As they move between communities, they bring with them the context of the different communities they participate in. This context could take the form of knowledge of memes or inside jokes.

Because subreddits are composed solely of contributions to the subreddit, we implemented a metric that takes the current top posts of a subreddit and analyzes all the posts of the authors of those top posts—who we call top users—to see how often their posts are in the original subreddit a, and how often they are in the target subreddit b.

By taking the concentration of posts in the first subreddit, taken as a representation of the degree to which the author is a member of the first subreddit, and multiplying by the concentration of posts in the second subreddit, taken as a representation of the degree to which the author is a member of the second subreddit, we receive a metric describing the crossover between subreddits for a given author. This participation function is maximized when a user participates 50% in subreddit a and 50% in subreddit b, and contains an extra factor of four to scale possible values to between 0 and 1. The metric for the subreddit is the average of the values for its top users.
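In symbols (a reconstruction of the missing equation: for a top user \(u\) of subreddit \(a\), let \(n_u(x)\) be the number of posts by \(u\) in subreddit \(x\) and \(N_u\) their total posts):

```latex
\mathrm{participation}(a, b) \;=\; \frac{1}{|U_a|} \sum_{u \in U_a} 4 \cdot \frac{n_u(a)}{N_u} \cdot \frac{n_u(b)}{N_u}
```

The factor of four gives a maximum of 1 when a user splits posts evenly: \(4 \cdot 0.5 \cdot 0.5 = 1\).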

Modeling subreddits as nodes in a network is an extension of Chapter 8: How do I influence people on Facebook and Twitter? An important notion from that chapter is that some links between users are more important than others, such as links that are included in many shortest paths. Here, we extend that intuition to our participation metric, which generates a network of weighted links, and we use the weights to directly infer which links are important in the network representation.

### 2.3 The Reddit Power Index: How “average” are top users?

Because of the lack of real names on Reddit, it is not immediately clear who the “top users” are and who are not. For example, on Twitter, the users with the most followers are often celebrities and public figures. But who are these top users of Reddit, and how dominant are they? Specifically, is it possible for the average user of Reddit to have a post become very popular in a subreddit? Or are the frontpages of subreddits controlled by an elite group of users?

To answer these questions, we were interested in quantifying how good the top users are. To do so, we created a metric called the Reddit Power Index (RPI). Fundamentally, RPI is a metric that can be calculated on any post, and is defined as follows:
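Reconstructing the missing definition from the sentence that follows:

```latex
\mathrm{RPI}(p) \;=\; \frac{\text{upvotes}(p)}{\text{average upvotes of a post in the subreddit containing } p}
```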

Thus, the RPI for any post is the number of upvotes it has divided by the average number of upvotes for a post in that subreddit. An RPI below 1 means that the post is below average, and any RPI above 1 means that the post is above average.

Because we are interested in categorizing subreddits as a whole, we extend the definition of RPI first to users and then to subreddits as follows:
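Written out (reconstructed from the sentence that follows, with \(P_u\) the posts of user \(u\) and \(U_s\) the top users of subreddit \(s\)):

```latex
\mathrm{RPI}(u) \;=\; \frac{1}{|P_u|} \sum_{p \in P_u} \mathrm{RPI}(p),
\qquad
\mathrm{RPI}(s) \;=\; \frac{1}{|U_s|} \sum_{u \in U_s} \mathrm{RPI}(u)
```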

Naturally, the RPI for a user is the average RPI of its posts, and the RPI for a subreddit is the average RPI of its top users.

With RPI, we created a heuristic which indicates the difficulty of creating a top post in any given subreddit. A subreddit with high RPI is one dominated largely by users who consistently have successful posts. Whereas a subreddit with lower RPI is in some sense more “democratic” because the community surfaces posts from users who are closer to the average Reddit user in terms of past post upvote performance.

## 3 Implementation

We collected our data using Python scripts and the Python package PRAW, the Python Reddit API Wrapper. The wrapper authenticates using OAuth, creating a Reddit instance that contains the client_id, client_secret, password, and username, which the rest of the PRAW API acts upon. While Reddit has its own public API that exposes data through JSON, we preferred using PRAW as an intermediary because it handled authentication, rate-limiting, and exceptions, and allowed us to focus on writing the data collection scripts.

Diagram of our data collection process.

The Reddit instance generated by PRAW can access any subreddit, post, or user that is accessible through reddit.com. Additionally, PRAW generates iterables which are very useful for data collection. For example, we can create a PRAW object that is a list of all of the current top posts in a subreddit, and this object can be iterated through like any other list in Python. We then can get key data for each submission, such as author, upvote count, and title. Similarly, each Reddit user can generate a PRAW object which is a list of their recent posts. This technique of (a) iterating through the top posts in a subreddit, (b) finding the upvotes and author of each top post, and (c) finding the history of posts for that user forms the foundation of our data collection techniques.
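That foundation can be sketched in plain Python (a simplified stand-in for the PRAW objects, since the real calls require API credentials; `Post` and the `get_history` callback here are illustrative inventions, not PRAW’s API):

```python
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    subreddit: str
    score: int  # upvote count

def collect_top_post_histories(top_posts, get_history):
    """(a) iterate top posts, (b) record each post's author and upvotes,
    (c) fetch that author's post history."""
    data = {}
    for post in top_posts:
        history = get_history(post.author)
        data[post.author] = {
            "top_score": post.score,
            "history": [(p.subreddit, p.score) for p in history],
        }
    return data

# Tiny worked example with fake data standing in for API responses:
tops = [Post("alice", "politics", 900)]
hist = {"alice": [Post("alice", "politics", 10), Post("alice", "pics", 3)]}
summary = collect_top_post_histories(tops, hist.get)
```

With PRAW, the same loop iterates a subreddit’s top-post listing and each author’s submission listing, with authentication and rate-limiting handled underneath.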

While PRAW simplified using the Reddit API, we still encountered significant roadblocks with rate-limiting. Each request to the API can return a maximum of 100 posts at a time, and PRAW delays 2 seconds between API requests. In all of our scripts, there was a trade-off where collecting more data resulted in longer execution times. We were able to strike a reasonable balance by often choosing to look at samples of top posts of size 25 to 100.

However, when collecting data for the RPI, these rate-limits became a real bottleneck (script execution took hours). Our issue was that calculating the RPI for each user often requires collecting average upvote data for dozens of subreddits, each requiring a scrape of random posts that took several seconds. We largely overcame the issue by caching the average upvote value for subreddits and storing this data in a CSV file so it was persistent across executions. Then, for example, if we had already scraped /r/AskReddit before, our code gets the average upvotes from the local cache instead of hitting the API, which makes a multi-second operation essentially free.
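The caching pattern is simple enough to sketch (a minimal version of the idea, not our exact script; the file path and the `fetch` callback are placeholders):

```python
import csv
import os

def load_cache(path):
    # Read subreddit -> average-upvotes pairs saved by a previous run.
    if not os.path.exists(path):
        return {}
    with open(path, newline="") as f:
        return {sub: float(avg) for sub, avg in csv.reader(f)}

def save_cache(cache, path):
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(cache.items())

def average_upvotes(subreddit, cache, fetch):
    # Only hit the slow API path (several seconds per subreddit) on a miss.
    if subreddit not in cache:
        cache[subreddit] = fetch(subreddit)
    return cache[subreddit]
```

Each run starts with `cache = load_cache(...)` and ends with `save_cache(cache, ...)`, so repeat lookups cost a dict access instead of an API round trip.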

Also, we should note that many subreddits closely follow power-law distributions for the number of upvotes on a random post, which had a noticeable impact on our sampling results.

Log-log plot of upvotes for 1,000 random submissions to /r/AskReddit, a Top 100 subreddit.

Thus, because rate-limits constrained the number of posts we could sample, there was the potential for statistics to be moved wildly by sampling one outlier post out of 100. We believe that the reason for power-law distributions of upvotes on Reddit posts is how Reddit surfaces content to users. By default, users of the site are shown posts that are already popular, and then, as a function of their increased visibility, these popular posts continue to receive more upvotes. This is analogous to the model of preferential attachment studied in Chapter 10 of Networked Life: Does the Internet have an Achilles’ heel? In preferential attachment, new nodes added to a network tend to connect to nodes which already have high in-degree, similar to how new upvotes tend to accumulate on posts that already have many upvotes. Thus, when analyzing the RPI for subreddits in 4.3, because RPI for a subreddit depends on the average upvotes from various other subreddits, we believe the median is a more explanatory measure of typical RPI.
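The rich-get-richer dynamic behind those power laws is easy to simulate (a toy model, not Reddit’s actual ranking: each new upvote lands on post i with probability proportional to its current upvotes plus one):

```python
import random

def simulate_preferential_upvotes(n_posts=100, n_upvotes=2000, seed=0):
    rng = random.Random(seed)
    upvotes = [0] * n_posts
    for _ in range(n_upvotes):
        # Weight each post by (current upvotes + 1) so every post starts
        # with a nonzero chance of being surfaced.
        i = rng.choices(range(n_posts), weights=[u + 1 for u in upvotes])[0]
        upvotes[i] += 1
    return sorted(upvotes, reverse=True)

counts = simulate_preferential_upvotes()
# The sorted counts fall off steeply: a few early winners absorb a
# disproportionate share of the simulated upvotes.
```

Even starting from identical posts, the feedback loop alone produces the heavy-tailed shape seen in the /r/AskReddit log-log plot.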

## 4 Discussion and Conclusions

### 4.1 Quadrant Analysis: How much engagement do top posts get?

Plot of individual and community engagement for various subreddits.

The category of Top 100 subreddits is clustered in the lower-left quadrant, meaning the top posts are authored by users with low individual engagement, and those posts receive low community engagement. Given that these subreddits are general-interest, this pattern is not surprising. For instance, while many people might be interested in the funny pictures from /r/Funny, very few people will only be interested in funny pictures.

For Original Content subreddits, it is difficult to discern a significant difference from the other categories in terms of individual engagement, but it is clear that community engagement is relatively low on the whole. Perhaps this is a factor of subreddits driven by Original Content not needing to appeal to a very wide fraction of the subreddit.

For small subreddits, we do see relatively large individual engagement and community engagement compared to the Top 100 subreddits. We propose the following explanatory mechanism: smaller subreddits are about more niche topics, so while many people might subscribe to /r/Funny because they like funny pictures, the only users that will subscribe to /r/lexington are people who live in Lexington, KY. As a result, this self-selection means that the typical subscriber to /r/lexington is more invested in the subreddit than the typical subscriber to /r/funny. As the plot shows, these self-selecting groups tend to have top posts from users with higher individual engagement (frequently above 0.5), and they can have very high community engagement.

One interesting datapoint that we added, separate from the three categories, was /r/The_Donald. This subreddit is for supporters of Donald Trump and has created quite a bit of controversy on Reddit, with Reddit coming under fire for possible censoring of the subreddit, and members of the subreddit being accused of being toxic and harmful to the overall community. The controversial status of /r/The_Donald carries over to the plot, as the subreddit is a datapoint with no peers. It has extremely high individual engagement of 0.81, which means that the users who have the top posts in /r/The_Donald contribute 81% of their total posts to that community. The community engagement is the second highest of the subreddits we investigated, coming second to a much smaller community dedicated to Star Wars prequel memes (335,783 subscribers for /r/The_Donald vs. 5,910 for /r/PrequelMemes).

For users new to Reddit looking to create posts that rise to the top, we suggest looking at subreddits in the lower-left quadrant, with low individual and community engagement. We suggest looking for low individual engagement subreddits because these subreddits likely surface top posts that don’t require an extensive history of context and knowledge about the particular inside jokes and idiosyncrasies of that subreddit. We also suggest subreddits with low community engagement because these are communities where posts can become popular while appealing to a smaller proportion of the community. An example of a subreddit that matches these criteria is /r/PersonalFinance, with scores of 0.06 and 2.6e-4 for individual and community engagement respectively.

### 4.2 Cross-Pollination: How do subreddits overlap?

First, we had to decide which subreddits to measure against each other. We first observed top subreddits, such as /r/funny and /r/pics, but we found that in all tested cases the connection value was either negligibly low or 0. This is probably because the posters to those subreddits are so numerous and their interests so varied that samples of the size we were taking did not discover cross-pollination.

The second tests we made were across political subreddits, including /r/The_Donald and /r/hillaryclinton, trying to match them with subreddits with similar views, such as /r/dncleaks and /r/enoughtrumpspam. However, because the top posters in political subreddits seem to keep very heavily inside those subreddits, they received 0 scores even with subreddits with similar views.

We then decided to test a network known to be based on geography, local sports, to see if we could gain information about the sports tendency of areas and locations. We examined the “big four” of Philadelphia—basketball, football, baseball, and hockey teams—and their relatedness to themselves, as well as some of the relevant league subreddits (e.g. /r/NBA).

Cross-pollination graph for Philadelphia sports teams.

Here in the graphs, red lines indicate a participation from one subreddit in another of 0.1 or greater, green lines indicate a participation of 0.01 or greater, and blue lines indicate any other non-zero participation. What we found was interesting. Comparing against known information, we can confirm that /r/timberwolves and /r/sixers users are a subset of /r/NBA. We can also confirm that Philadelphia is a football town, as is known, and also make the interesting observation that /r/NBA is the biggest sports subreddit, even though local people are most likely to follow the football team in addition to any other sports of the area.

We also applied the same analysis to Minnesota sports teams.

Cross-pollination graph for Minnesota sports teams.

The above graph suggests that if you want to become a star in /r/Timberwolves, then you should also participate in /r/NBA and /r/MinnesotaTwins. There is likely context in terms of discussion, memes, and knowledge among these communities that is shared among the top users.

Overall, we can conclude that subreddits whose users stay only within that subreddit have 0 cross-pollination, and that larger communities tend to have more participation flowing in and less flowing out.

Future improvements that could be made include scraping all posts of a given subreddit, though it would take much more time, and also nuancing the conclusions reached here by interlacing these results with the previous analysis of engagement of the top posts.

### 4.3 The Reddit Power Index: How “average” are top users?

RPI is a metric we created that can help us pinpoint the difficulty of creating a top post in any given subreddit. The RPI also allows us to clearly see which subreddits are dominated by users that consistently have successful posts. All this really means is that with the RPI we are able to find subreddits that will have more or fewer “alpha redditors” that one will have to compete with for upvotes.

| Category | Average RPI | Median RPI |
| --- | --- | --- |
| Top 100 | 149.8 | 13.8 |
| 25 Small (<10,000) | 3.0 | 2.0 |
| 10 Original Content | 15.2 | 2.6 |

RPI scores for the three different categories of subreddits examined.

On the whole, RPI did provide a reasonable number we could use in analysis. We implemented a 10% trimmed mean in calculating the RPI for a subreddit to protect the metric against large outliers. However, there are still cases, such as /r/gaming in the Top 100, which had an outlier RPI of 4767 that skewed the average RPI of Top 100 subreddits upwards. This is why we believe that median RPIs are the better way to quantify the typical subreddit RPI in our categories.

Shifting our focus to Figure 8, we can see that the median RPIs are more reliable. Subreddits that fall into the Top 100 category have significantly higher RPI than small or original content subreddits. This tells us that the popular posts in the Top 100 subreddits are typically created by users that get more than 10x the average number of upvotes! Conversely, our data show that the RPI for top users in subreddits with fewer than 10,000 subscribers has a median value of 2, which is much closer to being an average Reddit user. The Original Content subreddits also have low RPI, though we suspect that this might be largely a function of subscriber count, as the Original Content subreddits we examined happened to have smaller subscriber bases.

New users of Reddit should target subreddits with RPI scores closer to 1. These subreddits commonly surface content from more average users, not just from Reddit superstars. Our data tell us that smaller subreddits, such as /r/improv with an RPI of 1.4 and 7,332 subscribers, typically have lower RPI scores. Even some larger subreddits, such as /r/Frugal with an RPI of 1.7 and 613,046 subscribers and /r/GetMotivated with an RPI of 1.6 and 9,779,376 subscribers, have low RPI scores as well.

Posting to a subreddit with a low RPI does not guarantee success. What it does mean is that you are competing against more average Reddit users for the top posts, hopefully giving yourself a chance to stand out.

## Final Words

“Man naturally desires, not only to be loved, but to be lovely.” Adam Smith wrote that line in 1759 in The Theory of Moral Sentiments, and the same could be said of our behavior on Reddit today. We, the users of this social bulletin board called Reddit, crave recognition and popularity. We want to be “loved” and “lovely”—we want our posts to become popular.

To that end, our analysis produced a few actionable takeaways. There are clear differences between large and small subreddits in terms of individual and community engagement, as well as the RPI of top users. These differences are important for new users coming to the site to keep in mind: it may be beneficial to begin in subreddits tailored to your interests while also keeping an eye out for low-RPI, low-engagement subreddits. Additionally, understanding the importance of what we termed cross-pollination will help you tailor which sets of communities to participate in. By deliberately choosing a set of subreddits, you can benefit from the same shared context that top users in those subreddits already have. Hopefully, with these takeaways, the factors that make a good Reddit post have been made clearer, and we can all use them to be more “loved” and more “lovely” on the site.

# Mini-Buses and uberPool

January 14, 2017 • 2 min read

Last month, I started reading A Pattern Language: Towns, Buildings, Construction. It’s a book about architecture that contains 253 rules for building everything from metropolitan areas (2. The Distribution of Towns) to houses (221. Natural Doors and Windows).

There’s so much to talk about from this book, but one pattern in particular caught my attention: 20. Minibuses. Here’s a quote from the passage:

Buses and trains, which run along lines, are too far from most origins and destinations to be useful. Taxis, which can go from point to point, are too expensive.

To solve the problem, it is necessary to have a kind of vehicle which is half way between the two—half like a bus, half like a taxi—a small bus which can pick up people at any point and take them to any other point, but which may also pick up other passengers on the way, to make the trip less costly than a taxi fare.

The system hinges, to a certain extent, on the development of sophisticated new computer programs. As calls come in, the computer examines the present movements of all the various mini-buses, each with its particular load of passengers, and decides which bus can best afford to pick up the new passenger, with the least detour.

Replace “mini-buses” with “uber cars” and that quote reads like a convincing pitch for uberPool.
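The dispatch logic the passage describes—pick the vehicle that can serve the new rider with the least detour—can be sketched as a simple greedy assignment. This is only an illustration of the idea, not any real dispatch system; the bus data, straight-line distances, and function names are all assumptions:

```python
import math

def distance(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def detour(bus_pos, next_stop, pickup):
    """Extra distance if the bus swings by `pickup` before its next stop."""
    direct = distance(bus_pos, next_stop)
    via_pickup = distance(bus_pos, pickup) + distance(pickup, next_stop)
    return via_pickup - direct

def assign_bus(buses, pickup):
    """Choose the bus that can serve `pickup` with the least added distance."""
    return min(buses, key=lambda bus: detour(bus["pos"], bus["next_stop"], pickup))

buses = [
    {"id": "A", "pos": (0, 0), "next_stop": (10, 0)},
    {"id": "B", "pos": (5, 5), "next_stop": (5, -5)},
]
best = assign_bus(buses, pickup=(4, 2))  # bus B passes closer to the rider
```

Real systems (uberPool included) solve a much harder version of this, with traffic, time windows, and many riders per vehicle, but the core “least detour” idea is the same one the book sketched in 1977.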

I don’t think anyone could have reasonably predicted, when the book was published in 1977, how this would play out: mini-buses themselves never became widespread, but give the idea a couple of decades—until the internet takes hold and Moore’s Law makes the iPhone possible—and the concept becomes viable after all.

I think there are two takeaways here:

1. A Pattern Language: Towns, Buildings, Construction stands the test of time. Mini-buses didn’t take hold, but the idea was clearly in the right direction. There are a lot of “I never thought about it that way” moments in this book.

2. Human ingenuity is a very strong force. I’m generally not overly optimistic about any one specific technology (e.g. A.I., genetic sequencing, renewable energy). But on the whole, I am optimistic that things will be better in a decade than they are today—and I do mean that in the broadest sense. Who knew 40 years ago that mini-buses would become uberPool? I don’t know what’s coming tomorrow. Whatever it is, though, it will probably be better than what we have today.

# Intellectual Property Stifles Tech Innovation

July 31, 2016 • 4 min read

Note: This op-ed was originally published on page six of the January 21, 2016 print issue of the Middleton Times-Tribune. This 750-word piece is the distillation of a much longer research paper I wrote for my college freshman writing seminar, Property, Wealth, and Equality.

As a kid, I loved sledding down the front yard on wintry Wisconsin afternoons. I remember that one winter a while back, my older brother got a neon-green SnowSlider. This sled, to use tech parlance, disrupted my family’s sledding ecosystem. My chintzy plastic sled (the kind that is built like a miniaturized kiddie pool) was obsoleted by the svelte SnowSlider with its slick and speedy coating.

But as much as I wanted to try my brother’s speedy new ride, he had no intentions of sharing. Even when I went out sledding by myself, I was never allowed to use his sled.

Of course, we all have stories like this. Whether it’s between siblings or coworkers or friends, everybody has a story about being on the wrong end of greed and selfishness.

And recently, we all have witnessed this same phenomenon develop in the tech industry. Instead of claiming exclusive sledding rights, tech companies are fighting over intellectual property rights—think patents and copyrights. They are making expansive intellectual property claims that threaten to stifle future improvements and innovations.

The solution to this problem is clear: intellectual property law must promote, not prohibit, technological innovation by limiting the scope of property claims. Without making these changes, we might never see the Amazons, Facebooks, and Ubers of tomorrow because the moat of intellectual property will have blocked their entrance.

In fact, news came out recently that Google plans to re-architect its implementation of the Java APIs in the next major version of Android. Because Java APIs are like the foundation of a house, replacing them is not trivial. Even though the courts have already found that Google’s codebase is original and unique, an ongoing appeal forced Google to dedicate considerable effort to what essentially amounts to treading water.

If not for Oracle’s lawsuit, Google could instead spend these man-hours improving its software in tangible ways. It’s hard to quantify and report the loss of future innovation. But these losses are real, and we are starting to feel the treacherous side-effects today.

I recognize, though, that changing the law is no easy task (and rightly so). But there is precedent that can be considered when thinking about intellectual property in technology: water rights in the 19th century West. While 21st century technologists and 19th century settlers have little in common beyond entrepreneurial spirit, lessons learned from the West are still applicable today.

In the West at this time, the first settlers had many advantages. Processing precious ore required building mills along rivers. And as the first to arrive, these settlers had their choice of the best land and water sources. Plus, cordoning off a large swath of waterway had the knock-on effect of limiting others’ mining potential.

When legislation finally caught up to the miners, the rules changed. A use principle was enacted, which simply stated that you had a right to water only if you made use of it. Rampant speculation and exorbitant claims, like my brother’s exclusive right to the SnowSlider, were no longer valid.

We can apply this principle to software, too. One function—one idea—can have a myriad of uses. You can use a dynamic pricing algorithm to sell stadium tickets (SeatGeek), taxi rides (Uber), or diapers (Amazon). Under the use principle, because each company uses the pricing algorithm for specific applications with unique codebases, all of them have the right to innovate, but none of them can claim ownership over the bigger idea.

However, some say that weakening intellectual property protection will actually discourage innovation by lowering financial incentives. While this is a valid concern, the reality of the legal landscape suggests a more pressing issue. According to Unified Patents, an industry organization dedicated to reforming patent law with members including Google and Adobe, patent trolls accounted for 92% of all patent lawsuits in technology in 2015. These patent trolls take hundreds of companies, big and small, to court at once, relying on the current intellectual property law to exact money from legitimate companies.

Instead of giving out broad intellectual property to the first creators of technology, we must reform intellectual property along the use principle to allow future companies to continue to innovate. Amazon didn’t create the first online marketplace, Google wasn’t the first to make a search engine, and Apple’s iPhone wasn’t the first phone. Let’s make sure they’re not the last.