This article was originally published in issue 24 of Today Software Magazine in June 2014. You can see its Romanian version here and its English version here. You can also see me here presenting the article at Today Software Magazine’s launch event (it was kind of a total fail because they decided to host the event outside, and it was very windy that day).
If you’re reading this, you’re most likely a programmer. And, like any programmer, you’ve had to search for programming questions online. I’m sure you’ve noticed something interesting: in the last few years, when we search for a programming question online, a link to StackOverflow is usually somewhere among the first 3 results from Google.
This is no coincidence: StackOverflow has entered the lives of programmers, slowly but surely. We use it practically every day, but I’ve noticed that most programmers don’t know much about how this site was born, what principles it works on and why it’s so successful.
StackOverflow is just one of the 119 sites of the StackExchange network; the two are not the same thing. In this article, we’ll discuss the philosophy on which this network is built and we’ll take a quick look at how its mechanics allow it to function largely on its own.
I hope the details I present here will make you more comfortable using this network, and I also hope that we’ll see stronger participation from Romanian programmers.
I. The network’s philosophy
History and motivation
Before StackOverflow, it was very hard to find a (correct) solution to a programming problem, except for the relatively common ones. The reasons for this are:
- The people who wrote documentation for programming languages, frameworks or technologies were incapable of putting it on the web and making it easily searchable.
- The solution might have been in a programming book. But the reality is that most programmers don’t learn from books anymore. The programming book industry is slowly dying.
- The answers on forums were buried in many pages of discussions and comments.
- In most places, the answers had a ton of problems: from bad advice and fixes that only worked for some people to vulnerable code and solutions that were basically hacks. There was no way to change, fix or improve them.
- A lot of problems ended up being fixed by the platform or the framework, but you didn’t know that, because the old solution was still among the top Google search results.
- If the problem was rare (maybe an API behaving strangely in a certain situation), then the search engine’s page rank wasn’t very useful: the problem only affected a few people, so no one posted links to its solution; therefore it didn’t show up in search results.
- There are too many ways to formulate a question: you have to use the right words to even have a chance.
To fix these problems and to make solutions much more available online, Joel Spolsky and Jeff Atwood decided in January 2008 to launch a Q&A website called StackOverflow. Development started in April 2008, and the site was launched in August 2008 as a private beta. After 4 weeks, in September 2008, StackOverflow became public.
Joel’s blog was joelonsoftware.com and Jeff had his own blog too: codinghorror.com. These blogs were fairly popular and they proved to be part of StackOverflow’s success because, through these blogs, Jeff and Joel increased the popularity of their idea. This was important because they wanted new visitors to feel welcome and to actually find useful content when they reached the site.
StackOverflow wants to be a combination of a forum, a blog, a wiki page and a news aggregator. The basic idea is for people to ask questions and receive answers, not just to add useless comments. It’s a place where quality is voted up and promoted and where useless content is pushed down and disappears.
StackOverflow wants to collect as much knowledge and as many programming solutions as possible. The community evaluates them through voting. As the votes accumulate, experts and trustworthy people will surface and the community will trust them more and more.
It was an instant success and this convinced the founders to launch ServerFault in April 2009, a site for system administrators based on the same philosophy as StackOverflow. SuperUser followed in July 2009, a site for computer enthusiasts and power users.
The success of these sites has laid the foundation for the StackExchange network, which now includes a variety of sites, all following the same structure and philosophy that StackOverflow was built on.
Editing the content and keeping it up to date is crucial on StackExchange sites. Content accessibility is also very important; strong SEO techniques are applied on the sites.
StackOverflow’s popularity and the fact that most programmers hang out online have an interesting consequence: when a new technology or programming language is launched, support sites or forums are no longer created for it; instead, users are redirected to the relevant tags on StackOverflow.
Variety and professional atmosphere
As mentioned in the introduction, the StackExchange network currently has 119 sites. This number fluctuates (see the “Area51” chapter).
Each site works on the same basic principles: they’re Q&A sites, they use the same platform and they have, as target audience, people that work in a professional capacity in a certain field.
The difference is that each site has its own community that drives and administrates it, completely independent of the other sites. In fact, the only collaboration is when questions migrate from one site to another; this is possible since they all rely on the same platform.
Each site has its own subject, and the variety mentioned in this chapter’s title is an understatement. The most popular subjects revolve around programming and technology, but there is great diversity: from math, computer games, poker, sports, politics and photography to financial management, chess, graphic design, parental advice, history, religion and linguistics. There’s a very low chance that someone won’t find at least one or two hobbies or interests among all of StackExchange’s site subjects.
As mentioned above, every site’s purpose is to gather as much information from that site’s experts as possible. When someone is looking for something, the result must be a professional, objective and complete answer. That is StackExchange’s ideal.
The philosophy of constantly editing and improving the content, mentioned in the previous chapter, makes this ideal possible. In many cases, it is achieved.
The content’s license
The network’s philosophy includes the concept of making all information completely public. Any question or answer that is posted on the network is automatically subject to the Creative Commons Attribution-ShareAlike license. This means that each author receives the appropriate credit for their contribution, that the content must stay 100% public and that anyone can use and modify it, even for commercial purposes (on condition that the modified version remains subject to the same license).
Making the content available under this license allows using the data in many ways. See the chapter “Big Data” for more details.
II. Working mechanisms
Q&A, reputation and privileges
The system is very simple: a person asks a question and others post answers. Each question and each answer can be upvoted or downvoted. Depending on these votes, the author receives reputation points. As a person accumulates more points, he or she will unlock privileges and will gain more trust from the community.
An upvote brings the author +5 reputation if it’s a question and +10 reputation if it’s an answer. The difference is because answers are the ones that provide the highest quality content, so they are more valuable.
A downvote reduces the author’s reputation by 2 points, whether it’s a question or an answer. However, when you downvote an answer, your own reputation will also go down by 1 point. This decision was made to encourage improving answers as opposed to just marking them as low quality.
Every question has to have tags: at least 1, at most 5. These tags help categorize the questions so they’ll be easier to sort and find. For example, a question about how to apply a Look-and-feel in Java will probably have the “java”, “swing” and “look-and-feel” tags.
The question’s author is encouraged to pick the answer that they consider to be the best. In this case, the answer’s author receives +15 reputation and the question’s author receives +2 reputation.
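The reputation rules above can be summed up in a few lines of Python. This is only a sketch for illustration: the point values are the ones quoted in this article, while the event names and the `reputation_change` function are made up here and are not part of any StackExchange tool.

```python
# Point values as quoted in the article; the event names are hypothetical.
REPUTATION_DELTAS = {
    "question_upvoted": 5,       # someone upvoted your question
    "answer_upvoted": 10,        # someone upvoted your answer
    "post_downvoted": -2,        # someone downvoted your question or answer
    "downvoted_an_answer": -1,   # you downvoted somebody else's answer
    "answer_accepted": 15,       # your answer was picked as the best one
    "accepted_an_answer": 2,     # you picked an answer to your own question
}


def reputation_change(events):
    """Return the total reputation effect of a list of event names."""
    return sum(REPUTATION_DELTAS[event] for event in events)


# Three upvoted answers, one of them accepted, one downvote received
# and one downvote cast on somebody else's answer.
example_events = ["answer_upvoted"] * 3 + [
    "answer_accepted",
    "post_downvoted",
    "downvoted_an_answer",
]
example_total = reputation_change(example_events)  # 30 + 15 - 2 - 1 = 42
```

Keeping the values in a single table makes it easy to see why answering is rewarded more than asking.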
The privileges obtained as the reputation grows are diverse: from creating bounties, moderator flags and chat rooms to voting to close a question, and on to more and more advanced editing and content protection options.
Since StackExchange is built as a social network, moderation is done a little differently. It is divided into 3 levels:
- Regular users do most of the moderation. They can edit the content on the sites, vote to close or even delete some questions, and flag posts for moderator attention when something goes wrong. To do this, members have access to review queues that become available as they accumulate more reputation. Those with very high reputation even have access to some moderator tools. Therefore, the most frequent and common problems are handled by the users; it’s a community that moderates itself.
- Moderators are the network’s police officers; they step in when regular users cannot. Moderators handle flags, identify duplicate accounts, migrate questions between sites, manage tags and much more. One of their most important responsibilities, though, is settling arguments and disputes. The number of moderators constantly fluctuates; there are usually between 700 and 1000 on the network. They are appointed by the Community Managers while a site is in private and public beta. In these stages, they’re chosen for their proven activity and diplomacy. Once a site reaches maturity, democratic moderator elections are held, where members nominate themselves for the position and are then voted on by the rest of the community based on the speeches they give. Such elections are not held just once for every site. For example, on StackOverflow, this happens once or twice a year.
- When exceptional situations arise, the problems are solved by Community Managers. They are StackExchange employees whose role is to make sure everything is working properly, to monitor and guide the activity in Area51, answer questions on the Meta sites and offer guidance in using site tools.
The network as a Wiki
The network is very wiki-like. Users are encouraged to constantly edit and improve the content on the sites. This principle is so strong that users are even encouraged to add their knowledge in the form of questions and answers, basically answering their own questions. This way, a question and its answers are treated like a wiki page on a certain topic. To get an idea: 39% of questions and 19% of answers are modified at least once after they’re posted.
There are situations when a question is so complex that it requires a long list or needs a lot of research and a very long answer or even a great number of authors for it to be answered. For example: “What are the best programming books?”.
In this case, the question will receive a lot of answers, which will be edited by a great number of people. This, together with the popularity and the huge number of upvotes such answers receive, raises an interesting question: if so many people contribute to that content, is it fair for the original author to receive all that reputation?
To fix this problem, such questions are marked as Community Wiki. This means that the reputation generated by the content won’t be attributed to anyone. The original authors will no longer be listed; instead, the members that contribute the most to the question or answer are displayed. Also, editing such questions is much more accessible: a member doesn’t need the 2000 reputation that is normally required to edit, only 100.
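The two editing thresholds mentioned above can be expressed as a tiny check. This is a minimal sketch that uses the 2000 and 100 reputation figures from this article; the constant names and the `can_edit` function are hypothetical.

```python
# Thresholds as quoted in the article; the names are hypothetical.
NORMAL_EDIT_REPUTATION = 2000
COMMUNITY_WIKI_EDIT_REPUTATION = 100


def can_edit(user_reputation, is_community_wiki):
    """Return True if a user with this reputation may edit the post."""
    if is_community_wiki:
        required = COMMUNITY_WIKI_EDIT_REPUTATION
    else:
        required = NORMAL_EDIT_REPUTATION
    return user_reputation >= required
```

With these numbers, a member with 150 reputation can edit a Community Wiki post but not a regular one.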
Comments, chat and the Meta sites
In the network’s early days, you could only add questions and answers. Members, though, needed a place to discuss the rules of the sites, the various exceptional situations that occurred and the overall content quality.
So a system was introduced that allowed commenting on both questions and answers. These comments can receive upvotes from others but they won’t generate any reputation.
Another similar system is the chat, where members can talk about anything and everything. The chat is divided into rooms, each with its own discussion. Generally, each site has its own room, but new rooms can be created by members that have sufficient privileges to do so. There are also moderator-only rooms, where moderators can discuss moderation issues without regular members interfering.
Every site on the network has an associated Meta site. These are separate, but they work on the same principles. The only thing they have in common with their parent site is their moderators. Here, the discussions are only about the parent site: adding or removing rules, posting announcements and much more. Meta sites are created when their parent sites reach the private beta stage (see “Area51”).
Area51
As I mentioned in the previous chapter, the network has a lot of sites, each with its own subject. But how are these sites born? And who decides which sites are launched and what subjects they’ll have? The answer is: you, the regular member. The network is social at its core, so the community decides which new sites are launched and what their rules will be.
This all takes place in Area51, a special place where new sites are defined and launched by following these steps:
- a new site is proposed. It needs to have a name, a subject (e.g. naval architecture) and a description. Anyone can make such a proposal.
- next comes the definition stage, when the community tries to attract people interested in the new site and creates hypothetical questions that would fit on it. This is done in order to establish the new site’s domain name, the acceptable subjects on the site, what a good question should look like, what a good answer should look like and so on. These questions are edited and discussed until the community agrees on them. Once the magical number of 40 such questions is reached and there are enough people following the site’s progress, the site is considered to be “defined”.
- the next stage is commitment. In this stage, the community tries to get as many people as possible to “commit” to participating on the site once it’s launched. The members have to “digitally sign” a petition that confirms this. The reason for this commitment is that a site needs a critical mass of members who use it and make it popular in the first period of its existence. The site can’t go to the next stage without at least 200 such members.
- next comes private beta when the site is actually launched, together with a sister Meta site. In this stage, the site is only available to those who committed to it during the previous stage. The FAQ will now be defined, the first moderator positions are assigned and last minute changes are performed. Also, the existing members try to fill the site with content and also make it popular on Facebook, Twitter, blogs and so on.
- the next stage is public beta when the site is opened to the public and anyone can use it.
- if the site reaches the required activity and traffic levels, gets a critical mass of members and the community is confident that it will remain popular, it reaches maturity. In this case, the site gets a brand new and unique graphic design that best reflects its subject. Also, democratic moderator elections are held.
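The stages above behave like a small state machine. The sketch below is only an illustration of that idea: the thresholds of 40 questions and 200 committed members come from this article, while the class names, field names and the two boolean flags (for which the article gives no numeric criteria) are assumptions made up for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    PROPOSAL = 1
    DEFINITION = 2
    COMMITMENT = 3
    PRIVATE_BETA = 4
    PUBLIC_BETA = 5
    MATURE = 6


@dataclass
class Proposal:
    stage: Stage = Stage.PROPOSAL
    agreed_questions: int = 0         # example questions the community agreed on
    enough_followers: bool = False    # hypothetical flag, no number is given
    committed_members: int = 0        # members who signed the commitment
    private_beta_done: bool = False   # hypothetical flag, no number is given
    healthy_activity: bool = False    # hypothetical flag for activity and traffic


def advance(p):
    """Return the stage the proposal moves to, or its current stage if it is not ready."""
    if p.stage is Stage.PROPOSAL:
        return Stage.DEFINITION
    if p.stage is Stage.DEFINITION and p.agreed_questions >= 40 and p.enough_followers:
        return Stage.COMMITMENT
    if p.stage is Stage.COMMITMENT and p.committed_members >= 200:
        return Stage.PRIVATE_BETA
    if p.stage is Stage.PRIVATE_BETA and p.private_beta_done:
        return Stage.PUBLIC_BETA
    if p.stage is Stage.PUBLIC_BETA and p.healthy_activity:
        return Stage.MATURE
    return p.stage
```

For example, a proposal in the commitment stage with 199 committed members stays where it is, and moves to private beta once the 200th member signs.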
StackOverflow currently has 3 million members and 6.6 million daily visits; it is the most popular site of the network and generates over 80% of the network’s content and traffic. Having such a huge audience of programmers opens the door to a unique opportunity: jobs and careers. This is how Careers.StackOverflow was born, a kind of LinkedIn only for programmers and IT professionals.
Every member can create an account, which acts as an electronic résumé. On it, you can add all kinds of information: from job history, known technologies and authored articles to projects you were involved in and books you’ve read. For optimal functionality, the site integrates APIs from very popular 3rd-party sites like LinkedIn, GitHub, BitBucket, Amazon, SourceForge and many more. Various StackExchange profiles can also be included here, together with the best and highest voted answers.
Companies are not neglected on this site either: they can create their own pages where they can include a company description, a map with its location, currently available jobs, pictures, accounts of key employees, benefits, technologies used in its projects and much more.
III. Big Data
All accounts, questions, answers, comments etc. added to the StackExchange network are publicly available through a series of special sites and APIs. This is possible because of the license (see the “The content’s license” chapter).
Data.StackExchange is a special site that allows access to the network’s content. Members can write SQL queries in a big text area, execute those queries and see the results in real time. To help members write these queries, the site shows the complete structure of the database tables that hold the content.
Because StackExchange’s architecture uses SQL Server databases, the queries must follow SQL Server’s syntax.
The databases used on this site are not the same as the ones used by the live StackExchange sites; instead, they’re just a copy. This means that data is not entirely up-to-date. An update of Data.StackExchange’s databases usually happens once a month.
Members that log in on this site can save their queries and then come back to change and improve them.
Given the public nature of the content, some information is not available on this site, such as people’s email addresses.
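To give an idea of what such a query looks like, here is a minimal sketch that runs locally against an in-memory SQLite database instead of the real site. The `Posts` table, its columns, the sample rows and the convention that `PostTypeId = 1` marks a question are assumptions made for this example, so check the schema shown on the site before reusing them. Also note that SQLite limits rows with `LIMIT`, whereas SQL Server uses `SELECT TOP`.

```python
import sqlite3

# A toy stand-in for the real database; table, columns and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Posts (Id INTEGER PRIMARY KEY, PostTypeId INTEGER, Title TEXT, Score INTEGER)"
)
conn.executemany(
    "INSERT INTO Posts VALUES (?, ?, ?, ?)",
    [
        (1, 1, "How do I apply a look-and-feel in Swing?", 12),
        (2, 2, None, 30),  # an answer, so it has no title
        (3, 1, "Why does my API call fail?", 4),
        (4, 1, "What are the best programming books?", 57),
    ],
)


def top_questions(connection, limit):
    """Return (Id, Title, Score) for the highest scored questions."""
    rows = connection.execute(
        "SELECT Id, Title, Score FROM Posts "
        "WHERE PostTypeId = 1 ORDER BY Score DESC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()


top_two = top_questions(conn, 2)
```

On the real site, the same idea would be typed directly into the query text area as plain SQL, without any Python around it.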
StackExchange API and StackApps
Another way to access the network’s content is through the StackExchange API, a REST web service that returns data in JSON or JSONP (padded JSON) format.
This API can be used in 3rd-party applications that rely on the StackExchange network. There are a lot of such applications already published, especially for mobile devices.
Much of the content can be accessed through this API only after authentication. To do this, the application must be registered on StackApps, at which point you’ll receive an authentication key. An application that uses this key is also allowed a much higher traffic limit.
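As a rough sketch of how an application might talk to the API, the code below builds a request URL and unwraps a JSONP response. It makes no network call. The version number in `API_ROOT`, the chosen parameters, the placeholder key and the sample response are assumptions for illustration, so consult the API documentation for the exact routes and fields.

```python
import json
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.3"  # the version number is an assumption


def questions_url(site, tag, key=None, callback=None):
    """Build a URL that asks for the top voted questions with a given tag."""
    params = {"site": site, "tagged": tag, "order": "desc", "sort": "votes"}
    if key is not None:
        params["key"] = key            # the key received when registering the application
    if callback is not None:
        params["callback"] = callback  # asks for JSONP instead of plain JSON
    return f"{API_ROOT}/questions?{urlencode(params)}"


def unwrap_jsonp(payload, callback):
    """Strip the callback( ... ) padding from a JSONP payload and parse the JSON inside."""
    text = payload.strip().rstrip(";")
    prefix = callback + "("
    if not (text.startswith(prefix) and text.endswith(")")):
        raise ValueError("payload is not wrapped in the expected callback")
    return json.loads(text[len(prefix):-1])


url = questions_url("stackoverflow", "java", key="HYPOTHETICAL_KEY")
sample = unwrap_jsonp('handle({"items": []});', "handle")  # made-up response
```

JSONP exists so that a web page can load the data through a script tag; an application that calls the API from its own code can simply request plain JSON.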
StackApps is, like I said above, a site where the applications that use the API can be registered. Authors present their applications here, together with installation instructions. Discussions about the API and how to use it are also present here.
The format and philosophy on which the StackExchange network was built are a real success; its popularity cannot be questioned. The sites it includes have made work much easier for millions of people of all professions. Personally, I save many hours by using these sites, hours which I would otherwise spend digging through the Internet’s far corners trying to find answers to my questions. It’s practically impossible to calculate how much money StackExchange saves, but it’s pretty clear we’re talking about many billions of dollars.
Christoph Treude, Ohad Barzilay, Margaret-Anne Storey. How do programmers ask and answer questions on the web? In ACM, 21-28 May 2011.