Thursday, January 26, 2012
Introducing the reddit Snoo Chullo (hat)
Wednesday, January 25, 2012
January 2012 - State of the Servers
My fellow redditors: the state of our servers is strong.
2011 was a year of explosive growth and daunting technical hurdles. Our infrastructure has changed dramatically over the past 12 months. I'm here to show you some of the more technical details of the changes that have been made, and dazzle you with fanciful talk of the future.
To look at just the numbers, in December of 2010 we had 829 million pageviews and 119 servers. Today, we have 2.07 billion pageviews with 240 servers. That's an increase of 149% for pageviews and 101% for servers.
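The percentages above can be checked with a quick calculation. This is a minimal sketch: `percent_increase` is a hypothetical helper, the December 2011 figure is the exact pageview count from the "2 Billion & Beyond" post, and the December 2010 figure is the rounded 829 million quoted above, so the results are approximate.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the percentage growth from old to new."""
    return (new - old) / old * 100


# Figures quoted in the posts; the 2010 pageview count is rounded.
pageviews_2010 = 829_000_000
pageviews_2011 = 2_065_237_338
servers_2010 = 119
servers_2011 = 240

pageview_growth = percent_increase(pageviews_2010, pageviews_2011)
server_growth = percent_increase(servers_2010, servers_2011)

print(f"pageviews: +{pageview_growth:.1f}%")
print(f"servers:   +{server_growth:.1f}%")
```

Both results truncate to the 149% and 101% quoted above.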
Postgres
Some of the more lengthy downtimes in 2011 were due to complications surrounding our Postgres infrastructure. The main issue was that whenever EBS volumes slowed down on our masters (which happened often), our database replication system, Londiste, would break. This required us to rebuild the broken slaves while trying to keep the site running during the long rebuild process. These replication breaks also caused data corruption on the slaves, resulting in bad data persisting in cache even after the slaves were fixed. In short, it was a huge mess of work whenever this happened.
The bug causing the replication break was unfortunately very transient, and it could not be reliably reproduced in testing. However, a complete upgrade from Postgres 8 to Postgres 9 appeared to resolve the issue. That upgrade was completed in July, and we have seen nary a replication problem since. We currently have just shy of 2TB of data in Postgres, which takes an awfully long time to replicate.
Farewell, EBS
One of the more painful lessons of the past year has been that EBS' performance degrades often. We were using EBS to store all of our Cassandra and Postgres data. After spending a considerably large amount of time trying to work around these issues, we came to the conclusion that EBS in its current state was not reliable enough. The feature set of the product can be quite handy, but the consistent performance degradations were not worth it.
As a result, we have moved all of our high-traffic data off of EBS, and onto local ephemeral disks. This migration required us to bulk up our redundancy considerably; a hardware failure on an ephemeral disk means your data is gone. Since the move, we have had significantly fewer issues with disks on our Postgres and Cassandra servers.
Cassandra 0.8
Throughout this whole year, we've been migrating data off of our mostly broken Cassandra 0.7 ring onto our mostly unbroken Cassandra 0.8 ring. This has resulted in much improved stability and faster response times. Additionally, our newer features - like flair and the moderation log - are canonically stored in Cassandra as opposed to Postgres. That said, there's still a lot of work to be done on Cassandra.
Random small improvements
There are a ton of small changes made each week that individually have a negligible impact. We've rekicked most of our servers to Ubuntu Natty and use Puppet to keep their configurations in sync. We're slowly building a kick system to automate most of the process of adding new servers to our setup. Our monitoring now exists and we fixed the office TV so we can keep an eye on Google realtime analytics. :-)
The Future
Maintaining the infrastructure for a site like reddit is an exercise in never-ending changes. Here are some of the bigger projects we are going to be working on this year.
No downtime downtimes
One thing we hate the most is having to take the site down for maintenance. We try to avoid it wherever possible, but some changes will always require that systems be taken offline. While these maintenances will still be necessary for the foreseeable future, a project is currently in the works to lessen the sting of downtime.
rram is in the process of working with Akamai, our Content-Delivery Network, to change the site's behaviour during downtimes. Instead of taking the site completely offline for maintenance, we will instead be able to allow Akamai to serve up a cached, read-only version of the site. Once this project is complete, the majority of our maintenances can be done while still serving the site in some capacity. This same method can also be used during unexpected downtimes, should they ever pop up. Moreover, we're researching Edge Side Includes as a method to further reduce load on our servers. I expect that these will greatly reduce the number of bananas consumed by redditors during site downtime.
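The general idea of serving a cached, read-only copy when the origin is unavailable can be sketched in a few lines. This is a toy illustration only, not Akamai's API or our actual configuration: the `ReadOnlyFallback` class, the `origin` function, and the `state` flag are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Optional


class ReadOnlyFallback:
    """Toy edge cache: serve fresh pages when the origin is up, stale copies when it is not."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch = fetch_from_origin
        self._cache: Dict[str, str] = {}

    def get(self, path: str) -> Optional[str]:
        try:
            page = self._fetch(path)
        except ConnectionError:
            # Origin is down: fall back to the last copy seen, if any.
            return self._cache.get(path)
        self._cache[path] = page
        return page


state = {"origin_up": True}


def origin(path: str) -> str:
    # Stand-in for the real app servers.
    if not state["origin_up"]:
        raise ConnectionError("origin offline for maintenance")
    return f"<html>live page for {path}</html>"


edge = ReadOnlyFallback(origin)
first = edge.get("/r/pics")      # fetched from the origin and cached
state["origin_up"] = False       # simulate a maintenance window
during = edge.get("/r/pics")     # served from the cache
uncached = edge.get("/r/new")    # never cached, so nothing to serve
```

Pages that were never cached before the outage cannot be served, which is one reason a real deployment would need to think about what gets cached and for how long.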
Cassandra 1.0
We're very happy to see Cassandra reach its shiny new "1.0" state. I'm planning on upgrading our Cassandra 0.8 ring to 1.0 very soon. This upgrade will resolve some of the more pesky issues on our current Cassandra ring, such as difficulty with bootstrapping and repairing.
Why buy one, when you can buy two for twice the price
One of the largest projects on the horizon is running reddit in multiple datacenters concurrently. This will allow us to gain some redundancy, and it is the first step in being able to host the site in multiple regions. Doing so will require significant changes in both the infrastructure and the code of the site. It is a huge undertaking, but it is well worth it.
It has its ups and downs
Finally, we're looking forward to having our infrastructure self-heal and auto-scale. Right now, when bad things happen, we get alerts, but for the most part any fixes are completely manual. This often leads to either app server bloat (we made 15 new ones in 2012 already) or us temporarily sacrificing a post in order to keep the rest of the site healthy. :-( With our infrastructure self-healing and auto-scaling, we'll be more hands-off and can work on getting rid of bottlenecks, rather than fighting server fires.
tl;dr: 2011 was an awesome year, and things are only looking better. We could not have done all of this without your support through reddit gold, advertising, postcards, and just awesomeness. Remember, here at reddit, we're working hard so you won't have to!
Thursday, January 19, 2012
Thank you, reddit. Thank you, internet.
By the end of the day, over 115,000 websites had creatively altered their pages in protest of SOPA and PIPA. By doing so, they not only informed their audiences about these crucial bills, they helped reframe Wednesday's event from single sites protesting into an internet-wide blackout.
The impact of Wednesday's protest has been massive. Congressional attitudes on SOPA have shifted in direct response to your actions, and we now have senators posting to thank reddit for being involved and to rally its attention. This morning, Tuesday's crucial PIPA cloture vote in the Senate was postponed, and the House Judiciary Committee has delayed action on SOPA.
Despite this momentary success, the threat of internet-suppressing legislation is far from over. We are still in the early days of the internet. There will be future bills and future hearings. The problems and conflicts of interest these bills brought to the foreground have not gone away. Facilitating fair business models without endangering free expression is one of the most important problems of our medium, and if we want to end up differently from past revolutions, we'll have our work cut out for us.
Looking back, the past few days have been defining ones for us and other internet communities. Thank you, individuals and organizations that first raised the red flag about these bills. Thank you, Senator Wyden, Representative Lofgren, Representative Issa, and all of the other elected officials and staffers who are helping fight these bills, and did so even when the odds were long. Thank you, reddit, for deciding on and championing our blackout, and for helping to make an impact greater than any of us could have had separately. Thank you to the more than 13 million people who got involved on January 18th.
Tuesday, January 17, 2012
A technical examination of SOPA and PROTECT IP
As a disclaimer, I am not a lawyer, I'm a sysadmin. The following is not legal advice, but rather an outline and personal interpretation of critical portions of the legislation. If you own or operate a site that may be affected by this legislation, I suggest having your legal counsel look at these bills. If you're a brand new startup with little to no money for legal counsel, well, best of luck to you. The internet may no longer be a friendly place.
Note: In recent news, several legislators have suggested that they will be removing the DNS provisions from both SOPA and PROTECT IP. However, those provisions still exist in the bills today, and they are likely to still be debated. For these reasons, I'm going to include the DNS provisions in this discussion.
The Sacred Texts
Much of this post will be focusing on Title 1, Sections 101, 102, and 103 of SOPA; and Sections 2, 3, and 4 of PROTECT IP. I hope to make the impact of these bills clear; however, you shouldn't just blindly trust me. Here are links to the current versions of the bills provided by the Library of Congress and the Government Printing Office.
- PROTECT IP (Senate) AKA Preventing Real Online Threats to Economic Creativity and Theft of Intellectual Property Act of 2011
- Stop Online Piracy Act (House)
The Battlefields
One of the most important distinctions in these bills is the difference between a 'foreign site' and a 'domestic site'. The definitions try to break websites into two groups using some fairly simple language; however, the results may be unexpected in several cases.
- SOPA Domestic Internet Site - A domestic site is defined as a site that corresponds to a 'domestic domain name', or if there is no domain name, a domestic IP address.[1] A domestic domain name is defined as a domain registered or assigned by a registrar or other authority that is located within the United States.[2] Some common examples of domestic top-level domain names are '.com', '.org', and '.us'.
- SOPA Foreign Internet Site or PROTECT IP Non-domestic domain name - A foreign site is very simply defined as a site that is not a domestic site.[3][4] Under this definition, any site not using a domestic domain name is a foreign site.
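To see how blunt this distinction is, here is a minimal sketch that labels a site from its domain name alone. It rests on loud assumptions: the bills key on where the registrar or registry is located, and this sketch approximates that with the top-level domain; the set of "domestic" TLDs contains only the three examples named above, and `classify_site` is a hypothetical helper.

```python
# TLDs given above as examples of "domestic"; illustrative, not a complete list.
EXAMPLE_DOMESTIC_TLDS = {"com", "org", "us"}


def classify_site(hostname: str) -> str:
    """Label a hostname 'domestic' or 'foreign' from its top-level domain alone."""
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    return "domestic" if tld in EXAMPLE_DOMESTIC_TLDS else "foreign"


for host in ("reddit.com", "redd.it", "reddit.co.uk"):
    print(host, "->", classify_site(host))
```

Under this kind of test, nothing about where a site's servers, operators, or users are located enters into the result.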
The Players
- Service Provider - A service provider is defined as a service that hosts a non-authoritative DNS server.[5] This includes ISPs, sites like OpenDNS, Google's public nameservers, and any other service providing a public DNS resolution server.
- Internet Advertising Services - A service that will serve, display, or "otherwise facilitate" an ad in return for compensation.[6][7] This includes both services which display ads linking to sites (e.g. Google AdWords and reddit's self-serve advertising), and services which host ads on other sites (e.g. Google AdSense).
- SOPA Payment Network Provider or PROTECT IP Financial transaction provider - A service that handles payment transactions (e.g. PayPal).[8][9]
- SOPA Internet Search Engine - The definition of a search engine in the legislation is very wordy. What it basically comes down to is a service that provides links to other sites based on a user query or selection.[10] Sites like reddit certainly fall within this definition. Other sites likely to fall within this definition are live blogs, link shorteners, wikis, and blog networks.
- PROTECT IP Information Location Tool - The definition of this is not included in PROTECT IP itself, but rather referenced to language in existing copyright law.[11] The existing law doesn't explicitly define an 'information location tool', but instead gives some extremely broad examples.[12] It boils down to any service that displays links or 'pointers'.
- SOPA U.S. Directed Sites - A site, or portion thereof, that is used to conduct business or provide services to U.S. residents.[13] "Service" is not explicitly defined.
- Qualifying Plaintiff - A holder of intellectual property who is harmed by the activity of a foreign infringing site.[14][15]
The Powers
Most of the power in these bills is granted to the office of the Attorney General. The Attorney General can obtain a court order to take action [16][17] against a foreign infringing site, or portions thereof, as defined by the following.
SOPA [18]
- The site is U.S. directed.
- The owner or operator of the site is "committing or facilitating the commission [my emphasis] of criminal violations punishable under section 2318, 2319, 2319A, 2319B, or 2320, or chapter 90, of title 18, United States Code." Those sections primarily deal with copyright infringement and counterfeit products.
- The site would be subject to seizure if it were instead a domestic site.
PROTECT IP [19]
- The site is used "primarily as a means for engaging in, enabling, or facilitating the activities" of copyright infringement or counterfeit products; or
- The site is designed by its operator "as a means for engaging in, enabling, or facilitating the activities" of copyright infringement or counterfeit products.
If these criteria are met, the office of the Attorney General can then serve this court order to entities in the U.S., requiring them to take specific actions against the site. The following are the actions which must be taken upon receiving the order from the Attorney General's office:
- Require U.S. sites and search engines to remove all links to the foreign site.[20][21]
- Require U.S. advertising services to no longer serve ads linking to the site, or display ads (e.g. AdSense) on the foreign site.[22][23]
- Require U.S. payment networks to cease any transactions between the foreign site and U.S. customers.[24][25]
- Require U.S. service providers to block customer access to the foreign site (DNS blacklisting).[26][27]
"No Duty to Monitor"
SOPA
The requirements of ad networks [22] and payment networks [24] include a 'no duty to monitor' paragraph. This paragraph indicates that the networks are in compliance with the requirements if they take the actions described on the date that the order is served. It should be noted that 'search engines' have no such paragraph. This would mean that search engines can be required to continually monitor and prevent new instances of links to foreign sites. Coming from the point-of-view of the drafter of the legislation, this makes perfect sense. Requiring a site to scrub all the links to a foreign site is a useless effort if the links will simply pop up again the next day.
Actions which can be taken by qualified plaintiffs
The Attorney General doesn't get all of the fun. Qualifying plaintiffs can also send notice [28][30] to advertising services and payment networks requiring them to cease interaction with a foreign infringing site.[29][31]
The Devil in the Details
Domestic vs Foreign
The concept of 'domestic' versus 'foreign' on the internet is complex. For example, reddit's primary servers are located in Virginia; however, we have domain names through foreign registrars (redd.it, reddit.co.uk). The site is hosted via a third-party content-delivery network (Akamai). This means that if you connect to reddit from a foreign country, you are likely connecting to an Akamai server not located in the U.S. This legislation naively ignores this complexity, and simply labels a site 'foreign' or 'domestic' based solely on the domain name.
The legislators sponsoring these bills have indicated that they are only targeted at truly foreign sites. However, the language is so loose and ignorant of what is truly a foreign site that there is a huge amount of room to argue what is actually "foreign".
Facilitation of criminal violations
The potential for abuse in this language is painfully obvious. "Facilitation" can often be argued as simply teaching or demonstrating how to do something. Under this definition, a site could be targeted for something as simple as describing how to rip a Blu-Ray. This language also makes it clear that the legislation is not solely targeting sites "dedicated to theft".
The Fallout
Why this is going to harm user-driven sites like reddit
Up to this point, reddit and sites like it have been required to remove specific copyrighted content if presented with a properly filled out DMCA takedown request. The notices are required to indicate exactly which pages the content is on, and the senders must prove that they are indeed the owners of the content. Even then, this process is often abused.
SOPA and PROTECT IP contain no provisions to actually remove copyrighted content, but rather focus on the censorship of links to entire domains.
If the Attorney General served reddit with an order to remove links to a domain, we would be required to scrub every post and comment on the site containing the domain and censor the links out, even if the specific link contained no infringing content. We would also need to implement a system to automatically censor the domain from any future posts or comments. This places a measurable burden upon the site's technical infrastructure. It also damages one of the most important tenets of reddit, and the internet as a whole – free and open discussion about whatever the fuck you want.
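To give a feel for what such a scrubbing system involves, here is a minimal sketch of a domain-wide link filter. Everything in it is hypothetical: `example.net` stands in for a domain named in an order, and the `is_blocked` and `scrub` helpers and the URL pattern are illustrative, not reddit's code.

```python
import re
from urllib.parse import urlparse

# Hypothetical blocklist; example.net stands in for a domain named in an order.
BLOCKED_DOMAINS = {"example.net"}

# Deliberately simple URL matcher; real comment text is much messier.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+", re.IGNORECASE)


def is_blocked(url: str) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def scrub(text: str) -> str:
    """Replace every link to a blocked domain with a placeholder."""
    return URL_PATTERN.sub(
        lambda m: "[link removed]" if is_blocked(m.group(0)) else m.group(0),
        text,
    )


comment = "Compare http://www.example.net/a/harmless-page with http://example.org/doc"
print(scrub(comment))
```

Note that the filter removes the link to the harmless page simply because of the domain it lives on, and that a filter like this would have to run over every existing post and comment as well as every new submission.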
Why this doesn't actually stop piracy
This legislation is aimed at requiring private U.S. entities to enforce restrictions against foreign sites but does nothing against the infringement itself. All of the enforcement actions can and will be worked around by sites focused on copyright infringement. U.S. citizens will still be able to use foreign DNS servers, new advertising and payment networks will pop up overseas, and "infringing sites" will still be linked to by other foreign sites and search engines. In fact, tools used to circumvent these forms of internet restrictions are being funded by the U.S. State Department to offer citizens under "repressive regimes" uncensored access to the internet. When the dust settles, piracy will still exist, and the internet in the U.S. will have entered the realm of federal regulation and censorship.
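The DNS point can be illustrated with a toy model. This is a sketch under stated assumptions, not real DNS software: `ToyResolver` is a hypothetical in-memory stand-in for a resolver, `blocked.example` is a made-up name, and the address comes from the 192.0.2.0/24 range reserved for documentation.

```python
from typing import Dict, Optional, Set


class ToyResolver:
    """In-memory stand-in for a DNS resolver with an optional blocklist."""

    def __init__(self, records: Dict[str, str], blocked: Optional[Set[str]] = None) -> None:
        self.records = records
        self.blocked = blocked or set()

    def resolve(self, name: str) -> Optional[str]:
        if name in self.blocked:
            return None  # the resolver refuses to answer for a blocked name
        return self.records.get(name)


# Hypothetical record; the address is from a documentation-only range.
records = {"blocked.example": "192.0.2.10"}

us_resolver = ToyResolver(records, blocked={"blocked.example"})
foreign_resolver = ToyResolver(records)

print(us_resolver.resolve("blocked.example"))       # None
print(foreign_resolver.resolve("blocked.example"))  # 192.0.2.10
```

The block lives in one resolver's answer, not in the site itself, so any resolver that is not subject to the order still returns the address.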
Why this is ripe for abuse
The vague and technology-ignorant language in this pending legislation opens a huge number of doors for different interpretations. When you take this broad language and use it to grant powers to both the Attorney General and plaintiffs like the MPAA and RIAA, you create a system that is begging to be abused. Given the history of abuse of laws like the DMCA, it has become obvious that institutions like the RIAA can and will stretch laws to the breaking point, often while suffering no repercussions.
To prevent a repeat in history of the abuse of internet copyright law, any new legislation must be drafted with the following:
1. Airtight, technically sound definitions.
2. Heavy input from the technology sector. Complex technology legislation should not be drafted by someone who barely has a working knowledge of the internet.
3. Checks and balances ensuring that due process can be invoked before, during, and after any action is taken.
4. Clear repercussions for entities utilizing the legislation in an abusive manner.
Why this is going to hurt startups and tech innovation
One of the big reasons a company is able to go from a few computers in a garage to a multi-billion dollar company is the open nature of the internet. The barrier to entry for creating a new site or product is very low. Adding legislation that regulates this open platform will seriously hamper future business.
Entrepreneurs will need to invest in legal counsel to ensure they can properly respond to a PROTECT IP or SOPA order. New sites and products will need to invest precious development time to build in censorship utilities so that they can remove links to foreign sites. New advertising networks will need to calculate the new risk of displaying ads for or on foreign websites. Sites will also be heavily discouraged from using non-US domain names due to the broad language in the bills on how they may be defined.
Adding regulation to one of the few growing sectors in the U.S. will result in a "chilling effect" and will push individuals and businesses to start ventures elsewhere. Threatening this existing ecosystem for the purpose of making it slightly harder to pirate movies is a very dangerous tradeoff.
In Conclusion
It is my strong belief that both PROTECT IP and SOPA:
- Will not stop the piracy they are targeting
- Contain language that is highly ambiguous and extremely broad making them ripe for abuse, and
- Introduce regulation and enforce censorship on what should be a free and open internet.
Further Reading
- The Impact of U.S. Internet Copyright Regulations on Early-Stage Investment [booz.com]
- How PIPA and SOPA Violate White House Principles Supporting Free Speech and Innovation [eff.org]
- The Stop Online Piracy Act Violates the First Amendment [net-coalition.com]
References
1. H.R. 3261, Title I, § 101, Para 5 – Definition of a domestic internet site
2. H.R. 3261, Title I, § 101, Para 3 – Definition of a domestic domain name
3. H.R. 3261, Title I, § 101, Para 8 – Definition of a foreign internet site
4. S.968, § 2, Para 9 – Definition of a non-domestic domain name
5. H.R. 3261, Title I, § 101, Para 22 – Definition of a service provider
6. H.R. 3261, Title I, § 101, Para 12 – Definition of an internet advertising service
7. S.968, § 2, Para 5 – Definition of an internet advertising service
8. H.R. 3261, Title I, § 101, Para 21 – Definition of a payment network provider
9. S.968, § 2, Para 3 – Definition of a financial transaction provider
10. H.R. 3261, Title I, § 101, Para 16 – Definition of an internet search engine
11. S.968, § 2, Para 4 – Information location tool
12. U.S.C. Title 17, § 512.d – Information location tool
13. H.R. 3261, Title I, § 101, Para 23 – Definition of U.S. directed sites
14. H.R. 3261, Title I, § 103.a, Para 2 – Definition of Qualifying Plaintiff
15. S.968, § 2, Para 11 – Definition of a Qualifying Plaintiff
16. H.R. 3261, Title I, § 102.b – Actions which can be taken by the Attorney General
17. S.968, § 3.a – Actions which can be taken by the Attorney General
18. H.R. 3261, Title I, § 102.a – Definition of a Foreign Infringing Site
19. S.968, § 2, Para 7 – Definition of a site dedicated to infringing activities
20. H.R. 3261, Title I, § 102.c, Para 2.B – Actions which are required of Internet Search Engines
21. S.968, § 3.d, Para 2.D – Actions required of Information Location Tools
22. H.R. 3261, Title I, § 102.c, Para 2.D – Actions which are required of Internet Advertising Providers
23. S.968, § 3.d, Para 2.C – Actions required of Internet Advertising Services
24. H.R. 3261, Title I, § 102.c, Para 2.C – Actions which are required of Payment Network Providers
25. S.968, § 3.d, Para 2.B – Actions required of Financial Transaction Providers
26. H.R. 3261, Title I, § 102.c, Para 2.A – Actions which are required of Service Providers
27. S.968, § 3.d, Para 2.A – Actions required of operators of non-authoritative DNS servers
28. H.R. 3261, Title I, § 103.d, Para 1.A – Actions which can be taken by a Qualifying Plaintiff
29. H.R. 3261, Title I, § 103.d, Para 2 – Actions which are required of Payment Network Providers and Advertising Services
30. S.968, § 4.c, Para 3 – Actions which can be taken by a Qualifying Plaintiff
31. S.968, § 4.d, Para 2 – Actions which are required of Payment Network Providers and Advertising Services
Tuesday, January 10, 2012
Stopped they must be; on this all depends.
The freedom, innovation, and economic opportunity that the Internet enables are in jeopardy. Congress is considering legislation that will dramatically change your Internet experience and put an end to reddit and many other sites you use every day. Internet experts, organizations, companies, entrepreneurs, legal experts, journalists, and individuals have repeatedly expressed how dangerous this legislation is. If we do nothing, Congress will likely pass the Protect IP Act (in the Senate) or the Stop Online Piracy Act (in the House), and then the President will probably sign it into law. There are powerful forces trying to censor the Internet, and a few months ago many people thought this legislation would surely pass. However, there’s a new hope that we can defeat this dangerous legislation.
We’ve seen some amazing activism organized by redditors at /r/sopa and across the reddit community at large. You have made a difference in this fight; and as we near the next stage, and after much thought, talking with experts, and hearing the overwhelming voices from the reddit community, we have decided that we will be blacking out reddit on January 18th from 8am–8pm EST (1300–0100 UTC).
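For anyone converting the blackout window to another time zone, the EST-to-UTC arithmetic can be checked as follows. This is a small sketch using a fixed UTC-5 offset for EST, which holds in January since daylight saving time is not in effect; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# EST is a fixed five hours behind UTC (no daylight saving in January).
EST = timezone(timedelta(hours=-5), "EST")

start = datetime(2012, 1, 18, 8, 0, tzinfo=EST)
end = datetime(2012, 1, 18, 20, 0, tzinfo=EST)

start_utc = start.astimezone(timezone.utc)
end_utc = end.astimezone(timezone.utc)

print(start_utc.strftime("%Y-%m-%d %H%M UTC"))
print(end_utc.strftime("%Y-%m-%d %H%M UTC"))
```

Note that the end of the window falls on January 19th in UTC.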
Instead of the normal glorious, user-curated chaos of reddit, we will be displaying a simple message about how the PIPA/SOPA legislation would shut down sites like reddit, link to resources to learn more, and suggest ways to take action. We will showcase the live video stream of the House hearing where Internet entrepreneurs and technical experts (including reddit co-founder Alexis “kn0thing” Ohanian) will be testifying. We will also spotlight community initiatives like meetups to visit Congressional offices, campaigns to contact companies supporting PIPA/SOPA, and other tactics.
We’re as addicted to reddit as the rest of you. Many of you stand with us against PIPA/SOPA, but we know support for a blackout isn’t unanimous. We're not taking this action lightly. We wouldn’t do this if we didn’t believe this legislation and the forces behind it were a serious threat to reddit and the Internet as we know it. Blacking out reddit is a hard choice, but we feel focusing on a day of action is the best way we can amplify the voice of the community.
As we have seen yet again in the fight against PIPA/SOPA, the best ideas come from our community. We all have just over a week to figure out exactly what to do with our extra cycles on January 18th. Please join us in the discussion in the comments here and in /r/SOPA.
— the reddit team
Learn More
- Information on H.R.3261 - Stop Online Piracy Act at OpenCongress.org
- Information on S.968 PROTECT IP Act at OpenCongress.org
- /r/SOPA FAQ
- Problematic language in the bill pointed out by a redditor.
- Video examination of bill's language.
Get Involved
- /r/sopa
- List of companies that have expressed support for SOPA or PIPA.
- List of tech companies, and their contact info, that have expressed support for SOPA or PIPA.
- List of companies that have expressed concern with SOPA and PIPA.
- Take Action Checklist at Stop American Censorship.
- Contact Your Representative with info and a widget to find them by EFF and Wired for Change.
- Directory of Representatives
- Senators of the 112th Congress
- Helpful info on making phone calls to your Senator or Representative.
- SOPAOpera.org keeps track of where your Congressmembers stand on PROTECT-IP and SOPA.
Thursday, January 05, 2012
2 Billion & Beyond
- 2,065,237,338 pageviews
- 34,879,881 unique visitors
- 12.97 pages / visit
- 16 minutes average time on site
- Over 100 million monthly pageviews per employee
- 100,000+ subreddits
- 8,400+ subreddits with over 100 subscribers
- We don't get traffic through ads.
- We don’t participate in any traffic trading.
- We don’t email our users (unless they choose to enter an email and then forget their password).
- We don’t harass users to sign up.
- We don’t harass users to invite their friends.
- We don’t pester you to download our app.
- We don’t use slideshows and other pageview gimmicks.
- We don't know anything about SEO.
- We don't integrate with Facebook.
- We don't even link to our Facebook or twitter accounts.
For the year of 2011:
Windows | 68%
Mac | 20%
Linux | 4%
Android | 3%
iPhone | 2%

Chrome | 42%
Firefox | 34%
Safari | 12%
IE (5% of IE traffic still on IE 6 ಠ_ಠ) | 7%
Opera | 2%

United States | 65%
Canada America’s Hat | 10%
United Kingdom | 6%
Aussies | 3%
Germany | 1.5%
Monday, January 02, 2012
Who would you nominate for the "Best of reddit 2011"?
Sunday, January 01, 2012
reddit helps with your New Year's Resolutions
Have you thought of a New Year's Resolution? Do you have big plans and need the tools to help you accomplish your goals? With more than 8,000 very active subreddits, reddit can help you get and stay on the right track. For those of you who have created goals (and for those of you looking for resolutions), below are some places to start on your road to an awesome 2012.
Fit in fitness
/r/fitness — be sure to check out the FAQ first
/r/running
/r/bicycling
Get stronger, lose weight, run farther
/r/loseit
/r/BTFC
/r/c25k
Quit
/r/stopsmoking
/r/stopdrinking
Help others
/r/Random_Acts_of_Pizza
Get your financial house in order
/r/frugal
This is by no means a comprehensive list of subreddits that can help you reach your goals. Go out and explore. We want you to succeed, and we know that wherever you look for help on reddit you'll find an immense amount of support and advice from each other. If your dedication ever seems to waver, check out the fine folks at /r/getmotivated, who can help keep you on track — whatever your goal!