Enterprise IT Watch Blog

May 30 2012   9:30AM GMT

YouTube engineer Apple Chow reveals testing secrets

Michael Tidmarsh

Does your company need help testing new software? In the book, How Google Tests Software, legendary expert James Whittaker and two Google experts show you new techniques and the best practices you can use for testing software.

Here’s an excerpt from the book, in which the authors interview YouTube test engineer Apple Chow. Enjoy!

An Interview with YouTube TE Apple Chow

‘Apple Chow is a TE for Google Offers and before that a test lead for YouTube in Google’s San Francisco office. Apple likes new challenges and is constantly looking to leverage the latest tools and techniques for testing.
The authors recently chatted with Apple about her thoughts on testing in YouTube.’

HGTS: Apple, what brought you to Google? And with a name like yours, surely you thought about employment elsewhere?

Apple: Ha! apple@apple.com is very tempting! But I came to Google because of the breadth of our product offerings and the opportunity to work with really smart and knowledgeable people. I like to change projects and I love a wide variety of challenges, so Google seemed like the right place for someone like me. I get a chance to make a difference to millions of users across a lot of different product areas. Every day is a new challenge and I never get bored. Of course the free massages are definitely a plus.

HGTS: What did you think of the interview process for TEs and SETs?

Apple: Google focuses on finding generalists who can learn, grow, and tackle a wide variety of problems. This goes for TEs, SETs, and SWEs in my opinion. A lot of places interview for specific roles on a specific team and the people you meet are all going to be people you will work with closely. A Google interview doesn’t work that way. The people interviewing you are from a variety of different teams, so you get a lot of different perspectives. All in all, I think it’s a process designed to get good people who can work on almost any team at Google. This is important, too, because it’s easy to move around within Google so you can always choose a new product area to work in, a new team to work with. Being a generalist is important in this kind of a structure.

HGTS: You have worked at many other tech companies. What would you say was the most surprising thing about software testing at Google?

Apple: A lot of things are different. Perhaps I am biased because I like Google so much, but I would say that our TEs and SETs are more technical than at most other companies. At other large companies I worked for, we had specialized automation teams and then a bunch of manual testers. SETs at Google have to write code; it’s their job. It’s also rare to find a TE who can’t code here. These coding skills allow us to be more impactful early on when unit testing is far more prevalent and there’s nothing really to test end-to-end. I think our technical skills are what make us so impactful here at Google.

Another thing that makes Google unique with respect to test is the sheer volume of automation. Most of this automation executes before manual testers even get hold of the product. When they do, the code they get to test is generally of very high initial quality.

Tooling is another difference. In general, we don’t use commercial tools. We have a culture where tooling is greatly appreciated and 20 percent time makes it so that anyone can make time to contribute to the internal Google toolset. Tools help us get past the hard and repetitive parts of testing and focus our manual efforts to really impact things where a human is actually required.

Then, of course, there is the developer-owns-quality and test-centric SWE culture we have here that makes it easy to relate to SWEs. We’re all in the quality game together and because any engineer can test any code from any machine, it makes us nimble.

HGTS: What things would you say are pretty much the same about testing at Google?

Apple: Software functionality that is hard to automate is just as hard to test or get right as at any other company. When there is a huge rush to get features out, we end up with code that isn’t as well tested as we’d like it to be. No company is perfect and no company creates perfect products.

HGTS: When you were a TE for YouTube, what feature areas were you responsible for?

Apple: I’ve worked with many teams and helped launch many features at YouTube. Some notable mentions would be the launch of the new Watch page, a complete redesign of the YouTube video page and one of the most viewed pages on the Internet, I am happy to say! Another memorable project is our partnership with Vevo. It is a new destination site for premium music content with YouTube powering the video hosting and streaming. It’s a joint venture with Sony Music Entertainment and Universal Music Group. On day one of the launch, more than 14,000 videos went live, and they averaged 14 M views on VEVO premium videos on YouTube.com for the next three months, following the December 8, 2009 launch. I also coordinated test efforts for the major rewrite of the YouTube Flash-based video player during our move from ActionScript 2 to ActionScript 3, and the launch of the new Channel and Branded Partner pages.

HGTS: So what does it mean to be a lead in testing at Google?

Apple: The lead role is a coordination role across the product, across the team, and across any product that our work might impact. For example, for the Vevo project, we had to worry about the YouTube player, the branded watch component, channel hierarchy, traffic assignment, ingestion, reporting, and so on. It’s definitely a “forest and not trees” mindset.

HGTS: How have you adapted the concepts of exploratory testing to YouTube?

Apple: With a product that is so human-oriented and visual as YouTube, exploratory testing is crucial. We do as much exploratory testing as we can.

HGTS: How did the YouTube testers take to the idea of exploratory testing?

Apple: Oh, it was a huge morale booster. Testers like to test and they like to find bugs. Exploratory testing increased the level of engagement and interest among the testers. They got to put themselves in the mindset of the person in the tour and, with those specific angles, got creative with the types of tests they conducted to break the software. This made it more fun and rewarding as adding more tests revealed interesting and esoteric bugs that would have otherwise been missed or discovered through more mundane and repetitive processes.

HGTS: You mentioned tours. Did James make you use his book?

Apple: When James first came to Google, that book was new and he did a couple of seminars and met with us a few times. But he’s up in Seattle and we’re in California, so we didn’t get much hand holding. We took the tours in the book and ran with them. Some of them worked, some didn’t, and we soon figured out which ones were the best for our product.

HGTS: Which ones worked? Care to name them?

Apple: The “money tour” (focus on money-related features; for YouTube, this means Ads or partner-related features) obviously got a lot of attention and was important for every release. The “landmark tour” (focus on important functionalities and features of the system) and the “bad neighborhood tour” (focus on previously buggy areas and areas that we find to be buggy based on recent bugs reported) have been most effective in uncovering our most severe bugs. It was a great learning experience for each of us to look at the bugs others on the team had filed and to discuss the strategies used to find them. The concept of tours was really helpful for us to explain and share our exploratory testing strategy. We also had a lot of fun joking about some of the tours, such as the “antisocial tour” (entering the least likely input every chance you get), the “obsessive compulsive tour” (repeating the same action), and the “couch potato tour” (providing the minimum inputs possible and accepting default values when you can). It was not only helpful to guide our testing; it built some team unity.

HGTS: We understand you are driving a lot of Selenium testing of YouTube. What are your favorite and least favorite things about writing automation in Selenium?

Apple: Favorite: Easy API, you can write test code in your favorite programming languages such as Python, Java, and Ruby, and you can invoke JavaScript code from your application directly—awesome feature and very useful.

Least favorite: It’s still browser testing. It’s slow, you need hooks in the API, and tests are pretty remote from the thing being tested. It helps product quality where you’re automating scenarios that are extremely difficult for a human to validate (calls to our advertising system backend, for example). We have tests that launch different videos and intercept the Ad calls using Banana Proxy (an inhouse web application security audit tool to log HTTP requests and responses). At a conceptual level, we’re routing browser requests from browser to Banana Proxy (logging) to Selenium to Web. Thus, we can check if the outgoing requests include the correct URL parameters and if the incoming response contains what is expected. Overall, UI tests are slow, much more brittle, and have a fairly high maintenance overhead. A lesson learned is that you should keep only a few such high-level smoke tests for validating end-to-end integration scenarios and write as small a test as possible.
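The parameter check Apple describes for intercepted ad calls can be illustrated with a small Python sketch. This is not Google's tooling: it assumes the proxy has already logged each outgoing request as a plain URL string, and the host name, parameter names, and the helper `find_param_problems` are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def find_param_problems(request_url, expected_params):
    """Return a list of human-readable problems with the URL's query string.

    expected_params maps a parameter name to the value it must carry.
    An empty list means the request matched expectations.
    """
    actual = parse_qs(urlsplit(request_url).query)
    problems = []
    for name, expected in expected_params.items():
        values = actual.get(name)
        if values is None:
            problems.append(f"missing parameter: {name}")
        elif expected not in values:
            problems.append(f"{name}: expected {expected!r}, got {values!r}")
    return problems


# Hypothetical logged request; the host and parameter names are made up.
logged = "https://ads.example.com/fetch?video_id=abc123&format=preroll"
print(find_param_problems(logged, {"video_id": "abc123", "format": "preroll"}))
```

Because the check works on logged URLs rather than on the browser, it stays fast and can run against captured traffic from a handful of end-to-end smoke tests.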

HGTS: A large portion of YouTube content and its UI is in Flash; how do you test that? Do you have some magic way of testing this via Selenium?

Apple: No magic, unfortunately. Lots of hard work here. There are some things Selenium does to help and because our JavaScript APIs are exposed, Selenium can be used to test them. And there is the image “diffing” tool pdiff that is helpful to test the rendering of the thumbnails, end of screen, and so on. We also do a lot of proxy work on the HTTP stream to listen to traffic so we know more about changes to the page. We also use As3Unit and FlexUnit to load the player, play different videos, and trigger player events. For verification, we can use these frameworks to validate various states of the software and to do image comparison. I’d like to say it’s magic, but there is a lot of code we’ve written to get to this point.

HGTS: What was the biggest bug you or your team has found and saved users from seeing?

Apple: The biggest bugs are usually not that interesting. However, I recall we had a CSS bug that caused the IE browser to crash. Before that we had never seen CSS crash a browser.

One memorable bug that was more subtle came up during the new Watch Page launch in 2010. We found that when the user moved the mouse pointer outside of the player region in IE7, the player would freeze after some time. This was interesting because users would encounter this bug only if they were watching the same video for an extended period of time and moving the mouse around. Everything got slower until the player finally froze. This turned out to be due to unreleased event handlers and resources sticking around and computing the same things over and over again. If you were watching shorter videos or were a passive viewer, you wouldn’t observe the bug.

HGTS: What would you call the most successful aspect of YouTube testing? The least successful?

Apple: The most successful was a tool to fetch and check some problematic URLs. Although it was a simple test, it was really effective in catching critical bugs quickly. We added a feature to make the problems easier to debug by having it provide stack traces that the engineers could then use to track down problems and develop fixes. It quickly became our first line of testing defense during deployment and brought along considerable savings in testing time. With only a little extra effort, we extended it to hit the most popular URLs from our logs plus a list of hand-picked ones. It’s been very successful.

The least successful is probably our continued reliance on manual testing during our weekly pushes. Given that we have a very small time window for testing (code goes out live the same day it’s frozen) and we have a lot of UI changes that are hard to automate, manual testing is critical in our weekly release process. This is a hard problem and I wish we had a better answer.
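A URL checker of the kind Apple calls her team's most successful tool might look roughly like the Python sketch below. It is an illustration under assumptions, not the YouTube tool: the function names are hypothetical, a status of 200 is assumed to mean success, and the traceback recorded here is the client-side one, whereas the interview does not say where the original stack traces came from.

```python
import traceback
import urllib.request


def fetch_status(url, timeout=10):
    """Fetch a URL and return its HTTP status code (raises on failure)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def check_urls(urls, fetch=fetch_status):
    """Fetch each URL; return {url: error text} for every URL that failed."""
    failures = {}
    for url in urls:
        try:
            status = fetch(url)
            if status != 200:
                failures[url] = f"unexpected status {status}"
        except Exception:
            # Keep the stack trace so engineers can see where the fetch broke.
            failures[url] = traceback.format_exc()
    return failures


# Usage with a stand-in fetcher so the example runs without network access.
def fake_fetch(url):
    if url.endswith("/broken"):
        raise RuntimeError("simulated server error")
    return 200


failures = check_urls(
    ["https://example.com/watch", "https://example.com/broken"],
    fetch=fake_fetch,
)
print(sorted(failures))
```

Feeding it a list built from the most popular URLs in the logs plus a hand-picked set, as described above, only changes the input list and not the checker itself.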

HGTS: YouTube is a very data-driven site as much of the content is algorithmically determined; how do you verify that the right videos are displayed at the right time and place? Does your team verify the video quality? If so, how do you do this?

Apple: We measure how much and which videos are being watched, their relationship to each other and a whole lot of other variables. We analyze the number of buffer under-runs and cache misses, and we optimize our global-serving infrastructure based on that.

We have unit tests for video quality levels to make sure the right quality is used. After I changed groups, our new team wrote a tool to test this in more depth. The tool is open-sourced and it works by having FlexUnit tests that use the embedded YouTube player to play a variety of test videos and make some assertions about the player state and properties. These test videos have large bar codes on them to mark frames and the timeline that are easily recognizable despite compression artifacts and loss of quality. Measuring state also includes taking snapshots of the video frames and analyzing them. We check for the correct aspect ratio and/or cropping, distortion, color shifts, blank frames, white screens, synchronization, and so on, all issues found from our bug reports.
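Two of the snapshot checks mentioned here, aspect ratio and blank frames, can be sketched in a few lines of Python. This is only an illustration and not how the open-source tool works: it assumes a frame has already been decoded into rows of grayscale values from 0 to 255, and the function names and thresholds are made up.

```python
def aspect_ratio_ok(width, height, expected=16 / 9, tolerance=0.01):
    """True when width/height is within a relative tolerance of expected."""
    return abs(width / height - expected) <= tolerance * expected


def is_blank_frame(pixels, max_spread=8):
    """True when every grayscale value (0-255) lies within a narrow band.

    A uniformly black or white frame has almost no spread between its
    darkest and brightest pixel.
    """
    flat = [value for row in pixels for value in row]
    return max(flat) - min(flat) <= max_spread


# Hypothetical snapshots: a 1280x720 frame size and two tiny pixel grids.
print(aspect_ratio_ok(1280, 720))
print(is_blank_frame([[255, 255], [254, 255]]))
print(is_blank_frame([[0, 255], [10, 20]]))
```

A real check would also need to tolerate compression noise, which is one reason the test videos described above carry large, easily recognized bar codes.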

HGTS: What advice do you have for other testers of Web, Flash, and data-driven web services out there?

Apple: Whether it’s a test framework or test cases, keep it simple and iterate on the design as your project evolves. Don’t try to solve everything upfront. Be aggressive about throwing things away. If tests or automation are too hard to maintain, toss them and build some better ones that are more resilient. Watch out for maintenance and troubleshooting costs of your tests down the road. Observe the 70-20-10 rule: 70 percent small unit tests that verify the behavior of a single class or function, 20 percent medium tests that validate the integration of one or more application modules, and 10 percent large tests (commonly referred to as “system tests” and “end-to-end” tests) that operate on a high level and verify the application as a whole is working.

Other than that, prioritize and look for simple automation efforts with big pay-offs, always remembering that automation doesn’t solve all your problems, especially when it comes to frontend projects and device testing. You always want smart, exploratory testing and to track test data.
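The 70-20-10 rule is easy to check against a real suite. The Python sketch below assumes each test is already labeled small, medium, or large; the function names and the sample counts are hypothetical.

```python
TARGET_SHARES = {"small": 0.70, "medium": 0.20, "large": 0.10}


def suite_shares(counts):
    """Return each test size's share of the whole suite as a fraction."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the suite contains no tests")
    return {size: counts.get(size, 0) / total for size in TARGET_SHARES}


def drift_from_target(counts):
    """Return actual minus target share per size, rounded to 2 places."""
    shares = suite_shares(counts)
    return {
        size: round(shares[size] - TARGET_SHARES[size], 2)
        for size in TARGET_SHARES
    }


# Hypothetical suite: 600 small, 250 medium, 150 large tests.
print(drift_from_target({"small": 600, "medium": 250, "large": 150}))
```

A positive drift for large tests is a hint that the suite is leaning on slow end-to-end coverage that smaller tests could replace.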

HGTS: So tell us the truth. YouTube testing must be a blast. Watching cat videos all day …

Apple: Well, there was that one April Fool’s day where we made all the video captions upside down. But I won’t lie. Testing YouTube is fun. I get to discover a lot of interesting content and it’s my job to do so! And even after all this time, I still laugh at cat videos!

This excerpt is from the book “How Google Tests Software,” authored by James Whittaker, Jason Arbon, and Jeff Carollo, published by Pearson/Addison-Wesley Professional, March 2012, ISBN 0321803027, Copyright 2012 Pearson Education, Inc. For more info, please visit the publisher site, www.informit.com/swtesting
