Featured Case Study

Publisher-backed
Sci-fi Story-rich Platformer Indie Title

Identifying high priority usability issues and first-time player reactions to a WIP demo through external playtesting

Project Overview

After working as a User Experience (UX) Researcher in tech for 10 years, I decided to take my skill set into the video game industry, driven by a lifelong love of the medium.

In the early stages of starting my independent games UX research business, I offered to run a pro bono playtest of a friend’s indie studio’s mid-development game, both to hone my skills and to gather helpful player-centered data for the team.

The timing aligned in summer 2025: the team was readying a new public demo for release at the start of September, so we decided to run the playtest on the WIP demo build ahead of the build-lock deadline.

Approach & Methods

Kick-off Meeting
I led a 1-hour kickoff meeting with the dev team, asking them to share background on the design intent and goals so I had a solid baseline. I had played an older demo before the meeting, but this discussion was important for filling in gaps.

Then, to align on learning objectives and priorities for the playtest, I had each team member solo-brainstorm what felt risky and untested about the game and what they wanted to learn from the playtest. Each person then shared their top few questions with the group. One of the devs remarked that it was really interesting to hear how different everyone’s questions were! Finally, we discussed what was most important and still changeable in the list, and that conversation helped me understand where I needed to go deep vs. where we just needed a bit of data for now.

Unmoderated Playtesting
Since the playtest was on a WIP playable demo, it was a good candidate for unmoderated playtesting: players play the build on their own, usually at home, recording their gameplay and thinking out loud as they go so the team can understand why they’re doing or thinking what they are. Unmoderated testing reduces the Hawthorne effect, which is when people alter their behavior because they’re being watched — sometimes without even realizing it. Having another person present can lead players to rush when they otherwise wouldn’t, or to be less open and honest than they might be alone, just talking at their computer. Removing the observer leads to more natural gameplay and more honest responses to the game and post-test survey questions. Unmoderated testing also comes with the bonus of faster turnaround than moderated testing, since a moderator or researcher doesn’t have to be present for each session.

(That said, moderated testing, where a researcher or moderator is present, is extremely valuable and can outshine unmoderated testing in some cases! For example, some early builds or prototypes may be a bit too unfinished or buggy to hand to players on their own, and players may simply get stuck; a moderator or researcher can guide players and implement workarounds as needed. Moderated playtesting is also great when it’s critical to ask follow-up questions right away about something a player did or said. Players often quickly forget exactly what they did or said, so it’s best to ask as soon as possible.)

Target Playtester Number and Definition
I recommended 5 playtesters, because we were primarily interested in uncovering usability frictions and comprehension gaps in the core mechanics; player opinions were secondary. In other words, we wanted to make sure that players could do what was expected of them, and if not, learn why. 5 is a great number for this kind of learning objective, because research has shown that 5 participants in a usability study are generally enough to find “most” usability issues (though definitely not all!). Usability expert Jakob Nielsen has a good article on this.
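
For the curious, here’s the back-of-envelope math behind that rule of thumb. This is a minimal sketch using the average per-tester problem-discovery rate Nielsen and Landauer reported (about 31%); your game’s true rate will vary, so treat the numbers as a heuristic:

```python
# Estimated share of usability issues found by n testers, via the
# classic formula: found(n) = 1 - (1 - L)**n, where L is the chance
# that a single tester surfaces any given issue. L = 0.31 is the
# average rate Nielsen & Landauer reported across projects; it's an
# assumption here, not a property of this particular game.

def share_of_issues_found(n_testers: int, discovery_rate: float = 0.31) -> float:
    return 1 - (1 - discovery_rate) ** n_testers

for n in (1, 3, 5, 10):
    print(f"{n} testers -> ~{share_of_issues_found(n):.0%} of issues found")

# 5 testers lands at ~84%, and returns diminish steeply from there,
# which is why small-n studies are so cost-effective for usability work.
```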

Prior to the kickoff, I had drafted a target player profile based on what I knew about the game. During kickoff, I shared it with the team, and we made some tweaks based on their feedback. For example, we removed “liking puzzle games,” as “puzzle games” has too broad a definition and might lead to misaligned expectations. Instead, I added a requirement that they enjoy games where you solve a mystery by gathering clues.

Ideally, target players enjoyed games that were single-player, indie, sci-fi, story-rich, platformers, and mystery or deduction-style. They also had to have played PC games at least a few times per month over the last 3 months, and they had to have bought at least one game on Steam in the last year. Ideally, they would also have played one or more games from a list of 8 titles that had inspired the dev team and/or shared characteristics with the game.

Playtest Design Phase

I designed a 10-question screener survey to assess the characteristics above and check whether interested potential playtesters would be a good fit.
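
To give a flavor of how screener responses translated into invite decisions, here’s a simplified, hypothetical sketch; the field names and scoring are illustrative, not the actual survey items:

```python
# Hypothetical screener-scoring sketch (simplified from the real
# 10-question survey). Must-haves gate eligibility; nice-to-haves
# rank the eligible candidates.

MUST_HAVES = {
    "plays_pc_games_monthly": True,       # a few times/month, last 3 months
    "bought_steam_game_last_year": True,
    "enjoys_clue_gathering_mysteries": True,
}

NICE_TO_HAVE_GENRES = {
    "single-player", "indie", "sci-fi", "story-rich", "platformer",
}

def screen(response: dict) -> tuple[bool, int]:
    """Return (meets must-haves, nice-to-have score) for one response."""
    eligible = all(response.get(k) == v for k, v in MUST_HAVES.items())
    score = len(NICE_TO_HAVE_GENRES & set(response.get("liked_genres", [])))
    # Bonus for each of the 8 team-inspiration titles they've played.
    score += len(response.get("played_inspiration_titles", []))
    return eligible, score

# Eligible respondents are then ranked by score, and the best fits
# are invited to the playtest.
```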

I also designed in-test tasks to check players’ comprehension of the key mechanics taught throughout the demo and their ability to pull them off. Players played naturally for 45 minutes before being asked to complete these tasks, which ensured we weren’t biasing their behavior by introducing the tasks too soon.

Next, I designed the 15-question post-playtest survey. I asked players to share their opinions on top-priority topics, such as how positive or negative their overall experience was, how compelling the story was, and whether they’d ever felt stuck. For some questions, I also asked them to share why and/or give a specific example or two as a follow-up to dive deeper.

Finally, I designed a pre-playtest briefing to make sure that players were using the right input method and were aware this was a work-in-progress build, so they should expect bugs, unfinished art, and the like. I also gave instructions on how to reset the game in case they hit any major progress-blocking bugs.

Pre-playtest Build Review

To make sure we got high-quality, helpful data back from the playtest, I asked the team to share the build 2 days prior to our pilot launch day. I played through it and flagged bugs, noting any that would need fixing before launch. We actually ended up delaying the playtest launch several days to get the build in better shape (more on that in the Collaboration & Lessons Learned section below!).

Getting Playtest Data

Playtesters were sourced from the Playtestcloud PC Player panel. Playtesters who enjoyed matching genres received an email inviting them to take the screener survey. As answers rolled in, I reviewed them and selected who to invite to the playtest.

Playtesters who were selected received a Steam code. To start the playtest, they used the Playtestcloud app, which automatically recorded the gameplay and their audio as they thought out loud.

I started with one pilot playtest to make sure there were no lurking issues in the build or the playtest’s design. I launched that on a Tuesday afternoon. Once it came back the next day, I reviewed it and all was well, so I launched the full playtest to 4 more players on Thursday. I also summarized the pilot test findings and shared them with the team ASAP so they could act on the findings (which they did!).

By Saturday morning, we didn’t have 4 playtests back yet, so I invited several more players who matched our target profile well to ensure we got enough responses before the weekend was up. On Monday, we had a full set of recordings and survey responses ready for me to analyze!

Findings & Insights

Keyboard & mouse: we surfaced some early findings around keyboard-and-mouse controls, but we’d need more data before drawing firm conclusions, so we flagged these for follow-up in a future test.

Core mechanic comprehension: 2 critical issues emerged around players’ comprehension of 2 core mechanics, both of which blocked progress. The team had heard feedback before that these mechanics were confusing, and this data helped cement that the teaching of these mechanics needed iteration.

Prioritized takeaways: because the team was swamped and needed to know where their time was best spent, I organized the findings into top-5 and top-3 priority lists, alongside the usability win moments they should make sure to keep in future builds. I also included screenshots of any high-priority bugs that caused players to get temporarily stuck or noticeably degraded the experience.

Impact

The pilot findings were shared and acted on right away, and the full set of findings, combined with feedback from other recent playtests, led the team to prioritize reworking both in-game tutorials for the demo’s core mechanics ahead of the public demo release (more details in Collaboration & Lessons Learned below).

Collaboration & Lessons Learned

Build Stability

I requested that the team send a playtest build two days prior to our planned pilot launch. They delivered as promised, and in my next-day review I noticed some issues, but none were showstoppers, so I suggested we proceed. The team continued making well-intentioned changes the day before launch, and when I reviewed the new build on pilot day, I discovered a progress-blocking bug right near the start of the demo. The team had tested internally, but only in the editor, so some critical bugs hadn’t surfaced.

I immediately flagged the issue and proposed two options: use the earlier, more stable build, or delay until the new build was fixed. The team chose to delay, ultimately by three business days, so playtesters would have a smoother experience and we’d collect higher-quality data.

Three days later, I reviewed the updated build: the blocking bug was gone and several lower-priority issues had also been resolved. We launched the playtest, and players were able to get through most, if not all, of the demo. Along the way, they hit critical usability-related comprehension gaps around key mechanics. These moments of confusion turned into valuable insights — highlighting issues that could otherwise frustrate players, reduce motivation to continue, and lower the chances of wishlisting or positive word of mouth.

This reinforced the importance of leaving more buffer time for build handoff, review, and lock, and it also reinforced the value of baking in time for pre-playtest build checks to catch issues early, before involving playtesters.

Information Sharing

Because the team was swamped, I sent lightweight updates each time I completed a Playtester Summary Briefing in a team chat channel. I made sure to briefly highlight the highest impact takeaways per playtester so that they could stay updated at a glance.

Even though they didn’t have much time to go deep on the findings, these lightweight updates kept the team aware of which issues were highest impact to address. Thanks to a combination of findings from this playtest and other sources, including a recent peer playtest and older tests, the team prioritized reworking both of the in-game tutorials to improve clarity. One was addressed by tweaking the tutorial text along with the “clues” log that kept track of what you’d learned so far; the other by combining two tutorials to more clearly convey how the mechanic functions. Both mechanics were critical for players to understand in order to move forward.

This reinforced the value of sending very brief updates on the top findings as analysis completes on portions of the data — the team was able to keep a pulse on the results despite feeling too slammed to spend much time with the details. I also realized the importance of asking what the best channel for updates is, no matter what. We’d had all our calls in Discord, so I’d sent the playtester briefing updates to a Discord channel, assuming that was best. But it turned out the team primarily text-chatted in Slack! Lesson learned.