Many current design research methods can help teams and organizations successfully validate interaction, the usability of a user interface, and the findability of content. While some research methods are quite exact and quantifiable, others are less refined. Regardless of the chosen method, though, if the research isn’t done right, the likelihood of subjectivity creeping in becomes much greater, potentially masking the real issues at hand. So how can we truly test the quality of a visual design and aesthetic preference? Ultimately, how can we test whether end users approve of a visual design?

Whether you’re tasked with designing a task-oriented interface (which almost always benefits from a straightforward, usable appearance) or a marketing-focused interface, a common project requirement is visual adherence to the brand guidelines as well as an appealing visual design. Even if branding guidelines don’t exist, there are almost always some basic visual style requirements. As such, the guidelines and requirements ultimately set the tone moving forward.

Nevertheless, when it comes to visual preference, subjectivity can still pose quite a challenge for designers and stakeholders alike. Why? For a start, it’s difficult to apply aesthetics that will resonate equally with all end users and customers. Furthermore, for each design problem there are usually many different viable solutions. So which solution is truly the best fit for the problem at hand? The first question we need to ask is whether we can successfully measure aesthetics at all.

Researchers Papachristos and Avouris (2011) defined visual attributes such as symmetry, order and complexity, balance, and contrast as low-level evaluative constructs. Perceived usability, credibility, trustworthiness, novelty, and visual appeal represent high-level constructs. This is an important distinction, because high-level constructs cannot be measured mathematically and usually require qualitative feedback from real people to be evaluated.

But, but, but! Why would you test aesthetics in the first place? Shouldn’t designers be the final decision-making authority? Well … not so fast. Experienced designers do bring a lot of skills, ideas, and expertise to a project. They skillfully identify and create different viable solutions. But even for the most experienced, it’s sometimes difficult to know which solution resonates most with the target end users.

Benefits of including end users

Apart from the primary reasons outlined above, there are other, less obvious benefits to including end users in the process:

  1. Aligning the team’s vision and developing common understanding. After seeing real people use the design, the whole project team, including key decision makers and stakeholders, clearly understands which design assumptions were correct and which were less relevant. This helps align the team’s vision, develop common understanding, and push everyone’s efforts in the same direction.
  2. It becomes easier to gauge whether visual appeal is an important factor in the first place, and to what extent. In some projects, quick access to information heavily outweighs the need for visual appeal. In others, an overly sophisticated design can send the wrong message about the value of the product. However, even when core usability is the top priority, many websites still benefit from distinctive visual appeal and brand recognition.
  3. End customers provide us with nuanced feedback on how they perceive the design and the organization. As a result, the visual design can be improved based on the input provided by the external audience. Sometimes a visual style preferred by the project team doesn’t resonate with the end customer, and vice versa. After all, as designer Zuzana Licko said, “we read best what we read most,” which is obviously a different experience for each and every one of us. What the designers or the stakeholders are used to seeing is not necessarily what the end customers are used to seeing.

Having hopefully established strong enough arguments in favor of testing, let’s review some viable options for testing the aesthetics of interfaces.

Scientific methods for testing website aesthetics

There are multiple options available for validating aesthetics. However, visual design tests don’t have to be conducted as a separate activity. In fact, the examples below can easily be combined with other evaluation methods. This is the approach our design team at SymSoft prefers, since it improves efficiency and makes better use of the project budget. For instance, after the usability test phase of a session is done, we can also test the participant’s attitude toward the visual aspect of the interface design. How?

  1. Ask participants to solve a cognitive task after using the design. This is useful when comparing which of two or more solutions is more appealing. Science says that people perform better in cognitive tests, such as the Candle problem, after being exposed to a more appealing design. For example, participants who were exposed to good typography performed better on Isen’s cognitive tasks as well as on subjective duration assessment (PDF). This test is quite reliable, because the participants aren’t aware of the connection between the goal of the test (how they perform after being exposed to a design) and the cognitive task itself. A good way to establish a baseline is to first test the current design and then compare it with the results of the new design test.
  2. Relative Subjective Duration assessment (PDF). Time flies when we’re having a good time. Studies have shown that users underestimate the duration of tasks they find pleasant to accomplish. By comparing a user’s estimated time with the real time it took them to accomplish a task, we can easily compare two versions of a visual design where all other factors, such as navigation labels and interaction, remain unchanged (see the first sketch after this list).
  3. Semantic Differential test. The semantic differential is a rating scale used to measure opinions, attitudes, and values on a psychometrically controlled scale. By offering pairs of antonyms, subjects can select a value on a scale to evaluate the interface. Two examples of scales are modern vs. traditional and appealing vs. off-putting, but we can use many more scales in the same test depending on the given brand attributes (see the second sketch after this list).
  4. Desirability test, originally developed by Microsoft (DOC). Each participant individually selects a number of cards (for example, three, five, or more) from a variety of cards with different adjectives written on them (one adjective per card; 60 percent positive vs. 40 percent negative adjectives). After several participants have taken the test, a word cloud generated from all of their selections provides a clear idea of the aesthetic perception of the interface (see the third sketch after this list).
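
To make the Relative Subjective Duration comparison concrete, here is a minimal sketch in Python. The session data and the ratio-based scoring are illustrative assumptions, not part of the original study:

```python
# Relative Subjective Duration: compare estimated vs. actual task time.
# A mean ratio below 1.0 means participants underestimated the duration,
# which the studies above associate with a more pleasant experience.

def rsd_ratio(estimated_seconds: float, actual_seconds: float) -> float:
    """Estimated time divided by actual time for a single task."""
    return estimated_seconds / actual_seconds

# Hypothetical data: (estimated, actual) seconds per participant, per design.
design_a = [(50, 70), (80, 95), (60, 90)]
design_b = [(75, 60), (100, 85), (95, 80)]

def mean_ratio(sessions):
    return sum(rsd_ratio(est, act) for est, act in sessions) / len(sessions)

print(f"Design A mean RSD: {mean_ratio(design_a):.2f}")  # below 1.0: time flew
print(f"Design B mean RSD: {mean_ratio(design_b):.2f}")  # above 1.0: time dragged
```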
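
A semantic differential questionnaire is just as easy to summarize. The sketch below assumes a 1–7 scale per adjective pair; the pairs and ratings are made up for illustration:

```python
# Summarize semantic differential ratings: each antonym pair is rated on a
# 1-7 scale, where 1 leans toward the left adjective and 7 toward the right.
from statistics import mean

responses = {
    ("traditional", "modern"):    [6, 5, 7, 6, 5],
    ("off-putting", "appealing"): [5, 6, 6, 7, 5],
    ("cluttered", "clean"):       [3, 4, 2, 3, 4],
}

for (left, right), ratings in responses.items():
    avg = mean(ratings)
    leaning = right if avg > 4 else left
    print(f"{left} <-> {right}: {avg:.1f}/7 (leans {leaning})")
```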
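
And for the desirability test, tallying the selected adjective cards gives us the data behind the word cloud. The participant selections here are hypothetical:

```python
# Desirability test: count the adjective cards selected across participants.
# The resulting counts feed directly into a word cloud or a simple bar chart.
from collections import Counter

selections = [
    ["clean", "trustworthy", "dated"],
    ["clean", "professional", "slow"],
    ["trustworthy", "clean", "professional"],
]

tally = Counter(adjective for cards in selections for adjective in cards)
for adjective, count in tally.most_common():
    print(f"{adjective}: {count}")
```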

The methods above can be combined into a more robust suite of tests to push the research even further, for example by introducing different versions of the interface. However, it’s not always viable to create multiple versions. Rather, it often makes more sense to apply an iterative approach to the design process.

A simpler, less biased method

When following an iterative design process, design validation is useful for course correction, especially if we have conducted exhaustive user research up front and it’s clear why people visit a website. An iterative design process also allows us to use a much simpler method to test aesthetics, one that can easily be combined with either user interviews or usability tests. This is how we combine multiple tests into one session.

First, we capture the interview or the usability test verbatim. After the session, we extract all the adjectives participants used during it.

Second, before the end of each session, we ask each participant to describe the interface with five adjectives:

Please use five adjectives to describe the website you just tested:

1.___________________

2.___________________

3.___________________

4.___________________

5.___________________

This simple task is open ended, unlike the Semantic Differential or Desirability test, which spell out options to the user and hence introduce a bias. With an open list, participants can offer the answers that first spring to mind.

Finally, we compare the adjectives they used during the usability test against the ones they selected after the test to gather insight into their perception of the interface. Needless to say, the adjectives used during the test tend to carry more weight. However, if the two lists of adjectives end up matching one another, and both match the list of brand attributes, then we can be really confident that the design resonates with its intended audience (a minimal sketch of this comparison follows). Be aware that if the adjectives used are too diverse, it simply means the visual communication is not clear and requires more work.
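
Here is a minimal sketch of that comparison in Python. The adjective lists and brand attributes are hypothetical; simple set intersections are enough to surface the overlap:

```python
# Compare adjectives observed during the test, adjectives selected afterwards,
# and the brand attributes the design is supposed to communicate.
during_test = {"clean", "simple", "slow", "trustworthy"}
after_test = {"clean", "modern", "trustworthy", "simple", "friendly"}
brand_attributes = {"trustworthy", "simple", "modern", "approachable"}

spontaneous_and_prompted = during_test & after_test
on_brand = spontaneous_and_prompted & brand_attributes

print("Mentioned both during and after the session:", spontaneous_and_prompted)
print("Of those, matching the brand attributes:", on_brand)
```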

The key factor is to ensure you gather a sufficient volume of feedback. In our experience, fewer than five participants more often than not produces no overlap at all. With five or more participants, the combined answers generally reveal clear patterns. However, we rarely see any improvement in precision beyond ten participants.

Including end users pays off

Testing and validating designs with end customers improves the final result and builds the project team’s common understanding of users’ expectations. By testing designs, we end up delivering better websites that meet users’ needs and, as a result, better serve business objectives.