The mobile app market is highly competitive these days. This is certainly good news for consumers: strong competition in every niche and category means more quality apps reaching the app stores.
But it also means that it is becoming more and more expensive for developers to create and promote a product. The cost of acquiring users is going up, and it's getting harder and harder to scale app revenue. At the same time, the subscription-based monetization model remains the foundation for generating impressive profits.
A/B testing, or experimentation, which many people are familiar with, can help you keep growing subscription revenue under these conditions.
How do you test different paywalls, prices, and subscriptions in practice?
The implementation can vary: you can build an in-house system, use Firebase, or adopt a dedicated paywall testing tool.
Let's take a quick look at the pros and cons of these solutions.
An in-house solution may be appropriate for teams with deep in-house expertise in building revenue analytics and mathematical split-testing algorithms. The advantage, in this case, is the flexibility of such a system. The disadvantage is the complexity of implementation and testing.
Firebase is a popular solution from Google that is well-suited for testing product hypotheses and running simple UI tests. The disadvantages: it can take a long time to receive the assigned variation (critical for tests shown on the first app launch), and there are limits on the number of event parameters. Firebase is also not designed for adjusting tests on the fly.
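To make the first-launch problem concrete, here is a minimal Swift sketch of a paywall test driven by Remote Config. The parameter name, the default value, and the assumption that Firebase is already configured at app startup are all illustrative; the point is that until the first fetch completes, the user can only see the in-app default.

```swift
import FirebaseRemoteConfig

// Minimal sketch of a client-side paywall test driven by Firebase Remote Config.
// The parameter name "paywall_variant", its "control" default, and the assumption
// that FirebaseApp.configure() has already been called are illustrative only.
final class PaywallVariantProvider {
    private let remoteConfig = RemoteConfig.remoteConfig()

    init() {
        // Without an in-app default, a user who opens the paywall before the
        // first fetch completes would see an undefined variant.
        remoteConfig.setDefaults(["paywall_variant": "control" as NSObject])
    }

    func loadVariant(completion: @escaping (String) -> Void) {
        // On a cold start this network round trip is the bottleneck mentioned
        // above: if it is slow, the user gets the default paywall no matter
        // which variation they were assigned to.
        remoteConfig.fetchAndActivate { [weak self] _, _ in
            let value = self?.remoteConfig.configValue(forKey: "paywall_variant").stringValue
            completion(value ?? "control")
        }
    }
}
```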
A dedicated paywall testing solution is best suited for this particular task. We at Apphud have created a tool called Experiments that lets you run an unlimited number of tests without uploading new builds to the store and test up to 5 variations simultaneously.
Well, we've chosen a solution for the test, and we've figured out what we want to test. Now, let's go over some general things to keep in mind while running the test.
For example, you see a low conversion rate on the first purchase and want to increase it by emphasizing product benefits and the buy button.
But it could be that people aren't buying because of an unfortunate bug or omission. In this case, a split test will do little to help you. It's worth taking a closer look at the product analytics and bug logs to formulate a more meaningful hypothesis.
To be sure you understand exactly which change affected the outcome of the experiment, you cannot mix several different hypotheses within one test.
On average, it takes 2-3 weeks to test one hypothesis in one experiment. A sufficient amount of traffic is also required to obtain a statistically significant result.
This is the most important rule for successful hypothesis testing and getting a true result. After launch, you must not ship new releases or change application features via Remote Config if these changes would affect the user experience in the running experiment.
Although you cannot make changes to a running test, it is possible to adjust the distribution of traffic between variations. For example, if you are worried that a new variation will have a strongly negative effect, you can initially allocate 10-20% of the traffic to it and increase the share to 50% (in a test with 2 variations) once the test shows no dramatic drop in the metrics.
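Mechanically, such a ramp can be pictured with the simplified Swift sketch below. All names and weights are invented, and in practice your A/B testing tool handles assignment for you; the key idea is that a user is assigned a variant once and keeps it, so changing the weights later only affects users who enter the experiment afterwards.

```swift
import Foundation

// Simplified illustration of a ramped traffic split: a user is assigned once,
// the assignment is persisted, and changed weights only affect users who enter
// the experiment later. All names and weights here are made up.
struct ExperimentAssigner {
    let experimentKey = "paywall_test_v1"

    func variant(weights: [String: Int]) -> String {
        let storageKey = "assignment_\(experimentKey)"
        // Users already in the test keep their original variant.
        if let existing = UserDefaults.standard.string(forKey: storageKey) {
            return existing
        }
        // Weighted random pick for a new user, e.g. ["control": 80, "new_paywall": 20].
        let total = max(weights.values.reduce(0, +), 1)
        var roll = Int.random(in: 0..<total)
        var chosen = weights.keys.sorted().first ?? "control"
        for (name, weight) in weights.sorted(by: { $0.key < $1.key }) {
            if roll < weight { chosen = name; break }
            roll -= weight
        }
        UserDefaults.standard.set(chosen, forKey: storageKey)
        return chosen
    }
}

// Start cautiously with 20% on the new paywall, then widen the split to 50/50
// for later users once the metrics look safe.
let variant = ExperimentAssigner().variant(weights: ["control": 80, "new_paywall": 20])
```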
Unfortunately, this is the average statistic on the market: the vast majority of experiments fail, either by not showing a statistically significant improvement or by showing a statistically significant decline. Quality analytics and market analysis can help increase the share of successful experiments, and with regular, iterative testing the quality of your hypotheses grows over time.
In an ideal world, we would want to test every possible change to a product. But as we know, testing every hypothesis takes a lot of time and traffic. So, prioritize and test only those changes that can have a significant impact.
Don't stop testing after one or more experiments fail. Remember that testing is a continuous, iterative process. Accumulate knowledge and draw conclusions from testing, which will help you formulate better hypotheses and conduct successful experiments in the future.
It is considered bad practice to monitor interim results during testing, as it can distort your perception of the experiment's final outcome. However, if you don't interfere in the course of the experiment and don't rush to declare one variant the winner, it can be useful to watch the analytics of a running A/B test. For example, you can stop a clearly unsuccessful experiment without wasting additional time, or redistribute the traffic between the tested variants.
To illustrate how a poorly designed experiment can affect the outcome, let's break down an example.
Suppose we have a subscription app that attracts traffic through Apple Search Ads. However, the app has a simple paywall whose design hasn't been tested in a long time.
Recognizing the opportunity to increase conversion rates, the app owners drew 9 different variations of the app paywall to test and compare with the current one.
At first glance, it seems like a great experiment - we are testing a lot of visual solutions at once, and we should definitely find the best one with the highest conversion rate.
But before we run such a test, let's take a look at how many users we need to successfully complete it.
Let's say we want to run the test in a country where users have roughly the same purchasing power, for example, the USA. We know that in this country the average conversion rate of the current paywall is 2%, and we hope to increase it by 25% (a relative value, i.e., to 2.5%).
Now let's predict how many users would be needed for such a test and how long it would take.
So what are we seeing? Even if we get 1,000 new users per day in the US from Apple Search Ads (which seems very optimistic), it would take us a full 127 days to see a statistically significant change!
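For reference, here is a rough version of that arithmetic in Swift, assuming a standard two-proportion test with a 95% confidence level and 80% power. The exact day count depends on the calculator and its settings, but it lands in the same range as the figure above.

```swift
import Foundation

// Back-of-the-envelope sample size for a two-proportion test, assuming a 95%
// confidence level (z ≈ 1.96) and 80% power (z ≈ 0.84). Other calculators and
// corrections for many simultaneous variations give somewhat different numbers,
// but the order of magnitude is the same.
func requiredSamplePerVariation(baseline p1: Double, target p2: Double,
                                zAlpha: Double = 1.96, zBeta: Double = 0.84) -> Int {
    let pBar = (p1 + p2) / 2
    let a = zAlpha * (2 * pBar * (1 - pBar)).squareRoot()
    let b = zBeta * (p1 * (1 - p1) + p2 * (1 - p2)).squareRoot()
    return Int(((a + b) * (a + b) / ((p2 - p1) * (p2 - p1))).rounded(.up))
}

let perVariation = requiredSamplePerVariation(baseline: 0.02, target: 0.025)
let variations = 10          // the current paywall plus 9 new designs
let usersPerDay = 1_000      // optimistic Apple Search Ads volume
let totalUsers = perVariation * variations
let days = (Double(totalUsers) / Double(usersPerDay)).rounded(.up)
print("≈\(perVariation) users per variation, ≈\(totalUsers) in total, ≈\(Int(days)) days")
// Roughly 14,000 users per variation - well over four months of traffic.
```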
If we hadn't modeled the experiment and done the calculations above, we could have run the test and waited forever. In this case, we need to select a smaller number of the most likely hypotheses, run a quick test, and then iteratively run other experiments.
Thus, the design of the experiment itself can significantly affect its outcome and, in some cases, ruin the experiment even before it is actually run.
Apphud supports the ability to test up to 5 independent variations simultaneously. This is sufficient for any reasonable experiment.
Another situation: we regularly run split tests, and the results seem suspicious. Sometimes completely strange hypotheses that are hard to explain logically turn out to work, while more reasonable assumptions do not.
Of course, the results of experiments don't always match our expectations, but if a series of experiments doesn't paint a coherent picture of the project's development and the application's revenue doesn't behave as planned, it makes sense to dig deeper.
This is where A/A testing comes in. The point is that even if we have formulated the hypothesis and prepared the experiment correctly, heterogeneous traffic can produce differences caused by the traffic itself rather than by changes in the product.
What is the purpose of these experiments?
We test 2 identical variants (usually default ones) against each other on a user volume equal to the expected sample size for A/B testing. As a result, we should see no statistically significant difference in the metric being tracked. This means that our target audience is behaving approximately the same in the 2 groups and there are no factors that can distort the results of the experiment.
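In practice, the check an A/A test relies on can be as simple as a two-proportion z-test on the two identical groups. Here is a small Swift sketch with invented numbers:

```swift
import Foundation

// The check an A/A test boils down to: a two-proportion z-test on two identical
// groups. The conversion numbers below are invented for the example. If |z|
// regularly exceeds ~1.96 (p < 0.05) on identical variants, the traffic split
// itself is likely skewed.
func zScore(conversionsA: Int, usersA: Int, conversionsB: Int, usersB: Int) -> Double {
    let pA = Double(conversionsA) / Double(usersA)
    let pB = Double(conversionsB) / Double(usersB)
    let pooled = Double(conversionsA + conversionsB) / Double(usersA + usersB)
    let se = (pooled * (1 - pooled) * (1.0 / Double(usersA) + 1.0 / Double(usersB))).squareRoot()
    return (pA - pB) / se
}

// 2.1% vs 1.9% conversion on 14,000 users per group.
let z = zScore(conversionsA: 294, usersA: 14_000, conversionsB: 266, usersB: 14_000)
print(String(format: "z = %.2f", z)) // ≈ 1.2, i.e. no statistically significant difference
```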
When we run experiments, we want to be sure that we have interpreted the results correctly and that we know all the key metrics of the test. Analytics can help us do this.
So, how do you look at these metrics and draw the right conclusions? Conversion metrics work well right after you run an A/B test because you can get quick results. ARPU/ARPPU/ARPAS, on the other hand, are cohort metrics that show subscription revenue growth over time.
For example, if we're testing button color or text, we're primarily looking at trial/subscription conversion rates, but if we're testing different products and their prices, we're interested in long-term ARPU/ARPPU/ARPAS.
Another example: if you increase the price of a subscription, you will almost certainly hurt the conversion rate, which could be read as a failure. But we experiment with paywalls, among other things, to increase LTV and revenue on a 1-2 year horizon, so if a more expensive product pays off over that horizon, the experiment is still a win.
A great case study on product testing comes from AEZAKMI Group. Here we see a simple example of an A/B test using Apphud Experiments that compares 2 different products: a subscription with a trial and one without.
According to the results, even such a seemingly simple change resulted in an excellent increase in key metrics!
It's easy to see that the conversion rate for a trial start is almost twice that of a monthly subscription.
It's also important that trials let you track ARPU, a metric that captures both conversions and rebills. After all, one subscription may convert very well but be cancelled immediately, while another may convert 2 times worse yet renew 4 times better, which makes it more profitable in the long run.
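To make that argument concrete, here is a toy Swift calculation with invented prices, conversion rates, and renewal rates:

```swift
import Foundation

// Toy comparison behind the argument above. Prices, conversion and renewal rates
// are invented; expected payments per subscriber are approximated with a
// geometric series over a 12-month horizon.
func arpu(conversion: Double, price: Double, renewalRate: Double, periods: Int) -> Double {
    var expectedPayments = 0.0
    for period in 0..<periods {
        expectedPayments += pow(renewalRate, Double(period)) // share still subscribed
    }
    return conversion * price * expectedPayments
}

// Variant A converts well but almost everyone cancels after the first month.
// Variant B converts 2x worse but renews 4x better.
let arpuA = arpu(conversion: 0.04, price: 9.99, renewalRate: 0.20, periods: 12)
let arpuB = arpu(conversion: 0.02, price: 9.99, renewalRate: 0.80, periods: 12)
print(String(format: "A: $%.2f, B: $%.2f per install", arpuA, arpuB))
// B earns roughly twice as much per install despite converting half as well.
```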
Returning to the case study, we see an ARPU increase of almost 2x, which is a great result!
To keep an application competitive, whatever its niche, you need to constantly test hypotheses and run experiments. This is the only way to achieve an excellent ROI and scale the product (and therefore profit).
Apphud's A/B testing tool is ideal for the task of testing different products as well as paywall design or onboarding screen design. Sign up and try now!
Have fun experimenting and growing your apps!