
Let's walk through, step by step, how to determine whether up-ranking Events in Facebook Search is a good idea, and how to design an A/B test to answer that question.
1. Clarifying Questions:
What is the primary goal of up-ranking Events in Search? Is it to increase user engagement, event discovery, or something else?
What specific metrics are most important to measure the success of this change? Examples could be click-through rates, engagement rates, conversion rates, etc.
Are there any potential downsides or risks to consider when up-ranking Events in Search?
What is the current process for ranking Events in Search, and how will the proposed change differ?
2. Prerequisites:
Success Metrics: Primary success metrics might include increased click-through rates (CTR) on Events from Search results and higher engagement rates.
Counter Metrics: Counter metrics could include decreased engagement with other content types (cannibalization) or a decline in overall engagement if Events crowd out more relevant results.
Ecosystem Metrics: Monitor overall user satisfaction, session length, and retention rates, as this change could impact the user experience beyond Events.
Control and Treatment Variants: The control group consists of users who see the current ranking algorithm; the treatment group sees the up-ranking change.
Randomization Units: Individual users would be the randomization unit in this experiment (see the bucketing sketch after this list).
Null Hypothesis: There is no difference in engagement metrics between the control and treatment groups after up-ranking Events.
Alternative Hypothesis: Up-ranking Events leads to an increase in engagement metrics compared to the control group.
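To make the randomization concrete, here is a minimal sketch of deterministic, user-level bucketing. The experiment name, salt, and 50/50 split are assumptions for illustration, not a description of Facebook's actual assignment system.

```python
import hashlib

def assign_variant(user_id: str,
                   experiment: str = "events_upranking_v1",  # hypothetical experiment name
                   treatment_fraction: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment'.

    Hashing the user ID with the experiment name as a salt keeps the
    assignment stable across sessions and independent of other experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = (int(digest, 16) % 10_000) / 10_000  # uniform-ish value in [0, 1)
    return "treatment" if bucket < treatment_fraction else "control"

print(assign_variant("user_12345"))  # same output every time for this user
```

Because assignment is a pure function of the user ID and experiment name, the serving and logging layers will always agree on which variant a user saw.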
3. Experiment Design:
Significance Level (α): Typically set at 0.05, representing a 5% chance of rejecting the null hypothesis when it's true.
Practical Significance Level: Determine the smallest effect size that would be considered practically meaningful. This helps avoid acting on statistically significant effects that are too small to be practically valuable.
Power: Usually set at 0.80, representing an 80% chance of detecting a true effect if it exists.
Sample Size: Calculate the required sample size from the chosen significance level, power, and expected effect size. For example, if a one-percentage-point increase in CTR (from 10% to 11%) is considered meaningful, you would need roughly 15,000 users per group (see the power calculation sketch after this list).
Duration: Determine how long the experiment needs to run to collect sufficient data. This depends on the expected rates of user engagement and the required sample size, and the run should ideally span whole weeks to capture weekday/weekend differences.
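As a rough illustration of the sample-size calculation above, the following sketch uses statsmodels to solve for the per-group sample size needed to detect a CTR move from 10% to 11% at α = 0.05 and 80% power. The baseline CTR and the effect size are assumptions for illustration only.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_ctr = 0.10  # assumed control CTR
target_ctr = 0.11    # smallest lift considered practically meaningful

# Cohen's h effect size for the two proportions
effect_size = proportion_effectsize(target_ctr, baseline_ctr)

n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,
    power=0.80,
    ratio=1.0,               # equal-sized control and treatment groups
    alternative="two-sided",
)
print(f"~{n_per_group:,.0f} users per group")  # roughly 15,000
```

Note how sensitive the requirement is to the effect size: shrinking the meaningful lift to 0.1 percentage points pushes the number to well over a million users per group.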
4. Running the Experiment:
Ramp-Up Plan: Gradually roll out the up-ranking change to a small percentage of users initially, monitoring early impact and gathering feedback to catch any unexpected issues before widening exposure (a possible ramp schedule is sketched below).
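One way to make the ramp-up plan explicit is a staged schedule with a gate to pass at each step. The percentages, durations, and gates below are placeholders, not a prescribed rollout.

```python
# Hypothetical ramp-up schedule: each stage widens exposure only after its gate passes.
RAMP_SCHEDULE = [
    {"day": 0,  "exposure_pct": 1,  "gate": "no crashes or latency regressions"},
    {"day": 3,  "exposure_pct": 5,  "gate": "guardrail metrics within bounds"},
    {"day": 7,  "exposure_pct": 25, "gate": "counter metrics stable"},
    {"day": 10, "exposure_pct": 50, "gate": "full experiment traffic"},
]

def is_exposed(bucket: float, exposure_pct: float) -> bool:
    """Expose a user only if their assignment bucket falls under the current ramp level."""
    return bucket * 100 < exposure_pct

for stage in RAMP_SCHEDULE:
    print(f"day {stage['day']:>2}: {stage['exposure_pct']}% exposure, gate: {stage['gate']}")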
5. Result to Decision:
Basic Sanity Checks: Ensure that the control and treatment groups are similar in key demographics and pre-experiment behavior, and that the observed split matches the intended allocation (no sample ratio mismatch), before evaluating the primary metrics.
Statistical Test: Conduct a statistical test (such as a two-proportion z-test or chi-squared test for CTR, or a t-test for continuous engagement metrics) to compare the control and treatment groups (see the sketch after this list).
Recommendation: If the p-value is below the significance level, and the effect size is both statistically and practically significant, you may recommend implementing the up-ranking change.
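The sanity check and statistical test above might look like the following sketch. The user counts and click counts are placeholder numbers, and the SRM threshold of p < 0.001 is a common convention rather than a hard rule.

```python
from scipy.stats import chisquare
from statsmodels.stats.proportion import proportions_ztest

# --- Sanity check: sample ratio mismatch (SRM) against the intended 50/50 split ---
users = {"control": 1_002_400, "treatment": 998_100}   # placeholder exposure counts
total = sum(users.values())
_, srm_p = chisquare(list(users.values()), f_exp=[total / 2, total / 2])
if srm_p < 0.001:
    raise RuntimeError("Sample ratio mismatch: investigate assignment/logging before reading metrics")

# --- Primary metric: two-proportion z-test on Events CTR from Search ---
clicking_users = [100_240, 110_300]                     # placeholder counts (control, treatment)
exposed_users = [users["control"], users["treatment"]]
z_stat, p_value = proportions_ztest(clicking_users, exposed_users)
print(f"z = {z_stat:.2f}, p = {p_value:.3g}")
```

The final call should weigh the p-value together with the confidence interval on the lift and the counter metrics, not the p-value alone.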
6. Post Launch Monitoring:
Novelty/Primacy Effect: Monitor whether the measured lift changes over time. Initial curiosity about the change can temporarily inflate engagement (novelty), while users accustomed to the old ranking may initially engage less (primacy); both effects tend to fade, so re-measure after behavior stabilizes (see the sketch below).
Network Effect: Observe whether the engagement boost in the treatment group spills over to other parts of the platform, and whether interactions between treated and control users (for example, event invitations) blur the comparison between groups.
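A simple way to probe for a novelty effect is to track the daily lift over time after launch. The sketch below assumes a hypothetical exposure log with date, variant, and clicked columns; the file name is illustrative.

```python
import pandas as pd

# Hypothetical exposure log: one row per user-day with the variant shown and whether
# the user clicked an Event result in Search.
logs = pd.read_csv("events_search_exposures.csv")

daily_ctr = logs.groupby(["date", "variant"])["clicked"].mean().unstack("variant")
daily_lift = daily_ctr["treatment"] - daily_ctr["control"]

# A lift that shrinks steadily over the first few weeks suggests novelty rather than
# a durable change; a stable lift supports keeping the up-ranking.
print(daily_lift)
```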
Remember, this process is iterative and collaborative. It involves close coordination with the Events team, product managers, designers, and engineers to ensure a well-executed and meaningful experiment.