Organisational-level test strategies and modern approaches – Some thoughts-in-progress
In a recent role I was given permission to share some of my test strategy work publicly. It was largely generic in nature, given its high-level focus, so I’m pleased to be able to share a bunch of notes that may help others orient to the rapidly-changing landscape. What’s not here, and which I *should* include, is my approach to people leadership. This is a work in progress that I hope to update from time to time. If you have any thoughts or criticisms, I’d love to hear from you either directly or in the comments.
The focus here is to provide helpful strategies in a modern context. If I were running my own company, there would certainly be some differences in approach, but I think this is a sensible way to view the world right now.
Test Strategy
Guiding principles
Quality is value to some person (Weinberg).
Thinking about Test Strategy – A mnemonic device
Context-driven testing – https://context-driven-testing.com/
Tests support development and demonstrate value to stakeholders
Marick’s test quadrants set the expectation that tests give developers the ability to change software safely and to deliver value continuously.
http://www.exampler.com/old-blog/2003/08/21.1.html#agile-testing-project-1
Optimising for flow – Test automation is the primary gate on release
Testing will never be perfect; despite best efforts, there will always be bugs. The ability to quickly gain confidence in a change, and to take it to production safely and repeatably, therefore requires appropriate test automation. Skilled exploratory testing and independent testing provide review and validation of the automation suites.
Test pyramid
The test pyramid provides a strategy for optimising build time, and is incomplete as a discussion of coverage. See http://www.software-testing.com.au/blog/2018/05/10/on-ui-test-automation/ for considerations when optimising feedback and coverage.
Quality is not a slider
Attempting to use quality as a lever to speed delivery leads to unpredictability. We can break quality into two categories – internal and external. Internal quality covers those aspects that help the team deliver reliably and predictably. External quality covers those aspects valued by the purchasers or users of our software. Because external quality is primarily visible as scope, we can trade it away to speed delivery; the same is not true of internal quality. Logic errors, poor design and poor test suites lead to slower delivery, late detection of errors and more bugs (https://arxiv.org/abs/2203.04374).
Traceability
Tracing tests to specifications is helpful, but not sufficient. Traditional traceability is an unsolved problem due to the many-to-many relationship between requirements and tests. We gather evidence to show requirements are met, and define requirements as relationships and interactions between humans and the world. See https://nap.nationalacademies.org/catalog/11923/software-for-dependable-systems-sufficient-evidence
Testing confidence is achieved through notions of coverage. Coverage is understood with respect to models.
Excellent testing builds confidence through a process of bringing various models into alignment. Models may be formal or informal, explicit or tacit. At a minimum, we expect the models to include –
- Intent, communicated through clear code.
- Corresponding tests that demonstrate compliance with specifications.
- Corresponding tests and/or monitoring that demonstrate business capability and stakeholder value.
- Architectural risks (performance, security, interfaces) with appropriate coverage.
A list of 101 different coverage models can be found at https://www.researchgate.net/publication/243782285_Software_Negligence_Testing_Coverage
‘Good enough’ quality
Perfect testing is impossible, so we must always apply fallible methods for deciding how much to test and when to stop. “Good enough” testing is the goal, meaning the test approach is defensible in the face of the known risks and other constraints such as time, budget, people and resources. We may certainly make the wrong call, but nobody should be punished for making reasonable decisions based on reasonable efforts to establish “good enough”, grounded in –
- Clear objectives
- Quality measures
- Risks
- Constraints
Independence
Assuming all members of the team have the requisite base level of formal testing skill, independent testers provide value largely through their experience and role-based independent perspective. Independence comes at the expense of flow, and how much of it to apply is another fallible decision we make when weighing priorities for delivery and quality. Having many different people review and test a change also maximises the probability of finding problems (see Weinberg – https://www.amazon.com/Quality-Software-Management-Systems-Thinking/dp/0932633722).
As a general guideline –
- People performing a tester role on critical systems will maximise their independence in order to maximise the likelihood of finding problems. This may manifest as ‘hardening’ or exploratory test windows in integrated staging environments for new features or significant architectural changes. By applying risk-based approaches to testing and prioritising development appropriately, this testing should not need to significantly delay completion of features.
- “Bug bashes” and mob testing including people outside the team present a low-cost opportunity for independent perspectives.
- Pair/mob programming with regular rotations of pairs is also a strategy for bringing fresh eyes to code and testing, or at least minimising ‘priming’ effects.
- Code review should be considered the bare(ly) minimum standard.
Consider tests that are data and environment independent
Tests that focus on behaviour minimise the amount of implementation detail they contain. Good test design also abstracts test data through environment-specific data providers and externalises environment-specific configuration, as sketched below.
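As a minimal sketch of what I mean (all names here are illustrative, not from any particular framework):

```typescript
// Hypothetical sketch: test data is abstracted behind a provider so the same
// behavioural test runs against any environment.
interface TestDataProvider {
  customerWithActiveSubscription(): Promise<{ id: string; email: string }>;
}

class StagingDataProvider implements TestDataProvider {
  async customerWithActiveSubscription() {
    // Look up (or create) a suitable record in the staging environment.
    return { id: "cust-123", email: "staging-user@example.com" };
  }
}

// Environment-specific configuration is externalised, not hard-coded.
const baseUrl = process.env.APP_BASE_URL ?? "http://localhost:3000";
const data: TestDataProvider = new StagingDataProvider();

// The test expresses only behaviour; it knows nothing about which
// environment supplied the data or the URL.
async function testSubscriberCanDownloadInvoice() {
  const customer = await data.customerWithActiveSubscription();
  const response = await fetch(`${baseUrl}/customers/${customer.id}/invoices`);
  if (response.status !== 200) throw new Error("Expected invoices to be available");
}
```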
Shift-left
Front-load critical learning and the biggest risks –
- Validate key product drivers/goals/features
- Validate architecture
- Prevent easily-preventable, expensive mistakes
- Don’t use this as a reason to front-load thinking and analysis.
- Get the big things right, then optimise for frequent change.
While RUP as a product was anything but agile, there are a lot of powerful ideas in Kruchten’s book on RUP which call out some key points that most early agile work fails to make explicit (despite the obvious influence). See https://www.researchgate.net/publication/220018149_The_Rational_Unified_Process–An_Introduction for some of the key ideas, or the full treatment – https://www.amazon.com.au/Rational-Unified-Process-Philippe-Kruchten/dp/0321197704
The AUP is also an interesting read, with key ideas presumably making their way into Ambler’s DAD. https://web.archive.org/web/20120214042439/http://www.ambysoft.com/unifiedprocess/agileUP.html
Shift-right
- Identify opportunities to safely test in production
  - Parallel/shadow testing
  - Synthetic transactions
  - Staged rollouts
  - Stage gates
- Identify opportunities to use monitoring as a key part of the test approach
  - Flow-based/end-to-end monitoring
  - Run-time asserts to ensure the system is behaving as expected, including heuristic tests (see https://rbcs-us.com/documents/Why-Most-Unit-Testing-is-Waste.pdf)
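To make the run-time assert/synthetic transaction idea concrete, here’s a rough sketch – the endpoint, thresholds and scheduling are all assumptions:

```typescript
// Illustrative synthetic transaction: exercise a critical flow in production
// and assert heuristics about the result.
async function syntheticCheckoutCheck(): Promise<void> {
  const started = Date.now();
  const response = await fetch("https://example.com/api/health/checkout-probe");
  const elapsedMs = Date.now() - started;

  // Hard assert: the flow must succeed.
  if (!response.ok) throw new Error(`Checkout probe failed: ${response.status}`);

  // Heuristic assert: flag suspicious latency rather than failing outright.
  if (elapsedMs > 2000) {
    console.warn(`Checkout probe slow: ${elapsedMs}ms (heuristic threshold 2000ms)`);
  }
}

// Run on a schedule and feed results into monitoring.
setInterval(() => syntheticCheckoutCheck().catch(console.error), 60_000);
```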
Test suites for third party services
- Consider contract-style tests to ensure behaviour we depend on is unchanged, aligned to hexagonal architecture – https://en.wikipedia.org/wiki/Hexagonal_architecture_(software). These tests ideally identify the core capabilities of the organisation and/or value realised through the software system (a sketch follows below).
Ken Scambler’s presentation may be useful for developers in this context – https://www.youtube.com/watch?v=EaxDl5NPuCA
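As a sketch of the kind of contract-style test I mean – here using zod as the schema validator (my choice; any validator works) against an invented third-party endpoint:

```typescript
import { z } from "zod";

// The contract: only the fields and behaviour *we* depend on, aligned to the
// port in a hexagonal architecture. Endpoint and fields are illustrative.
const ExchangeRateResponse = z.object({
  base: z.literal("AUD"),
  rates: z.record(z.string(), z.number().positive()),
});

// Run against the real (or sandbox) third-party service on a schedule,
// independent of feature builds, so breakage is detected early.
async function thirdPartyContractTest() {
  const response = await fetch("https://api.example-rates.com/latest?base=AUD");
  if (!response.ok) throw new Error(`Unexpected status: ${response.status}`);
  ExchangeRateResponse.parse(await response.json()); // throws on contract drift
}
```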
Consider formal models for stateful testing
Model Based Testing (MBT) – See https://graphwalker.github.io/
BDD as an approach is enhanced by considering MBT principles, providing a way to think about state coverage –
- Given (initial state)
- When (trigger)
- Then (end state)
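GraphWalker itself is JVM-based, but the core idea is small enough to sketch directly – a transition model plus a walk that exercises it (the states and triggers below are invented):

```typescript
// Minimal model-based testing sketch: states and triggers form a graph,
// and a random walk exercises transitions.
type State = "LoggedOut" | "LoggedIn" | "Locked";
type Transition = { from: State; trigger: string; to: State };

const model: Transition[] = [
  { from: "LoggedOut", trigger: "validLogin", to: "LoggedIn" },
  { from: "LoggedOut", trigger: "thirdBadPassword", to: "Locked" },
  { from: "LoggedIn", trigger: "logout", to: "LoggedOut" },
  { from: "Locked", trigger: "adminUnlock", to: "LoggedOut" },
];

// Walk the model; in a real harness each step would fire the trigger against
// the system and assert that its observed state matches the model's end state.
// (Every state in this model has an outgoing transition, so the walk never stalls.)
function randomWalk(start: State, steps: number): Transition[] {
  const path: Transition[] = [];
  let current = start;
  for (let i = 0; i < steps; i++) {
    const options = model.filter((t) => t.from === current);
    const next = options[Math.floor(Math.random() * options.length)];
    path.push(next);
    current = next.to;
  }
  return path;
}

console.log(randomWalk("LoggedOut", 10).map((t) => t.trigger).join(" -> "));
```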
Outside of ‘test quadrant’ tests, consider that test frameworks can deliver value incrementally and iteratively if treated as a set of chainable functions or composable pieces (a sketch follows this list) –
- Test design (MBT, combinatorial generation, all-pairs)
- Manipulation of system state (eg. Snapshot/restore tooling)
- Generation of system activity (ie. Drive the system via UI/APIs)
- Provide/identify test data
- Inspect state
- Validate state
- Reporting
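A rough sketch of that composition idea – every interface here is hypothetical, and the combinatorial ‘design’ stage is deliberately naive (all-pairs tooling would prune the explosion):

```typescript
// Hypothetical composable test pipeline: each stage is a small function that
// can be developed, replaced and reused independently.
type TestCase = { name: string; inputs: Record<string, string> };
type Result = { testCase: TestCase; passed: boolean; detail?: string };

// Test design stage: naive cartesian-product generation of test cases.
function design(params: Record<string, string[]>): TestCase[] {
  return Object.entries(params).reduce<TestCase[]>(
    (cases, [key, values]) =>
      cases.flatMap((c) =>
        values.map((v) => ({
          name: `${c.name} ${key}=${v}`,
          inputs: { ...c.inputs, [key]: v },
        }))
      ),
    [{ name: "case", inputs: {} }]
  );
}

// Later stages: manipulate state, drive the system, inspect and validate.
async function run(tc: TestCase): Promise<Result> {
  // e.g. restore a snapshot, call an API with tc.inputs, inspect state...
  return { testCase: tc, passed: true };
}

function report(results: Result[]): void {
  console.log(`${results.filter((r) => r.passed).length}/${results.length} passed`);
}

async function main() {
  const cases = design({ browser: ["chrome", "firefox"], role: ["admin", "guest"] });
  report(await Promise.all(cases.map(run)));
}
main();
```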
Models for quality
Quality is driven from/by –
- Implementation
  - Code and infra
- Product
  - Are we building a good/viable product?
- Platform
  - Production state
  - Deployment
  - Infra
  - Security
  - Performance
- Delivery
  - Teams need ‘space’ to do things right. Is delivery pushing too hard? Is it prioritising appropriately for sustainable pace?
  - Visibility of quality is required as an input to the above questions.
Time
Identify ways to make time-based testing simple, e.g. by injecting a clock rather than reading system time directly, as sketched below.
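A minimal sketch of the injected-clock pattern (the names are illustrative):

```typescript
// Inject a clock instead of calling Date.now() directly, so tests can
// control time deterministically.
interface Clock {
  now(): Date;
}

const systemClock: Clock = { now: () => new Date() };

class FixedClock implements Clock {
  constructor(private current: Date) {}
  now() { return this.current; }
  advance(ms: number) { this.current = new Date(this.current.getTime() + ms); }
}

// Production code takes the clock as a dependency.
function isSubscriptionExpired(expiry: Date, clock: Clock): boolean {
  return clock.now() > expiry;
}

// Test: freeze time just either side of expiry.
const expiry = new Date("2025-01-01T00:00:00Z");
const clock = new FixedClock(new Date("2024-12-31T23:59:59.999Z"));
console.assert(!isSubscriptionExpired(expiry, clock)); // just before expiry
clock.advance(2);
console.assert(isSubscriptionExpired(expiry, clock)); // just after expiry
```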
Infra
- Consider the need to test infra code/config independently of the application itself (Pulumi, Terraform tests – https://medium.com/contino-engineering/terraform-infrastructure-as-code-testing-best-practice-unit-tests-bdd-end-to-end-scenario-c30d5a6921d)
Security
- Automated scans for source and bytecode
- Penetration tests periodically, or when significant new architectural patterns are introduced.
Accessibility
- Automated checks as part of the build (a sketch follows this list) – https://sparkbox.com/foundry/series/automated_accessibility_tool_reviews , https://github.com/dequelabs/axe-core
- What not to do (with a nice list of checks humans should do) – http://matuzo.at/blog/building-the-most-inaccessible-site-possible-with-a-perfect-lighthouse-score/
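For example, axe-core’s Playwright integration makes the build-time check fairly small – roughly like this (assuming a baseURL in the Playwright config), remembering that automated checks only catch a subset of problems:

```typescript
import { test, expect } from "@playwright/test";
import AxeBuilder from "@axe-core/playwright";

// Run axe-core against a page as part of the build. Automated checks catch
// only a subset of accessibility problems; humans still need to test.
test("home page has no detectable accessibility violations", async ({ page }) => {
  await page.goto("/"); // assumes baseURL is set in playwright.config.ts
  const results = await new AxeBuilder({ page }).analyze();
  expect(results.violations).toEqual([]);
});
```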
Governance/Regulatory testing
- Path to prod should include appropriate governance for regulated parts of the business.
- DR testing to be considered.
- Evidence of successful Backup/Restore capability is a frequent requirement.
Food for thought – https://www.youtube.com/watch?v=9Q5MhKUVLkc
https://up.com.au/blog/continuous-delivery-at-up/
https://continuousdelivery.com/2012/07/pci-dss-and-continuous-deployment-at-etsy/
Performance/Stress
- Component-level performance is ideally addressed by the build. That is, perform heuristic checks that can detect significant changes to performance (faster or slower can both indicate problems) – see the sketch after this list.
- Concurrency tests are ideally performed as part of the build. These are a frequent blind spot for unit and acceptance tests.
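A sketch of what a build-time heuristic performance check might look like – the baseline, tolerance and repetition count are all assumptions, illustrating the idea that drift in either direction gets flagged:

```typescript
// Heuristic performance check in the build: fail when timing drifts
// significantly in either direction - a sudden speed-up can signal that
// work is being skipped. Baseline and tolerance are illustrative.
const BASELINE_MS = 120;
const TOLERANCE = 0.5; // allow +/- 50% drift before flagging

async function timedCheck(operation: () => Promise<void>): Promise<void> {
  const runs: number[] = [];
  for (let i = 0; i < 5; i++) {
    const start = performance.now();
    await operation();
    runs.push(performance.now() - start);
  }
  const median = runs.sort((a, b) => a - b)[2]; // median of 5 runs
  const drift = Math.abs(median - BASELINE_MS) / BASELINE_MS;
  if (drift > TOLERANCE) {
    throw new Error(
      `Median ${median.toFixed(1)}ms drifted ${(drift * 100).toFixed(0)}% from baseline ${BASELINE_MS}ms`
    );
  }
}
```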
In practice
Focus on testing outputs as a part of the release/change governance
- Make clear the intent of the change (good behavioural tests will assist), and provide evidence that –
  - It solves the problem
  - It does not introduce new problems (eg. regression tests)
  - It meets appropriate internal/external compliance requirements (security, accessibility, privacy, performance, supportability, access control, technology standards, observability/metrics)
  - There has been appropriate stakeholder engagement
- Provide evidence that the tested code is what is being released to production (ie. commit id/Git SHA)
  - Testing evidence can be linked to a commit
  - The change clearly includes the tested code
- Identify differences/limitations/risks in the testing (eg. differences in test data and environments compared to production)
Build speed optimisation
- Parallelisation for slower/longer-running tests (Playwright seems the current winner; a config sketch follows this list). This requires an appropriate test data strategy.
- Continued human attention to the value provided by different tests.
- Avoid writing lots of tests where the value of a feature is not yet proven. No test is free.
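For the Playwright parallelisation point above, the relevant configuration is small – something like the following, noting that fully parallel tests must not share mutable test data:

```typescript
// playwright.config.ts - sketch of parallelisation settings. Fully parallel
// tests must not share mutable test data (e.g. use per-worker fixtures).
import { defineConfig } from "@playwright/test";

export default defineConfig({
  fullyParallel: true, // run tests within files in parallel
  workers: process.env.CI ? 4 : undefined, // cap workers on CI; default locally
});
```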
Techniques
High-volume randomised tests are a valuable but rarely used approach.
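Property-based testing tools make this practical. A sketch using fast-check (my choice of library; the payment-splitting function is invented) that runs hundreds of randomised cases per execution:

```typescript
import fc from "fast-check";

// High-volume randomised testing via a property: for *any* total and split
// count, splitting a payment and re-summing must preserve the total - an
// invariant, not a single example.
function splitPayment(totalCents: number, ways: number): number[] {
  const base = Math.floor(totalCents / ways);
  const remainder = totalCents - base * ways;
  // Distribute the remainder one cent at a time across the first parts.
  return Array.from({ length: ways }, (_, i) => base + (i < remainder ? 1 : 0));
}

fc.assert(
  fc.property(
    fc.integer({ min: 0, max: 1_000_000 }),
    fc.integer({ min: 1, max: 20 }),
    (total, ways) => {
      const parts = splitPayment(total, ways);
      return parts.reduce((a, b) => a + b, 0) === total && parts.length === ways;
    }
  )
);
```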
Everyone testing should understand combinatorics, boolean logic and how to traverse a graph. This allows us to review the foundational model of tests and have confidence in the rote parts of the test process.
Checking your log files can be extremely valuable, ideally in an automated way. Check for unexpected errors and exceptions. If a certain amount must be tolerated, monitor trends/error counts as part of running regular test suites to check nothing new is appearing.
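A rough sketch of an automated post-run log scan – the path, patterns and tolerated baseline are all assumptions:

```typescript
import { readFileSync } from "node:fs";

// Scan application logs after a test run for unexpected errors, and keep the
// raw count so new noise is visible as a trend over time.
const TOLERATED = new Set(["KnownFlakyUpstreamTimeout"]);

function scanLogs(path: string): { unexpected: string[]; errorCount: number } {
  const lines = readFileSync(path, "utf8").split("\n");
  const errors = lines.filter((l) => /ERROR|Exception/.test(l));
  const unexpected = errors.filter(
    (l) => ![...TOLERATED].some((t) => l.includes(t))
  );
  return { unexpected, errorCount: errors.length }; // persist errorCount for trends
}

const { unexpected } = scanLogs("logs/app.log");
if (unexpected.length > 0) {
  throw new Error(`Unexpected errors in logs:\n${unexpected.join("\n")}`);
}
```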
Coverage
- Code coverage – statement %, branch coverage and similar techniques. Trends are frequently more important than outright numbers.
- BDD reporting – Transparency regarding product goals and their relationships to stakeholders is critical. Acceptance criteria for stories drive business-facing functional coverage and function as documentation.
- Monitoring and analytics
  - Business flows, OpenTelemetry
Data management
Consider carefully whether production data should be used in non-production environments at all.
PII approaches such as pseudonymisation/tokenisation ensure that data copied from production to non-production environments is not usable, but they can complicate support and bug fixing. You may therefore need tools for desensitising customer data for test/support.
Consider heuristic checks for PII/password data in logs (a sketch follows below).
See https://piiano.com/blog/practical-pseudonymization-by-tokenization/
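A sketch of the kind of heuristic log check I mean – the regexes are illustrative, and will both over- and under-match, which is fine for a heuristic:

```typescript
// Heuristic PII check for log output: regexes will produce false positives
// and miss things, but catch the common accidents. Patterns are illustrative.
const PII_PATTERNS: Record<string, RegExp> = {
  email: /[\w.+-]+@[\w-]+\.[\w.]+/,
  creditCard: /\b(?:\d[ -]?){13,16}\b/,
  password: /password["']?\s*[:=]/i,
};

function findPossiblePii(logLine: string): string[] {
  return Object.entries(PII_PATTERNS)
    .filter(([, pattern]) => pattern.test(logLine))
    .map(([name]) => name);
}

// Example: run over captured log lines as part of a regular test suite.
console.log(findPossiblePii("user login ok, email=jane@example.com")); // ["email"]
```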
Tools
OpenTelemetry/Jaeger/Grafana still seem a work in progress, but ideally there will continue to be opportunities to shift chunks of testing to monitoring, enabling efficiency (tests/checks are built once but can run anywhere – test or prod).
Security scans should be considered mandatory these days – For code, and the running/deployed system.
Performance – K6 looks great as an opportunity to share artefacts with other parts of the development and testing process, and it makes the ideas started in this book accessible to modern development – https://www.amazon.com.au/Testing-Design-Automation-Frank-Cohen/dp/0131421891
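A minimal K6 sketch (k6 scripts are JavaScript, valid as TypeScript here; the URL, load shape and thresholds are invented):

```typescript
// k6 smoke/load sketch: scripted load with a pass/fail threshold, so the
// artefact lives in version control alongside the rest of the tests.
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  vus: 10, // virtual users
  duration: "1m",
  thresholds: {
    http_req_duration: ["p(95)<500"], // fail the run if p95 exceeds 500ms
  },
};

export default function () {
  const res = http.get("https://staging.example.com/api/catalogue");
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1);
}
```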
Contract testing – There’s PACT/PACTflow, and schema-level checks in cloud providers. But care is required, and maintenance isn’t ideal.
Metrics
A few mandatory metrics, and some that are not commonly applied –
- Code coverage
- UI coverage (headlamp?)
- Find/Fix rates for bugs
- Incident rates/severities
- Cycle time – At least measure the Incident → Problem → Change → Release time.
- Have Product metrics and share them.
Governance/Audit
- Logs to be designed to be used as sufficient evidence for change success
- Per change, ensure audit trail on evidence for new features
- Logging should be a part of automated test suites, seen as a first-order component
- Immutable logs are likely required in regulated environments (Including regular audit/testing to make sure they continue to be immutable)
CMDB
CMDBs seem like one of those things that should help, but I’ve rarely seen them work. Make sure there are stated CMDB requirements/objectives if you plan to have one.
Capabilities
Content to come. This is about how you manage models of your products and align delivery, change and testing to those. The space is immature still, but the approach used by SerenityBDD should be taken more often. I just wish Gherkin would die.
Risk
- Risk can be viewed as negative value
- Think about preventable risks
  - Rules-based controls where possible; elimination strategies are best.
- Manage impacts (knowable) over probability (usually unknowable)
  - See The Black Swan/Fooled by Randomness
- External risk tools
  - Scenario planning – “What must hold true in order for our assumptions to be valid?” – See “Profiting from Uncertainty”