Jeremy W. Sherman

stay a while, and listen

Notes on QuickCheck

QuickCheck is a nifty library embedded in Haskell for specification-based random testing of programs.

Instead of hand-writing your tests as you would with one of the TestUnit frameworks, you specify computable properties about your program functions. The specification includes a description of how to generate inputs to the property equation. The test framework evaluates the property against a fixed number of inputs produced by the spec’s generator and reports whether the results were correct or not.

This compresses the amount of test code you have to write. It effectively raises the level of discourse from hand-testing input/output pairs or execution traces to making general assertions you could use to reason about your functions.

Random testing is uniquely suited to revealing mistakes in the programmer’s conception of the program, since it can produce inputs you wouldn’t dream of on your own. (Composing generators to produce a big mess of a data structure is pretty nifty, too; hand-writing those inputs is just about the most boring thing ever.)

Reference

Koen Claessen and John Hughes. 2000. QuickCheck: a lightweight tool for random testing of Haskell programs. In Proceedings of the fifth ACM SIGPLAN international conference on Functional programming (ICFP ‘00). ACM, New York, NY, USA, 268-279. DOI=10.1145/351240.351266 http://doi.acm.org/10.1145/351240.351266 

Notes

“Random testing is especially suitable for functional programs because properties can be stated at a fine grain.”

“domain-specific language of testable specifications

test data generation language

“unless specifically stated otherwise, we always quantify over completely defined finite values.”

“the programmer must specify a fixed type at which the law is to be tested”

“+ is associative for the type Int, but not for Double!”

“In some cases, we can use parametricity [17] to argue that a property holds polymorphically.” ([17] is Wadler’s “Theorems for free!” paper.)

“implication combinator” for conditional laws

“the result type of the property is changed from Bool to Property” - handles “failed precondition” state in addition to pass/fail

“The classify combinator does not change the meaning of a law, but it classifies some of the test cases”

collect will gather all values that are passed to it, and print out a histogram”

“risk of [bias towards trivial test cases] every time we use conditional laws, so it is always important to investigate the proportion of trivial cases among those actually tested.”

solution: “replace the condition with a custom data generator” (emphasis added)

“two infinite lists are equal if all finite initial segments are equal”

“in general it is not clear how to formulate and execute properties about structures containing bottom”

“we must rely on the user to provide instances [of the type class Arbitrary] for user-defined types”

“the size bound is simply an extra, global parameter which every test data generator may access; every use of sized sees the same bound”

“use [promote] to produce a generator for a function type, provided we can construct a generator for the result type which somehow depends on the argument value”

“think of coarbitrary as producing a generator transformer from its first argument”

variant n g constructs a generator which transforms the random number seed it is passed in a way depending on n before passing it to g

“mapping each constructor to an independent transformer, and composing these with transformers from each component. Other recursive datatypes can be treated in the same way.”

“it is more convenient to let test data be automatically generated using arbitrary so one is encouraged to make distinctions explicit in types”

“using QuickCheck changes the balance of convenience in the question of introducing new types in programs”

“The most serious pitfall we uncovered with this experiment was the false sense of security that can be engendered when one’s program passes a large number of tests in trivial cases.”

“a typical development cycle is to write down the specification of the circuit first, then make an implementation, QuickCheck it for obvious bugs, and lastly call the external theorem prover for verifying the correctness.”

“we can test more properties than we can formally verify!” because testing is not restricted to first-order logic, which the Lava system’s external theorem prover was restricted to

“A drawback is that we have to fix the types of these circuits” This is the “we have to test with concrete data, y’all” requirement again.

Section 5.4 Pretty Printing describes an interesting way of reaching a working Java implementation: write once functionally, write once using state + exceptions in a functional language, check equivalence, then port the state + exceptions model to an imperative language, generate test inputs, and check the final implementation.

“since the specification refers to the implementation, then the specification module must import the implementation currently under test” which is annoying for different implementations of the same abstract interface (and solved by ML functors)

“‘By taking 20% more points in a random test, any advantage a partition test might have had is wiped out.’” From [11]. Explains why we shouldn’t bother using random testing rather than carefully selected tests or automatically selected tests that meet some coverage criteria.

“often one can check that a programs output is correct much more efficiently than one can compute the output” - see [4] on “result-checking”; QuickCheck is simultaneously more general (can check associative property rather than output correctness) and more directed (exclude testing bad/irrelevant inputs)

grammar-directed testing - “context-free grammars cannot express all the - also one ugly thing to learn, and divorced from your primary language desired properties of test data” (frex def-before-use constraint)

“the Gen monad which QuickCheck is based on is not a monad at all!” - seed use/order matters - “morally” the same wrt distributions, but even still, observable differences in the output - “There is some interesting semantic theory to be done here.”

“What we cannot do is observe non-termination in a test result.” This is in part a limit of the compiler, which reports errors to humans that it won’t to programs.

“one of the major advantages of using QuickCheck is that it encourages us to formulate formal specications, thus improving our understanding of our programs” And QuickCheck provides a practical reason to write specifications where there wasn’t one before.

Errors in practice divide roughly evenly between:

  • bad data generators (useless to find, but necessary for testing to proceed)
  • bad program code (this is why we’re testing!)
  • bad specifications (reveal misconceptions about the program)

Spec writers need libs they wouldn’t for program code, frex finite set theory. Different requirements on the libraries: can use less performant but more concise functions in spec than in main program.

“The major limitation of QuickCheck is that there is no measurement of test coverage: it is up to the user to investigate the distribution of test data and decide whether sufficiently many tests have been run.” (bold emphasis mine)