Flutter Golden Tests That Don’t Flake

Flutter golden tests are screenshot-based tests that compare your widgets against a “golden” reference image. When they’re stable, they catch subtle visual regressions. When they’re flaky, they become noise. This guide shows how to make golden tests reliable by controlling fonts, themes, layout, and device settings.

Audience: Intermediate

Tested on: Flutter 3.x, Dart 3.x, macOS 14 / Windows 11, Android 14 emulators

What makes golden tests flaky?

Golden tests will flake whenever the rendered pixels depend on external or unstable factors. Typical culprits include:

  • Different fonts on different machines (system fonts, missing glyphs, font fallbacks).
  • Theme differences (platform brightness, text scale factor, platform density).
  • Layout jitter from unconstrained widgets, intrinsic sizing, or time-based animations.
  • A devicePixelRatio or surface size that varies between environments.

The fix is to lock down the environment: fonts, theme, size, and device pixel ratio should be the same every time.

Lock fonts: always use bundled fonts in tests

Golden tests should never depend on system fonts. Instead, bundle a test font and register it in your test bootstrap.

// test/flutter_test_config.dart
import 'dart:async';

import 'package:flutter/services.dart';
import 'package:flutter_test/flutter_test.dart';

// flutter test looks for a top-level function with this exact name.
Future<void> testExecutable(FutureOr<void> Function() testMain) async {
  TestWidgetsFlutterBinding.ensureInitialized();

  // Load a deterministic font for all golden tests.
  // The font file must be declared in pubspec.yaml so rootBundle can find it.
  final fontLoader = FontLoader('Roboto')
    ..addFont(rootBundle.load('assets/fonts/Roboto-Regular.ttf'));
  await fontLoader.load();

  await testMain();
}

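flutter test discovers this file automatically: before running a test file, it looks for flutter_test_config.dart in that file's directory and then in each parent directory up to the package root. No extra flag is needed:

flutter test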

In your golden tests, explicitly use that font family in ThemeData or TextStyle so text metrics are fully deterministic.
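If your widgets use more than one weight or family, the same loading step can be wrapped in a small helper. The following is a minimal sketch, not a definitive implementation: loadGoldenFonts, loadDefaultGoldenFonts, and the Roboto-Bold.ttf path are hypothetical names, and every file is assumed to be declared in pubspec.yaml.

```dart
import 'package:flutter/services.dart';

/// Hypothetical helper: registers every font file listed for each family.
/// Call it after TestWidgetsFlutterBinding.ensureInitialized().
Future<void> loadGoldenFonts(Map<String, List<String>> families) async {
  for (final entry in families.entries) {
    final loader = FontLoader(entry.key);
    for (final assetPath in entry.value) {
      // addFont accepts a Future<ByteData>, which rootBundle.load returns.
      loader.addFont(rootBundle.load(assetPath));
    }
    await loader.load();
  }
}

/// Example call with assumed asset paths; adjust to the fonts you bundle.
Future<void> loadDefaultGoldenFonts() {
  return loadGoldenFonts({
    'Roboto': [
      'assets/fonts/Roboto-Regular.ttf',
      'assets/fonts/Roboto-Bold.ttf', // assumption: not part of the original setup
    ],
  });
}
```

Calling loadDefaultGoldenFonts() from the bootstrap in test/flutter_test_config.dart keeps font setup in one place, so individual golden tests do not need to know which files exist.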

Pin themes, locale, and text scale

Render widgets inside a controlled MaterialApp / Theme wrapper instead of relying on platform defaults. For example:

// test/my_widget_golden_test.dart
// Golden paths are resolved relative to this file, so images land in test/goldens/.
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:my_app/my_widget.dart';

Widget wrapForGolden(Widget child) {
  return MaterialApp(
    debugShowCheckedModeBanner: false,
    theme: ThemeData(
      useMaterial3: true,
      fontFamily: 'Roboto',
      textTheme: const TextTheme(
        bodyMedium: TextStyle(fontSize: 14),
      ),
      colorSchemeSeed: const Color(0xFF0066CC),
      brightness: Brightness.light,
    ),
    locale: const Locale('en'),
    builder: (context, widget) {
      return MediaQuery(
        data: MediaQuery.of(context).copyWith(
          // On Flutter versions older than 3.16, use textScaleFactor: 1.0 instead.
          textScaler: TextScaler.noScaling,
          boldText: false,
        ),
        child: widget!,
      );
    },
    home: Scaffold(body: child),
  );
}

void main() {
  testWidgets('MyWidget golden', (tester) async {
    await tester.pumpWidget(
      wrapForGolden(const MyWidget()),
    );
    await tester.pumpAndSettle();

    await expectLater(
      find.byType(MyWidget),
      matchesGoldenFile('goldens/my_widget.png'),
    );
  });
}

By wrapping with a fixed theme and MediaQuery, you avoid surprises when CI runs on a different OS or when a developer has a different system text size.

Control surface size and devicePixelRatio

Different surface sizes or pixel ratios produce different images. Set them explicitly at the start of your tests:

// Assumes wrapForGolden and MyWidget from the previous example are in scope.
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('MyWidget golden', (tester) async {
    // 800x1600 logical pixels at 2x, i.e. 1600x3200 physical pixels.
    // tester.view replaces the deprecated binding.window test values.
    tester.view.devicePixelRatio = 2.0;
    tester.view.physicalSize = const Size(800, 1600) * 2.0;
    // Restore the default view settings once this test finishes.
    addTearDown(tester.view.reset);

    await tester.pumpWidget(wrapForGolden(const MyWidget()));
    await tester.pumpAndSettle();

    await expectLater(
      find.byType(MyWidget),
      matchesGoldenFile('goldens/my_widget_800x1600.png'),
    );
  });
}

Using a consistent logical size (for example Size(400, 800) or Size(800, 1600)) and a fixed devicePixelRatio removes surface size and pixel density as sources of differences between machines.
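To avoid repeating the size setup in every test, it can be moved into a shared helper. This is a sketch under assumptions: useGoldenSurface is a hypothetical name, and it relies on tester.view, which is available in recent Flutter 3.x releases.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

/// Hypothetical helper: pins the test surface to a logical size and
/// pixel ratio, and restores the defaults when the test ends.
void useGoldenSurface(
  WidgetTester tester, {
  Size logicalSize = const Size(400, 800),
  double devicePixelRatio = 2.0,
}) {
  tester.view.devicePixelRatio = devicePixelRatio;
  // physicalSize is expressed in physical pixels: logical size times ratio.
  tester.view.physicalSize = logicalSize * devicePixelRatio;
  addTearDown(tester.view.reset);
}

void main() {
  testWidgets('surface is pinned to 400x800 logical pixels', (tester) async {
    useGoldenSurface(tester);

    await tester.pumpWidget(
      const MaterialApp(home: Scaffold(body: SizedBox.expand())),
    );

    // The Scaffold fills the whole surface, so its size is the logical size.
    expect(tester.getSize(find.byType(Scaffold)), const Size(400, 800));
  });
}
```

Putting the reset in addTearDown inside the helper means a test cannot forget to undo the override, which keeps later tests in the same file unaffected.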

Make animations and time deterministic

Any animation that depends on time can cause flaky golden tests. Strategies include:

  • Disable animation in golden mode (e.g., show the “resting” state only).
  • Jump to a specific frame by calling tester.pump(const Duration(...)) until the widget reaches the desired state.
  • Use fake clocks or injected durations instead of DateTime.now(), so goldens don’t change with real time.
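The "jump to a specific frame" strategy can be sketched as follows. The widget and the golden file name are illustrative choices, not taken from a real project. The point is that an indeterminate animation never settles, so pumpAndSettle() would time out, while tester.pump with a fixed duration advances the test clock by a known amount.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('spinner golden at a fixed frame', (tester) async {
    await tester.pumpWidget(
      const MaterialApp(
        debugShowCheckedModeBanner: false,
        home: Scaffold(
          body: Center(child: CircularProgressIndicator()),
        ),
      ),
    );

    // Do not call pumpAndSettle() here: the spinner animates forever.
    // Advance the fake test clock by exactly 500 ms instead.
    await tester.pump(const Duration(milliseconds: 500));

    await expectLater(
      find.byType(CircularProgressIndicator),
      // Hypothetical file name; the suffix records which frame was captured.
      matchesGoldenFile('goldens/spinner_500ms.png'),
    );
  });
}
```

Because widget tests run on a fake clock, the same pump duration captures the same point in the animation on every run.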

For complex components (skeleton loaders, page transitions), design a static snapshot state specifically for golden tests.
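One way to get such a static state for time-dependent text is to inject the time source. The sketch below assumes a hypothetical LastUpdatedLabel widget with an optional now callback. Production code would omit the callback and fall back to DateTime.now, while the golden test passes a fixed instant.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

/// Hypothetical widget: the clock is a parameter so tests can freeze it.
class LastUpdatedLabel extends StatelessWidget {
  const LastUpdatedLabel({super.key, required this.updatedAt, this.now});

  final DateTime updatedAt;

  /// Optional time source; defaults to the real clock when null.
  final DateTime Function()? now;

  @override
  Widget build(BuildContext context) {
    final current = (now ?? DateTime.now)();
    final minutes = current.difference(updatedAt).inMinutes;
    return Text('Updated $minutes min ago');
  }
}

void main() {
  testWidgets('LastUpdatedLabel golden with a frozen clock', (tester) async {
    final fixedNow = DateTime.utc(2025, 1, 1, 12, 0);

    await tester.pumpWidget(
      MaterialApp(
        debugShowCheckedModeBanner: false,
        home: Scaffold(
          body: LastUpdatedLabel(
            updatedAt: DateTime.utc(2025, 1, 1, 11, 45),
            now: () => fixedNow,
          ),
        ),
      ),
    );

    // 12:00 minus 11:45 is always 15 minutes, whatever the wall clock says.
    expect(find.text('Updated 15 min ago'), findsOneWidget);
    await expectLater(
      find.byType(LastUpdatedLabel),
      matchesGoldenFile('goldens/last_updated_label.png'),
    );
  });
}
```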

Organizing golden files and updating them safely

Keep all golden images under a predictable directory, such as test/goldens/, and use consistent naming:

  • componentName_state_size.png — for example, login_form_empty_400x800.png.
  • Separate directories per feature if the project is large: test/features/auth/goldens/....
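The naming convention is easier to keep consistent when file names are built by a function rather than typed by hand. A minimal sketch, assuming a hypothetical goldenFileName helper and the goldens/ directory used in the earlier examples:

```dart
/// Hypothetical helper: builds "goldens/<component>_<state>_<w>x<h>.png".
String goldenFileName(String component, String state, int width, int height) {
  return 'goldens/${component}_${state}_${width}x$height.png';
}

void main() {
  // Prints: goldens/login_form_empty_400x800.png
  print(goldenFileName('login_form', 'empty', 400, 800));
}
```

The result can be passed straight to matchesGoldenFile, so the surface size used in a test and the size in the file name come from the same values.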

When you intentionally change the UI, you will need to update the goldens. Some teams use a script that:

  • Runs golden tests in “update” mode (flutter test --update-goldens) to regenerate expected images.
  • Shows a diff (image diff viewer) during code review so reviewers can confirm the UI change.
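Such a script could look roughly like the sketch below. It is an assumption-laden example rather than a standard tool: the tool/update_goldens.dart location and the changedGoldens helper are hypothetical, and it assumes the project is a git repository with flutter and git on the PATH.

```dart
// tool/update_goldens.dart (hypothetical location)
import 'dart:io';

/// Picks the lines of `git status --porcelain` output that refer to PNG files.
Iterable<String> changedGoldens(String porcelainOutput) {
  return porcelainOutput
      .split('\n')
      .map((line) => line.trimRight())
      .where((line) => line.endsWith('.png'));
}

Future<void> main(List<String> args) async {
  // Optional first argument: the test path to regenerate (defaults to all tests).
  final target = args.isEmpty ? 'test' : args.first;

  final process = await Process.start(
    'flutter',
    ['test', '--update-goldens', target],
    mode: ProcessStartMode.inheritStdio,
    runInShell: true, // lets Windows resolve flutter.bat
  );
  final code = await process.exitCode;
  if (code != 0) {
    exit(code);
  }

  // List regenerated images so they can be reviewed before committing.
  final status = await Process.run('git', ['status', '--porcelain']);
  final changed = changedGoldens(status.stdout as String).toList();
  stdout.writeln('Changed golden files: ${changed.length}');
  changed.forEach(stdout.writeln);
}
```

Run it with dart run tool/update_goldens.dart and review the listed images in your diff viewer before committing them.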

To learn more about the underlying framework, see the golden testing section in Flutter’s UI testing docs.

Pitfalls

  • Relying on system fonts: Results differ per machine. Always bundle and load fonts for tests.
  • Not fixing devicePixelRatio: CI machines and local machines may use different DPIs. Set explicit test values.
  • Golden files in lossy formats: Use PNG, not JPEG, to avoid compression artifacts.
  • Overusing golden tests: Use them for stable, shared components (buttons, cards, forms). For highly dynamic content, prefer widget tests with semantic assertions.
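For the last point, a widget test with finder-based assertions might look like this. It is a minimal sketch with hypothetical content: it checks what is shown rather than how every pixel is painted, so it keeps passing when fonts, spacing, or colors change.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('greeting shows the user name', (tester) async {
    // Hypothetical dynamic value; in a real app it would come from app state.
    const userName = 'Ada';

    await tester.pumpWidget(
      MaterialApp(
        home: Scaffold(body: Text('Hello, $userName')),
      ),
    );

    // Assert on content, not on pixels.
    expect(find.text('Hello, Ada'), findsOneWidget);
    expect(find.text('Hello, Bob'), findsNothing);
  });
}
```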

FAQ

Q: Should I use a helper package like golden_toolkit?

A: Packages like golden_toolkit can simplify configuration (devices, fonts, themes). They are great for larger projects, but the principles are the same: fixed fonts, fixed configs, and deterministic layouts.

Q: How big should the surface size be?

A: Big enough to show the component clearly, but not so big that tiny differences become hard to spot. Common choices are 400×800 or 800×1600 logical pixels depending on the component.

Q: Are golden tests worth the maintenance cost?

A: Yes, if you focus them on reusable design system components and critical flows. They catch regressions that snapshot-less tests simply cannot see, especially spacing, alignment, and theme changes.

Conclusion

Stable Flutter golden tests come from a fully controlled environment: bundled fonts, pinned themes, fixed sizes, and deterministic animations. Once those are locked down, goldens become a powerful safety net for visual regressions instead of a source of random failures.

Updated: 2025-10-30
