Automating Mac app screenshots

Feb 27

I’ve had a few people asking me how I automate taking screenshots of Octavo, my Mac app for imposing PDFs (now live on the Mac App Store!). So I thought I’d write a few things up.

The app configures itself

It’s helpful if the app knows you’re taking screenshots, so it can be on its best behaviour. The app checks for the launch argument --screenshot-mode in applicationDidFinishLaunching() (via ScreenshotModeController):

// AppDelegate.swift

func applicationDidFinishLaunching(_ aNotification: Notification) {
    // ...

    // Check for screenshot mode
    if ScreenshotModeController.shouldActivate {
      ScreenshotModeController.shared.activate()
      // Don't show onboarding in screenshot mode
      return
    }

    // ...
}

I suppress document restoration and the default untitled document:

func application(_ app: NSApplication, shouldRestoreSecureApplicationState coder: NSCoder) -> Bool {
    if ScreenshotModeController.shouldActivate {
      return false
    }
    return true
}

func applicationShouldOpenUntitledFile(_ sender: NSApplication) -> Bool {
    if ScreenshotModeController.shared.isActive {
      return false
    }
    // ...
}

The ScreenshotModeController detects the launch arguments, hides the dock, creates a document window at a pleasing size (with a 40pt border around the screen edges), and loads a sample PDF:

class ScreenshotModeController {

  static let shared = ScreenshotModeController()

  // ... (isActive, isEmptyMode and the document setup methods omitted)

  static var shouldActivate: Bool {
    let args = ProcessInfo.processInfo.arguments
    return args.contains("--screenshot-mode") || args.contains("--screenshot-mode-empty")
  }

  func activate() {
    guard !isActive else { return }
    isActive = true

    // Hide the dock
    NSApp.presentationOptions = [.hideDock]

    // Create the document window after a short delay to ensure UI is ready.
    // This may be a code smell, but it only runs in screenshot mode!
    DispatchQueue.main.asyncAfter(deadline: .now() + 0.5) { [weak self] in
      guard let self = self else { return }
      if self.isEmptyMode {
        self.setupEmptyDocument()
      } else {
        self.setupDocumentWithSamplePDF()
      }
    }
  }

  private func configureWindowFrame(_ window: NSWindow?) {
    guard let window = window, let screen = NSScreen.main else { return }

    let visibleFrame = screen.visibleFrame
    let border: CGFloat = 40
    let bottomBorder: CGFloat = 40 + 33

    let windowFrame = NSRect(
      x: visibleFrame.origin.x + border,
      y: visibleFrame.origin.y + bottomBorder,
      width: visibleFrame.width - (border * 2),
      height: visibleFrame.height - border - bottomBorder
    )

    window.setFrame(windowFrame, display: true)
  }
}

Note the asymmetric bottom border. The screenshots will later be cropped to App Store dimensions, and the crop removes pixels from the bottom. The extra 33pt ensures the window is still in the visible area after cropping.

Octavo uses in-app purchase — some imposition styles show a "TRIAL" banner until Octavo has been unlocked, and there's also a big purchase button in the window titlebar. I don't want to show those on the App Store listing. So the StoreManager also checks for the screenshot launch arguments and treats the app as fully unlocked:

private(set) var isUnlocked: Bool =
    ProcessInfo.processInfo.arguments.contains("--screenshot-mode")
    || ProcessInfo.processInfo.arguments.contains("--screenshot-mode-empty")

Accessibility identifiers

Accessibility identifiers are essentially a way for automation tech to refer to UI controls and elements in your app. They're called 'accessibility' identifiers because they belong to the same accessibility API that assistive tech like screen readers relies on, but their main use is in automated UI tests.

The value you put in them is something that uniquely identifies a control. It won't be shown to a user, even if they're using a screen reader.

For example, take the thumbnails in Octavo's page strip. They're all views inside a collection view. On each one, I do something like this:

view.setAccessibilityIdentifier("pageStrip.page.\(pageNumber - 1)")
view.setAccessibilityElement(true)
view.setAccessibilityRole(.button)

You'll need accessibility identifiers on anything you want your automated tests to be able to click on or interact with.

Writing the UI tests

The actual screenshot capture uses Xcode's XCUITest framework. If you haven't used it before, it's Apple's UI testing framework — it launches your app as a separate process and interacts with it like a user would, finding elements by their accessibility identifiers and clicking on them.

Here's the basic structure. Each test method launches the app, navigates to the state I want to show, then takes a screenshot:

final class OctavoScreenshotTests: XCTestCase {

  var app: XCUIApplication!

  override func setUpWithError() throws {
    continueAfterFailure = false
  }

  override func tearDownWithError() throws {
    app?.terminate()
  }

  private func launchApp(empty: Bool = false) {
    app = XCUIApplication()
    app.launchArguments = empty ? ["--screenshot-mode-empty"] : ["--screenshot-mode"]
    app.launch()

    // Wait for the document window to appear
    let window = app.windows.firstMatch
    XCTAssertTrue(window.waitForExistence(timeout: 10))

    // Hide other apps so they don't photobomb
    app.typeKey("h", modifierFlags: [.command, .option])
  }
}

The launchArguments line is how the screenshot mode launch argument gets passed to the app. XCUITest launches your app as a subprocess, so any arguments you set here end up in ProcessInfo.processInfo.arguments on the app side.

Each test method is straightforward: launch the app, click around to get the UI into the right state, then take a screenshot. Here's a simple one:

func testScreenshot_02_SourcePage_SourceStrip() throws {
    launchApp()

    // Select page 2 in the page strip
    clickElement(withIdentifier: "pageStrip.page.1")

    takeFullScreenshot(named: "02_SourcePage_SourceStrip")
}

And here's one that configures imposition settings before capturing:

func testScreenshot_03_Imposition_PreviewStrip() throws {
    launchApp()

    clickElement(withIdentifier: "pageStrip.page.0")
    clickElement(withIdentifier: "sidebar.impositionType")

    // Set trim finishing to 220x140mm
    configureTrimFinishing(width: 220, height: 140)

    // Switch the page strip to preview mode
    clickElement(withIdentifier: "pageStripMode.preview")

    takeFullScreenshot(named: "03_Imposition_PreviewStrip")
}

The clickElement helper finds an element by accessibility identifier within the main window, waits for it to exist, and clicks it:

private func clickElement(withIdentifier identifier: String) {
    let window = app.windows.firstMatch
    let element = window.descendants(matching: .any)[identifier].firstMatch
    XCTAssertTrue(element.waitForExistence(timeout: 5))
    element.tapUnhittable()
}

You might notice the tapUnhittable() call there — I'll explain that one shortly.

Taking the screenshot

The actual screenshot capture is pleasingly simple:

private func takeFullScreenshot(named name: String) {
    // Wait for animations to finish
    Thread.sleep(forTimeInterval: 2.0)

    let screenshot = XCUIScreen.main.screenshot()
    let attachment = XCTAttachment(screenshot: screenshot)
    attachment.name = name
    attachment.lifetime = .keepAlways
    add(attachment)
}

XCUIScreen.main.screenshot() captures the entire screen — not just the app window but the menu bar and wallpaper too, which is what I want for App Store screenshots. The result is saved as an XCTAttachment into Xcode's test results bundle. Setting .keepAlways is important; otherwise Xcode might discard the attachment if the test passes (the default is to only keep attachments from failing tests).

The 2-second sleep is necessary — I need animations to finish before capturing, and Thread.sleep is the simplest way to do that. Since this code only runs for screenshot generation, I'm not too worried about elegance.

Test naming and ordering

The test methods are named with numbered prefixes: testScreenshot_01_EmptyState, testScreenshot_02_SourcePage_SourceStrip, and so on. The test names become the attachment names and, eventually, the filenames, and these numbers are the order I want them on the website.

I also have a separate test plan file (OctavoScreenshots.xctestplan) that only includes the screenshot test class:

{
  "testTargets" : [
    {
      "parallelizable" : false,
      "selectedTests" : [ "OctavoScreenshotTests" ],
      "target" : { "name" : "OctavoUITests" }
    }
  ]
}

Setting parallelizable to false means the tests run sequentially in a single process. This matters because each test takes over the screen — running them in parallel would be chaos.

The main Octavo.xctestplan used for regular testing does not include the UI test target. This means I can run my normal test suite without accidentally triggering the screenshot tests, which would take over my screen and generally ruin my afternoon.

tapUnhittable()

XCUITest's built-in .tap() method does a hit test before tapping. If the element reports itself as "not hittable" — maybe it's behind another view, or it's a custom control that doesn't play nicely with the accessibility system — the tap silently fails. This is annoying to debug, because the test doesn't necessarily fail; it just doesn't do what you expect.

My workaround is tapUnhittable(), which skips the hit test and taps at the element's centre coordinate directly:

extension XCUIElement {
  public var center: XCUICoordinate {
    coordinate(withNormalizedOffset: CGVector(dx: 0.5, dy: 0.5))
  }

  public func tapUnhittable() {
    XCTContext.runActivity(named: "Tap \(self) by coordinate") { _ in
      center.tap()
    }
  }
}

It computes the element's centre using coordinate(withNormalizedOffset:) and taps that coordinate. The XCTContext.runActivity wrapper is optional but useful — it adds a named entry to the test log, so when things go wrong you can see exactly which tap was attempted.

I use tapUnhittable() everywhere instead of .tap(). To be clear, it would (probably) be possible to fix my UI elements to work with regular tap(). But I haven't bothered to put in the time to diagnose some of the issues, and this way works fine as a workaround!

I also have a tapAt(normalizedX:) variant for hitting specific points within an element — handy for segmented controls where you need to tap the left or right segment:

public func tapAt(normalizedX: CGFloat) {
    XCTContext.runActivity(named: "Tap \(self) at x=\(normalizedX)") { _ in
      coordinate(withNormalizedOffset: CGVector(dx: normalizedX, dy: 0.5)).tap()
    }
}

// Tap the left segment of a two-segment control
previewPaneModeControl.tapAt(normalizedX: 0.25)

// Tap the right segment
previewPaneModeControl.tapAt(normalizedX: 0.75)

The orchestration script

The UI tests handle capturing screenshots, but there's a lot of surrounding work: setting the clock, configuring the desktop, running the tests, extracting the images, and cropping them. A bash script ties it all together.

Setting the scene

Before running any tests, the script needs the desktop to look presentable. That means a fixed clock time, no desktop icons, and a specific wallpaper.

I went for 24th January for its Mac significance, and 9:41 for similar Apple reasons. (Arguably, since Octavo is a Mac app, I should have picked a time related to the original Mac introduction, but I didn't think anyone would 'get it'.)

Setting the clock can be done through UserDefaults:

defaults write com.apple.menuextra.clock ForceClock -string "2026:01:24:09:41"
killall SystemUIServer

This is a defaults key that forces the menu bar clock to display a fixed time. The killall SystemUIServer restarts the process that draws the menu bar, so the change takes effect immediately.

Hiding desktop icons is similar:

defaults write com.apple.WindowManager StandardHideDesktopIcons -bool true
killall Dock

And for the wallpaper, I use AppleScript to poke at System Settings. This part is fragile — Apple can (and does) rearrange the System Settings UI hierarchy between macOS versions, which breaks the AppleScript. But it works today:

# Open System Settings to the wallpaper pane
open "x-apple.systempreferences:com.apple.Wallpaper-Settings.extension"
sleep 3

osascript << EOF
tell application "System Events"
    tell process "System Settings"
        set popupBtn to pop up button 1 of group 1 of scroll area 1 ¬
            of group 1 of group 3 of splitter group 1 of group 1 of window 1
        click popupBtn
        delay 0.3
        click menu item "Dark (Still)" of menu 1 of popupBtn
    end tell
end tell
tell application "System Settings" to quit
EOF

The script also saves the current appearance (dark/light mode) at the start, so it can restore everything when it's done. This is important because the script is going to switch between dark and light mode for the two sets of screenshots.
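One way to save and switch the appearance, sketched here under assumptions rather than copied from the real script: the global AppleInterfaceStyle default reads "Dark" in dark mode and is absent in light mode, and System Events exposes a dark mode property via AppleScript. The function names current_appearance and set_appearance are hypothetical.

```shell
#!/usr/bin/env bash

# Print "dark" or "light" for the current system appearance.
# `defaults read -g AppleInterfaceStyle` prints "Dark" in dark mode and
# fails (the key is absent) in light mode.
current_appearance() {
    if [ "$(defaults read -g AppleInterfaceStyle 2>/dev/null)" = "Dark" ]; then
        echo "dark"
    else
        echo "light"
    fi
}

# Switch the system appearance. Pass "dark" or "light".
set_appearance() {
    local dark_flag="false"
    if [ "$1" = "dark" ]; then
        dark_flag="true"
    fi
    osascript -e "tell application \"System Events\" to tell appearance preferences to set dark mode to $dark_flag"
}
```

The idea would be to capture the output of current_appearance in a variable at the start, and pass it back to set_appearance at the end. Note that this sketch does not account for the automatic (time-of-day) appearance setting.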

Running the tests

The actual test run is a single xcodebuild command:

xcodebuild test \
    -scheme Octavo \
    -testPlan OctavoScreenshots \
    -destination 'platform=macOS,arch=arm64' \
    -resultBundlePath "$PROJECT_DIR/TestResults.xcresult"

The -testPlan OctavoScreenshots flag tells xcodebuild to use the screenshot-specific test plan rather than the default one. The -resultBundlePath flag puts the results bundle in a known location so we can find it later.

The script deliberately does not use set -e (which would make it bail on the first error). If one screenshot test fails, I still want to extract the screenshots from the tests that passed, and I still want to go on and do the light mode run. The script checks the exit code but always continues:

local exit_code=${PIPESTATUS[0]}

if [ $exit_code -ne 0 ]; then
    echo "Some tests failed (exit code: $exit_code)."
    echo "Continuing to extract screenshots from passing tests."
fi

return 0
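PIPESTATUS only matters when the command is part of a pipeline, which the xcodebuild excerpt above doesn't show. Here is a minimal sketch of the pattern, assuming the output is piped through tee for logging. fake_xcodebuild is a hypothetical stand-in that always fails, so the example can run anywhere bash is available.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the xcodebuild command: prints a line and fails.
fake_xcodebuild() {
    echo "Testing failed"
    return 65
}

run_tests() {
    local log_file=$1

    # Assumption: the output is piped through tee so it can be logged.
    fake_xcodebuild 2>&1 | tee "$log_file"

    # $? would be tee's status here; PIPESTATUS[0] is the first command's.
    local exit_code=${PIPESTATUS[0]}

    if [ "$exit_code" -ne 0 ]; then
        echo "Some tests failed (exit code: $exit_code)."
    fi

    # Always report success so the caller carries on to extraction.
    return 0
}
```

Without PIPESTATUS, the script would see tee's exit status (almost always 0) and never notice a failed test run.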

Extracting the screenshots

The screenshots are stored inside the .xcresult bundle that xcodebuild produces. Getting them out requires xcresulttool:

xcrun xcresulttool export attachments \
    --path "$xcresult" \
    --output-path "$temp_dir"

This exports all test attachments (our screenshots) to a temporary directory, along with a manifest.json that maps filenames to human-readable names. The exported files have UUID-based names, so I use a Python script to parse the manifest and rename them:

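import json
import os
import shutil

# temp_dir and output_dir are set earlier in the script (not shown)
with open(os.path.join(temp_dir, "manifest.json")) as f:
    manifest = json.load(f)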

for test in manifest:
    for attachment in test.get("attachments", []):
        uuid_name = attachment.get("exportedFileName", "")
        suggested_name = attachment.get("suggestedHumanReadableName", "")

        if uuid_name and suggested_name:
            # suggested_name looks like "01_EmptyState_0_AF006D57-...-.png"
            # We want just "01_EmptyState.png"
            parts = suggested_name.rsplit("_", 2)
            clean_name = parts[0] + ".png" if len(parts) >= 3 else suggested_name

            shutil.copy2(os.path.join(temp_dir, uuid_name),
                         os.path.join(output_dir, clean_name))

Cropping for the App Store

The Mac App Store expects screenshots at 2880×1800 pixels (among a few other possible sizes). My MacBook Air's display captures at 3420×2214, which is both too large and the wrong aspect ratio. So the script scales and crops each screenshot using sips, which is macOS's built-in image processing tool:

for file in "$dir"/*.png; do
    # Scale width to 2880px (height scales proportionally to ~1865px)
    sips --resampleWidth 2880 "$file" --out "$file"

    # Crop to 1800px tall, keeping from the top
    sips --cropOffset 1 0 --cropToHeightWidth 1800 2880 "$file" --out "$file"
done

The scale gives us something about 1865px tall, and the crop removes 65px from the bottom. This is why the ScreenshotModeController uses a slightly larger bottom border (40 + 33 = 73pt): those extra 33pt mean that the window still looks approximately centred after the bottom is cropped away.

--cropOffset 1 0 anchors the crop to the top-left corner. I'm using 1 0 rather than 0 0 because sips interprets 0 0 as "centre the crop" rather than "offset of zero" for some odd reason.
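The scaling and border arithmetic can be checked with a few lines of shell. This is a sketch using integer division, so it lands within a pixel of the figures quoted above. It assumes the capture is at a 2x backing scale (3420px wide for a 1710pt-wide screen), which the post doesn't state; scaled_height and border_px are hypothetical helper names.

```shell
#!/usr/bin/env bash

# Height in pixels after scaling a capture to a new width (integer division).
scaled_height() {
    local src_w=$1 src_h=$2 dst_w=$3
    echo $(( src_h * dst_w / src_w ))
}

# Convert a border in points to pixels in the final, scaled image.
# Assumption: the capture is at a 2x backing scale (3420 px = 1710 pt).
border_px() {
    local points=$1 src_w=$2 dst_w=$3
    echo $(( points * 2 * dst_w / src_w ))
}

height=$(scaled_height 3420 2214 2880)
cropped=$(( height - 1800 ))
top=$(border_px 40 3420 2880)
bottom=$(( $(border_px 73 3420 2880) - cropped ))

echo "scaled height: ${height}px, rows cropped: ${cropped}px"
echo "top border: ${top}px, bottom border after crop: ${bottom}px"
```

Under those assumptions this gives a scaled height of 1864px with 64 rows cropped, leaving a 67px border at the top and 58px at the bottom: close enough to look even.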

Both modes, then clean up

The script runs this whole process twice — once for dark mode, once for light mode. Screenshots end up in ~/Desktop/Screenshots/DarkMode/ and ~/Desktop/Screenshots/LightMode/.

When it's done, it restores everything: the original appearance setting, desktop icons, wallpaper, and clock. There's also a trap cleanup EXIT to make sure the clock and desktop icons are restored even if you Ctrl+C out of the script partway through (being left with a fake clock and no icons was driving me mad while I was trying to get all of this working!).
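For anyone who hasn't used trap before, here is a minimal sketch of the pattern. The cleanup body is a hypothetical placeholder that just records that it ran; the real one would undo the clock, icon, wallpaper and appearance changes. STATE_FILE and run_with_cleanup are also names invented for this example.

```shell
#!/usr/bin/env bash

# Hypothetical restore step. The real one would undo the desktop changes;
# this one just records that it ran.
cleanup() {
    echo "restored" > "$STATE_FILE"
}

# Run a command with the EXIT trap installed. The subshell keeps the trap
# local to this call, so it fires as soon as the command finishes or exits.
run_with_cleanup() {
    (
        trap cleanup EXIT
        echo "modified" > "$STATE_FILE"
        "$@"
    )
}
```

In a top-level script the subshell isn't needed: a single trap cleanup EXIT near the top is enough, and the function runs whenever the script exits, whether it finished normally or bailed out early.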

The sample PDF

The app generates its own sample PDF for the screenshots. ScreenshotPDFGenerator creates an 8-page A5 document at runtime using Core Graphics — a title page, some lorem ipsum in two-column layout, a placeholder image, a pull quote, and some back matter. It's nothing fancy, but it looks plausible in a thumbnail.

Future improvements?

I'm sure my workflow could be better. It's a bit time-consuming to run with all the delays, so I could go through and work out exactly where they are needed. (Or even disable animations in the app when running in screenshot mode!) But for now this'll do. My website build script also refers to these screenshots, so whenever I rerun the screenshot script and then rebuild my site, the screenshots are updated there too.

Amy Worrall