Understanding screen reader interaction modes

Windows screen readers have multiple modes of interaction, and depending on the task being carried out they’ll automatically switch to the most appropriate mode. This post explains why Windows screen readers behave the way they do, and how your code can influence that behaviour.

Virtual/browse mode

When a document is rendered in the browser, Windows screen readers like JAWS and NVDA access the Document Object Model (DOM) either directly or through the available accessibility APIs. The DOM is a hierarchical representation of the objects in the web-document, and the information that’s retrieved from it is augmented by the screen reader and displayed to the user as a virtual copy of the original.

By creating a virtual copy of the document, screen readers make it possible for blind people to interact with content in ways that would otherwise be impossible on the Windows platform. This happens because the screen reader intercepts most keypresses before they reach the browser, triggering an interaction with the virtual document instead.

For example the left/right cursor keys are intercepted and used to move focus to the previous/next character in the content, and the up/down keys move focus to the previous/next line instead of scrolling the page.

This behaviour also makes it possible to navigate through content using shortcut keys that are native to the screen reader. Most Windows screen readers follow a broadly similar shortcut convention: For example t moves focus to the next table, h to the next heading, l to the next list, g to the next graphic and so forth. It is also possible to open dialogues that list all the elements of a particular type – for example form controls or links.

In JAWS this mode of interaction is known as virtual mode, and in NVDA and Window-Eyes as browse mode. The copy of the original document is generally referred to as the virtual buffer.

Forms/focus mode

Not all keypresses are captured by the screen reader however. When the tab key is pressed it is automatically passed through to the browser where it causes keyboard focus to move to the next piece of interactive content, exactly as though the screen reader weren’t running. The same thing happens in other circumstances too, when the enter key is used to activate a link or the space key to select a checkbox for example.

This intelligent processing happens automatically and without the user being aware of it, but there are circumstances in which the user needs to know about a change of interaction style. When interacting with a text field or combobox the user needs to know that the keys they press will do something other than perform a screen reader navigation command – for example that h will type a character instead of move focus to the next heading, or that the down cursor key will select an option in a combobox instead of move to the next line of content.

In NVDA this mode of interaction is known as focus mode, and in JAWS it’s forms mode. Window-Eyes doesn’t give it a name, but simply refers to browse mode being off. There are subtleties to this mode of interaction though. For example, NVDA will automatically enter/exit focus mode when the tab key is used to move focus on/off the form field, but not if the cursor keys are used. JAWS will automatically enter/exit forms mode whichever method is used to move focus to the field, although as of JAWS 16 it’s possible to configure JAWS to ignore forms mode when navigating through content using the cursor keys. Both screen readers can also be forced to switch modes manually, and both indicate the switch in mode with an audible “click”.

There is one anomaly amongst form fields when it comes to forms/focus mode. Although it’s possible to select a radio button without switching modes, it is necessary to be in forms/focus mode in order to use the cursor keys to cycle through the radio buttons in a group. Being unaware of this can sometimes lead to the mistaken belief that a radio group is somehow flawed.
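As an illustration (this markup is mine, not from the post), a native radio group like the one below can have a button checked without a mode switch, but cycling between the options with the cursor keys only works in forms/focus mode:

```html
<!-- Hypothetical example: a plain, native radio group. -->
<fieldset>
  <legend>Subscribe to the newsletter?</legend>
  <label><input type="radio" name="subscribe" value="yes" checked> Yes</label>
  <label><input type="radio" name="subscribe" value="no"> No</label>
</fieldset>
```

Nothing custom is needed here; the behaviour described above is how screen readers treat native radio groups.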

Applications mode

Although this mode switching may seem unintuitive to someone unused to Windows screen readers, it works well in practice and most screen reader users are unaware of what’s happening “under the hood”. From a development point of view it’s far more important to understand something of screen reader mechanics though.

For the most part a screen reader will handle the different interaction modes automatically, providing the underlying code of the original document is robust semantic markup. All bets are off when it comes to custom/rich internet widgets though. A custom widget (like a menubar or tab set) is a web-based component that’s designed to behave like its counterpart in a software application. As a rule Windows screen readers don’t use a virtual buffer with software applications, so putting a custom widget in a web-document suddenly forces two screen reader paradigms into the same space.

A set of tabs is a good example: When interacting with a set of tabs in a software application, the left/right cursor keys cycle between each of the tabs in the set. When a set of tabs is transposed into a web-document the same interaction design pattern is supported by the script that provides the widget’s functionality. Herein lies the challenge though: A Windows screen reader will intercept the left/right keystrokes and use them to move focus within the virtual buffer, instead of passing them through to the browser to interact with the set of tabs.

ARIA (known as WAI-ARIA on formal occasions) is the solution. When certain ARIA roles are applied to custom widgets, they inform the screen reader that the element (or group of elements) has a specific purpose, and also that virtual/browse mode is not appropriate. The result is that the screen reader switches into applications mode and treats the widget as though it were a component of a software application.

To all intents and purposes, applications mode is the same as forms/focus mode – it causes the screen reader to pass keystrokes back through to the browser so they can fulfil their original purpose. For example, when the tablist and tab roles are used as part of the tab widget design pattern, using the tab key to move focus onto the first tab in the set causes a Windows screen reader to automatically switch into applications mode. From that point all the keyboard interaction is handled by the script. This does mean of course that the script driving the functionality of the widget has to be set up to handle keyboard interaction!
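As a rough sketch of that pattern (all names and markup here are illustrative assumptions, not code from the post), the roles go on the markup and the script decides where focus moves once applications mode hands the arrow keys back to the browser:

```javascript
// Minimal sketch of the keyboard logic for an ARIA tab widget.
// The function name and markup below are illustrative, not from the post.

// Pure helper: given the focused tab's index, the key pressed, and the
// number of tabs, return the index that should receive focus next.
// Left/right wrap around the ends of the set.
function nextTabIndex(current, key, count) {
  if (key === "ArrowRight") return (current + 1) % count;         // wrap to first
  if (key === "ArrowLeft") return (current - 1 + count) % count;  // wrap to last
  return current; // any other key is left for the browser to handle
}

// In the browser it might be wired up roughly like this:
//
// <div role="tablist">
//   <button role="tab" aria-selected="true">One</button>
//   <button role="tab" aria-selected="false">Two</button>
// </div>
//
// tablist.addEventListener("keydown", (event) => {
//   const tabs = Array.from(tablist.querySelectorAll('[role="tab"]'));
//   const next = nextTabIndex(tabs.indexOf(document.activeElement), event.key, tabs.length);
//   tabs[next].focus();
// });
```

The roles tell the screen reader to switch into applications mode; the script then has to do all the focus management itself, which is exactly the responsibility the post describes.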

With thanks to Hans Hillen.

11 comments on “Understanding screen reader interaction modes”

  1. Comment by Stomme poes

    Quote: “A set of tabs is a good example: When interacting with a set of tabs in a software application, the left/right cursor keys cycle between each of the tabs in the set… A Windows screen reader will intercept the left/right keystrokes and use them to move focus within the virtual buffer, instead of passing them through to the browser to interact with the set of tabs.”

    How do people go character by character in a Desktop Application tab (like if the tab has a weird word in it), and how does the developer ensure the user can do that on the web page?

    Or can we expect people to force their way out of application mode if they need to (as they’ve been doing for bad dialogs)?

  2. Comment by Marcy Sutton

    This is so timely. Thank you for writing!

    My question is about radio buttons: “Although it’s possible to select a radio button without switching modes, it is necessary to be in forms/focus mode in order to use the cursor keys to cycle through the radio buttons in a group.” This seems like a good case for role=”application” (which I typically avoid). Is it preferred to trigger forms mode as a developer by using such a role, or can we expect AT users to manually change modes? I would think the former would be a more seamless experience.

    1. Comment by Zach sigal

      Thanks for this tutorial, you just have one mistake: when you use a screen reader in browse mode, the ENTER key is intercepted and a click event happens.

  3. Comment by Léonie Watson

    @Stomme poes
    “How do people go character by character in a Desktop Application tab (like if the tab has a weird word in it), and how does the developer ensure the user
    can do that on the web page?”

    In a software application that level of navigation wouldn’t be the norm, but there are ways to do it when necessary. For example Jaws has the ability to virtualise an application window or dialogue, effectively creating a mini-virtual buffer. NVDA and Jaws also make it possible to use the mouse pointer to inspect what’s on-screen (as opposed to following the PC cursor/caret), which although not the most efficient way to navigate around, gets the job done under certain circumstances.

    On a web page it’s possible to manually exit forms/focus mode and/or applications mode. This is where the onus moves to the screen reader user though – they have to know about the different modes of interaction, and when they might need to switch in/out of them.

  4. Comment by Léonie Watson

    @Marcy Sutton
    “My question is about radio buttons: “Although it’s possible to select a radio button without switching modes, it is necessary to be in forms/focus mode in order to use the cursor keys to cycle through the radio buttons in a group.” This seems like a good case for role=”application” (which I typically avoid). Is it preferred to trigger forms mode as a developer by using such a role, or can we expect AT users to manually change modes? I would think the former would be a more seamless experience.”

    If you use native HTML form inputs then screen readers will handle mode switching automatically. If you use the radio/radio group roles, screen readers should behave the same way – although I haven’t tested thoroughly.

    Unless it’s a fully fledged web application, I’d stick to avoiding the application role as a rule.

  5. Comment by Marcy Sutton

    Thanks, Léonie. I definitely agree that native inputs make everyone’s lives easier–the developers and the users. However, I should have clarified that I was wondering more about custom HTML tags such as Angular.js directives or Web Components.

  6. Comment by Sam

    “Not all keypresses are captured by the screen reader however. When the tab key is pressed it is automatically passed through to the browser where it causes keyboard focus to move to the next piece of interactive content, exactly as though the screen reader weren’t running. The same thing happens in other circumstances too, when the enter key is used to activate a link or the space key to select a checkbox for example.”
    Actually some screen readers will fire the onClick event handler when space or enter is pressed on an element while in virtual/browse mode. This is one reason why an element can be activated via the keyboard but not triggered from the keyboard when a screen reader is running.
    Also check out my post at https://www.ssbbartgroup.com/blog/2013/04/08/how-windows-screen-readers-work-on-the-web

  7. Comment by Gary R

    I am attempting to integrate a voice recognition product called Voiceattack (VA) in a computer that has Freedom Scientific’s Jaws installed. The idea is for a blind operator to speak a command to VA and have VA execute keystrokes or series of keystrokes to accomplish a task.

    VA can give verbal confirmation during its execution of the required keystrokes. Of course Jaws is also giving verbal data, resulting in a confused message to the operator. I have attempted to mute Jaws during the task execution and un-mute it when the task is complete. Muting is done in Jaws with INSERT+SPACE followed by the S key. Jaws does not recognize any keyboard commands sent by VA.

    I have experimented with different options in both VA and Jaws with no success. I am sure the issue relates to the information in the article somehow. The VA keystrokes are not getting to the Jaws interface.

    Any ideas on how I should attack this issue? I’m open to trying anything from a API in VA to keyboard setups in Jaws. I just need some new ideas on what to try next.

    Thanks Gary R

    1. Comment by Léonie Watson

      To mute Jaws you will need to create Jaws scripts for the voice input application. Jaws has a proprietary scripting language:
      http://www.freedomscientific.com/Content/Documents/Other/ScriptManual/01-0_Introduction.htm
      AFAIK it isn’t possible to control Jaws externally.

      If you’re finding that the keyboard commands needed to use the voice input application with Jaws are not working (assuming it is a web based application), it may be because the JavaScript is running interference. More on this here:
      http://tink.uk/time-to-revisit-accesskey/

      1. Comment by Gary R

        It appears that the script route seems to be the viable solution to my issue.

        Now I have to get my arms around the scripting language and process logic used by Jaws. Another language to learn and understand but I have been doing computer management and programming for over 40 years so I do have a head start.
        Just have to work with it till the light bulb goes on and all will be fine.

        Thanks for the support. It seems like the UK has a lot of dedicated, intelligent individuals like yourself working in this arena to make life a lot easier for blind people – I applaud your efforts.
        The individual I am working with loves the voice input operations of the computer already. Now I just have to smooth the rough edges of the integration between the two systems to make the experience more enjoyable.

        1. Comment by Gary R

          Well I made this work using information from various internet searches, using a Visual Basic script. The documentation JAWS provides leaves a lot to be desired, at least for me, and I have been programming since 1965. It sure could use a lot more examples. So I am not real sure about the parameter values and the choices I have.

          Sound off

          Option Explicit
          Dim oJaws
          Set oJaws = CreateObject("FreedomSci.JawsApi")
          oJaws.SayString " ", 1
          ' Run a function
          oJaws.RunFunction "SpeechOff(0)"
          Set oJaws = Nothing

          The SayString call with a single space and a 1 is supposed to flush the Jaws speech buffer, and then SpeechOff(0) turns the speech off.
          Not real sure if that parameter should be 0 or the word False.

          Sound On

          Option Explicit
          Dim oJaws
          Set oJaws = CreateObject("FreedomSci.JawsApi")
          oJaws.RunFunction "SpeechOn()"
          Set oJaws = Nothing

          This turns the sound on; again, not real sure what should be in the parameter for the SpeechOn function.

          This seems to do what I need: Jaws speech is turned off, the voice control sound plays without competing with Jaws, and then when the voice control accomplishes its task, speech is turned back on. If the voice control task was to open a program, Jaws starts giving detailed information about that program. The programs are very small.

          I tried to use the StopSpeech function but Jaws does not recognize it as a valid function. I’m using Jaws 16.0 in demo mode.

          Gary
