Mobile text input has provided a fertile field for tinkerers over the past four decades. Flurries of invention and research have followed the introduction of the Touch-Tone phone, the expansion (and contraction) of the PDA market, and the ubiquity of mobile phones with SMS. A new flurry should be expected with the advent of wireless Internet devices.
At least since Graffiti® and T9® made their commercial debuts, a continuous parade of “new and improved” input systems for pocket-sized devices has appeared. MacKenzie and others developed metrics to better compare all of those keypad input methods. [1] Numerous key layouts and input devices have been proposed and measured, with champagne prizes showing that even human performance testing may be open to innovation.
But it seems I still see, every year, at least one graduate student project determining the “optimal arrangement of letters” on a touchscreen or 12-key phone keypad! And last year I received a request for concept validation services from someone developing yet another Mobile Keyboard That Will Change The World.
Is there anything new? I don’t think so. Just old ideas in new clothes, to borrow a phrase. [2]
Means to an End
The ultimate goal of mobile text input is to type, as fast as you can think, on a mobile device with limited real estate. Nearly all existing solutions include attributes that hinder or detract from that goal, and/or feature a user interface that draws some of the user’s attention away from more pressing needs – such as watching where he or she is walking. [3]
(By the way, there are lots of summaries of mobile input methods available. A write-up by I. Scott MacKenzie [4] or Poika Isokoski [5] is a good place to start.)
The problem is, there are only so many ways to solve it. There are only so many core technologies in your toolbox to work with, and each must trade off maximum potential human performance against the inertia of existing standards and “good enough” solutions. [6]
Referring to the following table, one way to map the mobile text input invention space to date is along two dimensions: the type of input hardware and the disambiguation/correction strategy. (A sample of existing solutions is listed for each combination.)
| Input hardware | Per Letter | Per Word |
| --- | --- | --- |
| Keypad (a small number of keys) | (Including two-key, multi-tap, chording, and N-way key solutions) Touch-Tone variants, “SMART” mode, Data Bank watch, multi-function calculator, Microwriter/Twiddler/ChordTap, data gloves, hat switch | Bell Labs, T9 et al., SHK |
| Keyboard (a number of small keys) | BlackBerry (QWERTY), Fastap | XT9 (“SloppyType”) |
| Touchscreen, trackpad, digitizing tablet (tap) | Soft keyboard (QWERTY, OPTI, Metropolis), Dasher | XT9, iPhone, Android |
| Touchscreen, trackpad, digitizing tablet (gesture) | Unistrokes/Graffiti, EdgeWrite, Cirrin, Quikwriting | Transcriber, SHARK, Swype, T9 Trace |
| Joystick, wheel | TwoStick, Nokia 7280 spinner, iPod Click Wheel, KeyStick, Dasher, nScribe, EdgeWrite | SloppyType for joysticks |
| Tilt accelerometer | TiltType, TiltText | |
Choose one readily available (hand-operated) input device with mobile proportions:
- keypad (a small number of keys) or keyboard (a number of small keys)
- touchscreen or trackpad
- joystick
- tilt accelerometer.
Or, if you’re daring [7], an up-and-coming technology like:
- eye tracking
- virtual reality (data gloves, etc.)
- brainwave detection.
Then choose one disambiguation/correction strategy:
- per-letter (explicit input)
- per-word (ambiguous input).
I’ve grouped the input hardware, and alternatives like speech recognition, for further consideration below. Each disambiguation/correction strategy has its pros and cons.
Per-letter explicit input methods resemble our desktop typing experience: lock down or correct each letter before moving on. They are mostly deterministic, and many are “eyes free” (like the Twiddler [8]). But because of the reduced space, these methods trade off speed against accuracy, or they impose a learning curve on impatient users. And international character sets reveal the naïve simplicity of solutions optimized for only 26 letters.
Per-word ambiguous input, which typically uses the entire input sequence (so far) to disambiguate and offer the most likely word, provides benefits like automatic letter accenting. But it is challenged by limited dictionary space, higher visual attention requirements, and sensitivity to spelling errors and typos, and it is overall less “intuitive” than per-letter input.
Somewhere in-between, perhaps, is syllable-based input. [9] Mobile text input for Chinese, Japanese, and Korean gravitates towards this middle ground because of the morpho-syllabic [10] nature of Chinese characters, but the stenotype machine is syllable-based as well.
Let’s ignore word/phrase completion [11] and abbreviation expansion. [12] Not only is word completion a well-known technology, it can be applied to almost any input method to reduce the total number of inputs. [13]
Mechanical Keys
Ironically, even after more than a decade of text messaging, keys on mobile devices are still not optimized for text input. [14] But because people still use a mobile phone mostly for voice, manufacturers are obligated to ensure that the dialing digits 1-9 and 0 are simple keypresses. That imposes another constraint on “innovative” mechanical-key-based solutions.
Here are the typical key-based approaches and a few alternatives.
5- to 20-key
Keypads with fewer keys than letters need disambiguation – whether the keypad uses the original Touch-Tone layout, the semi-QWERTY of RIM’s SureType®, optimized letter arrangements like JustType®, the display-mapped keys of TNT, [15] or a one-handed keyboard like the original Microwriter/CyKey and others. [16]
For per-letter explicit input, the methods are:
- Two-key, i.e., press the ABC key and then the 3 key for “c” [17]
- Multi-tap / multi-press / triple-tap, i.e., press the ABC key three times quickly for “c” (typically relying on a timeout, which is a particular challenge for older users; see the timing sketch after this list) [18]
- Chording, i.e., press the ABC key and a (third) auxiliary key simultaneously for “c”.
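For illustration, here is a minimal multi-tap decoder sketched in Python. The key-to-letter table is the standard phone mapping, but the 1.0-second timeout and the (key, timestamp) input format are arbitrary assumptions for the example, not any particular phone’s behavior.

```python
# Minimal multi-tap sketch: repeated presses of one key cycle through its
# letters; a timeout or a different key commits the current letter.
# The 1.0 s timeout is an illustrative assumption.

KEY_LETTERS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

TIMEOUT = 1.0  # seconds between presses before the letter is committed


def decode_multitap(presses):
    """presses: list of (key, timestamp) tuples -> decoded text."""
    text = []
    last_key, count, last_time = None, 0, None
    for key, t in presses:
        same_key = key == last_key
        within = last_time is not None and (t - last_time) <= TIMEOUT
        if same_key and within:
            count += 1                      # keep cycling on the same key
        else:
            if last_key is not None:        # commit the previous letter
                letters = KEY_LETTERS[last_key]
                text.append(letters[(count - 1) % len(letters)])
            last_key, count = key, 1
        last_time = t
    if last_key is not None:                # commit the final letter
        letters = KEY_LETTERS[last_key]
        text.append(letters[(count - 1) % len(letters)])
    return "".join(text)


# "c" is three quick presses of the 2 key; "aa" needs the timeout to elapse.
print(decode_multitap([("2", 0.0), ("2", 0.3), ("2", 0.6)]))   # -> "c"
print(decode_multitap([("2", 0.0), ("2", 1.5)]))               # -> "aa"
```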
A further alternative is to allow each key to generate multiple keycodes, e.g., when pressed in a direction other than straight down. But few manufacturers have shipped such a keypad, [19] likely due to its increased cost and lower reliability.
Word-based disambiguation, often called predictive text [20] and appearing on most mobile phones, [21] offers nearly one keypress per letter:
- Press the ABC key followed by the ABC and TUV keys for the word “cat”.
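As a rough sketch of how such word-level disambiguation can work (not T9’s actual implementation), the following Python fragment maps an ambiguous key sequence onto a tiny, invented word list; a real product would use a large, frequency-ranked linguistic database per language.

```python
# Sketch of keypad word disambiguation: a key sequence such as 2-2-8
# matches every dictionary word whose letters fall on those keys.
# The word list (and its frequency ordering) is purely illustrative.

KEY_LETTERS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEY_LETTERS.items() for ch in letters}

WORDS = ["cat", "act", "bat", "cave", "date", "eat"]  # most frequent first


def candidates(key_sequence):
    """Return dictionary words matching the ambiguous key sequence, in rank order."""
    return [w for w in WORDS
            if len(w) == len(key_sequence)
            and all(LETTER_TO_KEY[ch] == k for ch, k in zip(w, key_sequence))]


print(candidates("228"))  # -> ['cat', 'act', 'bat']; the user accepts the first or picks another
```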
QWERTY [22]
The original BlackBerry keyboard’s design made it possible for at least some large-thumbed executives to type on the tiny keys of a complete QWERTY keyboard. The visual familiarity of its layout gives QWERTY a distinct advantage. [23]
But there are regionality problems: QWERTY is a not-quite-standard standard, including dozens of international variants. [24] Wouldn’t it be a further act of North American hubris to impose the QWERTY layout on the rest of the world?
Stylus or Finger
Okay, let us say that, as far as novelty goes, mechanical keys are tapped out. [25] What about input methods for touch-sensitive surfaces? The iPhone has certainly renewed manufacturer interest in touchscreens for high-end mobile phones, though touch-sensitive input devices still face a few practical challenges. [26]
Text input methods for such devices may have started back in the era of the light pen; here are the most common approaches for smartphones and PDAs.
Touch, Tap, 2D array
The most ubiquitous input method is the soft keyboard version of QWERTY, though various algorithms have produced optimized layouts [27] for particular languages. It is simple hunt-and-peck typing on labeled keys. The very small, discrete targets, subject to Fitts’ Law, result in either slow-and-careful entry or increased error rates, though near-miss letter correction is possible using, e.g., letter tri-grams. [28]
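To make the near-miss idea concrete, here is a toy Python sketch that scores candidate words by the total distance between each tap and the intended key’s center. The key coordinates and word list are invented, and a real corrector would also fold in letter or word probabilities (such as the tri-grams mentioned above).

```python
# Sketch of near-miss correction on a soft keyboard: each word's key
# centers are compared against the actual tap points, and the word with
# the smallest total distance wins. Coordinates and word list are rough
# stand-ins, not a real product's layout or lexicon.
import math

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (col + 0.5 * row, row)          # approximate (x, y) in key widths
           for row, letters in enumerate(ROWS)
           for col, ch in enumerate(letters)}

WORDS = ["cat", "car", "vat", "bar"]           # illustrative word list


def word_cost(word, taps):
    """Sum of distances between each tap and the word's intended key centers."""
    return sum(math.dist(KEY_POS[ch], tap) for ch, tap in zip(word, taps))


def correct(taps):
    same_length = [w for w in WORDS if len(w) == len(taps)]
    return min(same_length, key=lambda w: word_cost(w, taps))


# Sloppy taps landing near c, a, and t still resolve to "cat".
taps = [(3.3, 2.1), (0.7, 1.1), (4.4, 0.1)]
print(correct(taps))   # -> "cat"
```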
You can also put a phone keypad on a touchscreen, as we did with the TI Avigo [29] and the Philips Nino. Though usable, and consistent across devices, the large keys didn’t offer as much benefit on a PDA that required the use of a stylus. [30]
Applying what we had learned about word-based disambiguation while developing and refining T9 Text Input, we prototyped and developed a word-based auto-correcting system (informally called “SloppyType” but now the basis for XT9® Smart Input) for touchscreen and thumb keyboards and even virtual keyboards. [31]
Gesture, Handwriting
Gestures have long been part of keyboards married to a touch-sensitive device. [32] Synaptics even developed a capacitive trackpad layer allowing block-letter writing across the top of a modern no-profile keypad design like the RAZR’s.
Writing is very natural – once you’ve made it through grade school – but relatively slow no matter what technology is used or how accurate the recognition engine is. [33] Digital pens remove the need for a touch-sensitive surface or a digitizing tablet, but it’s still handwriting.
Graffiti, EdgeWrite, and other simplified stroke alphabets [34] allow rates up to twice that of natural handwriting, once the new shapes are committed to memory. Gesture input can be simplified even further with an on-screen diagram containing letters, visual cues, and boundaries, letting the novice user employ the method immediately without memorizing a new stroke alphabet. Cirrin and Quikwriting are two examples of this approach. Both also allow complete words to be entered by stringing the gestures together.
SHARK, now known commercially as ShapeWriter™, does for per-word entry what Cirrin et al did for per-letter entry: it employs a simple framework of rules and visual cues (based on a QWERTY soft keyboard in this case) to get the novice user started and then allows performance to increase with skill and memorization. Swype™ offers similar benefits, though its approach is slightly different. Swype’s disambiguation benefits from more accuracy at each vertex (the location of each letter) whereas SHARK benefits from more accuracy along the path (the shape of the complete gesture).
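One way to picture shape-based word gestures (a toy model, not SHARK’s or Swype’s actual algorithms) is to resample both the drawn trace and each candidate word’s ideal path through its key centers, then compare them point by point. The key coordinates, word list, and equal weighting of every point are assumptions of the sketch; real systems differ precisely in how they weight the letter vertices versus the overall path shape.

```python
# Toy shape-based ("trace") word matching: the finger path and each word's
# ideal path through its key centers are resampled to the same number of
# points and compared point by point. Every point is weighted equally here.
import math

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (col + 0.5 * row, row)
           for row, letters in enumerate(ROWS)
           for col, ch in enumerate(letters)}
WORDS = ["cat", "fat", "vat"]                  # invented candidate list


def resample(points, n=32):
    """Return n points spaced evenly (by index) along the polyline."""
    out = []
    for i in range(n):
        t = i * (len(points) - 1) / (n - 1)
        j, frac = int(t), t - int(t)
        j2 = min(j + 1, len(points) - 1)
        x = points[j][0] + frac * (points[j2][0] - points[j][0])
        y = points[j][1] + frac * (points[j2][1] - points[j][1])
        out.append((x, y))
    return out


def path_cost(word, trace):
    ideal = resample([KEY_POS[ch] for ch in word])
    drawn = resample(trace)
    return sum(math.dist(a, b) for a, b in zip(ideal, drawn))


def best_word(trace):
    return min(WORDS, key=lambda w: path_cost(w, trace))


# A wobbly c -> a -> t trace still scores closest to "cat".
trace = [(3.1, 2.0), (1.8, 1.6), (0.6, 1.0), (2.3, 0.5), (4.1, 0.1)]
print(best_word(trace))   # -> "cat"
```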
Though “natural” handwriting recognition on mobile devices was tarnished by the first Newton releases, the full-word write-anywhere Transcriber for Windows CE showed that progress has been made over the years. Early computer-based (offline) handwriting recognition efforts also attempted cursive and shorthand, such as the Pitman or Gregg shorthand systems used for dictation. Like the stenotype, these shorthand systems are phonetic rather than alphabetic and enthusiasm for their performance potential is tempered by a very long learning curve. [35]
Joystick, Tilt, Wheel
Game consoles have gone online and massively multi-player, making text entry more important. If only the game controller is used, the joystick can be employed to select a letter from an on-screen array; or, in combination with a secondary joystick or other controller keys, a two-step approach can be used: typically, choose one letter set from a marking menu (pie menu) and then one of 4-8 letters in the set. [36]
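A minimal sketch of that two-step approach, assuming an invented eight-group pie menu, might look like this: the stick’s deflection angle picks a letter group, and a second selection picks the letter within it.

```python
# Sketch of two-step joystick selection: the first stick's angle picks one
# of eight letter groups from a pie menu, then a second selection (here
# just an index 0-3) picks the letter within that group. The grouping is
# an invented example, not any particular console's layout.
import math

GROUPS = ["abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx", "yz.,", "!?' "]


def pick_group(dx, dy):
    """Map a joystick deflection vector to one of the 8 pie-menu sectors."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector = int((angle + math.pi / 8) / (math.pi / 4)) % 8
    return GROUPS[sector]


def pick_letter(dx, dy, index):
    return pick_group(dx, dy)[index]


# Stick pushed to the right (sector 0), then the third letter of that group.
print(pick_letter(1.0, 0.0, 2))   # -> "c"
# Stick pushed up and to the right (sector 1), second letter.
print(pick_letter(0.7, 0.7, 1))   # -> "f"
```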
The advent of tiny, affordable accelerometers produced a litter of “Gee, what can we do with these?” research studies, including selection of specific letters from ambiguous keys in a two-step approach. [37] Or, if you want to drive yourself crazy, dynamic solutions like KeyStick keep you guessing on every tilt and keypress. At least Dasher maintains the same relative position of each letter (while making you feel like you’re playing a videogame).
Nokia’s 7280 model “lipstick” phone features a spinner wheel, a mechanical version of the touch-sensitive Click Wheel on the iPod, which provides a similar letter ribbon (known as the date stamp method) for searching music titles. [38]
Can handwriting be emulated with a joystick? A number of people have tried that too. [39] Other gesture-based approaches for input devices like joysticks and trackballs include Isokoski’s device-independent method and further applications of EdgeWrite. [40]
Similarly, we applied the SloppyType word-based auto-correction technology to each of these non-keyboard input devices, using the tilt of the joystick, or the device itself, or the change in direction of a wheel, as an approximate selection of the area surrounding each letter.
Alternatives
What about other approaches? Well, there’s Fastap® – David Levy’s truly novel invention from the early ’90s. It takes advantage of jammed-together keys on small keyboards to naturally encourage chording. [41]
One degree removed from reality, perhaps, but showing some originality: virtual keyboards, such as laser projection keyboards; detection and interpretation of muscular tension, galvanic skin response, or sign language; and others. [42] For example, Senseboard® tries to interpret hand muscle movements representing the finger positions used when typing on a PC keyboard. [43]
Moving away from the constraint of mobile device dimensions, but still enlightening:
- Stenotype machines for court reporting and live captioning. The syllable-based chording keyboard is designed for raw speed. The system becomes personalized as each operator develops shorthand appropriate to the transcription context.
- Assistive technologies, [44] such as orbiTouch™ two-hand chorded input.
And what about extra-thumb solutions? [45]
Speech Recognition
Speech is very natural and very fast. [46] Speech recognition is getting better as technology improves, on desktop systems at least. The low-power processors and limited memory of mobile devices, however, constrain speech recognition accuracy; ambient noise (when away from a closed-door office) is of no help either. To compensate for these limitations, the staff at Tegic (and other recent Nuance acquisitions) explored multi-modal remedies such as combining the results of speech recognition and 12-key input to resolve ambiguities. [47]
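One way to picture that kind of multi-modal combination (a simplified sketch, not Tegic’s actual algorithm) is to keep only those speech-recognition hypotheses whose spellings are consistent with the ambiguous key sequence the user also entered. The N-best list and its scores below are invented.

```python
# Sketch of one multi-modal idea: filter the speech recognizer's N-best
# hypotheses by an ambiguous 12-key sequence the user also typed.
# Hypotheses, scores, and the combination rule are illustrative only.

KEY_LETTERS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: k for k, letters in KEY_LETTERS.items() for ch in letters}


def consistent(word, keys):
    return (len(word) == len(keys)
            and all(LETTER_TO_KEY.get(ch) == k for ch, k in zip(word, keys)))


def rescore(n_best, keys):
    """Keep only hypotheses that match the keypad sequence, in score order."""
    return [(word, score) for word, score in n_best if consistent(word, keys)]


# The recognizer can't decide between similar-sounding words; pressing
# 4-6-5-3 rules out "bold", and the speech score then ranks "hold" first.
n_best = [("bold", 0.34), ("hold", 0.33), ("gold", 0.30)]
print(rescore(n_best, "4653"))   # -> [('hold', 0.33), ('gold', 0.30)]
```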
The other practical issue for speech recognition is privacy. Haven’t you had enough of listening to one side of other people’s conversations, on the bus or in line at Starbucks, as it is? European countries like Finland have shown, though, that social etiquette does adapt to new technology once it becomes ubiquitous, so the issue of privacy may work itself out. [48]
Eye Tracking, etc.
Eye tracking systems are improving, at least for informing usability studies if not also text input. [49] Software is getting better at compensating for normal eye jitter, [50] but desktop systems use multiple infrared emitters and a high-resolution camera, always pointed toward the user’s face; a mobile phone is not so fortunate. An easier solution for mobile devices is to position the detector close to the eye. With miniaturization, such detectors could become integral with eyeglasses [51] – simple, if it weren’t for the fact that most people don’t wear glasses unless they have to. [52]
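As a toy illustration of jitter compensation (far simpler than real fixation-detection algorithms), a short moving average over recent gaze samples can damp the noise before the gaze point is mapped to an on-screen target; the five-sample window below is an arbitrary choice.

```python
# Toy gaze smoothing: a short moving average over recent (x, y) samples
# damps small jitter before the gaze point is mapped to an on-screen key.
# The 5-sample window is an arbitrary assumption for illustration.
from collections import deque


class GazeSmoother:
    def __init__(self, window=5):
        self.samples = deque(maxlen=window)

    def update(self, x, y):
        self.samples.append((x, y))
        n = len(self.samples)
        return (sum(p[0] for p in self.samples) / n,
                sum(p[1] for p in self.samples) / n)


smoother = GazeSmoother()
for x, y in [(100, 200), (103, 198), (98, 202), (101, 199), (102, 201)]:
    smoothed = smoother.update(x, y)
print(smoothed)   # hovers near (100.8, 200.0) despite the jitter
```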
No longer a novelty, a direct interface like brainwave detection would be ideal – truly typing at the speed of thought! The initial research has been promising, even inspiring, for those dealing with severe impairments. [53] It is reasonable to assume that, through biofeedback training, the brain could become even better at generating the signals that can be detected even while the input systems get better at decoding them. [54]
Adapt to Me
Perhaps there’s a different solution, a long-tail [55] solution. One size doesn’t fit all – but one API could. Imagine a Personal Input Device tailored to the abilities and preferences of the user, abetted by Bluetooth and employing simple and secure sync (thus avoiding the PCjr infrared keyboard problem [56] and keylogging). Imagine starting kids early with a good alternative to QWERTY [57] that is guaranteed to be compatible with every data system or kiosk [58] they come in contact with over the course of a day. [59]
Okay, a reality check: anything carried by anyone under the age of 20 – or over the age of 50 for that matter – is going to be lost or misplaced, repeatedly. Therefore, the Personal Input Device has to be inexpensive, [60] and any accumulated data needs to be backed up onto a server and/or PIN-protected on the device.
New is Old, Again
I expected that the next text input breakthrough would be so efficient (and, ideally, one-handed) that people would be willing to learn an unfamiliar layout or technique, and finally(!) give up the QWERTY keyboard on the desktop as well. But, in spite of some progress in speech recognition and the arrival of the “two-thumbed generation”, a high-performance solution for mobile text input has not been developed – or perhaps we have yet to recognize and appreciate it.
Commercial input systems vendors incrementally improve their existing technologies for products that are already on the market and successful. That makes sense; pretty-good solutions often need refinement to make them even better for even more people. Scott Berkun notes that inventions are built upon the work of others, while Peter Denning considers innovation (the adoption of a new practice in a community) more significant than mere technical invention. [61] Concurrently, academic researchers are directing the user studies that objectively measure and compare the various approaches described above, and refine the models that help establish best practices in this field.
But a lot of time and money is wasted reinventing the wheel. [62] Perhaps it’s just that first-year engineering students need to be assigned simple programming exercises, like “How could you arrange the letters on a touchscreen keyboard to reduce stylus travel?”, or an advisor is ensuring that a new graduate student knows how to execute and write up a small research study. Perhaps young entrepreneurs around the world truly think that they are the first to realize that the Touch-Tone keypad is not optimal for text entry.
There is little excuse, however, for not spending a few hours with Google to discover what is out there already before embarking on a “glorious quest” with yet another Keyboard That Will Change The World – especially before wasting other people’s money, and the patent office’s time, on such a futile effort. [63]
My thumb is still waiting…