This page covers the following topics:
» full word
» watch stress
» minimal pairs
» negate
» search in results
» maximal output
» web frequency
» syllable count
» grammar
» systems
» symbols
» analyses
Ticking this box returns only results that match your spelling input as a full word. While bat normally returns battery and acrobat besides bat, with ‘full word’ only bat is matched. So bat with the ‘full word’ option is equivalent to #bat#.
The authors are notified of ‘full word’ searches that give no result, ie of words that are missing from the database. We will add the new items if legitimate.
If the spelling input contains a space, that is, if it consists of more than one word, each of these words is searched for as a full word. So if you were looking for Gwyneth Paltrow, CUBE would only find Gwyneth (at the time of writing this sentence), and it would send a report to the authors about the absence of the surname.
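The matching behaviour described above can be sketched in Python. This is only an illustration, not CUBE's actual code: the function names and the tiny word list are hypothetical, and treating a single leading or trailing # as a one-sided anchor is an assumption.

```python
def spelling_matches(pattern, word, full_word=False):
    """Return True if `word` matches the spelling `pattern`.

    A '#' at either end anchors the pattern to that edge of the word
    (assumed); `full_word=True` acts as if both anchors were present.
    """
    anchored_start = full_word or pattern.startswith("#")
    anchored_end = full_word or pattern.endswith("#")
    core = pattern.strip("#")
    if anchored_start and anchored_end:
        return word == core
    if anchored_start:
        return word.startswith(core)
    if anchored_end:
        return word.endswith(core)
    return core in word


def full_word_search(query, lexicon):
    """Search each space-separated word of `query` as a full word.

    Returns (found, missing); in CUBE the missing words would be
    reported to the authors.
    """
    found, missing = [], []
    for part in query.split():
        (found if part in lexicon else missing).append(part)
    return found, missing


lexicon = ["acrobat", "bat", "battery", "Gwyneth"]  # hypothetical sample
print([w for w in lexicon if spelling_matches("bat", w)])
print([w for w in lexicon if spelling_matches("bat", w, full_word=True)])
print(full_word_search("Gwyneth Paltrow", lexicon))
```

With this sample list the plain search matches all three bat words, the full-word search matches only bat, and the two-word query finds Gwyneth while listing Paltrow as missing.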
This is useful for matching only unstressed vowels (read more…).
CUBE can search for minimal pairs of a given string. For this, you must enter a string in the transcription box. This string will be considered a full word. No sound classes (introduced by !) or regular expressions are allowed in this string. Stress is ignored in minimal pair searches, ie the verb record rəkóːd and the noun record rɛ́koːd are considered a minimal pair. To search for minimal pairs of your input string tick the ‘minimal pairs’ button.
All vowels are interchangeable with each other. Thus fit fɪ́t is a minimal pair of feet fɪ́jt or fought foːt, ie the short vowel ɪ and the diphthong ɪj or the long monophthong oː are interchangeable.
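As a rough sketch of the idea, the check below treats two transcriptions as a minimal pair when they differ in exactly one segment once stress marks are removed. It assumes the transcriptions are already split into segments (with diphthongs and long vowels as single units) and only compares words of equal length; the function names are hypothetical and CUBE's own procedure may differ.

```python
import unicodedata

STRESS_MARK = "\u0301"  # combining acute accent, used here for stress


def strip_stress(segment):
    """Remove the stress mark from one segment."""
    decomposed = unicodedata.normalize("NFD", segment)
    return decomposed.replace(STRESS_MARK, "")


def is_minimal_pair(a, b):
    """True if two segment lists differ in exactly one position, ignoring stress."""
    if len(a) != len(b):
        return False
    differences = sum(
        1 for x, y in zip(a, b) if strip_stress(x) != strip_stress(y)
    )
    return differences == 1


# Segmentations assumed for illustration.
fit = ["f", "ɪ́", "t"]
feet = ["f", "ɪ́j", "t"]
record_verb = ["r", "ə", "k", "óː", "d"]
record_noun = ["r", "ɛ́", "k", "oː", "d"]
print(is_minimal_pair(fit, feet))
print(is_minimal_pair(record_verb, record_noun))
```

Because stress is stripped before comparison, the two pronunciations of record differ only in their first vowel and so count as a pair.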
Alternatively, you may give two strings separated by an equals sign (=) or slash mark (/) in the transcription box. If you use the equals sign, remember to tick the ‘minimal pairs’ box as well; this is not obligatory if you separate the strings by a slash mark. This finds word pairs differing only in the strings specified. The first string may be ‘0’ (zero); in this case you get word pairs where one is longer by the second string. Currently, e/a only finds cases where these two vowels do not bear a stress mark; to find minimal pairs in which they do (like bet vs bat), you have to input E/A.
Ticking the
Note that the
search | finds | does not find |
---|---|---|
#!~nap ☐ negate | cap, chap, gap, lap… | map, nap |
!n ☑ negate | cap, bag, fast, lab… | can, camp, conquer, nag… |
Ticking this button means that a new search will look only at the results of your last search. This is useful for narrowing down your results with successive searches.
It can be useful to combine this function with the ‘negate’ filter mentioned above.
If you do not limit your search sufficiently, there may be a very large number of matches, the displaying of which could slow down some machines. By default, therefore, CUBE limits results to 300 entries. By inserting a number in the ‘maximum output’ box, you can specify a smaller or greater number.
Note that this does not affect the ‘search in results’ option in any way: all the results are searched for in the next run, irrespective of how many of them are actually shown to you.
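The interaction between ‘search in results’, ‘negate’ and the output limit might be pictured as follows. This is a hedged sketch: the function names and sample words are hypothetical, and ‘negate’ is modelled simply as keeping the words that fail the test.

```python
DEFAULT_MAX_OUTPUT = 300  # CUBE's default display limit


def search(candidates, predicate, negate=False):
    """Return the candidates that satisfy `predicate` (or fail it, if negated)."""
    return [w for w in candidates if predicate(w) != negate]


def shown(results, max_output=DEFAULT_MAX_OUTPUT):
    """Only the display is truncated; the full result list is kept."""
    return results[:max_output]


lexicon = ["bag", "camp", "cap", "dog", "lab", "nag"]  # hypothetical sample
first = search(lexicon, lambda w: "a" in w)
print(shown(first, max_output=2))

# 'search in results': the next search runs over all of `first`,
# not only over the two entries that were displayed.
second = search(first, lambda w: "n" in w, negate=True)
print(second)
```

Here the second search still finds cap and lab even though only bag and camp were displayed after the first one.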
Each entry in CUBE is marked with its frequency of occurrence, as acquired from web searches (limited to the top-level domain ‘.uk’). These counts therefore reflect up-to-date usage, though they obviously exhibit some biases (eg towards popular culture and information technology) and are to some degree ephemeral. An additional limitation is that any items with the same spelling, regardless of upper/lower case, are treated as the ‘same word’: for example, the same frequency counts are given for the noun bit and the past-tense verb bit, the name Bill and the noun bill, the verb rip and the abbreviation RIP, the French capital and the Trojan prince Paris, etc. Despite these shortcomings we believe that these figures do provide some guidance regarding word frequency in current BrE.
CUBE groups the frequency counts into ten searchable categories as shown in the following table.
category | occurrence in web search | number of words | % of all words |
---|---|---|---|
9 | 1,000,000,000–1,930,000,000 | 54 | 0.05% |
8 | 100,000,000–999,999,999 | 1,266 | 1.19% |
7 | 10,000,000–99,999,999 | 6,869 | 6.46% |
6 | 1,000,000–9,999,999 | 13,181 | 12.39% |
5 | 100,000–999,999 | 26,292 | 24.72% |
4 | 10,000–99,999 | 35,481 | 33.35% |
3 | 1,000–9,999 | 17,112 | 16.09% |
2 | 100–999 | 5,063 | 4.76% |
1 | 10–99 | 884 | 0.83% |
0 | 1–9 | 178 | 0.17% |
You can search for items of a given frequency by entering a number from 0 to 9 in the ‘web frequency’ box, or a range, eg 7–9 for the most frequent 8% of words. Tick the ‘show’ button to see the approximate figures for each of your result entries. Each of these figures will link to the Google Books Ngram Viewer, which shows a frequency timeline for the entry in English-language books stored by that service.
If you tick the ‘sort by’ button too, the results will be ordered from most to least frequent, instead of the default alphabetical order. If you display the frequency counts for your results, all these numbers are summed up at the top for your convenience.
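Each category in the table above covers one order of magnitude, so the category of a count is one less than its number of digits. The sketch below illustrates this and recomputes the share of words in categories 7–9 from the table; the function names are hypothetical and this is not CUBE's own code.

```python
def frequency_category(count):
    """Map a web-search count to a category 0-9 (one per order of magnitude)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return min(len(str(count)) - 1, 9)


# Number of words per category, copied from the table.
WORDS_PER_CATEGORY = {
    9: 54, 8: 1266, 7: 6869, 6: 13181, 5: 26292,
    4: 35481, 3: 17112, 2: 5063, 1: 884, 0: 178,
}


def share_of_words(low, high):
    """Percentage of all words whose category lies in low..high inclusive."""
    total = sum(WORDS_PER_CATEGORY.values())
    selected = sum(n for c, n in WORDS_PER_CATEGORY.items() if low <= c <= high)
    return 100 * selected / total


print(frequency_category(250_000))
print(round(share_of_words(7, 9), 1))
```

A count of 250,000 falls in category 5, and categories 7–9 together hold about 7.7% of the words, which is the ‘most frequent 8%’ mentioned above.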
You can put numbers in this box to filter the length of words found.
If you open the grammar pane (by clicking on ‘show grammar’) something like the flight deck of a stealth bomber opens up. Don’t be alarmed: you can’t do any harm in CUBE. These grammatical categories, which can be included/excluded in searches, have been adapted from Roger Mitton’s CUVOALD; we plan to simplify them in future versions of CUBE. The categories have short descriptive labels and letter-codes, eg ‘transitive verb (H)’, ‘definite article (R)’. The letter-codes are shown in the rightmost column of search results. Probably the best way to familiarize yourself with these categories is to experiment with the buttons and cross-check the letter-codes in your results. Note that their short description is shown if you hover over the letter-codes.
There are various categories of names (proper nouns), eg personal forenames and names of towns and cities. The catch-all category No includes many surnames, eg Jones, Zuckerberg. There is no separate surname category since many words used as surnames are common nouns (eg smith and brown) or place names (eg Holland and Lincoln). The No category also contains many brand names (eg Nintendo) and some full names (eg Mao Zedong, Darth Vader, Ralph Fiennes), which are commonly used as wholes and/or have pronunciations differing from the most common form of their components (eg Ralph).
There are three buttons before each category. If you tick the first one (☑ ☐ ☐), you get words of that category. If you tick the last button (☐ ☐ ☑), you get words that are not of that category. You can cancel either of these choices by ticking the middle button (☐ ☑ ☐).
You can limit your search to certain grammatical categories by ticking the categories you need. If you enter the spelling thin and use the ‘verb’ filter, you get:
search | finds | does not find |
---|---|---|
☑ ☐ ☐ verb (G-J) | bathing, bethink… | airworthiness, anything… |
☐ ☐ ☑ verb (G-J) | airworthiness, anything… | bathing, bethink… |
☐ ☑ ☐ verb (G-J) | airworthiness, anything, bathing, bethink… | |
If you tick two categories you get results if either match. That is, if you tick the ‘verb’ and the ‘adjective’ filters you get both verbs and adjectives in the output.
search | finds | does not find |
---|---|---|
☑ ☐ ☐ verb (G-J) ☑ ☐ ☐ adjective (O) | bathing, labyrinthine… | airworthiness, anything… |
☐ ☐ ☑ verb (G-J) ☐ ☐ ☑ adjective (O) | airworthiness, anything… | bathing, labyrinthine… |
If you need only words that may be used both as a verb and as an adjective, you have to do two searches: first search for verbs, then tick the ‘search in results’ button and search again for adjectives.
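The logic of the three-way buttons can be sketched as set operations. This is an illustration under stated assumptions: the function name is hypothetical, the category labels stand in for CUBE's letter-codes, and the category assignments in the sample data are invented for the example.

```python
def grammar_filter(entries, include=(), exclude=()):
    """Filter words by category.

    `include`: categories ticked with the first button (any one may match).
    `exclude`: categories ticked with the last button (none may match).
    Categories left on the middle button appear in neither set.
    """
    include, exclude = set(include), set(exclude)
    results = []
    for word, categories in entries.items():
        if include and not (categories & include):
            continue
        if categories & exclude:
            continue
        results.append(word)
    return results


# Hypothetical sample entries containing the spelling 'thin'.
entries = {
    "airworthiness": {"noun"},
    "anything": {"pronoun"},
    "bathing": {"verb"},
    "labyrinthine": {"adjective"},
    "thin": {"adjective", "verb"},
}

either = grammar_filter(entries, include={"verb", "adjective"})
neither = grammar_filter(entries, exclude={"verb", "adjective"})

# Both verb AND adjective: two searches, the second one 'in results'.
verbs = grammar_filter(entries, include={"verb"})
both = grammar_filter({w: entries[w] for w in verbs}, include={"adjective"})
print(either, neither, both)
```

Ticking two categories in one search gives their union, which is why the intersection needs a second pass over the results of the first.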
If you close the grammar box, all grammar filters are turned off, and no grammar codes are shown in your results.
A unique feature of CUBE is that it allows some freedom to customize the IPA transcriptions in search results. These options are explained below.
Any options that you select will be retained for further searches until cancelled. If your browser allows cookies to be set, the options of your last search will be active at the beginning of your next session too. (See also our privacy policy.)
CUBE always returns the search results in IPA transcription (in
By ticking this button the results section will include an IPA transcription (in
By ticking this button the results section will include a ‘Gimsonian’ IPA transcription (in
This is based on the re-spelling system used by the BBC Pronunciation Unit for guiding BBC staff. Similar re-spelling systems are used by some dictionaries, Wikipedia, etc. They are based on the conventions of English spelling and don’t require knowledge of special phonetic symbols. This is given in
The re-spelled transcriptions in CUBE are converted from those in the CUBE database, and so will not always be identical to those recommended by the BBC Pronunciation Unit.
Note that English re-spelling systems generally use many digraphs (double letters), such as uu and sh for the middle and last sounds in the word push. Hyphens are therefore necessary to prevent the misinterpretation of letter sequences (eg goshawk
While BBC-style respelling uses the conventions of English spelling, this transcription uses those of the spelling for Hungarian in
By ticking this button the results section will include the ASCII transcription that the database uses in
There are two ways to transcribe long monophthongs: by using the IPA length mark, ː, and by doubling the vowel symbol. By default CUBE uses the length mark, but the double-vowel alternative can be selected with this option.
option | result |
---|---|
☐ fɑɑ | airport ɛ́ːpoːt EHpoHt |
☑ fɑɑ | airport ɛ́ɛpoot EHpoHt |
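
The conversion from length marks to doubled vowels can be sketched as below. This is an assumption about how such a conversion might work, not CUBE's code; the function name is hypothetical. The one subtlety is that a stress mark (a combining accent) may sit between the vowel and its length mark, so the copy must be of the vowel symbol, not of the accent.

```python
import unicodedata

LENGTH_MARK = "\u02d0"  # IPA length mark


def double_long_vowels(transcription):
    """Replace each length mark with a copy of the vowel symbol it follows."""
    out = []
    last_base = ""  # most recent non-combining character
    for ch in transcription:
        if ch == LENGTH_MARK and last_base:
            out.append(last_base)
        else:
            out.append(ch)
            if not unicodedata.combining(ch):
                last_base = ch
    return "".join(out)


print(double_long_vowels("ɛ́ːpoːt"))
```

Applied to the airport example, both length marks are replaced by a second, unaccented copy of the preceding vowel, as in the second table row.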
English diphthongs are falling, gliding to weaker endpoints. The offglides can be represented either by the consonantal symbols j and w, or by nonsyllabic vowel symbols. Tick this button to select the latter option.
option | result |
---|---|
☐ gəu | linotype lɑ́jnəwtɑjp lAJnowtaJp |
☑ gəu | linotype lɑ́i̯nəu̯tɑi̯p lAJnowtaJp |
For the palatal sibilants, IPA uses the special symbols ʃ, ʒ, ʧ, and ʤ. The convention of APA (Americanist Phonetic Alphabet) is to put a háček (alias caron) on Latin letters (as in the orthography of some Slavic languages): š, ž, č, and ǰ, and to use y instead of j. This option lets you apply this convention.
option | result |
---|---|
☐ fɪš | short-change ʃoːtʧɛ́jnʤ SOHtCeJnG |
☑ fɪš | short-change šoːtčɛ́ynǰ SOHtCeJnG |
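
The substitution described above amounts to a one-to-one symbol mapping, sketched here with Python's `str.translate`. The function and table names are hypothetical; code points are written as escapes so that the mapping does not depend on how the symbols are encoded in this page.

```python
# IPA symbol -> APA symbol, as listed in the text above.
APA_SYMBOLS = str.maketrans({
    "\u0283": "\u0161",  # ʃ -> š
    "\u0292": "\u017e",  # ʒ -> ž
    "\u02a7": "\u010d",  # ʧ -> č
    "\u02a4": "\u01f0",  # ʤ -> ǰ
    "j": "y",            # palatal glide
})


def to_apa(transcription):
    """Apply the APA sibilant and glide symbols to an IPA transcription."""
    return transcription.translate(APA_SYMBOLS)


print(to_apa("\u0283o\u02d0t\u02a7\u025b\u0301jn\u02a4"))  # short-change
```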
The vowels of FOOT and jury tend to have a slightly more open quality than the beginning of the GOOSE diphthong. By default, CUBE shows this by using distinct IPA symbols: fɵt, ʤɵːɹɪj vs gʉws. However, it is not clear that the distinction between ɵ and ʉ is used contrastively by any language. By ticking one of these buttons you can choose the more economical option of using only ʉ (gʉws, fʉt, ʤʉːɹɪj) or only ɵ (gɵws, fɵt, ʤɵːɹɪj).
If you tick both ‘fʉt’ and ‘gɵws’, the latter option is ignored.
option | result |
---|---|
☐ fʉt ☐ gɵws | cuckoo kɵ́kʉw kUkuW Euronews jɵ́ːrənjʉwz jUHrxnjuWz |
☑ fʉt ☐ gɵws | cuckoo kʉ́kʉw kUkuW Euronews jʉ́ːrənjʉwz jUHrxnjuWz |
☐ fʉt ☑ gɵws | cuckoo kɵ́kɵw kUkuW Euronews jɵ́ːrənjɵwz jUHrxnjuWz |
☑ fʉt ☑ gɵws | cuckoo kʉ́kʉw kUkuW Euronews jʉ́ːrənjʉwz jUHrxnjuWz |
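
The four rows of this table follow from a simple precedence rule, sketched below. The function and parameter names are hypothetical, and this is an illustration of the behaviour shown in the table, not CUBE's implementation.

```python
FOOT_VOWEL = "\u0275"   # ɵ
GOOSE_VOWEL = "\u0289"  # ʉ


def merge_foot_goose(transcription, foot_as_goose=False, goose_as_foot=False):
    """Collapse the two symbols into one, according to the ticked options."""
    if foot_as_goose:
        # The 'fʉt' option wins when both are ticked.
        return transcription.replace(FOOT_VOWEL, GOOSE_VOWEL)
    if goose_as_foot:
        return transcription.replace(GOOSE_VOWEL, FOOT_VOWEL)
    return transcription


cuckoo = "k\u0275\u0301k\u0289w"
for foot, goose in [(False, False), (True, False), (False, True), (True, True)]:
    print(foot, goose, merge_foot_goose(cuckoo, foot, goose))
```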
By default ray is transcribed as ɹɛj. If you tick this button, the lower case Roman letter r (which in strict IPA represents a trill) will be used instead: rɛj.
option | result |
---|---|
☐ r | retrograde ɹɛ́tɹəgɹɛjd rEtrxgrejd |
☑ r | retrograde rɛ́trəgrɛjd rEtrxgrejd |
It is customary in the British transcribing tradition to use different symbols for the STRUT vowel (ʌ) and the last vowel of commA (ə). However, many speakers pronounce them with similar qualities, and they are not strictly contrastive, the former having by definition a higher level of stress or prominence. The CUBE database retains the distinction: in the ASCII representations, commA is x and the STRUT vowel is y. CUBE’s IPA transcriptions follow the traditional distinction of ʌ and ə by default; but by selecting the ‘ʌ=ə’ option you can display the STRUT vowel too as ə, with the difference represented (as in the Merriam-Webster dictionary of American English) in terms of stress.
option | result |
---|---|
☐ ʌ=ə | unburden ʌ́nbə́ːdən YnbYHdxn |
☑ ʌ=ə | unburden ə́nbə́ːdən YnbYHdxn |
In many contexts fortis (
option | result |
---|---|
☐ mɪsdɛjk | space-time |
☑ mɪsdɛjk | space-time |
By default CUBE shows the unitary nature of diphthongs (and affricates) by placing the pairs of characters closer to each other than other symbols, which are separated by narrow spaces. This option dispenses with the narrow spaces, showing the connection of diphthongal components with an underarch. Nonsyllabic subscripts are turned off when underarches are selected. (Please note that these underarches are misplaced in some browsers. This is a font rendering problem beyond our control.)
option | result |
---|---|
☑ vowel offglides ☐ tie diphthongs | linotype l ɑ́i̯ n əu̯ t ɑi̯ p lAJnowtaJp |
☑ vowel offglides ☑ tie diphthongs | linotype lɑ́͜inə͜utɑ͜ip lAJnowtaJp |
…