Acceptability tests are notoriously unreliable:
a) if the tests are administered to "informed" (linguistically aware) informants, the results tend to reflect the theoretical biases of the informants; the extreme (but by no means infrequent) case is that of the linguist tapping his own intuitions with respect to his own idiolect to decide a theoretical issue he is writing a paper about. A good example is Postal, P.M., On the surface verb "remind", Linguistic Inquiry 1, 37-120, 1970.
b) if they are administered to linguistically naive informants, all sorts of irrelevant parameters creep in, like truth value with respect to the "real" world and the weight of the prescriptive tradition in language teaching.
Still more problematic than the question
– Can you say X?
is the question
– Can you say X to mean Y?
What can we do? A good starting point is provided by S.Greenbaum and R.Quirk, Elicitation Experiments in English, Longman 1970. They replace acceptability tests by operational tests, where the informant has to carry out a manipulation on the sentence whose acceptability is at issue; the manipulation needs to be totally independent of the cause for the possible unacceptability of the test sentence; the degree of success in the carrying out of the manipulation is deemed to reflect the degree of acceptability of the sentence.
Consider the conjunction test. Consider reading 4 of the verb ACCEPT in COBUILD, namely:
– If you accept a story or statement, you say or you show that you consider it to be true.
Examples: The panel accepted Carleson's version of the story... I don't accept that NATO is in disarray.
Suppose for a moment that we suspect that there are two distinct readings lurking behind what we take to be a blanket definition provided by COBUILD, and that these two readings are neatly exemplified by the two examples provided by COBUILD. We therefore want to assess the acceptability of 1 and 2:
1. The panel accepted Carleson's version of the story and that NATO was in disarray.
2. The panel accepted that NATO was in disarray and Carleson's version of the story.
I am not a native speaker of English, but I suspect that 1 is much better than 2, although it is still not quite OK. The problem is that the difference in acceptability, and the marginal status of 1, have nothing to do with what we are interested in getting at, namely whether ACCEPT 4 in COBUILD should be split into two distinct readings (it shouldn't, but the test does not reveal this). Since my acceptability judgements are pretty unreliable in general (for the reasons given above), and still more unreliable for English than for French, I give below the French equivalents of 1 and 2, accompanied by queries and stars as appropriate:
3. ?? Le groupe d'experts admet la version des faits présentée par Carleson et que l'OTAN se portait mal.
4. * Le groupe d'experts admet que l'OTAN se portait mal et la version des faits présentée par Carleson.
The proposed tests rely on the informants' ability to spot, not only acceptability, but also the creative use of language, for jocular or other purposes. Consider 5 and 6:
5. All the banks in this town - except the river banks, of course - will vouch for my credit-worthiness.
6. La Meuse – il vaut mieux la lire que se jeter dedans. (La Meuse: a local Liège paper; the river Meuse)
5 and 6 are highly acceptable, but their acceptability should not lead us to believe that BANK and LA MEUSE are monosemic. If it is said in reply that of course they will be disregarded because of the twist in them, the question is: how do we know that there is a twist in them? Precisely because they play on the polysemy of BANK and LA MEUSE. We are back to square one.
The tests were not devised, presumably, to tell us that river banks and money banks should not be conflated. How do the tests behave in difficult cases? Consider LOVEMAKING (with or without hyphen).
– in LDOCE: 2 readings:
1. words or actions expressing love or sexual desire
2. the act of having sex
– in COBUILD: 1 reading:
LOVE-MAKING refers to romantic activities, especially sexual activities, that take place between two people, often including sexual intercourse.
Is LDOCE right, or COBUILD, or neither of them? We have to devise a frame for testing. Suppose we wish to apply the test relying on universal quantification. Part A of the test sentence is easy enough to decide on:
– all lovemaking
– every instance of lovemaking
Part B (the remainder of the sentence) is much more difficult to agree on. First, we wish to avoid, I presume, a metalinguistic part B; if we do not, we could simply copy the definitions provided by the dictionaries, and the test would simply equate reading the dictionary definition and assessing its appropriateness (an absurd result).
If we decide on a part B that is equally applicable to all the readings under consideration (satisfies both the splitters and the lumpers), I find that the test is not really helpful. Consider 7:
7. All lovemaking is pleasant.
Introspect and tell us whether you now know better if you side with LDOCE or with COBUILD.
If we decide on a part B that is applicable to only one of the split readings, we unduly favour that reading:
8. All lovemaking carries with it the danger of unwanted pregnancy.
(A little digression on the source of the problem with respect to LOVEMAKING: the meaning of MAKE LOVE has evolved from "faire la cour" (Jane Austen and all that ...) to "faire l'amour". LOVEMAKING seems to be caught somewhere in between. Notice the awkwardness in the COBUILD definition: I wonder how many of us would be prepared to agree that sexual activities are a sub-category of romantic activities...)
Let's take another example: we surely want to distinguish between trees (plants) and trees (tree-like representations), if only because we do want to have a usable system of selectional restrictions: we want to be able to state restrictions on the possible objects of FELL and PRINT, for instance:
9. Fell the tree.
10. Print the tree.
11. All trees have a root.
11 applies to both kinds of trees. I can use it to mean (at least I believe I can) that both plant trees and data trees have a root. This would suggest that tree is vague with respect to the two putative readings. Similarly with the conjunction test:
12. The trees in my garden have a root, and so do those I have just drawn to account for the syntactic structure of this sentence.
It seems to me that the question of how we devise the frames for testing (in both the universal quantification test and the conjunction test) is crucial in the assessment of the value of the tests. I am not sure that we have reliable intuitions, except in the obvious cases. In short, I am not sure the tests have any value at all.
As R.Moon puts it (in "The Analysis of Meaning", in Sinclair, J.M (ed), LOOKING UP, Collins ELT, London and Glasgow, 1987, p.86): "there are no final or absolute answers to the question of how many senses words have, or how they should be divided". I suggest that we should not be dismayed by such a lack of "absolute" answers. We should not be dismayed either when we see that the average number of readings per item (lexicographic granularity) increases with dictionary size.
A word or multi-word unit (MWU) covers a semantic space (which itself can only be delineated in terms of the space covered by the other members of the lexical field the item belongs to), and this space can be cut in different ways. We must consider the purposes that reading distinction is supposed to help us achieve.
BILINGUAL lexicography has a clear principle to work from, namely the maximization of the semantic isomorphism of the two languages under consideration; for example, if both plant tree and data tree translate as ARBRE (and behave the same way morphologically and syntactically...), then it suffices for an English-French dictionary to record the pairing TREE -> ARBRE (simplifying in the extreme: this would be the case if TREE had only the two readings mentioned above).
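The principle can be illustrated with a small Python sketch. It rests on the simplifying assumption made above (readings are conflated whenever their translations coincide, morphology and syntax being ignored); the reading labels and the function name `conflate_by_translation` are hypothetical and serve only the illustration.

```python
from collections import defaultdict


def conflate_by_translation(readings):
    """Group source-language readings that share a target translation.

    `readings` maps a (hypothetical) reading label to its translation.
    Readings with the same translation end up in one dictionary entry.
    """
    merged = defaultdict(list)
    for label, translation in readings.items():
        merged[translation].append(label)
    return dict(merged)


# Illustrative data only: TREE is assumed to have just these two readings.
tree = {"tree (plant)": "ARBRE", "tree (data structure)": "ARBRE"}
# For contrast: the two BANK readings do not share a French translation.
bank = {"bank (money)": "BANQUE", "bank (river)": "RIVE"}

tree_entries = conflate_by_translation(tree)
bank_entries = conflate_by_translation(bank)
print(len(tree_entries), len(bank_entries))
```

On this toy data TREE needs a single English-French entry, whereas BANK needs two.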
As soon as we leave bilingual lexicography (either to move to monolingual or to multilingual lexicography), the above principle no longer holds. Monolingual lexicography nowadays stresses the importance of formal criteria (like distinct POS, i.e. Part of Speech, or syntactic behaviour; see inter alia P.F. Stock, Polysemy, in Hartmann (ed), LEXeter'83 Proceedings, Max Niemeyer Verlag, Tübingen, 1984), but it should be clear that these criteria can only help to support a distinction that must ultimately be semantic in nature. Otherwise, ACCEPT would have to have the two readings we toyed with above, on the grounds that there are two distinct syntactic environments the item can fit in. Even a basic distinction like POS should not necessarily be decisive; it would be much more relevant to find rules which would enable us to derive noun readings from verb readings, and vice versa, in case only one of the two is recorded in the dictionary.
As a matter of fact, it is possible to find in commercial dictionaries reading distinctions that are established solely on the basis of formal criteria, and do not seem to reflect semantic distinctions. I believe this style of lexicographic work should not be encouraged. A case in point is POWERS in LDOCE:
1. [P] general natural abilities (Ex: She is 80 years old and very ill, and her powers are failing)
2. [P9] a special combination of natural abilities of a stated type (Ex: When he wrote this book, John was at the height of his powers as a writer)
"Special" and "of a stated type" in reading 2 clearly relate to the 9 in the grammatical code. COBUILD records the phrase "at the height of one's powers", which is probably a better treatment than that offered by LDOCE.
The best we can offer is a very negative principle, the Observable Feature Principle. It is not proposed as a way of discovering whether putative readings should be conflated or kept apart, an issue it cannot help with. It is a very obvious way of determining whether the system – at a given point in its development, i.e. having available a given feature theory – can AFFORD to distinguish two readings.
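A minimal sketch of how such a check might look in Python is given here. It assumes that readings are represented as simple feature-value mappings and that the system's feature theory is given as a set of feature names; the names `distinguishing_features`, `can_afford`, `pos` and `sem_class` are all hypothetical.

```python
def distinguishing_features(reading_a, reading_b, computable_features):
    """Return the computable features on which the two readings differ."""
    return sorted(
        feature
        for feature in computable_features
        if reading_a.get(feature) != reading_b.get(feature)
    )


def can_afford(reading_a, reading_b, computable_features):
    """True if at least one computable feature separates the readings."""
    return bool(distinguishing_features(reading_a, reading_b, computable_features))


# Hypothetical feature descriptions of two putative readings of TREE.
plant_tree = {"pos": "n", "sem_class": "plant"}
data_tree = {"pos": "n", "sem_class": "representation"}

# A system that only computes POS cannot afford the distinction ...
print(can_afford(plant_tree, data_tree, {"pos"}))
# ... whereas one that also computes a semantic class can.
print(can_afford(plant_tree, data_tree, {"pos", "sem_class"}))
```

The check says nothing about whether the two readings ought to be distinguished; it only reports whether the system, at its current stage, has any means of telling them apart.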
OBSERVABLE does not mean "that can be read off from the item itself or from its immediate environment". The path from feature value to observable parameters is not always straightforward, and we would deprive ourselves of potentially useful information if we insisted on too direct a link. Consider SUBJECT codes (such as medicine, agriculture, economics/Stock Exchange, etc.). They are often criterial for disambiguation but there certainly is no straightforward way (except by telling the system) of determining whether the topic of a text or part of a text falls under a category recorded by a subject code. All the same, in a text which probably deals with chemistry, ANALYSIS is probably not used in the same reading as in a text which probably deals with psychiatry. We do not need to insist on 100% certainty to operate a choice.
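The point about operating a choice without 100% certainty can be sketched as follows. The sketch assumes that some estimate of the topic of the text is already available (for instance because the system has been told); the probabilities, the reading labels, the threshold value and the function name `choose_reading` are invented for the illustration.

```python
def choose_reading(readings, topic_probs, threshold=0.5):
    """Pick the reading whose subject code is the most probable topic.

    `readings` maps a reading label to its subject code.
    `topic_probs` maps subject codes to the estimated probability that
    the text deals with that subject (however the estimate was obtained).
    Returns None when no candidate reaches `threshold`.
    """
    if not readings:
        return None
    best = max(readings, key=lambda r: topic_probs.get(readings[r], 0.0))
    if topic_probs.get(readings[best], 0.0) < threshold:
        return None
    return best


# Hypothetical readings of ANALYSIS with their subject codes.
analysis = {
    "analysis (chemical)": "chemistry",
    "analysis (psychoanalysis)": "psychiatry",
}

# The text probably deals with chemistry: a choice is made.
print(choose_reading(analysis, {"chemistry": 0.8, "psychiatry": 0.1}))
# The topic is too uncertain: no choice is made.
print(choose_reading(analysis, {"chemistry": 0.3, "psychiatry": 0.2}))
```

The threshold is a design choice: lowering it lets the system commit more often, at the price of more wrong choices.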
Furthermore, there is the problem that even features with a large set of readily observable correlates, such as POS, can be non-assignable in contexts where none of the correlates of a given value is diagnostic with respect to the other value(s). For instance, there are lots of observable criteria to tell adjectives from nouns in English. However, in the context NP [ X N ], for a given string X with POS = adj/n, we have no observable clue through which to assign either n or adj. Consider "negative tests" (the tests are negative; the tests are based on negatives (ling, phot)). We conclude that observable should be taken to mean "associated with observable parameters".
"Computable from text" therefore merely means that there exist texts from which the distinction is computable. We must be careful here. Consider the use of the existence of the derivation ADOPTIF to distinguish, monolingually, between adopter une proposition (* une proposition adoptive) and adopter un enfant (un enfant adoptif).
What are the chances of finding a text where the existence of the derivation is useful for drawing the distinction? We can think of texts such as:
Il adopta un petit Coréen. Cet enfant adoptif lui apporta certes quelques soucis, mais il donna aussi un sens à son existence.
If we can manage the anaphoric link between "cet enfant adoptif" and "un petit Coréen" (no trivial achievement), we could use the existence of the derivation to disambiguate adopter. However, it should be clear that paradigmatic properties will be much less useful than syntagmatic ones, since text is basically syntagmatic. Besides, paradigmatic relations expressed in text will often cross sentence boundaries, and only a text-based grammar will be able to use them for disambiguation.
The number of observational correlates of the formalized information fields found in machine-tractable dictionaries varies from one type of information to another, and also from exponent to exponent within a given information type. For instance, POS has a large number of observational correlates, namely all the distributional statements associated with that part of speech in a given grammar, whereas a grammatical code may very well code a property such as countability, which is often neutralized in context (cf. compounds: crocodile observation, crocodile handbag, ...).
In fact, observability/computability from text is a cline. We make the hypothesis that the formalized information fields found in machine-tractable dictionaries are often computable/observable from text, as is also suggested by the fact that they code properties often used in NLP systems.
We should take into account the hierarchical relations between the various types of information, i.e. the dependency relations which make information type x non-computable unless information type y is available. To take a simple case, semantic codes associated with verbs in LDOCE apply to deep subjects, objects and complements: they are not computable unless the deep grammatical relations have been retrieved, and are therefore dependent on grammatical codes. Similarly, it is not possible to dissociate grammatical code and part of speech, in so far as grammatical codes include part of speech information.
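These dependency relations can be made explicit in a small sketch. It assumes that the dependencies are given as a simple mapping from each information type to the types it requires; the type names and the function name `computable_types` are hypothetical, and the three-level hierarchy is a simplification of the LDOCE case just described.

```python
def computable_types(depends_on, obtainable):
    """Return the information types that can actually be computed.

    `depends_on` maps each information type to the types it requires.
    `obtainable` is the set of types the system could compute from text
    provided that all their prerequisites are themselves available.
    """
    resolved = set()
    changed = True
    while changed:  # iterate until no further type can be added
        changed = False
        for info in obtainable:
            if info in resolved:
                continue
            if all(dep in resolved for dep in depends_on.get(info, ())):
                resolved.add(info)
                changed = True
    return resolved


# Simplified hierarchy: semantic codes need grammatical codes,
# grammatical codes need part of speech.
deps = {
    "semantic_code": ["grammatical_code"],
    "grammatical_code": ["pos"],
    "pos": [],
}

full = computable_types(deps, {"pos", "grammatical_code", "semantic_code"})
partial = computable_types(deps, {"pos", "semantic_code"})
print(sorted(full))
print(sorted(partial))
```

In the second call the semantic codes drop out, although the system is in principle able to handle them, because the grammatical codes they depend on are missing.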
In short, we should rely most heavily on the information types whose degree of computability from text is high. We surmise that POS and syntactic frames (with their semantic annotations, selectional restrictions) are such information types. A major issue is that of measuring computability from text, in order to give priority to those information types that have the greatest chance of being computable, and therefore of leading to disambiguation.
Observable features embody distributional properties that the system is sophisticated enough to compute.
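One conceivable way of approaching the measurement issue raised above is sketched here. It assumes, and this is only an assumption, that computability can be approximated by the share of occurrences in a sample for which a value could be assigned at all; the function names and the toy data are invented and do not report any actual measurement.

```python
def computability_rate(observations):
    """Share of occurrences in which a feature value could be assigned.

    `observations` holds one entry per occurrence in a sample: the
    assigned value, or None when no diagnostic correlate was present.
    """
    if not observations:
        return 0.0
    return sum(value is not None for value in observations) / len(observations)


def rank_information_types(samples):
    """Order information types from most to least computable."""
    return sorted(samples, key=lambda t: computability_rate(samples[t]), reverse=True)


# Invented toy sample: None marks a context where the value is neutralized.
samples = {
    "countability": ["C", None, None, None],
    "pos": ["n", "adj", "n", None],
}

print(computability_rate(samples["pos"]))
print(rank_information_types(samples))
```

On such a measure, information types would be given priority in the order returned by `rank_information_types`; a more careful measure would also have to check that the assigned values are correct, not merely that a value was assigned.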