Informativity relationship problems

An experimentally attested finding of this sort should help us arrive at the conclusion that setting strict mechanical rules of Cl and ClCl placement misses the point, in that the information structure of the sentence, and not just its phonological and syntactic structure, is at play in word-order phenomena.

The Study

Aims and hypotheses

The aim of this study is to provide relevant data and to analyze the information structure of sentence types with Cl and ClCl in contemporary Croatian.

For that reason, given both the linguistic and the normative account, one would expect sentences with Cl and ClCl in the second position to be rated highest, and sentences with Cl and ClCl outside the second position much lower. The questionnaire contained 40 sentences, 9 of which were fillers, and was presented to 76 participants.

However, the experiment conducted for this study was designed so as to control for multiple factors, amongst which various phonological, syntactic and semantic peculiarities were found to be at play. Silent reading has been attested in numerous contemporary psycholinguistic findings cf.

From what we know, this position could be understood phonologically, that is, after the first accented word, or syntactically, that is, after the first syntactic constituent.
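The two readings of "second position" can be made concrete with a minimal sketch. It assumes, purely for illustration, that a sentence is given as a list of syntactic constituents, each a non-empty list of words, and that every word carries its own accent, so that the first accented word is simply the first word. The function names and the placeholder tokens (`Adj`, `N`, `V`, `Adv`, `CL`) are hypothetical and are not taken from the study.

```python
from typing import List

Constituents = List[List[str]]


def phonological_second(constituents: Constituents, clitic: str) -> List[str]:
    """Place the clitic after the first word (assumed here to be accented)."""
    words = [word for constituent in constituents for word in constituent]
    return words[:1] + [clitic] + words[1:]


def syntactic_second(constituents: Constituents, clitic: str) -> List[str]:
    """Place the clitic after the whole first syntactic constituent."""
    first, rest = constituents[0], constituents[1:]
    return first + [clitic] + [word for constituent in rest for word in constituent]


# Hypothetical inputs: a one-word and a two-word first constituent.
single_word_first = [["N"], ["V"], ["Adv"]]
two_word_first = [["Adj", "N"], ["V"], ["Adv"]]

for sentence in (single_word_first, two_word_first):
    print(phonological_second(sentence, "CL"), syntactic_second(sentence, "CL"))
```

When the first constituent is a single word, both functions return the same order; when it consists of two words, the phonological reading splits the constituent while the syntactic reading places the clitic after it.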

It is worth noting that these two notions, phonological second position and syntactic second position, sometimes overlap, but are very often realized as distinct positions. Sentence types analyzed in this study mostly fall under two general categories, clitic second and clitic third, with one exception, which is listed as a special case:

Clitic second after the first syntactic and prosodic constituent that consists of a single word N; sentence type:

Clitic second after the first syntactic constituent that consists of an NP that consists of two words or two NPs; sentence types:

Clitic second splitting the first syntactic constituent that consists of two words or a complex syntactic constituent consisting of two NPs; sentence types:

Clitic third after the first syntactic constituent NP followed by a verb; types of sentences:

The special case: clitic splitting the first complex syntactic constituent in such a way that none of the constraints are satisfied; sentence type:

When reporting on the results, I used a certain simplification in notation.

The use of informativity in the development of robust viromics-based examinations [PeerJ]

The first reported number stands for the mean rating given by high school students who rated sentences on a Likert scale from 1 to 7. The second number stands for the mean rating given by high school students who rated sentences according to the magnitude estimation (ME) method. The third number represents the mean rating given by elementary school students, who also rated sentences on a Likert scale from 1 to 7.

8 Cl and ClCl are italicized in sentences. Translations and glosses are given below where I analyze the data.

In what follows I will present the analysis of several examples which give evidence of provocative and unnoticed details that affect judgment ratings and, consequently, the theory of information structure in free word-order languages.

Clitic second after the first prosodic and syntactic constituent that consists of one word is the only instance representing the perfect match between prosody and syntax, because this position satisfies both prosodic and syntactic constraints. The type of a sentence presented to the subjects was the following one:

Although I controlled for several constraints in the sentence design (such as the length of the constituent, the number of syllables and the type of the syntactic construction in complex NPs), I was not aware that the details in sentence design unrelated to the placement of Cl and ClCl can affect the ratings in a statistically significant manner.

Such a difference was never reported in the literature and, to the best of my knowledge, has never been tested for Croatian.

However, two variants of this pattern presented in the questionnaire showed a statistically significant difference.

10 The results of comparative testing with a 7-point scale and ME are in line with the reservations voiced by some researchers concerning the availability and relevance of ME for linguistic testing. This point emerges from my data and experimental design. Although ME is, as I already said, claimed to be a fine-grained technique that provides much more reliable results than any other method, and although mean values do not show such a significant difference, there is a striking discrepancy between the variances in experiments that used ME and the 7-point scale.
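The kind of comparison described in the note, similar means but diverging variances across two rating methods, can be sketched as follows. The ratings below are made-up illustrative values, not data from the study, and the use of the coefficient of variation as a scale-free measure of spread is an assumption of this sketch: ME responses are open-ended, so their raw variance is not directly comparable with that of a 1 to 7 scale.

```python
from statistics import mean, stdev, variance


def summarize(ratings):
    """Return the mean, sample variance and coefficient of variation of ratings."""
    m = mean(ratings)
    return {
        "mean": m,
        "variance": variance(ratings),   # sample variance (n - 1 denominator)
        "cv": stdev(ratings) / m,        # scale-free spread (assumed measure)
    }


# Hypothetical ratings for one sentence under each method.
likert_ratings = [6, 7, 6, 5, 7]        # 7-point scale
me_ratings = [50, 120, 80, 30, 200]     # open-ended magnitude estimates

likert_summary = summarize(likert_ratings)
me_summary = summarize(me_ratings)
print(likert_summary)
print(me_summary)
```

A much larger coefficient of variation for one method than for the other would point to the sort of discrepancy in spread that the note reports, even when the central tendencies look alike.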

I will report on these details in a separate study.

Sentence 2 was rated significantly higher 6. The sentences presented to the subjects were:

In a free word-order language such as Croatian, there is no restriction against the object preceding the subject, and one would not expect these two sentences to be rated differently, although they were.

As we can see, the statistically significant difference between examples 2 and 3 appears across subjects, as well as between elementary school students and high school students. The explanation for the across-subject difference can be found in the fact that in example 3 the direct object was placed initially, while in example 2 it was the pronominal subject that was placed initially.

Direct object fronting can be seen as an element that calls for an IS explanation. Being fronted, the direct object is focused because the unmarked word order in Croatian is SVO. From the perspective of language learning and the complexity of the structure, it is especially striking that elementary school children rated sentence 3 so low.

One possible explanation would be that they did not adopt scrambling processes, but for any conclusion in this direction a detailed and carefully planned experiment would be necessary.

Another explanation might be that they have a different way of conceptualizing what was asked in the task, which may, in fact, go hand in hand with language acquisition. However, the available limited data do not allow for the conclusion as to what led to such a significant difference in ratings.

The second subtype in this section is clitic second after the first syntactic constituent that consists of an NP that consists of two words or two NPs.

This is a subtype that shows no special features. The only interesting point is that for the placement of Cl or ClCl in this position it does not matter how long or how complex the first constituent is, because there is no significant variation in the ratings.

Because this subtype was not expected to be particular in any respect, I included only one example of each pattern in the questionnaire. The results show that all subjects rated the sentences belonging to this subtype as perfectly acceptable, although they are labeled as substandard in Croatian grammar.

Obviously, the length of the first syntactic constituent does not play any role in the rating effects.

There are three subsets of this subtype. However, this is not the case. The only group that rated the sentence with an explicit contrastive reading slightly, but not statistically significantly, higher than the sentence without contrastive reading were the high school students who were tested with the use of the ME methodology.

The example presented to the subjects was: Such an example lies A pattern of this kind is:

Clitic third

Clitic third is one of the most intriguing WO patterns because of frequent claims that Croatian is a rigid Wackernagel type of language in which Cl and ClCl are placed in the second position of the sentence. The second position is also favored in traditional grammars of Croatian, which are for the most part descriptive and to a certain extent normative.

Types of sentences for these cases are: Other grammars do not give similar qualifications.

Informativity of the sentence information structure: word order | Anita Peti-Stantic

My very clever older sister and her similarly clever and very interesting friend from high school will wait for me.

There are several related phenomena that need to be accounted for when addressing the clitic third pattern. Some of them received unexpected acceptability ratings. I recognize two relevant types of clitic third pattern that have to do with the length and complexity of the first syntactic constituent: one is realized when the first syntactic constituent is simple and short, and the other when the first syntactic constituent is complex and long.

The first pattern in my data is realized twice with the simple NP and once with an AP followed by a VP which is then followed by a clitic. All of these examples received low ratings, justifying the claims in Croatian grammars.

This might be a sign that when the rating is altogether low, an additional difficulty does not contribute to an even lower rating. The ratings did not increase because of the contrastive reading, and remained in the same range as for the examples without contrast 2. If there were no other examples, these sentences would confirm the claims from Croatian grammars. However, numerous counterexamples point the analysis toward a less mechanical and more inclusive theory of grammar.

Without a larger body of experimental data it is impossible to suggest the reason for that. Although it might seem plausible to think that the first syntactic constituent in this sentence was too long for the elementary school students to keep track of, they rated two sentences that are of the same length or even longer as being perfectly acceptable, so this is obviously not a plausible explanation.

Although these are just preliminary results, it has to be said that all other types of clitic third, such as the sentences in which ClCl follow an Adverbial Phrase followed by a VP or the sentences in which Cl or ClCl follow the coordinate NP [NP 10719 Conj 10719 Conj 10719] followed by an adverb, also receive high ratings, such as in the examples given below. This pattern is extremely frequent on the Internet and in electronic corpora of the Croatian language, which also speaks in favor of its regular use.

If Croatian is considered a typical Wackernagel type language, which does not favor clitic third as the standard position, the realization of this pattern would elicit a specific IS perspective. From this point of view the sentences which contain clitic third should also be expected to receive at least slightly lower ratings, because the specific IS always calls for limited readings and the rating drops.

Since the given sentences did not receive a rating lower than clitic second sentences, and there is no obvious reason to put the clitic in the third position, I

Given the significant difference in the ratings between examples 11, 12, 13, 14 and examples 16, 17 and 18, one has to take the length of the first syntactic constituent to be the decisive factor.

I will formulate the following new rule for Croatian: Neutral clitic placement in sentences that do not exhibit a perfect prosodic and syntactic match within the first constituent can be governed by the proximity to the verb, regardless of its position.

The sentences in question are examples in which Cl or ClCl splits the first complex syntactic constituent in a way that does not satisfy the prosodic or syntactic constraint.

This pattern has been labeled ungrammatical in linguistic literature and was not even mentioned in Croatian grammars, obviously because grammarians treated this pattern as nonexistent. There is no analysis which can, under a strictly formal view of prosodic and syntactic constraints, produce such a result.

In the questionnaire there was one example of this pattern without explicit contrastive reading (19) and two examples with explicit contrastive readings (20) and However, we see that this is not the case and that the rating this utterance received is almost as high as the ratings for the splitting patterns that are claimed to be the prototypical Wackernagel type positions, and significantly higher than the ratings for the clitic third after the simple first constituent pattern.

The results presented in this section are not significantly lower than the results presented for examples 7 and 8, where we see the splitting of the first syntactic constituent after the first prosodic word [NP 10719 ClCl [Adj, N]], or than the results of clitic third with the short and simple first syntactic constituent followed by the verb, examples 11, 12, 13 and

I presented the subjects with an additional example of a sentence that should undisputedly be rated as ungrammatical, the source of ungrammaticality lying in the splitting of ClCl.

This has been claimed in the literature to be undisputedly ungrammatical. In this sentence separated clitics were placed third and fifth in the sentence, cf.: Such a situation requires further investigation, but it is already clear that there is much more variability in Cl and ClCl placement than has been recognized until now, as well as that the interplay of word order and the phonological, syntactic and semantic tiers within the grammatical structure heavily relies on the conceptualization of the entire sentence, as well as on sentence subparts.

From this perspective one can find a plausible explanation for the possibility to rate the sentences that violate syntactic rules, but not the prosodic rules within a phonological phrase, such as

As concerns example 22, the explanation might be that there is a change in progress in Croatian, which allows Cl and ClCl to be placed second in a phonological phrase much further than merely following the first syntactic or prosodic constituent of a sentence.

The question that still has to remain open is which conditions allow such placement to be felicitous and whether there are any semantic constraints on the grammaticality of the sentence.

Conclusion

In conclusion, I want to stress two points. Firstly, I must emphasize how important it is never to underestimate the complexity and richness of the repertoire of utterances.

Secondly, and that is of interest especially for word-order research, it should be clear that there are areas of research in which methodological concerns are at play to a much greater degree than in others.

This has to do with the aforementioned complexity and richness of the repertoire. Gradient and limited acceptability in these cases is not necessarily governed only by the placement of target elements (such as clitics and clitic clusters) but also by some other, rarely noticed constraints, whether prosodic, syntactic or semantic in kind, or a combination thereof. Therefore, one should constantly question the empirical status of the research on complex grammatical questions such as word order.

It is obvious that, contrary to what has been claimed about word order and clitic placement in Croatian grammars, as well as in the linguistic literature on the matter, we should account for the relative freedom of clitic placement. In contrast to the relative simplicity of data presented in the literature, Cl and ClCl positioning is evidently restricted only by the mere fact that the clitic needs a prosodic constituent to its left on which it can lean.

This gives rise to various segmentations on the sentence level that are governed by the simple prosodic rule that, according to the constraints of prosodic domination, coincides with the phonological phrase (Selkirk). A fair number of speakers obviously adopt certain prosodic rhythmic and sentence semantic groupings that govern their readiness to accept these highly unexpected syntactic patterns.

This is possible only if the examples presented to subjects get acceptable readings on the level of sentence semantics.

Furthermore, lateral gene transfer is pervasive within phage communities.

As such, the presence of a particular gene may not be indicative of the presence of a particular viral species. Rather, it is just that: To circumvent this limitation, we have developed a new method for the analysis of viral metagenomic datasets.

BLAST hits are weighted, integrating the sequence identity and length of alignments as well as a taxonomic signal, such that each gene is evaluated with respect to its information content. Through this quantifiable metric, predictions of viral community structure can be made with confidence. As a proof-of-concept, the approach presented here was implemented and applied to seven freshwater viral metagenomes.

While providing a robust method for evaluating viral metagenomic data, the tool is versatile and can easily be customized to investigations of any environment or biome.

Background

Bacterial viruses (bacteriophages) play a crucial role in shaping microbial populations and processes on a global scale. Nevertheless, from this small and imprecise representation of phage diversity we have uncovered a great deal about their genomes: The majority of phage genes, however, are unfamiliar to us, their function unknown Hatfull, ; Sharon et al.

Nevertheless, as is true of all aspects of microbial diversity in the environment, the significance of the work performed to date does not negate how much there is left to discover. Numerous studies of phage communities spanning a wide variety of environments, from the human gut Minot et al. Thus, whole genome sequencing (WGS) is widely considered to be the most representative method for exploring viral diversity in the environment.

Bioinformatic approaches for analyzing viral metagenomes largely mirror those used for the study of bacterial and archaeal populations: While comparisons can be made to, e. This approach has been employed frequently e. This approach is employed by many metagenomics-based studies, analytical tools, and metrics e. Homology-based classifications, however, can be misleading due to two factors.

Firstly, phage genomes available in public repositories: Secondly, lateral gene transfer (LGT) is pervasive within phage communities. There is an abundance of evidence of LGT between phages with similar host ranges, between phages within the same environment, and between phages and their hosts e. Here, we introduce a rigorous method for classifying viromes. Genes exhibiting homology to characterized sequences are weighted based upon their informativity, a new metric for describing viral community structure.

Thus, it is possible to distinguish between genes indicative of a particular taxon and those that are frequently exchanged within viral communities. In addition to presenting the method, we have tested its robustness through the analysis of all individual genera of tailed bacteriophages order: As a proof-of-concept, we examined seven publicly available freshwater DNA metagenomic datasets.

The taxonomic signal threshold T is determined through a two-step process prior to evaluation of the metagenomic data.

In the first step, each annotated coding region for a given taxon of interest is compared to all annotated sequences within the genome(s) of a known relative. Where sequence homology is detected, the sequence identity and query coverage of the match are recorded: S1 and Q1, respectively. Many hits may be recorded for a particular gene x. Figure 1 illustrates the two-step process and the T values produced. S1 and S2 represent the sequence identity of homologies identified in step 1 and 2, respectively.

Likewise, Q1 and Q2 refer to the query coverage of the match detected in step 1 and 2, respectively. Therefore, in this case, the most distant relative belonging to the taxonomic group in step one would be the closest related species. If a particular taxon of interest lacks available genomes capturing the phylogenetic diversity of the species or genus or subfamily, etc.

In addition to the intended purpose of establishing the taxonomic signal threshold, the two-step process can provide insight into putative horizontally acquired elements and gene loss events, e.

Using informativity to ascertain confidence in taxonomical calls

As indicated in Fig. For a given hit within a metagenomic dataset, the sequence identity and query coverage, SH and QH respectively, are assessed relative to the taxonomic signal threshold T for the gene producing the match.
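The idea of assessing a hit against a per-gene threshold can be illustrated with a small sketch. The excerpt does not give the actual weighting formula, so everything concrete here is an assumption made for illustration only: the hit score is taken to be the product of identity and coverage (both as fractions), T is treated as a single number per gene, and the gene names and threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Hit:
    gene: str        # gene in the reference taxon that produced the match
    identity: float  # SH, sequence identity as a fraction in [0, 1]
    coverage: float  # QH, query coverage as a fraction in [0, 1]


def hit_score(hit: Hit) -> float:
    """ASSUMPTION: combine identity and coverage by simple multiplication."""
    return hit.identity * hit.coverage


def meets_threshold(hit: Hit, thresholds: Dict[str, float]) -> bool:
    """Check a hit against the (hypothetical) threshold T of its gene."""
    return hit_score(hit) >= thresholds[hit.gene]


# Hypothetical per-gene thresholds: geneB stands for a frequently exchanged
# gene that needs a near-perfect match before it says anything about taxonomy.
thresholds = {"geneA": 0.60, "geneB": 0.95}

hits = [Hit("geneA", 0.90, 0.80), Hit("geneB", 0.90, 0.80)]
for hit in hits:
    print(hit.gene, round(hit_score(hit), 2), meets_threshold(hit, thresholds))
```

Under these assumptions the same alignment quality counts as taxonomically meaningful for one gene but not for the other, which is the distinction between informative genes and frequently exchanged ones that the text draws.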

For example, consider the case in which a novel species, n, within a genus is represented within a metagenome. It shares homology with other genomes for the genus. For the sake of simplicity assume there are two other genomes for the genus: