WHAT WE KNOW ABOUT THE VOYNICH MANUSCRIPT – Sravana Reddy & Kevin Knight

The Voynich manuscript, also referred to as the VMS, is an illustrated medieval folio written in an undeciphered script.

Since even the basic structure of the text is unknown, it provides a perfect opportunity for the application of unsupervised learning algorithms. Furthermore, while the manuscript has been examined by various scholars, it has much to benefit from attention by a community with the right tools and knowledge of linguistics, text analysis, and machine learning.

Carbon-dating at the University of Arizona has found that the vellum was created in the 15th century, and the McCrone Research Institute has asserted that the ink was added shortly afterwards.

We use a machine-readable transcription based on the alphabet proposed by Currier (1976), edited by D’Imperio (1980) and others, made available by the members of the Voynich Manuscript Mailing List (Gillogly and Reeds, 2005) at http://www.voynich.net/reeds/gillogly/voynich.now. The Currier transcription maps the characters to the ASCII symbols A-Z, 0-9, and *.

Currier (1976) observed from letter and substring frequencies that the text comprises two distinct ‘languages’, A and B. Interestingly, the Biological and Stars sections are mainly written in the B language, and the rest mainly in A.

A more likely explanation is that the script is an abjad, like the scripts of Semitic languages, where all or most vowels are omitted. Indeed, we find that a 2-state HMM on Arabic without diacritics and English without vowels learns a similar grammar, a*b+.

The similarity with devoweled scripts, especially Arabic, reinforces the hypothesis that the VMS script may be an abjad.

mr lkl xplntn s tht th scrpt s n bjd lk th scrpts f smtc lnggs whr ll r mst vwls r mttd ndd w fnd tht tw stt hmm n rbc wtht dcrtcs nd nglsh wtht vwls lrns smlr grmmr
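
A devoweled rendering like the one above can be approximated with a few lines of Python. This is only a minimal sketch: the function name `devowel` is hypothetical, and treating "y" as a vowel and discarding digits and punctuation are assumptions made to match the sample, not choices taken from the paper.

```python
import re

# Assumption: "y" counts as a vowel, as in the sample line ("likely" -> "lkl").
VOWELS = set("aeiouy")

def devowel(text):
    """Lowercase the text, keep only alphabetic words, and strip their vowels."""
    words = re.findall(r"[a-z]+", text.lower())
    stripped = ("".join(ch for ch in word if ch not in VOWELS) for word in words)
    # Words made up entirely of vowels (such as "a") vanish completely.
    return " ".join(word for word in stripped if word)

print(devowel("where all or most vowels are omitted"))
```

Words that consist only of vowels disappear, which is one reason devoweled English has fewer and shorter tokens than the original.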

There is most likely no punctuation.

The word frequency distribution follows Zipf’s law, which is a necessary (though not sufficient) test of linguistic plausibility.
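
One rough way to check a token list against Zipf's law is to fit a line to log frequency versus log rank; a slope near -1 is the classic Zipfian signature. The sketch below is illustrative only: `zipf_slope` is a hypothetical helper, a plain least-squares fit is an assumption rather than the authors' procedure, and the input is assumed to be an already tokenized word list.

```python
import math
from collections import Counter

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two distinct word types, otherwise the fit is undefined.
    """
    counts = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Toy corpus whose frequencies are exactly 60 / rank.
toy = ["a"] * 60 + ["b"] * 30 + ["c"] * 20 + ["d"] * 15 + ["e"] * 12
print(round(zipf_slope(toy), 6))
```

As the quoted sentence notes, a Zipfian slope does not by itself show that a text is linguistic, since other generating processes can produce similar distributions.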

VMS letters are more predictable than those of other languages, with the predictability increasing sharply given the preceding contexts, similarly to Pinyin [a romanization system for Mandarin Chinese].
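
Letter predictability of this kind is commonly measured as conditional entropy: the average number of bits needed to encode a character once the preceding characters are known. The following is a minimal sketch using raw n-gram counts; the function names and the absence of smoothing are assumptions, not the measure used in the paper.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy, in bits, of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def conditional_entropy(text, k=1):
    """Estimate H(next char | previous k chars) as H(k+1-grams) - H(k-gram contexts)."""
    positions = range(len(text) - k)
    contexts = Counter(text[i:i + k] for i in positions)
    extended = Counter(text[i:i + k + 1] for i in positions)
    return entropy(extended) - entropy(contexts)

sample = "abababab"
print(entropy(Counter(sample)))        # unigram entropy of the letters
print(conditional_entropy(sample, 1))  # entropy once the previous letter is known
```

In the toy string each letter is fully determined by the one before it, so the conditional entropy falls to zero even though the unigram entropy is one bit. A sharp drop of this kind is what "predictability increasing sharply given the preceding contexts" refers to.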

Several hypotheses about VMS word structure have been proposed. Tiltman (1967) proposed a template consisting of roots and suffixes. Stolfi (2005) breaks down the morphology into ‘prefix-midfix-suffix’, where the letters in the midfixes are more or less disjoint from the letters in the suffixes and prefixes. Stolfi later modified this to a ‘core-mantle-crust’ model, where words are composed of three nested layers.


The text has very few repeated word bigrams or trigrams, which is surprising given that the unigram word entropy is comparable to other languages.
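
A simple way to quantify this observation is the share of word n-gram occurrences whose n-gram appears more than once in the text. The sketch below assumes the text has already been split into word tokens; `repeated_ngram_fraction` is a hypothetical helper, not a function from the paper.

```python
from collections import Counter

def repeated_ngram_fraction(tokens, n=2):
    """Fraction of n-gram occurrences belonging to an n-gram seen more than once."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)

tokens = "a b a b c".split()
print(repeated_ngram_fraction(tokens, n=2))  # bigrams
print(repeated_ngram_fraction(tokens, n=3))  # trigrams
```

Comparing this fraction for a VMS transcription against texts of similar length in known languages would make the contrast described above concrete.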

Currier (1976) observed that the distinction between the A and B languages corresponds to two different types of handwriting, implying at least two authors. He claimed that based on finer handwriting analysis, there may have been as many as eight scribes.

Claims of decipherment of the VMS script have been surfacing for several years, none of which are convincing. Newbold (1928) believed that microscopic irregularities of glyph edges correspond to anagrammed Latin. Feely in 1943 proposed that the script is a code for abbreviated Latin (D’Imperio, 1980). Sherwood (2008) believes that the words are coded anagrams of Italian. Others have hypothesized that the script is an encoding of Ukrainian (Stojko, 1978), English (Strong, 1945; Brumbaugh, 1976), or a Flemish Creole (Levitov, 1987). The word length distribution and other properties have prompted decodings into East Asian languages like Manchu (Banasik, 2004). These theories tend to rely on arbitrary anagramming and substitutions, and are not falsifiable or well-defined.

The mysterious properties of the text and its resistance to decoding have led some to conclude that it is a hoax – a nonsensical string made to look vaguely language-like. Rugg (2004) claims that words might have been generated using a ‘Cardan Grille’ – a way to deterministically generate words from a table of morphemes. However, it seems that the Grille emulates a restricted finite state grammar of words over prefixes, midfixes, and suffixes. Such a grammar underlies many affixal languages, including English. Martin (2008) proposes a method of generating VMS text from anagrams of number sequences. Like the previous paper, it only shows that this method can create VMS-like words – not that it is the most plausible way of generating the manuscript. It is also likely that the proposed scheme can be used to generate any natural language text.

We have detailed various known properties of the Voynich manuscript text. Some features – the lack of repeated bigrams and the distributions of letters at line-edges – are linguistically aberrant, while others – the word length and frequency distributions, the apparent presence of morphology, and most notably, the presence of page-level topics – conform to natural language-like text.
