Shemetev A. A. (Saint Petersburg, Russian Federation)
TECHNICAL ANALYSIS OF THE GERMAN LANGUAGE AND ITS ECONOMIC SIGNIFICANCE BY METHODS OF DATA ANALYSIS TO IMPROVE THE QUALITY OF TEACHING
German is an important language of our time.
In this respect, mathematical linguistics addresses questions such as which
words are in general use and what the context of speech was over a certain
period. This article analyzes news, Twitter and blog texts in German using the
methods of mathematical linguistics. The German text data are provided by
Johns Hopkins University and are available online [1].
The first step was to merge all the German blog, Twitter and news texts into
a single .txt file.
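The merging step can be sketched as follows. This is a minimal Python
illustration, not the author's original R workflow; the function name
merge_text_files and the file names shown in the comment are assumptions made
for the example.

```python
def merge_text_files(sources, target, encoding="utf-8"):
    """Concatenate the given text files into one file.

    Returns the number of lines written to the target file.
    """
    line_count = 0
    with open(target, "w", encoding=encoding) as out:
        for source in sources:
            # errors="replace" keeps the merge running if a file contains
            # bytes that are not valid in the chosen encoding.
            with open(source, "r", encoding=encoding, errors="replace") as src:
                for line in src:
                    # Make sure every record ends with a newline so that the
                    # last line of one file does not run into the next file.
                    out.write(line if line.endswith("\n") else line + "\n")
                    line_count += 1
    return line_count


# Hypothetical usage (file names are assumed, adjust to the unpacked archive):
# merge_text_files(
#     ["de_DE.blogs.txt", "de_DE.news.txt", "de_DE.twitter.txt"],
#     "de_DE.all.txt",
# )
```

Reading line by line keeps memory use low, which matters because the three
source files together are large.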
From the standpoint of mathematical linguistics, any text is a sequence of
characters that form words, which in turn form unigrams, bigrams, trigrams,
four-grams, five-grams and n-grams of higher order.
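To make the notion of n-grams concrete, the sketch below splits a text into
words and builds n-grams of any order. It is a Python illustration under
stated assumptions: a word is taken to be any run of letters (digits and
punctuation are dropped), and the helper names tokenize and ngrams are
hypothetical.

```python
import re
from collections import Counter

# Assumption: a word is a run of Unicode letters, which keeps German
# umlauts and "ß" inside words while dropping digits and punctuation.
WORD_PATTERN = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Return the lower-cased words of the text in their original order."""
    return WORD_PATTERN.findall(text.lower())


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("Das ist ein Test, das ist gut.")
unigram_counts = Counter(ngrams(tokens, 1))
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(1))
```

A unigram is simply a single word, so the same function covers every order
of n-gram mentioned above by changing n.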
To understand roughly what the news, blogs and Twitter were about, one has to
omit the most common words that carry no particular meaning of their own. The
usual German words that are sensible to omit are provided by the vocabulary
of a built-in R function.
What remains are the meaningful keywords that represent what was discussed
in the news, blogs and Twitter over the specific period (profanity and some
other common meaningless words were additionally removed beforehand by the
author).
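The filtering described above, followed by a frequency count of the remaining
words, can be sketched as follows. This is a Python illustration only: the
stop-word set is a deliberately tiny hypothetical sample (not the R vocabulary
used by the author), the extra blocklist stands in for the author's manually
removed words, and the function name top_keywords is an assumption.

```python
from collections import Counter

# Hypothetical, deliberately tiny sample of German stop words.
STOP_WORDS = {"der", "die", "das", "und", "ist", "in", "ein", "eine", "zu", "mit"}

# Hypothetical extra blocklist standing in for the words removed manually.
EXTRA_BLOCKLIST = {"rt"}


def top_keywords(tokens, k=10, stop_words=STOP_WORDS, blocklist=EXTRA_BLOCKLIST):
    """Return the k most frequent tokens that are neither stop words nor blocked."""
    excluded = set(stop_words) | set(blocklist)
    kept = [token for token in tokens if token not in excluded]
    return Counter(kept).most_common(k)


sample = "rt die wahl ist in berlin und die wahl ist wichtig".split()
print(top_keywords(sample, k=3))
```

Without the filtering step the top of the list would be dominated by articles
and conjunctions, which say nothing about the topics discussed.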
The frequency analysis shows that half of the German language, as represented
in this corpus, is covered by the circa 2,000 most frequent words; circa ¾ is
covered by fewer than 16,000 words; and 100% is covered by 125,000 words.
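Coverage figures of this kind can be obtained by sorting the distinct words by
frequency and accumulating their counts until the desired share of all word
occurrences is reached. The sketch below is a Python illustration on toy data,
not a reproduction of the reported numbers; the function name
words_needed_for_coverage is an assumption.

```python
from collections import Counter


def words_needed_for_coverage(tokens, share):
    """Return how many of the most frequent distinct words are needed so that
    their occurrences make up at least `share` of all tokens."""
    if not 0 < share <= 1:
        raise ValueError("share must be in the interval (0, 1]")
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    # most_common() yields the words from the most to the least frequent.
    for rank, (_, frequency) in enumerate(counts.most_common(), start=1):
        covered += frequency
        if covered >= share * total:
            return rank
    return len(counts)


# Toy corpus: "a" occurs 5 times, "b" 3 times, "c" 2 times (10 tokens).
toy_tokens = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
for share in (0.5, 0.75, 1.0):
    print(share, words_needed_for_coverage(toy_tokens, share))
```

Because a few words occur very often, the number of words needed grows slowly
for the first half of the text and steeply for the last few percent.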
Learners of German may therefore wisely concentrate on the most frequently
used words of the language and set aside the words that are rarely used, in
order to master the language more efficiently. Economically, this requires
fewer person-hours of study and hence saves time for each learner.
REFERENCES
1. The German language news, Twitter and blogs database [electronic resource]:
https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip
(date of access: 26.09.2020).
2. Shemetev A. Word Recognition. NY: RPubs [electronic resource]:
https://rpubs.com/alexshemetev/Word_Recognition_Midterm_Report
(date of access: 26.09.2020).