Difference between revisions of "WikipediaExtracts:N-gram"

From Academic Lecture Transcripts
(Created by WPExtractsBot)
 
m (Converted to use new extension InterwikiExtracts))
 
Line 1: Line 1:
 
<div style="text-align: center; font-size:large;">[{{fullurl:wikipedia:{{{1|{{PAGENAME}}}}}}} Go to full Wikipedia article on: {{{1|{{PAGENAME}}}}}]</div>
 
 
''Extracted from Wikipedia'' --  
 
{{#WikipediaExtract: {{{1|{{PAGENAME}}}}}|intro = true}}
+
{{#InterwikiExtract: {{{1|{{PAGENAME}}}}}
 +
|wiki=wikipedia
 +
|format=text
 +
|intro=true
 +
}}

Latest revision as of 21:43, 22 February 2022

Go to full Wikipedia article on: N-gram

Extracted from Wikipedia --

An n-gram is a sequence of n adjacent symbols in a particular order. The symbols may be n adjacent letters (including punctuation marks and blanks), syllables, or, rarely, whole words found in a language dataset; adjacent phonemes extracted from a speech-recording dataset; or adjacent base pairs extracted from a genome. They are collected from a text corpus or speech corpus.
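As a minimal sketch of this definition, the following Python snippet slides a window of width n over a sequence and collects every run of n adjacent symbols. The function name `ngrams` and the sample inputs are illustrative assumptions, not part of the extracted article.

```python
def ngrams(symbols, n):
    """Return every run of n adjacent symbols, in their original order.

    `symbols` can be any sliceable sequence: a string (letters, blanks,
    punctuation), a list of words, a list of phonemes, and so on.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length L contains L - n + 1 windows of width n
    # (none at all if the sequence is shorter than n).
    return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]


# Hypothetical sample inputs, chosen only for illustration.
char_bigrams = ngrams("to be", 2)                         # letters and blanks
word_trigrams = ngrams("to be or not to be".split(), 3)   # whole words
base_3grams = ngrams("GATTACA", 3)                        # genome-style symbols
```

The same function covers letters, words, and base pairs because the definition depends only on adjacency and order, not on what the symbols are.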

If Latin numerical prefixes are used, an n-gram of size 1 is called a "unigram" and one of size 2 a "bigram" (or, less commonly, a "digram"), and so on. If English cardinal numbers are used instead of the Latin prefixes, n-grams are called "four-gram", "five-gram", etc. Similarly, in computational biology, polymers or oligomers of a known size, called k-mers, are named using Greek numerical prefixes ("monomer", "dimer", "trimer", "tetramer", "pentamer", etc.) or English cardinal numbers ("one-mer", "two-mer", "three-mer", etc.). When the items are words, n-grams may also be called shingles.

In the context of natural language processing (NLP), the use of n-grams allows bag-of-words models to capture information such as word order, which would not be possible in the traditional bag-of-words setting.
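The point about word order can be illustrated with a small sketch, assuming simple whitespace tokenization and lowercasing. The helper `bag_of_ngrams` and the two sample sentences are hypothetical, chosen only for illustration.

```python
from collections import Counter


def bag_of_ngrams(text, n):
    """Count the word n-grams in `text` (naive whitespace tokenization)."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


# Two sentences with the same words in a different order.
a = "dog bites man"
b = "man bites dog"

# A traditional bag of words (n = 1) cannot tell them apart ...
same_unigrams = bag_of_ngrams(a, 1) == bag_of_ngrams(b, 1)

# ... but a bag of bigrams (n = 2) records which word follows which.
same_bigrams = bag_of_ngrams(a, 2) == bag_of_ngrams(b, 2)
```

Here `same_unigrams` is `True` and `same_bigrams` is `False`: the unigram counts are identical, while the bigram counts differ because each bigram encodes a local ordering of two words.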