English (EN) Language Data

Inflectional Morphology Data

The Lexical Resource for English contains all the standard inflectional forms for nouns, verbs, adjectives, prepositions, conjunctions, etc.

Derivational Morphology Data

Contains all the standard derivational forms, comparatives, superlatives and common compound words.

Extended Morphology Data

Extends the inflectional and derivational form lists with additional morphological phenomena such as genitive forms and common contractions.

Frequency Indication

Contains the relative frequency of each word in the above lists in the given language.

Each word is assigned a frequency group on a normalized logarithmic scale from 0 to 255. The most frequent word in the corpus is assigned frequency group 255, and words that do not appear in the corpus are assigned frequency group 0.
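
As an illustration only, the sketch below shows one way raw corpus counts could be mapped onto such a scale. The exact formula used by the Lexical Resource is not stated here, so the `frequency_group` function, its `log1p` normalization and the sample counts are all assumptions made for this example.

```python
import math


def frequency_group(count: int, max_count: int) -> int:
    """Map a raw corpus count onto a 0-255 logarithmic scale.

    Hypothetical formula: the resource's actual normalization is not documented here.
    """
    if count <= 0:
        return 0  # word does not appear in the corpus
    # log1p(count) / log1p(max_count) lies in (0, 1]; scale it to 1..255
    return round(255 * math.log1p(count) / math.log1p(max_count))


# Made-up counts, used only to exercise the function
counts = {"the": 1_000_000, "lexicon": 1_200, "zyzzyva": 0}
max_count = max(counts.values())
groups = {word: frequency_group(c, max_count) for word, c in counts.items()}
```

With this mapping the most frequent word lands in group 255, absent words land in group 0, and everything else falls in between, compressed logarithmically so that rare words remain distinguishable.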

Complementary Semantic Annotations

Named Entities Morphology Data

Contains data on named entities, comprising person names, places, companies and organizations.

Offensive Language Flag

Contains information per word indicating if the word might be considered offensive in certain contexts. 

Regional Variants

In addition to the lexical data for English, the Lexical Resource also contains the equivalent lexical data for the following dialects:  

    • British English
    • Indian English

Volume of Language Data

Total number of forms

180,000 forms 

    • Verbs: 50,000 forms (28%) 
    • Nouns: 110,000 forms (62%) 
    • Adjectives: 15,000 forms (8%) 
    • Other: 5,000 forms (2%) 

Total number of lemmas

60,000 lemmas

Features

Each form is annotated with the lemma (root form), POS, and morphological attributes: tense, person, number, gender, degree and entity-type.

Lemma

The canonical form for the inflected word.

POS

Part of speech, such as noun, verb, adjective, etc.

Voice

Not applicable.

Tense

Specifies when the action takes place, such as past, present or future.

Aspect

Not applicable.

Mood

Not applicable.

Person

Indicates whether a verb or pronoun refers to the first, second or third person.

Number

Indicates whether the form is singular, dual or plural.

Gender

Indicates whether a noun, verb or adjective form is masculine, feminine, neuter, etc.

Case

Not applicable.

Degree

Specifies whether an adjective is in its positive, comparative or superlative form.

Definiteness State

Not applicable.

Negative

Not applicable.

Contractions

Shortened form of a word or group of words.

Pronominal Clitics

Not applicable.

Formality

Not applicable.

Frequency

Relative frequency of the form based on a large general-purpose corpus.

Named Entities

Pre-defined entities are tagged as person names, places, organizations, etc.

Offensive

Indicates whether the form might be considered offensive in certain contexts.
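
To make the feature list more concrete, the sketch below models a single annotated form as a record. The delivery format of the Lexical Resource is not described here, so the `FormEntry` class, its field names, the `forms_of` helper and the sample values (including the frequency groups) are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FormEntry:
    """One annotated form (hypothetical schema, not the resource's actual format)."""
    form: str                          # the inflected or derived surface form
    lemma: str                         # canonical form
    pos: str                           # part of speech
    tense: Optional[str] = None
    person: Optional[int] = None
    number: Optional[str] = None
    gender: Optional[str] = None
    degree: Optional[str] = None
    contraction: bool = False
    frequency: int = 0                 # frequency group, 0 to 255
    entity_type: Optional[str] = None  # set only for named entities
    offensive: bool = False


# Sample entries with made-up frequency groups
entries = [
    FormEntry("walks", "walk", "verb", tense="present", person=3,
              number="singular", frequency=150),
    FormEntry("bigger", "big", "adjective", degree="comparative", frequency=140),
    FormEntry("London", "London", "noun", number="singular",
              entity_type="place", frequency=160),
]


def forms_of(lemma: str, items: List[FormEntry]) -> List[str]:
    """Return every surface form recorded for the given lemma."""
    return [e.form for e in items if e.lemma == lemma]
```

Features marked "Not applicable" for English (voice, aspect, mood, case and so on) are simply left out of this sketch; a schema shared across languages would presumably carry them as additional optional fields.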
