models
SQLAlchemy Object Relational Mapper (ORM) declarations, implemented as a set of classes.
All class attributes are Column objects representing columns of a SQL database table. Data types are detailed in the Attributes section of each class.
base_entity
Base SQLAlchemy ORM entities.
- class soweego.importer.models.base_entity.BaseEntity(**kwargs)
Minimal ORM structure for a target catalog entry. Each ORM entity should inherit from this class.
Attributes:
internal_id (integer) - an internal primary key
catalog_id (string(50)) - a target catalog identifier
name (text) - a full name (person), or full title (work)
name_tokens (text) - a name tokenized through tokenize()
born (date) - a birth (person), or publication (work) date
born_precision (integer) - a birth (person), or publication (work) date precision
died (date) - a death date. Only applies to a person
died_precision (integer) - a death date precision. Only applies to a person
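As a rough illustration of how the attributes above map to table columns, here is a minimal sketch. It assumes SQLAlchemy 1.4 or later and an in-memory SQLite database. `SketchBaseEntity`, `SketchMusician`, the table name, and all sample values (including the precision value 9) are hypothetical and do not reproduce soweego's actual code.

```python
import datetime

from sqlalchemy import Column, Date, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class SketchBaseEntity(Base):
    """Hypothetical stand-in for BaseEntity, not soweego's implementation."""

    __abstract__ = True  # no table is created for the base class itself

    internal_id = Column(Integer, primary_key=True, autoincrement=True)
    catalog_id = Column(String(50))
    name = Column(Text)
    name_tokens = Column(Text)
    born = Column(Date)
    born_precision = Column(Integer)
    died = Column(Date)
    died_precision = Column(Integer)


class SketchMusician(SketchBaseEntity):
    """Hypothetical concrete entity: inherits every column listed above."""

    __tablename__ = 'sketch_musician'


engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(
        SketchMusician(
            catalog_id='123',  # placeholder identifier
            name='Jane Doe',
            name_tokens='jane doe',
            born=datetime.date(1950, 1, 1),
            born_precision=9,  # placeholder precision value
        )
    )
    session.commit()
    row = session.query(SketchMusician).filter_by(catalog_id='123').one()
    # died was never set, so it is stored as NULL and read back as None
    stored = (row.internal_id, row.name, row.born.year, row.died)
```

The base class is marked abstract in this sketch so that each concrete entity gets its own table with the shared columns.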
- class soweego.importer.models.base_entity.BaseRelationship(from_catalog_id, to_catalog_id)
Minimal ORM structure for a target catalog relationship between entries. Each ORM relationship entity should implement this interface.
You can build a relationship for different purposes: typically, to connect works with people, or groups with individuals.
Attributes:
from_catalog_id (string(50)) - a target catalog identifier
to_catalog_id (string(50)) - a target catalog identifier
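A relationship row is essentially a pair of catalog identifiers. The sketch below assumes SQLAlchemy 1.4 or later and an in-memory SQLite database. `SketchRelationship`, its table name, and the identifiers are hypothetical. The `internal_id` column mirrors the one listed for the relationship classes on this page, and the work-to-artist direction is an assumption.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class SketchRelationship(Base):
    """Hypothetical relationship entity, not soweego's implementation."""

    __tablename__ = 'sketch_work_artist'

    internal_id = Column(Integer, primary_key=True, autoincrement=True)
    from_catalog_id = Column(String(50))
    to_catalog_id = Column(String(50))

    def __init__(self, from_catalog_id, to_catalog_id):
        # Positional constructor, matching the signature documented above
        self.from_catalog_id = from_catalog_id
        self.to_catalog_id = to_catalog_id


engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Assumed direction: from a work to the artist who made it
    session.add_all(
        [
            SketchRelationship('work-1', 'artist-7'),
            SketchRelationship('work-1', 'artist-9'),
            SketchRelationship('work-2', 'artist-7'),
        ]
    )
    session.commit()
    # Look up every artist connected to one work
    makers = [
        rel.to_catalog_id
        for rel in session.query(SketchRelationship)
        .filter_by(from_catalog_id='work-1')
        .order_by(SketchRelationship.to_catalog_id)
    ]
```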
base_link_entity
Base SQLAlchemy ORM entity for URLs.
- class soweego.importer.models.base_link_entity.BaseLinkEntity(**kwargs)
Minimal ORM structure for a target catalog link/URL. Each ORM link entity should inherit from this class.
Attributes:
internal_id (integer) - an internal primary key
catalog_id (string(50)) - a target catalog identifier
url (text) - a full URL
is_wiki (boolean) - whether a URL is a Wiki link or not
url_tokens (text) - a URL tokenized through tokenize()
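To give an idea of what a link row may hold, the sketch below derives `is_wiki` and `url_tokens` from a URL using only the standard library. Both helper functions are hypothetical: they are not soweego's tokenize(), and the rule used to decide whether a URL is a Wiki link is an assumption. The catalog identifier is a placeholder.

```python
import re
from urllib.parse import urlsplit


def sketch_url_tokens(url):
    """Hypothetical tokenizer: lowercase alphanumeric chunks of host and path."""
    parts = urlsplit(url)
    text = f'{parts.netloc} {parts.path}'.lower()
    return ' '.join(re.findall(r'[a-z0-9]+', text))


def sketch_is_wiki(url):
    """Hypothetical check: assume a Wiki link is one hosted on these domains."""
    host = urlsplit(url).netloc.lower()
    return host.endswith(('wikipedia.org', 'wikidata.org'))


url = 'https://en.wikipedia.org/wiki/Miles_Davis'
row = {
    'catalog_id': 'some-id',  # placeholder identifier
    'url': url,
    'is_wiki': sketch_is_wiki(url),
    'url_tokens': sketch_url_tokens(url),
}
```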
base_nlp_entity
Base SQLAlchemy ORM entity for textual data that will undergo some natural language processing (NLP).
- class soweego.importer.models.base_nlp_entity.BaseNlpEntity(**kwargs)
Minimal ORM structure for a target catalog piece of text. Each ORM NLP entity should inherit from this class.
Attributes:
internal_id (integer) - an internal primary key
catalog_id (string(50)) - a target catalog identifier
description (text) - a text describing the main catalog entry
description_tokens (text) - a description tokenized through tokenize()
discogs_entity
Discogs SQLAlchemy ORM entities.
- class soweego.importer.models.discogs_entity.DiscogsArtistEntity(**kwargs)
A Discogs artist: either a musician or a band. It comes from the _artists.xml.gz dataset. See the download page.
All ORM entities describing Discogs people should inherit from this class.
Attributes:
real_name (text) - a name in real life
data_quality (string(20)) - an indicator of data quality
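The inheritance chain described here (base entity, then artist, then musician or group) can be sketched with two abstract levels, so that each concrete table receives both the base columns and the artist-specific ones. This assumes SQLAlchemy 1.4 or later. All class and table names are hypothetical, and only a subset of the base columns is shown.

```python
from sqlalchemy import Column, Integer, String, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class SketchBaseEntity(Base):
    """Hypothetical base entity with a subset of the shared columns."""

    __abstract__ = True

    internal_id = Column(Integer, primary_key=True)
    catalog_id = Column(String(50))
    name = Column(Text)


class SketchArtist(SketchBaseEntity):
    """Hypothetical artist level: adds columns, still no table of its own."""

    __abstract__ = True

    real_name = Column(Text)
    data_quality = Column(String(20))


class SketchMusician(SketchArtist):
    __tablename__ = 'sketch_musician'


class SketchGroup(SketchArtist):
    __tablename__ = 'sketch_group'


# Each concrete table carries the columns of every abstract ancestor
musician_columns = sorted(SketchMusician.__table__.columns.keys())
group_columns = sorted(SketchGroup.__table__.columns.keys())
```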
- class soweego.importer.models.discogs_entity.DiscogsGroupEntity(**kwargs)
A Discogs group, namely a band.
- born
- born_precision
- catalog_id
- data_quality
- died
- died_precision
- internal_id
- name
- name_tokens
- real_name
- class soweego.importer.models.discogs_entity.DiscogsGroupLinkEntity(**kwargs)
A Discogs band Web link (URL).
- catalog_id
- internal_id
- is_wiki
- url
- url_tokens
- class soweego.importer.models.discogs_entity.DiscogsGroupNlpEntity(**kwargs)
A Discogs band textual description.
- catalog_id
- description
- description_tokens
- internal_id
- class soweego.importer.models.discogs_entity.DiscogsMasterArtistRelationship(from_catalog_id, to_catalog_id)
A relationship between a Discogs musical work and the Discogs musician or band who made it.
- from_catalog_id
- internal_id
- to_catalog_id
- class soweego.importer.models.discogs_entity.DiscogsMasterEntity(**kwargs)
A Discogs master: a musical work, which can have multiple releases. It comes from the _masters.xml.gz dataset. See the download page.
Attributes:
main_release_id (string(50)) - a Discogs identifier of the main release for this musical work
genres (text) - a string list of musical genres
- born
- born_precision
- catalog_id
- died
- died_precision
- internal_id
- name
- name_tokens
- class soweego.importer.models.discogs_entity.DiscogsMusicianEntity(**kwargs)
A Discogs musician.
- born
- born_precision
- catalog_id
- data_quality
- died
- died_precision
- internal_id
- name
- name_tokens
- real_name
imdb_entity
IMDb SQLAlchemy ORM entities, based on the dataset specifications. Download page: https://datasets.imdbws.com/
- class soweego.importer.models.imdb_entity.IMDbActorEntity(**kwargs)
An IMDb actor.
- born
- born_precision
- catalog_id
- died
- died_precision
- gender
- internal_id
- name
- name_tokens
- occupations
- class soweego.importer.models.imdb_entity.IMDbDirectorEntity(**kwargs)
An IMDb director.
- born
- born_precision
- catalog_id
- died
- died_precision
- gender
- internal_id
- name
- name_tokens
- occupations
- class soweego.importer.models.imdb_entity.IMDbMusicianEntity(**kwargs)
An IMDb musician.
- born
- born_precision
- catalog_id
- died
- died_precision
- gender
- internal_id
- name
- name_tokens
- occupations
- class soweego.importer.models.imdb_entity.IMDbNameEntity(**kwargs)
An IMDb name: a person like an actor, director, producer, etc. It comes from the name.basics.tsv.gz dataset. See the download page.
All ORM entities describing IMDb people should inherit from this class.
Attributes:
gender (string(10)) - a gender
occupations (string(255)) - a string list of Wikidata occupation QIDs
- class soweego.importer.models.imdb_entity.IMDbProducerEntity(**kwargs)
An IMDb producer.
- born
- born_precision
- catalog_id
- died
- died_precision
- gender
- internal_id
- name
- name_tokens
- occupations
- class soweego.importer.models.imdb_entity.IMDbTitleEntity(**kwargs)
An IMDb title: an audiovisual work like a movie, short, TV series episode, etc. It comes from the title.basics.tsv.gz dataset. See the download page.
All ORM entities describing IMDb works should inherit from this class.
Attributes:
title_type (string(100)) - an audiovisual work type, like movie or short
primary_title (text) - the most popular title
original_title (text) - a title in the original language
is_adult (boolean) - whether the audiovisual work is for adults or not
runtime_minutes (integer) - a runtime in minutes
genres (string(255)) - a string list of audiovisual genres
- born
- born_precision
- catalog_id
- died
- died_precision
- internal_id
- name
- name_tokens
musicbrainz_entity
MusicBrainz SQLAlchemy ORM entities, based on the database specifications.
- class soweego.importer.models.musicbrainz_entity.MusicBrainzArtistBandRelationship(from_catalog_id, to_catalog_id)
A membership between a MusicBrainz artist and a MusicBrainz band.
- from_catalog_id
- internal_id
- to_catalog_id
- class soweego.importer.models.musicbrainz_entity.MusicBrainzArtistEntity(**kwargs)
A MusicBrainz artist, namely a musician.
Attributes:
gender (string(10)) - a gender
birth_place (string(255)) - a birth place
death_place (string(255)) - a death place
- born
- born_precision
- catalog_id
- died
- died_precision
- internal_id
- name
- name_tokens
- class soweego.importer.models.musicbrainz_entity.MusicBrainzArtistLinkEntity(**kwargs)
A MusicBrainz musician Web link (URL).
- catalog_id
- internal_id
- is_wiki
- url
- url_tokens
- class soweego.importer.models.musicbrainz_entity.MusicBrainzBandEntity(**kwargs)
A MusicBrainz band.
Attributes:
birth_place (string(255)) - a place where the band was formed
death_place (string(255)) - a place where the band was disbanded
- born
- born_precision
- catalog_id
- died
- died_precision
- internal_id
- name
- name_tokens
- class soweego.importer.models.musicbrainz_entity.MusicBrainzBandLinkEntity(**kwargs)
A MusicBrainz band Web link (URL).
- catalog_id
- internal_id
- is_wiki
- url
- url_tokens
- class soweego.importer.models.musicbrainz_entity.MusicBrainzReleaseGroupArtistRelationship(from_catalog_id, to_catalog_id)
A relationship between a MusicBrainz musical work and the MusicBrainz musician or band who made it.
- from_catalog_id
- internal_id
- to_catalog_id
mix_n_match
Mix’n’match SQLAlchemy ORM entities for catalogs that need curation.
They follow the catalog and entry tables of the s51434__mixnmatch_p database located in ToolsDB under the Wikimedia Toolforge infrastructure. See how to connect.