models

SQLAlchemy Object Relational Mapper (ORM) declarations, implemented as a set of classes.

All class attributes are Column objects representing columns of a SQL database table. Data types are detailed in the Attributes section of each class.

base_entity

Base SQLAlchemy ORM entities.

class soweego.importer.models.base_entity.BaseEntity(**kwargs)

Minimal ORM structure for a target catalog entry. Each ORM entity should inherit this class.

Attributes:

  • internal_id (integer) - an internal primary key

  • catalog_id (string(50)) - a target catalog identifier

  • name (text) - a full name (person) or full title (work)

  • name_tokens (text) - a name tokenized through tokenize()

  • born (date) - a birth (person) or publication (work) date

  • born_precision (integer) - the precision of the birth (person) or publication (work) date

  • died (date) - a death date; only applies to a person

  • died_precision (integer) - the precision of the death date; only applies to a person
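
The column layout above can be sketched with SQLAlchemy's declarative system. This is a hypothetical re-declaration, not soweego's source: it assumes an abstract class (`__abstract__ = True`), so that each concrete subclass gets its own table holding copies of the shared columns. The `ExamplePersonEntity` class, the `example_person` table name and constraints such as `nullable=False` and `index=True` are made up for illustration.

```python
from datetime import date

from sqlalchemy import Column, Date, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class BaseEntity(Base):
    """Sketch of the documented BaseEntity columns (not soweego's source)."""

    # Abstract: no table of its own, subclasses receive copies of the columns
    __abstract__ = True

    internal_id = Column(Integer, primary_key=True, autoincrement=True)
    catalog_id = Column(String(50), nullable=False, index=True)  # assumed constraints
    name = Column(Text, nullable=False)
    name_tokens = Column(Text)
    born = Column(Date)
    born_precision = Column(Integer)
    died = Column(Date)
    died_precision = Column(Integer)


class ExamplePersonEntity(BaseEntity):
    """Hypothetical concrete entity inheriting every column above."""

    __tablename__ = 'example_person'  # hypothetical table name


def demo():
    """Store one person in an in-memory SQLite database and read it back."""
    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        session.add(ExamplePersonEntity(
            catalog_id='12345', name='Jane Doe', name_tokens='jane doe',
            born=date(1970, 1, 1),
        ))
        session.commit()
        person = session.query(ExamplePersonEntity).filter_by(catalog_id='12345').one()
        return person.name, person.born.year


if __name__ == '__main__':
    print(demo())
```

Only the subclass produces a table: the abstract base contributes columns, not a table of its own.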

class soweego.importer.models.base_entity.BaseRelationship(from_catalog_id, to_catalog_id)

Minimal ORM structure for a target catalog relationship between entries. Each ORM relationship entity should implement this interface.

You can build a relationship for different purposes: typically, to connect works with people, or groups with individuals.

Attributes:

  • from_catalog_id (string(50)) - the target catalog identifier of the entry the relationship starts from

  • to_catalog_id (string(50)) - the target catalog identifier of the entry the relationship points to
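
A relationship can be sketched as an abstract class whose constructor takes the two identifiers positionally, matching the documented signature. This is a hypothetical sketch: the surrogate `internal_id` primary key, the `WorkPersonRelationship` class, its table name and the convention that `from_catalog_id` holds the work and `to_catalog_id` the person are all assumptions.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class BaseRelationship(Base):
    """Sketch of the documented BaseRelationship (not soweego's source)."""

    __abstract__ = True

    # Assumption: a surrogate primary key, since neither identifier is unique alone
    internal_id = Column(Integer, primary_key=True, autoincrement=True)
    from_catalog_id = Column(String(50), nullable=False, index=True)
    to_catalog_id = Column(String(50), nullable=False, index=True)

    def __init__(self, from_catalog_id, to_catalog_id):
        # Positional constructor, matching the documented signature
        self.from_catalog_id = from_catalog_id
        self.to_catalog_id = to_catalog_id


class WorkPersonRelationship(BaseRelationship):
    """Hypothetical relationship: from a work to a person who made it."""

    __tablename__ = 'work_person'  # hypothetical table name


def works_of(session, person_id):
    """Return the sorted catalog IDs of the works linked to a person."""
    query = session.query(WorkPersonRelationship.from_catalog_id).filter(
        WorkPersonRelationship.to_catalog_id == person_id
    )
    return sorted(row[0] for row in query)


def demo():
    """Link two works to one person and look them up again."""
    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        session.add_all([
            WorkPersonRelationship('work-1', 'person-a'),
            WorkPersonRelationship('work-2', 'person-a'),
            WorkPersonRelationship('work-2', 'person-b'),
        ])
        session.commit()
        return works_of(session, 'person-a')


if __name__ == '__main__':
    print(demo())
```

The same pattern covers the other purpose mentioned above, connecting groups with individuals, by swapping in a different concrete subclass.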

base_nlp_entity

Base SQLAlchemy ORM entity for textual data that will undergo some natural language processing (NLP).

class soweego.importer.models.base_nlp_entity.BaseNlpEntity(**kwargs)

Minimal ORM structure for a piece of text from a target catalog. Each ORM NLP entity should inherit this class.

Attributes:

  • internal_id (integer) - an internal primary key

  • catalog_id (string(50)) - a target catalog identifier

  • description (text) - a text describing the main catalog entry

  • description_tokens (text) - a description tokenized through tokenize()
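
The `*_tokens` columns (`name_tokens` here and `description_tokens` above) hold the output of tokenize(), whose behaviour is not described in this section. The following is only a stand-in, assuming tokens are lowercased, accent-stripped words stored as one space-separated string; `simple_tokenize` and `to_token_column` are hypothetical names, not soweego functions.

```python
import re
import unicodedata


def simple_tokenize(text):
    """Hypothetical stand-in for tokenize(): not soweego's implementation."""
    # NFKD splits accented letters into a base letter plus a combining mark
    decomposed = unicodedata.normalize('NFKD', text)
    # Drop the combining marks, keeping the base letters
    stripped = ''.join(c for c in decomposed if not unicodedata.combining(c))
    # Lowercase and keep runs of word characters as tokens
    return re.findall(r'\w+', stripped.lower())


def to_token_column(text):
    """Assumption: tokens are stored as one space-separated string in a text column."""
    return ' '.join(simple_tokenize(text))


if __name__ == '__main__':
    print(to_token_column('Beyoncé Knowles-Carter'))  # beyonce knowles carter
```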

discogs_entity

Discogs SQLAlchemy ORM entities.

class soweego.importer.models.discogs_entity.DiscogsArtistEntity(**kwargs)

A Discogs artist: either a musician or a band. It comes from the _artists.xml.gz dataset. See the download page.

All ORM entities describing Discogs people should inherit this class.

Attributes:

  • real_name (text) - a name in real life

  • data_quality (string(20)) - an indicator of data quality
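
One way to follow the rule that Discogs people entities inherit DiscogsArtistEntity is to make it an abstract intermediate class: it adds `real_name` and `data_quality` on top of the base columns, and each concrete subclass (musician, group) maps to its own table. This is a sketch under that assumption; the trimmed-down base class and the table names are hypothetical.

```python
from sqlalchemy import Column, Integer, String, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class BaseEntity(Base):
    """Trimmed-down sketch of BaseEntity, keeping only a few columns."""

    __abstract__ = True
    internal_id = Column(Integer, primary_key=True, autoincrement=True)
    catalog_id = Column(String(50), nullable=False)
    name = Column(Text, nullable=False)


class DiscogsArtistEntity(BaseEntity):
    """Also abstract: adds the Discogs artist columns without creating a table."""

    __abstract__ = True
    real_name = Column(Text)
    data_quality = Column(String(20))


class DiscogsMusicianEntity(DiscogsArtistEntity):
    __tablename__ = 'discogs_musician'  # hypothetical table name


class DiscogsGroupEntity(DiscogsArtistEntity):
    __tablename__ = 'discogs_group'  # hypothetical table name


def column_names(table_name):
    """Return the sorted column names of a table declared above."""
    return sorted(Base.metadata.tables[table_name].columns.keys())


if __name__ == '__main__':
    print(column_names('discogs_musician'))
```

Both concrete tables end up with the same five columns, while neither abstract class produces a table.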

class soweego.importer.models.discogs_entity.DiscogsGroupEntity(**kwargs)

A Discogs group, namely a band.

class soweego.importer.models.discogs_entity.DiscogsGroupLinkEntity(**kwargs)

A Discogs band Web link (URL).

class soweego.importer.models.discogs_entity.DiscogsGroupNlpEntity(**kwargs)

A Discogs band textual description.

class soweego.importer.models.discogs_entity.DiscogsMasterArtistRelationship(from_catalog_id, to_catalog_id)

A relationship between a Discogs musical work and the Discogs musician or band who made it.

class soweego.importer.models.discogs_entity.DiscogsMasterEntity(**kwargs)

A Discogs master: a musical work, which can have multiple releases. It comes from the _masters.xml.gz dataset. See the download page.

Attributes:

  • main_release_id (string(50)) - a Discogs identifier of the main release for this musical work

  • genres (text) - a string list of musical genres
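
Since relationship attributes are plain catalog identifiers, connecting a master to the artists who made it means joining on `catalog_id`. The sketch below assumes `from_catalog_id` holds the master's identifier and `to_catalog_id` the artist's; the trimmed-down classes, table names, `masters_by` helper and sample data are hypothetical.

```python
from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class DiscogsMasterEntity(Base):
    __tablename__ = 'discogs_master'  # hypothetical table name
    internal_id = Column(Integer, primary_key=True)
    catalog_id = Column(String(50), nullable=False)
    name = Column(Text, nullable=False)
    main_release_id = Column(String(50))
    genres = Column(Text)


class DiscogsMusicianEntity(Base):
    __tablename__ = 'discogs_musician'  # hypothetical table name
    internal_id = Column(Integer, primary_key=True)
    catalog_id = Column(String(50), nullable=False)
    name = Column(Text, nullable=False)


class DiscogsMasterArtistRelationship(Base):
    __tablename__ = 'discogs_master_artist'  # hypothetical table name
    internal_id = Column(Integer, primary_key=True)
    from_catalog_id = Column(String(50), nullable=False)  # assumption: the master
    to_catalog_id = Column(String(50), nullable=False)  # assumption: the artist


def masters_by(session, artist_name):
    """Return the sorted titles of masters linked to artists with this name."""
    query = (
        session.query(DiscogsMasterEntity.name)
        .join(DiscogsMasterArtistRelationship,
              DiscogsMasterArtistRelationship.from_catalog_id == DiscogsMasterEntity.catalog_id)
        .join(DiscogsMusicianEntity,
              DiscogsMusicianEntity.catalog_id == DiscogsMasterArtistRelationship.to_catalog_id)
        .filter(DiscogsMusicianEntity.name == artist_name)
        .order_by(DiscogsMasterEntity.name)
    )
    return [row[0] for row in query]


def demo():
    """Load made-up masters and artists, then query one artist's masters."""
    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        session.add_all([
            DiscogsMusicianEntity(catalog_id='a1', name='Some Musician'),
            DiscogsMusicianEntity(catalog_id='a2', name='Another Musician'),
            DiscogsMasterEntity(catalog_id='m1', name='Second Album', main_release_id='r1'),
            DiscogsMasterEntity(catalog_id='m2', name='First Album', main_release_id='r2'),
            DiscogsMasterEntity(catalog_id='m3', name='Other Album', main_release_id='r3'),
            DiscogsMasterArtistRelationship(from_catalog_id='m1', to_catalog_id='a1'),
            DiscogsMasterArtistRelationship(from_catalog_id='m2', to_catalog_id='a1'),
            DiscogsMasterArtistRelationship(from_catalog_id='m3', to_catalog_id='a2'),
        ])
        session.commit()
        return masters_by(session, 'Some Musician')


if __name__ == '__main__':
    print(demo())
```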

class soweego.importer.models.discogs_entity.DiscogsMusicianEntity(**kwargs)

A Discogs musician.

class soweego.importer.models.discogs_entity.DiscogsMusicianLinkEntity(**kwargs)

A Discogs musician Web link (URL).

class soweego.importer.models.discogs_entity.DiscogsMusicianNlpEntity(**kwargs)

A Discogs musician textual description.

imdb_entity

IMDb SQLAlchemy ORM entities, based on the dataset specifications.

class soweego.importer.models.imdb_entity.IMDbActorEntity(**kwargs)

An IMDb actor.

class soweego.importer.models.imdb_entity.IMDbDirectorEntity(**kwargs)

An IMDb director.

class soweego.importer.models.imdb_entity.IMDbMusicianEntity(**kwargs)

An IMDb musician.

class soweego.importer.models.imdb_entity.IMDbNameEntity(**kwargs)

An IMDb name: a person such as an actor, director, producer, etc. It comes from the name.basics.tsv.gz dataset. See the download page.

All ORM entities describing IMDb people should inherit this class.

Attributes:

  • gender (string(10)) - a gender

  • occupations (string(255)) - a string list of Wikidata occupation QIDs
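
The format of the occupations "string list" is not specified here. Assuming whitespace-separated QIDs, a helper like the hypothetical `parse_occupations` below turns the column value into a set, which makes occupation checks straightforward. In the usage line, Q33999 is the Wikidata item for actor.

```python
import re

# A Wikidata item identifier: "Q" followed by a positive integer
QID = re.compile(r'^Q[1-9][0-9]*$')


def parse_occupations(occupations):
    """Split an occupations value into a set of QIDs.

    Assumption: QIDs are separated by whitespace; other tokens are ignored.
    """
    if not occupations:
        return set()
    return {token for token in occupations.split() if QID.match(token)}


def has_occupation(occupations, qid):
    """Tell whether an occupations value contains the given QID."""
    return qid in parse_occupations(occupations)


if __name__ == '__main__':
    print(has_occupation('Q33999 Q2526255', 'Q33999'))  # True
```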

class soweego.importer.models.imdb_entity.IMDbProducerEntity(**kwargs)

An IMDb producer.

class soweego.importer.models.imdb_entity.IMDbTitleEntity(**kwargs)

An IMDb title: an audiovisual work such as a movie, short, TV series episode, etc. It comes from the title.basics.tsv.gz dataset. See the download page.

All ORM entities describing IMDb works should inherit this class.

Attributes:

  • title_type (string(100)) - an audiovisual work type, like movie or short

  • primary_title (text) - the most popular title

  • original_title (text) - a title in the original language

  • is_adult (boolean) - whether the audiovisual work is for adults or not

  • runtime_minutes (integer) - a runtime in minutes

  • genres (string(255)) - a string list of audiovisual genres
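
The title.basics.tsv.gz dataset is an unquoted, tab-separated file with a header row (tconst, titleType, primaryTitle, originalTitle, isAdult, startYear, endYear, runtimeMinutes, genres), where \N marks a missing value. The hypothetical `title_row_to_kwargs` below maps one record onto the attributes listed above. Storing the primary title in `name`, keeping `genres` as IMDb's comma-separated string and ignoring the year columns are assumptions, not soweego's documented behaviour.

```python
import csv
import io

MISSING = '\\N'  # IMDb's marker for a missing value


def title_row_to_kwargs(row):
    """Map one title.basics.tsv record to hypothetical IMDbTitleEntity kwargs."""

    def value(column):
        # Translate IMDb's missing-value marker into None
        raw = row[column]
        return None if raw == MISSING else raw

    runtime = value('runtimeMinutes')
    return {
        'catalog_id': row['tconst'],
        'name': value('primaryTitle'),  # assumption: name holds the primary title
        'title_type': value('titleType'),
        'primary_title': value('primaryTitle'),
        'original_title': value('originalTitle'),
        'is_adult': row['isAdult'] == '1',
        'runtime_minutes': int(runtime) if runtime is not None else None,
        'genres': value('genres'),  # kept as IMDb's comma-separated string
    }


def read_titles(text_stream):
    """Yield kwargs for every record of a title.basics.tsv text stream."""
    # Disable quote handling, so a title starting with '"' is not read as a quoted field
    reader = csv.DictReader(text_stream, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        yield title_row_to_kwargs(row)


SAMPLE = (
    'tconst\ttitleType\tprimaryTitle\toriginalTitle\tisAdult\t'
    'startYear\tendYear\truntimeMinutes\tgenres\n'
    'tt0000000\tshort\t"Quoted" Title\t"Quoted" Title\t0\t1900\t\\N\t\\N\tDocumentary,Short\n'
)

if __name__ == '__main__':
    for kwargs in read_titles(io.StringIO(SAMPLE)):
        print(kwargs)
```

For the real dump, the same generator can read `gzip.open(path, 'rt', encoding='utf-8', newline='')` instead of the in-memory sample.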

class soweego.importer.models.imdb_entity.IMDbTitleNameRelationship(from_catalog_id, to_catalog_id)

A relationship between an IMDb audiovisual work and an IMDb person who took part in it.

class soweego.importer.models.imdb_entity.IMDbWriterEntity(**kwargs)

An IMDb writer.

musicbrainz_entity

MusicBrainz SQLAlchemy ORM entities, based on the database specifications.

class soweego.importer.models.musicbrainz_entity.MusicBrainzArtistBandRelationship(from_catalog_id, to_catalog_id)

A membership between a MusicBrainz artist and a MusicBrainz band.

class soweego.importer.models.musicbrainz_entity.MusicBrainzArtistEntity(**kwargs)

A MusicBrainz artist, namely a musician.

Attributes:

  • gender (string(10)) - a gender

  • birth_place (string(255)) - a birth place

  • death_place (string(255)) - a death place

class soweego.importer.models.musicbrainz_entity.MusicBrainzArtistLinkEntity(**kwargs)

A MusicBrainz musician Web link (URL).

class soweego.importer.models.musicbrainz_entity.MusicBrainzBandEntity(**kwargs)

A MusicBrainz band.

Attributes:

  • birth_place (string(255)) - a place where the band was formed

  • death_place (string(255)) - a place where the band was disbanded

class soweego.importer.models.musicbrainz_entity.MusicBrainzBandLinkEntity(**kwargs)

A MusicBrainz band Web link (URL).

class soweego.importer.models.musicbrainz_entity.MusicBrainzReleaseGroupArtistRelationship(from_catalog_id, to_catalog_id)

A relationship between a MusicBrainz musical work and the MusicBrainz musician or band who made it.

class soweego.importer.models.musicbrainz_entity.MusicBrainzReleaseGroupEntity(**kwargs)

A MusicBrainz release group: a musical work, which is a group of releases.

class soweego.importer.models.musicbrainz_entity.MusicBrainzReleaseGroupLinkEntity(**kwargs)

A MusicBrainz musical work Web link (URL).

mix_n_match

Mix’n’match SQLAlchemy ORM entities for catalogs that need curation.

They follow the catalog and entry tables of the s51434__mixnmatch_p database located in ToolsDB under the Wikimedia Toolforge infrastructure. See how to connect.

class soweego.importer.models.mix_n_match.MnMCatalog(**kwargs)

A Mix’n’match catalog.

class soweego.importer.models.mix_n_match.MnMEntry(**kwargs)

A Mix’n’match entry.