ingester

Ingest soweego output into Wikidata items.

wikidata_bot

A Wikidata bot that adds, deletes, or deprecates referenced statements. Here are typical output examples.

add_identifiers()
  Reference: (stated in, Discogs), (retrieved, TIMESTAMP)
add_people_statements()
  Reference: (stated in, Discogs), (retrieved, TIMESTAMP)
add_works_statements()
delete_or_deprecate_identifiers()
  Deletes or deprecates identifier statements.
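The reference shape shown in the examples above can be pictured as a list of (property, value) pairs. The following is only an illustrative sketch: the helper name `build_reference`, the use of plain labels instead of Wikidata PIDs and QIDs, and the day-precision date format are all assumptions, not the bot's actual internals.

```python
from datetime import datetime, timezone


def build_reference(catalog_label, retrieved=None):
    """Hypothetical helper: return reference pairs for a statement.

    Mirrors the documented shape (stated in, CATALOG), (retrieved, TIMESTAMP).
    The YYYY-MM-DD timestamp format is an assumption made for this sketch.
    """
    if retrieved is None:
        retrieved = datetime.now(timezone.utc)
    return [
        ('stated in', catalog_label),
        ('retrieved', retrieved.strftime('%Y-%m-%d')),
    ]


reference = build_reference('Discogs', datetime(2020, 1, 2, tzinfo=timezone.utc))
```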

soweego.ingester.wikidata_bot.add_identifiers(identifiers, catalog, entity, sandbox)

Add identifier statements to existing Wikidata items.

Parameters
  • identifiers (dict) – a {QID: catalog_identifier} dictionary

  • catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog

  • entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity

  • sandbox (bool) – whether to perform edits on the Wikidata sandbox item

Return type

None
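As a minimal sketch of the expected input, the snippet below builds a {QID: catalog_identifier} dictionary and checks the catalog and entity arguments against the values listed above. The `check_arguments` and `upload` helpers are hypothetical and not part of soweego, and the QIDs and identifiers are made-up placeholders. The real call is wrapped in a function with a lazy import, so the snippet itself runs without soweego installed.

```python
SUPPORTED_CATALOGS = {'discogs', 'imdb', 'musicbrainz', 'twitter'}
SUPPORTED_ENTITIES = {
    'actor', 'band', 'director', 'musician', 'producer', 'writer',
    'audiovisual_work', 'musical_work',
}


def check_arguments(identifiers, catalog, entity):
    """Hypothetical pre-flight check, not part of soweego."""
    if catalog not in SUPPORTED_CATALOGS:
        raise ValueError(f'Unsupported catalog: {catalog}')
    if entity not in SUPPORTED_ENTITIES:
        raise ValueError(f'Unsupported entity: {entity}')
    for qid, identifier in identifiers.items():
        # A QID is the letter Q followed by digits
        if not (qid.startswith('Q') and qid[1:].isdigit()):
            raise ValueError(f'Not a QID: {qid}')
        if not isinstance(identifier, str) or not identifier:
            raise ValueError(f'Empty or non-string identifier for {qid}')


def upload(identifiers, catalog, entity):
    """Validate, then edit the sandbox item only.

    Needs soweego installed and bot credentials configured.
    """
    from soweego.ingester import wikidata_bot  # lazy import on purpose

    check_arguments(identifiers, catalog, entity)
    wikidata_bot.add_identifiers(identifiers, catalog, entity, sandbox=True)


# Placeholder data: made-up QIDs and catalog identifiers
identifiers = {'Q1000001': '1234567', 'Q1000002': '7654321'}
check_arguments(identifiers, 'discogs', 'musician')
```

Passing sandbox=True first is a cautious way to inspect the resulting edits before touching real items.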

soweego.ingester.wikidata_bot.add_people_statements(statements, sandbox)

Add statements to existing Wikidata people.

Statements typically come from validation criteria 2 or 3 as per soweego.validator.checks.links() and soweego.validator.checks.bio().

Parameters
  • statements (Iterable) – an iterable of (subject, predicate, value) triples

  • sandbox (bool) – whether to perform edits on the Wikidata sandbox item

Return type

None
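The triples can come from any iterable. The sketch below shows one possible way to prepare them: the `Statement` container and the `to_triples` helper are hypothetical, the QIDs are made-up placeholders, and the use of Wikidata PIDs (P463, member of; P569, date of birth) as predicates is an assumption about what the bot expects.

```python
from typing import Iterator, NamedTuple


class Statement(NamedTuple):
    """Hypothetical container; a plain 3-tuple works just as well."""
    subject: str    # QID of the person
    predicate: str  # property, assumed here to be a PID
    value: str      # QID or literal value


def to_triples(rows) -> Iterator[tuple]:
    """Yield plain (subject, predicate, value) triples, skipping incomplete rows."""
    for row in rows:
        if len(row) == 3 and all(row):
            yield tuple(row)


rows = [
    Statement('Q1000001', 'P463', 'Q1000002'),
    ('Q1000001', 'P569', ''),  # empty value: skipped
]
statements = list(to_triples(rows))

# With soweego installed and credentials configured, the call would be:
# from soweego.ingester import wikidata_bot
# wikidata_bot.add_people_statements(statements, sandbox=True)
```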

soweego.ingester.wikidata_bot.add_works_statements(statements, catalog, sandbox)

Add statements to existing Wikidata works.

Statements typically come from soweego.validator.enrichment.generate_statements().

Parameters
  • statements (Iterable) – an iterable of (work QID, predicate, person QID, person target ID) tuples

  • catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog

  • sandbox (bool) – whether to perform edits on the Wikidata sandbox item

Return type

None
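To illustrate the 4-tuple shape, here is a small sketch that removes duplicate work statements before upload. The `unique_work_statements` helper is hypothetical, the QIDs and target IDs are made-up placeholders, and using the PID P175 (performer) as the predicate is an assumption about the expected format.

```python
def unique_work_statements(statements):
    """Yield each (work QID, predicate, person QID, person target ID)
    tuple once, preserving the input order."""
    seen = set()
    for statement in statements:
        # Unpacking fails loudly if a row does not have exactly 4 fields
        work, predicate, person, person_tid = statement
        key = (work, predicate, person, person_tid)
        if key not in seen:
            seen.add(key)
            yield key


raw = [
    ('Q2000001', 'P175', 'Q1000001', '1234567'),
    ('Q2000001', 'P175', 'Q1000001', '1234567'),  # duplicate
    ('Q2000002', 'P175', 'Q1000001', '1234567'),
]
statements = list(unique_work_statements(raw))

# With soweego installed and credentials configured, the call would be:
# from soweego.ingester import wikidata_bot
# wikidata_bot.add_works_statements(statements, 'discogs', sandbox=True)
```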

soweego.ingester.wikidata_bot.delete_or_deprecate_identifiers(action, catalog, entity, invalid, sandbox)

Delete or deprecate invalid identifier statements from existing Wikidata items.

Deletion candidates come from validation criterion 1 as per soweego.validator.checks.dead_ids().

Deprecation candidates come from validation criteria 2 or 3 as per soweego.validator.checks.links() and soweego.validator.checks.bio().

Parameters
  • action (str) – {'delete', 'deprecate'}. A supported action

  • catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog

  • entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity

  • invalid (dict) – a {invalid_catalog_identifier: [list of QIDs]} dictionary

  • sandbox (bool) – whether to perform edits on the Wikidata sandbox item

Return type

None
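Note that the invalid dictionary is keyed by catalog identifier, not by QID, and that one identifier may map to several items. As a sketch, the hypothetical `group_by_identifier` helper below builds that shape from (QID, identifier) pairs; all values are made-up placeholders.

```python
from collections import defaultdict

ACTIONS = {'delete', 'deprecate'}


def group_by_identifier(pairs):
    """Turn (QID, invalid identifier) pairs into
    an {invalid_catalog_identifier: [list of QIDs]} dictionary."""
    invalid = defaultdict(list)
    for qid, identifier in pairs:
        # Keep each QID once per identifier, in first-seen order
        if qid not in invalid[identifier]:
            invalid[identifier].append(qid)
    return dict(invalid)


pairs = [
    ('Q1000001', '1234567'),
    ('Q1000002', '1234567'),
    ('Q1000001', '1234567'),  # repeated pair: ignored
    ('Q1000003', '7654321'),
]
invalid = group_by_identifier(pairs)

action = 'deprecate'
if action not in ACTIONS:
    raise ValueError(f'Unsupported action: {action}')

# With soweego installed and credentials configured, the call would be:
# from soweego.ingester import wikidata_bot
# wikidata_bot.delete_or_deprecate_identifiers(
#     action, 'discogs', 'musician', invalid, sandbox=True
# )
```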

mix_n_match_client

A client that uploads non-confident links to the Mix’n’match tool for curation.

It inserts data into the catalog and entry tables of the s51434__mixnmatch_p database, which is hosted on ToolsDB within the Wikimedia Toolforge infrastructure. See how to connect.

soweego.ingester.mix_n_match_client.activate_catalog(catalog_id, catalog, entity)

Activate a catalog.

Parameters
  • catalog_id (int) – the catalog id field of the catalog table in the s51434__mixnmatch_p Toolforge database

  • catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog

  • entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity

Return type

None

soweego.ingester.mix_n_match_client.add_catalog(catalog, entity)

Add or update a catalog.

Parameters
  • catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog

  • entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity

Return type

int

Returns

the catalog id field of the catalog table in the s51434__mixnmatch_p Toolforge database

soweego.ingester.mix_n_match_client.add_matches(file_path, catalog_id, catalog, entity, confidence_range)

Add or update matches to an existing catalog. Curated matches found in the catalog are kept as is.

Parameters
  • file_path (str) – path to a file with matches

  • catalog_id (int) – the catalog id field of the catalog table in the s51434__mixnmatch_p Toolforge database

  • catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog

  • entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity

  • confidence_range (Tuple[float, float]) – a (minimum, maximum) pair of confidence scores; only matches whose score falls within this range are added or updated

Return type

None
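The three client functions can plausibly be chained, since add_catalog() returns the catalog id that the other two expect. The sketch below is one possible workflow, not a documented one: the `check_confidence_range` and `upload_matches` helpers are hypothetical, the assumption that scores lie between 0 and 1 is not stated in this reference, and activating the catalog after adding matches is a guess at a sensible order. The soweego import is lazy, so the snippet runs without the package installed.

```python
from typing import Tuple


def check_confidence_range(
    confidence_range: Tuple[float, float]
) -> Tuple[float, float]:
    """Hypothetical sanity check: return (low, high) as floats.

    Assumes confidence scores lie in [0, 1] and that low <= high.
    """
    low, high = confidence_range
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError(f'Invalid confidence range: {confidence_range}')
    return float(low), float(high)


def upload_matches(file_path, catalog, entity, confidence_range):
    """Add the catalog, upload its matches, then activate it.

    Needs soweego installed and access to the Toolforge database.
    """
    from soweego.ingester import mix_n_match_client as client  # lazy import

    confidence_range = check_confidence_range(confidence_range)
    catalog_id = client.add_catalog(catalog, entity)
    client.add_matches(file_path, catalog_id, catalog, entity, confidence_range)
    client.activate_catalog(catalog_id, catalog, entity)
    return catalog_id


checked = check_confidence_range((0.3, 0.7))
```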