ingester
Ingest soweego output into Wikidata items.
wikidata_bot
A Wikidata bot that adds, deletes, or deprecates referenced statements. Here are typical output examples:
add_identifiers()
add_people_statements()
- Reference: (based on heuristic, record linkage), (`stated in <https://www.wikidata.org/wiki/Property:P248>`_, Discogs), (Discogs artist ID, 264375), (retrieved, TIMESTAMP)
add_works_statements()
- Reference: (based on heuristic, record linkage), (`stated in <https://www.wikidata.org/wiki/Property:P248>`_, Discogs), (Discogs artist ID, 264375), (retrieved, TIMESTAMP)
delete_or_deprecate_identifiers()
deletes or deprecates identifier statements.
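To make the shape of the references above concrete, here is a minimal sketch that models one reference as an ordered list of (property label, value) pairs. It only illustrates the data layout shown in the examples; the `build_reference` helper, the plain-string labels, and the fixed date are assumptions for illustration and are not part of soweego or of any Wikidata client library.

```python
from datetime import datetime, timezone


def build_reference(catalog_label, id_property_label, identifier, retrieved=None):
    """Return a reference as an ordered list of (property label, value) pairs.

    Hypothetical helper: it mirrors the reference layout printed in the
    output examples, using plain strings instead of real Wikidata claims.
    """
    if retrieved is None:
        retrieved = datetime.now(timezone.utc)
    return [
        ("based on heuristic", "record linkage"),
        ("stated in", catalog_label),
        (id_property_label, identifier),
        ("retrieved", retrieved.date().isoformat()),
    ]


# The date below is an arbitrary placeholder standing in for TIMESTAMP.
reference = build_reference(
    "Discogs",
    "Discogs artist ID",
    "264375",
    retrieved=datetime(2020, 1, 1, tzinfo=timezone.utc),
)
for label, value in reference:
    print(f"({label}, {value})")
```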
- soweego.ingester.wikidata_bot.add_identifiers(identifiers, catalog, entity, sandbox)
Add identifier statements to existing Wikidata items.
- Parameters
identifiers (dict) – a {QID: catalog_identifier} dictionary
catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
sandbox (bool) – whether to perform edits on the Wikidata sandbox 2 item
- Return type
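The following sketch shows one way to prepare and sanity-check the arguments documented above before handing them to add_identifiers(). The `check_arguments` and `upload` helpers, the QID format check, the placeholder QID, and the `sandbox=True` default are assumptions made for illustration; only the call signature and the sets of supported values come from this page.

```python
SUPPORTED_CATALOGS = {"discogs", "imdb", "musicbrainz", "twitter"}
SUPPORTED_ENTITIES = {
    "actor", "band", "director", "musician", "producer", "writer",
    "audiovisual_work", "musical_work",
}


def check_arguments(identifiers, catalog, entity):
    """Raise ValueError if the arguments do not match the documented shapes."""
    if catalog not in SUPPORTED_CATALOGS:
        raise ValueError(f"Unsupported catalog: {catalog!r}")
    if entity not in SUPPORTED_ENTITIES:
        raise ValueError(f"Unsupported entity: {entity!r}")
    for qid, catalog_identifier in identifiers.items():
        # Assumption: a QID is the letter 'Q' followed by digits only.
        if not (qid.startswith("Q") and qid[1:].isdigit()):
            raise ValueError(f"Not a QID: {qid!r}")
        if not catalog_identifier:
            raise ValueError(f"Empty catalog identifier for {qid}")
    return True


def upload(identifiers, catalog, entity, sandbox=True):
    """Hypothetical wrapper: validate first, then call the documented function."""
    # Local import, so the checks above can be used without soweego installed.
    from soweego.ingester import wikidata_bot

    check_arguments(identifiers, catalog, entity)
    wikidata_bot.add_identifiers(identifiers, catalog, entity, sandbox)


# Placeholder QID paired with the Discogs identifier from the examples above.
identifiers = {"Q12345": "264375"}
print(check_arguments(identifiers, "discogs", "musician"))
```

Defaulting `sandbox` to `True` in the wrapper is a cautious design choice for trial runs, not documented soweego behavior.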
- soweego.ingester.wikidata_bot.add_people_statements(catalog, statements, criterion, sandbox)
Add statements to existing Wikidata people.
Statements typically come from validation criteria 2 or 3 as per
soweego.validator.checks.links() and soweego.validator.checks.bio().
- Parameters
catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
statements (Iterable) – iterable of (subject, predicate, value, catalog ID) tuples
criterion (str) – {'links', 'bio'}. A supported validation criterion
sandbox (bool) – whether to perform edits on the Wikidata sandbox 2 item
- Return type
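As an illustration of the statements parameter, the sketch below turns raw rows into (subject, predicate, value, catalog ID) tuples and skips malformed rows. The `iter_statements` helper and the placeholder QIDs and PID are hypothetical; this page only specifies that each statement is a 4-tuple.

```python
from typing import Iterable, Iterator, Tuple

Statement = Tuple[str, ...]


def iter_statements(rows: Iterable[Iterable[str]]) -> Iterator[Statement]:
    """Yield (subject, predicate, value, catalog ID) tuples.

    Rows that do not have exactly four non-empty fields are skipped.
    """
    for row in rows:
        fields = tuple(field.strip() for field in row)
        if len(fields) != 4 or not all(fields):
            continue
        yield fields


# Placeholder identifiers: Q111, P222 and Q333 are not meant to be real items.
rows = [
    ["Q111", "P222", "Q333", "264375"],
    ["Q111", "P222"],            # too short, skipped
    ["Q111", "P222", "", "1"],   # empty value, skipped
]
statements = list(iter_statements(rows))
print(statements)
```

Because the parameter is documented as an Iterable, the generator itself could be passed instead of a list, which avoids loading every statement into memory.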
- soweego.ingester.wikidata_bot.add_works_statements(statements, catalog, sandbox)
Add statements to existing Wikidata works.
Statements typically come from
soweego.validator.enrichment.generate_statements().
- soweego.ingester.wikidata_bot.delete_or_deprecate_identifiers(action, catalog, entity, invalid, sandbox)
Delete or deprecate invalid identifier statements from existing Wikidata items.
Deletion candidates come from validation criterion 1 as per
soweego.validator.checks.dead_ids().
Deprecation candidates come from validation criteria 2 or 3 as per
soweego.validator.checks.links() and soweego.validator.checks.bio().
- Parameters
action (str) – {'delete', 'deprecate'}
catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
invalid (dict) – a {invalid_catalog_identifier: [list of QIDs]} dictionary
sandbox (bool) – whether to perform edits on the Wikidata sandbox 2 item
- Return type
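Note that the invalid dictionary is keyed by catalog identifier, not by QID, which is the reverse of the mapping taken by add_identifiers(). A minimal sketch of building it from (QID, identifier) pairs follows; the `group_by_identifier` helper and the sample values are hypothetical.

```python
from collections import defaultdict


def group_by_identifier(pairs):
    """Group (QID, catalog identifier) pairs into {identifier: [QIDs]}."""
    invalid = defaultdict(list)
    for qid, identifier in pairs:
        # Keep insertion order and avoid listing the same QID twice.
        if qid not in invalid[identifier]:
            invalid[identifier].append(qid)
    return dict(invalid)


# Placeholder QIDs and identifiers for illustration only.
pairs = [
    ("Q111", "264375"),
    ("Q222", "264375"),
    ("Q333", "999"),
    ("Q111", "264375"),  # duplicate pair, ignored
]
invalid = group_by_identifier(pairs)
print(invalid)
```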
mix_n_match_client
A client that uploads non-confident links to the Mix’n’match tool for curation.
It inserts data in the catalog and entry tables of the s51434__mixnmatch_p
database located in ToolsDB under the Wikimedia Toolforge infrastructure.
See how to connect.
- soweego.ingester.mix_n_match_client.activate_catalog(catalog_id, catalog, entity)
Activate a catalog.
- Parameters
catalog_id (int) – the catalog id field of the catalog table in the s51434__mixnmatch_p Toolforge database
catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
- Return type
- soweego.ingester.mix_n_match_client.add_catalog(catalog, entity)
Add or update a catalog.
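- Parameters
catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity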
- Return type
- Returns
the catalog id field of the catalog table in the s51434__mixnmatch_p Toolforge database
- soweego.ingester.mix_n_match_client.add_matches(file_path, catalog_id, catalog, entity, confidence_range)
Add or update matches to an existing catalog. Curated matches found in the catalog are kept as is.
- Parameters
file_path (str) – path to a file with matches
catalog_id (int) – the catalog id field of the catalog table in the s51434__mixnmatch_p Toolforge database
catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
confidence_range (Tuple[float, float]) – a pair of floats indicating the minimum and maximum confidence scores of matches that will be added or updated
- Return type
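To illustrate what confidence_range selects, the sketch below filters matches by score. Several details are assumptions, since this page does not specify them: the matches file is treated as CSV with QID, catalog identifier, and score columns; scores are taken to lie in [0, 1]; and both bounds are treated as inclusive. The `select_matches` helper is hypothetical and is not the client's actual implementation.

```python
import csv
import io


def select_matches(lines, confidence_range):
    """Return (QID, catalog ID, score) rows whose score falls within the range.

    Assumptions: three CSV columns per row, scores in [0, 1],
    and inclusive minimum and maximum bounds.
    """
    low, high = confidence_range
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError(f"Invalid confidence range: {confidence_range!r}")
    selected = []
    for qid, catalog_id, score in csv.reader(lines):
        score = float(score)
        if low <= score <= high:
            selected.append((qid, catalog_id, score))
    return selected


# In-memory stand-in for a matches file; all values are placeholders.
sample = io.StringIO(
    "Q111,264375,0.95\n"
    "Q222,12345,0.60\n"
    "Q333,67890,0.30\n"
)
picked = select_matches(sample, (0.5, 0.8))
print(picked)
```

A range such as (0.5, 0.8) fits the module's stated purpose of sending non-confident links to curation while leaving high-scoring matches out.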