ingester
Import soweego output into Wikidata items.
wikidata_bot
A Wikidata bot that adds, deletes, or deprecates referenced statements. Here are typical output examples.
add_identifiers()
add_people_statements()
add_works_statements()
delete_or_deprecate_identifiers() deletes or deprecates identifier statements.
-
soweego.ingester.wikidata_bot.add_identifiers(identifiers, catalog, entity, sandbox)
Add identifier statements to existing Wikidata items.
- Parameters
identifiers (dict) – a {QID: catalog_identifier} dictionary
catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
sandbox (bool) – whether to perform edits on the Wikidata sandbox item
- Return type
None
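As a rough illustration of the arguments described above, the following sketch builds a {QID: catalog_identifier} dictionary and checks catalog and entity against the supported values listed in the parameters. The helper check_identifiers_args and the sample QIDs and identifiers are hypothetical placeholders, not part of soweego. The check does not verify that a given catalog and entity are compatible with each other. The actual call is left commented out because it edits Wikidata.

```python
import re

SUPPORTED_CATALOGS = {'discogs', 'imdb', 'musicbrainz', 'twitter'}
SUPPORTED_ENTITIES = {
    'actor', 'band', 'director', 'musician', 'producer', 'writer',
    'audiovisual_work', 'musical_work',
}


def check_identifiers_args(identifiers, catalog, entity):
    """Hypothetical pre-flight check, not part of soweego."""
    if catalog not in SUPPORTED_CATALOGS:
        raise ValueError(f'Unsupported catalog: {catalog}')
    if entity not in SUPPORTED_ENTITIES:
        raise ValueError(f'Unsupported entity: {entity}')
    for qid, catalog_id in identifiers.items():
        # A QID is the letter Q followed by a positive integer
        if not re.fullmatch(r'Q[1-9][0-9]*', qid):
            raise ValueError(f'Not a QID: {qid}')
        if not catalog_id:
            raise ValueError(f'Empty identifier for {qid}')
    return True


# Placeholder values for illustration only
identifiers = {'Q123': '456789', 'Q124': '987654'}
args_ok = check_identifiers_args(identifiers, 'discogs', 'musician')

# With soweego installed, the call itself would look like this.
# The last argument is sandbox: True targets the Wikidata sandbox item.
# from soweego.ingester import wikidata_bot
# wikidata_bot.add_identifiers(identifiers, 'discogs', 'musician', True)
```

Running the check first keeps malformed input from reaching the bot; passing True for sandbox is a cautious default while trying things out.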
-
soweego.ingester.wikidata_bot.add_people_statements(statements, sandbox)
Add statements to existing Wikidata people.
Statements typically come from validation criteria 2 or 3 as per soweego.validator.checks.links() and soweego.validator.checks.bio().
- Parameters
statements (Iterable) – iterable of (subject, predicate, value) triples
sandbox (bool) – whether to perform edits on the Wikidata sandbox item
- Return type
None
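The statements argument is an iterable of (subject, predicate, value) triples, so a generator works as well as a list. A minimal sketch, assuming every field is a non-empty string: the helper as_triples and the sample rows are hypothetical, and the placeholder identifiers are made up.

```python
from typing import Iterable, Iterator, Sequence, Tuple

Triple = Tuple[str, str, str]


def as_triples(rows: Iterable[Sequence[str]]) -> Iterator[Triple]:
    """Hypothetical helper: lazily yield well-formed triples, skipping malformed rows."""
    for row in rows:
        if len(row) != 3 or not all(row):
            continue  # wrong number of fields, or an empty field
        subject, predicate, value = row
        yield (subject, predicate, value)


# Placeholder rows for illustration only
rows = [
    ['Q123', 'P000', 'Q456'],
    ['Q124', 'P000'],       # malformed: missing value
    ['Q125', 'P001', ''],   # malformed: empty value
]
statements = list(as_triples(rows))
# → [('Q123', 'P000', 'Q456')]

# With soweego installed, the call itself would look like this.
# The second argument is sandbox: True targets the Wikidata sandbox item.
# from soweego.ingester import wikidata_bot
# wikidata_bot.add_people_statements(statements, True)
```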
-
soweego.ingester.wikidata_bot.add_works_statements(statements, catalog, sandbox)
Add statements to existing Wikidata works.
Statements typically come from soweego.validator.enrichment.generate_statements().
- Parameters
statements (Iterable) – iterable of (work QID, predicate, person QID, person target ID) tuples
catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
sandbox (bool) – whether to perform edits on the Wikidata sandbox item
- Return type
None
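Because the four fields of a works statement are positional, it is easy to swap the work QID and the person QID by mistake. One possible way to keep the order explicit is a named tuple, sketched below. WorkStatement is a hypothetical record, not a soweego type, and all identifiers are made-up placeholders.

```python
from typing import NamedTuple


class WorkStatement(NamedTuple):
    """Hypothetical record mirroring the documented tuple order."""
    work_qid: str
    predicate: str
    person_qid: str
    person_target_id: str


# Placeholder values for illustration only
statements = [
    WorkStatement('Q1000', 'P000', 'Q123', 'nm0000000'),
    WorkStatement(
        work_qid='Q1001',
        predicate='P000',
        person_qid='Q124',
        person_target_id='nm0000001',
    ),
]

# Named tuples are plain tuples, so they unpack like any 4-tuple
work, predicate, person, target_id = statements[0]

# With soweego installed, the call itself would look like this.
# The last argument is sandbox: True targets the Wikidata sandbox item.
# from soweego.ingester import wikidata_bot
# wikidata_bot.add_works_statements(statements, 'imdb', True)
```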
-
soweego.ingester.wikidata_bot.delete_or_deprecate_identifiers(action, catalog, entity, invalid, sandbox)
Delete or deprecate invalid identifier statements from existing Wikidata items.
Deletion candidates come from validation criterion 1 as per soweego.validator.checks.dead_ids().
Deprecation candidates come from validation criteria 2 or 3 as per soweego.validator.checks.links() and soweego.validator.checks.bio().
- Parameters
action (str) – {'delete', 'deprecate'}
catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
invalid (dict) – a {invalid_catalog_identifier: [list of QIDs]} dictionary
sandbox (bool) – whether to perform edits on the Wikidata sandbox item
- Return type
None
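Note that the invalid dictionary is keyed by catalog identifier, the reverse of the {QID: catalog_identifier} shape used by add_identifiers(). A minimal sketch of building it from (QID, identifier) pairs follows. The helper group_qids_by_identifier and the sample pairs are hypothetical placeholders.

```python
from collections import defaultdict


def group_qids_by_identifier(pairs):
    """Hypothetical helper: turn (QID, identifier) pairs into {identifier: [QIDs]}."""
    invalid = defaultdict(list)
    for qid, identifier in pairs:
        # Keep each QID once per identifier, preserving first-seen order
        if qid not in invalid[identifier]:
            invalid[identifier].append(qid)
    return dict(invalid)


# Placeholder values for illustration only
pairs = [('Q123', 'abc'), ('Q124', 'abc'), ('Q125', 'xyz'), ('Q123', 'abc')]
invalid = group_qids_by_identifier(pairs)
# → {'abc': ['Q123', 'Q124'], 'xyz': ['Q125']}

action = 'deprecate'  # or 'delete'

# With soweego installed, the call itself would look like this.
# The last argument is sandbox: True targets the Wikidata sandbox item.
# from soweego.ingester import wikidata_bot
# wikidata_bot.delete_or_deprecate_identifiers(
#     action, 'discogs', 'musician', invalid, True
# )
```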
mix_n_match_client
A client that uploads low-confidence links to the Mix’n’match tool for curation.
It inserts data into the catalog and entry tables of the s51434__mixnmatch_p database, located in ToolsDB under the Wikimedia Toolforge infrastructure.
See how to connect.
-
soweego.ingester.mix_n_match_client.activate_catalog(catalog_id, catalog, entity)
Activate a catalog.
- Parameters
catalog_id (int) – the catalog id field of the catalog table in the s51434__mixnmatch_p Toolforge database
catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
- Return type
None
-
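soweego.ingester.mix_n_match_client.add_catalog(catalog, entity)
Add or update a catalog.
- Parameters
catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
- Return type
int
- Returns
the catalog id field of the catalog table in the s51434__mixnmatch_p Toolforge database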
-
soweego.ingester.mix_n_match_client.add_matches(file_path, catalog_id, catalog, entity, confidence_range)
Add or update matches to an existing catalog. Curated matches found in the catalog are kept as is.
- Parameters
file_path (str) – path to a file with matches
catalog_id (int) – the catalog id field of the catalog table in the s51434__mixnmatch_p Toolforge database
catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
confidence_range (Tuple[float, float]) – a pair of floats indicating the minimum and maximum confidence scores of matches that will be added or updated
- Return type
None
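The three functions above can plausibly be chained: add_catalog() returns the catalog id that add_matches() and activate_catalog() both expect. The sketch below shows one such chain. The wrapper upload_matches, the call order, the file name matches.csv, and the assumption that confidence scores lie between 0 and 1 are all illustrative and not confirmed by this reference. The client is passed in as an argument, so the sketch runs here against a stand-in object instead of the Toolforge database.

```python
from types import SimpleNamespace


def upload_matches(client, file_path, catalog, entity, confidence_range):
    """Hypothetical wrapper: add or update a catalog, load matches, then activate it."""
    low, high = confidence_range
    # Assumption: scores lie in [0, 1] and the pair is (minimum, maximum)
    if not (0.0 <= low <= high <= 1.0):
        raise ValueError(f'Bad confidence range: {confidence_range}')
    catalog_id = client.add_catalog(catalog, entity)
    client.add_matches(file_path, catalog_id, catalog, entity, confidence_range)
    client.activate_catalog(catalog_id, catalog, entity)
    return catalog_id


# Stand-in client that only records which functions were called
calls = []
fake_client = SimpleNamespace(
    add_catalog=lambda catalog, entity: calls.append('add_catalog') or 42,
    add_matches=lambda *args: calls.append('add_matches'),
    activate_catalog=lambda *args: calls.append('activate_catalog'),
)
catalog_id = upload_matches(
    fake_client, 'matches.csv', 'imdb', 'actor', (0.3, 0.7)
)

# With soweego installed and database access configured, the module itself
# could be passed as the client:
# from soweego.ingester import mix_n_match_client
# upload_matches(mix_n_match_client, 'matches.csv', 'imdb', 'actor', (0.3, 0.7))
```

Validating the range before the first call means a bad pair of bounds fails early, before any row is written to the catalog table.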