ingester
Ingest soweego output into Wikidata items.
wikidata_bot
A Wikidata bot that adds, deletes, or deprecates referenced statements. Here are typical output examples:
add_identifiers()
add_people_statements()
- Reference: (based on heuristic, record linkage), (stated in <https://www.wikidata.org/wiki/Property:P248>, Discogs), (Discogs artist ID, 264375), (retrieved, TIMESTAMP)
add_works_statements()
- Reference: (based on heuristic, record linkage), (stated in <https://www.wikidata.org/wiki/Property:P248>, Discogs), (Discogs artist ID, 264375), (retrieved, TIMESTAMP)
delete_or_deprecate_identifiers()
deletes or deprecates identifier statements.
- soweego.ingester.wikidata_bot.add_identifiers(identifiers, catalog, entity, sandbox)
Add identifier statements to existing Wikidata items.
- Parameters
  - identifiers (dict) – a {QID: catalog_identifier} dictionary
  - catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
  - entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
  - sandbox (bool) – whether to perform edits on the Wikidata sandbox 2 item
- Return type
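The following is a minimal sketch of preparing the `identifiers` argument and calling the bot in sandbox mode. It assumes soweego is installed and the bot is configured; `build_identifiers`, `upload_identifiers`, and the QID check are hypothetical helpers, not part of soweego, and the QID is a placeholder. Passing catalog identifiers as strings is also an assumption.

```python
import re

# Supported values, copied from the parameter descriptions above.
SUPPORTED_CATALOGS = {'discogs', 'imdb', 'musicbrainz', 'twitter'}
SUPPORTED_ENTITIES = {
    'actor', 'band', 'director', 'musician', 'producer', 'writer',
    'audiovisual_work', 'musical_work',
}
QID_PATTERN = re.compile(r'Q[1-9]\d*')


def build_identifiers(pairs):
    """Hypothetical helper: build a {QID: catalog_identifier} dictionary."""
    identifiers = {}
    for qid, catalog_identifier in pairs:
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f'not a QID: {qid!r}')
        # Assumption: catalog identifiers are passed as strings.
        identifiers[qid] = str(catalog_identifier)
    return identifiers


def upload_identifiers(identifiers, catalog, entity, sandbox=True):
    """Hypothetical wrapper: check the arguments, then call the bot."""
    if catalog not in SUPPORTED_CATALOGS:
        raise ValueError(f'unsupported catalog: {catalog!r}')
    if entity not in SUPPORTED_ENTITIES:
        raise ValueError(f'unsupported entity: {entity!r}')
    # Imported lazily so the helpers above work without soweego installed.
    from soweego.ingester import wikidata_bot
    wikidata_bot.add_identifiers(identifiers, catalog, entity, sandbox)


# Placeholder QID paired with the Discogs artist ID from the examples above.
identifiers = build_identifiers([('Q4115189', 264375)])
print(identifiers)
```

Keeping `sandbox=True` as the wrapper's default means a test run only touches the sandbox item unless the caller opts out explicitly.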
- soweego.ingester.wikidata_bot.add_people_statements(catalog, statements, criterion, sandbox)
Add statements to existing Wikidata people.
Statements typically come from validation criteria 2 or 3 as per soweego.validator.checks.links() and soweego.validator.checks.bio().
- Parameters
  - catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
  - statements (Iterable) – iterable of (subject, predicate, value, catalog ID) tuples
  - criterion (str) – {'links', 'bio'}. A supported validation criterion
  - sandbox (bool) – whether to perform edits on the Wikidata sandbox 2 item
- Return type
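As an illustration of the `statements` argument, the sketch below reads (subject, predicate, value, catalog ID) tuples from CSV rows. The CSV layout, the `read_statements` helper, and every value in the sample row are assumptions made for this example; soweego may store its statements differently.

```python
import csv
import io

# Supported validation criteria, copied from the parameter description above.
VALID_CRITERIA = {'links', 'bio'}


def read_statements(handle):
    """Hypothetical helper: yield 4-tuples from CSV rows.

    Each row is assumed to hold subject, predicate, value, catalog ID.
    """
    for row in csv.reader(handle):
        if len(row) != 4:
            raise ValueError(f'expected 4 fields, got {len(row)}: {row!r}')
        yield tuple(row)


# Placeholder data: one row with a made-up subject, predicate, and value.
sample = io.StringIO('Q4115189,P2002,example_handle,264375\n')
statements = list(read_statements(sample))
print(statements)

# With soweego installed, the call would follow the signature above:
# from soweego.ingester import wikidata_bot
# wikidata_bot.add_people_statements('discogs', statements, 'links', True)
```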
- soweego.ingester.wikidata_bot.add_works_statements(statements, catalog, sandbox)
Add statements to existing Wikidata works.
Statements typically come from soweego.validator.enrichment.generate_statements().
- soweego.ingester.wikidata_bot.delete_or_deprecate_identifiers(action, catalog, entity, invalid, sandbox)
Delete or deprecate invalid identifier statements from existing Wikidata items.
Deletion candidates come from validation criterion 1 as per soweego.validator.checks.dead_ids().
Deprecation candidates come from validation criteria 2 or 3 as per soweego.validator.checks.links() and soweego.validator.checks.bio().
- Parameters
  - action (str) – {'delete', 'deprecate'}
  - catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
  - entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
  - invalid (dict) – a {invalid_catalog_identifier: [list of QIDs]} dictionary
  - sandbox (bool) – whether to perform edits on the Wikidata sandbox 2 item
- Return type
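The `invalid` dictionary is keyed by catalog identifier rather than by QID, the reverse of the shape that `add_identifiers` takes. A minimal sketch of building it from (QID, identifier) pairs follows; `group_qids_by_identifier` is a hypothetical helper and the QIDs and identifiers are placeholders.

```python
from collections import defaultdict

# Supported actions, copied from the parameter description above.
VALID_ACTIONS = {'delete', 'deprecate'}


def group_qids_by_identifier(pairs):
    """Hypothetical helper: build {invalid_catalog_identifier: [QIDs]}."""
    invalid = defaultdict(list)
    for qid, catalog_identifier in pairs:
        invalid[catalog_identifier].append(qid)
    # Return a plain dict so that missing keys raise KeyError as usual.
    return dict(invalid)


# Placeholder data: two items share the same invalid identifier.
invalid = group_qids_by_identifier([
    ('Q1', '111'),
    ('Q2', '111'),
    ('Q3', '222'),
])
print(invalid)

# With soweego installed, the call would follow the signature above:
# from soweego.ingester import wikidata_bot
# wikidata_bot.delete_or_deprecate_identifiers(
#     'deprecate', 'discogs', 'musician', invalid, True)
```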
mix_n_match_client
A client that uploads non-confident links to the Mix’n’match tool for curation.
It inserts data into the catalog and entry tables of the s51434__mixnmatch_p database located in ToolsDB under the Wikimedia Toolforge infrastructure.
See how to connect.
- soweego.ingester.mix_n_match_client.activate_catalog(catalog_id, catalog, entity)
Activate a catalog.
- Parameters
  - catalog_id (int) – the catalog id field of the catalog table in the s51434__mixnmatch_p Toolforge database
  - catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
  - entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
- Return type
- soweego.ingester.mix_n_match_client.add_catalog(catalog, entity)
Add or update a catalog.
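- Parameters
  - catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
  - entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity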
- Return type
- Returns
the catalog id field of the catalog table in the s51434__mixnmatch_p Toolforge database
- soweego.ingester.mix_n_match_client.add_matches(file_path, catalog_id, catalog, entity, confidence_range)
Add or update matches to an existing catalog. Curated matches found in the catalog are kept as is.
- Parameters
  - file_path (str) – path to a file with matches
  - catalog_id (int) – the catalog id field of the catalog table in the s51434__mixnmatch_p Toolforge database
  - catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
  - entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
  - confidence_range (Tuple[float, float]) – a pair of floats indicating the minimum and maximum confidence scores of matches that will be added/updated.
- Return type
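To illustrate what `confidence_range` selects, here is a rough sketch of filtering scored matches by a (minimum, maximum) pair. The match layout (QID, catalog identifier, score), scores lying in [0, 1], and inclusive bounds are all assumptions made for this example; `filter_by_confidence` is a hypothetical helper, and the client itself may treat the bounds differently.

```python
def filter_by_confidence(matches, confidence_range):
    """Hypothetical helper: keep matches whose score lies in the range.

    Assumptions: each match is a (QID, catalog identifier, score) tuple,
    scores lie in [0, 1], and both bounds are inclusive.
    """
    low, high = confidence_range
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError(f'invalid confidence range: {confidence_range!r}')
    return [match for match in matches if low <= match[2] <= high]


# Placeholder matches with made-up QIDs, identifiers, and scores.
matches = [
    ('Q1', '111', 0.3),
    ('Q2', '222', 0.6),
    ('Q3', '333', 0.9),
]
non_confident = filter_by_confidence(matches, (0.5, 0.8))
print(non_confident)

# With soweego installed, an upload would follow the signatures above,
# using the catalog id that add_catalog returns ('matches_file' is a
# placeholder path):
# from soweego.ingester import mix_n_match_client
# catalog_id = mix_n_match_client.add_catalog('discogs', 'musician')
# mix_n_match_client.add_matches(
#     'matches_file', catalog_id, 'discogs', 'musician', (0.5, 0.8))
```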