ingester
Import soweego output into Wikidata items.
wikidata_bot
A Wikidata bot that adds, deletes, or deprecates referenced statements. Here are typical output examples.
add_identifiers()
add_people_statements()
add_works_statements()
delete_or_deprecate_identifiers(): deletes or deprecates identifier statements.

soweego.ingester.wikidata_bot.add_identifiers(identifiers, catalog, entity, sandbox)
Add identifier statements to existing Wikidata items.
- Parameters
  identifiers (dict) – a {QID: catalog_identifier} dictionary
  catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
  entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
  sandbox (bool) – whether to perform edits on the Wikidata sandbox item
- Return type
  None
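
The argument shapes described above can be checked before any edit is attempted. The following is a minimal sketch, not part of soweego: `check_identifiers` is a hypothetical helper, the QID pattern assumes the usual `Q` plus positive integer form, and the sample identifiers are made up.

```python
import re

# Values copied from the parameter descriptions above.
SUPPORTED_CATALOGS = {'discogs', 'imdb', 'musicbrainz', 'twitter'}
SUPPORTED_ENTITIES = {
    'actor', 'band', 'director', 'musician', 'producer', 'writer',
    'audiovisual_work', 'musical_work',
}
# Assumption: a QID is the letter Q followed by a positive integer.
QID_PATTERN = re.compile(r'Q[1-9]\d*')


def check_identifiers(identifiers, catalog, entity):
    """Hypothetical pre-flight check for a {QID: catalog_identifier} dict."""
    if catalog not in SUPPORTED_CATALOGS:
        raise ValueError(f'Unsupported catalog: {catalog!r}')
    if entity not in SUPPORTED_ENTITIES:
        raise ValueError(f'Unsupported entity: {entity!r}')
    malformed = [qid for qid in identifiers if not QID_PATTERN.fullmatch(qid)]
    if malformed:
        raise ValueError(f'Malformed QIDs: {malformed}')
    return identifiers


# Made-up sample data, for illustration only.
sample = {'Q1': 'made-up-id-1', 'Q2': 'made-up-id-2'}
checked = check_identifiers(sample, 'discogs', 'musician')

# With soweego installed, the checked dictionary could then be passed on:
# from soweego.ingester import wikidata_bot
# wikidata_bot.add_identifiers(checked, 'discogs', 'musician', sandbox=True)
```

Running the real call with `sandbox=True` first is probably the safer choice, since edits then target the Wikidata sandbox item.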

soweego.ingester.wikidata_bot.add_people_statements(statements, sandbox)
Add statements to existing Wikidata people.
Statements typically come from validation criteria 2 or 3 as per soweego.validator.checks.links() and soweego.validator.checks.bio().
- Parameters
  statements (Iterable) – iterable of (subject, predicate, value) triples
  sandbox (bool) – whether to perform edits on the Wikidata sandbox item
- Return type
  None
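
Because `statements` is any iterable, the triples can be produced lazily. A minimal sketch follows, assuming the triples are stored as tab-separated rows; that file layout and the `read_triples` helper are assumptions for illustration, and the sample rows are made up.

```python
import csv
import io


def read_triples(handle):
    """Lazily yield (subject, predicate, value) triples from a TSV stream."""
    for row in csv.reader(handle, delimiter='\t'):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        yield tuple(row)


# Made-up rows for illustration: subject, predicate, value.
sample = io.StringIO('Q1\tP31\tQ5\nbroken row\nQ2\tP569\t1950-01-01\n')
triples = list(read_triples(sample))

# With soweego installed, the generator itself could be passed on:
# from soweego.ingester import wikidata_bot
# wikidata_bot.add_people_statements(read_triples(handle), sandbox=True)
```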

soweego.ingester.wikidata_bot.add_works_statements(statements, catalog, sandbox)
Add statements to existing Wikidata works.
Statements typically come from soweego.validator.enrichment.generate_statements().
- Parameters
  statements (Iterable) – iterable of (work QID, predicate, person QID, person target ID) tuples
  catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
  sandbox (bool) – whether to perform edits on the Wikidata sandbox item
- Return type
  None
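
The four positions of each tuple are easy to mix up, so naming them may help when statements are built by hand. The sketch below uses a hypothetical `WorkStatement` named tuple, which is not a soweego type; it remains a plain tuple, so it should be usable wherever positional 4-tuples are expected. All values are made up.

```python
from typing import NamedTuple


class WorkStatement(NamedTuple):
    """Hypothetical names for the four positions described above."""
    work_qid: str
    predicate: str
    person_qid: str
    person_target_id: str


# Made-up rows for illustration only.
rows = [
    ('Q10', 'P1', 'Q20', 'made-up-id-a'),
    ('Q11', 'P1', 'Q21', 'made-up-id-b'),
]
statements = [WorkStatement(*row) for row in rows]

# Fields are reachable by name and by position alike.
first = statements[0]
same_person = first.person_qid == first[2]
```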

soweego.ingester.wikidata_bot.delete_or_deprecate_identifiers(action, catalog, entity, invalid, sandbox)
Delete or deprecate invalid identifier statements from existing Wikidata items.
Deletion candidates come from validation criterion 1 as per soweego.validator.checks.dead_ids().
Deprecation candidates come from validation criteria 2 or 3 as per soweego.validator.checks.links() and soweego.validator.checks.bio().
- Parameters
  action (str) – {'delete', 'deprecate'}
  catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
  entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
  invalid (dict) – a {invalid_catalog_identifier: [list of QIDs]} dictionary
  sandbox (bool) – whether to perform edits on the Wikidata sandbox item
- Return type
  None
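
Note that `invalid` is keyed by catalog identifier, the reverse of the `{QID: catalog_identifier}` shape used for additions. If the invalid links are at hand as (QID, identifier) pairs, they can be regrouped as in this minimal sketch; `group_by_identifier` is a hypothetical helper and the pairs are made up.

```python
from collections import defaultdict


def group_by_identifier(pairs):
    """Group (QID, identifier) pairs into {identifier: [list of QIDs]}."""
    invalid = defaultdict(list)
    for qid, identifier in pairs:
        invalid[identifier].append(qid)
    return dict(invalid)


# Made-up pairs: two items share the same invalid identifier.
pairs = [('Q1', 'id-a'), ('Q2', 'id-b'), ('Q3', 'id-a')]
invalid = group_by_identifier(pairs)

# With soweego installed, the result could then be passed on:
# from soweego.ingester import wikidata_bot
# wikidata_bot.delete_or_deprecate_identifiers(
#     'deprecate', 'discogs', 'musician', invalid, sandbox=True
# )
```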
mix_n_match_client
A client that uploads non-confident links to the Mix'n'match tool for curation.
It inserts data in the catalog and entry tables of the s51434__mixnmatch_p database located in ToolsDB under the Wikimedia Toolforge infrastructure.
See how to connect.

soweego.ingester.mix_n_match_client.activate_catalog(catalog_id, catalog, entity)
Activate a catalog.
- Parameters
  catalog_id (int) – the catalog id field of the catalog table in the s51434__mixnmatch_p Toolforge database
  catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
  entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
- Return type
  None
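
soweego.ingester.mix_n_match_client.add_catalog(catalog, entity)
Add or update a catalog.
- Parameters
  catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
  entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
- Return type
  int
- Returns
  the catalog id field of the catalog table in the s51434__mixnmatch_p Toolforge database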

soweego.ingester.mix_n_match_client.add_matches(file_path, catalog_id, catalog, entity, confidence_range)
Add or update matches to an existing catalog. Curated matches found in the catalog are kept as is.
- Parameters
  file_path (str) – path to a file with matches
  catalog_id (int) – the catalog id field of the catalog table in the s51434__mixnmatch_p Toolforge database
  catalog (str) – {'discogs', 'imdb', 'musicbrainz', 'twitter'}. A supported catalog
  entity (str) – {'actor', 'band', 'director', 'musician', 'producer', 'writer', 'audiovisual_work', 'musical_work'}. A supported entity
  confidence_range (Tuple[float, float]) – a pair of floats indicating the minimum and maximum confidence scores of matches that will be added or updated
- Return type
  None
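
The effect of `confidence_range` can be illustrated with a small filter. This is a sketch under stated assumptions rather than the client's actual logic: both bounds are treated as inclusive, each match is assumed to be a (QID, catalog identifier, score) tuple, and `within_range` is a hypothetical helper operating on made-up data.

```python
def within_range(matches, confidence_range):
    """Keep matches whose score lies between the two bounds.

    Assumption: both bounds are inclusive.
    """
    low, high = confidence_range
    if low > high:
        raise ValueError('minimum confidence exceeds maximum confidence')
    return [match for match in matches if low <= match[2] <= high]


# Made-up matches: (QID, catalog identifier, confidence score).
matches = [
    ('Q1', 'id-a', 0.95),
    ('Q2', 'id-b', 0.6),
    ('Q3', 'id-c', 0.3),
]
kept = within_range(matches, (0.5, 0.9))
```

Here only the middle match falls inside the (0.5, 0.9) range, which fits the stated purpose of the client: links that are neither confident enough for direct ingestion nor too weak to be worth curating.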