Experiments

Default evaluation technique

Applies to all experiments:

  • stratified 5-fold cross-validation over training/test splits;

  • mean performance scores over the folds.
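A minimal, pure-Python sketch of this evaluation scheme follows. It is not soweego's code. The helper names (`stratified_folds`, `precision_recall_f1`, `cross_validate`) and the `fit_predict` callback are hypothetical, and a real pipeline would more likely rely on scikit-learn's `StratifiedKFold`. The sketch deals each class round-robin into 5 folds, scores every held-out fold, then averages precision, recall and F-score separately.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that keep each class's share balanced."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin, so every fold gets about 1/k of it
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and F-score, with True as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cross_validate(fit_predict, features, labels, k=5, seed=0):
    """Train on k-1 folds, test on the held-out one, and average each score."""
    fold_scores = []
    for test_indices in stratified_folds(labels, k, seed):
        held_out = set(test_indices)
        train_indices = [i for i in range(len(labels)) if i not in held_out]
        predictions = fit_predict(
            [features[i] for i in train_indices],
            [labels[i] for i in train_indices],
            [features[i] for i in test_indices],
        )
        truth = [labels[i] for i in test_indices]
        fold_scores.append(precision_recall_f1(truth, predictions))
    # Mean of each metric over the folds: (precision, recall, F-score)
    return tuple(mean(scores[m] for scores in fold_scores) for m in range(3))
```

Because the F-score is averaged fold by fold, the mean F-score reported in the tables need not equal the harmonic mean of the mean precision and the mean recall.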

Single-layer perceptron optimizers

https://github.com/Wikidata/soweego/issues/285

Setting

  • run: May 3 2019;

  • output folder: soweego-2.eqiad.wmflabs:/srv/dev/20190503/;

  • head commit: d0d390e622f2782a49a1bd0ebfc64478ed34aa0c;

  • command: python -m soweego linker evaluate slp ${Dataset} ${Entity} optimizer=${Optimizer}.
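The command above is a template over three placeholders. The sketch below expands it once for every dataset, entity and optimizer reported in the tables that follow. The lowercase catalog and entity spellings (`discogs`, `imdb`, `musicbrainz`, ...) are an assumption about what the CLI expects, and the script only prints the commands rather than running them.

```python
import itertools
import shlex

# Catalog/entity pairs covered by the tables; the lowercase spelling
# expected by the CLI is an assumption
TARGETS = [
    ("discogs", "band"), ("discogs", "musician"),
    ("imdb", "director"), ("imdb", "musician"),
    ("imdb", "producer"), ("imdb", "writer"),
    ("musicbrainz", "band"), ("musicbrainz", "musician"),
]
OPTIMIZERS = ["sgd", "rmsprop", "nadam", "adamax", "adam", "adagrad", "adadelta"]


def optimizer_commands():
    """Expand the templated command once per (dataset, entity, optimizer)."""
    return [
        ["python", "-m", "soweego", "linker", "evaluate", "slp",
         dataset, entity, f"optimizer={optimizer}"]
        for (dataset, entity), optimizer in itertools.product(TARGETS, OPTIMIZERS)
    ]


if __name__ == "__main__":
    for command in optimizer_commands():
        # Print instead of executing; each list could be passed to subprocess.run()
        print(shlex.join(command))
```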

Discogs band

Optimizer   Precision   Recall   F-score
sgd         .782        .945     .856
rmsprop     .801        .930     .860
nadam       .805        .925     .861
adamax      .795        .938     .861
adam        .800        .929     .860
adagrad     .802        .927     .859
adadelta    .799        .934     .861

Discogs musician

Optimizer   Precision   Recall   F-score
sgd         .815        .985     .892
rmsprop     .816        .985     .893
nadam       .816        .986     .893
adamax      .817        .985     .893
adam        .816        .985     .893
adagrad     .816        .986     .893
adadelta    .815        .986     .892

IMDb director

Optimizer   Precision   Recall   F-score
sgd         .918        .954     .936
rmsprop     .895        .954     .923
nadam       .908        .954     .930
adamax      .907        .955     .930
adam        .909        .953     .931
adagrad     .867        .950     .907
adadelta    .902        .954     .927

IMDb musician

Optimizer   Precision   Recall   F-score
sgd         .912        .927     .920
rmsprop     .913        .929     .921
nadam       .913        .929     .921
adamax      .913        .928     .921
adam        .913        .928     .921
adagrad     .873        .860     .866
adadelta    .913        .928     .921

IMDb producer

Optimizer   Precision   Recall   F-score
sgd         .917        .942     .929
rmsprop     .916        .938     .927
nadam       .916        .938     .927
adamax      .916        .940     .928
adam        .916        .938     .927
adagrad     .852        .684     .756
adadelta    .916        .939     .928

IMDb writer

Optimizer   Precision   Recall   F-score
sgd         .929        .943     .936
rmsprop     .927        .940     .934
nadam       .930        .940     .935
adamax      .930        .941     .935
adam        .930        .940     .935
adagrad     .872        .923     .896
adadelta    .931        .941     .936

MusicBrainz band

Optimizer   Precision   Recall   F-score
sgd         .952        .869     .909
rmsprop     .949        .875     .911
nadam       .949        .877     .911
adamax      .952        .871     .910
adam        .951        .875     .911
adagrad     .932        .886     .909
adadelta    .952        .874     .911

MusicBrainz musician

Optimizer   Precision   Recall   F-score
sgd         .942        .957     .949
rmsprop     .941        .958     .949
nadam       .941        .958     .949
adamax      .941        .958     .949
adam        .941        .958     .949
adagrad     .946        .953     .950
adadelta    .941        .958     .950

Takeaways

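  • All optimizers except adagrad do a similar job: within each dataset, their F-scores differ by at most .013 (IMDb director);

  • adagrad clearly underperforms on all IMDb entities, e.g., .756 F-score for IMDb producer against .927 to .929 for the others;

  • otherwise, the optimizer has no specific impact on the performance.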
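The spread behind these takeaways can be recomputed from the F-score columns of the tables above. The sketch hard-codes those values and reports, per dataset, the gap between the best and the worst optimizer, with and without adagrad; the names `F_SCORES` and `spread` are illustrative.

```python
OPTIMIZERS = ("sgd", "rmsprop", "nadam", "adamax", "adam", "adagrad", "adadelta")

# F-scores copied from the tables above, listed in OPTIMIZERS order
F_SCORES = {
    "Discogs band": (.856, .860, .861, .861, .860, .859, .861),
    "Discogs musician": (.892, .893, .893, .893, .893, .893, .892),
    "IMDb director": (.936, .923, .930, .930, .931, .907, .927),
    "IMDb musician": (.920, .921, .921, .921, .921, .866, .921),
    "IMDb producer": (.929, .927, .927, .928, .927, .756, .928),
    "IMDb writer": (.936, .934, .935, .935, .935, .896, .936),
    "MusicBrainz band": (.909, .911, .911, .910, .911, .909, .911),
    "MusicBrainz musician": (.949, .949, .949, .949, .949, .950, .950),
}


def spread(scores, exclude=()):
    """Best minus worst F-score, ignoring the optimizers listed in `exclude`."""
    kept = [score for name, score in zip(OPTIMIZERS, scores) if name not in exclude]
    return round(max(kept) - min(kept), 3)


if __name__ == "__main__":
    for dataset, scores in F_SCORES.items():
        print(f"{dataset:<22} all: {spread(scores):.3f}  "
              f"without adagrad: {spread(scores, exclude=('adagrad',)):.3f}")
```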

Max Levenshtein vs. average Levenshtein

https://github.com/Wikidata/soweego/issues/176

Setting

  • run: May 7 2019;

  • output folder: soweego-2.eqiad.wmflabs:/srv/dev/20190507/;

  • head commit: ddd5d719793ea217267413a52d1d2e5b90c341a7;

  • command: python -m soweego linker evaluate ${Algorithm} ${Dataset} ${Entity}.
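To make the comparison concrete, here is a sketch of the two aggregations. It assumes that each entity has several name variants and that the feature scores every source/target name pair with a length-normalized Levenshtein similarity; the normalization and the function names are illustrative assumptions, not soweego's implementation. `max` keeps the best-matching pair, while the average is pulled down by unrelated variants.

```python
from itertools import product
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized into [0, 1], where 1 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def levenshtein_feature(source_names, target_names, aggregate=max):
    """Aggregate the similarities of all source/target name pairs."""
    return aggregate(similarity(s, t) for s, t in product(source_names, target_names))


if __name__ == "__main__":
    source = ["bob dylan", "robert zimmerman"]
    target = ["bob dylan"]
    print(levenshtein_feature(source, target, aggregate=max))   # 1.0
    print(levenshtein_feature(source, target, aggregate=mean))  # lower than the max
```

Here the exact alias match yields 1.0 with `max`, whereas the average drops below it because the unrelated alias weighs as much as the exact match.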

Discogs band

Algorithm   Precision   Recall   F-score
nb max      .787        .955     .863
nb avg      .789        .941     .859
lsvm max    .780        .960     .861
lsvm avg    .785        .946     .858
svm max     .777        .963     .860
svm avg     .777        .963     .860
slp max     .784        .954     .861
slp avg     .776        .956     .857
mlp max     .822        .925     .870

Discogs musician

Algorithm   Precision   Recall   F-score
nb max      .831        .975     .897
nb avg      .836        .958     .893
lsvm max    .818        .985     .894
lsvm avg    .814        .986     .892
svm max     .815        .985     .892
svm avg     .815        .985     .892
slp max     .821        .983     .895
slp avg     .815        .985     .892
mlp max     .852        .963     .904

IMDb director

Algorithm   Precision   Recall   F-score
nb max      .896        .971     .932
nb avg      .897        .971     .932
lsvm max    .919        .943     .931
lsvm avg    .919        .942     .930
svm max     .911        .950     .930
svm avg     .908        .958     .932
slp max     .917        .953     .935
slp avg     .867        .953     .908
mlp max     .913        .964     .938

IMDb musician

Algorithm   Precision   Recall   F-score
nb max      .889        .962     .924
nb avg      .891        .960     .924
lsvm max    .917        .938     .927
lsvm avg    .917        .937     .927
svm max     .904        .944     .924
svm avg     .908        .942     .924
slp max     .924        .929     .926
slp avg     .922        .914     .918
mlp max     .912        .951     .931

IMDb producer

Algorithm   Precision   Recall   F-score
nb max      .870        .971     .918
nb avg      .871        .970     .918
lsvm max    .920        .940     .930
lsvm avg    .920        .938     .929
svm max     .923        .927     .925
svm avg     .923        .926     .925
slp max     .914        .940     .927
slp avg     .862        .914     .883
mlp max     .911        .956     .933

IMDb writer

Algorithm   Precision   Recall   F-score
nb max      .904        .975     .938
nb avg      .910        .961     .935
lsvm max    .936        .949     .943
lsvm avg    .936        .948     .942
svm max     .932        .954     .943
svm avg     .932        .954     .943
slp max     .938        .946     .942
slp avg     .903        .955     .928
mlp max     .930        .963     .946

MusicBrainz band

Algorithm   Precision   Recall   F-score
nb max      .821        .987     .896
nb avg      .822        .985     .896
lsvm max    .944        .879     .910
lsvm avg    .943        .888     .914
svm max     .930        .891     .910
svm avg     .939        .893     .915
slp max     .953        .865     .907
slp avg     .930        .885     .907
mlp max     .906        .918     .911

MusicBrainz musician

Algorithm   Precision   Recall   F-score
nb max      .955        .936     .946
nb avg      .955        .936     .946
lsvm max    .941        .963     .952
lsvm avg    .941        .962     .952
svm max     .951        .938     .944
svm avg     .950        .938     .944
slp max     .942        .957     .949
slp avg     .943        .956     .949
mlp max     .939        .970     .954

Takeaways

Max Levenshtein has the following impact:

  • NB is always improved or left untouched;

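  • LSVM is improved in most cases, left untouched for IMDb musician and MusicBrainz musician, but worsens for MusicBrainz band;

  • SVM is mostly left untouched, but worsens for IMDb director and MusicBrainz band;

  • SLP benefits the most: it is improved for all Discogs and IMDb entities, with the largest gains (e.g., F-score from .883 to .927 for IMDb producer), and left untouched for both MusicBrainz entities;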

  • conclusion: max Levenshtein should replace the average one.
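The outcome per algorithm can be tallied directly from the F-score columns of the tables above. The sketch hard-codes the (max, avg) F-scores of the four algorithms that have both variants (MLP only has a max row) and groups datasets into improved, untouched and worsened; a tie at three decimals counts as untouched. The helper names are illustrative.

```python
from collections import defaultdict

# (max, avg) F-scores copied from the tables above; MLP is left out
# because it was only evaluated with max Levenshtein
F_SCORES = {
    "Discogs band": {"nb": (.863, .859), "lsvm": (.861, .858),
                     "svm": (.860, .860), "slp": (.861, .857)},
    "Discogs musician": {"nb": (.897, .893), "lsvm": (.894, .892),
                         "svm": (.892, .892), "slp": (.895, .892)},
    "IMDb director": {"nb": (.932, .932), "lsvm": (.931, .930),
                      "svm": (.930, .932), "slp": (.935, .908)},
    "IMDb musician": {"nb": (.924, .924), "lsvm": (.927, .927),
                      "svm": (.924, .924), "slp": (.926, .918)},
    "IMDb producer": {"nb": (.918, .918), "lsvm": (.930, .929),
                      "svm": (.925, .925), "slp": (.927, .883)},
    "IMDb writer": {"nb": (.938, .935), "lsvm": (.943, .942),
                    "svm": (.943, .943), "slp": (.942, .928)},
    "MusicBrainz band": {"nb": (.896, .896), "lsvm": (.910, .914),
                         "svm": (.910, .915), "slp": (.907, .907)},
    "MusicBrainz musician": {"nb": (.946, .946), "lsvm": (.952, .952),
                             "svm": (.944, .944), "slp": (.949, .949)},
}


def outcome(new, old):
    """Compare two scores already rounded to three decimals."""
    if new > old:
        return "improved"
    if new < old:
        return "worsened"
    return "untouched"


def summarize(scores):
    """Map algorithm -> outcome -> datasets, for max vs. average Levenshtein."""
    summary = defaultdict(lambda: defaultdict(list))
    for dataset, by_algorithm in scores.items():
        for algorithm, (with_max, with_avg) in by_algorithm.items():
            summary[algorithm][outcome(with_max, with_avg)].append(dataset)
    return summary


if __name__ == "__main__":
    for algorithm, outcomes in summarize(F_SCORES).items():
        print(algorithm, dict(outcomes))
```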

String kernel feature

https://github.com/Wikidata/soweego/issues/174

Setting

  • run: May 8 2019;

  • output folder: soweego-2.eqiad.wmflabs:/srv/dev/20190508/;

  • head commit: 0c5137fc4fe446abdb6df6dbde277b7aa15881c5;

  • command: python -m soweego linker evaluate ${Algorithm} ${Dataset} ${Entity}.
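This page does not describe which string kernel the feature uses. As an assumption, the sketch below implements the p-spectrum kernel, a standard string kernel that compares two strings by their shared character n-grams, with cosine normalization so that scores fall in [0, 1]. The function names and the default p = 2 are illustrative.

```python
from collections import Counter
from math import sqrt


def spectrum(text: str, p: int) -> Counter:
    """Count every contiguous character substring of length p."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(a: str, b: str, p: int = 2) -> float:
    """Unnormalized p-spectrum kernel: dot product of the n-gram count vectors."""
    grams_a, grams_b = spectrum(a, p), spectrum(b, p)
    return float(sum(count * grams_b[gram] for gram, count in grams_a.items()))


def normalized_spectrum_kernel(a: str, b: str, p: int = 2) -> float:
    """Cosine-normalized kernel: k(a, b) / sqrt(k(a, a) * k(b, b))."""
    denominator = sqrt(spectrum_kernel(a, a, p) * spectrum_kernel(b, b, p))
    # Strings shorter than p have no n-grams, hence no overlap to measure
    return 0.0 if denominator == 0 else spectrum_kernel(a, b, p) / denominator
```

Since the kernel counts n-grams regardless of their position, reordered name tokens still score high: `normalized_spectrum_kernel("bob dylan", "dylan bob")` returns 0.75, as 6 of the 8 bigrams of each string are shared.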

Discogs band

Algorithm   Precision   Recall   F-score
nb +sk      .788        .942     .859
nb          .789        .941     .859
lsvm +sk    .785        .946     .858
lsvm        .785        .946     .858
svm +sk     .778        .963     .861
svm         .777        .963     .860
slp +sk     .783        .947     .857
slp         .776        .956     .857
mlp +sk     .848        .913     .879

Discogs musician

Algorithm   Precision   Recall   F-score
nb +sk      .836        .958     .893
nb          .836        .958     .893
lsvm +sk    .816        .985     .892
lsvm        .814        .986     .892
svm +sk     .815        .985     .892
svm         .815        .985     .892
slp +sk     .820        .978     .892
slp         .815        .985     .892
mlp +sk     .868        .948     .906

IMDb director

Algorithm   Precision   Recall   F-score
nb +sk      .897        .971     .932
nb          .897        .971     .932
lsvm +sk    .923        .949     .935
lsvm        .919        .942     .930
svm +sk     .914        .950     .931
svm         .908        .958     .932
slp +sk     .918        .955     .936
slp         .867        .953     .908
mlp +sk     .918        .964     .941

IMDb musician

Algorithm   Precision   Recall   F-score
nb +sk      .891        .961     .924
nb          .891        .960     .924
lsvm +sk    .922        .941     .931
lsvm        .917        .937     .927
svm +sk     .910        .949     .929
svm         .908        .942     .924
slp +sk     .922        .934     .928
slp         .922        .914     .918
mlp +sk     .914        .958     .935

IMDb producer

Algorithm   Precision   Recall   F-score
nb +sk      .871        .970     .918
nb          .871        .970     .918
lsvm +sk    .921        .943     .932
lsvm        .920        .938     .929
svm +sk     .923        .927     .925
svm         .923        .926     .925
slp +sk     .916        .942     .929
slp         .862        .914     .883
mlp +sk     .912        .959     .935

IMDb writer

Algorithm   Precision   Recall   F-score
nb +sk      .910        .961     .935
nb          .910        .961     .935
lsvm +sk    .938        .953     .945
lsvm        .936        .948     .942
svm +sk     .933        .957     .945
svm         .932        .954     .943
slp +sk     .939        .948     .943
slp         .903        .955     .928
mlp +sk     .931        .968     .949

MusicBrainz band

Algorithm   Precision   Recall   F-score
nb +sk      .821        .985     .896
nb          .822        .985     .896
lsvm +sk    .940        .895     .917
lsvm        .943        .888     .914
svm +sk     .937        .899     .918
svm         .939        .893     .915
slp +sk     .952        .873     .911
slp         .930        .885     .907
mlp +sk     .937        .904     .920

MusicBrainz musician

Algorithm   Precision   Recall   F-score
nb +sk      .955        .936     .946
nb          .955        .936     .946
lsvm +sk    .938        .965     .951
lsvm        .941        .962     .952
svm +sk     .951        .938     .944
svm         .950        .938     .944
slp +sk     .941        .958     .950
slp         .943        .956     .949
mlp +sk     .939        .972     .955

Takeaways

The string kernel feature:

  • has the most positive impact on SLP;

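  • slightly improves or leaves unchanged the performance in most cases, but slightly worsens:

    • precision in 6 cases, i.e., NB for Discogs band, NB, LSVM & SVM for MusicBrainz band, LSVM & SLP for MusicBrainz musician;

    • recall in 6 cases, i.e., SLP for Discogs band, LSVM & SLP for Discogs musician, SVM for IMDb director, SLP for IMDb writer and MusicBrainz band;

    • F-score in 2 cases, i.e., SVM for IMDb director, LSVM for MusicBrainz musician.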

  • conclusion: the string kernel feature should be added.
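These counts can be rechecked against the tables above. The sketch hard-codes (precision, recall, F-score) with and without the string kernel for the four algorithms that have a baseline row (MLP is only reported with +sk) and lists, per metric, the cases where +sk scores lower. `SCORES` and `worsened_cases` are illustrative names.

```python
# (precision, recall, F-score) with and without the string kernel (sk),
# copied from the tables above; MLP is left out because it has no row without sk
SCORES = {
    ("Discogs band", "nb"): ((.788, .942, .859), (.789, .941, .859)),
    ("Discogs band", "lsvm"): ((.785, .946, .858), (.785, .946, .858)),
    ("Discogs band", "svm"): ((.778, .963, .861), (.777, .963, .860)),
    ("Discogs band", "slp"): ((.783, .947, .857), (.776, .956, .857)),
    ("Discogs musician", "nb"): ((.836, .958, .893), (.836, .958, .893)),
    ("Discogs musician", "lsvm"): ((.816, .985, .892), (.814, .986, .892)),
    ("Discogs musician", "svm"): ((.815, .985, .892), (.815, .985, .892)),
    ("Discogs musician", "slp"): ((.820, .978, .892), (.815, .985, .892)),
    ("IMDb director", "nb"): ((.897, .971, .932), (.897, .971, .932)),
    ("IMDb director", "lsvm"): ((.923, .949, .935), (.919, .942, .930)),
    ("IMDb director", "svm"): ((.914, .950, .931), (.908, .958, .932)),
    ("IMDb director", "slp"): ((.918, .955, .936), (.867, .953, .908)),
    ("IMDb musician", "nb"): ((.891, .961, .924), (.891, .960, .924)),
    ("IMDb musician", "lsvm"): ((.922, .941, .931), (.917, .937, .927)),
    ("IMDb musician", "svm"): ((.910, .949, .929), (.908, .942, .924)),
    ("IMDb musician", "slp"): ((.922, .934, .928), (.922, .914, .918)),
    ("IMDb producer", "nb"): ((.871, .970, .918), (.871, .970, .918)),
    ("IMDb producer", "lsvm"): ((.921, .943, .932), (.920, .938, .929)),
    ("IMDb producer", "svm"): ((.923, .927, .925), (.923, .926, .925)),
    ("IMDb producer", "slp"): ((.916, .942, .929), (.862, .914, .883)),
    ("IMDb writer", "nb"): ((.910, .961, .935), (.910, .961, .935)),
    ("IMDb writer", "lsvm"): ((.938, .953, .945), (.936, .948, .942)),
    ("IMDb writer", "svm"): ((.933, .957, .945), (.932, .954, .943)),
    ("IMDb writer", "slp"): ((.939, .948, .943), (.903, .955, .928)),
    ("MusicBrainz band", "nb"): ((.821, .985, .896), (.822, .985, .896)),
    ("MusicBrainz band", "lsvm"): ((.940, .895, .917), (.943, .888, .914)),
    ("MusicBrainz band", "svm"): ((.937, .899, .918), (.939, .893, .915)),
    ("MusicBrainz band", "slp"): ((.952, .873, .911), (.930, .885, .907)),
    ("MusicBrainz musician", "nb"): ((.955, .936, .946), (.955, .936, .946)),
    ("MusicBrainz musician", "lsvm"): ((.938, .965, .951), (.941, .962, .952)),
    ("MusicBrainz musician", "svm"): ((.951, .938, .944), (.950, .938, .944)),
    ("MusicBrainz musician", "slp"): ((.941, .958, .950), (.943, .956, .949)),
}
METRICS = ("precision", "recall", "F-score")


def worsened_cases(scores):
    """Map each metric to the (dataset, algorithm) pairs where +sk scores lower."""
    worse = {metric: [] for metric in METRICS}
    for case, (with_sk, without_sk) in scores.items():
        for metric, new, old in zip(METRICS, with_sk, without_sk):
            if new < old:
                worse[metric].append(case)
    return worse


if __name__ == "__main__":
    for metric, cases in worsened_cases(SCORES).items():
        print(f"{metric}: {len(cases)} case(s)", cases)
```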