- Do [[fine-tuning]] of [Japanese BERT]
- So far I've gone as far as loading a pretrained BERT model I had on hand, vectorizing the sentences with it, and training a discriminative model on those vectors.
- That gave 72.1% accuracy, so now I want to try fine-tuning BERT itself and see how far the accuracy goes up.
- Read this.
- Replicate finetune-to-livedoor-corpus.ipynb in my own Google Colab environment
- The notebook says to copy the files with gsutil cp from within GCP, but I decided to mount Google Drive instead.
- See: View Drive from Google Colab
- Options passed to run_classifier.py:
--model_file=../model/wiki-ja.model \
--vocab_file=../model/wiki-ja.vocab \
- These specify the SentencePiece model.
- If you follow the ipynb's procedure, these files are copied from GCP to Colab's FS beforehand.
- I'm not doing that, so I read them from Google Drive instead.
- Like this:
--model_file=/content/gdrive/My\ Drive/bert-wiki-ja/wiki-ja.model \
--vocab_file=/content/gdrive/My\ Drive/bert-wiki-ja/wiki-ja.vocab \
--init_checkpoint=/content/gdrive/My\ Drive/bert-wiki-ja/model.ckpt-1400000 \
- Google Colab's FS root is /content.
- I didn't want to make mistakes caused by relative paths, so I used absolute paths.
- Note: the BERT model must be loaded from, and results exported to, GCP (a Cloud Storage bucket).
- Specifying init_checkpoint as above (a Google Drive path) fails, because it is the TPU that tries to read it.
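The path setup above can be sketched in Python as follows. This is a hedged sketch, not the notebook's code: the Drive layout is the one used above, while the bucket `gs://YOUR_BUCKET/...` and the helpers `mount_drive`, `build_flags`, `check_tpu_paths`, and `to_command` are hypothetical. Following the note above, `init_checkpoint` and the output location point at Cloud Storage because the TPU reads and writes them; `--output_dir` is assumed to be the output flag, as in BERT's standard `run_classifier.py`. `shlex.quote` takes care of the space in `My Drive`, so no hand-written backslashes are needed.

```python
import shlex

DRIVE_ROOT = "/content/gdrive"                   # mount point used in this note
MODEL_DIR = DRIVE_ROOT + "/My Drive/bert-wiki-ja"
GCS_DIR = "gs://YOUR_BUCKET/bert-wiki-ja"         # hypothetical bucket path


def mount_drive(mount_point=DRIVE_ROOT):
    """Mount Google Drive inside Colab; return False when not running in Colab."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def build_flags():
    """Absolute paths only, to avoid relative-path mistakes."""
    return {
        "task_name": "livedoor",
        # SentencePiece files: read from Drive, as in this note.
        "model_file": MODEL_DIR + "/wiki-ja.model",
        "vocab_file": MODEL_DIR + "/wiki-ja.vocab",
        # Read or written by the TPU, so these must be gs:// paths.
        "init_checkpoint": GCS_DIR + "/model.ckpt-1400000",
        "output_dir": GCS_DIR + "/output",  # assumed flag name (standard BERT)
    }


def check_tpu_paths(flags, tpu_keys=("init_checkpoint", "output_dir")):
    """Fail early if a path the TPU has to access is not in Cloud Storage."""
    bad = [key for key in tpu_keys if not flags[key].startswith("gs://")]
    if bad:
        raise ValueError("not a gs:// path (TPU cannot access it): " + ", ".join(bad))


def to_command(flags, script="run_classifier.py"):
    """Shell command line; shlex.quote handles the space in 'My Drive'."""
    args = [shlex.quote("--%s=%s" % (key, value)) for key, value in flags.items()]
    return " ".join(["python", script] + args)


flags = build_flags()
check_tpu_paths(flags)
print(to_command(flags))
```

In Colab, call `mount_drive()` first. Running `check_tpu_paths` before launching training turns a Drive path given to the TPU into an immediate error in the Colab cell rather than a failure partway through the run.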
- Where the data is read: `train_examples = processor.get_train_examples(FLAGS.data_dir)`
- `processor` is created by `processor = processors[task_name]()`, and `LivedoorProcessor` is selected by `--task_name=livedoor` passed as a command-line argument.
- `LivedoorProcessor` extends `DataProcessor` and calls the parent class's class method `_read_tsv`, which simply opens the file, reads it as TSV, and returns the rows as a list.
- `LivedoorProcessor#_create_examples` uses that result to create `InputExample` objects, each holding a text and a label, and collects them in a list.
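The reading path described above boils down to two steps: read a TSV file into rows, then turn each row into an `InputExample`. Below is a simplified, TensorFlow-free sketch of that flow. `InputExample` keeps the constructor fields of BERT's `run_classifier.py` (`guid`, `text_a`, `text_b`, `label`); the column layout (text in column 0, label in column 1) and the name `SimpleTsvProcessor` are assumptions for illustration, not what `LivedoorProcessor` actually uses.

```python
import csv


class InputExample(object):
    """A single example: id, one or two texts, and a label (field names as in BERT)."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label


class DataProcessor(object):
    """Simplified base class: _read_tsv only reads a TSV file into a list of rows."""

    @classmethod
    def _read_tsv(cls, input_file):
        # The original goes through tf.gfile; plain open() is enough for local files.
        # QUOTE_NONE: quote characters are treated as ordinary text.
        with open(input_file, "r", encoding="utf-8", newline="") as f:
            reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
            return list(reader)


class SimpleTsvProcessor(DataProcessor):
    """Hypothetical processor: column 0 is the text, column 1 is the label."""

    def get_train_examples(self, data_dir):
        return self._create_examples(self._read_tsv(data_dir + "/train.tsv"), "train")

    def _create_examples(self, lines, set_type):
        examples = []
        for i, line in enumerate(lines):
            examples.append(InputExample(guid="%s-%d" % (set_type, i),
                                         text_a=line[0], label=line[1]))
        return examples
```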
- Model creation
- `model_fn_builder(...)` creates `model_fn`, which is then wrapped in `tf.contrib.tpu.TPUEstimator`; the result is called `estimator`.
- Where the pretrained model is loaded: `model_fn_builder(...)` calls `create_model`.
- `create_model` builds the original BERT model and then takes `output_layer = model.get_pooled_output()`.
- I explained what `get_pooled_output` is in 5db89bb6aff09e0000d4c214.
- It appears to implement a simple logistic regression that takes the output of `get_pooled_output` as input, but I'm not sure why this had to be implemented by hand.
- I'd like to customize things in the future, but for now I'll just accept "that's the way it is" and move on.
- One small difference in my case is that the labels are 0/1.
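The classifier head described above can be written out in plain Python to see what it computes. This is a sketch modeled on the head in BERT's `create_model` (a weight matrix of shape `[num_labels, hidden_size]`, a bias, log-softmax, and cross-entropy against one-hot labels), without dropout or training; `classifier_head` is a hypothetical name, and the random vectors are stand-ins for real `get_pooled_output()` values. With 0/1 labels, `num_labels` is 2.

```python
import math
import random


def classifier_head(pooled, labels, weights, bias):
    """Softmax (logistic-regression) layer on top of BERT's pooled output.

    pooled:  list of [hidden_size] vectors, one per example
    labels:  list of ints in 0..num_labels-1
    weights: [num_labels][hidden_size], bias: [num_labels]
    Returns the mean cross-entropy loss and the per-example probabilities.
    """
    total_loss = 0.0
    all_probs = []
    for x, y in zip(pooled, labels):
        # logits = W x + b  (BERT computes this with matmul(..., transpose_b=True))
        logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
                  for w, b in zip(weights, bias)]
        m = max(logits)  # subtract the max for a numerically stable log-softmax
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        log_probs = [v - log_z for v in logits]
        total_loss += -log_probs[y]  # the one-hot label picks a single log-probability
        all_probs.append([math.exp(lp) for lp in log_probs])
    return total_loss / len(pooled), all_probs


rng = random.Random(0)
hidden_size, num_labels = 768, 2  # labels are 0/1 here
# Random stand-ins for get_pooled_output() (the pooler ends in tanh).
pooled = [[math.tanh(rng.gauss(0, 1)) for _ in range(hidden_size)] for _ in range(4)]
weights = [[rng.gauss(0, 0.02) for _ in range(hidden_size)] for _ in range(num_labels)]
bias = [0.0] * num_labels
labels = [0, 1, 1, 0]
loss, probs = classifier_head(pooled, labels, weights, bias)
print(loss, [p.index(max(p)) for p in probs])
```

Because the head is linear in the pooled vector before the softmax, all the modeling power sits in BERT itself; fine-tuning updates BERT's weights together with this small layer.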
- mounting
- Import run_classifier.py, create my own equivalent of `LivedoorProcessor`, and rewrite `main` to call it!
- Use the mapping from title (unique ID) to body text in honbun_data.json.
- `LivedoorProcessor` reads a different file in each method call, but in my case there is just one data source.
- So create the data outside the class and have the methods just return references to it.
- Negative-positive pairs are in negaposi.json, so use these.
- I'm starting to think I can't do this.
- It's not impossible, but implementing the negative/positive decision as a logistic regression that takes a pair of values as input is…
- Naturally, since there are no combination features…
- Well, letās do it as a simple means to an end.
- Come to think of it, do I even need a separate `pooled_output` for each of the two input sentences?
- Shouldn't I just feed them into BERT as `text_a` and `text_b`?
- The original `_create_examples` runs `tokenization.convert_to_unicode` on the label, but in my case the label is 0/1, so I think I can just store it as an int.
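The plan above (build the data once outside the class, put the two sentences into `text_a`/`text_b`, keep the 0/1 labels as ints) could look roughly like this. The JSON layouts are guesses: `honbun_data` is taken to map title to body text, as stated above, and `negaposi` is assumed to be a list of `[title_a, title_b, label]` triples. `load_pair_examples` and `PairProcessor` are hypothetical names; in the notebook the class would subclass `run_classifier.DataProcessor` and use its `InputExample`, while here a namedtuple stand-in keeps the sketch runnable without TensorFlow.

```python
import json
from collections import namedtuple

# Stand-in for run_classifier.InputExample (same field names).
InputExample = namedtuple("InputExample", ["guid", "text_a", "text_b", "label"])


def load_pair_examples(honbun_json, negaposi_json, set_type):
    """Build examples once, outside the processor.

    Assumed (hypothetical) layouts:
      honbun_json:   {"title": "body text", ...}
      negaposi_json: [["title_a", "title_b", 0 or 1], ...]
    """
    honbun = json.loads(honbun_json)
    pairs = json.loads(negaposi_json)
    examples = []
    for i, (title_a, title_b, label) in enumerate(pairs):
        examples.append(InputExample(
            guid="%s-%d" % (set_type, i),
            text_a=honbun[title_a],  # first sentence -> BERT segment A
            text_b=honbun[title_b],  # second sentence -> BERT segment B
            label=int(label),        # 0/1 kept as int, no convert_to_unicode
        ))
    return examples


class PairProcessor(object):
    """Methods just return references to data prepared elsewhere.

    In the notebook this would subclass run_classifier.DataProcessor.
    """

    def __init__(self, train_examples, dev_examples):
        self._train = train_examples
        self._dev = dev_examples

    def get_train_examples(self, data_dir):
        return self._train  # data_dir is ignored: nothing is read from disk here

    def get_dev_examples(self, data_dir):
        return self._dev

    def get_labels(self):
        # BERT's convert_single_example builds a {label: index} map,
        # so int labels work as keys.
        return [0, 1]
```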
- It ran for a short while and then crashed.
- The TPU cannot write to Google Colab's local FS.
- "service-***@cloud-tpu.iam.gserviceaccount.com does not have storage.objects.list access to ***.appspot.com."
- The "Check TPU" cell in the ipynb also grants the TPU access to the GCP bucket.
- If you get this error, that cell is where things went wrong, and you need to start over from it.
***** Eval results *****
eval_accuracy = 0.9525939
eval_loss = 0.40050572
global_step = 1047
loss = 0.39979258
CPU times: user 2.1 s, sys: 313 ms, total: 2.41 s
Wall time: 7min 54s
classification_report

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.94 | 0.96 | 0.95 | 1118 |
| 1 | 0.95 | 0.94 | 0.95 | 1117 |
| accuracy | | | 0.95 | 2235 |
| macro avg | 0.95 | 0.95 | 0.95 | 2235 |
| weighted avg | 0.95 | 0.95 | 0.95 | 2235 |

confusion_matrix
[[1068 50]
[ 68 1049]]
- The result is so good that I'm skeptical whether it's real.
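As a sanity check, the per-class numbers in the report can be recomputed from the confusion matrix. This assumes scikit-learn's orientation (rows are true labels, columns are predicted labels); `report_from_confusion` is a hypothetical, dependency-free helper.

```python
def report_from_confusion(cm):
    """Accuracy and per-class precision/recall/F1/support from a square confusion matrix.

    cm[i][j] = number of examples with true label i that were predicted as j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    per_class = {}
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                           # row sum = support
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        # 2PR / (P + R) simplifies to 2TP / (predicted + actual)
        f1 = 2 * tp / (predicted_k + actual_k) if (predicted_k + actual_k) else 0.0
        per_class[k] = (precision, recall, f1, actual_k)
    return correct / total, per_class


accuracy, per_class = report_from_confusion([[1068, 50], [68, 1049]])
print("accuracy %.2f" % accuracy)
for label, (p, r, f1, support) in per_class.items():
    print(label, "%.2f %.2f %.2f %d" % (p, r, f1, support))
```

With this matrix the accuracy is (1068 + 1049) / 2235 ≈ 0.947, and the rounded precision, recall, and F1 match the rows of the report.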
Summary
- Fine-tuning BERT on 6705 examples takes under 8 minutes on a TPU (wall time 7min 54s).
- Accuracy went from 72% to 95%.
- However, the results of processing B cannot be cached.
- I haven't measured it, but I'm concerned that processing time will go up by one or two orders of magnitude.
- I want to try the right shape.
- This is additive attention.
- I want to try dot-product (inner-product) attention and dimensionality-reduction attention.
This page is auto-translated from /nishio/ę„ę¬čŖBERTć®fine-tuning using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. Iām very happy to spread my thought to non-Japanese readers.