Train For Language translation #1600
-
I have a CSV file and want to train a model for language translation. What dataset format is required, and how can I load the CSV file for training?
Replies: 2 comments
-
That's a JSON format. CSV is comma-separated.
-
Hello, sorry for the very late reply. I just got a chance to go through past discussions. Leaving this here for future readers.
There are multiple ways to go about this. One common method is to do pretraining (training on a large corpus in your desired language with some English mixed in), followed by supervised fine-tuning on translation or other tasks.
You can find many papers about this if you want inspiration.
Axolotl supports both pretraining and supervised fine-tuning: https://axolotl-ai-cloud.github.io/axolotl/docs/dataset-formats/
As mentioned by Respaired, your sample is in JSONL format. However, we support either:
datasets:
  - path: path/to/file.extension
    ds_type: csv # or json
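For future readers whose CSV is just a pair of text columns, here is a minimal sketch of one way to reshape it into instruction-style JSONL rows before pointing the config at it. The column names `source` and `target`, the helper name `csv_to_alpaca_jsonl`, and the instruction wording are assumptions for illustration only, not something from the original post. The `instruction`/`input`/`output` keys follow the alpaca-style layout, so check the dataset formats page linked above for the format that fits your setup.

```python
import csv
import json


def csv_to_alpaca_jsonl(csv_path, jsonl_path, src_col="source", tgt_col="target",
                        instruction="Translate the following text."):
    """Convert a translation CSV into alpaca-style JSONL rows.

    src_col and tgt_col are hypothetical column names; change them to
    match the header row of your own CSV file.
    Returns the number of rows written.
    """
    written = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            source = (row.get(src_col) or "").strip()
            target = (row.get(tgt_col) or "").strip()
            # Skip rows where either side of the translation pair is missing.
            if not source or not target:
                continue
            record = {"instruction": instruction, "input": source, "output": target}
            # ensure_ascii=False keeps non-Latin scripts readable in the output file.
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```

You would then set `path` to the generated file and `ds_type: json` in the `datasets` entry, most likely together with `type: alpaca`. If your CSV already has the columns your chosen format expects, this step is probably unnecessary and `ds_type: csv` should be enough.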