Train For Language translation #1600

Closed Answered by NanoCode012
nichellehouston asked this question in Q&A
Hello, sorry for the very late reply. I just got a chance to go through past discussions. Leaving this here for future readers.

There are multiple ways to approach this. One common method is to do pretraining (training on a large corpus in your desired language with some English mixed in), followed by supervised fine-tuning on translation tasks or other tasks.
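To make the "target language with some English mixed in" idea concrete, here is a minimal sketch of building such a corpus. The function names, the `english_fraction` parameter, and the one-JSON-object-per-line layout with a `"text"` key are all assumptions for illustration; check the dataset-format docs for the exact fields your chosen pretraining format expects.

```python
import json
import random


def mix_corpus(target_docs, english_docs, english_fraction=0.1, seed=0):
    """Return target-language docs plus a sample of English docs, shuffled.

    english_fraction is the approximate share of English docs in the result
    (a hypothetical knob, not an Axolotl option).
    """
    if not 0 <= english_fraction < 1:
        raise ValueError("english_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Number of English docs so that they make up ~english_fraction of the mix.
    n_english = round(len(target_docs) * english_fraction / (1 - english_fraction))
    n_english = min(n_english, len(english_docs))
    mixed = list(target_docs) + rng.sample(list(english_docs), n_english)
    rng.shuffle(mixed)
    return mixed


def to_jsonl(docs):
    """Serialize each document as one JSON object per line under a "text" key."""
    return "\n".join(json.dumps({"text": d}, ensure_ascii=False) for d in docs)
```

The right mixing ratio depends on your model and data, so treat the default here as a placeholder rather than a recommendation.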

You can find many papers about this if you want inspiration.

Axolotl supports both pretraining and supervised fine-tuning: https://axolotl-ai-cloud.github.io/axolotl/docs/dataset-formats/

As Respaired mentioned, your sample is in JSONL format. However, we support either of the following:

datasets:
   - path: path/to/file.extension
     ds_type: csv # or json
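For the supervised fine-tuning step, a minimal sketch of turning parallel sentence pairs into one-record-per-line JSON might look like the following. The `instruction`/`input`/`output` field names, the prompt wording, and the helper names are assumptions for illustration; use whichever fields match the dataset format you pick from the docs linked above.

```python
import json


def pairs_to_records(pairs, src_lang="English", tgt_lang="French"):
    """Wrap (source, target) sentence pairs as instruction-style records.

    The field names are an assumed instruction/input/output layout,
    not something this thread prescribes.
    """
    records = []
    for src, tgt in pairs:
        records.append({
            "instruction": f"Translate the following text from {src_lang} to {tgt_lang}.",
            "input": src,
            "output": tgt,
        })
    return records


def dumps_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"
```

Writing the result of `dumps_jsonl(...)` to a file gives you something you can point `path:` at in the config.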

Answer selected by NanoCode012