LLaVA for precipitation nowcasting on SEVIR

All modifications are placed in the following two directories:

  • EarthLMM/llava/datasets/sevir for the SEVIR dataloader.
  • EarthLMM/scripts/sevir for all running scripts.

Installation

Follow the official guide to install LLaVA in dev mode. The packages needed for SEVIR are already included in pyproject.toml.

  1. Clone this repository and navigate to the EarthLMM folder
git clone https://github.com/gaozhihan/EarthLMM.git
cd EarthLMM
  2. Install the package
conda create -n earthlmm python=3.10 -y
conda activate earthlmm
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
  3. Install additional packages for training
pip install -e ".[train]"
pip install flash-attn --no-build-isolation

SEVIR Data Preparation

See the official SEVIR website for an overview of the dataset.

For precipitation nowcasting, only the VIL (vertically integrated liquid) data is needed. Download it according to the official guide:

aws s3 cp --no-sign-request s3://sevir/CATALOG.csv CATALOG.csv
aws s3 sync --no-sign-request s3://sevir/data/vil .

It takes around 138 GB of storage.

Then, place the data in the following structure:

EarthLMM
└── playground
    └── data
        └── sevir
            ├── CATALOG.csv
            └── data
                └── vil
                    └── ...
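
If you want to sanity-check the download before converting it, the short Python sketch below opens one of the HDF5 files with h5py. It assumes the standard SEVIR layout (year subfolders under data/vil, and per-file 'id'/'vil' datasets holding 49 frames of 384x384 VIL per event); adjust the glob pattern if your layout differs.

# Quick sanity check of the downloaded VIL data (a minimal sketch, assuming
# the standard SEVIR layout; adjust the paths/glob if yours differs).
import glob
import h5py

vil_files = sorted(glob.glob("playground/data/sevir/data/vil/*/*.h5"))
print(f"Found {len(vil_files)} VIL files")

with h5py.File(vil_files[0], "r") as f:
    print("datasets:", list(f.keys()))      # typically ['id', 'vil']
    print("vil shape:", f["vil"].shape)     # typically (num_events, 384, 384, 49)
    print("first event id:", f["id"][0])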

Convert Raw SEVIR Data to LLaVA Format

For the multi-input, single-output task, run

cd ROOT_DIR/EarthLMM
python ./scripts/sevir/convert_sevir.py --save sevir_convert_save_dir

It will load the configuration from ./scripts/sevir/sevir_cfg.yaml. Modify the following args to configure the data conversion (see the sketch after this list):

  • in_len: the number of frames in the input sequence.
  • out_len: the model is required to predict the out_len-th future frame.
  • seq_len: should be the sum of in_len and out_len.
  • stride: the stride between two adjacent sampled sequences. I use 8 for all my experiments.
  • frame_stride: the stride between two adjacent frames in the same sequence. E.g., 1 for a 5-minute interval, 4 for a 20-minute interval.
  • start_date: null for training and ID test. [2019, 6, 1] for OOD test.
  • end_date: [2019, 6, 1] for training and ID test. null for OOD test.
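
The sketch below illustrates how these args interact. The values are hypothetical and the index arithmetic is my reading of the descriptions above; the authoritative logic is in ./scripts/sevir/convert_sevir.py.

# Illustration of how in_len/out_len/seq_len/frame_stride relate (hypothetical
# values; the real ones come from ./scripts/sevir/sevir_cfg.yaml).
in_len, out_len, frame_stride = 4, 6, 4   # e.g., 4 input frames at a 20-minute cadence
seq_len = in_len + out_len                # must match the seq_len arg

# Frame indices within one sampled sequence, in raw 5-minute steps:
input_frames = [i * frame_stride for i in range(in_len)]
target_frame = (seq_len - 1) * frame_stride   # the out_len-th frame after the last input
print(input_frames, "->", target_frame)       # [0, 4, 8, 12] -> 36 (0-60 min -> 180 min)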

For the multi-input, multi-output task, run

cd ROOT_DIR/EarthLMM
python ./scripts/sevir/convert_sevir_multi_out.py --save sevir_convert_multi_out_save_dir

It loads its configuration from ./scripts/sevir/sevir_multi_out_cfg.yaml, which takes the same args as above.
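
Before moving on to fine-tuning, it can help to peek at the converted data. The sketch below only assumes that the converter writes a JSON list to the save directory under the file name referenced by the fine-tuning script (sevir_llava.json); the record schema is whatever the converter emits.

# Peek at the converted dataset (a minimal sketch; the path follows the
# --data_path used in the fine-tuning section below).
import json

with open("playground/data/sevir_convert_save_dir/sevir_llava.json") as f:
    samples = json.load(f)

print(f"{len(samples)} samples")
print("keys of the first sample:", list(samples[0].keys()))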

LoRA Fine-Tuning on SEVIR

Run

cd ROOT_DIR/EarthLMM
sh ./scripts/sevir/finetune_sevir_lora.sh

Remember to configure the data path and the save path via:

  • --data_path ./playground/data/sevir_convert_save_dir/sevir_llava.json.
  • --output_dir ./checkpoints/llava-v1.5-7b-sevir-lora.

It will save all checkpoints in the ./checkpoints/llava-v1.5-7b-sevir-lora directory.
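
A quick way to check what the run produced is to list the output directory. The file names mentioned in the comment are the typical PEFT/LLaVA-LoRA artifacts, not a guarantee of what this particular script writes.

# List the LoRA training outputs (a minimal sketch; the exact file set depends
# on the training script and library versions, but adapter_config.json plus an
# adapter weight file, and non_lora_trainables.bin for the projector, are typical).
import os

ckpt_dir = "checkpoints/llava-v1.5-7b-sevir-lora"
for name in sorted(os.listdir(ckpt_dir)):
    path = os.path.join(ckpt_dir, name)
    if os.path.isfile(path):
        print(f"{name:40s} {os.path.getsize(path) / 1e6:8.1f} MB")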

Evaluating LLaVA with LoRA on SEVIR

Run

cd ROOT_DIR/EarthLMM
sh ./scripts/sevir/sevir_vqa_lora.sh

to generate predictions in the corresponding data directories. Remember to configure the script via:

  • --model-path ./checkpoints/llava-v1.5-7b-sevir-lora: point to the saved LoRA weights.
  • --question-file ./playground/data/sevir_convert_save_dir/sevir_questions.jsonl.
  • --answers-file ./playground/data/sevir_convert_save_dir/lora_answer.jsonl: to save the answers generated by LLaVA-LoRA.

The multi-input, multi-output task uses the same script.
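
Each line of the answers file is a standalone JSON record, so it is easy to inspect before scoring. The sketch below makes no assumption about the exact fields beyond what json.loads returns.

# Peek at the generated answers (a minimal sketch; the fields are whatever the
# VQA script writes, typically a question id plus the generated text).
import json

with open("playground/data/sevir_convert_save_dir/lora_answer.jsonl") as f:
    answers = [json.loads(line) for line in f]

print(f"{len(answers)} answers")
print("fields:", list(answers[0].keys()))
print("first record:", answers[0])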

Once the answers are generated, the performance can be evaluated via

cd ROOT_DIR/EarthLMM
python scripts/sevir/eval_baseline.py --data sevir_convert_save_dir --answer lora_answer.jsonl

The performance on the multi-input, multi-output task can be evaluated via

cd ROOT_DIR/EarthLMM
python scripts/sevir/eval_multi_out.py --data sevir_convert_multi_out_save_dir --answer lora_answer.jsonl
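
The metrics themselves are defined in eval_baseline.py and eval_multi_out.py. As a rough illustration of the kind of score commonly reported for SEVIR VIL nowcasting (not necessarily what these scripts compute), here is a minimal CSI calculation at VIL thresholds that are standard in the SEVIR literature; the arrays are random stand-ins for real predictions and targets.

# Minimal CSI (critical success index) sketch at commonly used SEVIR VIL
# thresholds (uint8 pixel scale). Purely illustrative, not taken from this repo.
import numpy as np

def csi(pred, target, threshold):
    hit = np.logical_and(pred >= threshold, target >= threshold).sum()
    miss = np.logical_and(pred < threshold, target >= threshold).sum()
    false_alarm = np.logical_and(pred >= threshold, target < threshold).sum()
    return hit / max(hit + miss + false_alarm, 1)

thresholds = [16, 74, 133, 160, 181, 219]
pred = np.random.randint(0, 255, (384, 384))
target = np.random.randint(0, 255, (384, 384))
for t in thresholds:
    print(f"CSI@{t}: {csi(pred, target, t):.3f}")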
