
Remote R hanging due to stty settings #1229

Open
epruesse opened this issue Oct 10, 2022 · 18 comments

@epruesse

Possibly related to #1110, #1162, #1152 and others.

After an update of Emacs and ESS, my remote R sessions started to hang my entire live Emacs. It took most of a day to figure out why; I'm posting here mostly to document the workaround and potentially inform a fix in ESS.

Background: I am using TRAMP to SSH into a compute cluster and running a small wrapper script to launch R on a compute node. The cluster is running SLURM, so the wrapper script essentially does exec srun --mem 50G -c16 -p interactive --pty PATH_TO_R --no-save --vanilla "$@". This worked fine with my 2020 Emacs+ESS, but failed after I brazenly upgraded to get |> indentation support.

The cause of the issue was stty settings. TRAMP sends a series of stty commands early on:

stty -inlcr -onlcr -echo kill '^U' erase '^H'
stty tab0
stty iutf8

These settings are not kept by SLURM when redirecting to a compute node via srun. So within that R session on the cluster, echo was on again and erase was set to backspace. Somehow this led to the hangs within ESS.
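You can see the drift directly by comparing the pty flags on each side of srun (a quick diagnostic sketch, assuming GNU coreutils stty on both nodes; the partition name is just an example):

# On the login node, after TRAMP has run its stty commands:
stty -a | tr ' ' '\n' | grep -E '^-?(echo|inlcr|onlcr)$'
# expected: -inlcr -onlcr -echo

# Inside an allocation, srun hands you a fresh pty with default settings:
srun -p interactive --pty bash -c "stty -a | tr ' ' '\n' | grep -E '^-?(echo|inlcr|onlcr)$'"
# typically: -inlcr onlcr echo   (echo and onlcr are back on, settings lost)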

A working workaround for SLURM is running this within the wrapper:

stty_settings="$(stty --save)"
exec srun --x11 --pty --mem 50G --cpus-per-task 16 bash -c "
stty $stty_settings 2>/dev/null
R --no-save --vanilla $*
"

It might help to simply run

system("command -v stty >/dev/null && stty -echo -inlcr -onlcr kill '^U' erase '^H' iutf8 tab0")

as the very first bit inside of LOADREMOTE or .load.R to ensure the pty settings are as expected.
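Before wiring that into R startup, the same command can be sanity-checked by hand in a remote shell (nothing ESS-specific here; note that iutf8 is a Linux-specific setting and may be rejected elsewhere, hence the 2>/dev/null in this sketch):

# Run by hand on the compute node:
command -v stty >/dev/null && stty -echo -inlcr -onlcr kill '^U' erase '^H' iutf8 tab0 2>/dev/null
stty -a | tr ' ' '\n' | grep -E '^-?echo$'   # should print -echo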

I can make a PR to that end, but this is fragile territory: what fixes things for me might break them for others.

@ghost

ghost commented Oct 27, 2022

@epruesse I am running a similar setup, but without a wrapper script to launch R, and the cluster is running different software. Right now I run the exec srun --mem ... manually, then ess-remote inside R. Would you be able to share your script/setup? Thanks!

This problem might be related to:
#1217
doomemacs/doomemacs#6695

@ghost

ghost commented Oct 27, 2022

I had partial success with this issue by using export PS1="$ " in my .bashrc.
It helped in the case where I launch R with M-x R. This still launches a remote R session on the cluster, but it uses the login node. I can work in this setting, but only for small projects with small compute requirements.

The problem shows up again when using a shell to connect to the cluster, launching R, and calling ess-remote.

@ghost

ghost commented Oct 27, 2022

I was testing your solution by running command -v stty >/dev/null && stty -echo -inlcr -onlcr kill '^U' erase '^H' iutf8 tab0 after getting onto a compute node, and system("command -v stty >/dev/null && stty -echo -inlcr -onlcr kill '^U' erase '^H' iutf8 tab0") after launching R from the compute node. However, my live Emacs is still hanging. This is the error from toggle-debug-on-quit:
[Screenshot: debug-on-quit backtrace]

@epruesse
Author

@gui-salome I am using bioconda so I've got my R inside of a conda environment. I set inferior-R-program-name to a script that activates the environment (pointing it at the absolute path works too, but only if you don't use install.packages()) and jumps onto a node if srun is present on the server I'm on.

init.el:

(use-package ess
  :defer t
  :ensure t
  :custom
  (inferior-R-program-name "~/bin/run_R" "Script to redirect to cluster when on discovery")
  [...]

~/bin/run_R:

#!/bin/bash
RFLAGS="--no-save --no-restore-history --quiet"
CORES=16
RAM=50G
ENV=R4

echo "Activating $ENV conda environment"
. ~/miniconda3/etc/profile.d/conda.sh
conda activate $ENV

if command -v srun >/dev/null 2>&1; then
    echo "Submitting to cluster ($CORES cores, $RAM)"
    # Save stty settings (necessary for emacs/tramp)
    stty_settings=$(stty --save)
    # \$HOSTNAME is escaped so it expands on the compute node,
    # not on the login node where this command string is built
    exec srun --x11 --pty --mem "$RAM" --cpus-per-task "$CORES" bash -c "
    echo Running on \$HOSTNAME
    echo Fixing stty...
    stty $stty_settings 2>/dev/null
    echo Launching $(which R) $RFLAGS $*
    R --version | head -n1
    R $RFLAGS $*
    "
else
    echo Running locally on $HOSTNAME
    echo Launching $(which R) $RFLAGS $*
    R --version | head -n1
    exec R $RFLAGS "$@"
fi

~/.ssh/config for completeness

Host discovery
  ForwardX11 yes
  ControlPath ~/.ssh/cm-%r@%h:%p
  ControlMaster auto
  ControlPersist yes
  ServerAliveInterval 15
  ServerAliveCountMax 8

So I usually just ^x ^f to /ssh:discovery:/path/to/some.R and then use ^RET to launch R.

@epruesse
Author

Oh, also, near the top of ~/.bashrc:

# Fast exit if coming in via emacs tramp
[ "$TERM" = "dumb" ] && return

This prevents it from loading conda stuff (slow) for mere TRAMP sessions and leaves PS1 trivial.

@epruesse
Author

The problem shows up again when using a shell to connect to the cluster, launching R, and calling ess-remote.
If the terminal in Emacs looks different after moving to the cluster node, specifically if you now get an empty line between prompts when you just press return, then it's the stty settings not getting copied. TRAMP sets "echo" to off, but that setting is lost when you do srun. Don't know how regular terminals handle it; it's not happening in iTerm2, so something in Emacs/TRAMP isn't doing quite the right thing.
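The quickest manual fix after landing on the node is to reapply the relevant bits by hand before starting R (a minimal sketch; -echo is the setting ESS most visibly trips over):

# On the compute node, before launching R:
stty -echo -inlcr -onlcr
# pressing return should no longer produce the extra empty line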

@ghost

ghost commented Oct 28, 2022

@gui-salome I am using bioconda so I've got my R inside of a conda environment. I set inferior-R-program-name to a script that activates the environment (pointing it at the absolute path works too, but only if you don't use install.packages()) and jumps onto a node if srun is present on the server I'm on.

Oh, this is interesting, I didn't know that was a possibility. In my case, I was just passing the path to the right version of R I want to use. I am going to try using a similar approach here. Thanks for sharing!!

@ghost

ghost commented Oct 31, 2022

The problem still persists after more testing. As soon as ess-remote is run, Emacs starts freezing over and over. There should be a simpler solution; 99% of what needs to happen is copying text from one buffer to another and hitting RET.

@ghost

ghost commented Nov 3, 2022

@epruesse Have you tried reverting to the older version of Emacs you used? What version was it by the way? I did not have this issue either on the version I installed back in 2020, but I am not sure what version it was.

@epruesse
Author

epruesse commented Nov 3, 2022

Oh, this is interesting, I didn't know that was a possibility.

You can always exec some_command, which replaces the bash process with the command you run. It's a minor optimization, though; just running the command without exec works too. Lots of things use shell wrappers to do setup: pretty much any Java tool, but also e.g. firefox.

Have you tried reverting to the older version of Emacs you used?

No. Don't know what it was.

There should be a simpler solution, 99% of what needs to happen is copying text from one buffer to another and hitting RET.

ESS can, and does, do way more. It would be good, though, to have a fallback option to just turn everything off in case things break. Ultimately, anything rendering an entire Emacs session unusable isn't good. I just know way too little about ESS and TRAMP to understand why this is happening. There are timeout guards in place...

@ghost

ghost commented Nov 7, 2022

I probably haven't used enough of the "way more" in ESS to miss it that much, but the simple execution path should definitely have a fallback or simpler mechanism.

@ghost

ghost commented Nov 11, 2022

I have been testing ESS without TRAMP (running R locally instead of going through ssh), and I am still having the same issues (Emacs hangs while editing local files). This is the output of debug-on-quit:
[Screenshot: debug-on-quit backtrace]

What is odd is that I haven't even launched R with M-x R, nor used ess-remote. Somehow ESS is still causing Emacs to hang.

@lionel-
Member

lionel- commented Nov 11, 2022

This is eldoc. Try setting ess-can-eval-in-background to nil.

What version of ESS do you have? I recommend always running the latest devel version.

@ghost

ghost commented Nov 11, 2022

I am using doomemacs and this is the version of ESS installed:
[Screenshot: installed ESS version]

I will try ess-can-eval-in-background. Will that mess up other operations?
Thanks for helping!

@lionel-
Member

lionel- commented Nov 11, 2022

ok you have the latest, so there seems to be a bad interaction between ESS and some property of your setup.

Will that mess up other operations?

This disables all contextual help, like eldoc and company completions.

@ghost

ghost commented Nov 11, 2022

ok you have the latest, so there seems to be a bad interaction between ESS and some property of your setup.

Will that mess up other operations?

This disables all contextual help, like eldoc and company completions.

My setup is vanilla. I am using the out-of-the-box configuration of doomemacs and enabling ESS; that's it, no other changes.

@lionel-
Member

lionel- commented Apr 15, 2023

Do you still see that behaviour with the latest dev ESS? When I run R -d lldb I also get an echoing stty and it no longer hangs.

@malcook

malcook commented Apr 26, 2024

In my hands, that stty -echo nl removes the echoed commands and spurious carriage returns which otherwise cause ess-remote to fail to fully/correctly initialize, resulting in further downstream problems trying to use it.

I had been experiencing such problems when running R within a bash session on an interactive slurm-allocated compute host on our_HPC_Cluster, viz:

meta-x shell

$ ssh our_HPC_Cluster
$ srun -p interactive --pty bash
$ # stty -echo nl
$ module load R # which actually defines R as a script that wraps R within singularity
$ R

meta-x ess-remote

I find ess-remote fails unless I uncomment the stty -echo nl in the above.

I am unsure if there is a profitable way of ensuring that ess-remote runs in an environment in which stty is configured as required. For now I know to call stty -echo nl from within my interactive slurm bash session.
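One way to avoid having to remember the manual step might be a guard in the remote ~/.bashrc that reapplies the settings whenever the shell starts inside an allocation (a sketch, assuming srun sessions export SLURM_JOB_ID, as stock SLURM does):

# In ~/.bashrc on the cluster:
# reapply the settings ess-remote needs whenever we are inside a SLURM job
if [ -n "$SLURM_JOB_ID" ] && [ -t 0 ]; then
    stty -echo nl 2>/dev/null
fi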
