docker-machine-driver-triton#13 Made machine driver compatible with Rancher #14

Open
wants to merge 9 commits into
base: master

blackwood821

  • Added the ability to pass in the private SSH key as a base64 encoded string
  • Added code to wait for the IP address to become available so that Rancher doesn't try to SSH into the new node until it has all the necessary info
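
As an illustration of the first bullet, here is a minimal Python sketch of decoding a base64-encoded private key before use. The function name `decode_key_material` and the PEM sanity check are assumptions made for this example; the driver's actual implementation is not shown in this conversation.

```python
import base64
import binascii


def decode_key_material(encoded: str) -> bytes:
    """Decode a base64-encoded private key and check that it looks like PEM."""
    try:
        # Strip whitespace first, since base64 tools often wrap their output.
        # validate=True rejects characters outside the base64 alphabet.
        key = base64.b64decode("".join(encoded.split()), validate=True)
    except binascii.Error as exc:
        raise ValueError("key material is not valid base64") from exc
    if not key.lstrip().startswith(b"-----BEGIN"):
        raise ValueError("decoded key material does not look like a PEM key")
    return key


# Round trip with a placeholder value (not a real key).
fake_pem = (
    b"-----BEGIN OPENSSH PRIVATE KEY-----\n"
    b"placeholder\n"
    b"-----END OPENSSH PRIVATE KEY-----\n"
)
encoded = base64.b64encode(fake_pem).decode("ascii")
assert decode_key_material(encoded) == fake_pem
```

Passing the key as a single base64 string avoids quoting problems with multi-line PEM content on a command line.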

Details in #13.
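
The wait for the IP address can be pictured as a simple polling loop. The sketch below is only an illustration: `wait_for_ip`, its `get_ip` callback, and the timeout and interval values are all hypothetical and are not taken from the driver.

```python
import time
from typing import Callable, Optional


def wait_for_ip(get_ip: Callable[[], Optional[str]],
                timeout: float = 120.0,
                interval: float = 2.0,
                sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_ip() until it returns a non-empty address or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        ip = get_ip()
        if ip:
            return ip
        if time.monotonic() >= deadline:
            raise TimeoutError("instance did not report an IP address in time")
        sleep(interval)


# Simulated lookup: the address only appears on the third poll.
answers = iter([None, "", "192.168.0.211"])
print(wait_for_ip(lambda: next(answers), sleep=lambda _: None))
```

Returning only once an address is known means the caller never attempts SSH against an empty host.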

…pass in the private SSH key as a base64 encoded string
bahamat changed the title from "Made machine driver compatible with Rancher" to "docker-machine-driver-triton#13 Made machine driver compatible with Rancher" on Jun 10, 2022

bahamat left a comment

This looks good, but I'd like to see some test output in both rancher and non-rancher configurations, just to make sure we aren't breaking any existing users.

@blackwood821
Author

This looks good, but I'd like to see some test output in both rancher and non-rancher configurations, just to make sure we aren't breaking any existing users.

Sounds good, I should be able to get to that in the next few days.

@blackwood821
Author

blackwood821 commented Jun 17, 2022

@bahamat Here is the rancher log snippet from creating a new Triton node via the Rancher UI:

2022/06/17 20:10:26 [INFO] Creating jail for c-r8zg4
2022/06/17 20:10:26 [INFO] Provisioning node worker-4
2022/06/17 20:10:26 [INFO] [node-controller-rancher-machine] Creating CA: /management-state/node/nodes/worker-4/certs/ca.pem
2022/06/17 20:10:26 [INFO] [node-controller-rancher-machine] Creating client certificate: /management-state/node/nodes/worker-4/certs/cert.pem
2022/06/17 20:10:26 [INFO] [node-controller-rancher-machine] Running pre-create checks...
2022/06/17 20:10:27 [INFO] [node-controller-rancher-machine] (worker-4) resolved image "debian-10" to "9bcfe5cc-007d-4f23-bc8a-7e7b4d0c537e" (exact name match)
2022/06/17 20:10:27 [INFO] [node-controller-rancher-machine] Creating machine...
2022/06/17 20:10:27 [INFO] [node-controller-rancher-machine] (worker-4) creating SSH key...
2022/06/17 20:10:30 [INFO] [node-controller-rancher-machine] (worker-4) waiting for ip address to become available
2022/06/17 20:10:37 [INFO] [node-controller-rancher-machine] (worker-4) got the IP Address: "192.168.0.211"
2022/06/17 20:10:37 [INFO] [node-controller-rancher-machine] Waiting for machine to be running, this may take a few minutes...
2022/06/17 20:10:43 [INFO] [node-controller-rancher-machine] Detecting operating system of created instance...
2022/06/17 20:10:43 [INFO] [node-controller-rancher-machine] Waiting for SSH to be available...
2022/06/17 20:10:55 [INFO] [node-controller-rancher-machine] Detecting the provisioner...
2022/06/17 20:10:56 [INFO] [node-controller-rancher-machine] Provisioning with debian...
2022/06/17 20:11:48 [INFO] [node-controller-rancher-machine] Copying certs to the local machine directory...
2022/06/17 20:11:49 [INFO] [node-controller-rancher-machine] Copying certs to the remote machine...
2022/06/17 20:11:52 [INFO] [node-controller-rancher-machine] Setting Docker configuration on the remote daemon...
2022/06/17 20:11:56 [INFO] [node-controller-rancher-machine] Checking connection to Docker...
2022/06/17 20:11:56 [INFO] [node-controller-rancher-machine] Docker is up and running!
2022/06/17 20:12:12 [INFO] Provisioning node worker-4 done

and here is what I saw in the process tree:

rancher-machine create -d triton --engine-install-url https://releases.rancher.com/install-docker/20.10.sh --engine-opt data-root=/data/var/lib/docker --triton-account chad --triton-image debian-10 --triton-key-id <REDACTED> --triton-key-material <REDACTED> --triton-package bhyve-flexible-4G-50G-2CPU --triton-ssh-user root --triton-url https://cloudapi.example.com worker-4

The VM was successfully created by Rancher:

Chads-MacBook-Pro:~ chad$ triton ls brand=bhyve | head -1; triton ls brand=bhyve | grep worker
SHORTID   NAME       IMG                                  STATE    FLAGS  AGE
87cffb89  worker-1   debian-10@20200508                   running  B      5w
fddd96de  worker-2   debian-10@20200508                   running  B      1w
74f18a21  worker-3   debian-10@20200508                   running  B      17m
05777a67  worker-4   debian-10@20200508                   running  B      15m

@blackwood821
Author

blackwood821 commented Jun 17, 2022

@bahamat

Here is the output from using docker-machine locally outside of Rancher:

Chads-MacBook-Pro:docker-machine-driver-triton chad$ docker-machine create -d triton --engine-install-url https://releases.rancher.com/install-docker/20.10.sh --engine-opt data-root=/data/var/lib/docker --triton-account chad --triton-image debian-10 --triton-key-id <REDACTED> --triton-key-material <REDACTED> --triton-package bhyve-flexible-4G-50G-2CPU --triton-ssh-user root --triton-url https://cloudapi.example.com worker-5
Creating CA: /Users/chad/.docker/machine/certs/ca.pem
Creating client certificate: /Users/chad/.docker/machine/certs/cert.pem
Running pre-create checks...
(worker-5) resolved image "debian-10" to "9bcfe5cc-007d-4f23-bc8a-7e7b4d0c537e" (exact name match)
Creating machine...
(worker-5) creating SSH key...
(worker-5) waiting for ip address to become available
(worker-5) got the IP Address: "192.168.0.217"
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with debian...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Error creating machine: Error running provisioning: Unable to verify the Docker daemon is listening: Maximum number of retries (10) exceeded

But it appears to have succeeded despite the error in the output:

Chads-MacBook-Pro:~ chad$ triton ls brand=bhyve | head -1; triton ls brand=bhyve | grep worker
SHORTID   NAME       IMG                                  STATE    FLAGS  AGE
87cffb89  worker-1   debian-10@20200508                   running  B      5w
fddd96de  worker-2   debian-10@20200508                   running  B      1w
74f18a21  worker-3   debian-10@20200508                   running  B      17m
05777a67  worker-4   debian-10@20200508                   running  B      15m
4df7cae1  worker-5   debian-10@20200508                   running  B      5m
Chads-MacBook-Pro:~ chad$ triton ssh worker-5
Warning: Permanently added '192.168.0.217' (ED25519) to the list of known hosts.
Linux worker-5 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1+deb10u1 (2020-04-27) x86_64
   __        .                   .
 _|  |_      | .-. .  . .-. :--. |-
|_    _|     ;|   ||  |(.-' |  | |
  |__|   `--'  `-' `;-| `-' '  ' `-'
                   /  ;  Instance (Debian 10.3 20200508)
                   `-'   https://docs.joyent.com/images/linux/debian

Last login: Fri Jun 17 20:24:04 2022 from 192.168.0.40
root@worker-5:~# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─10-machine.conf
   Active: active (running) since Fri 2022-06-17 20:21:47 UTC; 4min 44s ago
     Docs: https://docs.docker.com
 Main PID: 16818 (dockerd)
    Tasks: 8
   Memory: 37.3M
   CGroup: /system.slice/docker.service
           └─16818 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

root@worker-5:~# docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

The only difference is that Rancher uses rancher-machine, its fork of docker-machine (https://github.com/rancher/machine), whereas the test I ran outside of Rancher used https://github.com/docker/machine. I wonder if Rancher's fork waits longer. I'll have to check the source.

@blackwood821
Author

I checked the docker-machine source code. It runs `if ! type netstat 1>/dev/null; then ss -tln; else netstat -tln; fi` over SSH:

root@worker-5:~# if ! type netstat 1>/dev/null; then ss -tln; else netstat -tln; fi
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN     
tcp6       0      0 :::22                   :::*                    LISTEN

It then runs a regex match against that output for the Docker port. The match fails because, by default, the Docker daemon listens only on a unix socket and not on TCP:

root@worker-5:~# ps -aux | grep dockerd
root     16818  0.0  2.1 1311868 84588 ?       Ssl  Jun17   0:17 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
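
The check described above can be sketched in a few lines of Python. This is only an approximation: the pattern below is an assumption for illustration and is not copied from docker-machine, and port 2376 is taken from the `10-machine.conf` drop-in shown later in this thread. The input is the netstat output pasted above.

```python
import re

# netstat -tln output copied from worker-5 above.
NETSTAT_OUTPUT = """\
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
"""


def port_is_listening(output: str, port: int) -> bool:
    """Return True if a LISTEN line contains an address ending in :<port>."""
    # Assumed pattern for illustration; the real docker-machine regex may differ.
    pattern = re.compile(rf":{port}\s")
    return any(
        "LISTEN" in line and pattern.search(line) is not None
        for line in output.splitlines()
    )


print(port_is_listening(NETSTAT_OUTPUT, 2376))  # False: nothing on the Docker port
print(port_is_listening(NETSTAT_OUTPUT, 22))    # True: sshd
```

With only port 22 listening, any check of this kind reports the daemon as unreachable even though `dockerd` is running on its unix socket.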

So that explains why it thinks Docker isn't running even though it is. I'm just not sure how rancher-machine gets around this: its source code appears to do the same thing, but it must handle it differently/correctly.

@blackwood821
Author

blackwood821 commented Jun 20, 2022

Looks like this is an issue that was never closed in the docker machine repo:
docker/machine#4567

root@worker-5:/etc/systemd/system/docker.service.d# cat 10-machine.conf 
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --storage-driver overlay2 --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem --label provider=triton --data-root=/data/var/lib/docker 
Environment=

The drop-in's ExecStart has `-H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock`, but the running dockerd process shown earlier was started with `-H fd://`, which is why it's not using TCP.
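
To make the mismatch concrete, the following Python sketch extracts the `-H` listeners from the two command lines quoted in this thread: the drop-in's ExecStart and the running process from the earlier `ps` output. The helper name `listen_hosts` is hypothetical, and the sketch only parses strings; it does not query systemd or Docker.

```python
import shlex
from typing import List


def listen_hosts(command_line: str) -> List[str]:
    """Return the value of every -H / --host flag in a dockerd command line."""
    args = shlex.split(command_line)
    hosts = []
    for i, arg in enumerate(args):
        if arg in ("-H", "--host") and i + 1 < len(args):
            hosts.append(args[i + 1])
        elif arg.startswith("--host="):
            hosts.append(arg.split("=", 1)[1])
    return hosts


# ExecStart from 10-machine.conf, as pasted above.
configured = listen_hosts(
    "/usr/bin/dockerd -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock "
    "--storage-driver overlay2 --tlsverify --tlscacert /etc/docker/ca.pem "
    "--tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem "
    "--label provider=triton --data-root=/data/var/lib/docker"
)
# Command line of the running process, from the ps output.
running = listen_hosts(
    "/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock"
)

print(configured)  # ['tcp://0.0.0.0:2376', 'unix:///var/run/docker.sock']
print(running)     # ['fd://']
print(any(h.startswith("tcp://") for h in running))  # False
```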

Anyway, @bahamat my changes in this pull request don't seem to have any negative impact as far as I can tell.

@blackwood821
Author

I ran the same test with the driver from the master branch and the same issue occurs, which confirms that my changes didn't introduce it:

Chads-MacBook-Pro:docker-machine-driver-triton chad$ pwd
/Users/chad/go/src/github.com/TritonDataCenter/docker-machine-driver-triton
Chads-MacBook-Pro:docker-machine-driver-triton chad$ docker-machine create -d triton --engine-install-url https://releases.rancher.com/install-docker/20.10.sh --engine-opt data-root=/data/var/lib/docker --triton-account chad --triton-image debian-10 --triton-key-id <REDACTED> --triton-package bhyve-flexible-4G-50G-2CPU --triton-ssh-user root --triton-url https://cloudapi.example.com worker-6
Running pre-create checks...
(worker-6) resolved image "debian-10" to "9bcfe5cc-007d-4f23-bc8a-7e7b4d0c537e" (exact name match)
Creating machine...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with debian...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Error creating machine: Error running provisioning: Unable to verify the Docker daemon is listening: Maximum number of retries (10) exceeded
root@worker-6:~# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─10-machine.conf
   Active: active (running) since Mon 2022-06-20 16:35:17 UTC; 5min ago
     Docs: https://docs.docker.com
 Main PID: 16816 (dockerd)
    Tasks: 8
   Memory: 32.5M
   CGroup: /system.slice/docker.service
           └─16816 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Jun 20 16:35:16 worker-6 dockerd[16816]: time="2022-06-20T16:35:16.759804240Z" level=warning msg="Your kernel 
Jun 20 16:35:16 worker-6 dockerd[16816]: time="2022-06-20T16:35:16.759865874Z" level=warning msg="Your kernel 
Jun 20 16:35:16 worker-6 dockerd[16816]: time="2022-06-20T16:35:16.760683451Z" level=info msg="Loading contain
Jun 20 16:35:17 worker-6 dockerd[16816]: time="2022-06-20T16:35:17.110462048Z" level=info msg="Default bridge 
Jun 20 16:35:17 worker-6 dockerd[16816]: time="2022-06-20T16:35:17.180801033Z" level=info msg="Loading contain
Jun 20 16:35:17 worker-6 dockerd[16816]: time="2022-06-20T16:35:17.210967401Z" level=info msg="Docker daemon" 
Jun 20 16:35:17 worker-6 dockerd[16816]: time="2022-06-20T16:35:17.211221176Z" level=info msg="Daemon has comp
Jun 20 16:35:17 worker-6 systemd[1]: Started Docker Application Container Engine.
Jun 20 16:35:17 worker-6 dockerd[16816]: time="2022-06-20T16:35:17.224181830Z" level=info msg="API listen on /
Jun 20 16:35:17 worker-6 systemd[1]: docker.service: Current command vanished from the unit file, execution of
lines 1-22/22 (END)

This also confirms that my changes don't break the existing functionality when used without the new --triton-key-material argument.

blackwood821 and others added 6 commits June 20, 2022 11:45
…ion not to validate the CloudAPI SSL certificate in development setups.
Specifying a single tag is functionally the same as node-triton's `-t` argument. Multiple tags can be specified in a comma-delimited format, such as `<tag1>=<value1>,<tag2>=<value2>`.
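
A minimal Python sketch of parsing that comma-delimited tag format follows. The function name `parse_tags` and the sample tag names are hypothetical, and how the driver itself parses the value is not shown here. Values that contain commas are not handled.

```python
from typing import Dict


def parse_tags(spec: str) -> Dict[str, str]:
    """Parse '<tag1>=<value1>,<tag2>=<value2>' into a dictionary."""
    tags: Dict[str, str] = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected <tag>=<value>, got {pair!r}")
        tags[key] = value
    return tags


print(parse_tags("role=worker"))          # {'role': 'worker'}
print(parse_tags("role=worker,env=dev"))  # {'role': 'worker', 'env': 'dev'}
```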
Successfully merging this pull request may close these issues.

Make machine driver compatible with Rancher Machine driver for Rancher
3 participants