Make redis_exporter cluster-aware #185

Closed
KushalP opened this issue Aug 13, 2018 · 10 comments

@KushalP

KushalP commented Aug 13, 2018

Problem statement
Many cloud providers offer a redis cluster as a managed service. These managed clusters don't offer a simple way to access all of the underlying redis instances. Instead, they expose a single endpoint that cluster-aware clients need to connect to.

Desired outcome
Make redis_exporter cluster-aware such that it can connect to a single instance in the cluster and track metrics for all nodes in the cluster.

@oliver006
Owner

Thanks for raising this issue.
I don't run Redis in a clustered setup or via, e.g., Amazon ElastiCache, so I'm not familiar with which metrics you're missing out on that could be extracted.
What does redis-cli INFO ALL look like on such a clustered setup? Does it really expose all the stats from all the nodes?

@KushalP
Author

KushalP commented Aug 15, 2018

Working with redis in clustered mode

When redis is run in "clustered" mode, you can connect to a single node and run CLUSTER NODES to see all nodes in the cluster. Below is an example of a 6-node cluster with 3 shards (masters) and one replica (slave) per shard:

> CLUSTER NODES
5ad08ee539856eab6d1e42a31f6df4a0df6e112b 10.225.51.158:6379@1122 master - 0 1534326728200 3 connected 0-5461
28abc4fac6118e44341ced5adce40291a267dc70 10.225.51.113:6379@1122 slave 6b10b51298ae576bfa0d05d4f287c2d713758fac 0 1534326727193 2 connected
6b10b51298ae576bfa0d05d4f287c2d713758fac 10.225.51.48:6379@1122 master - 0 1534326730211 2 connected 10923-16383
8005f38d45750fc851ad48c4e9f4e674674b74be 10.225.51.75:6379@1122 master - 0 1534326729205 0 connected 5462-10922
6d0d97db233dd5b2546fc490461f3838e59a5b72 10.225.51.57:6379@1122 slave 8005f38d45750fc851ad48c4e9f4e674674b74be 0 1534326727000 0 connected
2e28959d10cdce16c7e8343e9102a9753dbb9068 10.225.51.103:6379@1122 myself,slave 5ad08ee539856eab6d1e42a31f6df4a0df6e112b 0 1534326728000 1 connected

You could then use this information to connect to each node and fetch any metrics you needed.
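
As a rough sketch of that discovery step, the Python below parses the CLUSTER NODES reply into one record per node. It assumes the Redis 4 line format shown above (`id ip:port@cport flags master ping-sent pong-recv config-epoch link-state [slots...]`); `ClusterNode` and `parse_cluster_nodes` are illustrative names, not part of redis_exporter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterNode:
    node_id: str
    host: str
    port: int
    flags: List[str]
    master_id: str  # "-" for masters, otherwise the id of the master it replicates
    slots: List[str] = field(default_factory=list)

    @property
    def is_master(self) -> bool:
        return "master" in self.flags


def parse_cluster_nodes(output: str) -> List[ClusterNode]:
    """Parse the text reply of CLUSTER NODES into one ClusterNode per line."""
    nodes = []
    for line in output.strip().splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip blank or malformed lines
        node_id, addr, flags, master_id = parts[0], parts[1], parts[2], parts[3]
        # Address looks like "ip:port@cluster-bus-port"; drop the bus port.
        ip_port = addr.split("@", 1)[0]
        host, port = ip_port.rsplit(":", 1)
        nodes.append(ClusterNode(
            node_id=node_id,
            host=host,
            port=int(port),
            flags=flags.split(","),  # e.g. "myself,slave" -> ["myself", "slave"]
            master_id=master_id,
            slots=parts[8:],  # slot ranges are only present on masters
        ))
    return nodes
```

Each `host:port` pair could then be scraped on its own; filtering on `is_master` gives one entry per shard.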

Output of INFO ALL

The output of INFO ALL on a single node looks like the following:

> INFO ALL
# Server
redis_version:4.0.10
redis_git_sha1:0
redis_git_dirty:0
redis_build_id:0
redis_mode:cluster
os:Amazon ElastiCache
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:0.0.0
process_id:1
run_id:8cbcf867668f2cab57decbc48dd874cd6464ab7b
tcp_port:6379
uptime_in_seconds:1092
uptime_in_days:0
hz:10
lru_clock:7599966
executable:-
config_file:-

# Clients
connected_clients:5
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:5721024
used_memory_human:5.46M
used_memory_rss:7548928
used_memory_rss_human:7.20M
used_memory_peak:5842976
used_memory_peak_human:5.57M
used_memory_peak_perc:97.91%
used_memory_overhead:5618390
used_memory_startup:4452192
used_memory_dataset:102634
used_memory_dataset_perc:8.09%
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:436469760
maxmemory_human:416.25M
maxmemory_policy:volatile-lru
mem_fragmentation_ratio:1.32
mem_allocator:jemalloc-4.0.3
active_defrag_running:0
lazyfree_pending_objects:0

# Persistence
loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1534325530
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0

# Stats
total_connections_received:8
total_commands_processed:2853
instantaneous_ops_per_sec:1
total_net_input_bytes:117735
total_net_output_bytes:4385101
instantaneous_input_kbps:0.04
instantaneous_output_kbps:0.05
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0

# Replication
role:slave
master_host:10.225.51.158
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:54135
repl_sync_enabled:1
slave_read_reploff:54135
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:1a7790e576771d8e386ae5df05d55d0cc1eb14cb
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:54135
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1739
repl_backlog_histlen:52397

# CPU
used_cpu_sys:0.33
used_cpu_user:0.75
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

# Commandstats
cmdstat_replconf:calls=963,usec=1419,usec_per_call=1.47
cmdstat_clusteradmin:calls=802,usec=135273,usec_per_call=168.67
cmdstat_info:calls=671,usec=36128,usec_per_call=53.84
cmdstat_config:calls=5,usec=68,usec_per_call=13.60
cmdstat_ping:calls=411,usec=247,usec_per_call=0.60
cmdstat_command:calls=1,usec=316,usec_per_call=316.00

# SSL
ssl_enabled:no
ssl_connections_to_previous_certificate:0
ssl_connections_to_current_certificate:0
ssl_current_certificate_not_before_date:(null)
ssl_current_certificate_not_after_date:(null)
ssl_current_certificate_serial:0

# Cluster
cluster_enabled:1

# Keyspace

The tell that you're in a redis cluster is that the above includes the following:

# Cluster
cluster_enabled:1
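
A minimal sketch of that check, assuming the INFO reply has already been read as text: it flattens the `field:value` lines into a dict and tests `cluster_enabled`. The function names are hypothetical. The sample above also reports `redis_mode:cluster` in its `# Server` section, which could serve as a second signal.

```python
from typing import Dict


def parse_info(output: str) -> Dict[str, str]:
    """Turn an INFO reply into a flat {field: value} dict.

    Section headers ("# Server") and blank lines are skipped; values stay strings.
    """
    info = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values like
        # "calls=963,usec=1419,usec_per_call=1.47" stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value
    return info


def is_cluster_mode(info: Dict[str, str]) -> bool:
    """True when the node reports cluster_enabled:1 in its # Cluster section."""
    return info.get("cluster_enabled") == "1"
```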

@oliver006
Owner

Thanks, that's helpful.

Just to clarify something (again, not really familiar with Amazon ElastiCache): in CLUSTER NODES I see IP addresses like 10.225.51.103:6379 - are these valid IP addresses that I can connect to as long as my exporter runs on a VM within the same VPC?

And a more general thought: the Prometheus way of doing things is to keep service discovery outside of the exporter (see #174 for related discussion), so I'm not really sure whether we should start pulling this into the exporter or whether this should be the job of your orchestration/service discovery system.

@KushalP
Author

KushalP commented Aug 16, 2018

are these valid IP addresses that I can connect to as long as my exporter runs on a VM within the same VPC?

Yes.

the prometheus' way of doing things is to keep the service discovery aspect outside of the exporter

I would prefer to do that for finding the targets, but I haven't been able to figure out a sane way to query the individual nodes. I'm going to look into it now.

@KushalP
Author

KushalP commented Aug 16, 2018

It seems it's possible to get the endpoints of all ElastiCache nodes with the following command:

aws elasticache describe-cache-clusters --show-cache-node-info

I'm not entirely sure how to get Prometheus to run the same query, since it relies on the EC2 service discovery tooling.
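
One way to turn that command's output into scrape targets is sketched below. It assumes the JSON shape the AWS CLI returns with `--show-cache-node-info` (each entry in `CacheClusters` carries a `CacheNodes` list whose items have an `Endpoint` with `Address` and `Port`); `elasticache_endpoints` is an illustrative name.

```python
import json
from typing import List


def elasticache_endpoints(describe_output: str) -> List[str]:
    """Collect "address:port" for every node in a describe-cache-clusters reply.

    Assumes the JSON shape returned with --show-cache-node-info:
    {"CacheClusters": [{"CacheNodes": [{"Endpoint": {"Address": ..., "Port": ...}}]}]}
    """
    data = json.loads(describe_output)
    targets = []
    for cluster in data.get("CacheClusters", []):
        for node in cluster.get("CacheNodes", []):
            endpoint = node.get("Endpoint")
            if endpoint:  # skip nodes that report no endpoint
                targets.append(f"{endpoint['Address']}:{endpoint['Port']}")
    return targets
```

For example, save the output of `aws elasticache describe-cache-clusters --show-cache-node-info --output json` and pass its contents to `elasticache_endpoints`.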

@oliver006
Owner

I think the way to do this would be to use file_sd_config with a small script that periodically pulls the latest node list via the aws elasticache command, updates the file, and triggers a config reload of the Prometheus server.
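
A sketch of the file-writing half of such a script, assuming Prometheus has a `file_sd_configs` entry whose `files` glob matches the output path (for example `*.json`). It writes a single target group in the file_sd JSON format (a list of objects with `targets` and `labels`). The helper name and the `cluster` label are hypothetical. Prometheus watches file_sd files and applies changes as they are written, so a separate config reload may not be needed.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional


def write_file_sd(path: str, targets: List[str],
                  labels: Optional[Dict[str, str]] = None) -> None:
    """Write one file_sd target group to `path`, replacing the file atomically.

    Writing to a temp file in the same directory and renaming it over the
    target means Prometheus should never see a half-written file.
    """
    group = {"targets": sorted(targets), "labels": labels or {}}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump([group], f, indent=2)
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on any failure
        raise
```

For example, `write_file_sd("redis_targets.json", ["10.225.51.158:6379", "10.225.51.48:6379"], {"cluster": "my-cluster"})`. The temporary file uses a `.tmp` suffix so it won't match a `*.json` glob while it is being written.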

@KushalP
Author

KushalP commented Aug 20, 2018

Closing this issue for now as the file_sd_config might be enough to go on.

@KushalP KushalP closed this as completed Aug 20, 2018
@SuperQ
Contributor

SuperQ commented Aug 22, 2018

One possible change is to accept a target param on the exporter's /metrics endpoint. This would create a "proxy" exporter, similar to the snmp_exporter and blackbox_exporter, allowing Prometheus to continue to drive service discovery while the exporter supports multiple targets.
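
A minimal sketch of that multi-target pattern using only Python's standard library: the handler reads `target` from the query string of `/metrics` and scrapes only that node. `scrape_redis` is a hypothetical placeholder for the real INFO-to-metrics logic, and the port in the usage line below is arbitrary. On the Prometheus side, the blackbox_exporter-style setup relabels each discovered `__address__` into `__param_target` and points `__address__` at the exporter.

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional
from urllib.parse import parse_qs, urlparse


def target_from_path(path: str) -> Optional[str]:
    """Return the ?target= value from a path like /metrics?target=host:6379."""
    values = parse_qs(urlparse(path).query).get("target")
    return values[0] if values else None


def scrape_redis(target: str) -> str:
    """Hypothetical placeholder: connect to `target`, run INFO, render metrics."""
    return f'redis_up{{addr="{target}"}} 1\n'


class ProxyExporterHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if urlparse(self.path).path != "/metrics":
            self.send_error(404)
            return
        target = target_from_path(self.path)
        if target is None:
            self.send_error(400, "missing target parameter")
            return
        # All state for this scrape lives inside the request, so scrapes of
        # different Redis nodes share nothing.
        body = scrape_redis(target).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)
```

Start it with `HTTPServer(("", 9121), ProxyExporterHandler).serve_forever()` (importing `HTTPServer` from `http.server`) and request `/metrics?target=10.225.51.48:6379`. Keeping per-scrape state inside the request, rather than in globals, is what lets one process serve many targets.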

Another option is to pressure Amazon into including Prometheus metrics in their cloud service. 😁

@oliver006
Owner

The target param is worth looking into but would need a bit of refactoring to reduce/remove global state (see #126).
If you could get Amazon to straight up export Prometheus metrics, that'd be my preferred solution ;-)

@Yagyansh

Hi @oliver006. What was the conclusion on this? Have we found a way to get metrics from all the nodes of a Redis Cluster without using a script?

@KushalP How is the script approach working out for you, and do you see any overhead?

Thanks. I'm looking for a solution to this, as I have the exact same use case.


4 participants