At Rackslab, we are always looking for ways to make your infrastructure management more efficient and powerful. In RacksDB v0.5.0, we’ve introduced a seamless integration between RacksDB and ClusterShell, offering you a new level of flexibility and control when managing your HPC & AI clusters.
What is RacksDB?
For those new to RacksDB, it’s open-source software that helps you model and visualize hardware resources in your datacenters with simple schema-validated YAML files. It’s designed to store information about your physical infrastructure, including racks, servers, networking equipment, and much more. With RacksDB, you can build a reference description of your IT infrastructure that any component in your software stack can easily access and query.
What is ClusterShell?
ClusterShell is a powerful tool for managing and automating distributed systems. It provides an easy-to-use interface to execute commands on multiple nodes, organize nodes into groups, and automate repetitive tasks across clusters. If you work with large-scale infrastructure, ClusterShell can significantly reduce the complexity of managing and executing commands across many nodes at once.
RacksDB as External Groups
In RacksDB, servers and all other rack equipment can be associated with an arbitrary set of tags. ClusterShell has a notion of node groups, i.e. symbolic names that each reference a set of nodes. It is easy to leverage the tags in RacksDB to form groups in ClusterShell (e.g. the group compute represents all nodes associated with this tag in RacksDB).
As an example, consider a cluster infrastructure named atlas in RacksDB. You can create a ClusterShell external group definition file /etc/clustershell/groups.conf.d/racksdb.conf with this content:
[racksdb]
map: racksdb nodes --infrastructure atlas --tags $GROUP --list
all: racksdb nodes --infrastructure atlas --list
list: racksdb tags --infrastructure atlas --on-nodes
reverse: racksdb tags --node $NODE
With this simple configuration, ClusterShell can extract groups and nodes from RacksDB. For example, with the nodeset command:
- Get the list of groups:
$ nodeset -s racksdb -l
@racksdb:admin
@racksdb:compute
@racksdb:login
- Get all nodes in the compute group (i.e. associated with the compute tag in the database):
$ nodeset -f @racksdb:compute
atcn[1-2]
- Get all nodes in this cluster:
$ nodeset -f -s racksdb -a
atadmin,atcn[1-2],atlogin
ClusterShell can execute commands on nodes from these groups with clush:
# clush -bw @racksdb:compute uname
---------------
atcn[1-2] (2)
---------------
Linux
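Since the groups come from RacksDB, scripts can iterate over them instead of hard-coding node names. The following is a minimal sketch, assuming the racksdb group source configured above. The helper name group_sizes is hypothetical; it relies only on the nodeset -l option shown earlier and on nodeset -c, which counts the nodes in a node set.

```shell
#!/bin/sh
# Hypothetical helper: print "<group> <node count>" for every group
# exposed by a ClusterShell group source ($1).
group_sizes() {
    # nodeset -s <source> -l prints one @source:group entry per line.
    nodeset -s "$1" -l | while read -r group; do
        # </dev/null keeps the inner command away from the loop's input.
        printf '%s %s\n' "$group" "$(nodeset -c "$group" </dev/null)"
    done
}

# Only query the groups when ClusterShell is actually installed.
if command -v nodeset >/dev/null 2>&1; then
    group_sizes racksdb
else
    echo "nodeset not found; install ClusterShell first" >&2
fi
```

When a tag is added to or removed from a node in RacksDB, a script written this way needs no change.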
More than one Infrastructure?
If you need to extract groups from multiple infrastructures in RacksDB, a simple approach is to use the ClusterShell multiple sources sections feature and the $SOURCE variable. For example, in /etc/clustershell/groups.conf.d/racksdb.conf:
[trinity,atlas]
map: racksdb nodes --infrastructure $SOURCE --tags $GROUP --list
all: racksdb nodes --infrastructure $SOURCE --list
list: racksdb tags --infrastructure $SOURCE --on-nodes
reverse: racksdb tags --node $NODE
This way, RacksDB commands are executed for multiple infrastructures:
$ nodeset -f @trinity:compute
tricn[1-4]
$ nodeset -f @atlas:compute
atcn[1-2]
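To see which RacksDB command each source ends up running, the substitution of $SOURCE and $GROUP in the map upcall can be imitated by hand. This is a simplified illustration only, not how ClusterShell implements it; the helper name expand_map is hypothetical.

```shell
#!/bin/sh
# Upcall template copied from the [trinity,atlas] section above. Single
# quotes keep $SOURCE and $GROUP literal until they are substituted below.
MAP_TEMPLATE='racksdb nodes --infrastructure $SOURCE --tags $GROUP --list'

# Hypothetical helper, not part of ClusterShell: print the command obtained
# for a given source ($1) and group ($2). Only meant for simple names
# without "/" or "&", which sed would interpret.
expand_map() {
    printf '%s\n' "$MAP_TEMPLATE" |
        sed -e 's/[$]SOURCE/'"$1"'/' -e 's/[$]GROUP/'"$2"'/'
}

# One command per source declared in the section header.
for src in trinity atlas; do
    expand_map "$src" compute
done
```

The loop prints one racksdb nodes command per infrastructure, with --infrastructure trinity and --infrastructure atlas respectively, which is why a single section can serve both group sources.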
Yet more Generic!
Another, more generic approach is to split the group name into an infrastructure and a tag with shell parameter expansion in the ClusterShell external group definition file:
[racksdb]
map: GRP=$GROUP; racksdb nodes --infrastructure ${GRP%:*} --tags ${GRP#*:} --list
all: racksdb nodes --list
reverse: racksdb tags --node $NODE
This way, infrastructure names are not mentioned in the ClusterShell configuration file and all infrastructures defined in RacksDB can be queried:
$ nodeset -f @racksdb:trinity:compute
tricn[1-4]
$ nodeset -f @racksdb:atlas:compute
atcn[1-2]
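The map upcall above relies on two POSIX shell parameter expansions. The short sketch below shows what they produce, assuming ClusterShell passes trinity:compute as $GROUP when resolving @racksdb:trinity:compute, as the configuration implies.

```shell
#!/bin/sh
# Assumed value of $GROUP for the group @racksdb:trinity:compute.
GRP="trinity:compute"

infra=${GRP%:*}   # remove the shortest suffix matching ":*" -> trinity
tag=${GRP#*:}     # remove the shortest prefix matching "*:" -> compute

echo "infrastructure=$infra tag=$tag"
# prints: infrastructure=trinity tag=compute
```

If the group name contains no colon, neither pattern matches and both expansions return the whole string, so groups should always be written as infrastructure:tag with this configuration.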
Conclusion
With this approach, node groups in ClusterShell are always kept synchronized with RacksDB. Modify your infrastructures in RacksDB, and all scripts and tools that rely on these groups pick up the change automatically. Never miss a node again!
For more detailed explanations, you can refer to RacksDB documentation and ClusterShell documentation.
Need support? Rackslab offers commercial support and professional services for RacksDB (e.g. training, feature development, etc.). Please contact us for more details. You can also ask for help from the community.