LabKey was modified to initiate pipeline jobs on the local machine using the trigger script machinery. The process was split between two pieces of software: `analysisModule` and `analysisInterface`.

Processing PCs are kept distinct from the database PCs. LabKey requires such processing PCs to run an equivalent LabKey installation, which makes the infrastructure overwhelming. Some thoughts:
`analysisModule` formats the call, but sends it to a socket rather than executing it directly. Since combining sockets across multiple programming languages may be cumbersome, it is probably best to still start a shell command, but with a flag that tells `analysisInterface` to start the job remotely.
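Illustratively, the flagged hand-off could look like the sketch below; the `--remote` flag name and the `analysisInterface.py` entry point are assumptions for illustration, not the actual interface:

```python
# Hypothetical sketch: analysisModule still issues a shell command, but a
# flag (here "--remote", an assumed name) tells analysisInterface to hand
# the job to the processing PC instead of executing it locally.
import subprocess

def launch(job_id: str, remote: bool = False) -> None:
    cmd = ["python3", "analysisInterface.py", "--job", job_id]
    if remote:
        cmd.append("--remote")  # assumed flag: submit via the socket
    subprocess.Popen(cmd)  # returns immediately, much like a shell thread
```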
`analysisInterface` then starts the job; this is identical to the current system call performed by `analysisModule`, except in python, which might enable some shortcuts, i.e. starting from python directly. The nice thing about shells is that they run as new threads. However, the previous item already has `analysisInterface` running a socket client, so it might be best for `analysisInterface` to use sockets directly.
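A minimal sketch of that socket submission, assuming a recent `websockets` package and an ad-hoc `SUBMIT:<job id>` message format (the format is an assumption; port 8765 is taken from the firewall rules below):

```python
# Hypothetical sketch: submit a job to the processing PC over a websocket
# instead of spawning a shell. The "SUBMIT:<job id>" message format is an
# assumption for illustration, not the actual protocol.
import asyncio
import websockets

async def submit_job(server: str, job_id: str, port: int = 8765) -> str:
    async with websockets.connect(f"ws://{server}:{port}") as ws:
        await ws.send(f"SUBMIT:{job_id}")
        return await ws.recv()  # wait for the processing side to acknowledge

if __name__ == "__main__":
    print(asyncio.run(submit_job("AA.BB.CC.DD", "job-42")))
```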
`analysisInterface` runs on the processing PC and manages the (local) logs and execution. The status of the initiating job is updated via the embedded labkey/python interface, so no additional sockets are needed. The log file can be transmitted at the end of the job, although running updates might be of interest; those may be handled by `analysisInterface` using smart uploading strategies that append rather than transmit full files.
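Such an appending strategy could be as simple as remembering the last uploaded offset; the `upload_chunk` callback below stands in for whatever server-side append call ends up being used:

```python
# Hypothetical sketch: send only the bytes appended to a log file since the
# last update, rather than re-transmitting the whole file every time.
import os

def send_log_update(log_path: str, last_offset: int, upload_chunk) -> int:
    """Upload bytes past last_offset and return the new offset."""
    size = os.path.getsize(log_path)
    if size <= last_offset:
        return last_offset  # nothing new to send
    with open(log_path, "rb") as f:
        f.seek(last_offset)
        upload_chunk(f.read(size - last_offset))
    return size
```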
`analysisInterface` should make the socket itself as transparent as possible. This begs the question of how processes initiated by different users could be aware of each other. But wait: the user running the socket will be the single user that executes the code, hence a plain json database is fine. Speaking of databases, it might as well use the originating database, which will have to be modified to act as a queue anyhow, eliminating the need for local json files or other overhead.
`analysisInterface` gets a submit-job request via the socket. It checks back with the server whether it has any jobs running. Here we could apply filters that would allow multiple non-interfering jobs to run simultaneously, but prevent interfering jobs from being started. The python instance waits in a low-budget loop and checks whether its turn has come. To preserve order, all previously issued jobs must reach a conclusive state (DONE/FAILED) and no QUEUED job should be ahead in the queue. Then the loop completes and the shell command is issued; the loop then switches to waiting for completion, during which a potential log update could be sent. Once the job completes, the status must be updated; this step is critical, since further jobs might be awaiting that flag.
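A sketch of that wait-and-run loop; `get_queue()` and `set_status()` are hypothetical stand-ins for the labkey/python calls against the originating database:

```python
# Hypothetical sketch of the low-budget polling loop described above.
# get_queue() returns queue entries oldest first; set_status() updates the
# job status on the server. Both are assumed helpers, not real API.
import subprocess
import time

def wait_and_run(job_id, command, get_queue, set_status):
    while True:
        queue = get_queue()  # the submitted job is assumed to be in here
        ids = [entry["id"] for entry in queue]
        earlier = queue[: ids.index(job_id)]  # jobs issued before this one
        # Our turn: every earlier job is conclusive, so nothing ahead of us
        # is still QUEUED or RUNNING.
        if all(entry["status"] in ("DONE", "FAILED") for entry in earlier):
            break
        time.sleep(10)  # low-budget polling interval
    set_status(job_id, "RUNNING")
    result = subprocess.run(command)  # running log updates could happen here
    # Critical update: subsequent jobs wait on this flag.
    set_status(job_id, "DONE" if result.returncode == 0 else "FAILED")
```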
On the processing PC, check which sockets are listening and restrict the websocket port (8765) to the LabKey server:

```bash
# List listening sockets to verify the service is up.
ss -tulpn | grep LISTEN
# Allow the LabKey server (X.X.X.X) to reach the websocket port...
iptables -I INPUT -p tcp -s X.X.X.X/32 --dport 8765 -j ACCEPT
# ...and drop that port for everyone else.
iptables -A INPUT -p tcp -s 0.0.0.0/0 --dport 8765 -j DROP
# Remove the rule that drops packets in conntrack INVALID state.
sudo iptables -D INPUT -m conntrack --ctstate INVALID -j DROP
```
`analysisInterface` should hold a mapping of server-configuration maps. Does websockets report the caller id? It does, and it can be used: `websocket.remote_address[0]`.
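A sketch of how the handler could key configurations off the caller's address, assuming a recent `websockets` version (single-argument handler); the configuration contents are illustrative:

```python
# Hypothetical sketch: select a per-server configuration by caller IP.
import asyncio
import websockets

# Illustrative mapping from LabKey server IP to its configuration.
SERVER_CONFIG = {
    "10.0.0.5": {"venv": "/opt/venv-a", "workdir": "/data/a"},
}

async def handler(websocket):
    caller_ip = websocket.remote_address[0]
    config = SERVER_CONFIG.get(caller_ip)
    async for message in websocket:
        if config is None:
            await websocket.send("ERROR:unknown caller")
            continue
        # Dispatch the message using the caller-specific configuration here.
        await websocket.send(f"ACK:{message}")

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever

if __name__ == "__main__":
    asyncio.run(main())
```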
Processor side:

- Edit `serviceScripts/env.sh` to change `IPSERVER` and `IPCLIENT`.
- Check `labkey/setup.json` for proper paths and venv; particularly, whether `softwareSrc` is set in paths.
- Run:

```bash
$HOME/software/src/websocket/serviceScripts/start.sh
sudo $HOME/software/src/websocket/serviceScripts/open_port.sh
```
Client (labkey) side:

- Install the websocket client: `pip3 install websockets`.
- Run `$HOME/software/src/websocket/send.py AA.BB.CC.DD:TEST:X`, where `AA.BB.CC.DD` is the IP address of the server, or its name if set by DNS. This should be done as the `tomcat8` user.
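For reference, a test client along these lines could look like the sketch below; this is a guess at `send.py`'s behavior based solely on its `host:command:value` argument format, not its actual source:

```python
#!/usr/bin/env python3
# Hypothetical sketch of a send.py-style test client. The argument
# "AA.BB.CC.DD:TEST:X" is split into a host and a message; port 8765 is
# assumed from the firewall rules above.
import asyncio
import sys
import websockets

async def main():
    host, *rest = sys.argv[1].split(":")
    message = ":".join(rest)  # e.g. "TEST:X"
    async with websockets.connect(f"ws://{host}:8765") as ws:
        await ws.send(message)
        print(await ws.recv())

if __name__ == "__main__":
    asyncio.run(main())
```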
Check iptables!