问题

由于同步节点由dfuseeos本身管理和运行,因此从测试的角度来看,dfuseeos的稳定性会对同步节点产生影响。如何避免这种关联导致的异常退出?

./dfuseeos start
Starting dfuse for EOSIO with config file './dfuse.yaml' 
Launching applications: abicodec,apiproxy,blockmeta,booter,dashboard,dgraphql,eosq,eosws,merger,mindreader,relayer,search-archive,search-forkresolver,search-indexer,search-live,search-router,statedb,tokenmeta,trxdb-loader 
Your instance should be ready in a few seconds, here some relevant links:

  Dashboard:        http://localhost:8081

  Explorer & APIs:  http://localhost:8080
  GraphiQL:         http://localhost:8080/graphiql

instance stopped, attempting restore from source (operator/operator.go:154) {"source": "snapshot", "command": "nodeos --config-dir=./mindreader --data-dir=/home/surou/Documents/Test_Dfuse/eosio/eos/programs/dfuseeos/dfuse-data/mindreader/data --pause-on-startup"}
<4>warn  2021-01-21T02:43:51.432 nodeos    chain_plugin.cpp:1199         plugin_initialize    ] 13 St13runtime_error: "state" database dirty flag set (log_plugin/to_zap_log_plugin.go:107) 
command terminated with non-zero status (superviser/superviser.go:179) {"status": {"Cmd":"nodeos","PID":4049750,"Exit":2,"Error":{"Stderr":null},"StartTs":1611197031417829539,"StopTs":1611197031434658318,"Runtime":0.016828781,"Stdout":null,"Stderr":null}}
<3>error 2021-01-21T02:43:51.433 nodeos    main.cpp:153                  main                 ] database dirty flag set (likely due to unclean shutdown): replay required (log_plugin/to_zap_log_plugin.go:107) 
cannot find latest snapshot, will replay from blocks.log (superviser/snapshot.go:153) 
restarting node from snapshot, the restart will perform the actual snapshot restoration (operator/operator.go:393) 
Received termination signal, quitting 
Waiting for all apps termination... 
app trxdb-loader triggered clean shutdown 

解决方案

第一条建议是mindreader独立于其余堆栈运行。这将大大减少dfuse-eosio的异常退出(由于其他部分)而影响mindreader操作的可能性,这对于node-manager管理nodeos进程的应用程序也是如此。

下一步是通过拍摄快照和自动恢复来定义良好的恢复策略。即使没有为EOSIO设置dfuse,nodeos也存在不干净关机的风险,例如由于内存不足错误,服务器意外重启以及其他原因。

如果您还没有自动快照获取机制,则本部分中的建议是node-manager在侧面独立运行应用程序。它将包含链的数据和状态的另一个同步副本,也可以用于服务Nodeos RPC API。这个程序负责定期拍摄自动快照。

# Storage bucket with path prefix where state snapshots should be done. Ex: gs://example/snapshots
node-manager-snapshot-store-url: <storage location, local path or supported cloud provider bucket>
# Enables restore from the latest snapshot when `nodeos` is unable to start.
node-manager-auto-restore-source: snaphost
#  If non-zero, a snapshot will be taken every {auto-snapshot-modulo} block.
node-manager-auto-snapshot-modulo: 100000 # Decrease for network with heavier traffic to take snapshot more often and shrink time to catch up from latest snapshot to HEAD
# If non-zero, after a successful snapshot, older snapshots will be deleted to only keep that number of recent snapshots
node-manager-number-of-snapshots-to-keep: 5 # Uses 0 to keep them all, useful for eventually regenerating dfuse merged blocks in parallel (not very likely but possible) 

当这些快照存在时,您现在可以将mindreader应用程序配置为使用它们,以在无法启动该nodeos过程(也几乎可以通过快照还原解决)时自动使用它们进行还原,mindreader会在过去启动并赶上来。所需的添加设置为:

# Storage bucket where `node-manager` wrote its snapshot, must be shared with `mindreader` app.
mindreader-snapshot-store-url: <storage location, local path or supported cloud provider bucket>
# Enables restore from the latest snapshot when `nodeos` is unable to start.
mindreader-auto-restore-source: snaphost

一切都可以在同一台计算机上运行,​​并可以启动不同的进程。例如,它也可以被容器化以在Kubernetes中运行。

另一个选择是使用该mindreader-stdin应用程序。此应用程序与mindreader应用程序类似,但不管理nodeos流程。相反,它nodeos通过stdin管道消耗深层数据,调用看起来像nodes -c | dfuseeos start mindreader-stdin <flag or -c config.yaml file>(可能不是确切的调用,如果需要,可以将您链接到文档)。

转载自:https://github.com/dfuse-io/dfuse-eosio/issues/202