您正在查看: Surou 发布的文章

问题背景

9月12日消息，据zkSync Era区块链浏览器显示，zkSync Era主网疑似出现宕机情况，zkSync Era提交给以太坊的最新批次为#208455，时间为14:14，区块高度已暂停于#13641404，已暂停出块37分钟。

跟进缘由

由于现有部分项目基于zkSync Era，所以需要确认下问题起因是什么，是否存在官方新版修复，目前现有版本会不会同样存在问题

确认问题

先从浏览器数据，确认下，是否存在消息描述问题，以及分析下问题位置（区块浏览器/链节点）

确认下Batch高度时间

Batch高度	区块时间	链接	位置
208455	2023-09-12 14:14	https://explorer.zksync.io/batch/208455	前
208456	2023-09-12 14:14	https://explorer.zksync.io/batch/208456	后
208457	2023-09-12 14:15	https://explorer.zksync.io/batch/208457	后+1

208455 与 208456 相差时间符合预期

确认下Block高度时间

Block高度	Batch高度	Committed时间	链接	位置
13641404	208456	2023-09-12 14:14	https://explorer.zksync.io/block/13641404	前
13641405	208456	2023-09-12 14:14	https://explorer.zksync.io/block/13641405	后
13641406	208456	2023-09-12 14:14	https://explorer.zksync.io/block/13641406	后+1

确认下Batch Commit时间

Batch高度	Commit时间	Commit tx hash	位置
208455	Sep-12-2023 06:15:59 AM +UTC	https://etherscan.io/tx/0x369446bc9d99087aa1160d426b7af372dce91bb7d372724b7c529f2e3ff30ecd	前
208456	Sep-12-2023 06:16:35 AM +UTC	https://etherscan.io/tx/0x902b3b0eee2e82ef048e8de8ec0417d7875c0930b5b0b893de48f8b5b59f8944	后
208457	Sep-12-2023 06:17:59 AM +UTC	https://etherscan.io/tx/0x7ce6d03ead9117a0a7268042c6e19637c702df2382070f04442df75602461661	后+1

分析结果

从节点Batch和Block生成时间，以及Batch Commit，对比消息中的#208455前后时间差，综合来看，链方面数据无宕机，
大概率是当时区块浏览器服务方面，或者连接的某些提供数据RPC节点，出现了区块同步不及时问题。

如何避免

浏览器和提供查询的RPC节点做多个主备灾备, 实时高度状态检查和线路自动切换

显卡驱动与CUDA版本对照

2023-09-14　 Surou　阅读　 zkSync　 0 评论

Driver Version	CUDA Version	docker image
510	11.6	docker pull nvidia/cuda:11.6.2-runtime-ubuntu20.04
525	12.0	docker pull nvidia/cuda:12.0.0-runtime-ubuntu20.04
530	12.1	docker pull nvidia/cuda:12.1.0-runtime-ubuntu20.04
535	12.2	docker pull nvidia/cuda:12.2.0-runtime-ubuntu20.04

所在服务器安装对应的显卡驱动（一般都有了），需要使用对应的 cuda image与其对应

对外prover docker统一使用ubuntu 20.04

base:从 CUDA 9.0 开始，包含部署预构建 CUDA 应用程序的最低限度（libcudart）。如果您想手动选择要安装的 CUDA 软件包，请使用此映像。
runtime:通过添加 CUDA 工具包中的所有共享库来扩展基础映像。如果您有使用多个 CUDA 库的预构建应用程序，请使用此映像。
devel:通过添加编译器工具链、调试工具、标头和静态库来扩展运行时映像。使用此映像从源代码编译 CUDA 应用程序

数据来源：https://hub.docker.com/r/nvidia/cuda/tags

问题描述

当prover异常，或者某些原因导致各个服务不稳定时，prover_jobs 卡在in_gpu_proof 状态，跳过的证明无法自恢复进行补全。

测试数据

主节点

zksync_local=# select l1_batch_number, status, created_at from prover_jobs order by created_at desc;
 l1_batch_number |    status    |         created_at         
-----------------+--------------+----------------------------
            1147 | successful   | 2023-09-06 05:58:18.199706
            1146 | successful   | 2023-09-06 02:57:31.196466
            1145 | successful   | 2023-09-06 02:19:07.253418
            1144 | successful   | 2023-09-06 01:46:06.18188
            1143 | in_gpu_proof | 2023-09-05 11:27:41.230814
            1142 | in_gpu_proof | 2023-09-05 11:27:37.413038
            1141 | in_gpu_proof | 2023-09-05 11:27:31.75103
            1140 | in_gpu_proof | 2023-09-05 11:27:28.767585
            1139 | in_gpu_proof | 2023-09-05 11:27:23.971785

扩展节点

zksync_local=# select l1_batch_number, status, created_at from prover_jobs order by created_at desc;
 l1_batch_number |    status    |         created_at         
-----------------+--------------+----------------------------
            1147 | in_gpu_proof | 2023-09-07 10:59:47.841166
            1146 | in_gpu_proof | 2023-09-07 10:59:46.586542
            1145 | in_gpu_proof | 2023-09-07 10:59:45.218178
            1144 | in_gpu_proof | 2023-09-07 10:59:43.862459
            1143 | in_gpu_proof | 2023-09-07 10:59:42.610455
            1142 | in_gpu_proof | 2023-09-07 10:59:41.344974
            1141 | in_gpu_proof | 2023-09-07 10:59:39.956309
            1140 | in_gpu_proof | 2023-09-07 10:59:38.589833
            1139 | in_gpu_proof | 2023-09-07 10:59:37.219

跟进代码

先确认下当前zkSync是否有自恢复的逻辑

查看数据操作core/lib/dal/src/prover_dal.rs，找到自恢复数据库操作requeue_stuck_jobs

找到具体执行服务模块housekeeper，并以默认加载

#[arg(
     long,
     default_value = "api,tree,eth,data_fetcher,state_keeper,witness_generator,housekeeper"
 )]
 components: ComponentsToRun,

2023-09-08T12:43:43.561336Z  INFO zksync_core: Starting the components: [HttpApi, WsApi, ExplorerApi, Tree, EthWatcher, EthTxAggregator, EthTxManager, DataFetcher, StateKeeper, WitnessGenerator(None, BasicCircuits), WitnessGenerator(None, LeafAggregation), WitnessGenerator(None, NodeAggregation), WitnessGenerator(None, Scheduler), Housekeeper]

housekeeper初始化

2023-09-08T12:43:47.244008Z  INFO zksync_core::house_keeper::periodic_job: Starting periodic job: GcsBlobCleaner with frequency: 60000 ms
2023-09-08T12:43:47.244008Z  INFO zksync_core::house_keeper::periodic_job: Starting periodic job: WitnessGeneratorStatsReporter with frequency: 10000 ms
2023-09-08T12:43:47.244016Z  INFO zksync_core::house_keeper::periodic_job: Starting periodic job: GpuProverQueueMonitor with frequency: 10000 ms
2023-09-08T12:43:47.244037Z  INFO zksync_core::house_keeper::periodic_job: Starting periodic job: ProverStatsReporter with frequency: 5000 ms
2023-09-08T12:43:47.244048Z  INFO zksync_core::house_keeper::periodic_job: Starting periodic job: WaitingToQueuedWitnessJobMover with frequency: 30000 ms
2023-09-08T12:43:47.244050Z  INFO zksync_core::house_keeper::periodic_job: Starting periodic job: ProverJobRetryManager with frequency: 300000 ms
2023-09-08T12:43:47.244030Z  INFO zksync_core::house_keeper::periodic_job: Starting periodic job: L1BatchMetricsReporter with frequency: 10000 ms
2023-09-08T12:43:47.244171Z  INFO zksync_core::house_keeper::periodic_job: Starting periodic job: FriProverJobRetryManager with frequency: 30000 ms
2023-09-08T12:43:47.244281Z  INFO zksync_core::house_keeper::periodic_job: Starting periodic job: WaitingToQueuedFriWitnessJobMover with frequency: 40000 ms
2023-09-08T12:43:47.244293Z  INFO zksync_core::house_keeper::periodic_job: Starting periodic job: SchedulerCircuitQueuer with frequency: 40000 ms
2023-09-08T12:43:47.244294Z  INFO zksync_core::house_keeper::periodic_job: Starting periodic job: WitnessGeneratorStatsReporter with frequency: 10000 ms
2023-09-08T12:43:47.244300Z  INFO zksync_core::house_keeper::periodic_job: Starting periodic job: FriWitnessGeneratorJobRetryManager with frequency: 30000 ms
2023-09-08T12:43:47.244296Z  INFO zksync_core::house_keeper::periodic_job: Starting periodic job: FriProverStatsReporter with frequency: 30000 ms

此时zksync-server日志循环等待新证明
2023-09-08T12:43:48.092403Z  INFO zksync_core::eth_sender::opside_send_plug::manage_proof_plug: submit_pending_proofs finish
2023-09-08T12:43:48.092417Z  INFO zksync_core::eth_sender::opside_send_plug::manage_proof_plug: manage proof start try_fetch_proof_to_send
2023-09-08T12:43:48.092433Z  INFO zksync_core::eth_sender::opside_send_plug::manage_proof_plug: manage proof: process_resend receiver for start signal
2023-09-08T12:43:48.445424Z  INFO zksync_core::eth_sender::opside_send_plug::manage_proof_plug: try_fetch_proof_to_send: check dest block:681 commit
2023-09-08T12:44:08.927334Z  INFO zksync_core::eth_sender::opside_send_plug::manage_proof_plug: sequence_batch_info.block_number == 0 at batch:681

判断housekeeper检查服务是否正常被执行

async fn run_routine_task(&mut self) {
 let stuck_jobs = self
     .prover_connection_pool
     .access_storage()
     .await
     .prover_dal()
     .requeue_stuck_jobs(self.processing_timeout, self.max_attempts)
     .await;
 let job_len = stuck_jobs.len();
 for stuck_job in stuck_jobs {
     vlog::info!("re-queuing prover job {:?}", stuck_job);
 }
 // vlog::info!("server.prover.requeued_jobs{}", job_len as u64);
 // vlog::info!("server.prover.self.processing_timeout{}", self.processing_timeout.as_secs() as u64);
 // vlog::info!("server.prover.self.max_attempts{}", self.max_attempts as u64);
 metrics::counter!("server.prover.requeued_jobs", job_len as u64);
}

self.processing_timeout ： 2700s （45分钟）
self.max_attempts： 1

经过测试，run_routine_task被正常定期执行，但是没有执行任务

2023-09-08T13:13:17.988849Z  INFO zksync_core::house_keeper::prover_job_retry_manager: server.prover.requeued_jobs0
2023-09-08T13:13:17.988864Z  INFO zksync_core::house_keeper::prover_job_retry_manager: server.prover.self.processing_timeout2700
2023-09-08T13:13:17.988873Z  INFO zksync_core::house_keeper::prover_job_retry_manager: server.prover.self.max_attempts1

查看run_routine_task具体数据库操作

UPDATE prover_jobs
SET status = 'queued', attempts = attempts + 1, updated_at = now(), processing_started_at = now()
WHERE (status = 'in_progress' AND  processing_started_at <= now() - $1::interval AND attempts < $2)
OR (status = 'in_gpu_proof' AND  processing_started_at <= now() - $1::interval AND attempts < $2)
OR (status = 'failed' AND attempts < $2)
RETURNING id, status, attempts

对照681数据

总结分析

对于zkSync本身存在自恢复逻辑housekeeper->run->run_routine_task->requeue_stuck_jobs
然后根据self.processing_timeout, self.max_attempts两个配置参数，进行数据库查询

self.processing_timeout ： 2700s （45分钟）
self.max_attempts： 1

UPDATE prover_jobs
SET status = 'queued', attempts = attempts + 1, updated_at = now(), processing_started_at = now()
WHERE (status = 'in_progress' AND  processing_started_at <= now() - $1::interval AND attempts < $2)
OR (status = 'in_gpu_proof' AND  processing_started_at <= now() - $1::interval AND attempts < $2)
OR (status = 'failed' AND attempts < $2)
RETURNING id, status, attempts

初始任务状态为 queued,当任务成功下发到prover时，状态变更为in_gpu_proof，如果当前任务没有在设置时间内processing_timeout完成，则将当前状态改回 queued，重新下发prover(可能是其他prover)
注意：当状态变更为in_gpu_proof时，attempts 会加1
问题：由于默认max_attempts等于1，所以无法进行run_routine_task
解决：增大max_attempts，5
同时需要注意processing_timeout，应大于支持最低显卡类型，单显卡，运行的时间，否则可能导致算力浪费。

PROVER_NON_GPU_MAX_ATTEMPTS=5
PROVER_NON_GPU_GENERATION_TIMEOUT_IN_SECS=3600

方案验证

修改环境变量后

2023-09-08T14:00:44.990770Z  INFO zksync_core::house_keeper::prover_job_retry_manager: re-queuing prover job StuckProverJobs { id: 1149, status: "queued", attempts: 2 }

相应in_gpu_proof任务状态更新为queued
链接的prover开始proof生成

zkSync block revert

2023-09-14　 Surou　阅读　 zkSync　 0 评论

异常场景

注意：无法回滚已执行的高度，否则报Attempt to revert already executed blocks
此时，强制停止 prover，并发送交易，制造落后场景

查询最新数据

zk f cargo run --bin block_reverter --release -- print-suggested-values --json

返回数据中包含几个信息

committed 最新高度
verified 已验证高度
executed 本地已执行高度
last_executed_l1_batch_number：推荐回滚高度
nonce: 执行交易nonce

priority_fee: 推荐fee

2023-09-13T10:25:08.901399Z  INFO zksync_core::block_reverter: Last L1 batch numbers on contract: committed 1171, verified 1170, executed 1170
{"last_executed_l1_batch_number":1170,"nonce":3867,"priority_fee":1000000000}

此时链上最新高度为committed：1171

测试回滚

先停止下zksync_server/zksync_external_node

合约回滚

1171回滚到1170

cd $ZKSYNC_HOME && zk f cargo run --bin block_reverter --release -- send-eth-transaction --l1-batch-number 1170 --nonce 3871  --priority-fee-per-gas 1000000000

此时会向合约发送revertBlocks交易，无需手动执行合约
此时再次查询

zk f cargo run --bin block_reverter --release -- print-suggested-values --json

此时链上最新高度为committed：1170

本地数据库回滚

注意：需要先停止下zksync_server/zksync_external_node, 否则报 ./db/main/tree/LOCK

cd $ZKSYNC_HOME && zk f cargo run --bin block_reverter --release -- rollback-db --l1-batch-number 1170 --rollback-postgres --rollback-tree --rollback-sk-cache

附加

core/bin/zksync_core/src/bin/block_reverter.rs
Command

print-suggested-values
显示要使用的建议值。
send-eth-transaction
将恢复事务发送到 L1。
rollback-db
将内部数据库状态恢复到前一个块。
clear-failed-transactions
清除失败的 L1 事务。

测试用例

core/tests/revert-test/tests/revert-and-restart.test.ts
const executedProcess = await utils.exec(
    'cd $ZKSYNC_HOME && ' +
        'RUST_LOG=off cargo run --bin block_reverter --release -- print-suggested-values --json'
    // ^ Switch off logs to not pollute the output JSON
);
const suggestedValuesOutput = executedProcess.stdout;
const { lastL1BatchNumber, nonce, priorityFee } = parseSuggestedValues(suggestedValuesOutput);
expect(lastL1BatchNumber < blocksCommittedBeforeRevert, 'There should be at least one block for revert').to.be
    .true;

console.log(
    Reverting with parameters: last unreverted L1 batch number: ${lastL1BatchNumber}, nonce: ${nonce}, priorityFee: ${priorityFee}
);

console.log('Sending ETH transaction..');
await utils.spawn(
    cd $ZKSYNC_HOME && cargo run --bin block_reverter --release -- send-eth-transaction --l1-batch-number ${lastL1BatchNumber} --nonce ${nonce} --priority-fee-per-gas ${priorityFee}
);

console.log('Rolling back DB..');
await utils.spawn(
    cd $ZKSYNC_HOME && cargo run --bin block_reverter --release -- rollback-db --l1-batch-number ${lastL1BatchNumber} --rollback-postgres --rollback-tree --rollback-sk-cache
);

let blocksCommitted = await mainContract.getTotalBlocksCommitted();
expect(blocksCommitted.eq(lastL1BatchNumber), 'Revert on contract was unsuccessful').to.be.true;

zkSync revert block后，扩展节点启动报错

2023-09-14　 Surou　阅读　 zkSync　 0 评论

问题背景

前面测试了zkSync的revert block, 然后在别的同步的节点，同步时报以下错误

2023-09-14T12:46:52.271426230Z 2023-09-14T12:46:52.271274Z  INFO zksync_core::sync_layer::fetcher: New batch: 3205. Timestamp: 1694603829
2023-09-14T12:46:52.271447030Z 2023-09-14T12:46:52.271296Z  INFO zksync_core::sync_layer::fetcher: New miniblock: 6891 / 8176
2023-09-14T12:46:52.385460894Z 2023-09-14T12:46:52.385324Z  INFO zksync_core::consistency_checker: Batch 3195 is consistent with L1
2023-09-14T12:46:52.408615713Z 2023-09-14T12:46:52.408488Z  INFO zksync_core::consistency_checker: Checking commit tx 0xa07f…353d for batch 3196
2023-09-14T12:46:52.526929804Z 2023-09-14T12:46:52.526783Z  INFO zksync_core::sync_layer::fetcher: New miniblock: 6892 / 8176
2023-09-14T12:46:52.726526535Z thread 'tokio-runtime-worker' panicked at 'index out of bounds: the len is 1 but the index is 1', /usr/src/zksync/core/bin/zksync_core/src/consistency_checker/mod.rs:111:27
2023-09-14T12:46:52.726566126Z stack backtrace:
2023-09-14T12:46:52.744782978Z    0:     0x5637f01373ea - std::backtrace_rs::backtrace::libunwind::trace::h79937bc171ada62c
2023-09-14T12:46:52.744804819Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
2023-09-14T12:46:52.744811549Z    1:     0x5637f01373ea - std::backtrace_rs::backtrace::trace_unsynchronized::h2292bca8571cb919
2023-09-14T12:46:52.744816889Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2023-09-14T12:46:52.744822060Z    2:     0x5637f01373ea - std::sys_common::backtrace::_print_fmt::h9c461f248e4ae90d
2023-09-14T12:46:52.744827090Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/sys_common/backtrace.rs:65:5
2023-09-14T12:46:52.744832010Z    3:     0x5637f01373ea - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::he9fe6bf1a39182e1
2023-09-14T12:46:52.744837660Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/sys_common/backtrace.rs:44:22
2023-09-14T12:46:52.746411727Z    4:     0x5637f015caee - core::fmt::write::h032658c119c720d7
2023-09-14T12:46:52.746431598Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/fmt/mod.rs:1208:17
2023-09-14T12:46:52.746438008Z    5:     0x5637f0131875 - std::io::Write::write_fmt::h299fc90dfae41c0d
2023-09-14T12:46:52.746443078Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/io/mod.rs:1682:15
2023-09-14T12:46:52.746448138Z    6:     0x5637f01371b5 - std::sys_common::backtrace::_print::heb70d25df9937e3f
2023-09-14T12:46:52.746453018Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/sys_common/backtrace.rs:47:5
2023-09-14T12:46:52.746473369Z    7:     0x5637f01371b5 - std::sys_common::backtrace::print::had745c0a76b8b521
2023-09-14T12:46:52.746492989Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/sys_common/backtrace.rs:34:9
2023-09-14T12:46:52.746513250Z    8:     0x5637f0138e0f - std::panicking::default_hook::{{closure}}::h1ea782cdfa2fd097
2023-09-14T12:46:52.746530751Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/panicking.rs:267:22
2023-09-14T12:46:52.746567952Z    9:     0x5637f0138b4b - std::panicking::default_hook::h1cc3af63455a163c
2023-09-14T12:46:52.746574652Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/panicking.rs:286:9
2023-09-14T12:46:52.746619113Z   10:     0x5637f013951c - std::panicking::rust_panic_with_hook::h5cafdc4b3bfd5528
2023-09-14T12:46:52.746638164Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/panicking.rs:688:13
2023-09-14T12:46:52.746644624Z   11:     0x5637f01392b9 - std::panicking::begin_panic_handler::{{closure}}::hf31c60f40775892c
2023-09-14T12:46:52.746649704Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/panicking.rs:579:13
2023-09-14T12:46:52.746690595Z   12:     0x5637f013789c - std::sys_common::backtrace::__rust_end_short_backtrace::h28a5c7be595826cd
2023-09-14T12:46:52.746696666Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/sys_common/backtrace.rs:137:18
2023-09-14T12:46:52.746702006Z   13:     0x5637f0138fc2 - rust_begin_unwind
2023-09-14T12:46:52.746712406Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/panicking.rs:575:5
2023-09-14T12:46:52.746735127Z   14:     0x5637ee6f3ba3 - core::panicking::panic_fmt::h8fa27a0b37dd98b7
2023-09-14T12:46:52.746740467Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/panicking.rs:64:14
2023-09-14T12:46:52.746760237Z   15:     0x5637ee6f3cf2 - core::panicking::panic_bounds_check::hd27fa6e100ea4568
2023-09-14T12:46:52.746792558Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/panicking.rs:147:5
2023-09-14T12:46:52.746802909Z   16:     0x5637eead1f9a - zksync_core::consistency_checker::ConsistencyChecker::run::{{closure}}::h449bcdaefdc936fd
2023-09-14T12:46:52.746839150Z   17:     0x5637eeb35af5 - tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::h91ab53e454932f64
2023-09-14T12:46:52.746864871Z   18:     0x5637eebe0dbd - tokio::runtime::task::core::Core<T,S>::poll::h795cb0b9385c1329
2023-09-14T12:46:52.746885391Z   19:     0x5637eeb4df5e - tokio::runtime::task::harness::Harness<T,S>::poll::hc2ab6dba6d5ba4ae
2023-09-14T12:46:52.746909382Z   20:     0x5637f00eefee - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h6fb307488dc375ee
2023-09-14T12:46:52.746937683Z   21:     0x5637f00ee143 - tokio::runtime::scheduler::multi_thread::worker::Context::run::h175658cfae89590d
2023-09-14T12:46:52.746969794Z   22:     0x5637f00da9b9 - tokio::macros::scoped_tls::ScopedKey<T>::set::h2b771be14cbef94d
2023-09-14T12:46:52.746992634Z   23:     0x5637f00ede49 - tokio::runtime::scheduler::multi_thread::worker::run::ha39c9ec4dce0c89d
2023-09-14T12:46:52.747062136Z   24:     0x5637f00f8158 - tokio::runtime::task::core::Core<T,S>::poll::h6feb9b8dd5ca027e
2023-09-14T12:46:52.747090037Z   25:     0x5637f00ce6ff - tokio::runtime::task::harness::Harness<T,S>::poll::hb45689cdecd9b901
2023-09-14T12:46:52.747152079Z   26:     0x5637f00dc338 - tokio::runtime::blocking::pool::Inner::run::hc09fcd48a7633fbf
2023-09-14T12:46:52.747159409Z   27:     0x5637f00dd82a - std::sys_common::backtrace::__rust_begin_short_backtrace::hf8e4ebb56e2acb86
2023-09-14T12:46:52.747167950Z   28:     0x5637f00f050b - core::ops::function::FnOnce::call_once{{vtable.shim}}::h4834ddb4b804c368
2023-09-14T12:46:52.747235692Z   29:     0x5637f013c093 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hb77d8d72ebcf79c4
2023-09-14T12:46:52.747246432Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/alloc/src/boxed.rs:2000:9
2023-09-14T12:46:52.747251792Z   30:     0x5637f013c093 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hc08c3353e1568487
2023-09-14T12:46:52.747270633Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/alloc/src/boxed.rs:2000:9
2023-09-14T12:46:52.747282563Z   31:     0x5637f013c093 - std::sys::unix::thread::Thread::new::thread_start::h7168e596cd5e5ce6
2023-09-14T12:46:52.747294573Z                                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/sys/unix/thread.rs:108:17
2023-09-14T12:46:52.747489009Z   32:     0x7f5cddf78fa3 - start_thread
2023-09-14T12:46:52.747671095Z   33:     0x7f5cddd2006f - clone
2023-09-14T12:46:52.747677365Z   34:                0x0 - <unknown>
2023-09-14T12:46:52.764575588Z 2023-09-14T12:46:52.764414Z  INFO zksync_core::sync_layer::fetcher: New batch: 3206. Timestamp: 1694603945
2023-09-14T12:46:52.764596038Z 2023-09-14T12:46:52.764439Z  INFO zksync_core::sync_layer::fetcher: New miniblock: 6893 / 8176
2023-09-14T12:46:53.009199118Z 2023-09-14T12:46:53.009043Z  INFO zksync_core::sync_layer::fetcher: New miniblock: 6894 / 8176
2023-09-14T12:46:53.066490673Z 2023-09-14T12:46:53.066347Z  INFO zksync_core::sync_layer::batch_status_updater: Batch 3202: committed
2023-09-14T12:46:53.066924736Z 2023-09-14T12:46:53.066789Z  INFO zksync_core::sync_layer::batch_status_updater: Commit status change: number 3202, hash 0xa07f…353d, happened at 2023-09-07 10:46:55.753857 UTC
2023-09-14T12:46:53.109595076Z 2023-09-14T12:46:53.109454Z  INFO zksync_core::sync_layer::external_io: Sealing the batch
2023-09-14T12:46:53.109620647Z 2023-09-14T12:46:53.109471Z DEBUG zksync_core::state_keeper::keeper: L1 batch #3203 should be sealed unconditionally as per sealing rules
2023-09-14T12:46:53.113694988Z 2023-09-14T12:46:53.113561Z  INFO zksync_core::state_keeper::io::seal_logic: Sealing miniblock 6888 (L1 batch 3203) with 0 (0 L2 + 0 L1) txs, 1 events, 13 reads, 4 writes
2023-09-14T12:46:53.157928775Z 2023-09-14T12:46:53.157781Z DEBUG zksync_core::state_keeper::io::seal_logic: miniblock execution stage insert_storage_logs took 42.366601ms with count Some(4)
2023-09-14T12:46:53.161347507Z 2023-09-14T12:46:53.161215Z DEBUG zksync_core::state_keeper::io::seal_logic: sealed miniblock 6888 in 47.653779ms
2023-09-14T12:46:53.161386708Z 2023-09-14T12:46:53.161237Z DEBUG zksync_core::state_keeper::io::seal_logic: L1 batch execution stage fictive_miniblock took 47.68931ms with count None
2023-09-14T12:46:53.161400299Z 2023-09-14T12:46:53.161341Z  INFO zksync_core::state_keeper::io::seal_logic: Sealing L1 batch 3203 with 1 (1 L2 + 0 L1) txs, 1 l2_l1_logs, 4 events, 90 reads (21 deduped), 18 writes (5 deduped)
2023-09-14T12:46:53.209996525Z 2023-09-14T12:46:53.209850Z DEBUG zksync_core::state_keeper::io::seal_logic: L1 batch execution stage insert_protective_reads took 44.407542ms with count Some(21)
2023-09-14T12:46:53.211452098Z 2023-09-14T12:46:53.211323Z DEBUG zksync_core::state_keeper::io::seal_logic: sealed l1 batch 3203 in 98.10967ms
2023-09-14T12:46:53.211476579Z 2023-09-14T12:46:53.211362Z  INFO zksync_core::sync_layer::external_io: Batch 3203 is sealed
2023-09-14T12:46:53.211484819Z 2023-09-14T12:46:53.211377Z DEBUG zksync_core::sync_layer::external_io: Waiting for the new batch params
2023-09-14T12:46:53.211490459Z 2023-09-14T12:46:53.211385Z  INFO zksync_core::sync_layer::external_io: Getting previous L1 batch hash
2023-09-14T12:46:53.240026509Z 2023-09-14T12:46:53.239805Z  INFO zksync_core::sync_layer::fetcher: New batch: 3207. Timestamp: 1694604066
2023-09-14T12:46:53.240053310Z 2023-09-14T12:46:53.239824Z  INFO zksync_core::sync_layer::fetcher: New miniblock: 6895 / 8176
2023-09-14T12:46:53.263598861Z 2023-09-14T12:46:53.263459Z  INFO zksync_core::metadata_calculator::updater: Loading blocks with numbers 3203..=3203 to update Merkle tree
2023-09-14T12:46:53.267278250Z 2023-09-14T12:46:53.267144Z  INFO zksync_core::metadata_calculator::updater: Processing L1 batches #3203..=3203 with 26 total logs
2023-09-14T12:46:53.267494317Z 2023-09-14T12:46:53.267361Z  INFO zksync_merkle_tree::domain: Extending Merkle tree with batch #3203 with 26 ops in full mode
2023-09-14T12:46:53.273107584Z 2023-09-14T12:46:53.272966Z  INFO zksync_merkle_tree::domain: Processed batch #3203; root hash is 0x7f78…d4a0, 9078 leaves in total, 1 initial writes, 4 repeated writes
2023-09-14T12:46:53.275036721Z 2023-09-14T12:46:53.274902Z  INFO zksync_core::metadata_calculator::updater: Saved witnesses for L1 batch #3203 to object storage at `merkel_tree_paths_3203.bin`
2023-09-14T12:46:53.276841885Z 2023-09-14T12:46:53.276649Z  INFO zksync_core::metadata_calculator::updater: Updated metadata for L1 batch #3203 in Postgres
2023-09-14T12:46:53.276882016Z 2023-09-14T12:46:53.276733Z  INFO zksync_merkle_tree::domain: Flushing L1 batches #[3203] to RocksDB
2023-09-14T12:46:53.277228036Z 2023-09-14T12:46:53.277107Z  INFO zksync_core::metadata_calculator::metrics: L1 batches #3203..=3203 processed in tree
2023-09-14T12:46:53.314666571Z 2023-09-14T12:46:53.314511Z  INFO zksync_core::sync_layer::external_io: Previous L1 batch hash: 57655874197922256061346220318188548076559386516802029495849130083054353765536
2023-09-14T12:46:53.340176950Z 2023-09-14T12:46:53.340044Z DEBUG zksync_state::rocksdb: loading storage for l1 batch number 3203
2023-09-14T12:46:53.340206721Z 2023-09-14T12:46:53.340091Z DEBUG zksync_state::rocksdb: loading state changes for l1 batch 3203
2023-09-14T12:46:53.341542481Z 2023-09-14T12:46:53.341421Z DEBUG zksync_state::rocksdb: loading factory deps for l1 batch 3203
2023-09-14T12:46:53.342740826Z 2023-09-14T12:46:53.342505Z  INFO zksync_core::state_keeper::batch_executor: Secondary storage for batch 3204 initialized, size is 9101
2023-09-14T12:46:53.342764707Z 2023-09-14T12:46:53.342543Z DEBUG zksync_core::sync_layer::external_io: Waiting for the new tx, next action is Some(Tx(Transaction(0x5f4f2a2fb69447e75db65b222e50f30a0b439c882223ce6e890a8cf5c46d55ff)))
2023-09-14T12:46:53.342772307Z 2023-09-14T12:46:53.342578Z  INFO zksync_core::state_keeper::batch_executor: Starting executing batch #3204
2023-09-14T12:46:53.352470206Z 2023-09-14T12:46:53.352342Z  INFO zksync_core::sync_layer::external_io: Sealing miniblock
2023-09-14T12:46:53.352495827Z 2023-09-14T12:46:53.352360Z DEBUG zksync_core::state_keeper::keeper: Miniblock #6889 (L1 batch #3204) should be sealed as per sealing rules
2023-09-14T12:46:53.355504267Z 2023-09-14T12:46:53.355373Z  INFO zksync_core::state_keeper::io::seal_logic: Sealing miniblock 6889 (L1 batch 3204) with 1 (1 L2 + 0 L1) txs, 3 events, 77 reads, 14 writes
2023-09-14T12:46:53.401962459Z 2023-09-14T12:46:53.401823Z DEBUG zksync_core::state_keeper::io::seal_logic: miniblock execution stage insert_storage_logs took 42.33107ms with count Some(14)
2023-09-14T12:46:53.405522575Z 2023-09-14T12:46:53.405396Z DEBUG zksync_core::state_keeper::io::seal_logic: sealed miniblock 6889 in 50.024698ms
2023-09-14T12:46:53.406345130Z 2023-09-14T12:46:53.406143Z  INFO zksync_core::sync_layer::external_io: Miniblock 6890 is sealed
2023-09-14T12:46:53.406393921Z 2023-09-14T12:46:53.406163Z DEBUG zksync_core::state_keeper::keeper: Initialized new miniblock #6890 (L1 batch #3204) with timestamp 2023-09-13 11:16:06 UTC
2023-09-14T12:46:53.406401451Z 2023-09-14T12:46:53.406180Z DEBUG zksync_core::sync_layer::external_io: Waiting for the new tx, next action is Some(SealBatch)
2023-09-14T12:46:53.472626763Z 2023-09-14T12:46:53.472431Z  INFO zksync_core::sync_layer::fetcher: New miniblock: 6896 / 8176
2023-09-14T12:46:53.708788983Z 2023-09-14T12:46:53.708623Z  INFO zksync_core::sync_layer::fetcher: New batch: 3208. Timestamp: 1694604126
2023-09-14T12:46:53.708815224Z 2023-09-14T12:46:53.708642Z  INFO zksync_core::sync_layer::fetcher: New miniblock: 6897 / 8176
2023-09-14T12:46:53.903700525Z 2023-09-14T12:46:53.903557Z  INFO zksync_external_node: initialized ETH-TxManager in 4.725550326s
2023-09-14T12:46:53.913371443Z 2023-09-14T12:46:53.913245Z  INFO zksync_external_node: initializing Opside send plug
2023-09-14T12:46:53.936266844Z 2023-09-14T12:46:53.936049Z  INFO zksync_external_node: initialized Opside send plug4.758046524s
2023-09-14T12:46:53.936291305Z 2023-09-14T12:46:53.936056Z  INFO zksync_external_node: initializing Opside manage plug
2023-09-14T12:46:53.936298395Z 2023-09-14T12:46:53.936070Z  INFO zksync_core::eth_sender::opside_send_plug::send_proof_plug: start handle_caches_event loop
2023-09-14T12:46:53.936304385Z 2023-09-14T12:46:53.936109Z  INFO zksync_core::eth_sender::opside_send_plug::send_proof_plug: start handle_monit_tx_event_loop loop
2023-09-14T12:46:53.936309966Z 2023-09-14T12:46:53.936112Z  INFO zksync_core::eth_sender::opside_send_plug::send_proof_plug: handle_chans_event: start handle_chans_event loop
2023-09-14T12:46:53.943476709Z 2023-09-14T12:46:53.943336Z  INFO zksync_core::eth_sender::opside_send_plug::manage_proof_plug: manage proof start try_fetch_proof_to_send
2023-09-14T12:46:53.943501999Z 2023-09-14T12:46:53.943373Z  INFO zksync_core::eth_sender::opside_send_plug::manage_proof_plug: manage proof: process_resend wait for start signal
2023-09-14T12:46:53.943512100Z 2023-09-14T12:46:53.943444Z  INFO zksync_core::eth_sender::opside_send_plug::send_proof_plug: start into handle history proof txs
2023-09-14T12:46:53.944207140Z 2023-09-14T12:46:53.943940Z  INFO zksync_utils::wait_for_tasks: One of the tokio actors unexpectedly finished with error: index out of bounds: the len is 1 but the index is 1
2023-09-14T12:46:53.944242241Z 2023-09-14T12:46:53.943968Z  INFO zksync_storage::db: Waiting for all the RocksDB instances to be dropped, 2 remaining
2023-09-14T12:46:53.944249652Z 2023-09-14T12:46:53.943986Z  INFO zksync_core::metadata_calculator::updater: Stop signal received, metadata_calculator is shutting down
2023-09-14T12:46:53.944290583Z 2023-09-14T12:46:53.944210Z  INFO zksync_core::api_server::web3: Stop signal received, web3 HTTP JSON RPC API is shutting down
2023-09-14T12:46:53.944302103Z 2023-09-14T12:46:53.944224Z  INFO zksync_core::api_server::web3: Stop signal received, WS JSON RPC API is shutting down
2023-09-14T12:46:53.945073626Z 2023-09-14T12:46:53.944947Z  INFO zksync_storage::db: Waiting for all the RocksDB instances to be dropped, 1 remaining
2023-09-14T12:46:53.945101567Z 2023-09-14T12:46:53.945013Z  INFO zksync_core::sync_layer::fetcher: New miniblock: 6898 / 8176
2023-09-14T12:46:53.945111027Z 2023-09-14T12:46:53.945033Z  INFO zksync_core::sync_layer::fetcher: Stop signal received, exiting the fetcher routine
2023-09-14T12:46:53.949720505Z 2023-09-14T12:46:53.949609Z  INFO zksync_core::eth_sender::opside_send_plug::send_proof_plug: finish handle history proof txs
2023-09-14T12:46:53.949748195Z 2023-09-14T12:46:53.949638Z  INFO zksync_core::eth_sender::opside_send_plug::send_proof_plug: Stop signal received, eth_tx_aggregator is shutting down
2023-09-14T12:46:54.135835514Z 2023-09-14T12:46:54.135694Z  INFO zksync_core::api_server::web3::pubsub_notifier: Stop signal received, pubsub_tx_notifier is shutting down
2023-09-14T12:46:54.136480683Z 2023-09-14T12:46:54.136390Z  INFO zksync_core::api_server::web3::pubsub_notifier: Stop signal received, pubsub_block_notifier is shutting down
2023-09-14T12:46:54.141903295Z 2023-09-14T12:46:54.141774Z  INFO zksync_core::api_server::web3::pubsub_notifier: Stop signal received, pubsub_logs_notifier is shutting down
2023-09-14T12:46:54.218539427Z 2023-09-14T12:46:54.218338Z  INFO zksync_core::eth_sender::eth_tx_manager: Stop signal received, eth_tx_manager is shutting down
2023-09-14T12:46:54.358294949Z 2023-09-14T12:46:54.358130Z  INFO zksync_core::l1_gas_price::main_node_fetcher: Stop signal received, MainNodeGasPriceFetcher is shutting down
2023-09-14T12:46:54.417717566Z 2023-09-14T12:46:54.417570Z  INFO zksync_core::state_keeper::keeper: Stop signal received, state keeper is shutting down
2023-09-14T12:46:54.417754358Z 2023-09-14T12:46:54.417607Z  INFO zksync_core::state_keeper::batch_executor: State keeper exited with an unfinished batch
2023-09-14T12:46:54.420950213Z 2023-09-14T12:46:54.420815Z  INFO zksync_storage::db: All the RocksDB instances are dropped
2023-09-14T12:46:55.500606305Z 2023-09-14T12:46:55.500432Z  INFO zksync_core::eth_sender::opside_send_plug::manage_proof_plug: get proofBatchNumFinal == 0 &&  proofHashBatchNumFinal == 0 not do commit pending proof
2023-09-14T12:46:55.500631526Z 2023-09-14T12:46:55.500460Z  INFO zksync_core::eth_sender::opside_send_plug::manage_proof_plug: manage proof start try_fetch_proof_to_

关键错误信息

2023-09-14T12:46:52.726526535Z thread 'tokio-runtime-worker' panicked at 'index out of bounds: the len is 1 but the index is 1', /usr/src/zksync/core/bin/zksync_core/src/consistency_checker/mod.rs:111:27

归档

March 2025 February 2025 January 2025 December 2024 November 2024 October 2024 September 2024 August 2024 July 2024 June 2024 May 2024 April 2024 March 2024 January 2024 December 2023 November 2023 October 2023 September 2023 August 2023 July 2023 June 2023 April 2023 March 2023 February 2023 January 2023 December 2022 November 2022 October 2022 August 2022 July 2022 June 2022 May 2022 March 2022 February 2022 January 2022 December 2021 November 2021 October 2021 September 2021 August 2021 July 2021 June 2021 May 2021 April 2021 March 2021 February 2021 January 2021 December 2020 November 2020 October 2020 September 2020 July 2020 June 2020 May 2020 April 2020 March 2020 February 2020 January 2020 December 2019 November 2019 October 2019 September 2019 August 2019 July 2019 June 2019 May 2019 April 2019 March 2019 February 2019 January 2019 December 2018 November 2018 October 2018 September 2018 August 2018 July 2018 June 2018

其他

文章 RSS 评论 RSS

zkSync Era宕机问题排查

问题背景

跟进缘由

确认问题

确认下Batch高度时间

确认下Block高度时间

确认下Batch Commit时间

分析结果

如何避免

显卡驱动与CUDA版本对照

proof没有按顺序生成问题分析和解决

问题描述

测试数据

主节点

扩展节点

跟进代码

总结分析

方案验证

zkSync block revert

异常场景

查询最新数据

测试回滚

合约回滚

本地数据库回滚

附加

测试用例

zkSync revert block后，扩展节点启动报错

问题背景

最新文章

最新回复

标签

归档

其他