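Quick Hack 01: Minecraft Item Statistics

Last updated on the evening of November 24, 2023

A quick-and-dirty item statistics script

Preface

I recently came across a fairly old GitHub repo: SciCraft/mc-scanner

So I tried it on my own server. The results looked decent, but the statistics didn't seem complete

Items inside shulker boxes were barely counted, and the output wasn't the data I wanted anyway

What to do? The principle looks simple enough, it's just a brute-force search, so I'll write my own （。＾▽＾）

One note: this isn't a tutorial, it's a record of my thought process while coding, so the layout may feel a bit messy

(●’◡’●)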

Basic concepts

If you know a little about how Minecraft stores a world, this is easy to write

Here is the most basic layout of a Minecraft world save

.
└── world/
    ├── playerdata/
    │   └── .dat
    ├── region/
    │   └── .mca
    ├── DIM1/
    │   └── region/
    │       └── .mca
    ├── DIM-1/
    │   └── region/
    │       └── .mca
    └── ...

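This time the item count only needs to read the .dat player files in \playerdata and the .mca region files in \region

The .dat files hold all player-related data; I won't go into detail here, see the Wiki to learn more

The .mca files hold all the chunk data for that dimension; again, see the Wiki for more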
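To make the layout concrete, here is a minimal sketch that lists the files the scan would touch. It only uses pathlib and is not part of the script in this post; the function name and the dimension labels are names I made up for illustration.

```python
from pathlib import Path

# Region folders relative to the world root, as in the tree above.
# The labels on the left are illustrative names chosen for this sketch.
REGION_DIRS = {
    "overworld": Path("region"),
    "nether": Path("DIM-1") / "region",
    "end": Path("DIM1") / "region",
}


def list_scan_targets(world: Path) -> dict:
    """Return the player .dat files and region .mca files found under world."""
    def files_in(folder: Path, pattern: str) -> list:
        # A missing folder just means there is nothing to scan there
        return sorted(folder.glob(pattern)) if folder.is_dir() else []

    targets = {"players": files_in(world / "playerdata", "*.dat")}
    for label, relative in REGION_DIRS.items():
        targets[label] = files_in(world / relative, "*.mca")
    return targets
```

Calling list_scan_targets(Path("world")) gives one sorted list of paths per category, with an empty list for any folder that does not exist.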

I'll use nbtlib to handle the .dat player data

and a modded anvilparser to handle the .mca region files; I'll explain why it needs to be modded later 😉

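About the data

playerdata

Open a player file under \playerdata with any NBT editor

I recommend NBT Studio

The file name is the player's UUID

On a server with online mode enabled it is the official UUID, which you can look up on this website

After opening the file you'll see a root tag named after the player's UUID, with a pile of other tags under it

The inventory data we need is stored under the tag named Inventory

The contents of the player's ender chest are stored under the tag named EnderItems

The structure we need looks like this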

UUID
├── Inventory
│   └── ...
├── EnderItems
│   └── ...
└── ...

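Item data

In NBT, item data is a compound tag (Compound)

A compound tag is an ordered list of name-value pairs, where each tag can be of any type.

Its fields are:

Slot: the number of the inventory slot the item sits in
id: the Minecraft item id, which is what we want to count
tag: extra item data; more on this later, and not every item has one
Count: the number of items in the stack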
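As a rough illustration of those four fields, the sketch below reads them from a plain dict standing in for an item compound. ItemStack and read_stack are hypothetical names, and treating Slot as optional is an assumption of this sketch, not something taken from the format.

```python
from typing import NamedTuple, Optional


class ItemStack(NamedTuple):
    slot: Optional[int]  # treated as optional in this sketch
    item_id: str
    count: int
    has_tag: bool


def read_stack(item: dict) -> ItemStack:
    """Pull the four fields out of one item compound (a plain dict here)."""
    slot = item.get("Slot")
    return ItemStack(
        slot=None if slot is None else int(slot),
        item_id=str(item["id"]),
        count=int(item["Count"]),
        has_tag="tag" in item,
    )


example = {"Slot": 0, "id": "minecraft:diamond", "Count": 64}
print(read_stack(example))
```

Only id and Count matter for counting; has_tag just marks the items that may need a closer look.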

Reading player data

nbtlib repo

First skim the docs; the first item under basic usage is reading a file

Let's follow along and try it

import nbtlib

target_data = nbtlib.load("[UUID].dat")

Once it's loaded, try printing the contents

print(target_data)

It seems to contain everything, but how do I get at individual values?

Let's try a for loop

for i in target_data:
    print(i)

seenCredits
DeathTime
foodTickTimer
recipeBook
XpTotal
...

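The loop yields every tag name, nice

So can I fetch an item the way I would from a dict?

The library looks fairly simple, so let's dig through the source

First, find nbt.py

We called the function nbtlib.load() at the start, so let's start there

Found def load(); this function lives in class File

A partial snippet: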

class File(Compound):
	def __init__(...):
		...
	...
	def load(...):
		...

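So the File class inherits from the Compound class; let's keep going up

The source file is a bit messy, and I almost missed an import near the top

from .tag import BYTE, Compound, ...

So let's look in tag.py, which sits in the same directory

A snippet of class Compound: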

class Compound(...):
	def __contains__(self, item):
		...
	def __getitem__(self, key):
		...
	def __setitem__(self, key, value):
		...
	def __delitem__(self, key):
		...
	...

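As you can see, nearly every use of Python's [] indexing syntax has a matching implementation, so it's safe to use

One small detail: the in membership test (for example "Inventory" in target_data) is backed by the __contains__ shown here, while the for loop from earlier relies on __iter__ instead !(*￣(￣　*)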
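To see which special method backs which piece of syntax, here is a toy container that records the hooks Python calls. TagBag is made up for this sketch and is not nbtlib's Compound; it only mimics the three operations used in this post.

```python
class TagBag:
    """A toy container (not nbtlib's Compound) that logs which hook runs."""

    def __init__(self, **tags):
        self._tags = dict(tags)
        self.calls = []

    def __getitem__(self, key):
        self.calls.append("__getitem__")
        return self._tags[key]

    def __contains__(self, key):
        self.calls.append("__contains__")
        return key in self._tags

    def __iter__(self):
        self.calls.append("__iter__")
        return iter(self._tags)


bag = TagBag(Inventory=[], XpTotal=7)
names = [name for name in bag]   # iteration  -> __iter__
present = "Inventory" in bag     # membership -> __contains__
value = bag["XpTotal"]           # indexing   -> __getitem__
print(bag.calls)                 # ['__iter__', '__contains__', '__getitem__']
```

So the for loop and the in check look alike on the page but go through different methods.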

Now let's write some counting code

First, a simple tally function

item_record: dict = {}  # initialize the item record


def count_item(item_id, item_count: int):
	item_id: str = str(item_id)
	if item_id not in item_record:
		item_record[item_id] = 0
	item_record[item_id] += item_count
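The same tally can also be kept in a collections.Counter, which drops the "not in" check. This is an alternative sketch rather than what the script uses; item_totals is a hypothetical stand-in for item_record.

```python
from collections import Counter

item_totals = Counter()  # hypothetical stand-in for item_record


def count_item(item_id, item_count: int) -> None:
    # Counter returns 0 for missing keys, so no "not in" check is needed
    item_totals[str(item_id)] += item_count


count_item("minecraft:stone", 64)
count_item("minecraft:stone", 12)
count_item("minecraft:dirt", 3)
print(item_totals.most_common(1))  # [('minecraft:stone', 76)]
```

Counter also brings most_common() for free, which is handy when you only care about the biggest piles.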

Next, a brute-force search of the player's inventory

def player_inventory_search(data_file):
    target_file = nbtlib.load(data_file)

    for i in target_file["Inventory"]:
        count_item(i["id"], int(i["Count"]))
        print(i["id"], " ", int(i["Count"]))

    for i in target_file["EnderItems"]:
        count_item(i["id"], int(i["Count"]))
        print(i["id"], " ", int(i["Count"]))

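I won't bother doing this properly with the logging library; a quick print will do

Watching the console scroll by, a few characters suddenly caught my eye

shulker_box

Oh right, I forgot to handle shulker boxes!（≧□≦）ノ

Handling shulker boxes

Fire up NBT Studio again and find a shulker box to look at

The contents of a shulker box are stored under its tag tag, structured like this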

...
id: minecraft:shulker_box
tag:
└── BlockEntityTag:
    ├── id: minecraft:shulker_box
    └── Items:
        └── ...

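It's not complicated: the box's item data simply nests more item data

One more case to handle: when a shulker box holds no items, the Items tag doesn't exist at all

So let's refactor and add a function dedicated to shulker boxes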

def player_inventory_search(data_file):
    def search_shulker_box_item(shulker_box):
        if "Items" not in shulker_box["tag"]["BlockEntityTag"]:
            return
        for item in shulker_box["tag"]["BlockEntityTag"]["Items"]:
            count_item(item["id"], int(item["Count"]))
            print(item["id"], " ", item["Count"])

    target_file = nbtlib.load(data_file)

    for i in target_file["Inventory"]:
        count_item(i["id"], int(i["Count"]))
        print(i["id"], " ", int(i["Count"]))
        if str(i["id"]).endswith("shulker_box") and "tag" in i:
            search_shulker_box_item(i)

    for i in target_file["EnderItems"]:
        count_item(i["id"], int(i["Count"]))
        print(i["id"], " ", int(i["Count"]))
        if str(i["id"]).endswith("shulker_box") and "tag" in i:
            search_shulker_box_item(i)

A test run looks good

Looks like the player data part is done and dusted (～o￣3￣)～
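The Inventory and EnderItems loops do the same work, so one way to fold them together is a small generator. The sketch below is an alternative shape, not the code this post ends up with: iter_stacks is a made-up name, and plain dicts stand in for the NBT compounds.

```python
def iter_stacks(stacks):
    """Yield (id, count) for each stack, then for anything inside shulker boxes."""
    for stack in stacks:
        item_id = str(stack["id"])
        yield item_id, int(stack["Count"])
        if item_id.endswith("shulker_box"):
            # .get() with defaults covers a missing tag, BlockEntityTag or Items
            inner = stack.get("tag", {}).get("BlockEntityTag", {}).get("Items", [])
            yield from iter_stacks(inner)


inventory = [
    {"id": "minecraft:stone", "Count": 64},
    {"id": "minecraft:red_shulker_box", "Count": 1,
     "tag": {"BlockEntityTag": {"Items": [{"id": "minecraft:diamond", "Count": 5}]}}},
    {"id": "minecraft:shulker_box", "Count": 1},  # empty box: no tag at all
]
print(list(iter_stacks(inventory)))
```

Both lists could then be fed through the same loop, calling count_item on each pair.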

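Region files

A Minecraft .mca region file is made up of many chunks, each stored as NBT tags

Which means region files can be viewed in an NBT editor too

As everyone knows, in-game containers are block entities, so it's easy to work out which tag to look at

The block entity tag we need: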

Chunk [x, y] in world at (x, y):
├── block_entities:
│   └── ...
└── ...

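In the root tag, [x, y] is the chunk's index within the region file, each component running from 0 to 31, while the (x, y) after it is the chunk's coordinate in the world; mind the difference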
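The relationship between the two can be sketched with a bit of arithmetic, going by the r.X.Z.mca naming of region files and the 32 by 32 chunks each one holds. locate_chunk is a hypothetical helper, and I call the second axis z here because a chunk position is horizontal.

```python
def locate_chunk(chunk_x: int, chunk_z: int):
    """Split a world chunk coordinate into (region file name, index in region)."""
    region_x, local_x = divmod(chunk_x, 32)  # divmod floors, so negatives work
    region_z, local_z = divmod(chunk_z, 32)
    return f"r.{region_x}.{region_z}.mca", (local_x, local_z)


print(locate_chunk(33, 5))    # ('r.1.0.mca', (1, 5))
print(locate_chunk(-1, -1))   # ('r.-1.-1.mca', (31, 31))
```

The index is what stays inside 0 to 31; the world coordinate keeps growing as you walk away from the origin.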

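Besides containers, plenty of other blocks are block entities too, signs for example

So we need to tell them apart; here are the data tags of some containers and a non-container as an example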

...
├── z: 0
├── x: 0
├── id: minecraft:bed
├── y: 10
└── keepPacked: 0
...
├── z: 1
├── x: 1
├── id: minecraft:dropper
├── y: 10
├── Items:
│   └── ...
└── keepPacked: 0
...
├── z: 2
├── x: 2
├── TransferCooldown: 0
├── id: minecraft:hopper
├── y: 10
├── Items:
│   └── ...
└── keepPacked: 0

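As shown above, the data tags of containers all have an Items tag, so telling them apart isn't hard

The item data inside Items is the same as what we saw in the player's inventory

So nothing difficult here, just reuse the item-reading code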
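A minimal sketch of that filter, with plain dicts standing in for the block entity compounds shown above. The containers helper and the arrow stack are made up for illustration.

```python
def containers(block_entities):
    """Keep only the block entities that carry an Items list."""
    return [entity for entity in block_entities if "Items" in entity]


block_entities = [
    {"id": "minecraft:bed", "x": 0, "y": 10, "z": 0},
    {"id": "minecraft:dropper", "x": 1, "y": 10, "z": 1,
     "Items": [{"Slot": 0, "id": "minecraft:arrow", "Count": 16}]},
    {"id": "minecraft:hopper", "x": 2, "y": 10, "z": 2, "Items": []},
]
print([entity["id"] for entity in containers(block_entities)])
# ['minecraft:dropper', 'minecraft:hopper']
```

Checking for the Items key instead of a list of container ids means a container type I never thought of still gets picked up.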

Reading containers from chunks

anvil parser repo

Again, read the readme first; the first usage item is also reading a file

Monkey see, monkey do

import anvil

target_region = anvil.Region.from_file("r.0.0.mca")

target_chunk = anvil.Chunk.from_region(target_region, 0, 0)

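Note that the 0, 0 in the arguments is the chunk index, the one that runs from 0 to 31, not the chunk coordinate

Print the contents

print(target_chunk)

<anvil.chunk.Chunk object at 0x000001320B34AAC0>

Huh, this time the output doesn't tell us much. It's still NBT underneath, so does this library not wrap what it parses as NBT?

Straight to the source

From the readme example we can see that we called a function named from_region on the library's Chunk class

So let's look in anvil\chunk.py

Sure enough, there's something interesting; a snippet: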

class Chunk:
	def __init__(...):
		...
	... # many other functions in between, not listed here
	def get_tile_entity(...):
		...
		return tile_entity
	@classmethod
	def from_region(cls, ...):
		...
		return cls(nbt_data)

from_region is a class method that returns an instance of the class (cls(nbt_data)), so we can go on calling the class's other functions on the return value
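In case the pattern is unfamiliar, here is a tiny standalone example of a class method used as an alternate constructor. Box and from_pairs are invented for this sketch and have nothing to do with the anvil library.

```python
class Box:
    def __init__(self, data: dict):
        self.data = data

    @classmethod
    def from_pairs(cls, pairs):
        # cls is the class itself, so this returns a fully built instance
        return cls(dict(pairs))

    def keys(self):
        return sorted(self.data)


box = Box.from_pairs([("id", "minecraft:chest"), ("Items", [])])
print(type(box).__name__, box.keys())  # Box ['Items', 'id']
```

What comes back is an ordinary instance, so every other method on the class is available on it.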

Among them I found a function called get_tile_entity, with this docstring from the author

"""
Returns the tile entity at given coordinates, or ``None`` if there isn't a tile entity

To iterate through all tile entities in the chunk, use :class:`Chunk.tile_entities`
"""

Let's do what the docstring says and give it a try

if target_chunk.tile_entities:  
    te = target_chunk.tile_entities  
    for k in te:  
        print(k)

Whoa, it really does read something out

Let's look at how the nbt_data returned in this class can be used

Hey, [] indexing is implemented everywhere

That makes it easy: brute-force search again

def region_search(region_file):
    def search_shulker_box_item(shulker_box):
        if "Items" not in shulker_box["tag"]["BlockEntityTag"]:
            return
        for item in shulker_box["tag"]["BlockEntityTag"]["Items"]:
            count_item(item["id"], int(str(item["Count"])))
            print(item["id"], " ", item["Count"])

    target_region = anvil.Region.from_file(region_file)

    for i in range(0, 32):
        for j in range(0, 32):
            chunk = anvil.Chunk.from_region(target_region, i, j)
            if chunk.tile_entities:
                te = chunk.tile_entities
                for k in te:
                    if "Items" not in k:
                        continue
                    for a in k["Items"]:
                        count_item(a["id"], int(str(a["Count"])))
                        # print(a["id"], " ", a["Count"])
                        if str(a["id"]).endswith("shulker_box") and "tag" in a:
                            search_shulker_box_item(a)

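Nice, the core of the script is pretty much written

After a few runs, though, something went wrong: it threw an IndexError

Opening the offending file in an editor, the file itself seems to be a bit broken

Why bother, just except it and move on =￣ω￣=

Before long there was another exception

anvil.errors.ChunkNotFound: Could not find chunk (0, 0)

Interesting. Following the file, I tracked down the cause:

Minecraft generates terrain chunk by chunk, not a whole region at a time

So region files at the edge of the explored area contain only some of their chunks, not all of them

Trying to read a chunk that hasn't been generated raises this exception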
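Which chunks a region file actually holds can also be checked without any NBT library. As far as I understand the region file format, the file starts with a 4096-byte table of 1024 four-byte location entries, stored at index x + z * 32, and an all-zero entry means the chunk was never written. The sketch below relies on that assumption; present_chunks is a made-up helper, not part of the anvil library.

```python
import struct


def present_chunks(region_path):
    """Return the (x, z) indexes whose location entry in the header is non-zero."""
    with open(region_path, "rb") as f:
        header = f.read(4096)  # 1024 location entries, 4 bytes each
    if len(header) < 4096:
        return []  # truncated or empty file: nothing usable
    found = []
    for index in range(1024):
        (entry,) = struct.unpack_from(">I", header, index * 4)
        if entry != 0:  # offset and sector count both zero: chunk not stored
            found.append((index % 32, index // 32))
    return found
```

Looping over present_chunks(path) instead of all 32 by 32 indexes would sidestep the missing chunks, though catching the exception works just as well.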

At long last, the region file search is finally done...

from anvil.errors import ChunkNotFound


def region_search(region_file):
    def search_shulker_box_item(shulker_box):
        if "Items" not in shulker_box["tag"]["BlockEntityTag"]:
            return
        for item in shulker_box["tag"]["BlockEntityTag"]["Items"]:
            count_item(item["id"], int(str(item["Count"])))
            # print(item["id"], " ", item["Count"])

    target_region = anvil.Region.from_file(region_file)

    try:
        for i in range(0, 32):
            for j in range(0, 32):
                try:
                    chunk = anvil.Chunk.from_region(target_region, i, j)
                    if chunk.tile_entities:
                        te = chunk.tile_entities
                        for k in te:
                            if "Items" not in k:
                                continue
                            for a in k["Items"]:
                                count_item(a["id"], int(str(a["Count"])))
                                # print(a["id"], " ", a["Count"])
                                if str(a["id"]).endswith("shulker_box") and "tag" in a:
                                    search_shulker_box_item(a)
                except ChunkNotFound:
                    pass
    except IndexError as e:
        print(f"{e}, when searching region {region_file}")

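Putting it together

The two core modules of the search are done, so it's time to output the data

How should the script be pointed at a world save?

My solution is to run the script from the command line and pass the save location as an argument

Something like this

python mc-item-canner.py -w <world directory> ...

To do that I used argparse, a parser for command-line options

It's easy to use; a quick look at the docs is enough, so I won't waste space on it here

I only need two arguments: the world location and the output location

A partial snippet: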

import argparse

aparser = argparse.ArgumentParser(description=r"path of \world")  
aparser.add_argument('-o', '--outdirectory', default=r'.\\')  
aparser.add_argument('-w', '--worlddirectory', default=r'.\\')  
  
args = aparser.parse_args()  
PATH_OUTDIR = args.outdirectory  
PATH_WORLD = args.worlddirectory

That's all it does:

Parse the world directory and store it in PATH_WORLD
Parse the output directory and store it in PATH_OUTDIR
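For anyone not on Windows, a variant of the same two options using pathlib could look like the sketch below. This is an alternative under my own assumptions, not the script's code: parse_cli is a made-up name, and passing an explicit argument list is only there to make the example easy to try.

```python
import argparse
from pathlib import Path


def parse_cli(argv=None):
    parser = argparse.ArgumentParser(description="count items in a world save")
    # type=Path turns the raw strings into Path objects right away
    parser.add_argument("-w", "--worlddirectory", type=Path, default=Path("."))
    parser.add_argument("-o", "--outdirectory", type=Path, default=Path("."))
    return parser.parse_args(argv)


args = parse_cli(["-w", "my_world"])
print(args.worlddirectory, args.outdirectory)  # my_world .
```

With Path objects, something like args.outdirectory / "results.csv" builds the output path without hard-coding a separator.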

Data output

Dumping completely unprocessed data doesn't really count as data processing

~~Even though all this does is turn a dict into .csv~~

Just call the csv library

import csv

item_record: dict = {} # stores the item record

...
...
	# write out the records
	with open(fr'{PATH_OUTDIR}\results.csv', 'w', encoding='utf8', newline='') as out:  
	    writer = csv.writer(out)  
	    writer.writerow(["item", "count"])  
	    for i in item_record.items():  
	        writer.writerow(list(i))

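Simple, and the item output is done

Why .csv? Because it's convenient!

The raw data needs hardly any processing to become a csv file, and the format can go straight into Excel for further analysis. What's not to like?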
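If the rows should arrive in Excel already ordered, the dict can be sorted on the way out. A small sketch, writing to an in-memory buffer so it runs anywhere; write_counts is a hypothetical helper and the sample numbers are invented.

```python
import csv
import io


def write_counts(record: dict, out) -> None:
    """Write item counts to out, largest count first."""
    writer = csv.writer(out)
    writer.writerow(["item", "count"])
    for item, count in sorted(record.items(), key=lambda kv: kv[1], reverse=True):
        writer.writerow([item, count])


buffer = io.StringIO()
write_counts({"minecraft:dirt": 3, "minecraft:stone": 76}, buffer)
print(buffer.getvalue())
```

For a real file, open it with newline='' as the script does, so the csv module controls the line endings.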

main()

The main function only has three jobs

Collect and filter the data
Process the data
Output

Only the filtering part is a bit of a hassle

def main():
    # check the paths and raise an error if something is wrong
    if not os.path.isdir(PATH_OUTDIR):  
        raise NotADirectoryError("the outdirectory specified is not a directory")  

    # check that the directory really is a world save
    if "level.dat" not in os.listdir(PATH_WORLD):  
        raise FileNotFoundError("file: level.dat cannot be found in the world directory!")  

    # a series of search attempts
    try:  
        print(r"searching \playerdata")  
        for f in os.listdir(fr"{PATH_WORLD}\playerdata"):  
            if f.endswith(".dat"):  
                player_inventory_search(fr"{PATH_WORLD}\playerdata\{f}")  
    except FileNotFoundError:  
        print(r"directory: \playerdata cannot be found in the world directory")  
  
    try:  
        print(r"searching \region")  
        for f in os.listdir(fr"{PATH_WORLD}\region"):  
            if f.endswith(".mca"):  
                region_search(fr"{PATH_WORLD}\region\{f}")  
    except FileNotFoundError:  
        print(r"directory: \region cannot be found in the world directory")  
  
    try:  
        print(r"searching \DIM1\region")  
        for f in os.listdir(fr"{PATH_WORLD}\DIM1\region"):  
            if f.endswith(".mca"):  
                region_search(fr"{PATH_WORLD}\DIM1\region\{f}")  
    except FileNotFoundError:  
        print(r"directory: \DIM1\region cannot be found in the world directory")  
  
    try:  
        print(r"searching \DIM-1\region")  
        for f in os.listdir(fr"{PATH_WORLD}\DIM-1\region"):  
            if f.endswith(".mca"):  
                region_search(fr"{PATH_WORLD}\DIM-1\region\{f}")  
    except FileNotFoundError:  
        print(r"directory: \DIM-1\region cannot be found in the world directory")  

    # write the output file
    with open(fr'{PATH_OUTDIR}\results.csv', 'w', encoding='utf8', newline='') as out:  
        writer = csv.writer(out)  
        writer.writerow(["item", "count"])  
        for i in item_record.items():  
            writer.writerow(list(i))
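The four try blocks in main() differ only in the directory, the file suffix and the function they call, so they could be folded into one helper. The sketch below is one possible shape, not what the script does; scan_dir is a made-up name.

```python
import os


def scan_dir(directory, suffix, handler):
    """Call handler(path) for every file in directory ending with suffix."""
    try:
        names = sorted(os.listdir(directory))
    except FileNotFoundError:
        # Same behaviour as main(): report the missing folder and carry on
        print(f"directory: {directory} cannot be found")
        return 0
    handled = 0
    for name in names:
        if name.endswith(suffix):
            handler(os.path.join(directory, name))
            handled += 1
    return handled
```

main() could then call scan_dir once with player_inventory_search for playerdata and once per region folder with region_search, using os.path.join to build each directory.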

That's all the code

The full file is below, or you can view it directly in my GitHub repository

import nbtlib  
import anvil  
from anvil.errors import ChunkNotFound  
import os  
import argparse  
import csv  
  
  
aparser = argparse.ArgumentParser(description=r"path of \world")  
aparser.add_argument('-o', '--outdirectory', default=r'.\\')  
aparser.add_argument('-w', '--worlddirectory', default=r'.\\')  
  
args = aparser.parse_args()  
PATH_OUTDIR = args.outdirectory  
PATH_WORLD = args.worlddirectory  
  
item_record: dict = {}  
  
  
def count_item(item_id, item_count: int):  
    item_id: str = str(item_id)  
    if item_id not in item_record:  
        item_record[item_id] = 0  
    item_record[item_id] += item_count  
  
  
def region_search(region_file):  
    def search_shulker_box_item(shulker_box):  
        if "Items" not in shulker_box["tag"]["BlockEntityTag"]:  
            return  
        for item in shulker_box["tag"]["BlockEntityTag"]["Items"]:  
            count_item(item["id"], int(str(item["Count"])))  
            # print(item["id"], " ", item["Count"])  
  
    target_region = anvil.Region.from_file(region_file)  
  
    try:  
        for i in range(0, 32):  
            for j in range(0, 32):  
                try:  
                    chunk = anvil.Chunk.from_region(target_region, i, j)  
                    if chunk.tile_entities:  
                        te = chunk.tile_entities  
                        for k in te:  
                            if "Items" not in k:  
                                continue  
                            for a in k["Items"]:  
                                count_item(a["id"], int(str(a["Count"])))  
                                # print(a["id"], " ", a["Count"])  
                                if str(a["id"]).endswith("shulker_box") and "tag" in a:  
                                    search_shulker_box_item(a)  
                except ChunkNotFound:  
                    pass  
    except IndexError as e:  
        print(f"{e}, when searching region {region_file}")  
  
  
def player_inventory_search(data_file):  
    def search_shulker_box_item(shulker_box):  
        if "Items" not in shulker_box["tag"]["BlockEntityTag"]:  
            return  
        for item in shulker_box["tag"]["BlockEntityTag"]["Items"]:  
            count_item(item["id"], int(item["Count"]))  
            # print(item["id"], " ", item["Count"])  
  
    target_file = nbtlib.load(data_file)  
  
    for i in target_file["Inventory"]:  
        count_item(i["id"], int(i["Count"]))  
        # print(i["id"], " ", int(i["Count"]))  
        if str(i["id"]).endswith("shulker_box") and "tag" in i:  
            search_shulker_box_item(i)  
  
    for i in target_file["EnderItems"]:  
        count_item(i["id"], int(i["Count"]))  
        # print(i["id"], " ", int(i["Count"]))  
        if str(i["id"]).endswith("shulker_box") and "tag" in i:  
            search_shulker_box_item(i)  
  
  
def main():  
    if not os.path.isdir(PATH_OUTDIR):  
        raise NotADirectoryError("the outdirectory specified is not a directory")  
  
    if "level.dat" not in os.listdir(PATH_WORLD):  
        raise FileNotFoundError("file: level.dat cannot be found in the world directory!")  
  
    try:  
        print(r"searching \playerdata")  
        for f in os.listdir(fr"{PATH_WORLD}\playerdata"):  
            if f.endswith(".dat"):  
                player_inventory_search(fr"{PATH_WORLD}\playerdata\{f}")  
    except FileNotFoundError:  
        print(r"directory: \playerdata cannot be found in the world directory")  
  
    try:  
        print(r"searching \region")  
        for f in os.listdir(fr"{PATH_WORLD}\region"):  
            if f.endswith(".mca"):  
                region_search(fr"{PATH_WORLD}\region\{f}")  
    except FileNotFoundError:  
        print(r"directory: \region cannot be found in the world directory")  
  
    try:  
        print(r"searching \DIM1\region")  
        for f in os.listdir(fr"{PATH_WORLD}\DIM1\region"):  
            if f.endswith(".mca"):  
                region_search(fr"{PATH_WORLD}\DIM1\region\{f}")  
    except FileNotFoundError:  
        print(r"directory: \DIM1\region cannot be found in the world directory")  
  
    try:  
        print(r"searching \DIM-1\region")  
        for f in os.listdir(fr"{PATH_WORLD}\DIM-1\region"):  
            if f.endswith(".mca"):  
                region_search(fr"{PATH_WORLD}\DIM-1\region\{f}")  
    except FileNotFoundError:  
        print(r"directory: \DIM-1\region cannot be found in the world directory")  
  
    # print(item_record)  
  
    with open(fr'{PATH_OUTDIR}\results.csv', 'w', encoding='utf8', newline='') as out:  
        writer = csv.writer(out)  
        writer.writerow(["item", "count"])  
        for i in item_record.items():  
            writer.writerow(list(i))  
  
  
if __name__ == '__main__':  
    main()

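Wrap-up

Those are my notes on coding my first small project

Clearly the logic is fairly messy, because I didn't start out with a clear goal

That's no big deal for a throwaway script like this, but I think there are better approaches to programming that I can learn from

"Learning must never stop." This is only the beginning, and I hope to do better in future projects ψ(｀∇´)ψ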


https://blissfulalloy79.github.io/07-simplecode01/

Author

BlissfulAlloy79

Published

July 5, 2023
