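Not long ago, China Judgements Online (裁判文书网) was updated yet again. This time, besides changing strToLong, it also encrypted the document IDs in the ListContent response. The list-page response looks like this: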

"[{\"RunEval\":\"w61Zw41uwoJAEH4WwozCh8Odw5DDtAXCiCcfwqHDh8OJwoYYwrTClUPCpVnDqcOJw7jDrgVqKcOIw6hiw5nCpcKLfgkZw4PDrsO8fMOzw43DjMKSwo3Ds23CusOew6wTwp1+w6TCi8KXXMKnwrvCt8OnV8Kdwr0vwrcrwr3DjMOWGxEGISkIwq/DhRNIwoB4wowOwrnCh0QeZFxxKsKhwrYQw6Adw4xCwqB6YAzDhUXDvsOIGsKpwoMYJAwSw4AOwpgAHcOgCcOJISVUAALCjcOww69CRlHCvMKYJcOZbsKfw6vDjyTDj8O0LMKKKVbDhSNEeDjClmrDtXM4SsO6wrbCpEIoFcKKwqDDmsKQw5TDtlluwph3wqrDl8O5w68/w7Ekw4rDpULCg8KCcsKHFBU/ZwHDm8Oqw4HDicOgcsKwM8OcwqzDrlVMTW/DvHodwqpCw53CpcKoA8KBw43DocKKw68aZD9GG8KvRsOYPTVuUcK7WcO3bwYDwqzChsKawrZ7w6vCp03DucKuw6PDqmchwrrCocOnLDhnQV8eNFsRXcK4c8Omw5PCnmNnw6hGw7A+HRbDuMK5wrQ3LMKuw5HCux3DtsKpVMOUfMK+wrbCtMKYw7PDmcO5NMKMGsOHRTAHX8KrwqnCtMOcwqhlM0ccH8KOwrPCsH4fVcO3w6HDmMOzw5gnAMKdwqPCusOPw73DizfClsKYwpvClsOhE8OEwq82HcOSwoDCu8KiH8K8AcKKwpJCRl8=\",\"Count\":\"1\"},{\"裁判要旨段原文\":\"本院认为:原告李雅君申请撤诉符合法律规定。依照《中华人民共和国民事诉讼法》第一百四十五条第一款规定,裁定如下\",\"不公开理由\":\"\",\"案件类型\":\"2\",\"裁判日期\":\"2018-01-24\",\"案件名称\":\"李雅君与庄彪、庄春买卖合同纠纷一审民事裁定书\",\"文书ID\":\"FcOMwrkBA0EIw4DDgMKWw7jCnxAWw6jCvyTCn0MFI8OcEsKxwrxWbsOJLMK7acKDw6EyWXdXwqt8U8KVIcKwwqhtw4dIXxXCqGknwqQsPkLDrmHDg8KsJ8OwRsKBw47CgzbCgSfChsKGwpXCrXzCoDvCulZOw4F4XUx9woNCw7lJw7bCgTIiw4fDkVXClcKow5/DuzPCqgHCqWVow4HCiMKBw61ew43DmcOSAHHDhA5+HxcSw6nCmMK6Z3RAU0oQw5/Cu8K4fg==\",\"审判程序\":\"一审\",\"案号\":\"(2018)内2223民初567号\",\"法院名称\":\"扎赉特旗人民法院\"}]"

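In this response, RunEval and 文书ID (the document ID) both need to be decrypted. Analysis showed that the anti-scraping logic in this update works as follows: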

  1. RunEval is used to release the decryption key
  2. The document ID (文书ID) is decrypted with that key using the AES algorithm
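Before either step can run, the two encrypted fields have to be pulled out of the response. A minimal sketch, assuming the body arrives exactly as shown above (a JSON string that itself contains a JSON array, with the RunEval entry first and the documents after it). The function name `split_list_content` and the shortened sample values are made up for illustration and are not part of the site's API.

```python
import json

# Hypothetical, heavily shortened stand-in for the real response body.
# The "..." values are placeholders, not real ciphertext.
sample_body = json.dumps(json.dumps([
    {"RunEval": "w61Zw41u...", "Count": "1"},
    {"文书ID": "FcOMwrkB...", "案号": "(2018)内2223民初567号"},
], ensure_ascii=False))


def split_list_content(body: str):
    """Return (run_eval, documents) from a ListContent response body."""
    items = json.loads(body)
    # The outer layer is a JSON string, so the array needs a second decode.
    if isinstance(items, str):
        items = json.loads(items)
    header, *documents = items
    return header["RunEval"], documents


run_eval, documents = split_list_content(sample_body)
encrypted_ids = [doc["文书ID"] for doc in documents]
```

`run_eval` is the input to step 1 and each entry of `encrypted_ids` is the input to step 2.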

With the logic sorted out, I wrote test code in Python and ran into two pitfalls.

PKCS padding

This one is easy: pycrypto has no built-in padding algorithm, so you have to write it yourself.

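from Crypto.Cipher import AES

# PKCS#7: append n bytes of value n so the length becomes a multiple of the block size
pkcs7 = {
    'padding': lambda bs: bs + bytes(
        [AES.block_size - len(bs) % AES.block_size] * (AES.block_size - len(bs) % AES.block_size)),
    # the last byte says how many padding bytes to strip
    'unpadding': lambda bs: bs[:-bs[-1]]
}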
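The lambdas above trust their input: `unpadding` strips whatever count the last byte names, even when the plaintext is garbage. A slightly more defensive sketch of the same PKCS#7 scheme, using only the standard library. The names `pkcs7_pad` and `pkcs7_unpad` and the hard-coded block size of 16 are choices made for this illustration.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append n bytes of value n, where n is between 1 and block_size."""
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def pkcs7_unpad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Strip PKCS#7 padding, raising ValueError if it is malformed."""
    if not data or len(data) % block_size:
        raise ValueError("input length is not a positive multiple of the block size")
    n = data[-1]
    # Every one of the last n bytes must equal n.
    if not 1 <= n <= block_size or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS#7 padding")
    return data[:-n]


padded = pkcs7_pad(b"hello")
restored = pkcs7_unpad(padded)
```

Input that is already a multiple of the block size still gains a full block of padding, which is what makes unpadding unambiguous. If PyCryptodome (the maintained fork of pycrypto) is available, its `Crypto.Util.Padding` module offers `pad` and `unpad` that could be used instead.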

Decryption fails when called twice

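When decrypt is called twice on the same AES object, the second call returns a result that does not match what is expected. The test code is as follows: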


from Crypto.Cipher import AES

encrypted_doc_id = "8597B3AD01871DAFE2D56D588EE97B8441C4E35E425090ABF3F8F1DA6E60D80C3060679A1DC7C6393244C559FB3FB1F1D12B29F833B3CE4801834543FE4771F9BFE7B80BB6BBBCA22529249934B6A86474CF0D21C63A3BF922B6FA59C4F905C02A4D158FA85D441C9CD7D47E8627C100"
pkcs7 = {
    'padding': lambda bs: bs + bytes(
        [AES.block_size - len(bs) % AES.block_size] * (AES.block_size - len(bs) % AES.block_size)),
    'unpadding': lambda bs: bs[:-bs[-1]]
}
key = b'30e8c334e68c4bd99c429cbd177bd301'
iv = b'abcd134556abcedf'
mode = AES.MODE_CBC

# A single cipher object shared by every call; this is what triggers the problem.
aes = AES.new(key, mode, iv)


def decrypt_inner(ciphertxt):
    return pkcs7['unpadding'](aes.decrypt(bytes.fromhex(ciphertxt)))


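# First pass: the result is itself a hex string, so it is decrypted once more.
d1 = decrypt_inner(encrypted_doc_id).decode()
print(d1)
# Second pass on the same cipher object: the first 16 bytes come out wrong.
d2 = decrypt_inner(d1)
print(d2)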

Output:

7D35836F31B7113CDE00ADEA826D9BBEA58F5A6B2C6F97EAA381FD5D332428609A7274B4F0F2307591E750552F640F35
b'*\x1cA\x8a\xacXG\x1d\x84\x82\xd1\x7f\x84o\x91^bc-9216-a834012b8c5f'

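The expected result is b'a37a5674-cdca-48bc-9216-a834012b8c5f', but that is not what came out.
Only the first 16 bytes are wrong; the data after them is correct. This shows that the key used for decryption is right and that the problem lies with the initialization vector.
In CBC mode only the first block is XORed with the IV, and the cipher object keeps its chaining state between decrypt calls. The second call therefore starts from the last ciphertext block of the first call instead of the original IV, because pyCrypto does not reset the object's state after a decryption. The fix is to use each AES object for exactly one decryption: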

def decrypt_inner(ciphertxt):
    # A fresh cipher object per call, so decryption always starts from the original IV.
    aes = AES.new(key, mode, iv)
    return pkcs7['unpadding'](aes.decrypt(bytes.fromhex(ciphertxt)))
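Why exactly one block breaks can be reproduced without any crypto library. The sketch below is a toy: `ToyCBC` and its XOR-and-reverse "block cipher" are invented for illustration and are not AES or pyCrypto code. It does follow the same CBC chaining rule, where each decrypted block is XORed with the previous ciphertext block and the object remembers that block between calls.

```python
BLOCK = 16


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    # NOT AES: an insecure but invertible stand-in for a block cipher.
    return xor(block, key)[::-1]


def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    return xor(block[::-1], key)


class ToyCBC:
    """Stateful CBC wrapper: `prev` carries over from one call to the next."""

    def __init__(self, key: bytes, iv: bytes):
        self.key = key
        self.prev = iv

    def encrypt(self, data: bytes) -> bytes:
        out = b""
        for i in range(0, len(data), BLOCK):
            block = toy_encrypt_block(xor(data[i:i + BLOCK], self.prev), self.key)
            out += block
            self.prev = block
        return out

    def decrypt(self, data: bytes) -> bytes:
        out = b""
        for i in range(0, len(data), BLOCK):
            block = data[i:i + BLOCK]
            out += xor(toy_decrypt_block(block, self.key), self.prev)
            self.prev = block  # the next block (or the next call) chains from here
        return out


key = bytes(range(16))
iv = b"0123456789abcdef"
m1 = b"A" * 32
m2 = b"B" * 16 + b"C" * 16

# Each message is encrypted independently, starting from the same IV.
c1 = ToyCBC(key, iv).encrypt(m1)
c2 = ToyCBC(key, iv).encrypt(m2)

# One shared object for both decryptions, as in the failing test code.
shared = ToyCBC(key, iv)
p1 = shared.decrypt(c1)
p2 = shared.decrypt(c2)

# Undoing the wrong chaining value by hand recovers the damaged first block.
repaired_first_block = xor(xor(p2[:BLOCK], c1[-BLOCK:]), iv)
```

Here `p1` matches `m1`, while `p2` differs from `m2` only in its first 16 bytes. That is the same pattern as the garbled output above, and it is why creating a new cipher object for each call resolves it.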

Reference: Stack Overflow, "AES decryption fails when decrypting a second time"