Problem description

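While scraping data from 某信宝, I found that the values of several fields did not match the real values shown on the page. On inspection, every tag in the page source styled with the class qxb-num contained scrambled data.
Tags styled with qxb-num
Inspecting the qxb-num style reveals a custom font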

From this I concluded that the developers had tampered with the font file.

Font rendering and TTF

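After reading the relevant documentation, I summarize the font rendering process as follows:

  1. Look up the glyph name from the character's Unicode code point
  2. Look up the glyph from the glyph name
  3. Draw the character using the glyph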
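
The three steps above amount to two lookups followed by a draw call. The following is a minimal sketch that models them with plain dictionaries instead of a real font. `TOY_CMAP`, `TOY_GLYF`, the glyph names, and the placeholder "outline" strings are all made up for illustration.

```python
# Toy model of the lookup chain: code point -> glyph name -> glyph outline.
# TOY_CMAP and TOY_GLYF are hypothetical stand-ins for the cmap and glyf tables.
TOY_CMAP = {0x30: "zero", 0x31: "one"}
TOY_GLYF = {"zero": "<outline of 0>", "one": "<outline of 1>"}


def render(text):
    """Return the outlines that would be drawn for each character in text."""
    outlines = []
    for ch in text:
        glyph_name = TOY_CMAP[ord(ch)]   # step 1: code point -> glyph name
        outline = TOY_GLYF[glyph_name]   # step 2: glyph name -> glyph data
        outlines.append(outline)         # step 3: "draw" (here: just collect)
    return outlines


print(render("10"))
```

A real renderer rasterizes the outline in step 3. The point here is only that the character seen on screen is decided by the two tables, not by the character code in the page source.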

A TrueType font file consists of a sequence of concatenated tables. A table is a sequence of words. Each table must be long aligned and padded with zeroes if necessary.

A TrueType font file consists of several tables. The two tables needed here are listed below (the tag is the table's name):

tag     table
cmap    character to glyph mapping
glyf    glyph data

Cracking process

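Based on the rendering process, one can guess that there are two ways to implement font encryption:

  1. Shuffle the mapping between character codes and glyph names (the cmap table)
  2. Shuffle the mapping between glyph names and glyph data (the glyf table)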
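
To make the difference between the two approaches concrete, here is a small sketch using toy dictionaries. All table contents and the `drawn_shape` helper are hypothetical; a real font stores binary outlines rather than strings.

```python
# Hypothetical toy tables for an unscrambled font.
HONEST_CMAP = {"0": "zero", "1": "one"}
HONEST_GLYF = {"zero": "shape-0", "one": "shape-1"}

# Approach 1: shuffle cmap, leave glyf alone.
CMAP_SHUFFLED = {"0": "one", "1": "zero"}
# Approach 2: leave cmap alone, swap the outlines stored under each glyph name.
GLYF_SHUFFLED = {"zero": "shape-1", "one": "shape-0"}


def drawn_shape(ch, cmap, glyf):
    """Return the shape that ends up on screen for source character ch."""
    return glyf[cmap[ch]]


# Either way, the source character "0" is displayed with the shape of "1".
print(drawn_shape("0", CMAP_SHUFFLED, HONEST_GLYF))
print(drawn_shape("0", HONEST_CMAP, GLYF_SHUFFLED))
```

Both variants look identical to a visitor. They differ only in which table a scraper has to inspect to undo the scramble.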

Using fonttools, the following code converts the font to XML:

from fontTools.ttLib import TTFont
from io import BytesIO
import requests

font_content = requests.get('https://cache.qixin.com/pcweb/font-awesome-qxb-1bd55e43.woff2').content
font_file = BytesIO(font_content)
font = TTFont(font_file)
font.saveXML('font.xml')

Looking at the generated XML file, part of the cmap node reads:

<map code="0x30" name="icon-number_9"/><!-- DIGIT ZERO -->
<map code="0x31" name="icon-number_3"/><!-- DIGIT ONE -->
<map code="0x32" name="icon-number_7"/><!-- DIGIT TWO -->
<map code="0x33" name="icon-number_2"/><!-- DIGIT THREE -->
<map code="0x34" name="icon-number_0"/><!-- DIGIT FOUR -->
<map code="0x35" name="icon-number_1"/><!-- DIGIT FIVE -->
<map code="0x36" name="icon-number_5"/><!-- DIGIT SIX -->
<map code="0x37" name="icon-number_8"/><!-- DIGIT SEVEN -->
<map code="0x38" name="icon-number_6"/><!-- DIGIT EIGHT -->
<map code="0x39" name="icon-number_4"/><!-- DIGIT NINE -->
<map code="0x41" name="icon-upper_S"/><!-- LATIN CAPITAL LETTER A -->
<map code="0x42" name="icon-upper_Q"/><!-- LATIN CAPITAL LETTER B -->
<map code="0x43" name="icon-upper_K"/><!-- LATIN CAPITAL LETTER C -->

From this it can be concluded that this font implements its font encryption by shuffling the cmap table.
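
The scrambled mapping can also be read straight out of the XML dump. The sketch below parses a few of the `<map>` entries shown above with the standard library. The `cmap_fragment` wrapper element and the `build_table` helper are assumptions added so the example runs on its own, and taking the last character of the glyph name as the real character is a convention that holds only for glyph names shaped like the ones in this dump.

```python
import xml.etree.ElementTree as ET

# A few entries copied from the dump above, wrapped in a hypothetical root
# element so that the fragment parses by itself.
FRAGMENT = """
<cmap_fragment>
  <map code="0x30" name="icon-number_9"/>
  <map code="0x31" name="icon-number_3"/>
  <map code="0x41" name="icon-upper_S"/>
</cmap_fragment>
"""


def build_table(xml_text):
    """Map each source character to the last character of its glyph name."""
    table = {}
    for node in ET.fromstring(xml_text).iter("map"):
        source_char = chr(int(node.get("code"), 16))
        table[source_char] = node.get("name")[-1]
    return table


table = build_table(FRAGMENT)
print(table)
# Characters without an entry are passed through unchanged.
print("".join(table.get(ch, ch) for ch in "A01-x"))
```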

Cracking

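For a shuffled cmap, it is enough to find the glyph name that each character maps to, because each glyph name here ends with the real character (for example, icon-number_9).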

"""
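企信宝 font encryption: countermeasure for a shuffled cmap table
"""
import string

from fontTools.ttLib import TTFont

# Characters protected by the font. The original snippet uses PLAIN_CHARS
# without defining it; this definition (ASCII letters and digits) is an assumption.
PLAIN_CHARS = string.ascii_letters + string.digits


def decrypt(ss, font_file):
    """
    Decrypt a string (or a list of strings) using the font file
    :param ss: str or list of str
    :param font_file: file like object or file path
    :return: decrypted str, or list of decrypted str
    """
    with TTFont(font_file) as font:
        # code point -> glyph name, e.g. 0x30 -> 'icon-number_9'
        cmap = font['cmap'].getBestCmap()

    def _decrypt(s):
        predict = ''
        for c in s:
            if c in PLAIN_CHARS:
                # the last character of the glyph name is the real character
                predict += cmap[ord(c)][-1]
            else:
                predict += c
        return predict

    if isinstance(ss, str):
        return _decrypt(ss)
    else:
        return [_decrypt(s) for s in ss]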

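For a shuffled glyf table, first label the glyph data of a known font with the real characters. The real character for any glyph in a new font can then be found by comparing glyph data.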

"""
企信宝 font encryption: countermeasure for a shuffled glyf table
"""

import os.path
import string

from fontTools.ttLib import TTFont


def _get_glyph_name(c: str, font):
    # code point -> glyph name, via the font's preferred cmap subtable
    return font.getBestCmap()[ord(c)]


PLAIN_BOOK = string.ascii_uppercase + string.ascii_lowercase + '1234567890'


def _load_refer_glyph_data():
    """
    Load the known reference font and build a dict from glyph data to real character
    :return: {glyph data (bytes): real character}
    """
    font_file = os.path.join(os.path.dirname(__file__), 'font-awesome-qxb-5ffe2d46.woff2')
    with TTFont(font_file) as font:
        # Despite its name, cipherbook holds the real character that each
        # character of PLAIN_BOOK is rendered as in the reference font.
        cipherbook = 'XSQRTWFCZHDIN' \
                     'LAKEUGBMPOVJY' \
                     'rpfcbtdnajuhg' \
                     'zyikxovqleswm' \
                     '7658419203'
        glyphset = font['glyf'].glyphs
        return {glyphset[_get_glyph_name(p, font)].data: c for p, c in zip(PLAIN_BOOK, cipherbook)}


refer_glyph_data = _load_refer_glyph_data()


def decrypt(ss, font_file):
    """
    Decrypt a string (or a list of strings) using the font file
    :param ss: str or list of str
    :param font_file: file like object or file path
    :return: decrypted str, or list of decrypted str
    """
    font = TTFont(font_file)
    glyphs = font['glyf'].glyphs

    def _decrypt(s):
        predict = ''
        for c in s:
            if c in PLAIN_BOOK:
                # find this font's glyph data for c, then look up which real
                # character carries the same data in the reference font
                glyph_data = glyphs[_get_glyph_name(c, font)].data
                predict += refer_glyph_data[glyph_data]
            else:
                predict += c
        return predict

    if isinstance(ss, str):
        return _decrypt(ss)
    else:
        return [_decrypt(s) for s in ss]


def _test():
    cipher = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890'
    plain = 'XSQRTWFCZHDINLAKEUGBMPOVJYrpfcbtdnajuhgzyikxovqleswm7658419203'
    predict = decrypt(cipher, os.path.join(os.path.dirname(__file__), 'font-awesome-qxb-5ffe2d46.woff2'))
    print(predict)
    print(predict == plain)


if __name__ == '__main__':
    _test()

Summary

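Because the site's font only protects English letters and digits, OCR would also work. However, OCR requires rendering the page in a browser, which makes scraping slow and cannot be deployed on a server.
In addition, the font changes once a week. If it never changed, a simple mapping between plaintext and ciphertext would be all that is needed.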
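
For the static case, such a mapping could be a fixed translation table. This is a minimal sketch, assuming the character pairs from the `_test` function above still describe the font in use (they stop being valid as soon as the font is regenerated). `CIPHER`, `PLAIN`, `TABLE`, and `decode` are names chosen for this example.

```python
# Characters as they appear in the page source, and what each one renders as.
# Both strings are copied from the _test example; they match one specific font only.
CIPHER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890'
PLAIN = 'XSQRTWFCZHDINLAKEUGBMPOVJYrpfcbtdnajuhgzyikxovqleswm7658419203'

# str.maketrans pairs the two strings position by position.
TABLE = str.maketrans(CIPHER, PLAIN)


def decode(text):
    """Translate protected characters; everything else passes through unchanged."""
    return text.translate(TABLE)


print(decode("A1-b2"))
```

Since the font does rotate, a table like this would have to be rebuilt from each new font file, which is what the fontTools-based approaches above automate.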