PHP Big Integers, Seen Through the Source

People in a chat group were discussing big integers in PHP, so I ran a test of my own and wrote down what I found.

Test demo:

<?php
$bigInt = 19897654567894567890;
var_dump($bigInt);

The result: $bigInt was converted to a double and printed in scientific notation.

double(1.9897654567895E+19)
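
The printed value carries 14 significant digits, which is what C's printf gives with %.14G. The sketch below only reproduces that formatting; reading the 14 as PHP's `precision` setting is an assumption on my part, and the function name is made up for illustration.

```c
#include <stdio.h>

/* Format a double with 14 significant digits using printf's %G.
 * Assumption: 14 mirrors PHP's default `precision` value. */
int format_like_precision_14(double value, char *buf, size_t size)
{
    /* %G switches to exponent notation when the exponent is >= the precision */
    return snprintf(buf, size, "%.14G", value);
}
```

Called with 19897654567894567890.0, it fills the buffer with 1.9897654567895E+19, the same digits var_dump printed.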

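Thoughts:
Was the integer converted to a double at compile time and only then formatted in scientific notation?
Let's debug it with gdb. Setting a breakpoint gives the following opcodes: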

(gdb) p *(op_array.opcodes)@40
{
handler = 0x89739e <execute_ex+2334>,
op1 = {
constant = 80,
var = 80,
num = 80,
opline_num = 80,
jmp_offset = 80
},
op2 = {
constant = 16,
var = 16,
num = 16,
opline_num = 16,
jmp_offset = 16
},
result = {
constant = 0,
var = 0,
num = 0,
opline_num = 0,
jmp_offset = 0
},
extended_value = 0,
lineno = 4,
opcode = 38 '&',
op1_type = 16 '\020',
op2_type = 1 '\001',
result_type = 8 '\b'
},
{
handler = 0x89739e <execute_ex+2334>,
op1 = {
constant = 96,
var = 96,
num = 96,
opline_num = 96,
jmp_offset = 96
},
op2 = {
constant = 32,
var = 32,
num = 32,
opline_num = 32,
jmp_offset = 32
},
result = {
constant = 0,
var = 0,
num = 0,
opline_num = 0,
jmp_offset = 0
},
extended_value = 1,
lineno = 5,
opcode = 61 '=',
op1_type = 8 '\b',
op2_type = 1 '\001',
result_type = 8 '\b'
},
{
handler = 0x89739e <execute_ex+2334>,
op1 = {
constant = 80,
var = 80,
num = 80,
opline_num = 80,
jmp_offset = 80
},
op2 = {
constant = 1,
var = 1,
num = 1,
opline_num = 1,
jmp_offset = 1
},
result = {
constant = 80,
var = 80,
num = 80,
opline_num = 80,
jmp_offset = 80
},
extended_value = 0,
lineno = 5,
opcode = 117 'u',
op1_type = 16 '\020',
op2_type = 8 '\b',
result_type = 8 '\b'
},
{
handler = 0x89739e <execute_ex+2334>,
op1 = {
constant = 0,
var = 0,
num = 0,
opline_num = 0,
jmp_offset = 0
},
op2 = {
constant = 0,
var = 0,
num = 0,
opline_num = 0,
jmp_offset = 0
},
result = {
constant = 1,
var = 1,
num = 1,
opline_num = 1,
jmp_offset = 1
},
extended_value = 0,
lineno = 5,
opcode = 60 '<',
op1_type = 8 '\b',
op2_type = 8 '\b',
result_type = 8 '\b'
},
{
handler = 0x89739e <execute_ex+2334>,
op1 = {
constant = 48,
var = 48,
num = 48,
opline_num = 48,
jmp_offset = 48
},
op2 = {
constant = 0,
var = 0,
num = 0,
opline_num = 0,
jmp_offset = 0
},
result = {
constant = 0,
var = 0,
num = 0,
opline_num = 0,
jmp_offset = 0
},
extended_value = 4294967295,
lineno = 6,
opcode = 62 '>',
op1_type = 1 '\001',
op2_type = 8 '\b',
result_type = 8 '\b'
}

A rough picture of how the opcodes execute:

101 'e' EXT_STMT          extension statement
40  '(' ECHO              echo
101 'e' EXT_STMT          extension statement
38  '&' ASSIGN            assignment (=)
101 'e' EXT_STMT          extension statement
61  '=' INIT_FCALL        set up a function call (61 was DO_FCALL_BY_NAME only in PHP 5)
102 'f' EXT_FCALL_BEGIN   extension function call begin
117 'u' SEND_VAR          pass a variable as an argument
60  '<' DO_FCALL          perform the function call
103 'g' EXT_FCALL_END     extension function call end
62  '>' RETURN            return

Simplified:

38 (ASSIGN) => 61 (INIT_FCALL) => 117 (SEND_VAR) => 60 (DO_FCALL) => 62 (RETURN)
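
To make the gdb dump easier to read, here is a small lookup sketch for the numbers that appear in it. The values follow my reading of PHP 7.2's Zend/zend_vm_opcodes.h and Zend/zend_compile.h (61 is ZEND_INIT_FCALL and 117 is ZEND_SEND_VAR there); the two function names are hypothetical helpers, not engine APIs.

```c
/* Opcode numbers as listed in PHP 7.2's Zend/zend_vm_opcodes.h.
 * Only the opcodes that show up in this dump are covered. */
const char *opcode_name(unsigned char opcode)
{
    switch (opcode) {
    case 38:  return "ZEND_ASSIGN";
    case 40:  return "ZEND_ECHO";
    case 60:  return "ZEND_DO_FCALL";
    case 61:  return "ZEND_INIT_FCALL";
    case 62:  return "ZEND_RETURN";
    case 101: return "ZEND_EXT_STMT";
    case 102: return "ZEND_EXT_FCALL_BEGIN";
    case 103: return "ZEND_EXT_FCALL_END";
    case 117: return "ZEND_SEND_VAR";
    default:  return "UNKNOWN";
    }
}

/* Operand type flags: the op1_type / op2_type / result_type fields. */
const char *operand_type_name(unsigned char type)
{
    switch (type) {
    case 1:  return "IS_CONST";    /* literal from the constant table */
    case 2:  return "IS_TMP_VAR";  /* temporary value */
    case 4:  return "IS_VAR";      /* engine-internal variable */
    case 8:  return "IS_UNUSED";   /* operand not used */
    case 16: return "IS_CV";       /* compiled variable such as $bigInt */
    default: return "UNKNOWN";
    }
}
```

Read this way, the first dumped entry (opcode 38, op1_type 16, op2_type 1) is an ASSIGN from a constant into a compiled variable, so the operand already comes from the literal table rather than from any conversion opcode.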

The gdb session so far shows nothing related to a conversion to double. Let's print op_array and see whether it holds anything else:

$17 = {
type = 2 '\002',
arg_flags = "\000\000",
fn_flags = 134217728,
function_name = 0x0,
scope = 0x0,
prototype = 0x0,
num_args = 0,
required_num_args = 0,
arg_info = 0x0,
refcount = 0x7ffff207b000,
last = 11,
opcodes = 0x7ffff208c000,
last_var = 1,
T = 2,
vars = 0x7ffff207b008,
last_live_range = 0,
last_try_catch = 0,
live_range = 0x0,
try_catch_array = 0x0,
static_variables = 0x0,
filename = 0x7ffff205e230,
line_start = 1,
line_end = 6,
doc_comment = 0x0,
early_binding = 4294967295,
last_literal = 4,
literals = 0x7ffff207a040,
cache_size = 8,
run_time_cache = 0x0,
reserved = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
}

literals is clearly an address, so let's print what it points to. One of the entries looks like this:

{
value = {
lval = 4895767021146598541,
dval = 1.9897654567894569e+19,
counted = 0x43f1422ab683ec8d,
str = 0x43f1422ab683ec8d,
arr = 0x43f1422ab683ec8d,
obj = 0x43f1422ab683ec8d,
res = 0x43f1422ab683ec8d,
ref = 0x43f1422ab683ec8d,
ast = 0x43f1422ab683ec8d,
zv = 0x43f1422ab683ec8d,
ptr = 0x43f1422ab683ec8d,
ce = 0x43f1422ab683ec8d,
func = 0x43f1422ab683ec8d,
ww = {
w1 = 3062099085,
w2 = 1139884586
}
},
u1 = {
v = {
type = 5 '\005',
type_flags = 0 '\000',
const_flags = 0 '\000',
reserved = 0 '\000'
},
type_info = 5
},
u2 = {
next = 4294967295,
cache_slot = 4294967295,
lineno = 4294967295,
num_args = 4294967295,
fe_pos = 4294967295,
fe_iter_idx = 4294967295,
access_flags = 4294967295,
property_guard = 4294967295,
extra = 4294967295
}
}
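
Every member of value in this dump shows the same 8 bytes, because value is a union; only the type tag (5, which is IS_DOUBLE in PHP 7, while IS_LONG is 4) says which member is meaningful. The sketch below reinterprets a double's bytes as a 64-bit integer to show where the odd-looking lval comes from. The function and enum names are hypothetical, and it assumes an 8-byte IEEE 754 double.

```c
#include <stdint.h>
#include <string.h>

/* Type tags with the values PHP 7 uses in Zend/zend_types.h
 * (names here are illustrative stand-ins). */
enum { TAG_IS_LONG = 4, TAG_IS_DOUBLE = 5 };

/* Reinterpret the 8 bytes of a double as a signed 64-bit integer.
 * This is what gdb shows as `lval` when the zval really holds a double. */
int64_t double_bits_as_long(double d)
{
    int64_t bits;
    memcpy(&bits, &d, sizeof bits);  /* byte copy, no numeric conversion */
    return bits;
}
```

Passing 19897654567894567890.0 returns 4895767021146598541, which is 0x43f1422ab683ec8d in hex: exactly the lval and pointer values in the dump. The lval is therefore not an integer result at all, just the bit pattern of the double.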

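The dump shows dval = 1.9897654567894569e+19. So the conversion had already happened before execute?
op_array has nothing more to offer, so I restarted the gdb session. The call stack looks like this: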

bt
#0 d2b (d=d@entry=0x7fffffffa020, e=e@entry=0x7fffffffa010, bits=bits@entry=0x7fffffffa014) at /usr/local/src/php-7.2.4/Zend/zend_strtod.c:1323
#1 0x000000000081e9e2 in zend_strtod (s00=<optimized out>, se=se@entry=0x7fffffffa0a8) at /usr/local/src/php-7.2.4/Zend/zend_strtod.c:3037
#2 0x00000000007c85f7 in lex_scan (zendlval=zendlval@entry=0x7fffffffa0f0) at Zend/zend_language_scanner.l:1668
#3 0x00000000007d9336 in zendlex (elem=elem@entry=0x7fffffffa1c0) at /usr/local/src/php-7.2.4/Zend/zend_compile.c:1721
#4 0x00000000007bd44e in zendparse () at /usr/local/src/php-7.2.4/Zend/zend_language_parser.c:4227
#5 0x00000000007bfd57 in zend_compile (type=type@entry=2) at Zend/zend_language_scanner.l:585
#6 0x00000000007c11e3 in compile_file (file_handle=0x7fffffffd140, type=8) at Zend/zend_language_scanner.l:635
#7 0x0000000000680428 in phar_compile_file (file_handle=<optimized out>, type=<optimized out>) at /usr/local/src/php-7.2.4/ext/phar/phar.c:3320
#8 0x00007fffeb8a30dd in xdebug_compile_file (file_handle=<optimized out>, type=<optimized out>) at /usr/local/src/xdebug-2.6.0/xdebug.c:2072
#9 0x00000000007fa47d in zend_execute_scripts (type=type@entry=8, retval=retval@entry=0x0, file_count=file_count@entry=3) at /usr/local/src/php-7.2.4/Zend/zend.c:1490
#10 0x000000000079a260 in php_execute_script (primary_file=primary_file@entry=0x7fffffffd140) at /usr/local/src/php-7.2.4/main/main.c:2590
#11 0x00000000008a1b35 in do_cli (argc=3, argv=0xeb3bd0) at /usr/local/src/php-7.2.4/sapi/cli/php_cli.c:1011
#12 0x000000000043ee2f in main (argc=3, argv=0xeb3bd0) at /usr/local/src/php-7.2.4/sapi/cli/php_cli.c:1404

The relevant code finally turns up in zend_language_scanner.
Look closely at these two frames of the call stack:

#1  0x000000000081e9e2 in zend_strtod (s00=<optimized out>, se=se@entry=0x7fffffffa0a8) at /usr/local/src/php-7.2.4/Zend/zend_strtod.c:3037
#2 0x00000000007c85f7 in lex_scan (zendlval=zendlval@entry=0x7fffffffa0f0) at Zend/zend_language_scanner.l:1668

Open Zend/zend_language_scanner.l at line 1668:

<ST_IN_SCRIPTING>{LNUM} {
    char *end;
    if (yyleng < MAX_LENGTH_OF_LONG - 1) { /* Won't overflow */
        errno = 0;
        ZVAL_LONG(zendlval, ZEND_STRTOL(yytext, &end, 0));
        /* This isn't an assert, we need to ensure 019 isn't valid octal
         * Because the lexing itself doesn't do that for us
         */
        if (end != yytext + yyleng) {
            zend_throw_exception(zend_ce_parse_error, "Invalid numeric literal", 0);
            ZVAL_UNDEF(zendlval);
            RETURN_TOKEN(T_LNUMBER);
        }
    } else {
        errno = 0;
        ZVAL_LONG(zendlval, ZEND_STRTOL(yytext, &end, 0));
        if (errno == ERANGE) { /* Overflow */
            errno = 0;
            if (yytext[0] == '0') { /* octal overflow */
                ZVAL_DOUBLE(zendlval, zend_oct_strtod(yytext, (const char **)&end));
            } else {
                ZVAL_DOUBLE(zendlval, zend_strtod(yytext, (const char **)&end));
            }
            /* Also not an assert for the same reason */
            if (end != yytext + yyleng) {
                zend_throw_exception(zend_ce_parse_error,
                    "Invalid numeric literal", 0);
                ZVAL_UNDEF(zendlval);
                RETURN_TOKEN(T_DNUMBER);
            }
            RETURN_TOKEN(T_DNUMBER);
        }
        /* Also not an assert for the same reason */
        if (end != yytext + yyleng) {
            zend_throw_exception(zend_ce_parse_error, "Invalid numeric literal", 0);
            ZVAL_UNDEF(zendlval);
            RETURN_TOKEN(T_DNUMBER);
        }
    }
    ZEND_ASSERT(!errno);
    RETURN_TOKEN(T_LNUMBER);
}

The constants are defined in Zend/zend_long.h:

# define MAX_LENGTH_OF_LONG 20
# define LONG_MIN_DIGITS "9223372036854775808"

Putting the pieces together, take another look at line 1668:

ZVAL_DOUBLE(zendlval, zend_strtod(yytext, (const char **)&end));

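Reading the code, the logic is not hard to follow:

When lex_scan lexes an integer literal, the Zend engine first checks its length. A literal shorter than MAX_LENGTH_OF_LONG - 1 characters cannot overflow and is stored directly as a long. A longer one might overflow, so the engine still tries ZEND_STRTOL first; if that sets errno to ERANGE, the text is reparsed with zend_strtod (zend_oct_strtod for octal literals), stored in a zval of type double, and returned as a T_DNUMBER token. The 20-digit literal in the demo takes this last path, so it is already a double when compilation finishes, before any opcode runs.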
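
The decision can be sketched in plain C as follows. This is a simplified stand-in, not the engine's code: it uses libc's strtoll and strtod in place of ZEND_STRTOL and zend_strtod, handles decimal literals only (no octal or hex branch), and assumes a 64-bit long long. The type and function names are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LENGTH_OF_LONG 20   /* the value quoted from zend_long.h */

typedef enum { LIT_INVALID, LIT_LONG, LIT_DOUBLE } lit_kind;

typedef struct {
    lit_kind kind;
    long long lval;   /* meaningful when kind == LIT_LONG */
    double dval;      /* meaningful when kind == LIT_DOUBLE */
} literal;

/* Classify a decimal integer literal roughly the way the {LNUM} rule does.
 * The caller is expected to pass digits only, as the lexer's regex would. */
literal scan_decimal_literal(const char *text)
{
    literal out = { LIT_INVALID, 0, 0.0 };
    size_t len = strlen(text);
    char *end;

    if (len < MAX_LENGTH_OF_LONG - 1) {
        /* Fewer than 19 digits: always fits in a signed 64-bit integer. */
        out.lval = strtoll(text, &end, 10);
        out.kind = LIT_LONG;
    } else {
        errno = 0;
        out.lval = strtoll(text, &end, 10);
        if (errno == ERANGE) {
            /* Overflow: reparse the same text as a double. */
            out.dval = strtod(text, &end);
            out.kind = LIT_DOUBLE;
        } else {
            out.kind = LIT_LONG;
        }
    }
    if (end != text + len) {
        out.kind = LIT_INVALID;  /* something was left unparsed */
    }
    return out;
}
```

With this sketch, "9223372036854775807" still comes back as LIT_LONG, while "9223372036854775808" and the demo's "19897654567894567890" come back as LIT_DOUBLE, which mirrors the T_LNUMBER / T_DNUMBER split in the lexer.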
