6.1 Assembler

P5 review

image-20250415234251717

image-20250415234356010

What we need: an Assembler

  • Assembler: a piece of software that translates assembly language into machine language (the first software layer above the hardware).

image-20250416121246454

A fun way to look at Assembler

  • We already have our first computer, but writing machine language for it by hand is difficult.
  • So we can build a second "computer" (actually a piece of software), called the Assembler, which translates assembly language into machine language.

image-20250416121549820

Basic Assembler Logic

basic

image-20250416121915773

image-20250416122001311

image-20250416122058367

image-20250416122146818

image-20250416122255261

image-20250416122332192

other
  • We still have something to deal with: symbols.
  • There are two kinds of symbols: labels and variables.
  • We need to replace their names with addresses.
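
A minimal sketch of the label/variable distinction, assuming the Hack conventions used later in these notes: a label is declared with `(NAME)`, and a variable is any other non-numeric `@NAME`. The helper name `classify_symbols` and the sample program are made up for illustration, and predefined symbols such as `R0` or `SCREEN` are ignored here for brevity.

```python
def classify_symbols(lines):
    """Split the symbols of a small Hack program into labels and variables."""
    # Labels are the names declared with (NAME).
    labels = {line[1:-1] for line in lines if line.startswith('(')}
    variables = []
    for line in lines:
        if line.startswith('@'):
            name = line[1:]
            # Anything after '@' that is neither a number nor a label is a variable.
            if not name.isdigit() and name not in labels and name not in variables:
                variables.append(name)
    return labels, variables


program = ['@i', 'M=1', '(LOOP)', '@i', 'D=M', '@END', 'D;JGT', '@LOOP', '0;JMP', '(END)']
labels, variables = classify_symbols(program)
print(sorted(labels))  # ['END', 'LOOP']
print(variables)       # ['i']
```

Labels end up pointing at instruction (ROM) addresses, while variables get data (RAM) addresses, which is why the two kinds have to be told apart before any name is replaced.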

image-20250416122505756

image-20250416125136494

last problems
  • Sometimes a label is used before it is defined.
  • I like the second solution.
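
One common way to handle a label that is used before its definition is to scan the program twice, which is also what the code in 6.7 does. Whether this is exactly the "second solution" from the slide is an assumption. The function names `resolve_labels` and `substitute` and the sample program are hypothetical.

```python
def resolve_labels(lines):
    """First pass: map each label to the address of the next real instruction."""
    table = {}
    address = 0
    for line in lines:
        if line.startswith('('):
            table[line[1:-1]] = address  # a label line occupies no ROM address
        else:
            address += 1
    return table


def substitute(lines, table):
    """Second pass: drop label lines and replace @LABEL with @address."""
    out = []
    for line in lines:
        if line.startswith('('):
            continue
        if line.startswith('@') and line[1:] in table:
            line = '@' + str(table[line[1:]])
        out.append(line)
    return out


# '@END' on the first line refers to a label that is only defined at the end.
program = ['@END', '0;JMP', '(LOOP)', '@LOOP', '0;JMP', '(END)', '@END', '0;JMP']
table = resolve_labels(program)
print(table)                       # {'LOOP': 2, 'END': 4}
print(substitute(program, table))  # ['@4', '0;JMP', '@2', '0;JMP', '@4', '0;JMP']
```

Because every label is already in the table when the second pass starts, a forward reference needs no special handling.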

image-20250416125537937

image-20250416123427577

image-20250416123436546

image-20250416123442487


6.2 The translator’s challenge ( preparation )

  • We will soon actually build our Assembler.
  • To start with, we need to know the challenges we may face.

image-20250416125952491

image-20250416130030648

  • So we need to know the syntax of the Hack language.

image-20250416130212289

image-20250416130224972

  • Those are all the challenges.

image-20250416130423237

image-20250416130432295

  • Let’s deal with them one by one.
  • Symbols are difficult, so we will deal with them later. (A mountain can be easier to climb from the other side than from where you start.)

image-20250416130543505

image-20250416130650303

image-20250416130754099

  • The plan ahead

image-20250416130859437


6.3 Translating challenge 1: A/C instructions

image-20250416134201668

image-20250416134739093

image-20250416135103314
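
As a worked example of the translation, here is a small sketch with partial lookup tables. Only the mnemonics used below are included; the full tables are in the Code class in 6.7. The names `encode_a` and `encode_c` are made up for this illustration.

```python
# Partial lookup tables: only the mnemonics used in this example.
COMP = {'D+1': '0011111', 'M+1': '1110111', '0': '0101010'}
DEST = {'': '000', 'M': '001', 'D': '010'}
JUMP = {'': '000', 'JGT': '001', 'JMP': '111'}


def encode_a(value):
    """A-instruction: a leading 0 followed by the 15-bit value."""
    return format(value, '016b')


def encode_c(dest, comp, jump):
    """C-instruction: 111, then comp (7 bits), dest (3 bits), jump (3 bits)."""
    return '111' + COMP[comp] + DEST[dest] + JUMP[jump]


print(encode_a(21))              # @21    -> 0000000000010101
print(encode_c('D', 'M+1', ''))  # D=M+1  -> 1111110111010000
print(encode_c('', '0', 'JMP'))  # 0;JMP  -> 1110101010000111
```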


6.4 Translating challenge 2: Instructions with symbols

image-20250416135923947

image-20250416140121297

image-20250416140447481

image-20250416140743441

image-20250416141031541

image-20250416141210069

image-20250416141338475

image-20250416141555469

image-20250416141914934


6.5 Use Java to build Assembler

Three steps

  • Step 1: parse the commands
    • Read the file command by command and determine what kind of command each one is

image-20250416144431385

  • Step 2:

image-20250416142427353

  • Step 3: handle the symbols

image-20250416142441071

Step 1

  • The only thing we need to understand is the format of the input language and how it breaks down into its components.

image-20250416142533923

image-20250416142721990

image-20250416142818023

image-20250416143024166
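
To make "breaking down into components" concrete, here is a minimal sketch that splits a C-instruction of the form `dest=comp;jump` into its three fields, where `dest` and `jump` are optional. The function name `split_c_instruction` is hypothetical, and the input is assumed to be already stripped of whitespace and comments.

```python
def split_c_instruction(text):
    """Break 'dest=comp;jump' into its three fields; dest and jump are optional."""
    dest, eq, rest = text.partition('=')
    if not eq:  # no '=' means there is no dest field
        dest, rest = '', text
    comp, _, jump = rest.partition(';')
    return dest, comp, jump


print(split_c_instruction('D=M+1'))      # ('D', 'M+1', '')
print(split_c_instruction('0;JMP'))      # ('', '0', 'JMP')
print(split_c_instruction('D=D-1;JGT'))  # ('D', 'D-1', 'JGT')
```

Using `str.partition` keeps the case with both `=` and `;` correct, since the dest part is removed before the jump part is looked for.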

Step 2

image-20250416143153847

image-20250416143320447

Step 3

  • The only thing to do: keep track of the association between symbols and memory addresses.

image-20250416143506189

image-20250416143618139

image-20250416143741218

image-20250416143841714
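
A plain dictionary is enough to hold this association. The sketch below starts from a few predefined Hack symbols (the full set is in the SymbolTable class in 6.7) and hands out RAM addresses to new variables starting at 16. The names `address_of` and `next_free` are made up for this illustration.

```python
# A few predefined Hack symbols; not the complete set.
table = {'SP': 0, 'R0': 0, 'R1': 1, 'SCREEN': 16384, 'KBD': 24576}
next_free = 16  # variables are allocated from RAM address 16 upward


def address_of(symbol):
    """Return the address of a symbol, allocating a new variable if needed."""
    global next_free
    if symbol not in table:
        table[symbol] = next_free
        next_free += 1
    return table[symbol]


print(address_of('SCREEN'))  # 16384
print(address_of('i'))       # 16
print(address_of('sum'))     # 17
print(address_of('i'))       # 16 (already in the table, so no new address)
```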

Overall

image-20250416144043853


6.6 Proj 6 Overview

image-20250416145841739

image-20250416150048896

image-20250416150329421

image-20250416150526237

image-20250416150630531

image-20250416150707107

image-20250416151522621

image-20250416151656158

6.7 code

Parser

class Parser:
    """
    Encapsulates access to the input code. Reads an assembly language command,
    parses it, and provides convenient access to the command’s components
    (fields and symbols). In addition, removes all white space and comments.
    """

    def __init__(self, in_file):
        self.file_name = in_file
        self.prog_text = self.read_asm(in_file)
        self.prog_num = len(self.prog_text)
        self.cmd_next = 0
        self.cmd_text = ''

    def read_asm(self, input_file):
        """
        Read .asm file from disk, return list of strings for commands only
        """
        prog_text = []
        with open(input_file) as fp:
            for line in fp:
                # Drop the comment (if any), then remove all whitespace
                line = ''.join(line.split('//')[0].split())
                if line:  # skip blank and comment-only lines
                    prog_text.append(line)
        return prog_text

    def reset_read(self):
        """
        Reset reading for second pass
        """
        self.cmd_next = 0
        self.cmd_text = ''

    def has_more_commands(self):
        """
        Are there more commands in the input?
        """
        return self.cmd_next < self.prog_num

    def advance(self):
        """
        Reads the next command from the input and makes it the current command.
        Should be called only if has_more_commands() is true.
        Initially there is no current command.
        """
        if self.has_more_commands():
            self.cmd_text = self.prog_text[self.cmd_next]
            self.cmd_next += 1

    def command_type(self):
        """
        Returns the type of the current command:
        - A_COMMAND for @Xxx where Xxx is either a symbol or a decimal number
        - C_COMMAND for dest=comp;jump
        - L_COMMAND (actually, pseudo-command) for (Xxx) where Xxx is a symbol.
        """
        if self.cmd_text.startswith('('):
            return 'L_COMMAND'
        elif self.cmd_text.startswith('@'):
            return 'A_COMMAND'
        return 'C_COMMAND'

    def symbol(self):
        """
        Returns the symbol or decimal Xxx of the current command @Xxx or (Xxx).
        Should be called only when command_type() is A_COMMAND or L_COMMAND.
        """
        if self.command_type() == 'L_COMMAND':
            return self.cmd_text[1:-1]
        elif self.command_type() == 'A_COMMAND':
            return self.cmd_text[1:]

    def dest(self):
        """
        Returns the dest mnemonic in the current C-command (8 possibilities).
        Should be called only when command_type() is C_COMMAND.
        """
        if self.command_type() == 'C_COMMAND':
            if '=' in self.cmd_text:
                return self.cmd_text.split('=')[0]
        return ''

    def comp(self):
        """
        Returns the comp mnemonic in the current C-command (28 possibilities).
        Should be called only when command_type() is C_COMMAND.
        """
        if self.command_type() == 'C_COMMAND':
            # Drop the optional 'dest=' prefix, then the optional ';jump' suffix,
            # so that 'D=M;JGT' yields 'M' rather than 'M;JGT'
            return self.cmd_text.split('=')[-1].split(';')[0]

    def jump(self):
        """
        Returns the jump mnemonic in the current C-command (8 possibilities).
        Should be called only when command_type() is C_COMMAND.
        """
        if ';' in self.cmd_text:
            return self.cmd_text.split(';')[1]
        return ''

Code

image-20250416134739093

class Code:
    """
    Translates Hack assembly language mnemonics into binary codes.
    """

    def __init__(self):
        # The index of each mnemonic in these lists is its 3-bit binary code
        self.dest_list = ['', 'M', 'D', 'MD', 'A', 'AM', 'AD', 'AMD']
        self.jump_list = ['', 'JGT', 'JEQ', 'JGE', 'JLT', 'JNE', 'JLE', 'JMP']
        self.comp_dict = {
            '0': '0101010', '1': '0111111', '-1': '0111010', 'D': '0001100',
            'A': '0110000', '!D': '0001101', '!A': '0110001', '-D': '0001111',
            '-A': '0110011', 'D+1': '0011111', 'A+1': '0110111', 'D-1': '0001110',
            'A-1': '0110010', 'D+A': '0000010', 'D-A': '0010011', 'A-D': '0000111',
            'D&A': '0000000', 'D|A': '0010101', 'M': '1110000', '!M': '1110001',
            '-M': '1110011', 'M+1': '1110111', 'M-1': '1110010', 'D+M': '1000010',
            'D-M': '1010011', 'M-D': '1000111', 'D&M': '1000000', 'D|M': '1010101'
        }

    def a_code(self, val):  # ⭐
        """
        Convert the numeric value or resolved address of an A-instruction
        into a 16-bit binary string.
        """
        # '{0:b}' is a format string: {0} is the placeholder for the value,
        # and b formats that value in binary.
        # zfill is a string method that pads the string with zeros on the left
        # until it reaches the given length, here 16.
        return '{0:b}'.format(int(val)).zfill(16)

    def c_code(self, comp_str, dest_str, jump_str):
        """
        Create string for c-code
        """
        return '111' + self.comp(comp_str) + self.dest(dest_str) + self.jump(jump_str)

    def dest(self, dest_str):
        """
        Returns the binary code of the dest mnemonic.
        """
        return '{0:b}'.format(self.dest_list.index(dest_str)).zfill(3)

    def comp(self, comp_str):
        """
        Returns the binary code of the comp mnemonic.
        """
        return self.comp_dict[comp_str]

    def jump(self, jump_str):
        """
        Returns the binary code of the jump mnemonic.
        """
        return '{0:b}'.format(self.jump_list.index(jump_str)).zfill(3)

SymbolTable

class SymbolTable:
    """
    Keeps a correspondence between symbolic labels and numeric addresses.
    """

    def __init__(self):
        # Build a dict with 16 entries: the keys are the strings R0 to R15
        # and the values are the corresponding integers 0 to 15.
        self.table = {'R{}'.format(i): i for i in range(16)}
        self.table.update({'SP': 0, 'LCL': 1, 'ARG': 2, 'THIS': 3,
                           'THAT': 4, 'SCREEN': 16384, 'KBD': 24576})

    def add_entry(self, symbol, address):
        """
        Add a new symbol-to-address mapping to the symbol table.
        :param symbol: the symbol
        :param address: the address
        :return: True if the symbol was added, False if it already existed
        """
        if not self.contains(symbol):
            self.table[symbol] = address
            return True
        return False

    def contains(self, symbol):
        """
        Does the symbol table contain the given symbol?
        """
        return symbol in self.table

    def get_address(self, symbol):
        """
        Returns the address associated with the symbol.
        """
        return self.table[symbol]

Assembler

import os
import time
# Local modules from this project. Note that `code` is also the name of a
# standard-library module, which the local code.py shadows.
from parser import Parser
from code import Code
from symbol_table import SymbolTable


class Assembler:
    """
    Assembler class: converts an assembly file into a machine-code file.
    """

    def __init__(self, debug=False):
        self.symbol_table = SymbolTable()
        self.code = Code()
        self.parser = None
        self.debug = debug

    def debug_print(self, print_string):
        if self.debug:
            print(print_string)

    def assemble(self, input_file):
        """
        Main assembly function: runs both passes and writes the output file.
        :param input_file: path of the input assembly file
        """
        start = time.time()

        # Check that the input file exists and is a .asm file
        assert os.path.exists(input_file), f"File {input_file} does not exist"
        assert input_file.endswith('.asm'), "The input file must be a .asm file"

        self.debug_print(f'Converting {input_file}')

        # Create the parser instance
        self.parser = Parser(input_file)

        # Record the time taken by each stage
        check_1 = time.time()
        self.debug_print(f'Parsed file in {round(check_1 - start, 5):.5f} secs')
        self.pass_1()
        check_2 = time.time()
        self.debug_print(f'First pass in {round(check_2 - check_1, 5):.5f} secs')
        out_text = self.pass_2()
        check_3 = time.time()
        self.debug_print(f'Second pass in {round(check_3 - check_2, 5):.5f} secs')

        # Build the output file name
        out_file = self.parser.file_name.split('.asm')[0] + '.hack'

        # Write the output file
        self.write_output(out_file, out_text)
        self.debug_print(f'Wrote {out_file}')
        self.debug_print(f'Ran in {round(time.time() - start, 5):.5f} secs')

    def pass_1(self):
        """
        First pass: handle label symbols and record their addresses.
        """
        asm_line = 0  # address of the next real instruction (label lines are not counted)
        while self.parser.has_more_commands():
            self.parser.advance()
            if self.parser.command_type() == 'L_COMMAND':  # label declaration
                symbol = self.parser.symbol()
                # A label's address is the address of the instruction that follows it
                self.symbol_table.add_entry(symbol, asm_line)
            else:  # real instruction: advance the address
                asm_line += 1

    def pass_2(self):
        """
        Second pass: handle variable symbols and generate the machine code.
        """
        out_text = []
        var_count = 16  # variable addresses start at 16
        self.parser.reset_read()  # rewind the parser to the start of the program

        while self.parser.has_more_commands():
            self.parser.advance()
            cmd_type = self.parser.command_type()

            if cmd_type == 'A_COMMAND':  # A-instruction
                symbol = self.parser.symbol()
                if symbol.isdigit():  # the symbol is a number: convert it directly
                    out_text.append(self.code.a_code(symbol))
                else:  # the symbol is a variable or a label
                    if not self.symbol_table.contains(symbol):
                        # Unknown symbol: it is a new variable, so add it to the table
                        self.symbol_table.add_entry(symbol, var_count)
                        var_count += 1
                    # Look up the symbol's address and convert it to binary
                    address = self.symbol_table.get_address(symbol)
                    out_text.append(self.code.a_code(address))

            elif cmd_type == 'C_COMMAND':  # C-instruction
                comp = self.parser.comp()
                dest = self.parser.dest()
                jump = self.parser.jump()
                out_text.append(self.code.c_code(comp, dest, jump))

        return out_text

    def write_output(self, out_file, out_text):
        """
        Write the generated machine code to the output file, one 16-bit word per line.
        """
        with open(out_file, 'w') as fp:
            for binary in out_text:
                fp.write(f"{binary}\n")  # end each line with \n so there is one instruction per line

main

# 1. Import modules
import argparse  # standard-library module for handling command-line arguments
from assembler import Assembler

# 2. Create the command-line argument parser
parser = argparse.ArgumentParser(
    prog="main.py",
    description="Convert a .asm file into a .hack machine-code file",
)

# 3. Add the command-line arguments
parser.add_argument("asm_path", help="Path to .asm file to convert", type=str)

parser.add_argument(
    "--debug",
    "--d",
    action="store_true",
    default=False,
    help="Whether to print steps/timing (True) or not (False) DEFAULT: False",
    dest="debug",
)

# 4. Update the module docstring
__doc__ = "\n" + parser.format_help()


# 5. Define the main function
def main():
    args = parser.parse_args()
    a = Assembler(args.debug)
    a.assemble(args.asm_path)


# 6. Run main() when the file is executed as a script
if __name__ == "__main__":
    main()