Wikipedia:Reference desk/Archives/Computing/2019 November 4

Computing desk
Welcome to the Wikipedia Computing Reference Desk Archives


November 4


Can Assembly document contain machine code?


I ask this probably weird question for general knowledge, as someone who uses only Bash and JavaScript and has never written a single line of Assembly:

Might there be a case where a programmer writes machine code (binary or hexadecimal) directly in an Assembly document, mixing the Assembly abstraction with machine code, and if so, would it be executed directly from the Assembly?

Thanks in advance for reading and maybe answering, and sorry if I wrote anything factually wrong. 49.230.10.172 (talk) 16:24, 4 November 2019 (UTC)

There are a few rare cases, such as when the programmer needs an instruction that isn't normally documented; in that case they just insert its bytes as data in the middle of the code.
Assembly doesn't actually distinguish between data and code; they're both just bytes in the end. MoonyTheDwarf (Braden N.) (talk) 16:40, 4 November 2019 (UTC)
Bash is a command language that runs on Unix-like computers, and the Bash interpreter cannot understand assembly language. There is a devious way to add inline assembly to C via the asm keyword, then compile a DSO (Dynamic Shared Object) that Bash can employ using its here-document syntax. JavaScript is a language primarily for web pages, interpreted by a browser that cannot understand assembly language either. Be aware that there are many different assembly languages, each specific to a particular computer architecture, unlike Bash and JavaScript, which are interpreted languages portable across different processors.
Assembly language consists of simple, easily remembered commands (mnemonics) that act directly on the registers of the target processor. For example, an assembly programmer who wishes to load an 8-bit register with the value 97 writes mov al, 61h into a text file, which the assembler program (e.g. MASM for an Intel 80x86 processor) translates to the machine code B0 61. These are two bytes expressed in hexadecimal (61h is 97 in decimal). The OP asks about the possibility of a programmer writing machine code such as B0 61 directly. It is possible because
  • The CPU cannot know or care where machine code came from and, if it is correct, will execute it as expected

but we must qualify that as follows:

  • It is very inadvisable to circumvent the assembler program that has been developed and tested with a full knowledge of the target CPU
  • Machine code is almost impossible for a human subsequently to understand, debug or modify without the help of a Disassembler program.
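
To make the mov al, 61h example concrete, here is a minimal Python sketch of the translation step an assembler performs for this one instruction form. The function name encode_mov_r8_imm8 and the REG8 table are made up for illustration; only the encoding itself (opcode B0 plus the register number, followed by the immediate byte) comes from the x86 instruction set.

```python
# Register numbers for the 8-bit x86 registers, as used by the "B0+r" opcode form.
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}


def encode_mov_r8_imm8(reg: str, value: int) -> bytes:
    """Encode `mov r8, imm8` (hypothetical helper): opcode 0xB0 + register number, then the immediate."""
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate must fit in 8 bits")
    return bytes([0xB0 + REG8[reg.lower()], value])


code = encode_mov_r8_imm8("al", 97)
print(code.hex(" ").upper())  # B0 61
```

Writing B0 61 by hand means doing this lookup yourself, which is the work the assembler normally takes off the programmer's hands.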

Exceptional cases that might justify a direct change in machine code are:

  • The assembly programmer knows reliably about a new feature of a CPU that has not yet been incorporated in his version of assembler program
  • Some actions intended to occur automatically during the execution of self-modifying code. DroneB (talk) 13:57, 5 November 2019 (UTC)
Yes, this is pretty common as a feature, although not often needed or used.
An assembly language program is written as source code. There are only two uses for this: editing it manually, or supplying it to an assembler program, which then converts it to binary machine code. You can't load this source file directly onto the target processor.
Assemblers also normally provide a data 'directive' (for example db in MASM or NASM, .byte in the GNU assembler), followed by some numbers in human-readable format, which says "Don't treat this as assembly language, don't assemble it, just copy it through as raw data". This is most often used for data look-up tables, but if those numbers are also valid machine code, they are incorporated into the program like any other machine code and can be used as such.
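
As a rough illustration of such a directive, the toy assembler below (Python, invented for this example, not how any real assembler is written) understands exactly two things: a db data directive, spelled as in MASM and NASM, and the single instruction mov al, imm8. It shows that hand-written bytes and the assembled mnemonic end up as the same output.

```python
def parse_number(token: str) -> int:
    """Accept MASM-style hex (61h), C-style hex (0x61) or decimal (97)."""
    token = token.strip().lower()
    if token.endswith("h"):
        return int(token[:-1], 16)
    return int(token, 0)


def assemble(source: str) -> bytes:
    """Toy assembler (hypothetical): supports only `db` and `mov al, imm8`."""
    out = bytearray()
    for line in source.splitlines():
        line = line.split(";", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        mnemonic, _, operands = line.partition(" ")
        args = [a.strip() for a in operands.split(",")]
        if mnemonic.lower() == "db":
            # Data directive: copy the listed byte values through unchanged.
            out.extend(parse_number(a) for a in args)
        elif mnemonic.lower() == "mov" and args[0].lower() == "al":
            # The only real instruction this toy knows: mov al, imm8 -> B0 ib
            out.extend([0xB0, parse_number(args[1])])
        else:
            raise ValueError(f"unsupported line: {line!r}")
    return bytes(out)


via_mnemonic = assemble("mov al, 61h")
via_directive = assemble("db 0B0h, 61h   ; same instruction, hand-encoded")
print(via_mnemonic == via_directive)  # True
```

Once the bytes are in the output there is no record of which route they took, which is why the processor cannot tell the difference.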
Whether it gets executed depends on whether anything tries to execute it: all that's needed is for the program counter to be made to jump to its address. Remember that most processors have a fairly dense instruction set (most numbers map to recognised opcodes), so the processor will do something with arbitrary data, though probably nothing very useful.
This isn't often used for a serious purpose. It sometimes is when you're writing for a new version of a processor and your assembler doesn't yet support an instruction that has recently been added. It might even be an instruction (or an operand) which officially shouldn't work, but turns out to (and might have some weird but useful side effect).
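
The role of the program counter can be modelled with a very small interpreter. This is only a toy in Python that knows three real x86 opcodes (B0 for mov al, imm8; 90 for nop; C3 for ret, treated here simply as "stop"); the function run and the memory layout are assumptions made up for the example.

```python
def run(memory: bytes, pc: int) -> int:
    """Interpret a tiny x86 subset starting at pc; return the final value of AL."""
    al = 0
    while True:
        opcode = memory[pc]
        if opcode == 0xB0:      # mov al, imm8
            al = memory[pc + 1]
            pc += 2
        elif opcode == 0x90:    # nop
            pc += 1
        elif opcode == 0xC3:    # ret: the toy model just stops here
            return al
        else:
            raise ValueError(f"opcode {opcode:#04x} not modelled")


memory = bytes([
    0xB0, 0x01, 0xC3,   # offset 0: written as code (mov al, 1; ret)
    0xB0, 0x61, 0xC3,   # offset 3: laid down as a "data table"
])

print(run(memory, 0))  # 1
print(run(memory, 3))  # 97
```

Starting at offset 3 runs the "table" just as happily as the real code at offset 0; nothing in memory marks one as data and the other as instructions.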
Assembling such a program involves more than just translating assembler mnemonics into opcodes. One of the main tasks of an assembler is doing the arithmetic needed to calculate jumps, pointers, etc. For any hand-embedded machine code, that arithmetic has to be done by the programmer first, and that's really tedious.
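
For a flavour of that arithmetic, here is a sketch in Python of what has to be worked out for one x86 short jump (opcode EB followed by a signed 8-bit displacement counted from the end of the two-byte instruction). The helper name and the example addresses are invented for illustration.

```python
def encode_jmp_short(jmp_address: int, target_address: int) -> bytes:
    """Encode x86 `jmp short`: EB, then a displacement relative to the next instruction."""
    displacement = target_address - (jmp_address + 2)  # the instruction itself is 2 bytes long
    if not -128 <= displacement <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([0xEB, displacement & 0xFF])  # two's-complement byte for negative values


print(encode_jmp_short(0x100, 0x110).hex(" ").upper())  # EB 0E
print(encode_jmp_short(0x110, 0x100).hex(" ").upper())  # EB EE
```

Every time code is inserted or removed between the jump and its target, the displacement changes, so hand-encoded jumps have to be recalculated after each edit.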
It's even possible to find this facility in some (older) higher-level programming languages. Years back (mid-'80s) I was using CORAL 66, which was obsolete even then. It's a very simple language, and we were writing 'systems' code, which manipulated low-level features of the processor hardware, writing specific codes into specific locations to control the memory-mapping hardware. Fortunately the language compiler had 'CODE' blocks, which allowed us to write codes in directly, just as you've been describing. Andy Dingley (talk) 20:50, 5 November 2019 (UTC)