By: John Sparks


The Information Technology & Network Systems (IT NS) department at Laurus College takes a hands-on approach to instruction, with plenty of lab time in our classes.

During one such class, we rediscovered a long-standing issue in the Windows Notepad application.

The Bug

Windows Notepad has been around for more than 30 years and has long been the default text editor for many users. Years ago, users discovered that if they typed certain text patterns into Notepad, saved and closed the file, and then reopened it, Notepad displayed a row of boxes or unrelated foreign characters instead of the original text (search for “bush hid the facts” for more information).

The issue was apparently fixed with the release of Windows Vista, and there has been little discussion of it since.

During our IT class, however, we discovered that the issue can still be recreated with certain text strings, including the instructor's name, even in today’s Windows 10 Notepad. After the string is entered and the file is saved and reopened, Notepad cannot tell which encoding it should use to open the file. Notepad takes a “best guess” approach, and in this case it guesses wrong.

Try it yourself:

  • From the Windows search bar, type Notepad to launch Notepad. Then type the following string: johnsparks
  • Next, highlight the text and copy it to the clipboard (Edit > Copy).
  • Now, paste the string 20 times into Notepad (Edit > Paste).
  • Next, save the file (File > Save) with whatever name you like, and close Notepad.
  • Finally, reopen the new text file and observe the results. You should see something similar to the screenshot below.

Text pasted into Notepad
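If you would rather create the test file without typing and pasting, the short Python script below is one way to do it. It is a sketch under a few assumptions: the file name notepad_test.txt is arbitrary, and the file is written with no BOM and no trailing newline, which should match what Notepad saves for this all-English text in either ANSI or UTF-8.

```python
from pathlib import Path

# Hypothetical file name for illustration; any name works.
TEST_FILE = Path("notepad_test.txt")


def build_test_bytes(text: str = "johnsparks", copies: int = 20) -> bytes:
    """Return `text` repeated `copies` times as plain ASCII bytes.

    For ASCII-only text, ANSI (e.g. Windows-1252) and UTF-8 without a BOM
    produce exactly these bytes, so this should match what Notepad saves.
    """
    return (text * copies).encode("ascii")


def write_test_file(path: Path, data: bytes) -> None:
    # write_bytes writes the bytes unchanged: no BOM, no trailing newline.
    path.write_bytes(data)


if __name__ == "__main__":
    data = build_test_bytes()
    write_test_file(TEST_FILE, data)
    print(f"Wrote {len(data)} bytes to {TEST_FILE.resolve()}")
```

Run the script, then open notepad_test.txt in Notepad to check whether your version shows the same behavior.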

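This behavior occurs because of how Notepad determines a text file's encoding when it opens the file. Depending on the Notepad version, files are saved by default as either ANSI or UTF-8. ANSI is Windows shorthand for the system's legacy code page (for example, Windows-1252 on U.S. English systems), which covers the Latin character set, including the English alphabet; UTF-8 can represent the entire Unicode character set (a global character set). For plain English text such as johnsparks, both encodings produce exactly the same bytes, and neither writes anything into the file to record which encoding was used. When Notepad reopens the document, it has to guess, and for certain byte patterns it guesses UTF-16, a Unicode encoding that uses two bytes per character. As the references explain, Notepad then combines each pair of letters into a single character, which typically appears as Chinese characters, or as boxes when the font has no matching glyph.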
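To see why the letters turn into unrelated characters, the Python sketch below decodes the same 200 bytes two ways: once as UTF-8 (one byte per letter) and once as UTF-16 little-endian (two bytes per character). It does not reproduce Notepad's actual detection logic, which is widely attributed to the Windows IsTextUnicode heuristic; it only shows what the saved bytes look like if that guess lands on UTF-16. The is_cjk helper is a hypothetical illustration, not part of any Windows API.

```python
# Bytes Notepad saved for the test: plain ASCII, so ANSI and UTF-8 agree.
saved = b"johnsparks" * 20

# Correct reading: UTF-8, one byte per character.
as_utf8 = saved.decode("utf-8")

# Misreading: UTF-16 little-endian, each 2-byte pair becomes one character.
as_utf16 = saved.decode("utf-16-le")


def is_cjk(ch: str) -> bool:
    """True if ch is in the CJK Unified Ideographs block (U+4E00 to U+9FFF)."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


print(as_utf8[:10])                               # johnsparks
print(len(as_utf8), len(as_utf16))                # 200 100
print([f"U+{ord(c):04X}" for c in as_utf16[:5]])  # ['U+6F6A', 'U+6E68', 'U+7073', 'U+7261', 'U+736B']
print(all(is_cjk(c) for c in as_utf16))           # True
```

For example, the first two letters, j (0x6A) and o (0x6F), become the single character U+6F6A, which is a Chinese ideograph rather than two English letters.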

The Workaround

To work around this issue, one option is to change the encoding to UTF-8 with BOM when you save the file (using the Encoding list in the File > Save As dialog). BOM stands for Byte Order Mark: a short signature written at the very start of the file. For UTF-8, the BOM is the three bytes EF BB BF (in hexadecimal), as shown in the screenshot below. When the file is next opened, Notepad sees the BOM and knows to decode the file as UTF-8, so the correct text is displayed.

BOM Example
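The Python sketch below shows what the BOM adds to the file and how a program can check for it before resorting to any guesswork. The helper names sniff_bom and decode_text are hypothetical, and falling back to UTF-8 when no BOM is present is an assumption of this sketch, not what Notepad does.

```python
import codecs
from typing import Optional

text = "johnsparks" * 20

# Python's "utf-8-sig" codec prepends the UTF-8 BOM (EF BB BF) when encoding.
with_bom = text.encode("utf-8-sig")
without_bom = text.encode("utf-8")


def sniff_bom(data: bytes) -> Optional[str]:
    """Return a Python codec name if data starts with a UTF-8 or UTF-16 BOM, else None.

    Only BOM detection is shown; the guess an editor makes when there is
    no BOM (the source of the Notepad bug) is deliberately not modeled.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decoding with utf-8-sig also strips the BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # the utf-16 codec reads the BOM to pick byte order
    return None


def decode_text(data: bytes) -> str:
    # Assumption for this sketch: with no BOM, fall back to UTF-8 instead of guessing.
    return data.decode(sniff_bom(data) or "utf-8")


print(with_bom[:3].hex(" ").upper())  # EF BB BF
print(sniff_bom(with_bom))            # utf-8-sig
print(sniff_bom(without_bom))         # None
print(decode_text(with_bom) == text)  # True
```

The order of the two checks does not matter here, because the bytes FE and FF never appear anywhere in valid UTF-8.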

Can you find other text strings that reproduce this issue in Notepad?

Want to Learn More?

Come join our IT NS program for your own opportunity to learn more about Windows operating systems and applications.

Start your journey TODAY!


References:

Hoffman, Chris. “Notepad Isn’t Moving to Windows 10’s Store After All.” How-To Geek, 4 Dec. 2019, www.howtogeek.com/450169/notepad-isnt-moving-to-windows-10s-store-after-all/. Retrieved 12/4/19.

“How Windows Notepad Interpret Characters.” Stack Overflow, 20 July 2011, https://stackoverflow.com/questions/6769311/how-windows-notepad-interpret-characters. Retrieved 12/5/19.

Chen, Raymond. “The Notepad File Encoding Problem, Redux.” The Old New Thing, 17 Apr. 2007, https://devblogs.microsoft.com/oldnewthing/20070417-00/?p=27223. Retrieved 12/6/19.

Karl-Bridge-Microsoft. “Unicode – Win32 Apps.” Microsoft Docs, https://docs.microsoft.com/en-us/windows/win32/intl/unicode?redirectedfrom=MSDN. Retrieved 12/6/19.

“Bush Hid the Facts.” Wikipedia, Wikimedia Foundation, 23 Aug. 2019, https://en.wikipedia.org/wiki/Bush_hid_the_facts. Retrieved 12/3/19.

“What’s the Difference between UTF-8 and UTF-8 without BOM?” Stack Overflow, https://stackoverflow.com/questions/2223882/whats-the-difference-between-utf-8-and-utf-8-without-bom. Retrieved 12/10/19.