Capture Injection (ongoing)
Building an AI-Driven Workflow Automation System
Project Log (12/08/2024):
I don't really have a name for this project yet, but I wanted to document everything as I go (as opposed to some of my other projects). I know this project will take a significant amount of time and require some skills. It will also be my first AI-ish project.
The idea comes from me not wanting to do my homework, classwork, tests, etc. I don't think it's really a solid ROI, but for me, not wanting to jump through hoops to get a piece of paper, it's worth it. I also enjoy coming up with creative solutions that test my skills. Maybe school is too easy and I get bored. Some people have told me that before, and I used to read books on robotics in chemistry class (I failed that twice) so... Anyways, I'll just end up spending my extra time studying law and prepping for the LSAT.
So what's the plan? This is best explained in two parts...
Part one is to capture the output from my Mac Studio and do some filtering, because HDCP is annoying. In Fig. 1, you can see this as the Vertex Stripper and the Elgato capture card. My work is to write some software that will run on my Mac Mini to process what's on the screen (input from the capture card). This shouldn't be too hard, but I haven't really solved this issue yet. I did, however, do some testing and it seems doable. Once the Vertex shows up in the mail I'll have a more definitive answer, because things don't always work like they say they will, but hopefully a $300 device does what it says it does.
Part two will involve the Arduino HID pass-through/injection. I will need to allow my Apple Magic Keyboard and Apple Magic Mouse to operate normally until specific key combinations are pressed. If those special key combinations are pressed, then the Arduino will start listening for specific inputs that tell the Mac Mini to do something. I did some testing on an Arduino MKR 1010, but it seems that there's another chip that interfaces with/as the HID device, so the functionality is limited. It seems the Arduino Nano ESP32 solves this issue. We will see...
I want to give an example so let's assume I want it to answer a question on my screen (Mac Studio). I'd press some key combination, the Arduino will then send a signal (via Wi-Fi) to the Mac Mini. The Mac Mini will execute a script that does some image processing, uses a large language model (LLM) to find an answer, or makes an API request to get the answer. Then it can spit out the answer or replace my keystrokes with the correct answer. I'm thinking of having it work somewhat like a textbox terminal—e.g., type a command, it erases my command and responds, then erases its response. IDK, we will see how it goes.
Lots of work ahead, but I eliminated some of the more significant edge cases that I could. First, I'll need to finalize the HDCP bypass and video capture setup, ensuring the Mac Mini can reliably process screen data from the Mac Studio. This involves writing software that can handle the captured input and potentially preprocess it for further tasks like image recognition. Next, I need to build out the Arduino-based HID system, including creating the logic for detecting and interpreting special key combinations. I'll also need to establish robust Wi-Fi communication between the Arduino and the Mac Mini, ensuring it can trigger scripts and handle responses in real time. On the software side, developing scripts for tasks like text recognition, API interactions, or LLM processing will require exploration and experimentation. It's a lot of moving pieces, but with proper planning and iterative development, I’m confident I can bring it all together. The key is to start small, solve one problem at a time, and document everything thoroughly as I progress.
Overcoming Edge Cases in Modern Keyboard Integration
Project Log (12/28/2024):
Starting with part one from earlier, you can see some heavy duty parts got removed.
- Removed Vertex Module
- Removed Elgato Capture Card
- Added Random Amazon Capture Card
I was flashing the Vertex module and it got bricked. It worked initially but had issues bypassing HDCP 2.2. I think this was the fault of the Elgato capture card, not the Vertex. Also, the Elgato capture card doesn't really work as well as it claims, or at least it's misleading in some aspects. In a last-ditch attempt I got a $35 capture card off of Amazon because there were reviews on the internet that said this would work to bypass HDCP 2.2. Sure enough it fucking did...🙄 All that money and time spent just to find out a POS from China did what I needed. I don't really need the video feed to be 4K, so this setup with 1080p 30fps works. Honestly, I probably would have had to downscale the feed anyways, because trying to manipulate a 4K 60fps video feed would kill my Mac Mini. It would have been nice to have the higher resolution to make OCR easier. Famous last words, right?
Now on to part two. I ended up making a few changes to that setup. You can see all these changes reflected in "Figure 2" below.
- Only using Keyboard (Mouse will be connected directly to Mac Studio)
- Switched to Arduino® UNO R4 WiFi
- Added SparkFun USB-C Host Shield
I decided the Magic Mouse doesn't really need to be used for anything special, and if the Arduino® UNO R4 WiFi fails I will still have the ability to use my cursor. I could potentially use it to send commands and stuff, but realistically I don't need to complicate things.
The order from Arduino got significantly delayed, so I went to Jameco across the Bay to pick up two Arduino® UNO R4 WiFi boards to see if I could make that work. Turns out the new boards have a different architecture (Renesas RA4M1, Arm® Cortex®-M4, with a 48 MHz clock speed, 32 kB SRAM and 256 kB flash memory) and most libraries haven't been updated to support the new chip. I was able to get the Keyboard library from Arduino to type some things, but getting it to read from the keyboard was challenging. This has to do with the USB HID handshake/communication protocol. I believe I could have done it, but I decided to make my life easier and opt for the SparkFun USB-C Host Shield. I played around with the examples from the USB Host Shield 2.0 Library. That worked great! It took a while to understand the library, but I wrote this bit of code that allows every button on the Apple Magic Keyboard to be read.
// rawHexTesting.ino
#include <hidboot.h>
#include <usbhub.h>
#include <SPI.h>

// Custom parser class with a unique name
class CustomKeyboardParser : public KeyboardReportParser {
protected:
  // Override the Parse method to print raw data
  void Parse(USBHID *hid, bool is_rpt_id, uint8_t len, uint8_t *buf) override {
    // Print each byte of the raw report as two hex digits
    for (uint8_t i = 0; i < len; i++) {
      if (buf[i] < 0x10) Serial.print("0"); // Add leading zero for single digit hex values
      Serial.print(buf[i], HEX);
      Serial.print(" ");
    }
    Serial.println();
    // Call the base class implementation for further parsing
    KeyboardReportParser::Parse(hid, is_rpt_id, len, buf);
  }
};

// USB and HID setup
USB Usb;
HIDBoot<USB_HID_PROTOCOL_KEYBOARD> HidKeyboard(&Usb);
CustomKeyboardParser Parser;

void setup() {
  Serial.begin(921600); // Set baud rate to 921600 so I can see keys realtime in serial monitor
  while (!Serial); // Wait for serial connection
  if (Usb.Init() == -1) {
    Serial.println("USB initialization failed");
    while (1); // Stop here if USB initialization fails
  }
  Serial.println("USB initialized");
  HidKeyboard.SetReportParser(0, &Parser); // Set the custom parser
}

void loop() {
  Usb.Task(); // USB task
}
The serial monitor looks like this when pressing keys...
01 00 00 04 00 00 00 00 00 00 - "a" KEYDOWN
01 00 00 00 00 00 00 00 00 00 - ALL KEYS UP
01 00 00 05 00 00 00 00 00 00 - "b" KEYDOWN
01 00 00 00 00 00 00 00 00 00 - ALL KEYS UP
01 00 00 06 00 00 00 00 00 00 - "c" KEYDOWN
01 00 00 00 00 00 00 00 00 00 - ALL KEYS UP
01 01 00 00 00 00 00 00 00 00 - LEFT CONTROL KEYDOWN
01 00 00 00 00 00 00 00 00 00 - ALL KEYS UP
01 04 00 00 00 00 00 00 00 00 - LEFT OPTION KEYDOWN
01 00 00 00 00 00 00 00 00 00 - ALL KEYS UP
01 08 00 00 00 00 00 00 00 00 - LEFT COMMAND KEYDOWN
01 00 00 00 00 00 00 00 00 00 - ALL KEYS UP
01 00 00 00 00 00 00 00 00 04 - Apple Finger Print Reader KEYDOWN
01 00 00 00 00 00 00 00 00 00 - ALL KEYS UP
01 00 00 6E 00 00 00 00 00 00 - SPACEBAR KEYDOWN
01 00 00 00 00 00 00 00 00 00 - ALL KEYS UP
You would think this would all be easy, right? But there are so many edge cases associated with modern keyboards.
Now here's the fun part and one of the edge cases I had to deal with... modifier keys! For example, let's say you need to select all, e.g. "Command + a". How does the computer know you're pressing both at the same time? Does it just send "Command" and then "a"? Well, sort of. It ends up looking like this...
01 08 00 00 00 00 00 00 00 00 - LEFT COMMAND KEYDOWN
01 08 00 04 00 00 00 00 00 00 - LEFT COMMAND KEYDOWN + "a" KEYDOWN
01 08 00 00 00 00 00 00 00 00 - LEFT COMMAND KEYDOWN
01 00 00 00 00 00 00 00 00 00 - ALL KEYS UP
Neat! Every key up or down shows us all keys that are pressed or not pressed! But what if we need to press even more keys?! Let's see what pressing "a + b + c + CONTROL + OPTION + COMMAND" looks like...
01 00 00 04 00 00 00 00 00 00 - "a" KEYDOWN
01 00 00 04 05 00 00 00 00 00 - "a" + "b" KEYDOWN
01 00 00 04 05 06 00 00 00 00 - "a" + "b" + "c" KEYDOWN
01 01 00 04 05 06 00 00 00 00 - LEFT CONTROL KEYDOWN + "a" + "b" + "c" KEYDOWN
01 05 00 04 05 06 00 00 00 00 - LEFT CONTROL KEYDOWN + LEFT OPTION + "a" + "b" + "c" KEYDOWN
01 0D 00 04 05 06 00 00 00 00 - LEFT CONTROL KEYDOWN + LEFT OPTION + LEFT COMMAND + "a" + "b" + "c" KEYDOWN
01 05 00 04 05 06 00 00 00 00 - LEFT CONTROL KEYDOWN + LEFT OPTION + "a" + "b" + "c" KEYDOWN
01 01 00 04 05 06 00 00 00 00 - LEFT CONTROL KEYDOWN + "a" + "b" + "c" KEYDOWN
01 00 00 04 05 06 00 00 00 00 - "a" + "b" + "c" KEYDOWN
01 00 00 04 05 00 00 00 00 00 - "a" + "b" KEYDOWN
01 00 00 04 00 00 00 00 00 00 - "a" KEYDOWN
01 00 00 00 00 00 00 00 00 00 - ALL KEYS UP
This is more complicated, so let's break it down. The letters are all there (a=04, b=05 and c=06), and as they get pressed they get tacked on to the end, one slot each. This is limited in length to 6 slots. Realistically, HID devices can report N keys, but from what I understand this library sticks to the older six-key boot-keyboard report layout.
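To make that breakdown concrete, here is a minimal sketch of splitting one raw report into its fields. It is plain C++ so it can be tried on a desktop compiler before going anywhere near the Arduino. The field layout is inferred only from the serial output above, and `KeyboardReport`, `parseReport` and `pressedKeyCount` are hypothetical names, not part of the USB Host Shield library.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Layout inferred from the logged reports (an assumption, not taken from
// Apple documentation or the library):
//   byte 0    : report ID (always 01 in the log)
//   byte 1    : modifier byte
//   byte 2    : unused in every logged report
//   bytes 3-8 : up to six key codes, 00 = empty slot
//   byte 9    : extra byte (04 when the fingerprint reader was pressed)
struct KeyboardReport {
    uint8_t reportId;
    uint8_t modifiers;
    uint8_t keys[6];
    uint8_t extra;
};

const size_t kReportLength = 10;

// Copies a raw 10-byte buffer into a KeyboardReport.
// Returns false if the buffer is missing or not exactly 10 bytes long.
bool parseReport(const uint8_t *buf, size_t len, KeyboardReport &out) {
    if (buf == nullptr || len != kReportLength) return false;
    out.reportId = buf[0];
    out.modifiers = buf[1];
    for (size_t i = 0; i < 6; i++) out.keys[i] = buf[3 + i];
    out.extra = buf[9];
    return true;
}

// Counts how many of the six key slots are currently in use.
size_t pressedKeyCount(const KeyboardReport &r) {
    size_t n = 0;
    for (size_t i = 0; i < 6; i++) {
        if (r.keys[i] != 0) n++;
    }
    return n;
}
```

Feeding it the `01 0D 00 04 05 06 ...` line from the log should give a modifier byte of 0D and three occupied key slots.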
Now, the modifier keys are used in a really clever way. Instead of having a buffer-ish implementation, they actually get added together. Left control was 01 and left option was 04 from before, but when we press them both down at the same time it becomes 05. Each modifier has its own bit in that byte, so adding the values is the same as OR-ing the bits: in hex, 01 + 04 + 08 gives 0D. One hex code to tell us that those three left modifier keys are being pressed! Pretty smart if you ask me!
There are definitely a lot more edge cases, but for what I need to do, having access to those modifier keys should be more than enough.
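Here is a small sketch of pulling that modifier byte apart again, written as plain C++ so it can be checked on a desktop. Only 01, 04 and 08 come from the log above. The shift and right-hand values are assumed from the standard USB HID keyboard modifier layout and have not been checked against this keyboard. The names `MOD_*`, `isHeld` and `describeModifiers` are made up for illustration.

```cpp
#include <assert.h>
#include <stdint.h>
#include <string>

// One bit per modifier key. 0x01, 0x04 and 0x08 match the logged reports;
// the rest are assumed from the standard HID modifier byte layout.
const uint8_t MOD_LEFT_CONTROL  = 0x01;
const uint8_t MOD_LEFT_SHIFT    = 0x02;
const uint8_t MOD_LEFT_OPTION   = 0x04;
const uint8_t MOD_LEFT_COMMAND  = 0x08;
const uint8_t MOD_RIGHT_CONTROL = 0x10;
const uint8_t MOD_RIGHT_SHIFT   = 0x20;
const uint8_t MOD_RIGHT_OPTION  = 0x40;
const uint8_t MOD_RIGHT_COMMAND = 0x80;

// True if the given modifier bit is set in the modifier byte.
bool isHeld(uint8_t modifiers, uint8_t mask) {
    return (modifiers & mask) != 0;
}

// Builds a readable list such as "LEFT CONTROL + LEFT OPTION",
// or "NONE" when no modifier bit is set.
std::string describeModifiers(uint8_t modifiers) {
    static const char *const names[8] = {
        "LEFT CONTROL", "LEFT SHIFT", "LEFT OPTION", "LEFT COMMAND",
        "RIGHT CONTROL", "RIGHT SHIFT", "RIGHT OPTION", "RIGHT COMMAND"
    };
    std::string out;
    for (int bit = 0; bit < 8; bit++) {
        if (modifiers & (1u << bit)) {
            if (!out.empty()) out += " + ";
            out += names[bit];
        }
    }
    return out.empty() ? std::string("NONE") : out;
}
```

With this, `describeModifiers(0x0D)` yields "LEFT CONTROL + LEFT OPTION + LEFT COMMAND", which matches the widest line in the log. Testing single bits with a mask is the usual way to read a packed flag byte, and it avoids having to list every possible sum.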
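One consequence of the reports above is that the keyboard never says "key X went down" directly. Every report is a full snapshot of what is held, so individual key-down and key-up events have to be worked out by comparing one report with the previous one. A rough sketch of that comparison in plain C++ follows, assuming the six key slots have already been copied out of bytes 3 to 8. `KeyEvent`, `slotsHold` and `diffKeys` are hypothetical names.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <vector>

// A single derived event: which key code changed, and in which direction.
struct KeyEvent {
    uint8_t code;
    bool down;  // true = key pressed, false = key released
};

// True if any of the six slots holds the given key code.
bool slotsHold(const uint8_t slots[6], uint8_t code) {
    for (size_t i = 0; i < 6; i++) {
        if (slots[i] == code) return true;
    }
    return false;
}

// Compares the key slots of the previous and current report.
// Codes only in curr are reported as pressed, codes only in prev as released.
std::vector<KeyEvent> diffKeys(const uint8_t prev[6], const uint8_t curr[6]) {
    std::vector<KeyEvent> events;
    for (size_t i = 0; i < 6; i++) {
        if (curr[i] != 0 && !slotsHold(prev, curr[i])) {
            events.push_back({curr[i], true});
        }
    }
    for (size_t i = 0; i < 6; i++) {
        if (prev[i] != 0 && !slotsHold(curr, prev[i])) {
            events.push_back({prev[i], false});
        }
    }
    return events;
}
```

Going from the "a" + "b" line to the "a" + "b" + "c" line in the log would produce a single event: code 06, pressed. The same comparison could be done on the modifier byte with an XOR of the old and new values.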
Moving forward I will need to...
- Part One - Capture Card Processing
  - Play around with OCR to see what works best
  - Lots of brainstorming...
- Part Two - Arduino
  - Map Apple Magic Keyboard keys to their Hex codes
  - Arduino code for pass-through
  - Arduino code for "Command Mode"
  - Arduino code for Wi-Fi injection from Mac Mini
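For the key-mapping item, a starting point could look like the sketch below. Only a=04, b=05 and c=06 are confirmed by the log above. The rest of the letter and digit ranges are assumed from the standard USB HID keyboard usage table and still need to be checked against what the Magic Keyboard actually sends. `keyCodeToChar` is a made-up helper name.

```cpp
#include <assert.h>
#include <stdint.h>

// Maps a key code to a lowercase letter or digit.
// Returns '\0' for anything outside the ranges handled here.
// Ranges assumed from the standard HID usage table:
//   0x04-0x1D -> a-z, 0x1E-0x26 -> 1-9, 0x27 -> 0
char keyCodeToChar(uint8_t code) {
    if (code >= 0x04 && code <= 0x1D) {
        return static_cast<char>('a' + (code - 0x04));
    }
    if (code >= 0x1E && code <= 0x26) {
        return static_cast<char>('1' + (code - 0x1E));
    }
    if (code == 0x27) {
        return '0';
    }
    return '\0';
}
```

Keys that are not plain letters or digits are left out on purpose until their codes have been logged from the real keyboard.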