Capture Injection (ongoing)
12/8/2024
I don't really have a name for this project yet, but I wanted to document everything as I go (as opposed to some of my other projects). I know this project will take a significant amount of time and require some new skills. It will also be my first AI-ish project.
The idea comes from me not wanting to do my homework, classwork, tests, etc. I don't think it's really a solid ROI, but for me, not wanting to jump through hoops to get a piece of paper, it's worth it. I also enjoy coming up with creative solutions that test my skills. Maybe school is too easy and I get bored; people have told me that before, and I did use to read robotics books in chemistry class (I failed that twice), so... Anyways, I'll just end up spending my extra time studying law and prepping for the LSAT.
So what's the plan? This is best explained in two parts...
Part one is to capture the output from my Mac Studio and do some filtering, because HDCP is annoying. In Fig. 1, you can see this as the Vertex Stripper and the Elgato capture card. My work is to write some software that runs on my Mac Mini to process what's on the screen (the input from the capture card). This shouldn't be too hard, but I haven't really solved it yet. I did do some testing, though, and it seems doable. Once the Vertex shows up in the mail I'll have a more definitive answer, because things don't always work the way they claim to, but hopefully a $300 device does what it says it does.
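Here's a rough sketch of what the Mac Mini side might start as, assuming the Elgato ends up showing up as a normal video device that OpenCV can open (the device index is a guess on my part and will probably need to change):

```python
import cv2

# Assumption: the Elgato capture card enumerates as a standard video
# device on macOS, so OpenCV can open it by index. Index 0 is a guess;
# it may be 1 or 2 depending on what else is plugged in.
CAPTURE_DEVICE_INDEX = 0

def grab_frame():
    """Grab a single frame from the capture card and return it as a BGR image."""
    cap = cv2.VideoCapture(CAPTURE_DEVICE_INDEX)
    if not cap.isOpened():
        raise RuntimeError("Could not open capture device")
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("Failed to read a frame from the capture device")
    return frame

if __name__ == "__main__":
    frame = grab_frame()
    print(f"Captured frame: {frame.shape[1]}x{frame.shape[0]}")
    cv2.imwrite("screen.png", frame)  # save for inspection
```

If that works, the rest of the processing is just operating on that frame.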
Part two will involve the Arduino HID pass-through/injection. I will need to allow my Apple Magic Keyboard and Apple Magic Mouse to operate normally until specific key combinations are pressed. Once one of those special combinations is pressed, the Arduino will start listening for specific inputs that tell the Mac Mini to do something. I did some testing on an Arduino MKR 1010, but it seems there's a separate chip that interfaces with/as the HID device, so the functionality is limited. It seems the Arduino Nano ESP32 solves this issue; we will see...
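The Arduino firmware itself will be its own thing, but the Mac Mini's end of the Wi-Fi link could be as simple as something like this, assuming the Arduino fires off a short UDP packet when it sees the trigger combo (the port number and message format here are placeholders I made up):

```python
import socket

# Assumption: the Arduino sends a short UDP packet (e.g. b"ANSWER") to the
# Mac Mini when it detects the trigger key combination. The port and the
# message format are placeholders, not anything I've settled on.
LISTEN_PORT = 5005

def wait_for_trigger():
    """Block until the Arduino sends a trigger packet, then return its contents."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", LISTEN_PORT))
    data, addr = sock.recvfrom(64)
    sock.close()
    return data.decode(errors="replace"), addr

if __name__ == "__main__":
    command, sender = wait_for_trigger()
    print(f"Got trigger '{command}' from {sender[0]}")
```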
To give an example, let's assume I want it to answer a question on my screen (the Mac Studio). I'd press some key combination, and the Arduino would then send a signal (via Wi-Fi) to the Mac Mini. The Mac Mini would execute a script that does some image processing and either uses a large language model (LLM) to find an answer or makes an API request to get it. Then it can spit out the answer or replace my keystrokes with the correct answer. I'm thinking of having it work somewhat like a textbox terminal: type a command, it erases my command and responds, then erases its response. IDK, we will see how it goes.
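A sketch of that question-answering step might look like the following, assuming Tesseract for the OCR and a placeholder HTTP endpoint standing in for whatever LLM or answer API I end up using (the URL and payload shape are made up):

```python
import cv2
import pytesseract
import requests

# Hypothetical endpoint standing in for whatever LLM / answer service I end
# up using; the URL and the JSON payload shape are placeholders.
ANSWER_API_URL = "http://localhost:8000/answer"

def extract_text(frame):
    """Run OCR on a captured frame (BGR image) and return the recognized text."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return pytesseract.image_to_string(gray)

def ask_for_answer(question_text):
    """Send the OCR'd question to the (placeholder) answer service."""
    resp = requests.post(ANSWER_API_URL, json={"question": question_text}, timeout=30)
    resp.raise_for_status()
    return resp.json().get("answer", "")
```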
Lots of work ahead, but I've eliminated what I could of the more significant edge cases. First, I'll need to finalize the HDCP bypass and video capture setup, ensuring the Mac Mini can reliably process screen data from the Mac Studio. This involves writing software that can handle the captured input and potentially preprocess it for further tasks like image recognition. Next, I need to build out the Arduino-based HID system, including the logic for detecting and interpreting the special key combinations. I'll also need to establish robust Wi-Fi communication between the Arduino and the Mac Mini, ensuring it can trigger scripts and handle responses in real time. On the software side, the scripts for text recognition, API interactions, and LLM processing will require exploration and experimentation. It's a lot of moving pieces, but with proper planning and iterative development, I'm confident I can bring it all together. The key is to start small, solve one problem at a time, and document everything thoroughly as I progress.
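For the "handle responses" half of that Wi-Fi link, the Mac Mini would eventually need to send the answer back so the Arduino can type it out. A minimal sketch of that send-back, assuming the Arduino listens on a fixed UDP port for plain text (the IP, port, and encoding are all placeholders on my part):

```python
import socket

# Assumptions: the Arduino listens on a fixed UDP port for the answer text
# and then "types" it out as an HID keyboard. The IP address, port, and the
# choice of plain UTF-8 text are placeholders, not final decisions.
ARDUINO_IP = "192.168.1.50"
ARDUINO_PORT = 5006

def send_answer(answer_text):
    """Send the answer back to the Arduino over Wi-Fi so it can type it out."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(answer_text.encode("utf-8"), (ARDUINO_IP, ARDUINO_PORT))
    sock.close()
```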