Thomas Ptacek | July 21st, 2008 | Filed Under: Uncategorized
Earlier today, a security researcher posted their hypothesis regarding
Dan Kaminsky’s DNS finding. Shortly afterwards, when the story began
getting traction, a post appeared on our blog about that
hypothesis. It was posted in error. We regret that it ran. We removed
it from the blog as soon as we saw it. Unfortunately, it takes only
seconds for Internet publications to spread.
We dropped the ball here.
Since alerting the Internet earlier in July about the upcoming
announcement of his finding, Dan has consistently urged DNS operators
to patch their servers. We confirmed the severity of the problem then
and, by inadvertently verifying another researcher’s results today,
reconfirm it today. This is a serious problem, it merits immediate
attention, and the extra attention it’s receiving today may increase
the threat. The Internet needs to patch this problem ASAP.
Dan told me about his finding personally, in order to help ensure
widespread patching before further details were announced at the
upcoming Black Hat conference. We chose to have a story locked and
loaded for that presentation, or for any other confirmed public
disclosure. On a personal level, I regret this as well.
Dan did phenomenal work on this research. It was impossible to talk to
him today and not know that he was sincere about coordinating a
graceful disclosure and fix for the problem. That I helped detract
from that work is painful both personally and professionally, and I
apologize to Dan for the way this played out.
Thomas Ptacek
Principal, Matasano Security
Jul 21, 2008
Timur | July 17th, 2008 | Filed Under: Apple, Uncategorized
An intern expects to be given simple projects, like coffee retrieval,
or “Hello, World.” So I’ve been sorely disappointed by Matasano. I
have been offered coffee retrieval services by senior engineers and my
latest project has been anything but “Hello, World.”
In fact, it’s been more like, “Hello, OS X. Tell me your secrets”.
This is the story of one trial-by-fire project handed to an intern
that turned out to be more complicated than anyone expected.
1.
It started with Thomas, innocently enough, handing me some debugger
code. It was both C and Ruby, and for Solaris and Win32. He said, “I
would like you to port this Win32 Ruby code to OS X.”
“Um, okay.”
At that point I’d just finished learning the basics of Ruby via my
previous Matasano project, a database backed HTTP proxy. I knew
nothing about debuggers, let alone the low level C library calls I’d
need and Ruby bindings to make them work. I know, fun, right?
I started simply and dusted the C off in my head so I could begin to
read and understand the code Thomas dumped on me, and perhaps learn
how a debugger works and gets used. It took a day or two just to read
it. I’d ask the office some fairly basic question about debuggers, and
receive in return a much longer response than I’d anticipated. Like a
tutorial on the workings of x86 assembly. Eventually, I got to a point
where I was almost comfortable with how the C debugger worked.
When staring at C code stopped doing me any good, and writing Ruby
code started seeming feasible, I moved on to porting the Ruby
code. “How hard could it be?”.
2.
Thomas gave me a starting point. Our Ruby code called directly into C
libraries using Win32API and Ruby/DL. We have wrapper libraries that
make those C calls look like Ruby library functions. So, for instance,
in our Wrap32 library, we have:
# just grab some local memory
def malloc(sz)
  r = CALLS["msvcrt!malloc:L=L"].call(sz)
  raise WinX.new(:malloc) if r == 0
  return r
end
We had a small piece of this written for OS X as well. I had to build
it out. I started with getpid(), a simple system call I could make
sure worked before I moved on to something harder. It worked right
away. My confidence was high. I was feeling cocky.
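A minimal sketch of that first getpid() wrapper, for readers who want to try it. It assumes Fiddle, the standard-library successor to Ruby/DL, rather than the Ruby/DL calls the project actually used, and the wrapper name `libc_getpid` is made up for illustration.

```ruby
require 'fiddle'

# Look up getpid among the symbols already loaded into this Ruby process.
# Assumption: Fiddle stands in for Ruby/DL, which the original code used.
GETPID = Fiddle::Function.new(
  Fiddle::Handle::DEFAULT['getpid'], # address of the C library's getpid
  [],                                # getpid takes no arguments
  Fiddle::TYPE_INT                   # and returns a C int (pid_t)
)

# Hypothetical wrapper, shaped like the malloc wrapper above.
def libc_getpid
  GETPID.call
end

puts "getpid via C: #{libc_getpid}, Process.pid: #{Process.pid}"
```

Comparing against `Process.pid` is a cheap way to confirm the binding works before wrapping anything harder.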
Here I should mention that I’d never worked on a decently large coding
project before. This was my first.
Throughout this entire project I’ve been trying to write the entire
thing far before I actually write even a single function. So,
I had many questions:
What was the script implementing the debugger to look like?
Was it to be event driven?
Did we want objects to represent each process, threads, or to
make his lunch for him?
I was overzealous. The team was patient. Thomas said simply, “There is
no spoon. You’ll need ptrace() and wait() for the breakpoint
insertion and signal catching. Just copy the functionality from the
Win32 version.”
3.
A brief word from the team about how debuggers work.
The thing you most want to do with a debugger is set and handle
breakpoints. On X86, there are two kinds of breakpoints: hardware and
software. You mostly use software breakpoints. The way software
breakpoints work is, you pick the place in the program you want to
break at, and you replace the instruction at that point with “INT
3” (conveniently enough, this is just the byte “0xCC”). When the
program hits the INT instruction, it generates an interrupt. The OS
catches the interrupt and kills the program.
Unless you have a debugger attached. If you have a debugger attached,
instead of killing the program, the OS tells the debugger. The
debugger then swaps the original instruction back in, “rewinds” the
program back to it, and resumes execution.
Every OS has debugging features. They boil down to the following
four capabilities:
Reading and writing the memory of another process (that’s
how you swap INT in for instructions to set breakpoints).
Catching events from other processes, like breakpoint
interrupts.
Starting, stopping, and pausing threads inside other
processes.
Changing the register state in other processes, for
instance by moving the EIP register back 1 byte to rewind
the INT 3 instruction that just fired.
The best known Unix debugger interface is ptrace(), and it basically
does all four of those things for you, along with the wait() call
for detecting events. On Win32, any program can read or write from a
process it has the right permissions for, even if it isn’t a debugger;
the debugger mostly exists to catch interrupts.
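The breakpoint dance described above can be sketched as a toy model. Nothing here touches a real process: "memory" is a Ruby array of bytes, "EIP" is an index into it, and `ToyProcess` and its methods are hypothetical names invented for the illustration.

```ruby
INT3 = 0xCC

# Toy stand-in for a debugged process. Every byte is treated as a
# one-byte instruction to keep the model small.
class ToyProcess
  attr_reader :memory, :eip

  def initialize(code)
    @memory = code.dup
    @eip = 0
    @saved = {} # address => original byte
  end

  # Capability 1: write the target's memory to plant INT 3.
  def set_breakpoint(addr)
    @saved[addr] = @memory[addr]
    @memory[addr] = INT3
  end

  # Capabilities 2 and 3: run the target until an event stops it.
  def run
    while @eip < @memory.size
      byte = @memory[@eip]
      @eip += 1 # the CPU has already moved past the instruction
      return :breakpoint if byte == INT3
    end
    :exited
  end

  # Capability 4: swap the original byte back in and rewind EIP by one.
  def clear_breakpoint_and_rewind
    addr = @eip - 1
    @memory[addr] = @saved.delete(addr)
    @eip = addr
  end
end

target = ToyProcess.new([0x55, 0x90, 0x90, 0x90, 0xC3])
target.set_breakpoint(3)
event = target.run
target.clear_breakpoint_and_rewind if event == :breakpoint
puts "stopped with #{event}, eip rewound to #{target.eip}"
```

A real debugger does the same four things through the OS: on Unix that is ptrace() and wait(), in place of the array writes and the loop.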
4.
Coding the wrappers for ptrace(), wait(), and waitpid() didn’t
take too long. Each just takes a few integers and returns an
integer. But ptrace works with request codes, like “PEEK” to read
memory or “STEP” to single-step the process. I couldn’t test without
knowing all the request codes. So, I started reading man pages, poking
at code and trying to get my OS X functions to work.
“To the headers!” I cried. But which one and where are they? As I
mentioned, I’m a little new to real — as in non-academic —
programming. Google worked OK to get the man pages, but didn’t
include the request code numeric values, just the names and what they
did. Frustrated, I asked for help.
“find /usr/include | xargs grep ptrace | less” was the response I
got from Thomas. You didn’t know he speaks *nix? He does. Hexadecimal too,
from what I’ve heard.
A little reading and some copying later I had the constants I needed,
and began to test my ptrace and wait functions. The code wasn’t
pretty but it seemed to work. I could attach to a process by PID and
wait() for it. Now I just needed to get its registers and I’d be
almost done.
It didn’t take long to sketch my code based on the Win32 debugger I
was given to start with. Soon I had what I thought was the start of a
functional debugger in Ruby, along with a handy explanation of the
Ruby way of doing things. Up until that point I’d been trying to do
things the C way, passing variables by reference, trying to make the
Ruby function call an exact match to the C call, and other things I’d
picked up from the C/C++/JAVA I learned in college.
I thought I was doing well. Then I tried to find the OSX equivalent of
PTRACE_GETREGS to read the registers from other processes, which is
kind of important for debuggers.
5.
Here everything starts to get more complicated.
It turns out Apple, in their infinite wisdom, had gutted
ptrace(). The OS X man page lists the following request codes:
PT_ATTACH — to pick a process to debug
PT_DENY_ATTACH — so processes can stop themselves from being debugged
PT_TRACE_ME — so debuggers can launch processes that start debugged
PT_CONTINUE — to restart a program after it’s been stopped
PT_STEP — to execute just one instruction in the process
PT_KILL — to kill the process
PT_DETACH — to release the process
No mention of reading or writing memory or registers. Which would have
been discouraging if the man page had not also mentioned PT_GETREGS,
PT_SETREGS, PT_GETFPREGS, and PT_SETFPREGS in the error codes
section. So, I checked ptrace.h. There I found:
PT_READ_I — to read instruction words
PT_READ_D — to read data words
PT_READ_U — to read U area data if you’re old enough to remember
what the U area is
PT_WRITE_I — and write instructions
PT_WRITE_D — and data
PT_WRITE_U — and U
PT_SIGEXC — and EXC SIGs
PT_THUPDATE — and update THs
PT_ATTACHEXC — and attach EXCs
There’s one problem solved. I can read and write memory for
breakpoints. But I still can’t get access to registers, and I need to
be able to mess with EIP.
That’s when I start hearing “It has to work, otherwise gdb
wouldn’t”, rather frequently, from more than one person.
Well, ptrace() won’t work for retrieving registers in OS X.
Matasano Secret Intern X referred me to Nemo’s article at
uninformed.org. In it, Nemo lays out the Mach kernel calls that
replace some of the lost ptrace() functionality. So, I wrote
wrappers for:
task_for_pid — to find the Mach task of an OS X process
mach_task_self — to get my debugger’s task
task_threads — to walk the threads inside a task
thread_get_state — to get the registers for one of those threads
thread_set_state — to change those registers
Since I wasn’t using them natively in C I needed to know more about
the usage of each function.
“No problem,” I thought, “I’ll just fire up terminal and… Oh, bloit!” No man pages.
I pored over Nemo’s work, what I could find in the headers, and
figured out how to call the functions. Now another problem. The Mach
functions take pointers to raw C memory.
The way I was told to handle this was, pack the data I needed into
Ruby strings or native numeric types with Ruby/DL. After a long, dark
period of messing with calls to “strdup” and “DL.malloc”, I found
“String#to_ptr”, and at last managed to get the Mach functions
working.
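The "pack it into a Ruby string" idea can be sketched without any C at all. The example below assumes a buffer of sixteen 32-bit values, and `fake_c_fill` is a hypothetical stand-in for a C function that writes into caller-supplied memory; no Mach call is made.

```ruby
REG_COUNT = 16 # sixteen 32-bit slots, an assumption for this sketch

# A binary String of zero bytes plays the role of raw C memory.
buffer = "\0".b * (REG_COUNT * 4)

# Hypothetical stand-in for the C side: overwrite the first two slots.
def fake_c_fill(buf)
  buf[0, 8] = [0xc0003, 0x32390].pack("L2")
  buf
end

fake_c_fill(buffer)

# Back in Ruby: turn the raw bytes into native unsigned 32-bit integers.
values = buffer.unpack("L#{REG_COUNT}")
puts values.map { |v| v.to_s(16) }.inspect
```

The point of the round trip is that the byte layout is decided by the pack format string ("L" for a native unsigned 32-bit word), so the Ruby side and the C side have to agree on it.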
I had also found the correct way to get errno through Ruby/DL:
DL.last_error. This appears to be documented nowhere in English.
Except for an odd bus error I ran into now and then (but couldn’t
duplicate), my Ruby debugger was working and could read and write
registers. I’d even checked to make sure they were coming back to me
in the correct sequence.
Then, running my get_registers() function repeatedly, I found the
registers of a stopped process changing on every call. When I printed
them without marshalling they contained the names of some of the
functions I’d written occasionally.
“Oh, bloit! I’m really chakked now. I’ve been calling a bloitting buffer overflow a register lookup,” I
said to myself. I despaired of my project and my future.
6.
On the train home and all weekend I looked through Apple’s
documentation. Google. The header files. “It has to work; otherwise gdb
wouldn’t,” another friend said. But he wasn’t able to find the
documentation I was looking for. He did find fxr.watson.org and some
better explanations of the functions at
web.mit.edu/darwin/src/modules/xnu/osfmk/man/. Those turned out to be
gold later.
During week one of coding:
several necessary functions wrapped and working
DL.txt is really the only Ruby/DL documentation that exists
Ruby/DL is great for simple C function wrapping but rough around the edges when it comes to more interesting calls.
Average familiarity with Ruby
Basic understanding of how a debugger works
A Ruby object that can attach to a process, continue it, detach from it and wait() for it.
One really convoluted method to read/write random locations in memory
Average familiarity with system calls in C (now rust free)
7.
Starting the following week, things went a little smoother.
I had my coding flow going. I had better documentation than just
header files. I started reading the Mach kernel code.
I wrote a small program in C to test the sequence of system calls I
was using in Ruby. If it worked in C, why didn’t it work in Ruby?
Then, I found it. I was calling task_threads() wrong, passing a
pointer where it expected a pointer-to-pointer. Whee! I
vetted the results with gdb’s output.
My code said:
"regs = ["c0003", "32390", "bffff74c", "90e441ba", "0", "0", "bffff768", "bffff74c", "1f", "286", "90e441ba", "7", "1f", "1f", "0", "37"]"
gdb replied:
eax    0xc0003    786435
ecx    0xbffff74c -1073744052
edx    0x90e441ba -1864089158
ebx    0x32390    205712
esp    0xbffff74c 0xbffff74c
ebp    0xbffff768 0xbffff768
esi    0x0        0
edi    0x0        0
eip    0x90e441b5 0x90e441b5
eflags 0x286      646
cs     0x7        7
ss     0x1f       31
ds     0x1f       31
es     0x1f       31
fs     0x0        0
gs     0x37       55
They agreed! I went home for the day.
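Comparing the two listings by eye is easier once the raw array has names attached. The sketch below assumes the values come back in the field order of the 32-bit x86 thread state structure (eax, ebx, ecx, edx, edi, esi, ebp, esp, ss, eflags, eip, cs, ds, es, fs, gs); that ordering is an assumption of this example, and `REG_NAMES` is a name made up for it.

```ruby
# Assumed field order of the 32-bit x86 thread state.
REG_NAMES = %i[eax ebx ecx edx edi esi ebp esp ss eflags eip cs ds es fs gs].freeze

# The hex strings printed by the Ruby debugger above.
raw = %w[c0003 32390 bffff74c 90e441ba 0 0 bffff768 bffff74c
         1f 286 90e441ba 7 1f 1f 0 37]

# Pair each name with its value, parsed from hex.
regs = REG_NAMES.zip(raw.map { |hex| hex.to_i(16) }).to_h

regs.each { |name, value| puts format("%-6s 0x%x", name, value) }
```

With names attached, each line can be checked against the matching row of gdb's `info registers` output.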
8.
Now for wait(), to catch debugger events. wait() was hanging the
debugger if I called it more than once. I set it up to use the
NOHANG option. I fixed a return value error.
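The difference between a blocking wait and a NOHANG poll can be sketched with Ruby's built-in `Process.waitpid`, used here in place of the wrapped C wait() from the project. The child process and the pipe that holds it open are scaffolding for the example only.

```ruby
reader, writer = IO.pipe

# The child blocks on the pipe until the parent closes its write end.
pid = fork do
  writer.close
  reader.read
  exit!(7)
end
reader.close

# Non-blocking poll: returns nil because the child is still running.
first_poll = Process.waitpid(pid, Process::WNOHANG)

writer.close                  # let the child finish
reaped = Process.waitpid(pid) # blocking wait: returns the child's pid
status = $?.exitstatus

puts "poll=#{first_poll.inspect} reaped=#{reaped} status=#{status}"
```

A debugger loop built on the non-blocking form has to handle the nil case itself, typically by doing other work and polling again.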
Then, I tested single-stepping with ptrace. Kernel panic.
I put that on the list of broken parts of ptrace to be replaced by a
Mach call.
Next up was setting breakpoints. They seemed to install themselves
without error but the child wasn’t stopping when I ran the command that
would hit the breakpoint I’d set. Upon inspection, the breakpoint was
replacing an instruction of -1. Which gdb told me was actually
0x55.
I started researching the problem, finding only hints. Did I mention
ptrace was gutted in OS X? I read the source for Apple’s version of
gdb. Thomas gave me a copy of a DTrace truss and said, “Just do
whatever gdb does.”
It took me a while to get the script working. It seems iTunes causes
errors in truss (also dtruss) whenever it’s running. I closed
iTunes and started watching gdb for ptrace calls. Rather
quickly I noticed an extreme lack of calls to ptrace.
Was gdb even using ptrace for reading the process’ memory?
(gdb) PID/LWP SYSCALL(args) = return
break *0x420f
Breakpoint 1 at 0x420f
(gdb) run
Starting program: /usr/bin/ftp
Reading symbols for shared libraries ++++. done
ftp> 939/94968960: ptrace(0x0, 0x0, 0x0, 0x0) = 0 0
939/94968960: ptrace(0xC, 0x0, 0x0, 0x0) = 0 0
930/66961480: ptrace(0xD, 0x3AB, 0x2C1B, 0x0) = 0 0
930/66961480: ptrace(0xD, 0x3AB, 0x2C1B, 0x0) = 0 0
930/66961480: ptrace(0xD, 0x3AB, 0x2C1B, 0x0) = 0 0
It became apparent ptrace was only really used by gdb to:
I then remembered that uninformed.org article. A quick read reminded
me that Mach vm_read and vm_write were needed to replace PT_READ
and PT_WRITE.
The next day, Thomas was in the office to check on my progress. To
move things along he implemented vm_read and vm_write for me while
I confirmed a few things with truss and looked for vm_read calls
in gdb. I didn’t find any. When he finished the functions, I used them
in my breakpoint setting routines. No errors.
No stopping at breakpoints either.
Again the instructions were -1. When I mentioned this Thomas
informed me I’d probably need vm_protect as well. Why hadn’t I
thought of that? Not too long after that I was able to set and remove
breakpoints correctly! I went home for the long weekend.
During week two of coding:
wrapped and implemented all necessary system calls
added thread state and breakpoint manipulation to Debuggerx
gained some knowledge of OS X internals
found a repeatable kernel panic
learned basic usage of dtrace and gdb
learned I tend to overthink my code before writing it
began to use irb as a scratch pad for testing functions
9.
Now another problem. You can set a breakpoint with the debugger. You
can catch the breakpoint. You can resume the process. But you can’t
reset the breakpoint without single stepping: to resume the process,
you have to clear the breakpoint.
But PT_STEP was panicking the kernel!
I settled on setting the TRAP flag in the EFLAGS register to simulate
single-stepping with ptrace. This seemed to work. But now I’m getting
bus errors when I resume the process. I verified with Thomas how they
were supposed to work. I tried watching gdb for vm_write from
truss again, nothing. After some debugging I discovered waitpid()
was clearing the trap flag, which Thomas informed me was correct
behavior. Some more monkeying around trying to get it working ate up
the rest of the day.
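Setting the trap flag is a single bit operation on the EFLAGS value read from the thread state. A minimal sketch, assuming the value has already been fetched as an Integer; the helper names are hypothetical, and 0x286 is the EFLAGS value from the gdb listing earlier.

```ruby
TRAP_FLAG = 1 << 8 # TF is bit 8 of EFLAGS on x86

# Hypothetical helpers operating on a plain Integer EFLAGS value.
def set_trap(eflags)
  eflags | TRAP_FLAG
end

def clear_trap(eflags)
  eflags & ~TRAP_FLAG
end

def trap_set?(eflags)
  (eflags & TRAP_FLAG) != 0
end

stepping = set_trap(0x286)
puts format("0x%x -> 0x%x", 0x286, stepping)
```

The modified value would then be written back with thread_set_state before resuming, so the CPU traps after executing one instruction.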
The next day, I was able to pass through a breakpoint and reset
it. Only problem was, the breakpoint wasn’t being reset fast enough: it
wasn’t done immediately one step after it was hit. After clearing some
confusion on my part with Thomas, I decided to try PT_STEP again. It
worked and didn’t panic the kernel this time. Finally, I had a
debugging tool that was complete!
All that remained was to clean up some debug tracing prints and
implement a better method to view the registers. Both fairly simple
things completed early the next day.
10.
There it is, the story of the birth of DebuggerX. A “simple” porting
task handed to an intern to better his understanding of debuggers and
Ruby. During the project I’d become quite familiar with Ruby, learned
some OS X internals, found a kernel panic in ptrace, and learned
better programming techniques. I still tend to overthink my code and
“have a hard time believing that you’re supposed to ask programs to do
the things it looks like they need to do,” according to Thomas, but I
have learned it’s quite a bit easier to try something in code than in
your head. Since completion of the project as originally stated, I’ve
added calls to get information about a thread and begun looking into
retrieving a list of function symbols from the process’ file. I’ll
make another post about that in the future.
Dave G. | July 9th, 2008 | Filed Under: Feature, Matasano, Navel Gazing
“If we just get this hardware layer 7 firewall to market in 3 months we’ll be funded in 4 and we’ll be millionaires in 24 months tops!” — Thomas Ptacek, shortly before I give the two weeks notice that became 6 weeks at Symantec.
Matasano has been around for over three years now, and we are not millionaires. The company’s original goal was to create a new way for companies to solve the internal access control nightmare (that still persists, in spite of NAC). In 2005, our thought process was the typical startup blueprint: We have a great team, a great idea, let’s go get some funding and build a product company.
I could probably write a series of blog posts on the VC process, but during both the due diligence process and our independent conversations with customers, we had a common question keep coming up. “This product {sounds great, sounds impossible, is the holy grail}. So… How do I manage it?”
When a product doesn’t exist yet, it is really easy to talk about how you manage it. And since it was a common hurdle, we kept coming up with more and more clever answers to the problem. So, now we had a revolutionary new idea for the firewall, and we also had an incredibly sophisticated management interface. This would be great except we just kept evolving the product to the point where we would have needed a ton of funding to proceed. Also, we learned that we probably know more about the business that we want to build than anyone else.
So, after regrouping, we realized that the common thread in most of our conversations with potential customers was The Management Question. So, we went back to a lot of the folks we talked to and drilled down. We found that even now, in 2008, organizations are still struggling to manage what is arguably the most ubiquitous security product on your network. The firewall.
Yes, the problem of managing firewalls isn’t as fascinating as figuring out how to perform line speed, full decode of protocols and making stop/go decisions at 10Gbits. Instead, we are solving a real operations problem. The type of product where you don’t make everyone’s life more difficult when you deploy, but instead make everyone’s life better.
The obvious question is, “3 years… really?”.
“We have a team of kernel developers working on a web-app… two months, tops.”
This wasn’t three years spent dedicated to application development. The application was built in spare cycles. The fact of the matter is, while we were building this product, we were also building a consulting business.
We started the business based out of Jeremy’s apartment. This was great for me, as the commute was about 10 minutes (Jeremy lived one block further away from me than the old @stake office). Jeremy eventually moved, and we decided to move the office to my apartment. The commute got better, but running a business from your (or at least, my) home is a big quality of life hit for everyone involved. Just ask Dino and Jeremy, they worked on opposite sides of what used to be a dining room table, with Dino having to squeeze in between the air conditioner and the table with like 2 inches to spare. Mostly though, it is hard to feel like a real company when there isn’t an office. It is also hard to feel like a company when you are three people (after Dino and Window left us!). It is also really hard to feel like a company when a customer calls the business line at 10PM to leave a voicemail and gets me answering the phone with the television blaring in the background.
So, we got an office. Then Chicago got an office. Both of these offices were unbelievably humble. The first New York space had four people working inside of a 100 sq. ft. office. The Chicago office wasn’t much bigger. Also, water leaking from the ceiling. Also, it was above some weird print shop. But you know what. Also, it started to feel like a real company.
We also started hiring. Almost like clockwork, we would get more work as soon as we hired someone (which basically meant that we still had a gap). Also moving the real company dial.
“Corporate blogging is a total waste of time.” — Dave Goldsmith
At this point, we would cue the Montage:
Offices of the non-leaking variety for Chicago. Hiring amazing people. Holy crap, we have a benefits person. More great customers. Lots and lots of blog posts (almost one a workday since the inception of the company). Dedicated developer for Playbook. Bigger offices for New York and Chicago. 401k’s?! Crazier and crazier consulting projects. Which led to Black Hat talks. Which led to even crazier projects. UI Designers cost how much? Horribly… horribly… awesome. Tom calling me to tell me that if we don’t do X in Y time frame the company will surely collapse. Jeremy looking at me like he is going to stab me in the neck if we don’t start hiring more people.
In spite of everything I just ranted about, services is and will continue to be a great business for us. Not only is the work exciting and ever-changing, we just wouldn’t get the same level of visibility into the real life challenges that modern enterprises face.
That being said, we started Matasano with the goal of selling security products. And as of July 2nd, 2008… we do.
ps: It would be absurd if I didn’t take a moment to thank Adam, Alex, Craig, Dan, Dino, Duncan, Eric, Erin, Kim, Max, Mike, Jeremy, Jess, Timur, Tom, Window, Wes, all of our customers, partners and trusted advisors.
Thomas Ptacek | July 3rd, 2008 | Filed Under: Uncategorized
Almost 2 years ago, Dino declared Python to be the “lingua-franca of over-the-hill hackers”, boldly asserting that 5 out of 6 security hackers under the age of 30 preferred Ruby instead. Being 30 at the time, I was an easy psychological target for this argument. I made the switch and haven’t regretted it. You can tell me all you want that “named nested functions are just as good as lambdas”, or that “you can fake Ruby blocks with a for loop and a generator”. Ruby is just nicer to write testing code in, and makes me feel at least 2 years younger and less experienced than I really am. Thanks, Ruby!
I’ve been meaning to write a long post about our house Ruby style, and some of the Ruby tips and tricks we’ve picked up along the way. But every time I sit down to write it, that post starts sounding a lot like work. So instead, I’d like to inaugurate a new series of much easier posts: Ruby for Pen-testers.
Where was I?
1. Use Modules For Lists Of Constants
If you test protocols or C code, you run into lists of magic numbers all the time. For example, here’s a bit of ptrace(2):
#define PT_TRACE_ME 0 /* child declares it’s being traced */
#define PT_READ_I 1 /* read word in child’s I space */
#define PT_READ_D 2 /* read word in child’s D space */
#define PT_READ_U 3 /* read word in child’s user structure */
#define PT_WRITE_I 4 /* write word in child’s I space */
#define PT_WRITE_D 5 /* write word in child’s D space */
#define PT_WRITE_U 6 /* write word in child’s user structure */
#define PT_CONTINUE 7 /* continue the child */
#define PT_KILL 8 /* kill the child process */
This is gross, but it’s C code, so you give them a break. But here’s some code from Pedram’s PyDbg:
TH32CS_SNAPHEAPLIST = 0x00000001
TH32CS_SNAPPROCESS = 0x00000002
TH32CS_SNAPTHREAD = 0x00000004
TH32CS_SNAPMODULE = 0x00000008
TH32CS_INHERIT = 0x80000000
Now, Pedram does have the excuse of writing in Python. But here’s Ruby-MySql:
COM_SLEEP = 0
COM_QUIT = 1
COM_INIT_DB = 2
COM_QUERY = 3
This code has no excuse. (Here’s a rewrite that is much faster). Now, let’s look at net-ssh; if you haven’t read Jamis’ net-ssh code, you shouldn’t write any more packet processing code until you do.
module Constants
  # Transport layer generic messages
  DISCONNECT = 1
  IGNORE = 2
  UNIMPLEMENTED = 3
  DEBUG = 4
  # …
end
Getting closer. But not there yet. Here’s an even better way:
module EFlags
  CARRY = (1 << 0)
  X0 = (1 << 1)
  PARITY = (1 << 2)
  # …
  VINT = (1 << 19)
  VINTPENDING = (1 << 20)
  CPUID = (1 << 21)
end
That’s right: one module per set of constants. In other words, substitute “module” for “enum”. This has many benefits:
It’s clean. You can immediately find all the related magic numbers, both from the list, and
by looking at code that uses the magic numbers: you see Ragweed::EFlags::CARRY, you know to look
for “EFlags”.
Modules come with special bonus features.
For instance:
class Module
  # Map constant names to values, e.g. :VINT => 524288.
  def to_name_hash
    @name_hash ||= constants.map { |k| [k.to_sym, const_get(k)] }.to_h
  end

  # Map values back to constant names, e.g. 524288 => :VINT.
  def to_value_hash
    @value_hash ||= constants.map { |k| [const_get(k), k.to_sym] }.to_h
  end
end
EFlags.to_value_hash[1 << 19] # => :VINT
… which is super nice when you’re printing out the contents of packets.
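As a sketch of where the reverse lookup pays off, the example below decodes a whole EFLAGS value into flag names. The `SIGN` and `INTERRUPT` constants and the `flag_names` helper are assumptions added for the illustration; they are not in the listing above.

```ruby
module EFlags
  CARRY     = (1 << 0)
  X0        = (1 << 1)
  PARITY    = (1 << 2)
  SIGN      = (1 << 7) # assumed name, not in the listing above
  INTERRUPT = (1 << 9) # assumed name, not in the listing above
end

class Module
  # Map each constant's value back to its name.
  def to_value_hash
    @value_hash ||= constants.map { |k| [const_get(k), k.to_sym] }.to_h
  end
end

# Hypothetical helper: names of every flag bit set in value.
def flag_names(mod, value)
  mod.to_value_hash.select { |bit, _name| (value & bit) != 0 }.values
end

names = flag_names(EFlags, 0x286)
puts names.inspect
```

The same helper works for any module of single-bit constants, which is part of the appeal of keeping one module per set of magic numbers.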