6.824 2013 Lecture 2: Infrastructure: RPC and threads

Remote Procedure Call (RPC)
  a key piece of distrib sys machinery; you have used it in lab 1
  goal: easy-to-program network communication
    hides most details of client/server communication
    client call is much like ordinary procedure call
    server handlers are much like ordinary procedures
  RPC is widely used!

RPC ideally makes net communication look just like fn call:
  Client:
    z = fn(x, y)
  Server:
    fn(x, y) {
      compute
      return z
    }
  RPC aims for this level of transparency

RPC message diagram:
  Client             Server
    request--->
       <---response

Software structure
  client app         handlers
    stubs           dispatcher
   RPC lib           RPC lib
     net  ------------ net

Examples from lab 1:
  DoJob
  ShutDown
  Register
 
A few details:
  Marshalling: format data into packets
    Tricky for arrays, pointers, objects, &c
    something things you cannot pass: e.g., channels
    Go's RPC library is pretty powerful!
  Binding: how does client know who to talk to?
    Might be a name service -- e.g. DNS
  Threads:
    Client often has many threads, so > 1 call outstanding, match up replies
    Handlers may be slow, so server often runs each in a thread

RPC problem: what to do about failures?
  e.g. lost packets, broken network, crashed servers

What does a failure look like to the RPC library?
  It never sees a response from the server
  It does *not* know if the server saw the request!
    Maybe server/net failed just before sending reply

Simplest scheme: "at least once" behavior
  RPC library waits for response for a while
  If none arrives, re-send the request
  Do this a few times
  Still no response -- return an error to the application

Q: is "at least once" easy for applications to cope with?

Simple problem (non-replicated key/value server)
  client sends Put(a)
  server gets request, but network drops reply
  client sends Put(a) again
  should server respond "yes"?
  or "no"?
  what if it isn't a put but "deduct $10 from bank account"

Harder problem:
  Put(a)  -- but network delays the packet
  Put(a) -- response arrives
  now network delivers the delayed Put(a) !!!

Is at-least-once ever OK?
  yes: if no side effects -- read-only operations (or idem-potent ops)
  yes: if application has its own plan for detecting duplicates
    which you will need for Lab 1

Better RPC behavior: "at most once"
  idea: server RPC code detects duplicate requests
    returns previous reply instead of re-running handler
  client includes unique ID (UID) with each request
    uses same UID for re-send
  server:
    if seen[uid]:
      r = old[uid]
    else
      r = handler()
      old[uid] = r
      seen[uid] = true

some at-most-once complexities
  how to ensure UID is unique?
    big random number?
    combine unique client ID (ip address?) with sequence #?
  server must eventually discard info about old RPCs
    when is discard safe?
    idea:
      unique client IDs
      per-client RPC sequence numbers
      client includes "seen all replies <= X" with every RPC
      much like TCP sequence #s and acks
    or only allow client one outstanding RPC at a time
      arrival of seq+1 allows server to discard all <= seq
    or client agrees to keep retrying for < 5 minutes
      server discards after 5+ minutes
  how to handle dup req while original is still executing?
    server doesn't know reply yet; don't want to run twice
    idea: "pending" flag per executing RPC; wait or ignore

What if an at-most-once server crashes?
  if at-most-once duplicate info in memory, server will forget
    and accept duplicate requests
  maybe it should write the duplicate info to disk?
  maybe replica server should also replicate duplicate info?

What about "exactly once"?
  at-most-once plus unbounded retries

Go RPC is "at-most-once"
  open TCP connection
  write request to TCP connection
  TCP may retransmit, but server's TCP will filter out duplicates
  no retry in Go code (i.e. will NOT create 2nd TCP connection)
  Go RPC code returns an error if it doesn't get a reply
    perhaps after a timeout (from TCP)
    perhaps server didn't see request
    perhaps server processed request but server/net failed before reply came back

Go RPC's at-most-once isn't enough for Lab 1
  it only applies to a single RPC call
  if worker doesn't respond, the master re-send to it to another worker
    but original worker may have not failed, and is working on it too
  Go RPC can't detect this kind of duplicate
    In lab 2 you will have to protect against these kinds of duplicates

Threads
  threads are a fundamental server structuring tool
  you'll use them a lot in the labs
  they can be tricky
  go well together with RPC 

Thread = "thread of control"
  threads allow one program to (logically) do many things at once
  the threads share memory
  each thread includes some per-thread state:
    program counter, registers, stack

Threading challenges:
  sharing data between thread 
     race conditions
     need to protect invariants on shared data (go: mutex)
  coordination between threads (go: channels)
  deadlock 
     thread 1 is waiting on thread 2
     thread 2 is waiting on thread 1
     easy detectable (unlike races)
  lock granularity
     goarse-grained -> little concurrency/parallelism
     fine-grained -> lots of concurrency, but race and deadlocks
  [Use RPC package to illustrate these issues]

let's look at today's handout -- l02-rpc-lock.go and l02-rpc-chan.go
  it's a toy RPC system
  illustrates threads, mutexes, channels
  it's a toy
    assumes connection already open
    only supports an integer arg, integer reply
    doesn't deal with errors

struct ToyClient
  client RPC state 
  mutex per ToyClient
  connection to server (e.g. TCP socket)
  xid -- unique ID per call, to match reply to caller
  pending[] -- multiple threads may call, need to find them
    chan on which caller is waiting

Call
  application calls reply := client.Call(procNum, arg)
  procNum indicates what function to run on server
  WriteRequest knows the format of an RPC msg
    basically just the arguments turned into bits in a packet
  Q: could we move "xid := tc.xid" outside the mutex?
     after all, we are not changing anything
     or is there a race then?
  Q: do we need to write inside the mutex?
  note: Go says you are responsible for preventing concurrent map ops
    that's one reason the update to pending is locked
  [ draw timing diagram on the board to show race]

Listener
  runs as a background thread
  not quite right that it may need to wait on chan for caller

Q: what if reply comes back very quickly?
   could Listener() see reply before pending[xid] entry exists?
   or before caller is waiting for channel?

Dispatcher
  note that the Dispatcher echos the xid back to the client
    so that Listener knows which Call to wake up
  Q: why run the handler in a separate thread?

main()
  note registering handler in handlers[] 
  what will the program print?


Beware: this code is not correct:
  var x int
  done := false
  go func() { x = f(...); done = true }
  while done == false { }
  it's very tempting to write, but Go spec says it's undefined
    use a channel or sync.WaitGroup instead

l02-rpc-chan.go: channel version of toy rpc library
  Go encourages programmers to use channels to manage concurrency
  structure your concurrent application using channels and select
    if goroutines don't share memory, then no locks are needed
       works if you can arrange that all sharing happens through messages
  note, however, you can still have deadlocks
     say you have  channel with capacity of 1
     goroutine writes into channel a
     goroutine writes into channel a again
        this blocks
     if the intended receiver is sending several messages on another channel to the sender
        then deadlock
     go detects such deadlocks for you

go's RPC implementation uses a mixture of locks and channels  

Study the Go tutorials on goroutines and channels