I have about 27,000 survey responses from about 150 bus routes, each with up to roughly 100 stops. I've recorded the total Ons and Offs for each stop on each bus run, as well as the On/Off stop pair that each survey response corresponds to. I want to create weights based on the On and Off stops for each line and direction. This produces a very sparse "half table" of responses (observations by From/To stop) to rake.

I'm wondering whether there is a good "mechanical" method for combining Ons and Offs into groups to help "fill out the table." I want to be sensitive to "distance travelled" when combining pairs. That is, when grouping Ons and Offs, I want to minimize the range of "stations traversed" (really, the range of time on the bus) within each group.

One way to avoid heavily aggregating the data is to create "pseudo-responses" for the missing interchanges and seed the weight table with very small values before raking. After raking, I would drop the pseudo-responses and rescale the weights of the actual responses to account for the weight the pseudo-responses carried.

Is there any prior art I could review? There are a huge number of groupings to be done, so I'm hoping for an algorithm or process that can group the stops automatically and report when it fails to find enough "near neighbors" to aggregate.

Thanks in advance,

Robert Farley
LACMTA
1 Gateway Plaza
Los Angeles, CA 90012-2952
(213)922-2532
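A minimal sketch of the two steps described above, in Python with NumPy. It assumes each route and direction is handled separately and that stops are indexed in running order. `group_stops` (a hypothetical helper) walks the stops in order and merges consecutive stops until each group holds at least `min_count` responses, so every group covers a contiguous stretch of running time. It flags groups that are still short of responses or whose span in `times` exceeds `max_span`; `times` is assumed to be cumulative scheduled running minutes from the start of the route. `rake_half_table` (also hypothetical) seeds unobserved valid cells with a small `eps` as pseudo-responses, rakes by iterative proportional fitting to the On and Off totals, then drops the pseudo cells and rescales the real ones. The greedy contiguous grouping is a simple stand-in chosen for illustration, not an established method.

```python
import numpy as np


def group_stops(counts, times, min_count, max_span):
    """Greedily merge consecutive stops (in running order) into groups.

    Hypothetical heuristic: a group closes once it holds at least
    `min_count` responses; leftover stops at the end of the route join the
    last group. A group is reported as a failure if it still has fewer than
    `min_count` responses or spans more than `max_span` in `times`.
    Returns (labels, failures): labels[i] is stop i's group id.
    """
    groups, current, total = [], [], 0
    for i, c in enumerate(counts):
        current.append(i)
        total += c
        if total >= min_count:
            groups.append(current)
            current, total = [], 0
    if current:  # trailing stops that never reached min_count
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)

    labels = [0] * len(counts)
    failures = []
    for g, members in enumerate(groups):
        for i in members:
            labels[i] = g
        n_resp = sum(counts[i] for i in members)
        span = times[members[-1]] - times[members[0]]
        if n_resp < min_count or span > max_span:
            failures.append(g)
    return labels, failures


def rake_half_table(obs, ons, offs, eps=1e-3, diag=False,
                    max_iter=10000, tol=1e-8):
    """Rake an On x Off table of survey responses to stop totals.

    obs[i, j] counts responses boarding at stop (or group) i and alighting
    at j. Only the upper triangle holds valid trips (the "half table"); set
    diag=True after grouping, when trips within one group sit on the
    diagonal. Unobserved valid cells are seeded with `eps` as
    pseudo-responses, then iterative proportional fitting (raking) matches
    row sums to `ons` and column sums to `offs`. Returns a weight per
    response for each observed cell (0 elsewhere).
    """
    obs = np.asarray(obs, dtype=float)
    ons = np.asarray(ons, dtype=float)
    offs = np.asarray(offs, dtype=float)
    n = obs.shape[0]
    valid = np.triu(np.ones((n, n), dtype=bool), k=0 if diag else 1)
    real = valid & (obs > 0)
    table = np.where(real, obs, np.where(valid, eps, 0.0))

    for _ in range(max_iter):
        rows = table.sum(axis=1)
        table *= np.divide(ons, rows, out=np.zeros(n), where=rows > 0)[:, None]
        cols = table.sum(axis=0)
        table *= np.divide(offs, cols, out=np.zeros(n), where=cols > 0)[None, :]
        if np.allclose(table.sum(axis=1), ons, rtol=0, atol=tol):
            break

    # Drop the pseudo-response cells and rescale the real cells so they
    # still add up to total boardings.
    kept = np.where(real, table, 0.0)
    if kept.sum() == 0:
        raise ValueError("no observed trips to carry the weights")
    kept *= ons.sum() / kept.sum()
    return np.divide(kept, obs, out=np.zeros_like(kept), where=real)
```

One caution on the seeding idea: a small seed does not guarantee a small raked value. With `ons=[10, 5, 0]`, `offs=[0, 6, 9]` and responses only in cells (0,1) and (1,2), the margins force 4 boardings into the unobserved (0,2) cell no matter how small `eps` is, and the final rescaling then spreads that weight over the real responses. It may be worth logging how much weight the pseudo-responses absorb before dropping them.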